2011-03-31 14:12:53

by Zhang, Yang Z

[permalink] [raw]
Subject: [0/7,v10] NUMA Hotplug Emulator (v10)

* PATCHSET INTRODUCTION

patch 1: Documentation.
patch 2: Adds a numa=possible=<N> command line option to set an additional N nodes
as being possible for memory hotplug.

patch 3: Add node hotplug emulation, introduce debugfs node/add_node interface

patch 4: Abstract cpu register functions, make these interface friend for cpu
hotplug emulation
patch 5: Support cpu probe/release in x86, it provide a software method to hot
add/remove cpu with sysfs interface.
patch 6: Fake CPU socket with logical CPU on x86, to prevent the scheduling
domain to build the incorrect hierarchy.
patch 7: Implement per-node add_memory debugfs interface

* FEEDBACKDS & RESPONSES

v10:
rebase the patches against 2.6.38-rc8

v9:

Solve the bug reported by Eric B Munson, check the return value of cpu_down when do
CPU release.

Solve the conflicts with Tejun Heo' Unificaton NUMA code, re-work patch 5 based on his
patch.

Some small changes on debugfs per-node add_memory interface.

v8:

Reconsider David's proposal, accept the per-node add_memory interface on debugfs.
(p7).

v7:

David: We don't need two different interfaces, one in sysfs and one in debugfs,
to hotplug memory.
Response: We use the debugfs for memory hotplug emulation only, for sysfs memory probe
interface, we did not do any modifications, so we remove original patch 7
from patchset.
David: Suggest new probe files in debugfs for each online node:
/sys/kernel/debug/node_hotplug/add_node (already exists)
/sys/kernel/debug/node_hotplug/node0/add_memory
/sys/kernel/debug/node_hotplug/node1/add_memory

Response: We need not make a simple thing such complicated, We'd prefer to
rename the node_hotplug/probe interface as node_hotplug/add_memory.
/sys/kernel/debug/node_hotplug/add_node (already exists)
/sys/kernel/debug/node_hotplug/add_memory (rename probe as add_memory)

v6:

Greg KH: Suggest to use interface node_hotplug/add_node
David: Agree with Greg's suggestion
Response: We move the interface from node/add_node to node_hotplug/add_node, and we also move
memory/probe interface to node_hotplug/probe since both are related to memory hotplug.

Kletnieks Valdis: suggest to renumber the patch serie, and move patch 8/8 to patch 1/8.
Response: Move patch 8/8 to patch 1/8, and we will include the full description in 0/8 when
we send patches in future.


v5:

David: Suggests to use a flexible method to to do node hotplug emulation. After
review our 2 versions emulator implemetations, David provides a better solution
to solve both the flexibility and memory wasting issue.

Add numa=possible=<N> command line option, provide sysfs inteface
/sys/devices/system/node/add_node interface, and move the inteface to debugfs
/sys/kernel/debug/hotplug/add_node after hearing the voice from community.

Greg KH: move the interface from hotplug/add_node to node/add_node

Response: Accept David's node=possible=<n> command line options. After talking
with David, he agree to add his patch to our patchset, thanks David's solution(patch 1).

David's original interface /sys/kernel/debug/hotplug/add_node is not so clear for
node hotplug emulation, we accept Greg's suggestion, move the interface to ndoe/add_node
(patch 2)

Dave Hansen: For memory hotplug, Dave reminds Greg KH's advice, suggest us to use configfs replace
sysfs. After Dave knows that it is just for test purpose, Dave thinks debugfs should
be the best.

Response: memory probe sysfs interface already exists, I'd like to still keep it, and extend it
to support memory add on a specified node(patch 6).

We accepts Dave's suggestion, implement memory probe interface with debugfs(patch 7).

Randy Dunlap: Correct many grammatical errors in our documentation(patch 8).

Response: Thanks for Randy's careful review, we already correct them.

v4:

Split CPU hotplug emulation code since David has send a patchset for node hotplug emulation.

v3 & v2:

1) Patch 0
Balbir & Greg: Suggest to use tool git/quilt to manage/send the patchset.
Response: Thanks for the recommendation, With help from Fengguang, I get quilt
working, it is a great tool.

2) Patch 2
Jaswinder Singh: if (hidden_num) is not required in patch 2
Response: good catching, it is removed in v2.


3) Patch 3
Dave Hansen: Suggest to create a dedicated sysfs file for each possible node.
Greg: How big would this "list" be? What will it look like exactly?
Haicheng: It should follow "one value per file". It intends to show acceptable
parameters.

For example, if we have 4 fake offlined nodes, like node 2-5, then:
$ cat /sys/devices/system/node/probe
2-5

Then user hotadds node3 to system:
$ echo 3 > /sys/devices/system/node/probe
$ cat /sys/devices/system/node/probe
2,4-5

Greg: As you are trying to add a new sysfs file, please create the matching
Documentation/ABI/ file as well.
Response: We miss it, and we already add it in v2.

Patch 4 & 5:
Paul Mundt: This looks like an incredibly painful interface. How about scrapping all
of this _emu() mess and just reworking the register_cpu() interface?
Response: accept Paul's suggestion, and remove the cpu _emu functions.

Patch 7:
Dave Hansen: If we're going to put multiple values into the file now and
add to the ABI, can we be more explicit about it?
echo "physical_address=0x40000000 numa_node=3" > memory/probe
Response: Dave's new interface was accpeted, and more we still keep the old
format for compatibility. We documented the these interfaces into
Documentation/ABI in v2.
Greg: suggest to use configfs replace for the memory probe interface
Andi: This is a debugging interface. It doesn't need to have the
most pretty interface in the world, because it will be only used for
QA by a few people. it's just a QA interface, not the next generation
of POSIX.
Response: We still keep it as sysfs interface since node/cpu/memory probe interface
are all in sysfs, we can create another group of patches to support
configfs if we have this strong requirement in future.

v1:

the RFC version for NUMA Hotplug Emulator.

* WHAT IS HOTPLUG EMULATOR

NUMA hotplug emulator is collectively named for the hotplug emulation
it is able to emulate NUMA Node Hotplug thru a pure software way. It
intends to help people easily debug and test node/cpu/memory hotplug
related stuff on a none-NUMA-hotplug-support machine, even an UMA machine.

The emulator provides mechanism to emulate the process of physcial cpu/mem
hotadd, it provides possibility to debug CPU and memory hotplug on the machines
without NUMA support for kenrel developers. It offers an interface for cpu
and memory hotplug test purpose.

* WHY DO WE USE HOTPLUG EMULATOR

We are focusing on the hotplug emualation for a few months. The emualor helps
team to reproduce all the major hotplug bugs. It plays an important role to
the hotplug code quality assuirance. Because of the hotplug emulator, we already
move most of the debug working to virtual evironment.

* Principles & Usages

NUMA hotplug emulator include 3 different parts: node/CPU/memory hotplug emulation.

1) Node hotplug emulation:

Adds a numa=possible=<N> command line option to set an additional N nodes as
being possible for memory hotplug. This set of possible nodes control
nr_node_ids and the sizes of several dynamically allocated node arrays.

This allows memory hotplug to create new nodes for newly added memory
rather than binding it to existing nodes.

For emulation on x86, it would be possible to set aside memory for hotplugged
nodes (say, anything above 2G) and to add an additional four nodes as being
possible on boot with

mem=2G numa=possible=4

and then creating a new 128M node at runtime:

# echo 128M@0x80000000 > /sys/kernel/debug/node_hotplug/add_node
On node 1 totalpages: 0
init_memory_mapping: 0000000080000000-0000000088000000
0080000000 - 0088000000 page 2M

Once the new node has been added, its memory can be onlined. If this
memory represents memory section 16, for example:

# echo online > /sys/devices/system/memory/memory16/state
Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
Policy zone: Normal
[ The memory section(s) mapped to a particular node are visible via
/sys/devices/system/node_hotplug/node1, in this example. ]

2) CPU hotplug emulation:

The emulator reserve CPUs throu grub parameter, the reserved CPUs can be
hot-add/hot-remove in software method.

When hotplug a CPU with emulator, we are using a logical CPU to emulate the CPU
hotplug process. For the CPU supported SMT, some logical CPUs are in the same
socket, but it may located in different NUMA node after we have emulator. We
put the logical CPU into a fake CPU socket, and assign it an unique
phys_proc_id. For the fake socket, we put one logical CPU in only.

- to hide CPUs
- Using boot option "maxcpus=N" hide CPUs
N is the number of initialize CPUs
- Using boot option "cpu_hpe=on" to enable cpu hotplug emulation
when cpu_hpe is enabled, the rest CPUs will not be initialized

- to hot-add CPU to node
# echo node_id > cpu/probe

- to hot-remove CPU
# echo cpu_id > cpu/release

3) Memory hotplug emulation:

The emulator reserves memory before OS boots, the reserved memory region is
removed from e820 table. Each online node has an add_memory interface, and
memory can be hot-added via the per-ndoe add_memory debugfs interface.

- reserve memory thru a kernel boot paramter
mem=1024m

- add a memory section to node 3
# echo 0x40000000 > node_hotplug/node3/add_memory

* ACKNOWLEDGMENT

NUMA Hotplug Emulator includes a team's efforts, thanks all of them.
They are:
Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes,
Yang Zhang and Yongkang You
---
best regards
yang


2011-04-13 18:57:56

by David Rientjes

[permalink] [raw]
Subject: Re: [0/7,v10] NUMA Hotplug Emulator (v10)

On Thu, 31 Mar 2011, Zhang, Yang Z wrote:

> * PATCHSET INTRODUCTION
>
> patch 1: Documentation.
> patch 2: Adds a numa=possible=<N> command line option to set an additional N nodes
> as being possible for memory hotplug.
>
> patch 3: Add node hotplug emulation, introduce debugfs node/add_node interface
>
> patch 4: Abstract cpu register functions, make these interface friend for cpu
> hotplug emulation
> patch 5: Support cpu probe/release in x86, it provide a software method to hot
> add/remove cpu with sysfs interface.
> patch 6: Fake CPU socket with logical CPU on x86, to prevent the scheduling
> domain to build the incorrect hierarchy.
> patch 7: Implement per-node add_memory debugfs interface
>

I think it would probably be better to separate these into the x86 support
(patch 2, part of 3, 4, 5, and patch 7) and the generic memory hotplug,
cpu, and documentation (patch 1, most of 3, wiring up in 4 and 5, patch
7). In each changelog you can talk about the entire scope of the feature
and what each patch is leading to, but this separation will help to get it
merged first by the x86 maintainers and then the generic bits through the
-mm tree.

Thanks for following up on this!

2011-04-15 14:04:35

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [0/7,v10] NUMA Hotplug Emulator (v10)

Any comments for those patches?

best regards
yang


> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Zhang, Yang Z
> Sent: Thursday, March 31, 2011 10:13 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> Kleen, Andi; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; Li, Xin
> Subject: [0/7,v10] NUMA Hotplug Emulator (v10)
>
> * PATCHSET INTRODUCTION
>
> patch 1: Documentation.
> patch 2: Adds a numa=possible=<N> command line option to set an additional
> N nodes
> as being possible for memory hotplug.
>
> patch 3: Add node hotplug emulation, introduce debugfs node/add_node
> interface
>
> patch 4: Abstract cpu register functions, make these interface friend for cpu
> hotplug emulation
> patch 5: Support cpu probe/release in x86, it provide a software method to hot
> add/remove cpu with sysfs interface.
> patch 6: Fake CPU socket with logical CPU on x86, to prevent the scheduling
> domain to build the incorrect hierarchy.
> patch 7: Implement per-node add_memory debugfs interface
>
> * FEEDBACKDS & RESPONSES
>
> v10:
> rebase the patches against 2.6.38-rc8
>
> v9:
>
> Solve the bug reported by Eric B Munson, check the return value of cpu_down
> when do
> CPU release.
>
> Solve the conflicts with Tejun Heo' Unificaton NUMA code, re-work patch 5
> based on his
> patch.
>
> Some small changes on debugfs per-node add_memory interface.
>
> v8:
>
> Reconsider David's proposal, accept the per-node add_memory interface on
> debugfs.
> (p7).
>
> v7:
>
> David: We don't need two different interfaces, one in sysfs and one in
> debugfs,
> to hotplug memory.
> Response: We use the debugfs for memory hotplug emulation only, for sysfs
> memory probe
> interface, we did not do any modifications, so we remove original
> patch 7
> from patchset.
> David: Suggest new probe files in debugfs for each online node:
> /sys/kernel/debug/node_hotplug/add_node
> (already exists)
>
> /sys/kernel/debug/node_hotplug/node0/add_memory
>
> /sys/kernel/debug/node_hotplug/node1/add_memory
>
> Response: We need not make a simple thing such complicated, We'd prefer to
> rename the node_hotplug/probe interface as
> node_hotplug/add_memory.
> /sys/kernel/debug/node_hotplug/add_node
> (already exists)
> /sys/kernel/debug/node_hotplug/add_memory
> (rename probe as add_memory)
>
> v6:
>
> Greg KH: Suggest to use interface node_hotplug/add_node
> David: Agree with Greg's suggestion
> Response: We move the interface from node/add_node to
> node_hotplug/add_node, and we also move
> memory/probe interface to node_hotplug/probe since both are
> related to memory hotplug.
>
> Kletnieks Valdis: suggest to renumber the patch serie, and move patch 8/8 to
> patch 1/8.
> Response: Move patch 8/8 to patch 1/8, and we will include the full description
> in 0/8 when
> we send patches in future.
>
>
> v5:
>
> David: Suggests to use a flexible method to to do node hotplug emulation. After
> review our 2 versions emulator implemetations, David provides a
> better solution
> to solve both the flexibility and memory wasting issue.
>
> Add numa=possible=<N> command line option, provide sysfs
> inteface
> /sys/devices/system/node/add_node interface, and move the
> inteface to debugfs
> /sys/kernel/debug/hotplug/add_node after hearing the voice
> from community.
>
> Greg KH: move the interface from hotplug/add_node to node/add_node
>
> Response: Accept David's node=possible=<n> command line options. After
> talking
> with David, he agree to add his patch to our patchset, thanks David's
> solution(patch 1).
>
> David's original interface /sys/kernel/debug/hotplug/add_node is
> not so clear for
> node hotplug emulation, we accept Greg's suggestion, move the
> interface to ndoe/add_node
> (patch 2)
>
> Dave Hansen: For memory hotplug, Dave reminds Greg KH's advice, suggest us
> to use configfs replace
> sysfs. After Dave knows that it is just for test purpose, Dave thinks
> debugfs should
> be the best.
>
> Response: memory probe sysfs interface already exists, I'd like to still keep it,
> and extend it
> to support memory add on a specified node(patch 6).
>
> We accepts Dave's suggestion, implement memory probe
> interface with debugfs(patch 7).
>
> Randy Dunlap: Correct many grammatical errors in our documentation(patch
> 8).
>
> Response: Thanks for Randy's careful review, we already correct them.
>
> v4:
>
> Split CPU hotplug emulation code since David has send a patchset for node
> hotplug emulation.
>
> v3 & v2:
>
> 1) Patch 0
> Balbir & Greg: Suggest to use tool git/quilt to manage/send the patchset.
> Response: Thanks for the recommendation, With help from Fengguang, I get
> quilt
> working, it is a great tool.
>
> 2) Patch 2
> Jaswinder Singh: if (hidden_num) is not required in patch 2
> Response: good catching, it is removed in v2.
>
>
> 3) Patch 3
> Dave Hansen: Suggest to create a dedicated sysfs file for each possible node.
> Greg: How big would this "list" be? What will it look like exactly?
> Haicheng: It should follow "one value per file". It intends to show acceptable
> parameters.
>
> For example, if we have 4 fake offlined nodes, like node
> 2-5, then:
> $ cat /sys/devices/system/node/probe
> 2-5
>
> Then user hotadds node3 to system:
> $ echo 3 > /sys/devices/system/node/probe
> $ cat /sys/devices/system/node/probe
> 2,4-5
>
> Greg: As you are trying to add a new sysfs file, please create the matching
> Documentation/ABI/ file as well.
> Response: We miss it, and we already add it in v2.
>
> Patch 4 & 5:
> Paul Mundt: This looks like an incredibly painful interface. How about scrapping
> all
> of this _emu() mess and just reworking the register_cpu() interface?
> Response: accept Paul's suggestion, and remove the cpu _emu functions.
>
> Patch 7:
> Dave Hansen: If we're going to put multiple values into the file now and
> add to the ABI, can we be more explicit about it?
> echo "physical_address=0x40000000 numa_node=3" >
> memory/probe
> Response: Dave's new interface was accpeted, and more we still keep the old
> format for compatibility. We documented the these interfaces
> into
> Documentation/ABI in v2.
> Greg: suggest to use configfs replace for the memory probe interface
> Andi: This is a debugging interface. It doesn't need to have the
> most pretty interface in the world, because it will be only
> used for
> QA by a few people. it's just a QA interface, not the next
> generation
> of POSIX.
> Response: We still keep it as sysfs interface since node/cpu/memory probe
> interface
> are all in sysfs, we can create another group of patches
> to support
> configfs if we have this strong requirement in future.
>
> v1:
>
> the RFC version for NUMA Hotplug Emulator.
>
> * WHAT IS HOTPLUG EMULATOR
>
> NUMA hotplug emulator is collectively named for the hotplug emulation
> it is able to emulate NUMA Node Hotplug thru a pure software way. It
> intends to help people easily debug and test node/cpu/memory hotplug
> related stuff on a none-NUMA-hotplug-support machine, even an UMA
> machine.
>
> The emulator provides mechanism to emulate the process of physcial cpu/mem
> hotadd, it provides possibility to debug CPU and memory hotplug on the
> machines
> without NUMA support for kenrel developers. It offers an interface for cpu
> and memory hotplug test purpose.
>
> * WHY DO WE USE HOTPLUG EMULATOR
>
> We are focusing on the hotplug emualation for a few months. The emualor
> helps
> team to reproduce all the major hotplug bugs. It plays an important role to
> the hotplug code quality assuirance. Because of the hotplug emulator, we
> already
> move most of the debug working to virtual evironment.
>
> * Principles & Usages
>
> NUMA hotplug emulator include 3 different parts: node/CPU/memory hotplug
> emulation.
>
> 1) Node hotplug emulation:
>
> Adds a numa=possible=<N> command line option to set an additional N nodes
> as
> being possible for memory hotplug. This set of possible nodes control
> nr_node_ids and the sizes of several dynamically allocated node arrays.
>
> This allows memory hotplug to create new nodes for newly added memory
> rather than binding it to existing nodes.
>
> For emulation on x86, it would be possible to set aside memory for hotplugged
> nodes (say, anything above 2G) and to add an additional four nodes as being
> possible on boot with
>
> mem=2G numa=possible=4
>
> and then creating a new 128M node at runtime:
>
> # echo 128M@0x80000000 >
> /sys/kernel/debug/node_hotplug/add_node
> On node 1 totalpages: 0
> init_memory_mapping: 0000000080000000-0000000088000000
> 0080000000 - 0088000000 page 2M
>
> Once the new node has been added, its memory can be onlined. If this
> memory represents memory section 16, for example:
>
> # echo online > /sys/devices/system/memory/memory16/state
> Built 2 zonelists in Node order, mobility grouping on. Total pages:
> 514846
> Policy zone: Normal
> [ The memory section(s) mapped to a particular node are visible via
> /sys/devices/system/node_hotplug/node1, in this example. ]
>
> 2) CPU hotplug emulation:
>
> The emulator reserve CPUs throu grub parameter, the reserved CPUs can be
> hot-add/hot-remove in software method.
>
> When hotplug a CPU with emulator, we are using a logical CPU to emulate the
> CPU
> hotplug process. For the CPU supported SMT, some logical CPUs are in the
> same
> socket, but it may located in different NUMA node after we have emulator.
> We
> put the logical CPU into a fake CPU socket, and assign it an unique
> phys_proc_id. For the fake socket, we put one logical CPU in only.
>
> - to hide CPUs
> - Using boot option "maxcpus=N" hide CPUs
> N is the number of initialize CPUs
> - Using boot option "cpu_hpe=on" to enable cpu hotplug emulation
> when cpu_hpe is enabled, the rest CPUs will not be initialized
>
> - to hot-add CPU to node
> # echo node_id > cpu/probe
>
> - to hot-remove CPU
> # echo cpu_id > cpu/release
>
> 3) Memory hotplug emulation:
>
> The emulator reserves memory before OS boots, the reserved memory region
> is
> removed from e820 table. Each online node has an add_memory interface, and
> memory can be hot-added via the per-ndoe add_memory debugfs interface.
>
> - reserve memory thru a kernel boot paramter
> mem=1024m
>
> - add a memory section to node 3
> # echo 0x40000000 > node_hotplug/node3/add_memory
>
> * ACKNOWLEDGMENT
>
> NUMA Hotplug Emulator includes a team's efforts, thanks all of them.
> They are:
> Andi Kleen, Haicheng Li, Shaohui Zheng, Fengguang Wu, David Rientjes,
> Yang Zhang and Yongkang You
> ---
> best regards
> yang
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2011-04-15 20:37:01

by David Rientjes

[permalink] [raw]
Subject: RE: [0/7,v10] NUMA Hotplug Emulator (v10)

On Fri, 15 Apr 2011, Zhang, Yang Z wrote:

> Any comments for those patches?
>

I believe I commented on them in
http://marc.info/?l=linux-kernel&m=130272108421404 :)

2011-04-16 02:32:20

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [0/7,v10] NUMA Hotplug Emulator (v10)

Sorry, maybe I miss that mail. Also thanks for your good idea.

best regards
yang


> -----Original Message-----
> From: David Rientjes [mailto:[email protected]]
> Sent: Saturday, April 16, 2011 4:37 AM
> To: Zhang, Yang Z
> Cc: Andrew Morton; [email protected]; [email protected];
> [email protected]; Kleen, Andi; [email protected]; Greg KH; Ingo
> Molnar; Len Brown; [email protected]; Yinghai Lu
> Subject: RE: [0/7,v10] NUMA Hotplug Emulator (v10)
>
> On Fri, 15 Apr 2011, Zhang, Yang Z wrote:
>
> > Any comments for those patches?
> >
>
> I believe I commented on them in
> http://marc.info/?l=linux-kernel&m=130272108421404 :)