2010-12-07 02:30:44

by Zheng, Shaohui

Subject: [1/7,v8] NUMA Hotplug Emulator: documentation

From: Shaohui Zheng <[email protected]>

Add a text file, Documentation/x86/x86_64/numa_hotplug_emulator.txt,
to explain the usage of the hotplug emulator.

Reviewed-By: Randy Dunlap <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Haicheng Li <[email protected]>
Signed-off-by: Shaohui Zheng <[email protected]>
---
Index: linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-hpe4/Documentation/x86/x86_64/numa_hotplug_emulator.txt 2010-12-07 08:53:19.677622002 +0800
@@ -0,0 +1,102 @@
+NUMA Hotplug Emulator for x86_64
+---------------------------------------------------
+
+The NUMA hotplug emulator emulates NUMA node hotplug purely in
+software. It is intended to help people debug and test
+node/CPU/memory hotplug code on a machine without NUMA hotplug
+support, including a UMA machine or a virtual
+environment.
+
+1) Node hotplug emulation:
+
+Adds a numa=possible=<N> command line option to set an additional N nodes
+as being possible for memory hotplug. This set of possible nodes
+controls nr_node_ids and the sizes of several dynamically allocated node
+arrays.
+
+This allows memory hotplug to create new nodes for newly added memory
+rather than binding it to existing nodes.
+
+For emulation on x86, it would be possible to set aside memory for hotplugged
+nodes (say, anything above 2G) and to add an additional four nodes as being
+possible on boot with
+
+ mem=2G numa=possible=4
+
+and then creating a new 128M node at runtime:
+
+ # echo 128M@0x80000000 > /sys/kernel/debug/node/add_node
+ On node 1 totalpages: 0
+ init_memory_mapping: 0000000080000000-0000000088000000
+ 0080000000 - 0088000000 page 2M
+
+Once the new node has been added, its memory can be onlined. If this
+memory represents memory section 16, for example:
+
+ # echo online > /sys/devices/system/memory/memory16/state
+ Built 2 zonelists in Node order, mobility grouping on. Total pages: 514846
+ Policy zone: Normal
+ [ The memory section(s) mapped to a particular node are visible via
+ /sys/devices/system/node/node1, in this example. ]
+
+2) CPU hotplug emulation:
+
+The emulator reserves CPUs through a kernel boot parameter. The reserved CPUs
+can then be hot-added and hot-removed in software, which emulates the process
+of physical CPU hotplug.
+
+When hotplugging a CPU with the emulator, a logical CPU is used to emulate the
+CPU socket hotplug process. On CPUs that support SMT, some logical CPUs share
+a socket, but under the emulator they may be placed in different NUMA nodes.
+Each logical CPU is therefore put into a fake CPU socket and assigned a
+unique phys_proc_id. A fake socket holds exactly one logical CPU.
+
+ - to hide CPUs
+ - Use boot option "maxcpus=N" to hide CPUs
+ N is the number of CPUs to initialize; the rest will be hidden.
+ - Use boot option "cpu_hpe=on" to enable CPU hotplug emulation
+ When cpu_hpe is enabled, the remaining CPUs will not be initialized.
+
+ - to hot-add CPU to node
+ # echo nid > cpu/probe
+
+ - to hot-remove CPU
+ # echo cpuid > cpu/release
+
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region is
+removed from the e820 table. Each online node has an add_memory interface, and
+memory can be hot-added via the per-node add_memory debugfs interface.
+
+Memory release is known to be difficult, and there is no plan to support it
+for now.
+
+ - reserve memory through a kernel boot parameter
+ mem=1024m
+
+ - add a memory section to node 3
+ # echo 0x40000000 > mem_hotplug/node3/add_memory
+ OR
+ # echo 1024m > mem_hotplug/node3/add_memory
+
+4) Script for hotplug testing
+
+These scripts are convenient for hot-adding memory/CPUs in batches.
+
+- Online all memory sections:
+for m in /sys/devices/system/memory/memory*;
+do
+ echo online > $m/state;
+done
+
+- CPU Online:
+for c in /sys/devices/system/cpu/cpu[0-9]*;
+do
+ echo 1 > $c/online;
+done
+
+- David Rientjes <[email protected]>
+- Haicheng Li <[email protected]>
+- Shaohui Zheng <[email protected]>
+ Nov 2010

--
Thanks & Regards,
Shaohui
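
The node example in the patch above can be checked with a little arithmetic. The sketch below recomputes the physical range created by 128M@0x80000000 and the memory section it falls in. It assumes 128 MiB memory sections (2^27 bytes), which the patch does not state; the helper name section_of is made up for illustration.

```shell
#!/bin/sh
# Assumption: memory sections are 128 MiB (2^27 bytes); the patch itself
# does not state the section size.
SECTION_SIZE_BITS=27

# Print the index of the memory section that contains physical address $1.
section_of() {
    echo $(( $1 >> SECTION_SIZE_BITS ))
}

start=$(( 0x80000000 ))          # start address from 128M@0x80000000
size=$(( 128 * 1024 * 1024 ))    # 128M
printf 'range: 0x%x-0x%x\n' "$start" "$(( start + size ))"
echo "section: $(section_of "$start")"
```

Under that assumption the range ends at 0x88000000, matching the init_memory_mapping line, and the start address lands in section 16, matching the memory16 path used in the example.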


2010-12-07 18:30:33

by Eric B Munson

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

Shaohui,

The documentation patch seems to be stale; it needs to be updated to match the
new file names.




2010-12-08 00:45:04

by Shaohui Zheng

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Tue, Dec 07, 2010 at 11:24:20AM -0700, Eric B Munson wrote:
> Shaohui,
>
> The documentation patch seems to be stale, it needs to be updated to match the
> new file names.
>
Eric,
The major change in this patchset is to the interface. For the v8 emulator,
we adopted David's per-node debugfs add_memory interface, which is already
included in the documentation patch. The change is very small, so it is not obvious.

This is the change to the documentation compared with v7:
+3) Memory hotplug emulation:
+
+The emulator reserves memory before the OS boots; the reserved memory region is
+removed from the e820 table. Each online node has an add_memory interface, and
+memory can be hot-added via the per-node add_memory debugfs interface.
+
+Memory release is known to be difficult, and there is no plan to support it
+for now.
+
+ - reserve memory through a kernel boot parameter
+ mem=1024m
+
+ - add a memory section to node 3
+ # echo 0x40000000 > mem_hotplug/node3/add_memory
+ OR
+ # echo 1024m > mem_hotplug/node3/add_memory
+
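
The two add_memory forms in the excerpt above appear to name the same value. The sketch below shows the conversion, assuming the interface accepts k/m/g suffixes in the style of the kernel's memparse(); to_bytes is a hypothetical helper written for this illustration, not part of the patchset.

```shell
#!/bin/sh
# Hypothetical helper: convert a memparse-style value (hex or decimal,
# optional k/m/g suffix) to a byte count.
to_bytes() {
    case "$1" in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

printf '0x%x\n' "$(to_bytes 1024m)"         # prints 0x40000000
printf '0x%x\n' "$(to_bytes 0x40000000)"    # prints 0x40000000
```

With mem=1024m on the command line, 0x40000000 is also the first byte above the memory the kernel was told to use, which is presumably why the example picks it.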

--
Thanks & Regards,
Shaohui

2010-12-08 17:46:42

by Eric B Munson

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

Shaohui,

I have had some success. I had been confused about which files to use to online
memory with memory hotplug. The latest patch sorted it out for me, and I can now
online disabled memory in new nodes. I still cannot online an offlined CPU. Of the
12 available threads, I have 8 activated on boot with the kernel command line:

mem=8G numa=possible=12 maxcpus=8 cpu_hpe=on

I can offline a CPU just fine according to the kernel:
root@bert:/sys/devices/system/cpu# echo 7 > release
(dmesg)
[ 911.494852] offline cpu 7.
[ 911.694323] CPU 7 is now offline

But when I try and re-add it I get an error:
root@bert:/sys/devices/system/cpu# echo 0 > probe
(dmesg)
Dec 8 10:41:55 bert kernel: [ 1190.095051] ------------[ cut here ]------------
Dec 8 10:41:55 bert kernel: [ 1190.095056] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xce/0x180()
Dec 8 10:41:55 bert kernel: [ 1190.095057] Hardware name: System Product Name
Dec 8 10:41:55 bert kernel: [ 1190.095058] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu7'
Dec 8 10:41:55 bert kernel: [ 1190.095060] Modules linked in: nfs binfmt_misc lockd fscache nfs_acl auth_rpcgss sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek radeon snd_hda_intel snd_hda_codec snd_cmipci gameport snd_pcm ttm snd_opl3_lib drm_kms_helper snd_hwdep snd_mpu401_uart drm uvcvideo snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq xhci_hcd snd_timer videodev snd_seq_device snd psmouse i7core_edac i2c_algo_bit edac_core joydev v4l1_compat shpchp snd_page_alloc v4l2_compat_ioctl32 soundcore hwmon_vid asus_atk0110 max6650 serio_raw hid_microsoft usbhid hid firewire_ohci firewire_core crc_itu_t ahci sky2 libahci
Dec 8 10:41:55 bert kernel: [ 1190.095088] Pid: 2369, comm: bash Tainted: G W 2.6.37-rc5-numa-test+ #3
Dec 8 10:41:55 bert kernel: [ 1190.095089] Call Trace:
Dec 8 10:41:55 bert kernel: [ 1190.095094] [<ffffffff8105eb1f>] warn_slowpath_common+0x7f/0xc0
Dec 8 10:41:55 bert kernel: [ 1190.095096] [<ffffffff8105ec16>] warn_slowpath_fmt+0x46/0x50
Dec 8 10:41:55 bert kernel: [ 1190.095098] [<ffffffff811cf77e>] sysfs_add_one+0xce/0x180
Dec 8 10:41:55 bert kernel: [ 1190.095100] [<ffffffff811cf8b1>] create_dir+0x81/0xd0
Dec 8 10:41:55 bert kernel: [ 1190.095102] [<ffffffff811cf97d>] sysfs_create_dir+0x7d/0xd0
Dec 8 10:41:55 bert kernel: [ 1190.095106] [<ffffffff815a2b3d>] ? sub_preempt_count+0x9d/0xd0
Dec 8 10:41:55 bert kernel: [ 1190.095109] [<ffffffff812c9ffd>] kobject_add_internal+0xbd/0x200
Dec 8 10:41:55 bert kernel: [ 1190.095111] [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
Dec 8 10:41:55 bert kernel: [ 1190.095113] [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
Dec 8 10:41:55 bert kernel: [ 1190.095117] [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
Dec 8 10:41:55 bert kernel: [ 1190.095121] [<ffffffff81598f38>] register_cpu_node+0x32/0x88
Dec 8 10:41:55 bert kernel: [ 1190.095123] [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
Dec 8 10:41:55 bert kernel: [ 1190.095127] [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
Dec 8 10:41:55 bert kernel: [ 1190.095129] [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
Dec 8 10:41:55 bert kernel: [ 1190.095131] [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
Dec 8 10:41:55 bert kernel: [ 1190.095133] [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
Dec 8 10:41:55 bert kernel: [ 1190.095137] [<ffffffff811624c8>] vfs_write+0xc8/0x190
Dec 8 10:41:55 bert kernel: [ 1190.095139] [<ffffffff81162e61>] sys_write+0x51/0x90
Dec 8 10:41:55 bert kernel: [ 1190.095142] [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
Dec 8 10:41:55 bert kernel: [ 1190.095144] ---[ end trace f615c2a524d318ea ]---
Dec 8 10:41:55 bert kernel: [ 1190.095149] Pid: 2369, comm: bash Tainted: G W 2.6.37-rc5-numa-test+ #3
Dec 8 10:41:55 bert kernel: [ 1190.095150] Call Trace:
Dec 8 10:41:55 bert kernel: [ 1190.095152] [<ffffffff812ca09b>] kobject_add_internal+0x15b/0x200
Dec 8 10:41:55 bert kernel: [ 1190.095154] [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
Dec 8 10:41:55 bert kernel: [ 1190.095156] [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
Dec 8 10:41:55 bert kernel: [ 1190.095158] [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
Dec 8 10:41:55 bert kernel: [ 1190.095160] [<ffffffff81598f38>] register_cpu_node+0x32/0x88
Dec 8 10:41:55 bert kernel: [ 1190.095162] [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
Dec 8 10:41:55 bert kernel: [ 1190.095164] [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
Dec 8 10:41:55 bert kernel: [ 1190.095166] [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
Dec 8 10:41:55 bert kernel: [ 1190.095168] [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
Dec 8 10:41:55 bert kernel: [ 1190.095170] [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
Dec 8 10:41:55 bert kernel: [ 1190.095172] [<ffffffff811624c8>] vfs_write+0xc8/0x190
Dec 8 10:41:55 bert kernel: [ 1190.095174] [<ffffffff81162e61>] sys_write+0x51/0x90
Dec 8 10:41:55 bert kernel: [ 1190.095176] [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b

Am I doing something wrong?

Thanks,
Eric





2010-12-08 18:16:52

by Eric B Munson

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

Shaohui,

I was able to online a cpu to node 0 successfully. My problem was that I did
not take the cpu offline before I released it. Everything looks to be working
for me.
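
The sequence described here (offline first, then release, then probe onto a node) might be scripted roughly as follows. This is a sketch only: move_cpu_to_node and the CPU_ROOT variable are hypothetical names, and the probe and release files are assumed to behave as they do in the commands shown earlier in this thread.

```shell
#!/bin/sh
# CPU_ROOT defaults to the sysfs directory used in this thread; the probe
# and release files exist only with the emulator patchset applied.
CPU_ROOT=${CPU_ROOT:-/sys/devices/system/cpu}

# Hypothetical helper: move_cpu_to_node CPU NID
move_cpu_to_node() {
    mv_cpu=$1
    mv_nid=$2
    echo 0 > "$CPU_ROOT/cpu$mv_cpu/online" || return 1   # take the CPU offline first
    echo "$mv_cpu" > "$CPU_ROOT/release"   || return 1   # then release it
    echo "$mv_nid" > "$CPU_ROOT/probe"     || return 1   # hot-add a CPU to node NID
}
```

For the case above this would be called as "move_cpu_to_node 7 0". The re-added CPU may still need to be brought online through its cpuN/online file afterwards.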

Thanks for your help,
Eric



2010-12-08 21:16:25

by David Rientjes

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Wed, 8 Dec 2010, Eric B Munson wrote:

> Shaohui,
>
> I was able to online a cpu to node 0 successfully. My problem was that I did
> not take the cpu offline before I released it. Everything looks to be working
> for me.
>

I think it should fail more gracefully than triggering WARN_ON()s because
of duplicate sysfs dentries though, right?

2010-12-08 21:18:13

by David Rientjes

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Wed, 8 Dec 2010, Shaohui Zheng wrote:

> Eric,
> the major change on the patchset is on the interface, for the v8 emulator,
> we accept David's per-node debugfs add_memory interface, we already included
> in the documentation patch. the change is very small, so it is not obvious.
>

It's still stale as Eric mentioned: for instance, the reference to
/sys/kernel/debug/node/add_node which is now under mem_hotplug. There may
be other examples as well.
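
One way to catch stale paths like this is to list the interface files the running kernel actually exposes and compare them with the documentation. The sketch below assumes debugfs is mounted at /sys/kernel/debug and that the files are named add_node and add_memory as in this thread; find_hotplug_files is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list every add_node and add_memory file below a
# debugfs mount point, so documented paths can be checked against them.
# Usage: find_hotplug_files [DEBUGFS_ROOT]
find_hotplug_files() {
    find "${1:-/sys/kernel/debug}" \( -name add_node -o -name add_memory \) \
        -type f 2>/dev/null | sort
}
```

Reading debugfs normally requires root, so an empty result from an unprivileged shell does not mean the files are missing.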

2010-12-09 01:34:48

by Zheng, Shaohui

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Wed, Dec 08, 2010 at 10:46:33AM -0700, Eric B Munson wrote:
> Shaohui,
>
> I have had some success. I had been confused about which files to use to online
> memory with memory hotplug. The latest patch sorted it out for me, and I can now
> online disabled memory in new nodes. I still cannot online an offlined CPU. Of the
> 12 available threads, I have 8 activated on boot with the kernel command line:
>
> mem=8G numa=possible=12 maxcpus=8 cpu_hpe=on
>
> I can offline a CPU just fine according to the kernel:
> root@bert:/sys/devices/system/cpu# echo 7 > release
> (dmesg)
> [ 911.494852] offline cpu 7.
> [ 911.694323] CPU 7 is now offline
>
> But when I try and re-add it I get an error:
> root@bert:/sys/devices/system/cpu# echo 0 > probe
> (dmesg)
> Dec 8 10:41:55 bert kernel: [ 1190.095051] ------------[ cut here ]------------
> Dec 8 10:41:55 bert kernel: [ 1190.095056] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0xce/0x180()
> Dec 8 10:41:55 bert kernel: [ 1190.095057] Hardware name: System Product Name
> Dec 8 10:41:55 bert kernel: [ 1190.095058] sysfs: cannot create duplicate filename '/devices/system/cpu/cpu7'
> Dec 8 10:41:55 bert kernel: [ 1190.095060] Modules linked in: nfs binfmt_misc lockd fscache nfs_acl auth_rpcgss sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek radeon snd_hda_intel snd_hda_codec snd_cmipci gameport snd_pcm ttm snd_opl3_lib drm_kms_helper snd_hwdep snd_mpu401_uart drm uvcvideo snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq xhci_hcd snd_timer videodev snd_seq_device snd psmouse i7core_edac i2c_algo_bit edac_core joydev v4l1_compat shpchp snd_page_alloc v4l2_compat_ioctl32 soundcore hwmon_vid asus_atk0110 max6650 serio_raw hid_microsoft usbhid hid firewire_ohci firewire_core crc_itu_t ahci sky2 libahci
> Dec 8 10:41:55 bert kernel: [ 1190.095088] Pid: 2369, comm: bash Tainted: G W 2.6.37-rc5-numa-test+ #3
> Dec 8 10:41:55 bert kernel: [ 1190.095089] Call Trace:
> Dec 8 10:41:55 bert kernel: [ 1190.095094] [<ffffffff8105eb1f>] warn_slowpath_common+0x7f/0xc0
> Dec 8 10:41:55 bert kernel: [ 1190.095096] [<ffffffff8105ec16>] warn_slowpath_fmt+0x46/0x50
> Dec 8 10:41:55 bert kernel: [ 1190.095098] [<ffffffff811cf77e>] sysfs_add_one+0xce/0x180
> Dec 8 10:41:55 bert kernel: [ 1190.095100] [<ffffffff811cf8b1>] create_dir+0x81/0xd0
> Dec 8 10:41:55 bert kernel: [ 1190.095102] [<ffffffff811cf97d>] sysfs_create_dir+0x7d/0xd0
> Dec 8 10:41:55 bert kernel: [ 1190.095106] [<ffffffff815a2b3d>] ? sub_preempt_count+0x9d/0xd0
> Dec 8 10:41:55 bert kernel: [ 1190.095109] [<ffffffff812c9ffd>] kobject_add_internal+0xbd/0x200
> Dec 8 10:41:55 bert kernel: [ 1190.095111] [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
> Dec 8 10:41:55 bert kernel: [ 1190.095113] [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
> Dec 8 10:41:55 bert kernel: [ 1190.095117] [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
> Dec 8 10:41:55 bert kernel: [ 1190.095121] [<ffffffff81598f38>] register_cpu_node+0x32/0x88
> Dec 8 10:41:55 bert kernel: [ 1190.095123] [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
> Dec 8 10:41:55 bert kernel: [ 1190.095127] [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
> Dec 8 10:41:55 bert kernel: [ 1190.095129] [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
> Dec 8 10:41:55 bert kernel: [ 1190.095131] [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
> Dec 8 10:41:55 bert kernel: [ 1190.095133] [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
> Dec 8 10:41:55 bert kernel: [ 1190.095137] [<ffffffff811624c8>] vfs_write+0xc8/0x190
> Dec 8 10:41:55 bert kernel: [ 1190.095139] [<ffffffff81162e61>] sys_write+0x51/0x90
> Dec 8 10:41:55 bert kernel: [ 1190.095142] [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
> Dec 8 10:41:55 bert kernel: [ 1190.095144] ---[ end trace f615c2a524d318ea ]---
> Dec 8 10:41:55 bert kernel: [ 1190.095149] Pid: 2369, comm: bash Tainted: G W 2.6.37-rc5-numa-test+ #3
> Dec 8 10:41:55 bert kernel: [ 1190.095150] Call Trace:
> Dec 8 10:41:55 bert kernel: [ 1190.095152] [<ffffffff812ca09b>] kobject_add_internal+0x15b/0x200
> Dec 8 10:41:55 bert kernel: [ 1190.095154] [<ffffffff812ca258>] kobject_add_varg+0x38/0x60
> Dec 8 10:41:55 bert kernel: [ 1190.095156] [<ffffffff812ca2d3>] kobject_init_and_add+0x53/0x70
> Dec 8 10:41:55 bert kernel: [ 1190.095158] [<ffffffff8139475f>] sysdev_register+0x6f/0xf0
> Dec 8 10:41:55 bert kernel: [ 1190.095160] [<ffffffff81598f38>] register_cpu_node+0x32/0x88
> Dec 8 10:41:55 bert kernel: [ 1190.095162] [<ffffffff8158207e>] arch_register_cpu_node+0x3e/0x40
> Dec 8 10:41:55 bert kernel: [ 1190.095164] [<ffffffff8101220e>] arch_cpu_probe+0x10e/0x1f0
> Dec 8 10:41:55 bert kernel: [ 1190.095166] [<ffffffff813989d4>] cpu_probe_store+0x14/0x20
> Dec 8 10:41:55 bert kernel: [ 1190.095168] [<ffffffff81393ef0>] sysdev_class_store+0x20/0x30
> Dec 8 10:41:55 bert kernel: [ 1190.095170] [<ffffffff811cd925>] sysfs_write_file+0xe5/0x170
> Dec 8 10:41:55 bert kernel: [ 1190.095172] [<ffffffff811624c8>] vfs_write+0xc8/0x190
> Dec 8 10:41:55 bert kernel: [ 1190.095174] [<ffffffff81162e61>] sys_write+0x51/0x90
> Dec 8 10:41:55 bert kernel: [ 1190.095176] [<ffffffff8100c142>] system_call_fastpath+0x16/0x1b
>
> Am I doing something wrong?
>
> Thanks,
> Eric

Eric,
I saw in another email that you have already solved this issue, which is good. I double-checked your steps and did not find any problems.

The logic of CPU release (arch_cpu_release) is:
1) offline the CPU if the CPU is online
2) unregister the CPU

So even if the CPU is online, you can still release it directly. I should check the return value after calling cpu_down.

How about adding the following check?

--- arch/x86/kernel/topology.c-orig 2010-12-09 08:03:19.883331001 +0800
+++ arch/x86/kernel/topology.c 2010-12-09 08:01:35.993331000 +0800
@@ -158,7 +158,10 @@

if (cpu_online(cpu)) {
printk(KERN_DEBUG "offline cpu %d.\n", cpu);
- cpu_down(cpu);
+ if (cpu_down(cpu)) {
+ printk(KERN_ERR "failed to offline cpu %d, giving up.\n", cpu);
+ return -EPERM;
+ }
}

arch_unregister_cpu(cpu);

--
Thanks & Regards,
Shaohui

2010-12-09 01:48:05

by Zheng, Shaohui

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Wed, Dec 08, 2010 at 01:16:10PM -0800, David Rientjes wrote:
> On Wed, 8 Dec 2010, Eric B Munson wrote:
>
> > Shaohui,
> >
> > I was able to online a cpu to node 0 successfully. My problem was that I did
> > not take the cpu offline before I released it. Everything looks to be working
> > for me.
> >
>
> I think it should fail more gracefully than triggering WARN_ON()s because
> of duplicate sysfs dentries though, right?

Yes, we should check the return value more carefully, so that the duplicate
dentries can be avoided.

Another solution: force the user to offline the CPU before releasing it.
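
Until the kernel enforces this, the same rule can be approximated from userspace. The wrapper below is only a sketch: release_if_offline is a hypothetical name, and it assumes the per-CPU online file and the release file shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to release a CPU that is still online.
# Usage: release_if_offline CPU [ROOT]
release_if_offline() {
    rel_cpu=$1
    rel_root=${2:-/sys/devices/system/cpu}
    # Treat an unreadable or missing online file as "unknown", not offline.
    rel_state=$(cat "$rel_root/cpu$rel_cpu/online" 2>/dev/null) || rel_state=unknown
    if [ "$rel_state" != "0" ]; then
        echo "cpu$rel_cpu is not offline (state: $rel_state); not releasing" >&2
        return 1
    fi
    echo "$rel_cpu" > "$rel_root/release"
}
```

A userspace check like this is racy, since the CPU could come back online between the read and the write, so it is no substitute for checking the cpu_down return value in the kernel.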

--
Thanks & Regards,
Shaohui

2010-12-09 01:58:36

by Zheng, Shaohui

Subject: Re: [1/7,v8] NUMA Hotplug Emulator: documentation

On Wed, Dec 08, 2010 at 01:18:02PM -0800, David Rientjes wrote:
> On Wed, 8 Dec 2010, Shaohui Zheng wrote:
>
> > Eric,
> > the major change on the patchset is on the interface, for the v8 emulator,
> > we accept David's per-node debugfs add_memory interface, we already included
> > in the documentation patch. the change is very small, so it is not obvious.
> >
>
> It's still stale as Eric mentioned: for instance, the reference to
> /sys/kernel/debug/node/add_node which is now under mem_hotplug. There may
> be other examples as well.

I forgot to update this part; that was careless of me. Thanks, Eric and David.

--
Thanks & Regards,
Shaohui