2013-04-19 05:23:55

by Yasuaki Ishimatsu

Subject: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

When booting an x86 system that contains a memoryless node, the node
numbers of CPUs on the memoryless node were changed to the nearest online
node number by init_cpu_to_node(), because the node is not online.

On my system, the node numbers of cpu#30-44 and cpu#75-89 were changed
from 2 to 0, as follows:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 78 79 80 81 82
83 84 85 86 87 88 89
node 0 size: 32394 MB
node 0 free: 27898 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30335 MB
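A minimal userspace sketch of the boot-time fallback described above, assuming that the "nearest online node" is the online node with the smallest SLIT-style distance. The distance table, NR_MODEL_NODES and nearest_online_node() are hypothetical and are not kernel code; the made-up table is only chosen so that node 2 falls back to node 0, as in the output above.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_MODEL_NODES 3

/* Hypothetical SLIT-style distance table (10 == local access).
 * The values are illustrative, not taken from the system in this thread. */
static const int node_distance_tbl[NR_MODEL_NODES][NR_MODEL_NODES] = {
	{ 10, 21, 17 },
	{ 21, 10, 21 },
	{ 17, 21, 10 },
};

/* Model of the fallback: keep nid if it is online, otherwise pick the
 * online node with the smallest distance from it (the first one wins on
 * ties). Returns -1 if no node is online at all. */
static int nearest_online_node(int nid, const bool online[NR_MODEL_NODES])
{
	int best = -1, best_dist = INT_MAX;

	if (online[nid])
		return nid;
	for (int n = 0; n < NR_MODEL_NODES; n++) {
		if (online[n] && node_distance_tbl[nid][n] < best_dist) {
			best_dist = node_distance_tbl[nid][n];
			best = n;
		}
	}
	return best;
}
```

With only nodes 0 and 1 online, a CPU whose real node is 2 is reported on node 0; once node 2 is online, the same lookup returns 2, which is the change srat_detect_node() applies when the CPU is onlined again.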

If we hot add memory to the memoryless node and offline/online all CPUs
on the node, the node numbers of these CPUs are changed to the correct
node numbers by srat_detect_node(), because the node has become online.

In this case, the node numbers of cpu#30-44 and cpu#75-89 were changed
from 0 to 2 on my system, as follows:

$ numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 45 46 47 48 49 50 51 52 53 54 55
56 57 58 59
node 0 size: 32394 MB
node 0 free: 27218 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66
67 68 69 70 71 72 73 74
node 1 size: 32768 MB
node 1 free: 30014 MB
node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 75 76 77 78 79 80 81
82 83 84 85 86 87 88 89
node 2 size: 16384 MB
node 2 free: 16384 MB
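The offline/online step above can also be driven from C by writing "0" and then "1" to /sys/devices/system/cpu/cpuN/online, the attribute whose store handler is store_online() in the patch below. This is a sketch that needs root; set_cpu_online() and cycle_cpu() are hypothetical helper names, not an existing API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Write "1" (online) or "0" (offline) to the CPU's sysfs online file.
 * Returns 0 on success or a negative errno value on failure. */
static int set_cpu_online(int cpu, bool online)
{
	char path[64];
	ssize_t n;
	int fd, ret = 0;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/online", cpu);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, online ? "1" : "0", 1);
	if (n < 0)
		ret = -errno;
	else if (n != 1)
		ret = -EIO;
	close(fd);
	return ret;
}

/* Offline and then re-online one CPU so that its node is re-detected. */
static int cycle_cpu(int cpu)
{
	int ret = set_cpu_online(cpu, false);

	return ret ? ret : set_cpu_online(cpu, true);
}
```

Calling cycle_cpu() for each of cpu30-44 and cpu75-89 corresponds to the "offline/online all CPUs on the node" step described above.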

However, the "cpu to node" and "node to cpu" links were not changed:

$ ls /sys/devices/system/cpu/cpu30/|grep node
node0
$ ls /sys/devices/system/node/node0/|grep cpu30
cpu30

"numactl --hardware" shows that cpu30 belongs to node 2, but the sysfs
links do not change.
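The two ls/grep checks above can be done from a small userspace helper, which is handy when scanning many CPUs. This is a sketch under the assumption that the only entries of interest are the "nodeN" link in the CPU directory and the "cpuN" link in the node directory; parse_node_entry(), cpu_node_link() and node_has_cpu_link() are hypothetical names.

```c
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* Return N if a directory entry is named exactly "node<N>", else -1. */
static int parse_node_entry(const char *name)
{
	int nid;
	char extra;

	if (sscanf(name, "node%d%c", &nid, &extra) != 1 || nid < 0)
		return -1;
	return nid;
}

/* Like `ls /sys/devices/system/cpu/cpuN/ | grep node`: return the node
 * number of the first "node<N>" entry, or -1 if there is none. */
static int cpu_node_link(int cpu)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int nid = -1;

	snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d", cpu);
	dir = opendir(path);
	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		nid = parse_node_entry(de->d_name);
		if (nid >= 0)
			break;
	}
	closedir(dir);
	return nid;
}

/* Like `ls /sys/devices/system/node/nodeM/ | grep cpuN`: 1 if present. */
static int node_has_cpu_link(int nid, int cpu)
{
	char path[80];

	snprintf(path, sizeof(path), "/sys/devices/system/node/node%d/cpu%d",
		 nid, cpu);
	return access(path, F_OK) == 0;
}
```

On the system described here, without the patch, cpu_node_link(30) would be expected to keep returning 0 after cpu30 is re-onlined, even though numactl reports it on node 2.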

This patch changes the "cpu to node" and "node to cpu" links when the
node number of a CPU is changed by onlining it.

Signed-off-by: Yasuaki Ishimatsu <[email protected]>
---
v2:
Rename the variable from num to cpuid in store_online()
Add a comment explaining why the node number changes
---
drivers/base/cpu.c | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index fb10728..229d6e7 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -25,6 +25,15 @@ EXPORT_SYMBOL_GPL(cpu_subsys);
 static DEFINE_PER_CPU(struct device *, cpu_sys_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+static void change_cpu_under_node(struct cpu *cpu,
+			unsigned int from_nid, unsigned int to_nid)
+{
+	int cpuid = cpu->dev.id;
+	unregister_cpu_under_node(cpuid, from_nid);
+	register_cpu_under_node(cpuid, to_nid);
+	cpu->node_id = to_nid;
+}
+
 static ssize_t show_online(struct device *dev,
 			   struct device_attribute *attr,
 			   char *buf)
@@ -39,17 +48,29 @@ static ssize_t __ref store_online(struct device *dev,
 				  const char *buf, size_t count)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
+	int cpuid = cpu->dev.id;
+	int from_nid, to_nid;
 	ssize_t ret;
 
 	cpu_hotplug_driver_lock();
 	switch (buf[0]) {
 	case '0':
-		ret = cpu_down(cpu->dev.id);
+		ret = cpu_down(cpuid);
 		if (!ret)
 			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
 		break;
 	case '1':
-		ret = cpu_up(cpu->dev.id);
+		from_nid = cpu_to_node(cpuid);
+		ret = cpu_up(cpuid);
+
+		/*
+		 * When hot adding memory to memoryless node and enabling a cpu
+		 * on the node, node number of the cpu may internally change.
+		 */
+		to_nid = cpu_to_node(cpuid);
+		if (from_nid != to_nid)
+			change_cpu_under_node(cpu, from_nid, to_nid);
+
 		if (!ret)
 			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
 		break;
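The control flow of the patched '1' branch can be exercised outside the kernel with a toy model. Everything below is hypothetical: struct topo, the model_* functions, and the assumption that register/unregister_cpu_under_node() create/remove both link directions, which is how this model treats them. It is meant only to show why sampling the node number before and after bringing the CPU up is enough to keep both sysfs links in step.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS  4
#define NR_MODEL_NODES 3

/* Toy state (not kernel code):
 *   node_of[]     plays the role of cpu_to_node()
 *   cpu_link[]    is the cpuN/nodeX link (-1 when absent)
 *   node_link[][] is the nodeX/cpuN link
 *   real_node[]   is the node firmware (SRAT) reports for each CPU */
struct topo {
	int node_of[NR_MODEL_CPUS];
	int cpu_link[NR_MODEL_CPUS];
	bool node_link[NR_MODEL_NODES][NR_MODEL_CPUS];
	int real_node[NR_MODEL_CPUS];
	bool node_online[NR_MODEL_NODES];
};

/* Stand-in for cpu_up(): once the real node is online, the CPU's node
 * number is re-detected, as srat_detect_node() does in this thread. */
static int model_cpu_up(struct topo *t, int cpu)
{
	if (t->node_online[t->real_node[cpu]])
		t->node_of[cpu] = t->real_node[cpu];
	return 0;
}

/* Mirrors change_cpu_under_node(): drop both links under the old node,
 * then create both links under the new node. */
static void model_change_cpu_under_node(struct topo *t, int cpu, int from, int to)
{
	t->node_link[from][cpu] = false;	/* "unregister" */
	t->cpu_link[cpu] = -1;
	t->node_link[to][cpu] = true;		/* "register" */
	t->cpu_link[cpu] = to;
}

/* Mirrors the patched '1' branch: sample the node before and after
 * bringing the CPU up, and relink only if it moved. */
static int model_store_online(struct topo *t, int cpu)
{
	int from = t->node_of[cpu];
	int ret = model_cpu_up(t, cpu);
	int to = t->node_of[cpu];

	if (from != to)
		model_change_cpu_under_node(t, cpu, from, to);
	return ret;
}

/* True if both link directions agree with the CPU's node number. */
static bool links_consistent(const struct topo *t, int cpu)
{
	int nid = t->node_of[cpu];

	if (t->cpu_link[cpu] != nid)
		return false;
	for (int n = 0; n < NR_MODEL_NODES; n++)
		if (t->node_link[n][cpu] != (n == nid))
			return false;
	return true;
}
```

In this model, skipping model_change_cpu_under_node() after node 2 comes online would leave cpu_link[2] at 0 while node_of[2] is 2, which corresponds to the stale cpu30/node0 link in the patch description.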


2013-04-22 22:35:44

by Andrew Morton

Subject: Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <[email protected]> wrote:

> When booting an x86 system that contains a memoryless node, the node
> numbers of CPUs on the memoryless node were changed to the nearest online
> node number by init_cpu_to_node(), because the node is not online.
>
> ...
>
> If we hot add memory to the memoryless node and offline/online all CPUs
> on the node, the node numbers of these CPUs are changed to the correct
> node numbers by srat_detect_node(), because the node has become online.

OK, here's a dumb question.

At boot time the CPUs are assigned to the "nearest online node" rather
than to their real memoryless node. The patch arranges for those CPUs
to still be assigned to the "nearest online node" _after_ some memory
is hot-added to their real node. Correct?

Would it not be better to fix this by assigning those CPUs to their real,
memoryless node right at the initial boot? Or is there something in
the kernel which makes cpus-on-a-memoryless-node not work correctly?

2013-04-23 00:05:28

by Yasuaki Ishimatsu

Subject: Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

2013/04/23 7:35, Andrew Morton wrote:
> On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <[email protected]> wrote:
>
>> When booting an x86 system that contains a memoryless node, the node
>> numbers of CPUs on the memoryless node were changed to the nearest online
>> node number by init_cpu_to_node(), because the node is not online.
>>
>> ...
>>
>> If we hot add memory to the memoryless node and offline/online all CPUs
>> on the node, the node numbers of these CPUs are changed to the correct
>> node numbers by srat_detect_node(), because the node has become online.
>
> OK, here's a dumb question.
>
> At boot time the CPUs are assigned to the "nearest online node" rather
> than to their real memoryless node. The patch arranges for those CPUs
> to still be assigned to the "nearest online node" _after_ some memory
> is hot-added to their real node. Correct?

Yes. To change the node number of CPUs safely, we should offline the CPUs.

>
> Would it not be better to fix this by assigning those CPUs to their real,
> memoryless node right at the initial boot? Or is there something in
> the kernel which makes cpus-on-a-memoryless-node not work correctly?
>

I think assigning CPUs to their real node is better. But Linux's node
handling currently depends strongly on memory. Thus if we simply create
cpus-on-a-memoryless-node, the kernel cannot work correctly.

Thanks,
Yasuaki Ishimatsu

2013-04-23 00:35:05

by Andrew Morton

Subject: Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

On Tue, 23 Apr 2013 09:04:46 +0900 Yasuaki Ishimatsu <[email protected]> wrote:

> 2013/04/23 7:35, Andrew Morton wrote:
> > On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <[email protected]> wrote:
> >
> >> When booting an x86 system that contains a memoryless node, the node
> >> numbers of CPUs on the memoryless node were changed to the nearest online
> >> node number by init_cpu_to_node(), because the node is not online.
> >>
> >> ...
> >>
> >> If we hot add memory to the memoryless node and offline/online all CPUs
> >> on the node, the node numbers of these CPUs are changed to the correct
> >> node numbers by srat_detect_node(), because the node has become online.
> >
> > OK, here's a dumb question.
> >
> > At boot time the CPUs are assigned to the "nearest online node" rather
> > than to their real memoryless node. The patch arranges for those CPUs
> > to still be assigned to the "nearest online node" _after_ some memory
> > is hot-added to their real node. Correct?
>
> Yes. To change the node number of CPUs safely, we should offline the CPUs.
>
> >
> > Would it not be better to fix this by assigning those CPUs to their real,
> > memoryless node right at the initial boot? Or is there something in
> > the kernel which makes cpus-on-a-memoryless-node not work correctly?
> >
>
> I think assigning CPUs to their real node is better. But Linux's node
> handling currently depends strongly on memory. Thus if we simply create
> cpus-on-a-memoryless-node, the kernel cannot work correctly.

hm, why. I'd have thought that if we tell the kernel something like
"this node has one zone, the size of which is zero bytes" then a
surprising amount of the existing code will Just Work.

What goes wrong?
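Andrew's intuition can be illustrated with a toy model: if a memoryless node is simply a node with zero free pages, an allocation that starts from that node's fallback list lands on the nearest node that has memory. This is a hypothetical sketch (toy_alloc_page() and the fallback table are made up, loosely inspired by zonelists) that ignores nearly everything the real page allocator does, and it does not capture whatever broke in practice (see Yasuaki's reply below).

```c
#define NR_TOY_NODES 3

/* fallback[nid] lists nodes in order of preference for a CPU on node nid
 * (nearest first). A memoryless node is just a node whose free page count
 * is zero. Returns the node the page came from, or -1 if all are empty. */
static int toy_alloc_page(int nid, long free_pages[NR_TOY_NODES],
			  const int fallback[NR_TOY_NODES][NR_TOY_NODES])
{
	for (int i = 0; i < NR_TOY_NODES; i++) {
		int n = fallback[nid][i];

		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
	}
	return -1;
}
```

For example, with free page counts { 2, 5, 0 } and node 2's fallback order { 2, 0, 1 }, the first two allocations for a CPU on node 2 come from node 0 and the third from node 1.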

2013-04-23 01:25:01

by Yasuaki Ishimatsu

Subject: Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

2013/04/23 9:34, Andrew Morton wrote:
> On Tue, 23 Apr 2013 09:04:46 +0900 Yasuaki Ishimatsu <[email protected]> wrote:
>
>> 2013/04/23 7:35, Andrew Morton wrote:
>>> On Fri, 19 Apr 2013 14:23:23 +0900 Yasuaki Ishimatsu <[email protected]> wrote:
>>>
>>>> When booting an x86 system that contains a memoryless node, the node
>>>> numbers of CPUs on the memoryless node were changed to the nearest online
>>>> node number by init_cpu_to_node(), because the node is not online.
>>>>
>>>> ...
>>>>
>>>> If we hot add memory to the memoryless node and offline/online all CPUs
>>>> on the node, the node numbers of these CPUs are changed to the correct
>>>> node numbers by srat_detect_node(), because the node has become online.
>>>
>>> OK, here's a dumb question.
>>>
>>> At boot time the CPUs are assigned to the "nearest online node" rather
>>> than to their real memoryless node. The patch arranges for those CPUs
>>> to still be assigned to the "nearest online node" _after_ some memory
>>> is hot-added to their real node. Correct?
>>
>> Yes. To change the node number of CPUs safely, we should offline the CPUs.
>>
>>>
>>> Would it not be better to fix this by assigning those CPUs to their real,
>>> memoryless node right at the initial boot? Or is there something in
>>> the kernel which makes cpus-on-a-memoryless-node not work correctly?
>>>
>>
>> I think assigning CPUs to their real node is better. But Linux's node
>> handling currently depends strongly on memory. Thus if we simply create
>> cpus-on-a-memoryless-node, the kernel cannot work correctly.
>
> hm, why. I'd have thought that if we tell the kernel something like
> "this node has one zone, the size of which is zero bytes" then a
> surprising amount of the existing code will Just Work.
>
> What goes wrong?

Sorry, I forgot the details of the issue.
When I saw the following issue, I tried to fix it and found that Linux's
node handling currently depends strongly on memory.
https://lkml.org/lkml/2012/9/12/20

I'll try to fix it again.

Thanks,
Yasuaki Ishimatsu


2013-04-23 16:06:58

by Andi Kleen

Subject: Re: [Bug fix PATCH v2] numa, cpu hotplug: Change links of CPU and node when changing node number by onlining CPU

Andrew Morton <[email protected]> writes:
>
> Would it not be better to fix this by assigning those CPUs to their real,
> memoryless node right at the initial boot? Or is there something in
> the kernel which makes cpus-on-a-memoryless-node not work correctly?

I probably added this originally. The original reason was that, long
ago, the VM was broken with memoryless nodes. These days it is likely
obsolete.

-Andi

--
[email protected] -- Speaking for myself only