2010-01-15 07:42:15

by Haicheng Li

[permalink] [raw]
Subject: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

This is to fix the bug discussed in email thread: http://patchwork.kernel.org/patch/69499/.

Currently node_possible_map won't include the offlined node that has neither CPU onlined nor MEM
onlined at booting time. As a result, nr_node_ids won't be equal to possible nodes.

CC: Thomas Gleixner <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: H. Peter Anvin <[email protected]>
CC: Andi Kleen <[email protected]>
Signed-off-by: Haicheng Li <[email protected]>
---
arch/x86/mm/srat_64.c | 10 ++--------
1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..a5bc297 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -238,7 +238,7 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
void __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
- struct bootnode *nd, oldnode;
+ struct bootnode *nd;
unsigned long start, end;
int node, pxm;
int i;
@@ -277,7 +277,6 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
return;
}
nd = &nodes[node];
- oldnode = *nd;
if (!node_test_and_set(node, nodes_parsed)) {
nd->start = start;
nd->end = end;
@@ -291,13 +290,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
start, end);

- if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
+ if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
update_nodes_add(node, start, end);
- /* restore nodes[node] */
- *nd = oldnode;
- if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
- }

node_memblk_range[num_node_memblks].start = start;
node_memblk_range[num_node_memblks].end = end;


2010-01-17 02:22:33

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

Hello,

Would there be any potential issue with this patch? Without this fix, hotadding a CPU with MEM
attached will cause kernel BUG like below. So this could even be a fix for stable, any comments?

the BUG is:
[ 141.667487] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[ 141.667782] IP: [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.667969] PGD 0
[ 141.668129] Oops: 0000 [#1] SMP
[ 141.668357] last sysfs file: /sys/class/scsi_host/host4/proc_name
[ 141.668469] CPU
[ 141.668630] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid sr_mod cdrom thermal processor thermal_sys container button rtc_cmos
rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 141.671659] Pid: 126, comm: events/27 Not tainted 2.6.32 #9 Server
[ 141.671771] RIP: 0010:[<ffffffff810b8a64>] [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.671981] RSP: 0018:ffff88027e81bdf0 EFLAGS: 00010206
[ 141.672089] RAX: 0000000000000002 RBX: 0000000000000078 RCX: ffff88047d86e580
[ 141.672204] RDX: ffff88047dfcbc00 RSI: ffff88047f13f6c0 RDI: ffff88047d9136c0
[ 141.672319] RBP: ffff88027e81be30 R08: 0000000000000001 R09: 0000000000000001
[ 141.672433] R10: 0000000000000000 R11: 0000000000000086 R12: ffff88047d87c200
[ 141.672548] R13: ffff88047d87d680 R14: ffffffff810b89f3 R15: 0000000000000002
[ 141.672663] FS: 0000000000000000(0000) GS:ffff88028b5a0000(0000) knlGS:0000000000000000
[ 141.672807] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 141.672917] CR2: 0000000000000078 CR3: 0000000001001000 CR4: 00000000000006e0
[ 141.673032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 141.673147] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 141.673262] Process events/27 (pid: 126, threadinfo ffff88027e81a000, task ffff88027f3ea040)
[ 141.673406] Stack:
[ 141.673503] ffff88027e81be30 ffff88028b5b05a0 0000000100000000 ffff88027e81be80
[ 141.673808] <0> ffff88028b5b5b40 ffff88028b5b05a0 ffffffff810b89f3 fffffffff00000c6
[ 141.674265] <0> ffff88027e81bec0 ffffffff81057394 ffffffff8105733e ffffffff81369f3a
[ 141.674813] Call Trace:
[ 141.674915] [<ffffffff810b89f3>] ? cache_reap+0x0/0x236
[ 141.675028] [<ffffffff81057394>] worker_thread+0x17a/0x27b
[ 141.675138] [<ffffffff8105733e>] ? worker_thread+0x124/0x27b
[ 141.675256] [<ffffffff81369f3a>] ? thread_return+0x3e/0xee
[ 141.675369] [<ffffffff8105a244>] ? autoremove_wake_function+0x0/0x38
[ 141.675482] [<ffffffff8105721a>] ? worker_thread+0x0/0x27b
[ 141.675593] [<ffffffff8105a146>] kthread+0x7d/0x87
[ 141.675707] [<ffffffff81012daa>] child_rip+0xa/0x20
[ 141.675817] [<ffffffff81012710>] ? restore_args+0x0/0x30
[ 141.675927] [<ffffffff8105a0c9>] ? kthread+0x0/0x87
[ 141.676035] [<ffffffff81012da0>] ? child_rip+0x0/0x20
[ 141.676142] Code: a4 c5 68 08 00 00 65 48 8b 04 25 00 e4 00 00 48 8b 04 18 49 8b 4c 24 78 48 85
c9 74 5b 41 89 c7 48 98 48 8b 1c c1 48 85 db 74 4d <83> 3b 00 74 48 48 83 3d ff d4 65 00 00 75 04 0f
0b eb fe fa 66
[ 141.680610] RIP [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.680785] RSP <ffff88027e81bdf0>
[ 141.680886] CR2: 0000000000000078
[ 141.681016] ---[ end trace b1e17069ef81fe83 ]--

Thanks.

-haicheng

Haicheng Li wrote:
> This is to fix the bug discussed in email thread:
> http://patchwork.kernel.org/patch/69499/.
>
> Currently node_possible_map won't include the offlined node that has
> neither CPU onlined nor MEM onlined at booting time. As a result,
> nr_node_ids won't be equal to possible nodes.
>
> CC: Thomas Gleixner <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: H. Peter Anvin <[email protected]>
> CC: Andi Kleen <[email protected]>
> Signed-off-by: Haicheng Li <[email protected]>
> ---
> arch/x86/mm/srat_64.c | 10 ++--------
> 1 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index a271241..a5bc297 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -238,7 +238,7 @@ update_nodes_add(int node, unsigned long start,
> unsigned long end)
> void __init
> acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> {
> - struct bootnode *nd, oldnode;
> + struct bootnode *nd;
> unsigned long start, end;
> int node, pxm;
> int i;
> @@ -277,7 +277,6 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> return;
> }
> nd = &nodes[node];
> - oldnode = *nd;
> if (!node_test_and_set(node, nodes_parsed)) {
> nd->start = start;
> nd->end = end;
> @@ -291,13 +290,8 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
> start, end);
>
> - if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
> update_nodes_add(node, start, end);
> - /* restore nodes[node] */
> - *nd = oldnode;
> - if ((nd->start | nd->end) == 0)
> - node_clear(node, nodes_parsed);
> - }
>
> node_memblk_range[num_node_memblks].start = start;
> node_memblk_range[num_node_memblks].end = end;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2010-01-17 21:53:28

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

On Fri, 15 Jan 2010, Haicheng Li wrote:

> This is to fix the bug discussed in email thread:
> http://patchwork.kernel.org/patch/69499/.
>

It would be better to include a brief synopsis of the problem in the
changelog instead of a citation, otherwise it obfuscates the issue it
solves.

> Currently node_possible_map won't include the offlined node that has neither
> CPU onlined nor MEM onlined at booting time. As a result, nr_node_ids won't be
> equal to possible nodes.
>
> CC: Thomas Gleixner <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: H. Peter Anvin <[email protected]>
> CC: Andi Kleen <[email protected]>
> Signed-off-by: Haicheng Li <[email protected]>
> ---
> arch/x86/mm/srat_64.c | 10 ++--------
> 1 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index a271241..a5bc297 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -238,7 +238,7 @@ update_nodes_add(int node, unsigned long start, unsigned
> long end)
> void __init
> acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
> {
> - struct bootnode *nd, oldnode;
> + struct bootnode *nd;
> unsigned long start, end;
> int node, pxm;
> int i;
> @@ -277,7 +277,6 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> return;
> }
> nd = &nodes[node];
> - oldnode = *nd;
> if (!node_test_and_set(node, nodes_parsed)) {
> nd->start = start;
> nd->end = end;
> @@ -291,13 +290,8 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
> start, end);
>
> - if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> + if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
> update_nodes_add(node, start, end);
> - /* restore nodes[node] */
> - *nd = oldnode;
> - if ((nd->start | nd->end) == 0)
> - node_clear(node, nodes_parsed);
> - }
>
> node_memblk_range[num_node_memblks].start = start;
> node_memblk_range[num_node_memblks].end = end;

There're a couple of issues with this patch:

- it breaks CONFIG_MEMORY_HOTPLUG_SPARSE kernels when the SRAT reports
more than one entry for a single node id and the later entry does not
specify an address range, and

- carrying the bit for the node over in nodes_parsed is inappropriate
since other x86 code depends on that map including only nodes with
memory such as e820_register_active_regions() and nodes_cover_memory().

[ The existing naming is admittedly convoluted since nodes_parsed
represents nodes with memory and cpu_nodes_parsed represents nodes
with only cpus; in actuality, they should probably be
mem_nodes_parsed and no_mem_nodes_parsed, respectively, since the
latter will now include hotpluggable nodes. ]

To fix both of these issues simultaneously, I think it would be better to
set the bit in no_mem_nodes_parsed so that it gets unioned in
acpi_scan_nodes() when nodes_possible_map is actually formed.

2010-01-18 06:30:33

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

On Sun, Jan 17, 2010 at 1:53 PM, David Rientjes <[email protected]> wrote:
> On Fri, 15 Jan 2010, Haicheng Li wrote:
>
>> This is to fix the bug discussed in email thread:
>> http://patchwork.kernel.org/patch/69499/.
>>
>
> It would be better to include a brief synopsis of the problem in the
> changelog instead of a citation, otherwise it obfuscates the issue it
> solves.
>
>> Currently node_possible_map won't include the offlined node that has neither
>> CPU onlined nor MEM onlined at booting time. As a result, nr_node_ids won't be
>> equal to possible nodes.
>>
>> CC: Thomas Gleixner <[email protected]>
>> CC: Ingo Molnar <[email protected]>
>> CC: H. Peter Anvin <[email protected]>
>> CC: Andi Kleen <[email protected]>
>> Signed-off-by: Haicheng Li <[email protected]>
>> ---
>> ?arch/x86/mm/srat_64.c | ? 10 ++--------
>> ?1 files changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
>> index a271241..a5bc297 100644
>> --- a/arch/x86/mm/srat_64.c
>> +++ b/arch/x86/mm/srat_64.c
>> @@ -238,7 +238,7 @@ update_nodes_add(int node, unsigned long start, unsigned
>> long end)
>> ?void __init
>> ?acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
>> ?{
>> - ? ? struct bootnode *nd, oldnode;
>> + ? ? struct bootnode *nd;
>> ? ? ? unsigned long start, end;
>> ? ? ? int node, pxm;
>> ? ? ? int i;
>> @@ -277,7 +277,6 @@ acpi_numa_memory_affinity_init(struct
>> acpi_srat_mem_affinity *ma)
>> ? ? ? ? ? ? ? return;
>> ? ? ? }
>> ? ? ? nd = &nodes[node];
>> - ? ? oldnode = *nd;
>> ? ? ? if (!node_test_and_set(node, nodes_parsed)) {
>> ? ? ? ? ? ? ? nd->start = start;
>> ? ? ? ? ? ? ? nd->end = end;
>> @@ -291,13 +290,8 @@ acpi_numa_memory_affinity_init(struct
>> acpi_srat_mem_affinity *ma)
>> ? ? ? printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
>> ? ? ? ? ? ? ?start, end);
>>
>> - ? ? if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>> + ? ? if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
>> ? ? ? ? ? ? ? update_nodes_add(node, start, end);
>> - ? ? ? ? ? ? /* restore nodes[node] */
>> - ? ? ? ? ? ? *nd = oldnode;
>> - ? ? ? ? ? ? if ((nd->start | nd->end) == 0)
>> - ? ? ? ? ? ? ? ? ? ? node_clear(node, nodes_parsed);
>> - ? ? }
>>
>> ? ? ? node_memblk_range[num_node_memblks].start = start;
>> ? ? ? node_memblk_range[num_node_memblks].end = end;
>
> There're a couple of issues with this patch:
>
> ?- it breaks CONFIG_MEMORY_HOTPLUG_SPARSE kernels when the SRAT reports
> ? more than one entry for a single node id and the later entry does not
> ? specify an address range, and

not sure this one.

>
> ?- carrying the bit for the node over in nodes_parsed is inappropriate
> ? since other x86 code depends on that map including only nodes with
> ? memory such as e820_register_active_regions() and nodes_cover_memory().

assume if e820 is right, even srat table have a range for hot plug
aka the e820 have hole already. then e820_register_active_regions()
and nodes_cover_memory
should be ok even bootnode have some pre reserved range.

YH

2010-01-18 10:43:42

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

On Sun, 17 Jan 2010, Yinghai Lu wrote:

> > There're a couple of issues with this patch:
> >
> > - it breaks CONFIG_MEMORY_HOTPLUG_SPARSE kernels when the SRAT reports
> > more than one entry for a single node id and the later entry does not
> > specify an address range, and
>
> not sure this one.
>

The old code would preserve the address range from the preceding SRAT
entry for that node to pass to e820_register_active_regions() when
CONFIG_MEMORY_HOTPLUG_SPARSE is enabled, this patch incorrectly clears
that data.

> > - carrying the bit for the node over in nodes_parsed is inappropriate
> > since other x86 code depends on that map including only nodes with
> > memory such as e820_register_active_regions() and nodes_cover_memory().
>
> assume if e820 is right, even srat table have a range for hot plug
> aka the e820 have hole already. then e820_register_active_regions()
> and nodes_cover_memory
> should be ok even bootnode have some pre reserved range.
>

cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
should be set in nodes_parsed if its corresponding node does not have a
valid address range. We have a reasonable expectation that nodes_parsed
represents memory nodes given its use for e820_register_active_regions()
and nodes_cover_memory() as well as acpi_get_nodes() for NUMA emulation,
for example, which would be broken with this patch. See dc0985519.

2010-01-19 11:08:47

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

Dave & Yinghai,

Thanks for the review, and see my reply below.

David Rientjes wrote:
> On Sun, 17 Jan 2010, Yinghai Lu wrote:
>
>>> There're a couple of issues with this patch:
>>>
>>> - it breaks CONFIG_MEMORY_HOTPLUG_SPARSE kernels when the SRAT reports
>>> more than one entry for a single node id and the later entry does not
>>> specify an address range, and
>> not sure this one.
>>
> The old code would preserve the address range from the preceding SRAT
> entry for that node to pass to e820_register_active_regions() when
> CONFIG_MEMORY_HOTPLUG_SPARSE is enabled, this patch incorrectly clears
> that data.

To fully understand what could happen with old code, I write a debug patch like following:
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index dbb5381..f7118d1 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -290,6 +290,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)

printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
start, end);
+ printk(KERN_INFO "SRAT: Node %u oldnode: %lx-%lx\n", node,
+ oldnode.start, oldnode.end);
e820_register_active_regions(node, start >> PAGE_SHIFT,
end >> PAGE_SHIFT);

@@ -297,8 +299,10 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
update_nodes_add(node, start, end);
/* restore nodes[node] */
*nd = oldnode;
- if ((nd->start | nd->end) == 0)
+ if ((nd->start | nd->end) == 0) {
node_clear(node, nodes_parsed);
+ printk("node-clear: %u\n", node);
+ }
}

node_memblk_range[num_node_memblks].start = start;


Then on a system with Node0 on, but Node1 off (both cpu and memory are off), the booting log is like:
[ 0.000000] SRAT: Node 0 PXM 0 0-80000000
[ 0.000000] SRAT: Node 0 oldnode: 0-0
[ 0.000000] SRAT: Node 0 PXM 0 100000000-280000000
[ 0.000000] SRAT: Node 0 oldnode: 0-80000000
[ 0.000000] SRAT: Node 0 PXM 0 300000000-2300000000
[ 0.000000] SRAT: Node 0 oldnode: 0-280000000
[ 0.000000] SRAT: hot plug zone found 300000000 - 2300000000
[ 0.000000] SRAT: Node 0 PXM 0 2300000000-4300000000
[ 0.000000] SRAT: Node 0 oldnode: 0-280000000
[ 0.000000] SRAT: hot plug zone found 300000000 - 4300000000

[ 0.000000] SRAT: Node 1 PXM 1 4300000000-6300000000
[ 0.000000] SRAT: Node 1 oldnode: 0-0
[ 0.000000] SRAT: hot plug zone found 4300000000 - 6300000000
[ 0.000000] node-clear: 1
[ 0.000000] SRAT: Node 1 PXM 1 6300000000-8300000000
[ 0.000000] SRAT: Node 1 oldnode: 0-0
[ 0.000000] SRAT: hot plug zone found 4300000000 - 8300000000
[ 0.000000] node-clear: 1

David, per my understanding, your concern should be like, with this fix, if 3rd or 4th entry of
Node0 has no address range, then Node0 won't be recoverd with oldnode and won't be cleared in
nodes_parsed. But how is it handled by old code?

- it recovers node with oldnode as long as current entry is HOT_PLUGGABLE. so it handles the recover
issue. but I think following patch can simply fix it as well.

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index dbb5381..fdf067f 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -281,7 +281,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
if (!node_test_and_set(node, nodes_parsed)) {
nd->start = start;
nd->end = end;
- } else {
+ } else if ((nd->start | nd->end) != 0) {
if (start < nd->start)
nd->start = start;
if (nd->end < end)

- as Node1 log shows, node_clear(node, nodes_parsed) can be executed only when either the 1st entry
of the node is HOT_PLUGGABLE or all preceding entries have no address range plus current entry
itself is HOT_PLUGGABLE. as a result, even Node1 has a valid address range, since its 1st entry is
HOT_PLUGGABLE, the old code still recovers it with initial node value (i.e. zeroed) as well as
clears Node1 from nodes_parsed. Isn't it a buggy logic?

>>> - carrying the bit for the node over in nodes_parsed is inappropriate
>>> since other x86 code depends on that map including only nodes with
>>> memory such as e820_register_active_regions() and nodes_cover_memory().
>> assume if e820 is right, even srat table have a range for hot plug
>> aka the e820 have hole already. then e820_register_active_regions()
>> and nodes_cover_memory
>> should be ok even bootnode have some pre reserved range.
>>
>
> cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
> should be set in nodes_parsed if its corresponding node does not have a
> valid address range.

For a node has _NOT_ either CPU or Memory like Node1, cpu_nodes_parsed cannot handle it.

> We have a reasonable expectation that nodes_parsed
> represents memory nodes given its use for e820_register_active_regions()
> and nodes_cover_memory() as well as acpi_get_nodes() for NUMA emulation,
> for example, which would be broken with this patch. See dc0985519.
>

either nodes_cover_memory() or e820_register_active_regions() or acpi_get_nodes(), they all have
node-addr-range check code, if the node-addr-range is invalid, they won't be harmed.

What should nodes_parsed be? I think it's reasonable to include all the nodes reported by SRAT
Memory Affinity Structure. Besides, also needs above small fix for recovery issue.

2010-01-19 11:29:47

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

Haicheng Li wrote:
> - it recovers node with oldnode as long as current entry is
> HOT_PLUGGABLE. so it handles the recover issue. but I think following
> patch can simply fix it as well.
>
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index dbb5381..fdf067f 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -281,7 +281,7 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> if (!node_test_and_set(node, nodes_parsed)) {
> nd->start = start;
> nd->end = end;
> - } else {
> + } else if ((nd->start | nd->end) != 0) {
oops, typo here, should be:
+ } else if ((start | end) != 0) {

> if (start < nd->start)
> nd->start = start;
> if (nd->end < end)
>

2010-01-19 23:30:50

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

On Tue, 19 Jan 2010, Haicheng Li wrote:

> > The old code would preserve the address range from the preceding SRAT entry
> > for that node to pass to e820_register_active_regions() when
> > CONFIG_MEMORY_HOTPLUG_SPARSE is enabled, this patch incorrectly clears that
> > data.
>
> To fully understand what could happen with old code, I write a debug patch
> like following:
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index dbb5381..f7118d1 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -290,6 +290,8 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
>
> printk(KERN_INFO "SRAT: Node %u PXM %u %lx-%lx\n", node, pxm,
> start, end);
> + printk(KERN_INFO "SRAT: Node %u oldnode: %lx-%lx\n", node,
> + oldnode.start, oldnode.end);
> e820_register_active_regions(node, start >> PAGE_SHIFT,
> end >> PAGE_SHIFT);
>
> @@ -297,8 +299,10 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> update_nodes_add(node, start, end);
> /* restore nodes[node] */
> *nd = oldnode;
> - if ((nd->start | nd->end) == 0)
> + if ((nd->start | nd->end) == 0) {
> node_clear(node, nodes_parsed);
> + printk("node-clear: %u\n", node);
> + }
> }
>
> node_memblk_range[num_node_memblks].start = start;
>
>
> Then on a system with Node0 on, but Node1 off (both cpu and memory are off),
> the booting log is like:
> [ 0.000000] SRAT: Node 0 PXM 0 0-80000000
> [ 0.000000] SRAT: Node 0 oldnode: 0-0
> [ 0.000000] SRAT: Node 0 PXM 0 100000000-280000000
> [ 0.000000] SRAT: Node 0 oldnode: 0-80000000
> [ 0.000000] SRAT: Node 0 PXM 0 300000000-2300000000
> [ 0.000000] SRAT: Node 0 oldnode: 0-280000000
> [ 0.000000] SRAT: hot plug zone found 300000000 - 2300000000
> [ 0.000000] SRAT: Node 0 PXM 0 2300000000-4300000000
> [ 0.000000] SRAT: Node 0 oldnode: 0-280000000
> [ 0.000000] SRAT: hot plug zone found 300000000 - 4300000000
>
> [ 0.000000] SRAT: Node 1 PXM 1 4300000000-6300000000
> [ 0.000000] SRAT: Node 1 oldnode: 0-0
> [ 0.000000] SRAT: hot plug zone found 4300000000 - 6300000000
> [ 0.000000] node-clear: 1
> [ 0.000000] SRAT: Node 1 PXM 1 6300000000-8300000000
> [ 0.000000] SRAT: Node 1 oldnode: 0-0
> [ 0.000000] SRAT: hot plug zone found 4300000000 - 8300000000
> [ 0.000000] node-clear: 1
>
> David, per my understanding, your concern should be like, with this fix, if
> 3rd or 4th entry of Node0 has no address range, then Node0 won't be recoverd
> with oldnode and won't be cleared in nodes_parsed. But how is it handled by
> old code?
>

It's not evident with your machine because you do not have two SRAT
entries for the same node id, one without ACPI_SRAT_MEM_HOT_PLUGGABLE and
other with ACPI_SRAT_MEM_HOT_PLUGGABLE.

The old code would preserve the address range for the former in oldnode
and then reset its data in the struct bootnode since nodes_parsed has a
bit set for that node. That's needed by later code that I've mentioned:
acpi_get_nodes(), specifically, which breaks with your patch in addition
to nodes_cover_memory() and e820_register_active_regions().

Only when the previous oldnode entry does not have a valid address range,
meaning it is [0, 0), does the bit get cleared in nodes_parsed.

> - it recovers node with oldnode as long as current entry is HOT_PLUGGABLE. so
> it handles the recover issue. but I think following patch can simply fix it as
> well.
>

If it's not ACPI_SRAT_MEM_HOT_PLUGGABLE, we know the address range is
already valid given the sanity checks that it has successfully passed
through in acpi_numa_memory_affinity_init(), so we require no further
checking. However, your patch will not reset the previous address range
when a ACPI_SRAT_MEM_HOT_PLUGGABLE entry is found for the same address
range and you're leaving the bit set in nodes_parsed.

> > cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
> > should be set in nodes_parsed if its corresponding node does not have a
> > valid address range.
>
> For a node has _NOT_ either CPU or Memory like Node1, cpu_nodes_parsed cannot
> handle it.
>

It most certainly can since its sole purpose is to include memoryless
nodes in node_possible_map. It has no other use case that would break as
the result of adding hotpluggable nodes, hence the reason I suggested
renaming it no_mem_nodes_parsed.

> > We have a reasonable expectation that nodes_parsed represents memory nodes
> > given its use for e820_register_active_regions() and nodes_cover_memory() as
> > well as acpi_get_nodes() for NUMA emulation, for example, which would be
> > broken with this patch. See dc0985519.
> >
>
> either nodes_cover_memory() or e820_register_active_regions() or
> acpi_get_nodes(), they all have node-addr-range check code, if the
> node-addr-range is invalid, they won't be harmed.
>

Wrong, acpi_get_nodes() does not have such a check it only iterates over
nodes_parsed. In other words, you'd be starting a new requirement for
nodes_parsed with your patch: it would now be necessary to check for a
valid (non-zero) address range for each set bit. Instead, I'm suggesting
the nodes_parsed represents only nodes with valid memory, which is a
reasonable expectation given the semantics of both it and cpu_nodes_parsed
to handle their memoryless counterparts.

In other words, the following should easily fix the issue without breaking
the existing logic that preserves the old address range for node ids that
have SRAT entries both with and without ACPI_SRAT_MEM_HOT_PLUGGABLE.
Could you give it a try?
---
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ node_set(node, cpu_nodes_parsed);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */

2010-01-20 16:40:20

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes detected by ACPI.

David Rientjes wrote:
> On Tue, 19 Jan 2010, Haicheng Li wrote:
>> David, per my understanding, your concern should be like, with this fix, if
>> 3rd or 4th entry of Node0 has no address range, then Node0 won't be recoverd
>> with oldnode and won't be cleared in nodes_parsed. But how is it handled by
>> old code?
>>
>
> It's not evident with your machine because you do not have two SRAT
> entries for the same node id, one without ACPI_SRAT_MEM_HOT_PLUGGABLE and
> other with ACPI_SRAT_MEM_HOT_PLUGGABLE.
>
> The old code would preserve the address range for the former in oldnode
> and then reset its data in the struct bootnode since nodes_parsed has a
> bit set for that node. That's needed by later code that I've mentioned:
> acpi_get_nodes(), specifically, which breaks with your patch in addition
> to nodes_cover_memory() and e820_register_active_regions().
>
> Only when the previous oldnode entry does not have a valid address range,
> meaning it is [0, 0), does the bit get cleared in nodes_parsed.

Understood, the old code is meant to make nodes_parsed _NEVER_ include the node whose memory regions
are all hotpluggable.

>> - it recovers node with oldnode as long as current entry is HOT_PLUGGABLE. so
>> it handles the recover issue. but I think following patch can simply fix it as
>> well.
>>
>
> If it's not ACPI_SRAT_MEM_HOT_PLUGGABLE, we know the address range is
> already valid given the sanity checks that it has successfully passed
> through in acpi_numa_memory_affinity_init(), so we require no further
> checking. However, your patch will not reset the previous address range
> when a ACPI_SRAT_MEM_HOT_PLUGGABLE entry is found for the same address
> range and you're leaving the bit set in nodes_parsed.

I see. the precondition is that nodes_parsed should not include such hotpluggable node, then such
data of hotpluggable mem should be kept in nodes_add[] other than in nodes[].

>>> cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
>>> should be set in nodes_parsed if its corresponding node does not have a
>>> valid address range.
>> For a node has _NOT_ either CPU or Memory like Node1, cpu_nodes_parsed cannot
>> handle it.
>>
>
> It most certainly can since its sole purpose is to include memoryless
> nodes in node_possible_map. It has no other use case that would break as
> the result of adding hotpluggable nodes, hence the reason I suggested
> renaming it no_mem_nodes_parsed.

Yeah, so the key point is who should keep hotpluggable nodes, cpu_nodes_parsed or nodes_parsed?
Actually now I agree with you on this, let cpu_nodes_parsed keep hotpluggable nodes since it won't
break any old code. Originally my patch wanna let nodes_parsed keep hotpluggable nodes, which would
make things complex.

but name "no_mem_nodes_parsed" seems convoluted too because (from code logic) this nodemask is
usually based on CPU/APIC Affinity Structure.
How about rename cpu_nodes_parsed as "rest_nodes_parsed" (comparing with "mem_nodes_parsed), since
it handles
- nodes with CPU on
- nodes with hotpluggable memory region
?

>>> We have a reasonable expectation that nodes_parsed represents memory nodes
>>> given its use for e820_register_active_regions() and nodes_cover_memory() as
>>> well as acpi_get_nodes() for NUMA emulation, for example, which would be
>>> broken with this patch. See dc0985519.
>>>
>> either nodes_cover_memory() or e820_register_active_regions() or
>> acpi_get_nodes(), they all have node-addr-range check code, if the
>> node-addr-range is invalid, they won't be harmed.
>>
>
> Wrong, acpi_get_nodes() does not have such a check it only iterates over
> nodes_parsed. In other words, you'd be starting a new requirement for
> nodes_parsed with your patch: it would now be necessary to check for a
> valid (non-zero) address range for each set bit. Instead, I'm suggesting
> the nodes_parsed represents only nodes with valid memory, which is a
> reasonable expectation given the semantics of both it and cpu_nodes_parsed
> to handle their memoryless counterparts.

agreed. In term of this, using nodes_parsed to represent only nodes with valid memory can make
things simple.

> In other words, the following should easily fix the issue without breaking
> the existing logic that preserves the old address range for node ids that
> have SRAT entries both with and without ACPI_SRAT_MEM_HOT_PLUGGABLE.
> Could you give it a try?

of course, it fixes the issue because node_possible_map now includes hotpluggable node, and then
nr_node_ids becomes equal to maximum of possible nodes on the motherboard;).

let's add more changes to fix naming issue as well since it's too confusing for people to understand
the code logic. how about below patch?
---
arch/x86/mm/srat_64.c | 39 +++++++++++++++++++++++++--------------
1 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..aebbdd4 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -27,8 +27,17 @@ int acpi_numa __initdata;

static struct acpi_table_slit *acpi_slit;

-static nodemask_t nodes_parsed __initdata;
-static nodemask_t cpu_nodes_parsed __initdata;
+/* mem_nodes_parsed:
+ * - nodes with memory on
+ *
+ * rest_nodes_parsed:
+ * - nodes with CPU on
+ * - nodes with hotpluggable memory region
+ *
+ * We union these two nodemasks to get node_possible_map.
+ */
+static nodemask_t mem_nodes_parsed __initdata;
+static nodemask_t rest_nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];

@@ -134,7 +143,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)

apic_id = pa->apic_id;
apicid_to_node[apic_id] = node;
- node_set(node, cpu_nodes_parsed);
+ node_set(node, rest_nodes_parsed);
acpi_numa = 1;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
pxm, apic_id, node);
@@ -168,7 +177,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
else
apic_id = pa->apic_id;
apicid_to_node[apic_id] = node;
- node_set(node, cpu_nodes_parsed);
+ node_set(node, rest_nodes_parsed);
acpi_numa = 1;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
pxm, apic_id, node);
@@ -229,9 +238,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, rest_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -278,7 +289,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
}
nd = &nodes[node];
oldnode = *nd;
- if (!node_test_and_set(node, nodes_parsed)) {
+ if (!node_test_and_set(node, mem_nodes_parsed)) {
nd->start = start;
nd->end = end;
} else {
@@ -296,7 +307,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
/* restore nodes[node] */
*nd = oldnode;
if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
+ node_clear(node, mem_nodes_parsed);
}

node_memblk_range[num_node_memblks].start = start;
@@ -313,7 +324,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
unsigned long pxmram, e820ram;

pxmram = 0;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
@@ -341,7 +352,7 @@ int __init acpi_get_nodes(struct bootnode *physnodes)
int i;
int ret = 0;

- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
physnodes[ret].start = nodes[i].start;
physnodes[ret].end = nodes[i].end;
ret++;
@@ -370,7 +381,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
return -1;
}

- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, mem_nodes_parsed)
e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
/* for out of order entries in SRAT */
@@ -381,7 +392,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}

/* Account for nodes with cpus and no memory */
- nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+ nodes_or(node_possible_map, mem_nodes_parsed, rest_nodes_parsed);

/* Finally register nodes */
for_each_node_mask(i, node_possible_map)
@@ -416,7 +427,7 @@ static int __init find_node_by_addr(unsigned long addr)
int ret = NUMA_NO_NODE;
int i;

- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
/*
* Find the real node that this emulated node appears on. For
* the sake of simplicity, we only use a real node's starting
@@ -466,10 +477,10 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
memcpy(apicid_to_node, fake_apicid_to_node, sizeof(apicid_to_node));

- nodes_clear(nodes_parsed);
+ nodes_clear(mem_nodes_parsed);
for (i = 0; i < num_nodes; i++)
if (fake_nodes[i].start != fake_nodes[i].end)
- node_set(i, nodes_parsed);
+ node_set(i, mem_nodes_parsed);
}

static int null_slit_node_compare(int a, int b)
--
1.5.4.4

2010-01-20 20:10:58

by David Rientjes

[permalink] [raw]
Subject: [patch] x86: set hotpluggable nodes in nodes_possible_map

On Thu, 21 Jan 2010, Haicheng Li wrote:

> let's add more changes to fix naming issue as well since it's too confusing
> for people to understand
> the code logic. how about below patch?

That should be a seperate change; there's a bugfix here (my patch) and
then a cleanup patch that you could make incrementally on mine. I don't
personally like the name "rest_nodes_parsed" since it's poor English, I
suggest renaming nodes_parsed to mem_nodes_parsed as I originally asked
and then cpu_nodes_parsed to acpi_nodes_parsed or something similiar.

Ingo, the following is my bugfix patch that addresses the issue at
http://patchwork.kernel.org/patch/69499. Haicheng, can we add your
tested-by line?



x86: set hotpluggable nodes in nodes_possible_map

nodes_possible_map does not currently include nodes that have SRAT
entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared
in nodes_parsed if it does not have an online address range.

Unequivocally setting the bit in nodes_parsed is insufficient since
existing code, such as acpi_get_nodes(), assumes all nodes in the map
have online address ranges. In fact, all code using nodes_parsed assumes
such nodes represent an address range of online memory.

nodes_possible_map is created by unioning nodes_parsed and
cpu_nodes_parsed; the former represents nodes with online memory and the
latter represents memoryless nodes. We now set the bit for hotpluggable
nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map.

Signed-off-by: David Rientjes <[email protected]>
---
arch/x86/mm/srat_64.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, cpu_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */

2010-01-20 22:48:13

by Yinghai Lu

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

On 01/20/2010 12:10 PM, David Rientjes wrote:
> On Thu, 21 Jan 2010, Haicheng Li wrote:
>
>> let's add more changes to fix naming issue as well since it's too confusing
>> for people to understand
>> the code logic. how about below patch?
>
> That should be a seperate change; there's a bugfix here (my patch) and
> then a cleanup patch that you could make incrementally on mine. I don't
> personally like the name "rest_nodes_parsed" since it's poor English, I
> suggest renaming nodes_parsed to mem_nodes_parsed as I originally asked
> and then cpu_nodes_parsed to acpi_nodes_parsed or something similiar.
>
> Ingo, the following is my bugfix patch that addresses the issue at
> http://patchwork.kernel.org/patch/69499. Haicheng, can we add your
> tested-by line?
>
>
>
> x86: set hotpluggable nodes in nodes_possible_map
>
> nodes_possible_map does not currently include nodes that have SRAT
> entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared
> in nodes_parsed if it does not have an online address range.
>
> Unequivocally setting the bit in nodes_parsed is insufficient since
> existing code, such as acpi_get_nodes(), assumes all nodes in the map
> have online address ranges. In fact, all code using nodes_parsed assumes
> such nodes represent an address range of online memory.
>
> nodes_possible_map is created by unioning nodes_parsed and
> cpu_nodes_parsed; the former represents nodes with online memory and the
> latter represents memoryless nodes. We now set the bit for hotpluggable
> nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map.
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> arch/x86/mm/srat_64.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
> printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
> }
>
> - if (changed)
> + if (changed) {
> + node_set(node, cpu_nodes_parsed);
> printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
> nd->start, nd->end);
> + }
> }
>
> /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */

if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
update_nodes_add(node, start, end);
/* restore nodes[node] */
*nd = oldnode;
if ((nd->start | nd->end) == 0)
node_clear(node, nodes_parsed);
}

removing clearing with nodes_parsed is not working?

YH

2010-01-20 23:32:39

by David Rientjes

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

On Wed, 20 Jan 2010, Yinghai Lu wrote:

> > nodes_possible_map does not currently include nodes that have SRAT
> > entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared
> > in nodes_parsed if it does not have an online address range.
> >
> > Unequivocally setting the bit in nodes_parsed is insufficient since
> > existing code, such as acpi_get_nodes(), assumes all nodes in the map
> > have online address ranges. In fact, all code using nodes_parsed assumes
> > such nodes represent an address range of online memory.
> >
> > nodes_possible_map is created by unioning nodes_parsed and
> > cpu_nodes_parsed; the former represents nodes with online memory and the
> > latter represents memoryless nodes. We now set the bit for hotpluggable
> > nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map.
> >
> > Signed-off-by: David Rientjes <[email protected]>
> > ---
> > arch/x86/mm/srat_64.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> > --- a/arch/x86/mm/srat_64.c
> > +++ b/arch/x86/mm/srat_64.c
> > @@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
> > printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
> > }
> >
> > - if (changed)
> > + if (changed) {
> > + node_set(node, cpu_nodes_parsed);
> > printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
> > nd->start, nd->end);
> > + }
> > }
> >
> > /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
>
> if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> update_nodes_add(node, start, end);
> /* restore nodes[node] */
> *nd = oldnode;
> if ((nd->start | nd->end) == 0)
> node_clear(node, nodes_parsed);
> }
>
> removing clearing with nodes_parsed is not working?
>

As previously discussed in this thread, the semantics of nodes_parsed is
to define nodes that have online address ranges; all code written to use
nodes_parsed operates on those address ranges. Requiring it to also
support memoryless (or hotpluggable) nodes would require a change in
semantics in other areas such as acpi_get_nodes() and would be contrary to
the effort in dc098551 to fix this inconsistency.

I believe Haicheng will be posting an incremental patch that changes the
name of cpu_nodes_parsed to something such as acpi_nodes_parsed to
indicate it does not necessarily represent memory but rather another ACPI
definition: either a node including only cpus or hotpluggable ranges.

2010-01-21 02:58:47

by Haicheng Li

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

David Rientjes wrote:
> On Thu, 21 Jan 2010, Haicheng Li wrote:
>
>> let's add more changes to fix naming issue as well since it's too confusing
>> for people to understand
>> the code logic. how about below patch?
>
> That should be a seperate change; there's a bugfix here (my patch) and
> then a cleanup patch that you could make incrementally on mine. I don't
> personally like the name "rest_nodes_parsed" since it's poor English, I
> suggest renaming nodes_parsed to mem_nodes_parsed as I originally asked
> and then cpu_nodes_parsed to acpi_nodes_parsed or something similiar.

IMHO, name acpi_nodes_parsed is not appropriate for this intention as nodes_parsed comes from acpi too.

Think about it again, it's better _NOT_ to mix up cpu_nodes_parsed and hotplugpable_nodes together.
We cannot assume cpu_nodes_parsed has no other use in future, like possibly cpu hotplug emulation. I
think that a clean and straightforward way is to keep nodes with hotpluggable range in a separated
nodemask, like "hp_nodes_parsed", which could also be used for future mem hotplug emulation. Then
node data corresponding to nodes_parsed is kept in nodes[], node data for hp_nodes_parsed is kept in
nodes_add[].

> Ingo, the following is my bugfix patch that addresses the issue at
> http://patchwork.kernel.org/patch/69499.

Personally I don't like such patch with confusable info.
I think following patch would be more straightfoward and clean:

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..c5552ae 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -27,8 +27,18 @@ int acpi_numa __initdata;

static struct acpi_table_slit *acpi_slit;

+/* nodes_parsed:
+ * - nodes with memory on
+ * cpu_nodes_parsed:
+ * - nodes with cpu on
+ * hp_nodes_parsed:
+ * - nodes with hotpluggable memory region
+ *
+ * We union these tree nodemasks to get node_possible_map.
+ */
static nodemask_t nodes_parsed __initdata;
static nodemask_t cpu_nodes_parsed __initdata;
+static nodemask_t hp_nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];

@@ -229,9 +239,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, hp_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -380,8 +392,10 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
return -1;
}

- /* Account for nodes with cpus and no memory */
+ /* Account for nodes with memory and cpus */
nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+ /* Account for nodes with hotpluggable memory regions */
+ nodes_or(node_possible_map, node_possible_map, hp_nodes_parsed);

/* Finally register nodes */
for_each_node_mask(i, node_possible_map)

2010-01-21 03:00:11

by Haicheng Li

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

Yinghai Lu wrote:
>
> if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> update_nodes_add(node, start, end);
> /* restore nodes[node] */
> *nd = oldnode;
> if ((nd->start | nd->end) == 0)
> node_clear(node, nodes_parsed);
> }
>
> removing clearing with nodes_parsed is not working?
>
> YH

Yinghai,

Theoretically removing clearing with nodes_parsed can work fine except that it requires more
consideration, since some functions already based on nodes_parsed, like acpi_get_nodes(), is
supposing nodes_parsed just represents for nodes with memory on.

See my another email to David, I think we'd better keep hotpluggable node info separately since it
is straightforward as well as would be useful for future hotplug related usage. How do you think
about it? thanks.

-haicheng

2010-01-21 06:58:49

by David Rientjes

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

On Thu, 21 Jan 2010, Haicheng Li wrote:

> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index a271241..c5552ae 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -27,8 +27,18 @@ int acpi_numa __initdata;
>
> static struct acpi_table_slit *acpi_slit;
>
> +/* nodes_parsed:
> + * - nodes with memory on
> + * cpu_nodes_parsed:
> + * - nodes with cpu on
> + * hp_nodes_parsed:
> + * - nodes with hotpluggable memory region
> + *
> + * We union these tree nodemasks to get node_possible_map.
> + */
> static nodemask_t nodes_parsed __initdata;
> static nodemask_t cpu_nodes_parsed __initdata;
> +static nodemask_t hp_nodes_parsed __initdata;
> static struct bootnode nodes[MAX_NUMNODES] __initdata;
> static struct bootnode nodes_add[MAX_NUMNODES];
>
> @@ -229,9 +239,11 @@ update_nodes_add(int node, unsigned long start, unsigned
> long end)
> printk(KERN_ERR "SRAT: Hotplug zone not continuous.
> Partly ignored\n");
> }
>
> - if (changed)
> + if (changed) {
> + node_set(node, hp_nodes_parsed);
> printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
> nd->start, nd->end);
> + }
> }
>
> /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
> @@ -380,8 +392,10 @@ int __init acpi_scan_nodes(unsigned long start, unsigned
> long end)
> return -1;
> }
>
> - /* Account for nodes with cpus and no memory */
> + /* Account for nodes with memory and cpus */
> nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
> + /* Account for nodes with hotpluggable memory regions */
> + nodes_or(node_possible_map, node_possible_map, hp_nodes_parsed);
>
> /* Finally register nodes */
> for_each_node_mask(i, node_possible_map)
>
>

Nack, we don't need to add yet another nodemask because you're having
trouble finding a new name for a cpu_nodes_parsed. It would be perfectly
acceptable to rename nodes_parsed to mem_nodes and cpu_nodes_parsed to
no_mem_nodes, which are even shorter. But we definitely don't need
another nodemask and I think we're convoluting the bug fix here way too
much. Regardless of the future nomenclature, let's please merge my patch
to fix the outstanding kernel issue and then propose an alternative naming
scheme later.

2010-01-21 07:32:12

by Haicheng Li

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

David Rientjes wrote:
> On Thu, 21 Jan 2010, Haicheng Li wrote:
>
>> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
>> index a271241..c5552ae 100644
>> --- a/arch/x86/mm/srat_64.c
>> +++ b/arch/x86/mm/srat_64.c
>> @@ -27,8 +27,18 @@ int acpi_numa __initdata;
>>
>> static struct acpi_table_slit *acpi_slit;
>>
>> +/* nodes_parsed:
>> + * - nodes with memory on
>> + * cpu_nodes_parsed:
>> + * - nodes with cpu on
>> + * hp_nodes_parsed:
>> + * - nodes with hotpluggable memory region
>> + *
>> + * We union these tree nodemasks to get node_possible_map.
>> + */
>> static nodemask_t nodes_parsed __initdata;
>> static nodemask_t cpu_nodes_parsed __initdata;
>> +static nodemask_t hp_nodes_parsed __initdata;
>> static struct bootnode nodes[MAX_NUMNODES] __initdata;
>> static struct bootnode nodes_add[MAX_NUMNODES];
>>
>> @@ -229,9 +239,11 @@ update_nodes_add(int node, unsigned long start, unsigned
>> long end)
>> printk(KERN_ERR "SRAT: Hotplug zone not continuous.
>> Partly ignored\n");
>> }
>>
>> - if (changed)
>> + if (changed) {
>> + node_set(node, hp_nodes_parsed);
>> printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
>> nd->start, nd->end);
>> + }
>> }
>>
>> /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
>> @@ -380,8 +392,10 @@ int __init acpi_scan_nodes(unsigned long start, unsigned
>> long end)
>> return -1;
>> }
>>
>> - /* Account for nodes with cpus and no memory */
>> + /* Account for nodes with memory and cpus */
>> nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
>> + /* Account for nodes with hotpluggable memory regions */
>> + nodes_or(node_possible_map, node_possible_map, hp_nodes_parsed);
>>
>> /* Finally register nodes */
>> for_each_node_mask(i, node_possible_map)
>>
>>
>
> Nack, we don't need to add yet another nodemask because you're having
> trouble finding a new name for a cpu_nodes_parsed. It would be perfectly
Hey Dave, why do you think it's just a naming issue?

What I'm concerning is that your assumption of cpu_nodes_parsed use is wrong,
cpu_nodes_parsed is needed anyway since its semantics represent the node with cpu affinity rather
than memless node, that's also why I originally doubted cpu_node_parsed cannot handle hotplug node.
we also need hp_nodes_parsed to represent the node with hotpluggable memory region, just like why we
need nodes_parsed to repsent node with mem on.

The code could be straightforward and easy to undertand in this way.

> acceptable to rename nodes_parsed to mem_nodes and cpu_nodes_parsed to
> no_mem_nodes, which are even shorter. But we definitely don't need
> another nodemask and I think we're convoluting the bug fix here way too
> much. Regardless of the future nomenclature, let's please merge my patch
> to fix the outstanding kernel issue and then propose an alternative naming
It's important to fix the outstanding kernel issue, that's also why I keep shooting down this issue.
I wanna fix it thru a right way rather than just get it fixed.

2010-01-21 07:51:05

by David Rientjes

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

On Thu, 21 Jan 2010, Haicheng Li wrote:

> > Nack, we don't need to add yet another nodemask because you're having
> > trouble finding a new name for a cpu_nodes_parsed. It would be perfectly
> Hey Dave, why do you think it's just a naming issue?
>
> What I'm concerning is that your assumption of cpu_nodes_parsed use is wrong,
> cpu_nodes_parsed is needed anyway since its semantics represent the node with
> cpu affinity rather than memless node, that's also why I originally doubted
> cpu_node_parsed cannot handle hotplug node.

Wrong, cpu_nodes_parsed (despite its name) solely represents nodes that do
not have online address memory ranges. That's it. Nothing more, nothing
less. That's why I suggest you rename it to no_mem_nodes or something
similar. Look at the single time that the nodemask is used: to set
cleared bits in node_possible_map that were not set in nodes_parsed
because THEY DO NOT HAVE ASSOCIATED ONLINE MEMORY RANGES, the _only_ time
a node gets set in nodes_parsed.

Once you rename nodes_parsed to mem_nodes and cpu_nodes_parsed to
no_mem_nodes, it may become clearer.

> we also need hp_nodes_parsed to represent the node with hotpluggable memory
> region, just like why we need nodes_parsed to repsent node with mem on.
>

It's pointless to add another nodemask, the semantics of cpu_nodes_parsed
is perfectly legitimate for hotpluggable nodes as well. Instead of
fixating on the name, look at the code that uses it.

2010-01-21 08:33:40

by Haicheng Li

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

David Rientjes wrote:
> On Thu, 21 Jan 2010, Haicheng Li wrote:
>
>>> Nack, we don't need to add yet another nodemask because you're having
>>> trouble finding a new name for a cpu_nodes_parsed. It would be perfectly
>> Hey Dave, why do you think it's just a naming issue?
>>
>> What I'm concerning is that your assumption of cpu_nodes_parsed use is wrong,
>> cpu_nodes_parsed is needed anyway since its semantics represent the node with
>> cpu affinity rather than memless node, that's also why I originally doubted
>> cpu_node_parsed cannot handle hotplug node.
>
> Wrong, cpu_nodes_parsed (despite its name) solely represents nodes that do
> not have online address memory ranges. That's it. Nothing more, nothing
> less. That's why I suggest you rename it to no_mem_nodes or something
> similar. Look at the single time that the nodemask is used: to set
> cleared bits in node_possible_map that were not set in nodes_parsed
> because THEY DO NOT HAVE ASSOCIATED ONLINE MEMORY RANGES, the _only_ time
> a node gets set in nodes_parsed.
>
> Once you rename nodes_parsed to mem_nodes and cpu_nodes_parsed to
> no_mem_nodes, it may become clearer.
>
>> we also need hp_nodes_parsed to represent the node with hotpluggable memory
>> region, just like why we need nodes_parsed to repsent node with mem on.
>>
>
> It's pointless to add another nodemask, the semantics of cpu_nodes_parsed
> is perfectly legitimate for hotpluggable nodes as well. Instead of
> fixating on the name, look at the code that uses it.

I'm not meant to be rude, but it's illogical excuse. that it is now used by a single function
doesn't mean that it will never be used by others in future or it is only useful for that single
function.

see the code, we can find such relationships between related data-structures.
- struct bootnode nodes[] -> nodes_parsed
all the node in nodes_parsed should have a relative mem range in nodes[].

- apicid_to_node[] -> cpu_nodes_parsed
all the value of apicid_to_node[] should be valid in cpu_nodes_parsed. If we put hotpluggable node
into cpu_nodes_parsed, such relationship is broken, right?

- nodes_add[] -> ?? (this is for hotpluggable node)
all the hotplugged mem can find a corresponding node thru nodes_add[].

-haicheng

2010-01-21 23:13:11

by David Rientjes

[permalink] [raw]
Subject: Re: [patch] x86: set hotpluggable nodes in nodes_possible_map

On Thu, 21 Jan 2010, Haicheng Li wrote:

> > It's pointless to add another nodemask, the semantics of cpu_nodes_parsed is
> > perfectly legitimate for hotpluggable nodes as well. Instead of fixating on
> > the name, look at the code that uses it.
>
> I'm not meant to be rude, but it's illogical excuse. that it is now used by a
> single function doesn't mean that it will never be used by others in future or
> it is only useful for that single function.
>

It will not be used by a single function in this context for any valid
purpose, hp_nodes_parsed is completely unnecessary and overkill to address
a problem that is a relative one-liner. You're completely missing the
points that (i) no other SRAT parsing code cares about distinguishing only
nodes with hotpluggable memory ranges, and (ii) no other SRAT parsing code
even cares about distinguishing memoryless nodes with only cpus attached.
Thus, we can use cpu_nodes_allowed to represent _all_ nodes without online
memory ranges (and you can even rename it to no_mem_nodes later, if you
want).

This discussion is becoming very annoying because you have constantly
proposed new patches (I think 4 now?) that are much more complex and
without any real consistent idea behind them. Your latest proposal is to
add a nodemask because you speculate that someday it will become useful;
the truth of the matter is that we don't need to do anything with them
beyond detection so the bit gets set in nodes_possible_map.

Ingo should not need to look through what is becoming an extremely long
discussion for a bugfix that should be applied for rc5 because we're
getting _very_ late in the 2.6.33 release cycle. Do you expect Ingo to
push your fix to Linus with the rationale that "maybe someday we'll use
this new nodemask even though it may be rc5 and nobody knows what we'd
ever use it for"? Is that appropriate for -stable candidates as well?

You've already tested my patch that this thread was restarted with and it
works, so let's fix the bug. Then, later, you can rename cpu_nodes_parsed
to no_mems_nodes, which I'd agree with. You may even try to seperate the
hotpluggable nodes out into their own nodemask, but I trust that the x86
maintainers will be looking for some rationale behind that other than "it
may one day be useful."

2010-01-22 04:06:24

by Haicheng Li

[permalink] [raw]
Subject: [PATCH] x86/mm/srat_64.c: make node_possible_map include hotpluggable node

David Rientjes wrote:
> You've already tested my patch that this thread was restarted with and it
> works, so let's fix the bug. Then, later, you can rename cpu_nodes_parsed
> to no_mems_nodes, which I'd agree with. You may even try to seperate the
> hotpluggable nodes out into their own nodemask, but I trust that the x86
> maintainers will be looking for some rationale behind that other than "it
> may one day be useful."

David, you are misleading people to fix the BUG with a logically problematic patch. I don't want
such fixing to possibly bother other people someday, please let's avoid it in review stage.

> getting _very_ late in the 2.6.33 release cycle. Do you expect Ingo to
> push your fix to Linus with the rationale that "maybe someday we'll use
> this new nodemask even though it may be rc5 and nobody knows what we'd
> ever use it for"? Is that appropriate for -stable candidates as well?

Don't speak for any other people. Let maintainers themselves decide if my patch is ugly or
acceptable. I don't want to argue with you anymore if you cannot find any true problem from my
recent patch.

Below is my updated patch (in fact, it's v2 for the patch I sent out for review in
http://lkml.org/lkml/2010/1/15/9).

***

This is to fix the outstanding BUG I ever reported in email thread:
http://patchwork.kernel.org/patch/69499/.
[ 141.667487] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[ 141.667782] IP: [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.667969] PGD 0
[ 141.668129] Oops: 0000 [#1] SMP
[ 141.668357] last sysfs file: /sys/class/scsi_host/host4/proc_name
[ 141.668469] CPU
[ 141.668630] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid sr_mod cdrom thermal processor thermal_sys container button rtc_cmos
rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 141.671659] Pid: 126, comm: events/27 Not tainted 2.6.32 #9 Server
[ 141.671771] RIP: 0010:[<ffffffff810b8a64>] [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.671981] RSP: 0018:ffff88027e81bdf0 EFLAGS: 00010206
[ 141.672089] RAX: 0000000000000002 RBX: 0000000000000078 RCX: ffff88047d86e580
[ 141.672204] RDX: ffff88047dfcbc00 RSI: ffff88047f13f6c0 RDI: ffff88047d9136c0
[ 141.672319] RBP: ffff88027e81be30 R08: 0000000000000001 R09: 0000000000000001
[ 141.672433] R10: 0000000000000000 R11: 0000000000000086 R12: ffff88047d87c200
[ 141.672548] R13: ffff88047d87d680 R14: ffffffff810b89f3 R15: 0000000000000002
[ 141.672663] FS: 0000000000000000(0000) GS:ffff88028b5a0000(0000) knlGS:0000000000000000
[ 141.672807] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 141.672917] CR2: 0000000000000078 CR3: 0000000001001000 CR4: 00000000000006e0
[ 141.673032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 141.673147] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 141.673262] Process events/27 (pid: 126, threadinfo ffff88027e81a000, task ffff88027f3ea040)
[ 141.673406] Stack:
[ 141.673503] ffff88027e81be30 ffff88028b5b05a0 0000000100000000 ffff88027e81be80
[ 141.673808] <0> ffff88028b5b5b40 ffff88028b5b05a0 ffffffff810b89f3 fffffffff00000c6
[ 141.674265] <0> ffff88027e81bec0 ffffffff81057394 ffffffff8105733e ffffffff81369f3a
[ 141.674813] Call Trace:
[ 141.674915] [<ffffffff810b89f3>] ? cache_reap+0x0/0x236
[ 141.675028] [<ffffffff81057394>] worker_thread+0x17a/0x27b
[ 141.675138] [<ffffffff8105733e>] ? worker_thread+0x124/0x27b
[ 141.675256] [<ffffffff81369f3a>] ? thread_return+0x3e/0xee
[ 141.675369] [<ffffffff8105a244>] ? autoremove_wake_function+0x0/0x38
[ 141.675482] [<ffffffff8105721a>] ? worker_thread+0x0/0x27b
[ 141.675593] [<ffffffff8105a146>] kthread+0x7d/0x87
[ 141.675707] [<ffffffff81012daa>] child_rip+0xa/0x20
[ 141.675817] [<ffffffff81012710>] ? restore_args+0x0/0x30
[ 141.675927] [<ffffffff8105a0c9>] ? kthread+0x0/0x87
[ 141.676035] [<ffffffff81012da0>] ? child_rip+0x0/0x20
[ 141.676142] Code: a4 c5 68 08 00 00 65 48 8b 04 25 00 e4 00 00 48 8b 04 18 49 8b 4c 24 78 48 85
c9 74 5b 41 89 c7 48 98 48 8b 1c c1 48 85 db 74 4d <83> 3b 00 74 48 48 83 3d ff d4 65 00 00 75 04 0f
0b eb fe fa 66
[ 141.680610] RIP [<ffffffff810b8a64>] cache_reap+0x71/0x236
[ 141.680785] RSP <ffff88027e81bdf0>
[ 141.680886] CR2: 0000000000000078
[ 141.681016] ---[ end trace b1e17069ef81fe83 ]--

Existing code doesn't record hotpluggable nodes parsed from SRAT because such nodes have neither CPU
online (in cpu_nodes_parsed) nor MEM online (in nodes_parsed) at booting time. As a result,
node_possible_map won't include hotpluggable nodes and then nr_node_ids won't be equal to maximum of
the possible nodes on the system.

To fix it, naturally we add nodemask_t hp_nodes_parsed to record nodes with hotpluggable memory
region, corresponding region data is kept in existing struct bootnode nodes_add[]; finally we union
hp_nodes_parsed together with cpu_nodes_parsed and nodes_parsed to get correct node_possible_map,
which then includes all possible nodes:
- nodes with memory on
- nodes with cpu on
- nodes with hotpluggable memory region

Signed-off-by: Haicheng Li <[email protected]>
---
arch/x86/mm/srat_64.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index dbb5381..595b14d 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -29,6 +29,7 @@ static struct acpi_table_slit *acpi_slit;

static nodemask_t nodes_parsed __initdata;
static nodemask_t cpu_nodes_parsed __initdata;
+static nodemask_t hp_nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];

@@ -229,9 +230,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, hp_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -364,8 +367,10 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
return -1;
}

- /* Account for nodes with cpus and no memory */
+ /* Account for nodes with either memory or cpus online */
nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+ /* Account for nodes with hotpluggable memory region */
+ nodes_or(node_possible_map, node_possible_map, hp_nodes_parsed);

/* Finally register nodes */
for_each_node_mask(i, node_possible_map)
--
1.5.3.8

2010-01-22 07:34:44

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: make node_possible_map include hotpluggable node

On 01/21/2010 08:06 PM, Haicheng Li wrote:
> David Rientjes wrote:
>> You've already tested my patch that this thread was restarted with and it
>> works, so let's fix the bug. Then, later, you can rename
> cpu_nodes_parsed
>> to no_mems_nodes, which I'd agree with. You may even try to seperate the
>> hotpluggable nodes out into their own nodemask, but I trust that the x86
>> maintainers will be looking for some rationale behind that other than "it
>> may one day be useful."
>
> David, you are misleading people to fix the BUG with a logically
> problematic patch. I don't want such fixing to possibly bother other
> people someday, please let's avoid it in review stage.
>
>> getting _very_ late in the 2.6.33 release cycle. Do you expect Ingo to
>> push your fix to Linus with the rationale that "maybe someday we'll use
>> this new nodemask even though it may be rc5 and nobody knows what we'd
>> ever use it for"? Is that appropriate for -stable candidates as well?
>
> Don't speak for any other people. Let maintainers themselves decide if
> my patch is ugly or acceptable. I don't want to argue with you anymore
> if you cannot find any true problem from my recent patch.
>
> Below is my updated patch (in fact, it's v2 for the patch I sent out for
> review in http://lkml.org/lkml/2010/1/15/9).
>

Okay... please calm down. I just read through this thread from the top,
and had missed the fact that it had gotten so tense.

I have to say I agree with David Rientjes that we need the minimal patch
for upstream and stable. If you need the additional bitmask in the
future it should be added later.

Haicheng, would you be willing to prepare a minimal patch so we can
close the issue in the release trees as quickly as possible?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-01-22 08:43:40

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: make node_possible_map include hotpluggable node

H. Peter Anvin wrote:
> I have to say I agree with David Rientjes that we need the minimal patch
> for upstream and stable. If you need the additional bitmask in the
> future it should be added later.
>
> Haicheng, would you be willing to prepare a minimal patch so we can
> close the issue in the release trees as quickly as possible?

Peter,

Okay, let's close it. then please take the patch pasted below, which is the one without additional
bitmask added.

---
x86: set hotpluggable nodes in nodes_possible_map

nodes_possible_map does not currently include nodes that have SRAT
entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared
in nodes_parsed if it does not have an online address range.

Unequivocally setting the bit in nodes_parsed is insufficient since
existing code, such as acpi_get_nodes(), assumes all nodes in the map
have online address ranges. In fact, all code using nodes_parsed assumes
such nodes represent an address range of online memory.

nodes_possible_map is created by unioning nodes_parsed and
cpu_nodes_parsed; the former represents nodes with online memory and the
latter represents memoryless nodes. We now set the bit for hotpluggable
nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map.

Signed-off-by: David Rientjes <[email protected]>
---
arch/x86/mm/srat_64.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, cpu_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */

2010-01-22 10:15:35

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: make node_possible_map include hotpluggable node

On 01/22/2010 12:43 AM, Haicheng Li wrote:
> H. Peter Anvin wrote:
>> I have to say I agree with David Rientjes that we need the minimal patch
>> for upstream and stable. If you need the additional bitmask in the
>> future it should be added later.
>>
>> Haicheng, would you be willing to prepare a minimal patch so we can
>> close the issue in the release trees as quickly as possible?
>
> Peter,
>
> Okay, let's close it. then please take the patch pasted below, which is
> the one without additional bitmask added.
>

As David asked, OK to add your Tested-by: line?

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-01-22 10:35:38

by Haicheng Li

[permalink] [raw]
Subject: Re: [PATCH] x86/mm/srat_64.c: make node_possible_map include hotpluggable node

H. Peter Anvin wrote:
> On 01/22/2010 12:43 AM, Haicheng Li wrote:
>> H. Peter Anvin wrote:
>>> I have to say I agree with David Rientjes that we need the minimal patch
>>> for upstream and stable. If you need the additional bitmask in the
>>> future it should be added later.
>>>
>>> Haicheng, would you be willing to prepare a minimal patch so we can
>>> close the issue in the release trees as quickly as possible?
>> Peter,
>>
>> Okay, let's close it. then please take the patch pasted below, which is
>> the one without additional bitmask added.
>>
>
> As David asked, OK to add your Tested-by: line?
Sure,
Tested-by: Haicheng Li <[email protected]>

Thanks.
-haicheng

2010-01-22 11:16:19

by David Rientjes

[permalink] [raw]
Subject: [tip:x86/urgent] x86: Set hotpluggable nodes in nodes_possible_map

Commit-ID: fefe0d8e9f0b2d1855ab4902c10399432cfc2e16
Gitweb: http://git.kernel.org/tip/fefe0d8e9f0b2d1855ab4902c10399432cfc2e16
Author: David Rientjes <[email protected]>
AuthorDate: Wed, 20 Jan 2010 12:10:47 -0800
Committer: H. Peter Anvin <[email protected]>
CommitDate: Fri, 22 Jan 2010 02:44:43 -0800

x86: Set hotpluggable nodes in nodes_possible_map

nodes_possible_map does not currently include nodes that have SRAT
entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is
cleared in nodes_parsed if it does not have an online address range.

Unequivocally setting the bit in nodes_parsed is insufficient since
existing code, such as acpi_get_nodes(), assumes all nodes in the map
have online address ranges. In fact, all code using nodes_parsed
assumes such nodes represent an address range of online memory.

nodes_possible_map is created by unioning nodes_parsed and
cpu_nodes_parsed; the former represents nodes with online memory and
the latter represents memoryless nodes. We now set the bit for
hotpluggable nodes in cpu_nodes_parsed so that it also gets set in
nodes_possible_map.

[ hpa: Haicheng Li points out that this makes the naming of the
variable cpu_nodes_parsed somewhat counterintuitive. However, leave
it as is in the interest of keeping the pure bug fix patch small. ]

Signed-off-by: David Rientjes <[email protected]>
Tested-by: Haicheng Li <[email protected]>
LKML-Reference: <[email protected]>
Cc: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/srat_64.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..28c6876 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, cpu_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */

2010-01-23 06:52:14

by David Rientjes

[permalink] [raw]
Subject: [tip:x86/urgent] x86: Set hotpluggable nodes in nodes_possible_map

Commit-ID: 3a5fc0e40cb467e692737bc798bc99773c81e1e2
Gitweb: http://git.kernel.org/tip/3a5fc0e40cb467e692737bc798bc99773c81e1e2
Author: David Rientjes <[email protected]>
AuthorDate: Wed, 20 Jan 2010 12:10:47 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 23 Jan 2010 06:21:57 +0100

x86: Set hotpluggable nodes in nodes_possible_map

nodes_possible_map does not currently include nodes that have SRAT
entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is
cleared in nodes_parsed if it does not have an online address range.

Unequivocally setting the bit in nodes_parsed is insufficient since
existing code, such as acpi_get_nodes(), assumes all nodes in the map
have online address ranges. In fact, all code using nodes_parsed
assumes such nodes represent an address range of online memory.

nodes_possible_map is created by unioning nodes_parsed and
cpu_nodes_parsed; the former represents nodes with online memory and
the latter represents memoryless nodes. We now set the bit for
hotpluggable nodes in cpu_nodes_parsed so that it also gets set in
nodes_possible_map.

[ hpa: Haicheng Li points out that this makes the naming of the
variable cpu_nodes_parsed somewhat counterintuitive. However, leave
it as is in the interest of keeping the pure bug fix patch small. ]

Signed-off-by: David Rientjes <[email protected]>
Tested-by: Haicheng Li <[email protected]>
LKML-Reference: <[email protected]>
Cc: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/mm/srat_64.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..28c6876 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -229,9 +229,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}

- if (changed)
+ if (changed) {
+ node_set(node, cpu_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}

/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */