2012-08-20 06:21:34

by Jianguo Wu

Subject: [PATCH]mm/ia64: fix a node distance bug

From: Jianguo Wu <[email protected]>

Hi all,
When doing memory hotplug, we found that the node distance is wrong after
offlining a node on the IA64 platform. For example, on a system with 4 nodes:
node distances:
node   0   1   2   3
  0:  10  21  21  32
  1:  21  10  32  21
  2:  21  32  10  21
  3:  32  21  21  10

linux-drf:/sys/devices/system/node/node0 # cat distance
10 21 21 32
linux-drf:/sys/devices/system/node/node1 # cat distance
21 10 32 21

After offlining node2:
linux-drf:/sys/devices/system/node/node0 # cat distance
10 21 32
linux-drf:/sys/devices/system/node/node1 # cat distance
32 21 32 --------->expected value is: 21 10 21

On ia64, we have the following definition:
extern u8 numa_slit[MAX_NUMNODES * MAX_NUMNODES];
#define node_distance(from,to) (numa_slit[(from) * num_online_nodes() + (to)])

The node distances are set up as follows:
acpi_numa_arch_fixup()
{
	...
	memset(numa_slit, -1, sizeof(numa_slit));
	for (i = 0; i < slit_table->locality_count; i++) {
		if (!pxm_bit_test(i))
			continue;
		node_from = pxm_to_node(i);
		for (j = 0; j < slit_table->locality_count; j++) {
			if (!pxm_bit_test(j))
				continue;
			node_to = pxm_to_node(j);
			node_distance(node_from, node_to) =
				slit_table->entry[i * slit_table->locality_count + j];
		}
	}
	...
}
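num_online_nodes() is not a constant: during system boot it returns 4, but
after node2 is offlined it returns 3. The table was filled in with a row
stride of 4 and is now read with a row stride of 3, so node_distance()
returns the wrong entry. This patch fixes the bug by using the constant
MAX_NUMNODES as the row stride.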

Signed-off-by: Jianguo Wu <[email protected]>
---
arch/ia64/include/asm/numa.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/include/asm/numa.h b/arch/ia64/include/asm/numa.h
index 6a8a27c..2e27ef1 100644
--- a/arch/ia64/include/asm/numa.h
+++ b/arch/ia64/include/asm/numa.h
@@ -59,7 +59,7 @@ extern struct node_cpuid_s node_cpuid[NR_CPUS];
  */

 extern u8 numa_slit[MAX_NUMNODES * MAX_NUMNODES];
-#define node_distance(from,to) (numa_slit[(from) * num_online_nodes() + (to)])
+#define node_distance(from,to) (numa_slit[(from) * MAX_NUMNODES + (to)])

 extern int paddr_to_nid(unsigned long paddr);

--
1.7.6.1




2012-08-20 07:01:40

by Wen Congyang

Subject: Re: [PATCH]mm/ia64: fix a node distance bug

At 08/20/2012 02:21 PM, wujianguo Wrote:
> @@ -59,7 +59,7 @@ extern struct node_cpuid_s node_cpuid[NR_CPUS];
> */
>
> extern u8 numa_slit[MAX_NUMNODES * MAX_NUMNODES];
> -#define node_distance(from,to) (numa_slit[(from) * num_online_nodes() + (to)])
> +#define node_distance(from,to) (numa_slit[(from) * MAX_NUMNODES + (to)])

Hmm, MAX_NUMNODES is too large. I think num_possible_nodes() is better.

Thanks
Wen Congyang

>
> extern int paddr_to_nid(unsigned long paddr);
>

2012-08-20 13:54:33

by Jianguo Wu

Subject: Re: [PATCH]mm/ia64: fix a node distance bug

On 2012/8/20 15:06, Wen Congyang wrote:
> At 08/20/2012 02:21 PM, wujianguo Wrote:
>> */
>>
>> extern u8 numa_slit[MAX_NUMNODES * MAX_NUMNODES];
>> -#define node_distance(from,to) (numa_slit[(from) * num_online_nodes() + (to)])
>> +#define node_distance(from,to) (numa_slit[(from) * MAX_NUMNODES + (to)])
>
> Hmm, MAX_NUMNODES is too large. I think num_possible_nodes() is better.
>
> Thanks
> Wen Congyang
>

Hi Congyang,
Thanks for your comments.
numa_slit[MAX_NUMNODES * MAX_NUMNODES] is a static array, so I think it makes
no difference whether we use MAX_NUMNODES or num_possible_nodes().

>>
>> extern int paddr_to_nid(unsigned long paddr);
>>
>
>