2009-07-16 09:18:39

by Luming Yu

[permalink] [raw]
Subject: [RFC patch] delete improper hot pluggable code of memory affinity

Hello,

The current kernel code *wrongly* interprets Hot Pluggable bit of
Memory Affinity Structure (SRAT table in ACPI spec).
if Hot Pluggable bit is set and CONFIG_MEMORY_HOTPLUG_SPARSE is NOT
set, the memory Affinity will
be ignored. And a faked Node will be used...

An alternative is to enable CONFIG_MEMORY_HOTPLUG_SPARSE *always*
along with acpi_numa_memory_affinity_init.
Please decide which one is appropriate.

The downside of this patch is *some useful info* is lost and a follow
up patch is needed.

**The patch is enclosed in text attachment*
**Using web client to send the patch* *
**below is for review, please apply attached patch*/

Thanks,
Luming


Signed-off-by: Yu Luming <[email protected]>

srat_64.c | 16 ----------------
1 file changed, 16 deletions(-)


diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index 2dfcbf9..82423e5 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -172,11 +172,6 @@ acpi_numa_processor_affinity_init(struct
acpi_srat_cpu_affinity *pa)
pxm, apic_id, node);
}

-#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
-static inline int save_add_info(void) {return 1;}
-#else
-static inline int save_add_info(void) {return 0;}
-#endif
/*
* Update nodes_add[]
* This code supports one contiguous hot add area per node
@@ -249,9 +244,6 @@ acpi_numa_memory_affinity_init(struct
acpi_srat_mem_affinity *ma)
}
if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
return;
-
- if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
- return;
start = ma->base_address;
end = start + ma->length;
pxm = ma->proximity_domain;
@@ -291,14 +283,6 @@ acpi_numa_memory_affinity_init(struct
acpi_srat_mem_affinity *ma)
e820_register_active_regions(node, start >> PAGE_SHIFT,
end >> PAGE_SHIFT);

- if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
- update_nodes_add(node, start, end);
- /* restore nodes[node] */
- *nd = oldnode;
- if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
- }
-
node_memblk_range[num_node_memblks].start = start;
node_memblk_range[num_node_memblks].end = end;
memblk_nodeid[num_node_memblks] = node;


Attachments:
0.patch (1.33 kB)

2009-07-17 05:17:09

by Yasunori Goto

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity


Hi, Luming-san.

> The current kernel code *wrongly* interprets Hot Pluggable bit of
> Memory Affinity Structure (SRAT table in ACPI spec).

I'm not sure your patch is correct or not yet, but I would like
to tell you a critical point about the definition of
Memory Affinity Structure.

The spec says the Enable bit of Memory Affinity Structure means that
the contents of its memory affinity structure is only VALID.
It doesn't mean memory is really connected at the area.
It means only that OS can read the entry.

When the enabled bit and hot pluggable bit is set on,
it may mean that the area may be hot-added after boot up.
So, kernel must check e820 or efi to confirm that memory is
really connected.

If you already know it, sorry for noise....
Just for your information.

Thanks.



> if Hot Pluggable bit is set and CONFIG_MEMORY_HOTPLUG_SPARSE is NOT
> set, the memory Affinity will
> be ignored. And a faked Node will be used...
>
> An alternative is to enable CONFIG_MEMORY_HOTPLUG_SPARSE *always*
> along with acpi_numa_memory_affinity_init.
> Please decide which one is appropriate.
>
> The downside of this patch is *some useful info* is lost and a follow
> up patch is needed.
>
> **The patch is enclosed in text attachment*
> **Using web client to send the patch* *
> **below is for review, please apply attached patch*/
>
> Thanks,
> Luming
>
>
> Signed-off-by: Yu Luming <[email protected]>
>
> srat_64.c | 16 ----------------
> 1 file changed, 16 deletions(-)
>
>
> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> index 2dfcbf9..82423e5 100644
> --- a/arch/x86/mm/srat_64.c
> +++ b/arch/x86/mm/srat_64.c
> @@ -172,11 +172,6 @@ acpi_numa_processor_affinity_init(struct
> acpi_srat_cpu_affinity *pa)
> pxm, apic_id, node);
> }
>
> -#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> -static inline int save_add_info(void) {return 1;}
> -#else
> -static inline int save_add_info(void) {return 0;}
> -#endif
> /*
> * Update nodes_add[]
> * This code supports one contiguous hot add area per node
> @@ -249,9 +244,6 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> }
> if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
> return;
> -
> - if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
> - return;
> start = ma->base_address;
> end = start + ma->length;
> pxm = ma->proximity_domain;
> @@ -291,14 +283,6 @@ acpi_numa_memory_affinity_init(struct
> acpi_srat_mem_affinity *ma)
> e820_register_active_regions(node, start >> PAGE_SHIFT,
> end >> PAGE_SHIFT);
>
> - if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> - update_nodes_add(node, start, end);
> - /* restore nodes[node] */
> - *nd = oldnode;
> - if ((nd->start | nd->end) == 0)
> - node_clear(node, nodes_parsed);
> - }
> -
> node_memblk_range[num_node_memblks].start = start;
> node_memblk_range[num_node_memblks].end = end;
> memblk_nodeid[num_node_memblks] = node;

--
Yasunori Goto

2009-07-17 05:33:50

by Luming Yu

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity

Without a fix like my proposal, I have seen NUMA configure disabled by
kernel (due to the code the patch deletes) on a system with Enabled
bit set , and Hotplug-able bit cleared, and
CONFIG_MEMORY_HOTPLUG_SPARSE disabled.

On Fri, Jul 17, 2009 at 1:16 PM, Yasunori Goto<[email protected]> wrote:
>
> Hi, Luming-san.
>
>> The current kernel code *wrongly* interprets Hot Pluggable bit of
>> Memory Affinity Structure (SRAT table in ACPI spec).
>
> I'm not sure your patch is correct or not yet, but I would like
> to tell you a critical point about the definition of
> Memory Affinity Structure.
>
> The spec says the Enable bit of Memory Affinity Structure means that
> the contents of its memory affinity structure is only VALID.
> It doesn't mean memory is really connected at the area.
> It means only that OS can read the entry.
>
> When the enabled bit and hot pluggable bit is set on,
> it may mean that the area may be hot-added after boot up.
> So, kernel must check e820 or efi to confirm that memory is
> really connected.
>
> If you already know it, sorry for noise....
> Just for your information.
>
> Thanks.
>
>
>
>> if Hot Pluggable bit is set and CONFIG_MEMORY_HOTPLUG_SPARSE is NOT
>> set, the memory Affinity will
>> be ignored. And a faked Node will be used...
>>
>> An alternative is to enable CONFIG_MEMORY_HOTPLUG_SPARSE *always*
>> along with acpi_numa_memory_affinity_init.
>> Please decide which one is appropriate.
>>
>> The downside of this patch is *some useful info* is lost and a follow
>> up patch is needed.
>>
>> **The patch is enclosed in text attachment*
>> **Using web client to send the patch* *
>> **below is for review, please apply attached  patch*/
>>
>> Thanks,
>> Luming
>>
>>
>> Signed-off-by: Yu Luming <[email protected]>
>>
>>  srat_64.c |   16 ----------------
>>  1 file changed, 16 deletions(-)
>>
>>
>> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
>> index 2dfcbf9..82423e5 100644
>> --- a/arch/x86/mm/srat_64.c
>> +++ b/arch/x86/mm/srat_64.c
>> @@ -172,11 +172,6 @@ acpi_numa_processor_affinity_init(struct
>> acpi_srat_cpu_affinity *pa)
>>              pxm, apic_id, node);
>>  }
>>
>> -#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
>> -static inline int save_add_info(void) {return 1;}
>> -#else
>> -static inline int save_add_info(void) {return 0;}
>> -#endif
>>  /*
>>   * Update nodes_add[]
>>   * This code supports one contiguous hot add area per node
>> @@ -249,9 +244,6 @@ acpi_numa_memory_affinity_init(struct
>> acpi_srat_mem_affinity *ma)
>>       }
>>       if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
>>               return;
>> -
>> -     if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
>> -             return;
>>       start = ma->base_address;
>>       end = start + ma->length;
>>       pxm = ma->proximity_domain;
>> @@ -291,14 +283,6 @@ acpi_numa_memory_affinity_init(struct
>> acpi_srat_mem_affinity *ma)
>>       e820_register_active_regions(node, start >> PAGE_SHIFT,
>>                                    end >> PAGE_SHIFT);
>>
>> -     if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>> -             update_nodes_add(node, start, end);
>> -             /* restore nodes[node] */
>> -             *nd = oldnode;
>> -             if ((nd->start | nd->end) == 0)
>> -                     node_clear(node, nodes_parsed);
>> -     }
>> -
>>       node_memblk_range[num_node_memblks].start = start;
>>       node_memblk_range[num_node_memblks].end = end;
>>       memblk_nodeid[num_node_memblks] = node;
>
> --
> Yasunori Goto
>
>
>

2009-07-17 05:52:52

by Yasunori Goto

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity

> Without a fix like my proposal, I have seen NUMA configure disabled by
> kernel (due to the code the patch deletes) on a system with Enabled
> bit set , and Hotplug-able bit cleared, and
> CONFIG_MEMORY_HOTPLUG_SPARSE disabled.

Ok. I guess that save_add_info() was to check percentage of
reserve area when CONFIG_MEMORY_HOTPLUG_RESERVE is set.
Its code was removed at 2.6.25, save_add_info() may be garbage now.

However, I have one question now.

>> - if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>> - update_nodes_add(node, start, end);
>> - /* restore nodes[node] */
>> - *nd = oldnode;
>> - if ((nd->start | nd->end) == 0)
>> - node_clear(node, nodes_parsed);
>> - }

I don't understand why you remove this code. could you tell me why?

Bye.


>
> On Fri, Jul 17, 2009 at 1:16 PM, Yasunori Goto<[email protected]> wrote:
> >
> > Hi, Luming-san.
> >
> >> The current kernel code *wrongly* interprets Hot Pluggable bit of
> >> Memory Affinity Structure (SRAT table in ACPI spec).
> >
> > I'm not sure your patch is correct or not yet, but I would like
> > to tell you a critical point about the definition of
> > Memory Affinity Structure.
> >
> > The spec says the Enable bit of Memory Affinity Structure means that
> > the contents of its memory affinity structure is only VALID.
> > It doesn't mean memory is really connected at the area.
> > It means only that OS can read the entry.
> >
> > When the enabled bit and hot pluggable bit is set on,
> > it may mean that the area may be hot-added after boot up.
> > So, kernel must check e820 or efi to confirm that memory is
> > really connected.
> >
> > If you already know it, sorry for noise....
> > Just for your information.
> >
> > Thanks.
> >
> >
> >
> >> if Hot Pluggable bit is set and CONFIG_MEMORY_HOTPLUG_SPARSE is NOT
> >> set, the memory Affinity will
> >> be ignored. And a faked Node will be used...
> >>
> >> An alternative is to enable CONFIG_MEMORY_HOTPLUG_SPARSE *always*
> >> along with acpi_numa_memory_affinity_init.
> >> Please decide which one is appropriate.
> >>
> >> The downside of this patch is *some useful info* is lost and a follow
> >> up patch is needed.
> >>
> >> **The patch is enclosed in text attachment*
> >> **Using web client to send the patch* *
> >> **below is for review, please apply attached ?patch*/
> >>
> >> Thanks,
> >> Luming
> >>
> >>
> >> Signed-off-by: Yu Luming <[email protected]>
> >>
> >> ?srat_64.c | ? 16 ----------------
> >> ?1 file changed, 16 deletions(-)
> >>
> >>
> >> diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
> >> index 2dfcbf9..82423e5 100644
> >> --- a/arch/x86/mm/srat_64.c
> >> +++ b/arch/x86/mm/srat_64.c
> >> @@ -172,11 +172,6 @@ acpi_numa_processor_affinity_init(struct
> >> acpi_srat_cpu_affinity *pa)
> >> ? ? ? ? ? ? ?pxm, apic_id, node);
> >> ?}
> >>
> >> -#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
> >> -static inline int save_add_info(void) {return 1;}
> >> -#else
> >> -static inline int save_add_info(void) {return 0;}
> >> -#endif
> >> ?/*
> >> ? * Update nodes_add[]
> >> ? * This code supports one contiguous hot add area per node
> >> @@ -249,9 +244,6 @@ acpi_numa_memory_affinity_init(struct
> >> acpi_srat_mem_affinity *ma)
> >> ? ? ? }
> >> ? ? ? if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
> >> ? ? ? ? ? ? ? return;
> >> -
> >> - ? ? if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
> >> - ? ? ? ? ? ? return;
> >> ? ? ? start = ma->base_address;
> >> ? ? ? end = start + ma->length;
> >> ? ? ? pxm = ma->proximity_domain;
> >> @@ -291,14 +283,6 @@ acpi_numa_memory_affinity_init(struct
> >> acpi_srat_mem_affinity *ma)
> >> ? ? ? e820_register_active_regions(node, start >> PAGE_SHIFT,
> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?end >> PAGE_SHIFT);
> >>
> >> - ? ? if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> >> - ? ? ? ? ? ? update_nodes_add(node, start, end);
> >> - ? ? ? ? ? ? /* restore nodes[node] */
> >> - ? ? ? ? ? ? *nd = oldnode;
> >> - ? ? ? ? ? ? if ((nd->start | nd->end) == 0)
> >> - ? ? ? ? ? ? ? ? ? ? node_clear(node, nodes_parsed);
> >> - ? ? }
> >> -
> >> ? ? ? node_memblk_range[num_node_memblks].start = start;
> >> ? ? ? node_memblk_range[num_node_memblks].end = end;
> >> ? ? ? memblk_nodeid[num_node_memblks] = node;
> >
> > --
> > Yasunori Goto
> >
> >
> >

--
Yasunori Goto

2009-07-17 06:03:20

by Luming Yu

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity

On Fri, Jul 17, 2009 at 1:52 PM, Yasunori Goto<[email protected]> wrote:
>> Without a fix like my proposal, I have seen NUMA configure disabled by
>> kernel (due to the code the patch deletes) on a system with Enabled
>> bit set , and Hotplug-able bit cleared, and
>> CONFIG_MEMORY_HOTPLUG_SPARSE disabled.
>
> Ok. I guess that save_add_info() was to check percentage of
> reserve area when CONFIG_MEMORY_HOTPLUG_RESERVE is set.
> Its code was removed at 2.6.25, save_add_info() may be garbage now.
>
> However, I have one question now.
>
>>> -    if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
>>> -            update_nodes_add(node, start, end);
>>> -            /* restore nodes[node] */
>>> -            *nd = oldnode;
>>> -            if ((nd->start | nd->end) == 0)
>>> -                    node_clear(node, nodes_parsed);
>>> -     }
>
> I don't understand why you remove this code. could you tell me why?

Good question. This is exactly the place I'm puzzled too.
Without delete this code, I still see one fake node instead of 4 real node...
I think a flow up patch is needed here...

2009-07-17 06:21:05

by Yasunori Goto

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity

> On Fri, Jul 17, 2009 at 1:52 PM, Yasunori Goto<[email protected]> wrote:
> >> Without a fix like my proposal, I have seen NUMA configure disabled by
> >> kernel (due to the code the patch deletes) on a system with Enabled
> >> bit set , and Hotplug-able bit cleared, and
> >> CONFIG_MEMORY_HOTPLUG_SPARSE disabled.
> >
> > Ok. I guess that save_add_info() was to check percentage of
> > reserve area when CONFIG_MEMORY_HOTPLUG_RESERVE is set.
> > Its code was removed at 2.6.25, save_add_info() may be garbage now.
> >
> > However, I have one question now.
> >
> >>> - ? ?if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) {
> >>> - ? ? ? ? ? ?update_nodes_add(node, start, end);
> >>> - ? ? ? ? ? ?/* restore nodes[node] */
> >>> - ? ? ? ? ? ?*nd = oldnode;
> >>> - ? ? ? ? ? ?if ((nd->start | nd->end) == 0)
> >>> - ? ? ? ? ? ? ? ? ? ?node_clear(node, nodes_parsed);
> >>> - ? ? }
> >
> > I don't understand why you remove this code. could you tell me why?
>
> Good question. This is exactly the place I'm puzzled too.
> Without delete this code, I still see one fake node instead of 4 real node...
> I think a flow up patch is needed here...

In my understanding, here is for the case when the SRAT entry is enabled,
but memory is not connected. (This is why I sent the first mail).
However, here may have a bug yet.

I would like to test here, but I don't have a real x86-64 machine which can
hot add memory. So, I can just guess now.

Anyway, thanks for your notification. If you have progress about it,
please let me know.

Bye.

--
Yasunori Goto

2009-07-20 07:42:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC patch] delete improper hot pluggable code of memory affinity

Luming Yu <[email protected]> writes:
>
> The downside of this patch is *some useful info* is lost and a follow
> up patch is needed.

A lot of the heuristics in the hotadd code are obsolete since x86-64
only supports sparsemem/vmememap

I posted another patch for this and also have a followup
series (not posted yet) to address some more problems.

-Andi

--
[email protected] -- Speaking for myself only.