2014-07-11 07:35:16

Subject: [RFC Patch V1 00/30] Enable memoryless node on x86 platforms

Subject: [RFC Patch V1 27/30] x86, numa: Kill useless code to improve code readability

According to the x86 boot sequence, early_cpu_to_node() always returns
NUMA_NO_NODE when called from numa_init(). So kill the useless code
to improve code readability.

The related code sequence is shown below. x86_cpu_to_node_map is not
set until step 2, so it still holds the default value (NUMA_NO_NODE)
when it is accessed at step 1.

start_kernel()
    setup_arch()
        initmem_init()
            x86_numa_init()
                numa_init()
                    early_cpu_to_node()
1)                      return early_per_cpu_ptr(x86_cpu_to_node_map)[cpu];
        acpi_boot_init();
        sfi_init()
        x86_dtb_init()
            generic_processor_info()
                early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
        init_cpu_to_node()
            numa_set_node(cpu, node);
2)              per_cpu(x86_cpu_to_node_map, cpu) = node;

    rest_init()
        kernel_init()
            smp_init()
                native_cpu_up()
                    start_secondary()
                        numa_set_node()
                            per_cpu(x86_cpu_to_node_map, cpu) = node;

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/mm/numa.c | 10 ----------
1 file changed, 10 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index a32b706c401a..eec4f6c322bb 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -545,8 +545,6 @@ static void __init numa_init_array(void)
 
 	rr = first_node(node_online_map);
 	for (i = 0; i < nr_cpu_ids; i++) {
-		if (early_cpu_to_node(i) != NUMA_NO_NODE)
-			continue;
 		numa_set_node(i, rr);
 		rr = next_node(rr, node_online_map);
 		if (rr == MAX_NUMNODES)
@@ -633,14 +631,6 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < nr_cpu_ids; i++) {
-		int nid = early_cpu_to_node(i);
-
-		if (nid == NUMA_NO_NODE)
-			continue;
-		if (!node_online(nid))
-			numa_clear_node(i);
-	}
 	numa_init_array();
 
 	/*
--
1.7.10.4
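
The ordering argument in the changelog can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `model_*` functions, the `cpu_to_node_map` array, and the four-CPU size are hypothetical stand-ins invented for illustration, not the kernel's per-cpu machinery. It shows that a check against NUMA_NO_NODE cannot find an assigned CPU until something has written the map.

```c
#include <assert.h>

#define NR_CPUS 4          /* hypothetical CPU count for this model */
#define NUMA_NO_NODE (-1)  /* same sentinel value the kernel uses */

/* Stand-in for x86_cpu_to_node_map: every entry starts as NUMA_NO_NODE. */
static int cpu_to_node_map[NR_CPUS] = {
	NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE
};

/* Models early_cpu_to_node(): a plain read of the map (step 1). */
static int model_early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_to_node_map[cpu];
}

/* Models numa_set_node(): the write that happens at step 2. */
static void model_numa_set_node(int cpu, int node)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_to_node_map[cpu] = node;
}

/*
 * Models the test used by the removed code: count the CPUs whose entry
 * is already something other than NUMA_NO_NODE.
 */
static int model_count_assigned_cpus(void)
{
	int cpu, assigned = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (model_early_cpu_to_node(cpu) != NUMA_NO_NODE)
			assigned++;
	return assigned;
}
```

Calling model_count_assigned_cpus() before any model_numa_set_node() call returns 0 in this model, which mirrors the claim that the removed branches were never taken when reached from numa_init().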

2014-07-11 07:38:19

by Jiang Liu

2014-07-11 07:36:47

by Jiang Liu

Subject: [RFC Patch V1 18/30] mm, bnx2fc: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:43:17

by Jiang Liu

Subject: [RFC Patch V1 20/30] mm, fcoe: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:45:46

by Jiang Liu

Subject: [RFC Patch V1 17/30] mm, intel_powerclamp: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:45:44

by Paolo Bonzini

Subject: Re: [RFC Patch V1 25/30] mm, x86, kvm: Use cpu_to_mem()/numa_mem_id() to support memoryless node

On 11/07/2014 09:37, Jiang Liu wrote:
> When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> may return a node without memory, and later cause system failure/panic
> when calling kmalloc_node() and friends with returned node id.
> So use cpu_to_mem()/numa_mem_id() instead to get the nearest node with
> memory for the/current cpu.
>
> If CONFIG_HAVE_MEMORYLESS_NODES is disabled, cpu_to_mem()/numa_mem_id()
> is the same as cpu_to_node()/numa_node_id().
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332edefc3..beb7c6d5d51b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2964,7 +2964,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>
> static struct vmcs *alloc_vmcs_cpu(int cpu)
> {
> - int node = cpu_to_node(cpu);
> + int node = cpu_to_mem(cpu);
> struct page *pages;
> struct vmcs *vmcs;
>
>

Acked-by: Paolo Bonzini <[email protected]>
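
For readers unfamiliar with the distinction the quoted changelog draws, the following user-space sketch models it. Everything here is an assumption made for illustration: the three-node topology, the distance table, and the `model_*` functions are hypothetical and are not the kernel's implementation of cpu_to_node() or cpu_to_mem(). The sketch only shows the idea of "the CPU's own node" versus "the nearest node that has memory".

```c
#include <assert.h>

#define NR_NODES 3  /* hypothetical node count */
#define NR_CPUS  4  /* hypothetical CPU count */

/* Hypothetical topology: node 1 has CPUs but no memory. */
static const int node_has_memory[NR_NODES] = { 1, 0, 1 };
static const int cpu_node[NR_CPUS] = { 0, 1, 1, 2 };

/* Hypothetical distance table; a smaller value means a nearer node. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 30 },
	{ 20, 10, 15 },
	{ 30, 15, 10 },
};

/* Models cpu_to_node(): the node the CPU physically belongs to. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return cpu_node[cpu];
}

/* Models cpu_to_mem(): the nearest node to the CPU that has memory. */
static int model_cpu_to_mem(int cpu)
{
	int home = model_cpu_to_node(cpu);
	int nid, best = -1;

	for (nid = 0; nid < NR_NODES; nid++) {
		if (!node_has_memory[nid])
			continue;
		if (best < 0 ||
		    node_distance[home][nid] < node_distance[home][best])
			best = nid;
	}
	return best;  /* -1 only if no node has memory at all */
}
```

In this model, CPU 1 sits on memoryless node 1, so model_cpu_to_node(1) is 1 while model_cpu_to_mem(1) is 2, the nearer of the two nodes with memory. For CPUs on nodes that have memory, both functions return the same node.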

2014-07-11 07:36:21

by Jiang Liu

Subject: [RFC Patch V1 13/30] mm, i40e: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:48:29

by Jiang Liu

Subject: [RFC Patch V1 12/30] mm, IB/qib: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:36:10

by Jiang Liu

Subject: [RFC Patch V1 11/30] mm, char/mspec.c: Use cpu_to_mem()/numa_mem_id() to support memoryless node

2014-07-11 07:49:50

by Jiang Liu

On 28.07.2014 [07:30:40 -0600], Grant Likely wrote:
> On Mon, 21 Jul 2014 10:52:41 -0700, Nishanth Aravamudan <[email protected]> wrote:
> > On 11.07.2014 [15:37:39 +0800], Jiang Liu wrote:
> > > When CONFIG_HAVE_MEMORYLESS_NODES is enabled, cpu_to_node()/numa_node_id()
> > > may return a node without memory, and later cause system failure/panic
> > > when calling kmalloc_node() and friends with returned node id.
> > > So use cpu_to_mem()/numa_mem_id() instead to get the nearest node with
> > > memory for the/current cpu.
> > >
> > > If CONFIG_HAVE_MEMORYLESS_NODES is disabled, cpu_to_mem()/numa_mem_id()
> > > is the same as cpu_to_node()/numa_node_id().
> > >
> > > Signed-off-by: Jiang Liu <[email protected]>
> > > ---
> > > drivers/of/base.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index b9864806e9b8..40d4772973ad 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -85,7 +85,7 @@ EXPORT_SYMBOL(of_n_size_cells);
> > > #ifdef CONFIG_NUMA
> > > int __weak of_node_to_nid(struct device_node *np)
> > > {
> > > - return numa_node_id();
> > > + return numa_mem_id();
> > > }
> > > #endif
> >
> > Um, NAK. of_node_to_nid() returns the NUMA node ID for a given device
> > tree node. The default should be the physically local NUMA node, not the
> > nearest memory-containing node.
>
> That description doesn't match the code. This patch only changes the
> default implementation of of_node_to_nid() which doesn't take the device
> node into account *at all* when returning a node ID. Just look at the
> diff.

I meant that of_node_to_nid() seems to be used throughout the call sites
to indicate the caller's locality. We want to keep using cpu_to_node() there
and fall back appropriately in the MM (when allocations occur off-node due
to memoryless nodes), not report memory-specific topology to the caller
itself. There was a long thread between Tejun and me that
discussed what we are aiming for: https://lkml.org/lkml/2014/7/18/278

I understand that the code unconditionally returns current's NUMA node
ID right now (ignoring the device node). That seems correct, to me, for
something like:

of_device_add:
	/* device_add will assume that this device is on the same node as
	 * the parent. If there is no parent defined, set the node
	 * explicitly */
	if (!ofdev->dev.parent)
		set_dev_node(&ofdev->dev, of_node_to_nid(ofdev->dev.of_node));

I don't think we want the default implementation to set the NUMA node of
a dev to the nearest NUMA node with memory?

> I think this patch is correct, and it doesn't affect the override
> versions provided by powerpc and sparc.

Yes, agreed, so maybe it doesn't matter. I guess my point was simply
that it only seems reasonable to change callers of cpu_to_node() outside
the core MM to cpu_to_mem() if they care about memoryless nodes
explicitly. I don't think the OF code does, so I don't think it
should change.

Sorry for my premature NAK and lack of clarity in my explanation.

-Nish
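
The position argued in this reply (record the device's true locality, and let the allocator handle the fallback) can be sketched in user space as follows. This is a minimal model under stated assumptions: `struct model_dev`, the `fallback_order` table, and the `model_*` functions are hypothetical names invented for illustration. The fallback table is only loosely analogous to a kernel zonelist and is not how the MM is actually implemented.

```c
#include <assert.h>

#define NR_NODES 3  /* hypothetical node count */

/* Hypothetical stand-in for the NUMA node field of a struct device. */
struct model_dev {
	int numa_node;
};

/* Hypothetical topology: node 1 is memoryless. */
static const int node_has_memory[NR_NODES] = { 1, 0, 1 };

/* Per-node fallback order, nearest first (loosely like a zonelist). */
static const int fallback_order[NR_NODES][NR_NODES] = {
	{ 0, 1, 2 },
	{ 1, 2, 0 },
	{ 2, 1, 0 },
};

/* Models set_dev_node(): records locality, whether or not memory exists. */
static void model_set_dev_node(struct model_dev *dev, int nid)
{
	assert(nid >= 0 && nid < NR_NODES);
	dev->numa_node = nid;
}

/*
 * Models a node-aware allocation: returns the node that would actually
 * supply the memory, walking the fallback order from the preferred node.
 */
static int model_alloc_node(int preferred)
{
	int i;

	assert(preferred >= 0 && preferred < NR_NODES);
	for (i = 0; i < NR_NODES; i++) {
		int nid = fallback_order[preferred][i];

		if (node_has_memory[nid])
			return nid;
	}
	return -1;  /* no node has memory */
}
```

In this model a device attached to memoryless node 1 keeps numa_node == 1, so its locality information is preserved, while an allocation that prefers node 1 is served from node 2. The alternative, storing the result of a cpu_to_mem()-style lookup in the device, would lose the fact that the device is local to node 1.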