In the presence of memoryless nodes, numa_node_id()/cpu_to_node() will
return the current CPU's NUMA node, but that may not be where we expect
to allocate from memory from. Instead, we should use
numa_mem_id()/cpu_to_mem(). On one ppc64 system with a memoryless Node
0, this ends up saving nearly 500M of slab due to less fragmentation.
Signed-off-by: Nishanth Aravamudan <[email protected]>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..8573e4e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
if (tsk == kthreadd_task)
return tsk->pref_node_fork;
#endif
- return numa_node_id();
+ return numa_mem_id();
}
static void create_kthread(struct kthread_create_info *create)
@@ -369,7 +369,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
{
struct task_struct *p;
- p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+ p = kthread_create_on_node(threadfn, data, cpu_to_mem(cpu), namefmt,
cpu);
if (IS_ERR(p))
return p;
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote:
> In the presence of memoryless nodes, numa_node_id()/cpu_to_node() will
> return the current CPU's NUMA node, but that may not be where we expect
> to allocate from memory from. Instead, we should use
> numa_mem_id()/cpu_to_mem(). On one ppc64 system with a memoryless Node
> 0, this ends up saving nearly 500M of slab due to less fragmentation.
>
> Signed-off-by: Nishanth Aravamudan <[email protected]>
Acked-by: David Rientjes <[email protected]>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index b5ae3ee..8573e4e 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
> if (tsk == kthreadd_task)
> return tsk->pref_node_fork;
> #endif
> - return numa_node_id();
> + return numa_mem_id();
I'm wondering why return NUMA_NO_NODE wouldn't have the same effect and
prefer the local node?
On Tue, 28 Jan 2014, Nishanth Aravamudan wrote:
> In the presence of memoryless nodes, numa_node_id()/cpu_to_node() will
> return the current CPU's NUMA node, but that may not be where we expect
> to allocate from memory from. Instead, we should use
> numa_mem_id()/cpu_to_mem(). On one ppc64 system with a memoryless Node
> 0, this ends up saving nearly 500M of slab due to less fragmentation.
Reviewed-by: Christoph Lameter <[email protected]>
On Wed, 29 Jan 2014, David Rientjes wrote:
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index b5ae3ee..8573e4e 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
> > if (tsk == kthreadd_task)
> > return tsk->pref_node_fork;
> > #endif
> > - return numa_node_id();
> > + return numa_mem_id();
>
> I'm wondering why return NUMA_NO_NODE wouldn't have the same effect and
> prefer the local node?
>
The idea here seems to be that the allocation may occur from a cpu that is
different from where the process will run later on.
On Wed, 29 Jan 2014, Christoph Lameter wrote:
> > > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > > index b5ae3ee..8573e4e 100644
> > > --- a/kernel/kthread.c
> > > +++ b/kernel/kthread.c
> > > @@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
> > > if (tsk == kthreadd_task)
> > > return tsk->pref_node_fork;
> > > #endif
> > > - return numa_node_id();
> > > + return numa_mem_id();
> >
> > I'm wondering why return NUMA_NO_NODE wouldn't have the same effect and
> > prefer the local node?
> >
>
> The idea here seems to be that the allocation may occur from a cpu that is
> different from where the process will run later on.
>
Yeah, that makes sense for kthreadd, but I'm wondering why we have to
return numa_mem_id() rather than just NUMA_NO_NODE. Sorry for not being
specific about doing s/numa_mem_id/NUMA_NO_NODE/ here.
That should just turn kmem_cache_alloc_node() into kmem_cache_alloc() and
alloc_pages_node() into alloc_pages() for the allocators that use this
return value, task_struct and thread_info. If that's not allocating local
memory, if possible, and numa_mem_id() magically does, then there's a
problem.
Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware
kthread_create_on_node()") or was it always numa_node_id() from the
beginning?
On Wed, 2014-01-29 at 16:27 -0800, David Rientjes wrote:
> Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware
> kthread_create_on_node()") or was it always numa_node_id() from the
> beginning?
Hmm, I think I did not try this, its absolutely possible NUMA_NO_NODE
was better here.
On Wed, 29 Jan 2014, Eric Dumazet wrote:
> > Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware
> > kthread_create_on_node()") or was it always numa_node_id() from the
> > beginning?
>
> Hmm, I think I did not try this, its absolutely possible NUMA_NO_NODE
> was better here.
>
Nishanth, could you change your patch to just return NUMA_NO_NODE for the
non-kthreadd case?
On 30.01.2014 [14:47:05 -0800], David Rientjes wrote:
> On Wed, 29 Jan 2014, Eric Dumazet wrote:
>
> > > Eric, did you try this when writing 207205a2ba26 ("kthread: NUMA aware
> > > kthread_create_on_node()") or was it always numa_node_id() from the
> > > beginning?
> >
> > Hmm, I think I did not try this, its absolutely possible NUMA_NO_NODE
> > was better here.
> >
>
> Nishanth, could you change your patch to just return NUMA_NO_NODE for the
> non-kthreadd case?
Something like the following?
In the presence of memoryless nodes, numa_node_id() will return the
current CPU's NUMA node, but that may not be where we expect to allocate
from memory from. Instead, we should rely on the fallback code in the
memory allocator itself, by using NUMA_NO_NODE. Also, when calling
kthread_create_on_node(), use the nearest node with memory to the cpu in
question, rather than the node it is running on.
Signed-off-by: Nishanth Aravamudan <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: [email protected]
Cc: Wanpeng Li <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Ben Herrenschmidt <[email protected]>
---
Note that I haven't yet tested this change on the system that reproduce
the original problem yet.
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..9a130ec 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -217,7 +217,7 @@ int tsk_fork_get_node(struct task_struct *tsk)
if (tsk == kthreadd_task)
return tsk->pref_node_fork;
#endif
- return numa_node_id();
+ return NUMA_NO_NODE;
}
static void create_kthread(struct kthread_create_info *create)
@@ -369,7 +369,7 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
{
struct task_struct *p;
- p = kthread_create_on_node(threadfn, data, cpu_to_node(cpu), namefmt,
+ p = kthread_create_on_node(threadfn, data, cpu_to_mem(cpu), namefmt,
cpu);
if (IS_ERR(p))
return p;
On Thu, 30 Jan 2014, Nishanth Aravamudan wrote:
> In the presence of memoryless nodes, numa_node_id() will return the
> current CPU's NUMA node, but that may not be where we expect to allocate
> from memory from. Instead, we should rely on the fallback code in the
> memory allocator itself, by using NUMA_NO_NODE. Also, when calling
> kthread_create_on_node(), use the nearest node with memory to the cpu in
> question, rather than the node it is running on.
>
> Signed-off-by: Nishanth Aravamudan <[email protected]>
Acked-by: David Rientjes <[email protected]>
On Thu, 30 Jan 2014, Nishanth Aravamudan wrote:
> In the presence of memoryless nodes, numa_node_id() will return the
> current CPU's NUMA node, but that may not be where we expect to allocate
> from memory from. Instead, we should rely on the fallback code in the
> memory allocator itself, by using NUMA_NO_NODE. Also, when calling
> kthread_create_on_node(), use the nearest node with memory to the cpu in
> question, rather than the node it is running on.
Looks good to me.
Reviewed-by: Christoph Lameter <[email protected]>