2011-05-04 18:17:52

by Andi Kleen

Subject: [PATCH] Allocate memory cgroup structures in local nodes

From: Andi Kleen <[email protected]>

[Andrew: since this is a regression and a very simple fix
could you still consider it for .39? Thanks]

dde79e005a769 introduced a regression: the memory cgroup data structures
all end up on node 0 because the first allocation attempt
does not pass in a node hint. Since the initialization runs on CPU #0,
everything ends up on node 0. This is a problem on large memory systems,
where node 0 would lose a lot of memory.

Change the alloc_pages_exact to alloc_pages_exact_node. This will
still fall back to other nodes if not enough memory is available.

[RED-PEN: right now it would fall back first before trying
vmalloc_node. Probably not the best strategy ... But I left it like
that for now.]

Reported-by: Doug Nelson
CC: Michal Hocko <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
mm/page_cgroup.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 9905501..1f4e20f 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -134,7 +134,7 @@ static void *__init_refok alloc_page_cgroup(size_t size, int nid)
 {
 	void *addr = NULL;

-	addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
+	addr = alloc_pages_exact_node(nid, size, GFP_KERNEL | __GFP_NOWARN);
 	if (addr)
 		return addr;

--
1.7.4.4


2011-05-04 19:17:29

by David Rientjes

Subject: Re: [PATCH] Allocate memory cgroup structures in local nodes

On Wed, 4 May 2011, Andi Kleen wrote:

> From: Andi Kleen <[email protected]>
>
> [Andrew: since this is a regression and a very simple fix
> could you still consider it for .39? Thanks]
>

Before that's considered, the order of the arguments to
alloc_pages_exact_node() needs to be fixed.
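
For reference, the two helpers do not share a calling convention.
alloc_pages_exact() takes a size in bytes and returns a kernel virtual
address, while alloc_pages_exact_node() in kernels of this era is declared
roughly as (int nid, gfp_t gfp_mask, unsigned int order) and returns a
struct page *. A caller that starts from a byte count therefore has to
convert it to a page order first. The userspace sketch below only
illustrates that size-to-order arithmetic; the 4096-byte page size and the
sketch_* names are assumptions made for the example, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * Userspace stand-in for the kernel's get_order(); hypothetical helper.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Bytes actually covered by an allocation of the given order. */
static size_t sketch_order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}
```

Under these assumptions a 12 KiB request rounds up to order 2, which covers
16 KiB. Passing a raw byte count where an order or a gfp mask is expected
would be silently accepted by the compiler, since all three are integers.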

> dde79e005a769 introduced a regression: the memory cgroup data structures
> all end up on node 0 because the first allocation attempt
> does not pass in a node hint. Since the initialization runs on CPU #0,
> everything ends up on node 0. This is a problem on large memory systems,
> where node 0 would lose a lot of memory.
>
> Change the alloc_pages_exact to alloc_pages_exact_node. This will
> still fall back to other nodes if not enough memory is available.
>

The vmalloc_node() calls ensure that the nid is actually set in
N_HIGH_MEMORY and fail otherwise (we don't fall back to using vmalloc()),
so it looks like the failures for alloc_pages_exact_node() and
vmalloc_node() would be different? Why do we want to fall back for one and
not the other?

> [RED-PEN: right now it would fall back first before trying
> vmalloc_node. Probably not the best strategy ... But I left it like
> that for now.]
>
> Reported-by: Doug Nelson
> CC: Michal Hocko <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> mm/page_cgroup.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 9905501..1f4e20f 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -134,7 +134,7 @@ static void *__init_refok alloc_page_cgroup(size_t size, int nid)
> {
> void *addr = NULL;
>
> - addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
> + addr = alloc_pages_exact_node(nid, size, GFP_KERNEL | __GFP_NOWARN);
> if (addr)
> return addr;
>

2011-05-05 05:25:09

by Balbir Singh

Subject: Re: [PATCH] Allocate memory cgroup structures in local nodes

* Andi Kleen <[email protected]> [2011-05-04 11:17:38]:

> From: Andi Kleen <[email protected]>
>
> [Andrew: since this is a regression and a very simple fix
> could you still consider it for .39? Thanks]
>
> dde79e005a769 introduced a regression: the memory cgroup data structures
> all end up on node 0 because the first allocation attempt
> does not pass in a node hint. Since the initialization runs on CPU #0,
> everything ends up on node 0. This is a problem on large memory systems,
> where node 0 would lose a lot of memory.
>
> Change the alloc_pages_exact to alloc_pages_exact_node. This will
> still fall back to other nodes if not enough memory is available.
>
> [RED-PEN: right now it would fall back first before trying
> vmalloc_node. Probably not the best strategy ... But I left it like
> that for now.]
>
> Reported-by: Doug Nelson
> CC: Michal Hocko <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Johannes Weiner <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> mm/page_cgroup.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 9905501..1f4e20f 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -134,7 +134,7 @@ static void *__init_refok alloc_page_cgroup(size_t size, int nid)
> {
> void *addr = NULL;
>
> - addr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
> + addr = alloc_pages_exact_node(nid, size, GFP_KERNEL | __GFP_NOWARN);

Excellent catch! My eyes might be deceiving me, but I don't see
alloc_pages_exact_node doing what you expect it to do; I think the
size is interpreted as order.

> if (addr)
> return addr;
>
> --
> 1.7.4.4
>

--
Three Cheers,
Balbir

2011-05-04 20:04:35

by Andi Kleen

Subject: Re: [PATCH] Allocate memory cgroup structures in local nodes


> Before that's considered, the order of the arguments to
> alloc_pages_exact_node() needs to be fixed.

Good point. I'll send another one.

This is really misleading BTW. Grumble. Maybe it would actually be
better to change the prototype too.


> The vmalloc_node() calls ensure that the nid is actually set in
> N_HIGH_MEMORY and fail otherwise (we don't fall back to using vmalloc()),
> so it looks like the failures for alloc_pages_exact_node() and
> vmalloc_node() would be different? Why do we want to fall back for one and
> not the other?

The right order would be to try everything (alloc_pages + vmalloc)
to get it node local, before trying everything else. Right now that's
not how it's done.
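
The ordering described above can be sketched in userspace C as a list of
strategies tried in sequence: both node-local options first, then the
non-local ones. Everything here is hypothetical scaffolding for
illustration (the sketch_* names, the callback type, and the stub
allocators); it is not the code in mm/page_cgroup.c and does not call any
kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator callback: returns NULL on failure. */
typedef void *(*sketch_alloc_fn)(size_t size, int nid);

/* Which strategy ended up satisfying the request. */
enum sketch_source {
	SRC_NONE,
	SRC_LOCAL_PAGES,	/* page allocator, restricted to nid */
	SRC_LOCAL_VMALLOC,	/* vmalloc, restricted to nid */
	SRC_ANY_PAGES,		/* page allocator, any node */
	SRC_ANY_VMALLOC,	/* vmalloc, any node */
};

struct sketch_allocators {
	sketch_alloc_fn local_pages;
	sketch_alloc_fn local_vmalloc;
	sketch_alloc_fn any_pages;
	sketch_alloc_fn any_vmalloc;
};

/* Try every node-local strategy before any non-local one. */
static void *sketch_alloc_node_first(const struct sketch_allocators *a,
				     size_t size, int nid,
				     enum sketch_source *src)
{
	const sketch_alloc_fn steps[] = {
		a->local_pages, a->local_vmalloc, a->any_pages, a->any_vmalloc,
	};
	const enum sketch_source tags[] = {
		SRC_LOCAL_PAGES, SRC_LOCAL_VMALLOC, SRC_ANY_PAGES, SRC_ANY_VMALLOC,
	};
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		void *addr = steps[i](size, nid);

		if (addr) {
			*src = tags[i];
			return addr;
		}
	}
	*src = SRC_NONE;
	return NULL;
}

/* Stub allocators so the ordering can be exercised without a kernel. */
static char sketch_backing[64];

static void *sketch_always_fail(size_t size, int nid)
{
	(void)size;
	(void)nid;
	return NULL;
}

static void *sketch_always_ok(size_t size, int nid)
{
	(void)size;
	(void)nid;
	return sketch_backing;
}
```

If the node-local page allocation stub fails while the node-local vmalloc
stub succeeds, the request is satisfied by SRC_LOCAL_VMALLOC and the
non-local strategies are never tried.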

-Andi


2011-05-04 20:10:47

by David Rientjes

Subject: Re: [PATCH] Allocate memory cgroup structures in local nodes

On Wed, 4 May 2011, Andi Kleen wrote:

> > The vmalloc_node() calls ensure that the nid is actually set in
> > N_HIGH_MEMORY and fail otherwise (we don't fall back to using vmalloc()),
> > so it looks like the failures for alloc_pages_exact_node() and
> > vmalloc_node() would be different? Why do we want to fall back for one and
> > not the other?
>
> The right order would be to try everything (alloc_pages + vmalloc)
> to get it node local, before trying everything else. Right now that's
> not how it's done.
>

Completely agreed, I think that's how it should be patched instead of only
touching the alloc_pages() allocation; we care much more about local node
than whether we're using vmalloc.

2011-05-04 20:18:21

by Andi Kleen

Subject: Re: [PATCH] Allocate memory cgroup structures in local nodes


> Completely agreed, I think that's how it should be patched instead of only
> touching the alloc_pages() allocation; we care much more about local node
> than whether we're using vmalloc.

Right now the problem is that you always end up on node 0 and then run
out of memory on it later on a large system. That's the problem I'm
trying to solve ASAP.

The rest is much less important.


-Andi