From: Michal Hocko <[email protected]>
vmemmap_populate uses huge pages if the CPU supports them which is good
and usually what we want. vmemmap_alloc_block will use the bootmem
allocator in the early initialization so the allocation will most likely
succeed. This is not the case for the memory hotplug though. Such an
allocation can easily fail under memory pressure. Especially so when the
kernel memory is restricted with movable_node parameter.
There is no real reason to fail the vmemmap_populate when
vmemmap_populate_hugepages fails though. We can still fallback to
vmemmap_populate_basepages and use base pages. The performance will not
be optimal but this is much better than failing the memory hot add.
Signed-off-by: Michal Hocko <[email protected]>
---
arch/x86/mm/init_64.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 136422d7d539..e6e3c755b9cb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1401,15 +1401,16 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
{
struct vmem_altmap *altmap = to_vmem_altmap(start);
- int err;
+ int err = -ENOMEM;
if (boot_cpu_has(X86_FEATURE_PSE))
err = vmemmap_populate_hugepages(start, end, node, altmap);
else if (altmap) {
pr_err_once("%s: no cpu support for altmap allocations\n",
__func__);
- err = -ENOMEM;
- } else
+ return err;
+ }
+ if (err)
err = vmemmap_populate_basepages(start, end, node);
if (!err)
sync_global_pgds(start, end - 1);
--
2.11.0
Ohh, scratch that. The patch is bogus. I have completely missed that
vmemmap_populate_hugepages already falls back to
vmemmap_populate_basepages. I have to revisit the bug report I have
received to see what happened apart from the allocation warning. Maybe
we just want to silent that warning.
Sorry about the noise!
On Tue 11-07-17 15:42:04, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> vmemmap_populate uses huge pages if the CPU supports them which is good
> and usually what we want. vmemmap_alloc_block will use the bootmem
> allocator in the early initialization so the allocation will most likely
> succeed. This is not the case for the memory hotplug though. Such an
> allocation can easily fail under memory pressure. Especially so when the
> kernel memory is restricted with movable_node parameter.
>
> There is no real reason to fail the vmemmap_populate when
> vmemmap_populate_hugepages fails though. We can still fallback to
> vmemmap_populate_basepages and use base pages. The performance will not
> be optimal but this is much better than failing the memory hot add.
>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> arch/x86/mm/init_64.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 136422d7d539..e6e3c755b9cb 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1401,15 +1401,16 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
> int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
> {
> struct vmem_altmap *altmap = to_vmem_altmap(start);
> - int err;
> + int err = -ENOMEM;
>
> if (boot_cpu_has(X86_FEATURE_PSE))
> err = vmemmap_populate_hugepages(start, end, node, altmap);
> else if (altmap) {
> pr_err_once("%s: no cpu support for altmap allocations\n",
> __func__);
> - err = -ENOMEM;
> - } else
> + return err;
> + }
> + if (err)
> err = vmemmap_populate_basepages(start, end, node);
> if (!err)
> sync_global_pgds(start, end - 1);
> --
> 2.11.0
>
--
Michal Hocko
SUSE Labs
Hi Michael,
On Tue, Jul 11, 2017 at 04:25:58PM +0200, Michal Hocko wrote:
> Ohh, scratch that. The patch is bogus. I have completely missed that
> vmemmap_populate_hugepages already falls back to
> vmemmap_populate_basepages. I have to revisit the bug report I have
> received to see what happened apart from the allocation warning. Maybe
> we just want to silent that warning.
Yep, this should be fixed in 8e2cdbcb86b0 ("x86-64: fall back to
regular page vmemmap on allocation failure").
I figure it's good to keep some sort of warning there, though, as it
could have performance implications when we fall back to base pages.
On Tue, 11 Jul 2017, Johannes Weiner wrote:
> Hi Michael,
>
> On Tue, Jul 11, 2017 at 04:25:58PM +0200, Michal Hocko wrote:
> > Ohh, scratch that. The patch is bogus. I have completely missed that
> > vmemmap_populate_hugepages already falls back to
> > vmemmap_populate_basepages. I have to revisit the bug report I have
> > received to see what happened apart from the allocation warning. Maybe
> > we just want to silent that warning.
>
> Yep, this should be fixed in 8e2cdbcb86b0 ("x86-64: fall back to
> regular page vmemmap on allocation failure").
>
> I figure it's good to keep some sort of warning there, though, as it
> could have performance implications when we fall back to base pages.
If someone gets to work on it then maybe also add giant page support?
We already have systems with terabytes of memory and one 1G vmemmap page
would map 128G of memory leading to a significant reduction in the use of
TLBs.
On Tue 11-07-17 13:26:23, Johannes Weiner wrote:
> Hi Michael,
>
> On Tue, Jul 11, 2017 at 04:25:58PM +0200, Michal Hocko wrote:
> > Ohh, scratch that. The patch is bogus. I have completely missed that
> > vmemmap_populate_hugepages already falls back to
> > vmemmap_populate_basepages. I have to revisit the bug report I have
> > received to see what happened apart from the allocation warning. Maybe
> > we just want to silent that warning.
>
> Yep, this should be fixed in 8e2cdbcb86b0 ("x86-64: fall back to
> regular page vmemmap on allocation failure").
>
> I figure it's good to keep some sort of warning there, though, as it
> could have performance implications when we fall back to base pages.
Yeah, but I am not really sure the allocation warning is the right thing
here because it is just too verbose. If you consider that we will get
this warning for each memory section (128MB or 2GB)... I guess the
existing
pr_warn_once("vmemmap: falling back to regular page backing\n");
or maybe make it pr_warn should be enough. What do you think?
--
Michal Hocko
SUSE Labs
On Tue, Jul 11, 2017 at 11:25:45PM +0200, Michal Hocko wrote:
> On Tue 11-07-17 13:26:23, Johannes Weiner wrote:
> > Hi Michael,
> >
> > On Tue, Jul 11, 2017 at 04:25:58PM +0200, Michal Hocko wrote:
> > > Ohh, scratch that. The patch is bogus. I have completely missed that
> > > vmemmap_populate_hugepages already falls back to
> > > vmemmap_populate_basepages. I have to revisit the bug report I have
> > > received to see what happened apart from the allocation warning. Maybe
> > > we just want to silent that warning.
> >
> > Yep, this should be fixed in 8e2cdbcb86b0 ("x86-64: fall back to
> > regular page vmemmap on allocation failure").
> >
> > I figure it's good to keep some sort of warning there, though, as it
> > could have performance implications when we fall back to base pages.
>
> Yeah, but I am not really sure the allocation warning is the right thing
> here because it is just too verbose. If you consider that we will get
> this warning for each memory section (128MB or 2GB)... I guess the
> existing
> pr_warn_once("vmemmap: falling back to regular page backing\n");
>
> or maybe make it pr_warn should be enough. What do you think?
It could be useful to dump the memory context at least once, to 1) let
the user know we're falling back but also 2) to get the default report
we split out anytime we fail in a low-memory situation - in case there
is a problem with the MM subsystem.
Maybe something along the lines of this? (totally untested)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 95651dc58e09..d03c8f244e5b 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1302,7 +1302,6 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
vmemmap_verify((pte_t *)pmd, node, addr, next);
continue;
}
- pr_warn_once("vmemmap: falling back to regular page backing\n");
if (vmemmap_populate_basepages(addr, next, node))
return -ENOMEM;
}
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a56c3989f773..efd3f48c667c 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -52,18 +52,24 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
/* If the main allocator is up use that, fallback to bootmem. */
if (slab_is_available()) {
+ unsigned int order;
+ static int warned;
struct page *page;
+ gfp_t gfp_mask;
+ order = get_order(size);
+ gfp_mask = GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN;
if (node_state(node, N_HIGH_MEMORY))
- page = alloc_pages_node(
- node, GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
- get_order(size));
+ page = alloc_pages_node(node, gfp_mask, size);
else
- page = alloc_pages(
- GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
- get_order(size));
+ page = alloc_pages(gfp_mask, size);
if (page)
return page_address(page);
+ if (!warned) {
+ warn_alloc(gfp_mask, NULL,
+ "vmemmap alloc failure: order:%u", order);
+ warned = 1;
+ }
return NULL;
} else
return __earlyonly_bootmem_alloc(node, size, size,
On Tue 11-07-17 17:45:41, Johannes Weiner wrote:
[...]
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index a56c3989f773..efd3f48c667c 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -52,18 +52,24 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int node)
> {
> /* If the main allocator is up use that, fallback to bootmem. */
> if (slab_is_available()) {
> + unsigned int order;
> + static int warned;
> struct page *page;
> + gfp_t gfp_mask;
>
> + order = get_order(size);
> + gfp_mask = GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN;
why not do
gfp_mask = GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT;
if (warned)
gfp_mask |= __GFP_NOWARN;
and get the actual allocation warning from the allocation context. Then
we can keep the warning vmemmap_populate_hugepages because that would be
more descriptive that what is going on.
Btw. __GFP_REPEAT has been replaced by __GFP_RETRY_MAYFAIL in mmotm
tree.
> if (node_state(node, N_HIGH_MEMORY))
> - page = alloc_pages_node(
> - node, GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
> - get_order(size));
> + page = alloc_pages_node(node, gfp_mask, size);
> else
> - page = alloc_pages(
> - GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT,
> - get_order(size));
> + page = alloc_pages(gfp_mask, size);
> if (page)
> return page_address(page);
> + if (!warned) {
> + warn_alloc(gfp_mask, NULL,
> + "vmemmap alloc failure: order:%u", order);
> + warned = 1;
> + }
> return NULL;
> } else
> return __earlyonly_bootmem_alloc(node, size, size,
--
Michal Hocko
SUSE Labs