2011-04-19 16:21:20

by Dave Hansen

[permalink] [raw]
Subject: [PATCH 1/2] break out page allocation warning code


Changed since last version:
- use pr_warning instead of open-coded KERN_WARNING
- eliminate 'wait' variable since we only use it once

--

This originally started as a simple patch to give vmalloc()
some more verbose output on failure on top of the plain
page allocator messages. Johannes suggested that it might
be nicer to lead with the vmalloc() info _before_ the page
allocator messages.

But, I do think there's a lot of value in what
__alloc_pages_slowpath() does with its filtering and so
forth.

This patch creates a new function which other allocators
can call instead of relying on the internal page allocator
warnings. It also gives this function private rate-limiting
which separates it from other printk_ratelimit() users.

Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/include/linux/mm.h | 2 +
linux-2.6.git-dave/mm/page_alloc.c | 62 ++++++++++++++++++++++------------
2 files changed, 43 insertions(+), 21 deletions(-)

diff -puN include/linux/mm.h~break-out-alloc-failure-messages include/linux/mm.h
--- linux-2.6.git/include/linux/mm.h~break-out-alloc-failure-messages 2011-04-18 14:59:51.278529173 -0700
+++ linux-2.6.git-dave/include/linux/mm.h 2011-04-18 14:59:51.290529171 -0700
@@ -1365,6 +1365,8 @@ extern void si_meminfo(struct sysinfo *
extern void si_meminfo_node(struct sysinfo *val, int nid);
extern int after_bootmem;

+extern void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...);
+
extern void setup_per_cpu_pageset(void);

extern void zone_pcp_update(struct zone *zone);
diff -puN mm/page_alloc.c~break-out-alloc-failure-messages mm/page_alloc.c
--- linux-2.6.git/mm/page_alloc.c~break-out-alloc-failure-messages 2011-04-18 14:59:51.282529173 -0700
+++ linux-2.6.git-dave/mm/page_alloc.c 2011-04-18 14:59:51.294529170 -0700
@@ -54,6 +54,7 @@
#include <trace/events/kmem.h>
#include <linux/ftrace_event.h>
#include <linux/memcontrol.h>
+#include <linux/ratelimit.h>

#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1734,6 +1735,45 @@ static inline bool should_suppress_show_
return ret;
}

+static DEFINE_RATELIMIT_STATE(nopage_rs,
+ DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+
+void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...)
+{
+ va_list args;
+ unsigned int filter = SHOW_MEM_FILTER_NODES;
+
+ if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+ return;
+
+ /*
+ * This documents exceptions given to allocations in certain
+ * contexts that are allowed to allocate outside current's set
+ * of allowed nodes.
+ */
+ if (!(gfp_mask & __GFP_NOMEMALLOC))
+ if (test_thread_flag(TIF_MEMDIE) ||
+ (current->flags & (PF_MEMALLOC | PF_EXITING)))
+ filter &= ~SHOW_MEM_FILTER_NODES;
+ if (in_interrupt() || !(gfp_mask & __GFP_WAIT))
+ filter &= ~SHOW_MEM_FILTER_NODES;
+
+ if (fmt) {
+ printk(KERN_WARNING);
+ va_start(args, fmt);
+ vprintk(fmt, args);
+ va_end(args);
+ }
+
+ pr_warning("%s: page allocation failure: order:%d, mode:0x%x\n",
+ current->comm, order, gfp_mask);
+
+ dump_stack();
+ if (!should_suppress_show_mem())
+ show_mem(filter);
+}
+
static inline int
should_alloc_retry(gfp_t gfp_mask, unsigned int order,
unsigned long pages_reclaimed)
@@ -2176,27 +2216,7 @@ rebalance:
}

nopage:
- if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
- unsigned int filter = SHOW_MEM_FILTER_NODES;
-
- /*
- * This documents exceptions given to allocations in certain
- * contexts that are allowed to allocate outside current's set
- * of allowed nodes.
- */
- if (!(gfp_mask & __GFP_NOMEMALLOC))
- if (test_thread_flag(TIF_MEMDIE) ||
- (current->flags & (PF_MEMALLOC | PF_EXITING)))
- filter &= ~SHOW_MEM_FILTER_NODES;
- if (in_interrupt() || !wait)
- filter &= ~SHOW_MEM_FILTER_NODES;
-
- pr_warning("%s: page allocation failure. order:%d, mode:0x%x\n",
- current->comm, order, gfp_mask);
- dump_stack();
- if (!should_suppress_show_mem())
- show_mem(filter);
- }
+ warn_alloc_failed(gfp_mask, order, NULL);
return page;
got_pg:
if (kmemcheck_enabled)
_


2011-04-19 16:21:26

by Dave Hansen

[permalink] [raw]
Subject: [PATCH 2/2] print vmalloc() state after allocation failures


New in this version:
- updated description to clarify why I added local variables

--

I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it. That's not
very useful.

During recovery, vmalloc() also nicely frees all of the memory that it
got up to the point of the failure. That is wonderful, but it also
quickly hides any issues. We have a much different sitation if vmalloc()
repeatedly fails 10GB in to:

vmalloc(100 * 1<<30);

versus repeatedly failing 4096 bytes in to a:

vmalloc(8192);

This patch will print out messages that look like this:

[ 68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[ 68.124218] bash: page allocation failure: order:0, mode:0xd2
[ 68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[ 68.125579] Call Trace:
[ 68.125853] [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
[ 68.126464] [<ffffffff8107e05c>] ? printk+0x6c/0x70
[ 68.126791] [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
[ 68.127661] [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
...

The 'order' variable is added for clarity when calling
warn_alloc_failed() to avoid having an unexplained '0' as an argument.

The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would
take us over 80 columns for the alloc_pages_node() call. If we are
going to add a line, it might as well be one that makes the sucker
easier to read.

As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace. Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.

Signed-off-by: Dave Hansen <[email protected]>
---

linux-2.6.git-dave/mm/vmalloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff -puN mm/vmalloc.c~vmalloc-warn mm/vmalloc.c
--- linux-2.6.git/mm/vmalloc.c~vmalloc-warn 2011-04-18 15:03:35.658506887 -0700
+++ linux-2.6.git-dave/mm/vmalloc.c 2011-04-18 15:04:48.762499842 -0700
@@ -1534,6 +1534,7 @@ static void *__vmalloc_node(unsigned lon
static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
pgprot_t prot, int node, void *caller)
{
+ const int order = 0;
struct page **pages;
unsigned int nr_pages, array_size, i;
gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
@@ -1560,11 +1561,12 @@ static void *__vmalloc_area_node(struct

for (i = 0; i < area->nr_pages; i++) {
struct page *page;
+ gfp_t tmp_mask = gfp_mask | __GFP_NOWARN;

if (node < 0)
- page = alloc_page(gfp_mask);
+ page = alloc_page(tmp_mask);
else
- page = alloc_pages_node(node, gfp_mask, 0);
+ page = alloc_pages_node(node, tmp_mask, order);

if (unlikely(!page)) {
/* Successfully allocated i pages, free them in __vunmap() */
@@ -1579,6 +1581,9 @@ static void *__vmalloc_area_node(struct
return area->addr;

fail:
+ warn_alloc_failed(gfp_mask, order, "vmalloc: allocation failure, "
+ "allocated %ld of %ld bytes\n",
+ (area->nr_pages*PAGE_SIZE), area->size);
vfree(area->addr);
return NULL;
}
_