2012-11-20 20:25:45

by Dave Hansen

Subject: [3.7-rc6] capture_free_page() frees pages without accounting for them??

Hi Mel,

I'm chasing an apparent memory leak introduced post-3.6. The
interesting thing is that the pages appear to be in the allocator, but
are not being accounted for:

http://www.spinics.net/lists/linux-mm/msg46187.html
https://bugzilla.kernel.org/show_bug.cgi?id=50181

I started auditing anything that might be messing with NR_FREE_PAGES,
and came across commit 1fb3f8ca. It does something curious with
capture_free_page() (previously known as split_free_page()).

int capture_free_page(struct page *page, int alloc_order,
...
	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));

-	/* Split into individual pages */
-	set_page_refcounted(page);
-	split_page(page, order);
+	if (alloc_order != order)
+		expand(zone, page, alloc_order, order,
+			&zone->free_area[order], migratetype);

Note that expand() puts the pages _back_ in the allocator, but it does
not bump NR_FREE_PAGES. We "return" only 'alloc_order' worth of pages to
the caller, but we accounted for removing 'order' worth.
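
To make the arithmetic concrete (my own numbers, not from the commit),
suppose alloc_order = 2 and order = 4:

	long accounted = 1L << 4;          /* NR_FREE_PAGES is dropped by 16 */
	long kept      = 1L << 2;          /* the caller actually takes 4    */
	long returned  = accounted - kept; /* expand() re-frees 12 pages     */
	                                   /* with no accounting at all      */

Every capture that splits therefore makes NR_FREE_PAGES under-report
free memory by (1 << order) - (1 << alloc_order) pages.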

I _think_ the correct fix is to just:

-	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));

I'm trying to confirm the theory by making this happen a bit more often,
but I'd appreciate a second pair of eyes on the code in case I'm reading
it wrong.


2012-11-21 00:49:13

by Dave Hansen

Subject: Re: [3.7-rc6] capture_free_page() frees pages without accounting for them??

I'm really evil, so I changed the loop in compact_capture_page() to
basically steal the highest-order page it can. This shouldn't _break_
anything, but it does ensure that we'll be splitting pages that we find
more often and recreating this *MUCH* faster:

-	for (order = cc->order; order < MAX_ORDER; order++) {
+	for (order = MAX_ORDER - 1; order >= cc->order; order--) {
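
To see why this stresses the split path, here's a toy model of the two
search directions (my own illustration, not kernel code): searching
upward usually finds a page of exactly cc->order and never needs
expand(), while searching downward nearly always lands on order >
cc->order and splits.

	/* toy.c: standalone sketch of the two free-list search directions */
	#include <stdio.h>

	#define MAX_ORDER 11

	int main(void)
	{
		/* hypothetical free-list occupancy, indexed by order */
		int free_count[MAX_ORDER] = { 50, 20, 10, 5, 2, 1, 1, 0, 1, 0, 1 };
		int alloc_order = 2, order;

		/* original loop: first fit upward, usually no split */
		for (order = alloc_order; order < MAX_ORDER; order++)
			if (free_count[order])
				break;
		printf("ascending picks order %d, split %sneeded\n",
		       order, order == alloc_order ? "not " : "");

		/* debug loop: highest order first, splits nearly every time */
		for (order = MAX_ORDER - 1; order >= alloc_order; order--)
			if (free_count[order])
				break;
		printf("descending picks order %d, split %sneeded\n",
		       order, order == alloc_order ? "not " : "");
		return 0;
	}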

I also augmented the area in capture_free_page() that I expect to be
leaking:

if (alloc_order != order) {
	static int leaked_pages = 0;
	leaked_pages += 1 << order;
	leaked_pages -= 1 << alloc_order;
	printk("%s() alloc_order(%d) != order(%d) leaked %d\n",
			__func__, alloc_order, order,
			leaked_pages);
	expand(zone, page, alloc_order, order,
			&zone->free_area[order], migratetype);
}

I add up all the fields in buddyinfo to figure out how much _should_ be
in the allocator and then compare it to MemFree to get a guess at how
much is leaked. That number correlates _really_ well with the
"leaked_pages" variable above. That pretty much seals it for me.

I'll run a stress test overnight to see if it pops up again. The patch
I'm running is attached. I'll send a properly changelogged one tomorrow
if it works.


Attachments:
leak-fix-20121120-1.patch (666.00 B)

2012-11-21 14:33:10

by Mel Gorman

Subject: Re: [3.7-rc6] capture_free_page() frees pages without accounting for them??

On Tue, Nov 20, 2012 at 12:25:37PM -0800, Dave Hansen wrote:
> Hi Mel,
>
> I'm chasing an apparent memory leak introduced post-3.6.

An accounting leak could also contribute to the kswapd bugs we've been
seeing recently.

Andrew, this is quite important and might be worth wedging in before 3.7
comes out because it'll cause serious problems if Dave is right.

> The
> interesting thing is that the pages appear to be in the allocator, but
> are not being accounted for:
>
> http://www.spinics.net/lists/linux-mm/msg46187.html
> https://bugzilla.kernel.org/show_bug.cgi?id=50181
>

Discrepancies between the buddy allocator and the reported free figures
almost always point to either per-cpu drift or NR_FREE_PAGES accounting
bugs. Usually the drift is not too bad, and it always stays within a
margin related to the number of CPUs. NR_FREE_PAGES accounting bugs, on
the other hand, get progressively worse until the machine starts OOM
killing or locks up.
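
For a sense of that margin (my own back-of-envelope, not from this
mail): each CPU batches counter updates in a per-cpu delta that is only
folded into the zone counter once it crosses a small threshold, so the
worst-case drift is bounded:

	/* Illustrative bound, assuming the vmstat design of this era,
	 * where the per-cpu stat threshold is capped at 125. */
	static long max_drift_pages(int nr_cpus, int stat_threshold)
	{
		/* e.g. 16 CPUs * 125 = 2000 pages, ~8MB with 4K pages */
		return (long)nr_cpus * stat_threshold;
	}

A real accounting bug has no such ceiling.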

> I started auditing anything that might be messing with NR_FREE_PAGES,
> and came across commit 1fb3f8ca.

It could certainly affect NR_FREE_PAGES due to its manipulation of buddy
pages. It will not result in anything good, and the system would
potentially need to be running a long time before the problem is spotted.

> It does something curious with
> capture_free_page() (previously known as split_free_page()).
>
> int capture_free_page(struct page *page, int alloc_order,
> ...
> 	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
>
> -	/* Split into individual pages */
> -	set_page_refcounted(page);
> -	split_page(page, order);
> +	if (alloc_order != order)
> +		expand(zone, page, alloc_order, order,
> +			&zone->free_area[order], migratetype);
>
> Note that expand() puts the pages _back_ in the allocator, but it does
> not bump NR_FREE_PAGES. We "return" only 'alloc_order' worth of pages to
> the caller, but we accounted for removing 'order' worth.
>
> I _think_ the correct fix is to just:
>
> -	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> +	__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));
>

This looks correct to me but it will collide with other patches. You'll
need something like the below. If it works for you, stick a changelog on
it, feel free to put my Acked-by on it, and get it to Andrew ASAP,
because I really think this needs to be in before 3.7 comes out or we'll
be swamped with a maze of kswapd-goes-mental bugs, all similar but with
different root causes.

Thanks a million Dave!


diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd6a073..ad99f0f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1406,7 +1406,7 @@ int capture_free_page(struct page *page, int alloc_order, int migratetype)
 
 	mt = get_pageblock_migratetype(page);
 	if (unlikely(mt != MIGRATE_ISOLATE))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_freepage_state(zone, -(1UL << alloc_order), mt);
 
 	if (alloc_order != order)
 		expand(zone, page, alloc_order, order,