2010-02-09 16:57:08

by Chris Friesen

Subject: tracking memory usage/leak in "inactive" field in /proc/meminfo?

Hi,

I'm hoping you can help me out. I'm on a 2.6.27 x86 system and I'm
seeing the "inactive" field in /proc/meminfo slowly growing over time to
the point where eventually the oom-killer kicks in and starts killing
things. The growth is not evident in any other field in /proc/meminfo.

I'm trying to figure out where the memory is going, and what it's being
used for.

As I've found, the fields in /proc/meminfo don't add up...in particular,
active+inactive is quite a bit larger than
buffers+cached+dirty+anonpages+mapped+pagetables+vmallocused. Initially
the difference is about 156MB, but after about 13 hrs the difference is
240MB.

How can I track down where this is going? Can you suggest any
instrumentation that I can add?

I'm reasonably capable, but I'm getting seriously confused trying to
sort out the memory subsystem. Some pointers would be appreciated.

Thanks,

Chris


2010-02-10 00:32:13

by KOSAKI Motohiro

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

> Hi,
>
> I'm hoping you can help me out. I'm on a 2.6.27 x86 system and I'm
> seeing the "inactive" field in /proc/meminfo slowly growing over time to
> the point where eventually the oom-killer kicks in and starts killing
> things. The growth is not evident in any other field in /proc/meminfo.
>
> I'm trying to figure out where the memory is going, and what it's being
> used for.
>
> As I've found, the fields in /proc/meminfo don't add up...in particular,
> active+inactive is quite a bit larger than
> buffers+cached+dirty+anonpages+mapped+pagetables+vmallocused. Initially
> the difference is about 156MB, but after about 13 hrs the difference is
> 240MB.
>
> How can I track down where this is going? Can you suggest any
> instrumentation that I can add?
>
> I'm reasonably capable, but I'm getting seriously confused trying to
> sort out the memory subsystem. Some pointers would be appreciated.

can you please post your /proc/meminfo?


2010-02-10 03:53:18

by Balbir Singh

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

* KOSAKI Motohiro <[email protected]> [2010-02-10 09:32:07]:

> > Hi,
> >
> > I'm hoping you can help me out. I'm on a 2.6.27 x86 system and I'm
> > seeing the "inactive" field in /proc/meminfo slowly growing over time to
> > the point where eventually the oom-killer kicks in and starts killing
> > things. The growth is not evident in any other field in /proc/meminfo.
> >
> > I'm trying to figure out where the memory is going, and what it's being
> > used for.
> >
> > As I've found, the fields in /proc/meminfo don't add up...in particular,
> > active+inactive is quite a bit larger than
> > buffers+cached+dirty+anonpages+mapped+pagetables+vmallocused. Initially
> > the difference is about 156MB, but after about 13 hrs the difference is
> > 240MB.
> >
> > How can I track down where this is going? Can you suggest any
> > instrumentation that I can add?
> >
> > I'm reasonably capable, but I'm getting seriously confused trying to
> > sort out the memory subsystem. Some pointers would be appreciated.
>
> can you please post your /proc/meminfo?
>

Do you have swap enabled? Can you share the dmesg log from the OOM kill?
Does the situation get better after the OOM kill? The /proc/meminfo output
Kosaki asked for would be important as well.

--
Three Cheers,
Balbir

2010-02-10 04:09:27

by KOSAKI Motohiro

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

> * KOSAKI Motohiro <[email protected]> [2010-02-10 09:32:07]:
>
> > > Hi,
> > >
> > > I'm hoping you can help me out. I'm on a 2.6.27 x86 system and I'm
> > > seeing the "inactive" field in /proc/meminfo slowly growing over time to
> > > the point where eventually the oom-killer kicks in and starts killing
> > > things. The growth is not evident in any other field in /proc/meminfo.
> > >
> > > I'm trying to figure out where the memory is going, and what it's being
> > > used for.
> > >
> > > As I've found, the fields in /proc/meminfo don't add up...in particular,
> > > active+inactive is quite a bit larger than
> > > buffers+cached+dirty+anonpages+mapped+pagetables+vmallocused. Initially
> > > the difference is about 156MB, but after about 13 hrs the difference is
> > > 240MB.
> > >
> > > How can I track down where this is going? Can you suggest any
> > > instrumentation that I can add?
> > >
> > > I'm reasonably capable, but I'm getting seriously confused trying to
> > > sort out the memory subsystem. Some pointers would be appreciated.
> >
> > can you please post your /proc/meminfo?
>
> Do you have swap enabled? Can you help with the OOM killed dmesg log?
> Does the situation get better after OOM killing. /proc/meminfo as
> Kosaki suggested would be important as well.

Indeed.

Chris, 2.6.27 is a bit old. Please test on the latest kernel, and please
don't use any proprietary drivers.


2010-02-10 17:11:28

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/09/2010 06:32 PM, KOSAKI Motohiro wrote:

> can you please post your /proc/meminfo?


On 02/09/2010 09:50 PM, Balbir Singh wrote:
> Do you have swap enabled? Can you help with the OOM killed dmesg log?
> Does the situation get better after OOM killing.


On 02/09/2010 10:09 PM, KOSAKI Motohiro wrote:

> Chris, 2.6.27 is a bit old. plese test it on latest kernel. and please
> don't use
> any proprietary drivers.


Thanks for the replies.

Swap is enabled in the kernel, but there is no swap configured. ipcs
shows little consumption there.

The test load relies on a number of kernel modifications, making it
difficult to use newer kernels. (This is an embedded system.) There are
no closed-source drivers loaded, though there are some that are not in
vanilla kernels. I haven't yet tried to reproduce the problem with a
minimal load--I've been more focused on trying to understand what's
going on in the code first. It's on my list to try though.

Here are some /proc/meminfo outputs from a test run where we
artificially chewed most of the free memory to try and force the oom
killer to fire sooner (otherwise it takes days for the problem to trigger).

It's spaced with tabs so I'm not sure if it'll stay aligned. The first
row is the sample number. All the HugePages entries were 0. The
DirectMap entries were constant. SwapTotal/SwapFree/SwapCached were 0,
as were Writeback/NFS_Unstable/Bounce/WritebackTmp.

Samples were taken 10 minutes apart. Between samples 49 and 50 the
oom-killer fired.

                13              49              50
MemTotal        4042848         4042848         4042848
MemFree         113512          52668           69536
Buffers         20              24              76
Cached          1285588         1287456         1295128
Active          2883224         3369440         2850172
Inactive        913756          487944          990152
Dirty           36              216             252
AnonPages       2274756         2305448         2279216
Mapped          10804           12772           15760
Slab            62324           62568           63608
SReclaimable    24092           23912           24848
SUnreclaim      38232           38656           38760
PageTables      11960           12144           11848
CommitLimit     2021424         2021424         2021424
Committed_AS    12666508        12745200        7700484
VmallocUsed     23256           23256           23256

It's hard to get a good picture from just a few samples, so I've
attached an ooffice spreadsheet showing three separate runs. The
samples above are from sheet 3 in the document.

In those spreadsheets I notice that
memfree+active+inactive+slab+pagetables is basically a constant.
However, if I don't use active+inactive then I can't make the numbers
add up. And the difference between active+inactive and
buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows
almost monotonically.
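Those two observations can be reproduced directly from the sample columns above (a quick Python sanity check of my arithmetic; the field lists are just the ones I chose to sum, not an official kernel identity):

```python
# Sanity check of the sums above, using the sample-13 and sample-49
# columns (all values in kB, straight from /proc/meminfo).
samples = {
    13: dict(MemFree=113512, Buffers=20, Cached=1285588, Active=2883224,
             Inactive=913756, Dirty=36, AnonPages=2274756, Mapped=10804,
             Slab=62324, PageTables=11960, VmallocUsed=23256),
    49: dict(MemFree=52668, Buffers=24, Cached=1287456, Active=3369440,
             Inactive=487944, Dirty=216, AnonPages=2305448, Mapped=12772,
             Slab=62568, PageTables=12144, VmallocUsed=23256),
}

def roughly_constant(s):
    # free + LRU + slab + page tables: near-constant across samples
    return sum(s[k] for k in ("MemFree", "Active", "Inactive",
                              "Slab", "PageTables"))

def unexplained_gap(s):
    # LRU total minus every meminfo field that should cover LRU pages
    lru = s["Active"] + s["Inactive"]
    explained = sum(s[k] for k in ("Buffers", "Cached", "AnonPages", "Dirty",
                                   "Mapped", "PageTables", "VmallocUsed"))
    return lru - explained

for n, s in samples.items():
    print(n, roughly_constant(s), unexplained_gap(s))
# -> 13 3984776 190560
# -> 49 3984764 216068
```

The first column barely moves between samples, while the gap grows by ~25 MB over the run.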

Thanks,

Chris


Attachments:
meminfo.ods (74.73 kB)

2010-02-11 00:45:45

by Minchan Kim

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

Hi, Chris.

On Thu, Feb 11, 2010 at 2:05 AM, Chris Friesen <[email protected]> wrote:
> On 02/09/2010 06:32 PM, KOSAKI Motohiro wrote:
>
>> can you please post your /proc/meminfo?
>
>
> On 02/09/2010 09:50 PM, Balbir Singh wrote:
>> Do you have swap enabled? Can you help with the OOM killed dmesg log?
>> Does the situation get better after OOM killing.
>
>
> On 02/09/2010 10:09 PM, KOSAKI Motohiro wrote:
>
>> Chris, 2.6.27 is a bit old. plese test it on latest kernel. and please
>> don't use
>> any proprietary drivers.
>
>
> Thanks for the replies.
>
> Swap is enabled in the kernel, but there is no swap configured.  ipcs
> shows little consumption there.
>
> The test load relies on a number of kernel modifications, making it
> difficult to use newer kernels. (This is an embedded system.)  There are
> no closed-source drivers loaded, though there are some that are not in
> vanilla kernels.  I haven't yet tried to reproduce the problem with a
> minimal load--I've been more focused on trying to understand what's
> going on in the code first.  It's on my list to try though.
>
> Here are some /proc/meminfo outputs from a test run where we
> artificially chewed most of the free memory to try and force the oom
> killer to fire sooner (otherwise it takes days for the problem to trigger).
>
> It's spaced with tabs so I'm not sure if it'll stay aligned.  The first
> row is the sample number.  All the HugePages entries were 0.  The
> DirectMap entries were constant. SwapTotal/SwapFree/SwapCached were 0,
> as were Writeback/NFS_Unstable/Bounce/WritebackTmp.
>
> Samples were taken 10 minutes apart.  Between samples 49 and 50 the
> oom-killer fired.
>
>                13              49              50
> MemTotal        4042848         4042848         4042848
> MemFree         113512          52668           69536
> Buffers         20              24              76
> Cached          1285588         1287456         1295128
> Active          2883224         3369440         2850172
> Inactive        913756          487944          990152
> Dirty           36              216             252
> AnonPages       2274756         2305448         2279216
> Mapped          10804           12772           15760
> Slab            62324           62568           63608
> SReclaimable    24092           23912           24848
> SUnreclaim      38232           38656           38760
> PageTables      11960           12144           11848
> CommitLimit     2021424         2021424         2021424
> Committed_AS    12666508        12745200        7700484
> VmallocUsed     23256           23256           23256
>
> It's hard to get a good picture from just a few samples, so I've
> attached an ooffice spreadsheet showing three separate runs.  The
> samples above are from sheet 3 in the document.
>
> In those spreadsheets I notice that
> memfree+active+inactive+slab+pagetables is basically a constant.
> However, if I don't use active+inactive then I can't make the numbers
> add up.  And the difference between active+inactive and
> buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows
> almost monotonically.

Such a comparison isn't right: a program's code pages are accounted
in both Cached and Mapped, but appear only once on the LRU lists
(Active + Inactive).
The same applies to any file you mmap.
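As a toy illustration (plain Python, not kernel code): a single mapped page-cache page bumps both the Cached and Mapped counters while occupying exactly one LRU slot, so per-field sums need not match the LRU total.

```python
# Toy model of meminfo double counting (not kernel code): each dict is
# one physical page; "file" pages count toward Cached, "mapped" pages
# toward Mapped, non-file pages toward AnonPages, and every page sits
# on exactly one LRU list.
pages = [
    {"file": True,  "mapped": True},   # mapped executable text page
    {"file": True,  "mapped": False},  # plain page-cache page
    {"file": False, "mapped": True},   # anonymous page
]

cached = sum(p["file"] for p in pages)        # 2
mapped = sum(p["mapped"] for p in pages)      # 2
anon   = sum(not p["file"] for p in pages)    # 1
lru    = len(pages)                           # 3

# The per-field sum (5) exceeds the LRU total (3) because the first
# page is counted in both Cached and Mapped.
print(cached + mapped + anon, lru)
```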

I can't find any clue in your attachment.
You said you are using a kernel with some modifications and non-vanilla
drivers, so I suspect those. Maybe a kernel memory leak?

Currently the kernel doesn't account kernel memory allocations other than SLAB.
I think this patch can help you find the kernel memory leak.
(For some reason it wasn't merged into mainline, but it may be useful to you :)

http://marc.info/?l=linux-mm&m=123782029809850&w=2


>
> Thanks,
>
> Chris
>



--
Kind regards,
Minchan Kim

2010-02-11 18:55:22

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/10/2010 06:45 PM, Minchan Kim wrote:
> On Thu, Feb 11, 2010 at 2:05 AM, Chris Friesen <[email protected]> wrote:

>> In those spreadsheets I notice that
>> memfree+active+inactive+slab+pagetables is basically a constant.
>> However, if I don't use active+inactive then I can't make the numbers
>> add up. And the difference between active+inactive and
>> buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows
>> almost monotonically.
>
> Such comparison is not right. That's because code pages of program account
> with cached and mapped but they account just one in lru list(active +
> inactive).
> Also, if you use mmap on any file, above is applied.

That just makes the comparison even worse...it means that there is more
memory in active/inactive that isn't accounted for in any other category
in /proc/meminfo.


> I can't find any clue with your attachment.
> You said you used kernel with some modification and non-vanilla drivers.
> So I suspect that. Maybe kernel memory leak?

Possibly. Or it could be a use-case issue; I know there have been
memory leaks fixed since 2.6.27. :)

> Now kernel don't account kernel memory allocations except SLAB.

I don't think that's entirely accurate. I think cached, buffers,
pagetables, vmallocUsed are all kernel allocations. Granted, they're
generally on behalf of userspace.

I've discovered that the generic page allocator (alloc_page, etc.) is
not tracked at all in /proc/meminfo. I seem to see the memory increase
in the page cache (that is, active/inactive), so that would seem to rule
out most direct allocations.

> I think this patch can help you find the kernel memory leak.
> (It isn't merged with mainline by somewhy but it is useful to you :)
>
> http://marc.info/?l=linux-mm&m=123782029809850&w=2

I have a modified version of that which I picked up as part of the
kmemleak backport. However, it doesn't help unless I can narrow down
*which* pages I should care about.

I tried using kmemleak directly, but it didn't find anything. I've also
tried checking for inactive pages which haven't been written to in 10
minutes, and haven't had much luck there either. But active/inactive
keeps growing, and I don't know why.

Chris

2010-02-11 19:05:21

by Rik van Riel

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/11/2010 01:54 PM, Chris Friesen wrote:
> On 02/10/2010 06:45 PM, Minchan Kim wrote:
>> On Thu, Feb 11, 2010 at 2:05 AM, Chris Friesen<[email protected]> wrote:
>
>>> In those spreadsheets I notice that
>>> memfree+active+inactive+slab+pagetables is basically a constant.
>>> However, if I don't use active+inactive then I can't make the numbers
>>> add up. And the difference between active+inactive and
>>> buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows
>>> almost monotonically.
>>
>> Such comparison is not right. That's because code pages of program account
>> with cached and mapped but they account just one in lru list(active +
>> inactive).
>> Also, if you use mmap on any file, above is applied.
>
> That just makes the comparison even worse...it means that there is more
> memory in active/inactive that isn't accounted for in any other category
> in /proc/meminfo.

Which does not happen in the standard 2.6.27 kernel.

Are you leaking memory in your driver?

>
>> I can't find any clue with your attachment.
>> You said you used kernel with some modification and non-vanilla drivers.
>> So I suspect that. Maybe kernel memory leak?
>
> Possibly. Or it could be a use case issue, I know there have been
> memory leaks fixed since 2.6.27. :)
>
>> Now kernel don't account kernel memory allocations except SLAB.
>
> I don't think that's entirely accurate. I think cached, buffers,
> pagetables, vmallocUsed are all kernel allocations. Granted, they're
> generally on behalf of userspace.
>
> I've discovered that the generic page allocator (alloc_page, etc.) is
> not tracked at all in /proc/meminfo. I seem to see the memory increase
> in the page cache (that is, active/inactive), so that would seem to rule
> out most direct allocations.
>
>> I think this patch can help you find the kernel memory leak.
>> (It isn't merged with mainline by somewhy but it is useful to you :)
>>
>> http://marc.info/?l=linux-mm&m=123782029809850&w=2
>
> I have a modified version of that which I picked up as part of the
> kmemleak backport. However, it doesn't help unless I can narrow down
> *which* pages I should care about.
>
> I tried using kmemleak directly, but it didn't find anything. I've also
> tried checking for inactive pages which haven't been written to in 10
> minutes, and haven't had much luck there either. But active/inactive
> keeps growing, and I don't know why.
>
> Chris


--
All rights reversed.

2010-02-12 02:38:15

by Minchan Kim

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On Fri, Feb 12, 2010 at 3:54 AM, Chris Friesen <[email protected]> wrote:
> That just makes the comparison even worse...it means that there is more
> memory in active/inactive that isn't accounted for in any other category
> in /proc/meminfo.

Hmm, that's very strange. It shouldn't be possible if your kernel and
drivers are normal. Could you grep your sources for whatever increments
NR_ACTIVE/NR_INACTIVE? I suspect one of your drivers does the increment
but misses the decrement.

>> Now kernel don't account kernel memory allocations except SLAB.
>
> I don't think that's entirely accurate.  I think cached, buffers,
> pagetables, vmallocUsed are all kernel allocations.  Granted, they're
> generally on behalf of userspace.

Yes, I was just putting it simply. What I meant is that the kernel doesn't
account for all memory usage. :)

> I have a modified version of that which I picked up as part of the
> kmemleak backport.  However, it doesn't help unless I can narrow down
> *which* pages I should care about.

kmemleak doesn't support the page allocator and ioremap.
The patch at the URL above can only tell you who requested each page that
is currently in use (i.e., not free).


> I tried using kmemleak directly, but it didn't find anything.  I've also
> tried checking for inactive pages which haven't been written to in 10
> minutes, and haven't had much luck there either.  But active/inactive
> keeps growing, and I don't know why.

If the leak is caused by alloc_page or __get_free_pages, kmemleak can't
find it.

>
> Chris
>



--
Kind regards,
Minchan Kim

2010-02-12 07:42:17

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/11/2010 08:38 PM, Minchan Kim wrote:
> On Fri, Feb 12, 2010 at 3:54 AM, Chris Friesen <[email protected]> wrote:
>> That just makes the comparison even worse...it means that there is more
>> memory in active/inactive that isn't accounted for in any other category
>> in /proc/meminfo.
>
> Hmm. It's very strange. It's impossible if your kernel and drivers is normal.
> Could you grep sources who increases NR_ACTIVE/INACTIVE?
> I doubt one of your driver does increase and miss decrease.

I instrumented the page cache to track all additions/subtractions of
pages to/from the LRU. I also added some page flags to track pages
counting towards NR_FILE_PAGES and NR_ANON_PAGES. I then periodically
scanned all of the pages on the LRU and if they weren't part of
NR_FILE_PAGES or NR_ANON_PAGES I dumped the call chain of the code that
added the page to the LRU.
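In outline, the periodic scan is just a set difference (a userspace sketch with made-up page addresses and backtrace labels; the real instrumentation walks struct page flags):

```python
# Userspace sketch of the LRU scan described above; the page addresses
# and backtrace labels are made up for illustration.
lru_pages  = {0x1000, 0x2000, 0x3000, 0x4000}
file_pages = {0x1000}            # counted in NR_FILE_PAGES
anon_pages = {0x2000}            # counted in NR_ANON_PAGES
added_by   = {0x3000: "do_wp_page",       # recorded when the page was
              0x4000: "get_arg_page"}     # added to the LRU

# Any LRU page accounted in neither set is a suspect: dump who added it.
suspects = lru_pages - file_pages - anon_pages
for page in sorted(suspects):
    print(hex(page), added_by.get(page, "unknown"))
```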

After being up about 2.5 hrs, there were 4265 pages in the LRU that
weren't part of file or anon. These broke down into two separate call
chains (there were actually three separate offsets within
compat_do_execve, but the rest was identical):


backtrace:
[<ffffffff8061c162>] kmemleak_alloc_page+0x1eb/0x380
[<ffffffff80276ae8>] __pagevec_lru_add_active+0xb6/0x104
[<ffffffff80276b85>] lru_cache_add_active+0x4f/0x53
[<ffffffff8027d182>] do_wp_page+0x355/0x6f6
[<ffffffff8027eef1>] handle_mm_fault+0x62b/0x77c
[<ffffffff80632557>] do_page_fault+0x3c7/0xba0
[<ffffffff8062fb79>] error_exit+0x0/0x51
[<ffffffffffffffff>] 0xffffffffffffffff

and

backtrace:
[<ffffffff8061c162>] kmemleak_alloc_page+0x1eb/0x380
[<ffffffff80276ae8>] __pagevec_lru_add_active+0xb6/0x104
[<ffffffff80276b85>] lru_cache_add_active+0x4f/0x53
[<ffffffff8027eddc>] handle_mm_fault+0x516/0x77c
[<ffffffff8027f180>] get_user_pages+0x13e/0x462
[<ffffffff802a2f65>] get_arg_page+0x6a/0xca
[<ffffffff802a30bf>] copy_strings+0xfa/0x1d4
[<ffffffff802a31c7>] copy_strings_kernel+0x2e/0x43
[<ffffffff802d33fb>] compat_do_execve+0x1fa/0x2fd
[<ffffffff8021e405>] sys32_execve+0x44/0x62
[<ffffffff8021def5>] ia32_ptregs_common+0x25/0x50
[<ffffffffffffffff>] 0xffffffffffffffff

I'll dig into them further, but do either of these look like known issues?

Chris

2010-02-12 08:04:11

by KOSAKI Motohiro

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

> backtrace:
> [<ffffffff8061c162>] kmemleak_alloc_page+0x1eb/0x380
> [<ffffffff80276ae8>] __pagevec_lru_add_active+0xb6/0x104
> [<ffffffff80276b85>] lru_cache_add_active+0x4f/0x53
> [<ffffffff8027d182>] do_wp_page+0x355/0x6f6
> [<ffffffff8027eef1>] handle_mm_fault+0x62b/0x77c
> [<ffffffff80632557>] do_page_fault+0x3c7/0xba0
> [<ffffffff8062fb79>] error_exit+0x0/0x51
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> and
>
> backtrace:
> [<ffffffff8061c162>] kmemleak_alloc_page+0x1eb/0x380
> [<ffffffff80276ae8>] __pagevec_lru_add_active+0xb6/0x104
> [<ffffffff80276b85>] lru_cache_add_active+0x4f/0x53
> [<ffffffff8027eddc>] handle_mm_fault+0x516/0x77c
> [<ffffffff8027f180>] get_user_pages+0x13e/0x462
> [<ffffffff802a2f65>] get_arg_page+0x6a/0xca
> [<ffffffff802a30bf>] copy_strings+0xfa/0x1d4
> [<ffffffff802a31c7>] copy_strings_kernel+0x2e/0x43
> [<ffffffff802d33fb>] compat_do_execve+0x1fa/0x2fd
> [<ffffffff8021e405>] sys32_execve+0x44/0x62
> [<ffffffff8021def5>] ia32_ptregs_common+0x25/0x50
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> I'll dig into them further, but do either of these look like known issues?

No known issue.
AFAIK, 2.6.27 through 2.6.33 don't have such a problem.

2010-02-12 17:51:35

by Catalin Marinas

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

Minchan Kim <[email protected]> wrote:
> On Fri, Feb 12, 2010 at 3:54 AM, Chris Friesen <[email protected]> wrote:
>> I have a modified version of that which I picked up as part of the
>> kmemleak backport.  However, it doesn't help unless I can narrow down
>> *which* pages I should care about.
>
> kmemleak doesn't support page allocator and ioremap.
> Above URL patch just can tell who requests page which is using(ie, not
> free) now.

ioremap can be easily tracked by kmemleak (it's on my to-do list, but I
haven't managed to do it yet); it's not far from vmalloc.

The page allocator is a bit more difficult since it's used by the slab
allocator as well and it may lead to some recursive calls into
kmemleak. I'll have a think.

Anyway, you can leak memory without this being detected by kmemleak -
just add the allocated objects to a list and never remove them.

--
Catalin

2010-02-13 06:29:14

by Balbir Singh

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

* Chris Friesen <[email protected]> [2010-02-10 11:05:16]:

> On 02/09/2010 06:32 PM, KOSAKI Motohiro wrote:
>
> > can you please post your /proc/meminfo?
>
>
> On 02/09/2010 09:50 PM, Balbir Singh wrote:
> > Do you have swap enabled? Can you help with the OOM killed dmesg log?
> > Does the situation get better after OOM killing.
>
>
> On 02/09/2010 10:09 PM, KOSAKI Motohiro wrote:
>
> > Chris, 2.6.27 is a bit old. plese test it on latest kernel. and please
> > don't use
> > any proprietary drivers.
>
>
> Thanks for the replies.
>
> Swap is enabled in the kernel, but there is no swap configured. ipcs
> shows little consumption there.

OK, I did not find the OOM-kill output (dmesg). Is the OOM killer doing
the right thing? If it kills the process we suspect is leaking memory,
then it is working correctly :) If the leak is in kernel space, we
need to examine the changes more closely.

>
> The test load relies on a number of kernel modifications, making it
> difficult to use newer kernels. (This is an embedded system.) There are
> no closed-source drivers loaded, though there are some that are not in
> vanilla kernels. I haven't yet tried to reproduce the problem with a
> minimal load--I've been more focused on trying to understand what's
> going on in the code first. It's on my list to try though.
>

kernel modifications that we are unaware of make the problem harder to
debug, since we have no way of knowing if they are the source of the
problem.

> Here are some /proc/meminfo outputs from a test run where we
> artificially chewed most of the free memory to try and force the oom
> killer to fire sooner (otherwise it takes days for the problem to trigger).
>
> It's spaced with tabs so I'm not sure if it'll stay aligned. The first
> row is the sample number. All the HugePages entries were 0. The
> DirectMap entries were constant. SwapTotal/SwapFree/SwapCached were 0,
> as were Writeback/NFS_Unstable/Bounce/WritebackTmp.
>
> Samples were taken 10 minutes apart. Between samples 49 and 50 the
> oom-killer fired.
>
> 13 49 50
> MemTotal 4042848 4042848 4042848
> MemFree 113512 52668 69536
> Buffers 20 24 76
> Cached 1285588 1287456 1295128
> Active 2883224 3369440 2850172
> Inactive 913756 487944 990152
> Dirty 36 216 252
> AnonPages 2274756 2305448 2279216
> Mapped 10804 12772 15760
> Slab 62324 62568 63608
> SReclaimable 24092 23912 24848
> SUnreclaim 38232 38656 38760
> PageTables 11960 12144 11848
> CommitLimit 2021424 2021424 2021424
> Committed_AS 12666508 12745200 7700484

Committed_AS shows a large change; does the process that gets killed
use a lot of virtual memory (total_vm)? Please see my first question
as well. Can you try setting

vm.overcommit_memory=2

and running the tests to see if you still get OOM killed.

--
Three Cheers,
Balbir

2010-02-15 15:56:34

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/12/2010 01:35 AM, Chris Friesen wrote:

> After being up about 2.5 hrs, there were 4265 pages in the LRU that
> weren't part of file or anon. These broke down into two separate call
> chains (there were actually three separate offsets within
> compat_do_execve, but the rest was identical):

I added some further instrumentation to track timestamps of when they
were added to the LRU, and when they were added/removed from
NR_ANON_PAGES. Based on this, it appears that the pages are being
removed from NR_ANON_PAGES but are still left in the LRU.

It looks like I have three general paths leading to the removal of the
pages from NR_ANON_PAGES:

del from anon list backtrace:
[<ffffffff8029c951>] kmemleak_clear_anon+0x7f/0xbe
[<ffffffff802864c7>] page_remove_rmap+0x45/0x146
[<ffffffff8027dc7e>] unmap_vmas+0x41c/0x948
[<ffffffff80282405>] exit_mmap+0x7b/0x108
[<ffffffff8022f441>] mmput+0x33/0x110
[<ffffffff80233b05>] exit_mm+0x103/0x130
[<ffffffff802355b5>] do_exit+0x17b/0x91f
[<ffffffff80235d95>] do_group_exit+0x3c/0x9c
[<ffffffff80235e07>] sys_exit+0x0/0x12
[<ffffffff8021ddb5>] ia32_syscall_done+0x0/0xa
[<ffffffffffffffff>] 0xffffffffffffffff

del from anon list backtrace:
[<ffffffff8029c951>] kmemleak_clear_anon+0x7f/0xbe
[<ffffffff802864c7>] page_remove_rmap+0x45/0x146
[<ffffffff8027dc7e>] unmap_vmas+0x41c/0x948
[<ffffffff80282405>] exit_mmap+0x7b/0x108
[<ffffffff8022f441>] mmput+0x33/0x110
[<ffffffff802a3a4e>] flush_old_exec+0x1d6/0x86a
[<ffffffff802dc007>] load_elf_binary+0x366/0x1d1f
[<ffffffff802a35c6>] search_binary_handler+0xa4/0x25a
[<ffffffff802d36dc>] compat_do_execve+0x2ab/0x2fd
[<ffffffff8021e435>] sys32_execve+0x44/0x62
[<ffffffff8021df25>] ia32_ptregs_common+0x25/0x50
[<ffffffffffffffff>] 0xffffffffffffffff

del from anon list backtrace:
[<ffffffff8029c951>] kmemleak_clear_anon+0x7f/0xbe
[<ffffffff802864c7>] page_remove_rmap+0x45/0x146
[<ffffffff8027d1d7>] do_wp_page+0x37a/0x6f6
[<ffffffff8027ef21>] handle_mm_fault+0x62b/0x77c
[<ffffffff80632787>] do_page_fault+0x3c7/0xba0
[<ffffffff8062fda9>] error_exit+0x0/0x51


Looking at the code, it looks like page_remove_rmap() clears the
Anonpage flag and removes it from NR_ANON_PAGES, and the caller is
responsible for removing it from the LRU. Is that right?

I'll keep digging in the code, but does anyone know where the removal
from the LRU is supposed to happen in the above code paths?

Thanks,

Chris


2010-02-15 16:08:38

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/13/2010 12:29 AM, Balbir Singh wrote:

> OK, I did not find the OOM kill output, dmesg. Is the OOM killer doing
> the right thing? If it kills the process we suspect is leaking memory,
> then it is working correctly :) If the leak is in kernel space, we
> need to examine the changes more closely.

I didn't include the oom killer message because it didn't seem important
in this case. The oom killer took out the process with by far the
largest memory consumption, but as far as I know that process was not
the source of the leak.

It appears that the leak is in kernel space, given the unexplained pages
that are part of the active/inactive list but not in
buffers/cache/anon/swapcached.

> kernel modifications that we are unaware of make the problem harder to
> debug, since we have no way of knowing if they are the source of the
> problem.

Yes, I realize this. I'm not expecting miracles, just hoping for some
guidance.


>> Committed_AS 12666508 12745200 7700484
>
> Comitted_AS shows a large change, does the process that gets killed
> use a lot of virtual memory (total_vm)? Please see my first question
> as well. Can you try to set
>
> vm.overcommit_memory=2
>
> and run the tests to see if you still get OOM killed.

As mentioned above, the process that was killed did indeed consume a lot
of memory. I could try running with strict memory accounting, but would
you agree that, given the gradual but constant increase in the
unexplained pages described above, that currently looks like the more
likely culprit?

Chris

2010-02-15 17:00:24

by Rik van Riel

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/15/2010 10:50 AM, Chris Friesen wrote:

> Looking at the code, it looks like page_remove_rmap() clears the
> Anonpage flag and removes it from NR_ANON_PAGES, and the caller is
> responsible for removing it from the LRU. Is that right?

Nope.

> I'll keep digging in the code, but does anyone know where the removal
> from the LRU is supposed to happen in the above code paths?

Removal from the LRU is done from the page freeing code, on
the final free of the page.

It appears you have code somewhere that increments the reference
count on user pages and then forgets to lower it afterwards.

--
All rights reversed.

2010-02-16 16:59:05

by Chris Friesen

Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/15/2010 11:00 AM, Rik van Riel wrote:
> On 02/15/2010 10:50 AM, Chris Friesen wrote:
>
>> Looking at the code, it looks like page_remove_rmap() clears the
>> Anonpage flag and removes it from NR_ANON_PAGES, and the caller is
>> responsible for removing it from the LRU. Is that right?
>
> Nope.
>
>> I'll keep digging in the code, but does anyone know where the removal
>> from the LRU is supposed to happen in the above code paths?
>
> Removal from the LRU is done from the page freeing code, on
> the final free of the page.
>
> It appears you have code somewhere that increments the reference
> count on user pages and then forgets to lower it afterwards.

Okay, that makes sense.
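To make sure I follow, the failure mode would look like this toy refcount model (plain Python, with method names echoing get_page()/put_page(); the final-drop branch stands in for __page_cache_release()):

```python
# Toy model of a page reference-count leak (not kernel code).
# get_page() takes a reference; put_page() drops one and, on the final
# drop, the freeing path pulls the page off the LRU.
class Page:
    def __init__(self):
        self.refcount = 1        # initial reference from the mapping
        self.on_lru = True

    def get_page(self):
        self.refcount += 1

    def put_page(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.on_lru = False  # __page_cache_release() analogue

ok, leaked = Page(), Page()

ok.get_page(); ok.put_page()     # balanced extra reference
ok.put_page()                    # final unmap: page leaves the LRU

leaked.get_page()                # a driver takes a reference...
leaked.put_page()                # ...unmap drops only the mapping's ref
# The forgotten reference keeps refcount at 1: the page stays on the
# LRU indefinitely, inflating Active/Inactive as seen in this thread.
print(ok.on_lru, leaked.on_lru)
```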

I'm still trying to get a handle on the LRU removal though. The code
path that I saw most which resulted in clearing the anon bit but leaving
the page on the LRU was the following:


[<ffffffff8029c951>] kmemleak_clear_anon+0x7f/0xbe
[<ffffffff802864c7>] page_remove_rmap+0x45/0x146
[<ffffffff8027dc7e>] unmap_vmas+0x41c/0x948
[<ffffffff80282405>] exit_mmap+0x7b/0x108
[<ffffffff8022f441>] mmput+0x33/0x110
[<ffffffff80233b05>] exit_mm+0x103/0x130
[<ffffffff802355b5>] do_exit+0x17b/0x91f
[<ffffffff80235d95>] do_group_exit+0x3c/0x9c
[<ffffffff80235e07>] sys_exit+0x0/0x12
[<ffffffff8021ddb5>] ia32_syscall_done+0x0/0xa

There are a bunch of inline functions involved, but I think the chain
from page_remove_rmap() back up to unmap_vmas() looks like this:

page_remove_rmap
zap_pte_range
zap_pmd_range
zap_pud_range
unmap_page_range
unmap_vmas

So in this scenario, where do the pages actually get removed from the
LRU list (assuming that they're not in use by anyone else)?

Thanks,

Chris

2010-02-16 17:12:59

by Rik van Riel

[permalink] [raw]
Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/16/2010 11:52 AM, Chris Friesen wrote:
> On 02/15/2010 11:00 AM, Rik van Riel wrote:

>> Removal from the LRU is done from the page freeing code, on
>> the final free of the page.

> There are a bunch of inline functions involved, but I think the chain
> from page_remove_rmap() back up to unmap_vmas() looks like this:
>
> page_remove_rmap
> zap_pte_range
> zap_pmd_range
> zap_pud_range
> unmap_page_range
> unmap_vmas
>
> So in this scenario, where do the pages actually get removed from the
> LRU list (assuming that they're not in use by anyone else)?

__page_cache_release

--
All rights reversed.

2010-02-16 21:32:22

by Chris Friesen

[permalink] [raw]
Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/16/2010 11:12 AM, Rik van Riel wrote:
> On 02/16/2010 11:52 AM, Chris Friesen wrote:
>> On 02/15/2010 11:00 AM, Rik van Riel wrote:
>
>>> Removal from the LRU is done from the page freeing code, on
>>> the final free of the page.
>
>> There are a bunch of inline functions involved, but I think the chain
>> from page_remove_rmap() back up to unmap_vmas() looks like this:
>>
>> page_remove_rmap
>> zap_pte_range
>> zap_pmd_range
>> zap_pud_range
>> unmap_page_range
>> unmap_vmas
>>
>> So in this scenario, where do the pages actually get removed from the
>> LRU list (assuming that they're not in use by anyone else)?
>
> __page_cache_release


For the backtrace scenario I posted, it seems like it might actually be
release_pages(). There seems to be a plausible call chain:

__ClearPageLRU
release_pages
free_pages_and_swap_cache
tlb_flush_mmu
tlb_remove_page
zap_pte_range

Does that seem right? In this case, tlb_remove_page() is called right
after page_remove_rmap() which ultimately results in clearing the
PageAnon bit.

Chris

2010-02-16 22:23:09

by Rik van Riel

[permalink] [raw]
Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

On 02/16/2010 04:26 PM, Chris Friesen wrote:

> For the backtrace scenario I posted it seems like it might actually be
> release_pages(). There seems to be a plausible call chain:
>
> __ClearPageLRU
> release_pages
> free_pages_and_swap_cache
> tlb_flush_mmu
> tlb_remove_page
> zap_pte_range
>
> Does that seem right? In this case, tlb_remove_page() is called right
> after page_remove_rmap() which ultimately results in clearing the
> PageAnon bit.

That is right - and it pins the blame for the memory leak
on some third-party code that fails to release a refcount on
memory pages.

--
All rights reversed.

2010-02-18 15:46:05

by Chris Friesen

[permalink] [raw]
Subject: Re: tracking memory usage/leak in "inactive" field in /proc/meminfo? -- solved

On 02/16/2010 04:22 PM, Rik van Riel wrote:
> On 02/16/2010 04:26 PM, Chris Friesen wrote:
>
>> For the backtrace scenario I posted it seems like it might actually be
>> release_pages(). There seems to be a plausible call chain:
>>
>> __ClearPageLRU
>> release_pages
>> free_pages_and_swap_cache
>> tlb_flush_mmu
>> tlb_remove_page
>> zap_pte_range
>>
>> Does that seem right? In this case, tlb_remove_page() is called right
>> after page_remove_rmap() which ultimately results in clearing the
>> PageAnon bit.
>
> That is right - and it pins the blame for the memory leak
> on some third-party code that fails to release a refcount on
> memory pages.

I think I've tracked down the source of the problem. It turns out one of
our vendors had misapplied a patch, which ended up bumping the page
count an extra time.

Thanks to everyone that helped out.

Chris