2006-11-20 06:25:56

by Michael Raskin

Subject: 2.6.19-rc1-mm1+ memory problem

Short description: when X is loaded (maybe any heavy application is
sufficient, but I don't use anything heavy in console), 'free' says used
memory is growing.

Keywords: memory.

Kernel: built locally, gcc 4.0.3

I have a strange problem with 2.6.19-rc-mm kernels. After I load X, I
notice that memory is marked as used at a rate of tens of KB/s. Once all
physical memory is used, the machine starts to swap very heavily. I
verified that this happens with all -mm kernels after 2.6.19-rc1-mm1,
including 2.6.19-rc5-mm2. Meanwhile, everything works OK with kernels
2.6.18-mm3 and 2.6.19-rc1 through 2.6.19-rc6. I do not see any options
in my .config that should eat memory. The module list is short enough to
include inline.

When I just run things such as a periodic suck, the oops proxy server,
etc. with X shut down, I do not notice the "leak" from the console,
because it is hidden by small fluctuations in memory use. When I run X
and then shut it down, the used memory count ends up a few megabytes
higher (consistent with the rate at which memory is eaten while X runs).

I didn't find exactly this problem in lkml or www, though the problem
with OOM on 2.6.19-rc-mm seems similar.

What should I check to fix the problem or produce a useful bug report?

/etc/sysconfig/modules:

ehci-hcd, usb-storage, usbhid, ipaq, i915

Now loaded in 2.6.19-rc6:

i915, drm, ipaq, usbserial, usbhid, usb_storage, libusual, ehci_hcd,
usbcore

Main configuration options:

http://bigtip.narod.ru/temp/xorg.conf.txt
http://bigtip.narod.ru/temp/config-2.6.19-rc2-mm5-swsusp-my-1.txt
http://bigtip.narod.ru/temp/lspci.txt

Drivers:

http://bigtip.narod.ru/temp/ioports.txt
http://bigtip.narod.ru/temp/iomem.txt


2006-11-20 18:17:50

by Michael Raskin

Subject: Re: 2.6.19-rc1-mm1+ memory problem

Michael Raskin wrote:
> Short description: when X is loaded (maybe any heavy application is
> sufficient, but I don't use anything heavy in console), 'free' says used
> memory is growing.
>
Tried driver vesa. Leak still exists.

About leak size: with dri, xscreensaver, and nothing else loaded,

while true; do free >>free.log; sleep 1; done

shows ~100KB/s.
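
A rate like this can be estimated from free.log with a short helper. The sketch below is only an illustration: the function name used_growth_per_sample is hypothetical, and it assumes the classic procps layout in which the third field of the "Mem:" line is used memory in kB. With one sample per second, the result is roughly kB/s.

```shell
#!/bin/bash
# used_growth_per_sample LOG
# Hypothetical helper: average change of the "used" column (3rd field of the
# "Mem:" line, assumed to be in kB) between consecutive samples in a log
# written by:  while true; do free >>free.log; sleep 1; done
used_growth_per_sample() {
    awk '
        # Remember the first and the most recent "used" value.
        $1 == "Mem:" { if (n++ == 0) first = $3; last = $3 }
        END {
            if (n < 2) {
                print "need at least two samples" > "/dev/stderr"
                exit 1
            }
            printf "%.1f\n", (last - first) / (n - 1)
        }' "$1"
}

# Example use (commented out so the block has no side effects):
#   used_growth_per_sample free.log
```

Because `sleep 1` does not account for the time `free` itself takes, the figure is an approximation of the per-second rate.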

2006-11-21 08:38:12

by Andrew Morton

Subject: Re: 2.6.19-rc1-mm1+ memory problem

On Mon, 20 Nov 2006 09:26:29 +0300
Michael Raskin <[email protected]> wrote:

> Short description: when X is loaded (maybe any heavy application is
> sufficient, but I don't use anything heavy in console), 'free' says used
> memory is growing.
>
> Keywords: memory.
>
> Kernel: built locally, gcc 4.0.3
>
> I have a strange problem with 2.6.19-rc-mm kernels. After I load X, I
> notice that memory is marked as used at a rate of tens of KB/s. Once all
> physical memory is used, the machine starts to swap very heavily. I
> verified that this happens with all -mm kernels after 2.6.19-rc1-mm1,
> including 2.6.19-rc5-mm2. Meanwhile, everything works OK with kernels
> 2.6.18-mm3 and 2.6.19-rc1 through 2.6.19-rc6. I do not see any options
> in my .config that should eat memory. The module list is short enough to
> include inline.
>
> When I just run things such as a periodic suck, the oops proxy server,
> etc. with X shut down, I do not notice the "leak" from the console,
> because it is hidden by small fluctuations in memory use. When I run X
> and then shut it down, the used memory count ends up a few megabytes
> higher (consistent with the rate at which memory is eaten while X runs).
>
> I didn't find exactly this problem in lkml or www, though the problem
> with OOM on 2.6.19-rc-mm seems similar.
>
> What should I check to fix the problem or produce a useful bug report?

Monitor /proc/meminfo

If the leak is slab, monitor /proc/slabinfo and /proc/slab_allocators.
/proc/slab_allocators needs CONFIG_DEBUG_SLAB_LEAK.

Thanks.
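
One way to follow this advice is to save two copies of /proc/meminfo some time apart and print only the fields that changed. A minimal sketch, assuming the usual "Name: value kB" layout; meminfo_diff is a hypothetical helper name, not an existing tool.

```shell
#!/bin/bash
# meminfo_diff OLD NEW
# Hypothetical helper: print every /proc/meminfo field whose value differs
# between two saved snapshots, with the signed difference in kB.
meminfo_diff() {
    awk '
        NR == FNR { old[$1] = $2; next }     # first file: remember old values
        ($1 in old) && $2 != old[$1] {       # second file: report changes
            printf "%-16s %+d kB\n", $1, $2 - old[$1]
        }' "$1" "$2"
}

# Example use (commented out so the block has no side effects):
#   cat /proc/meminfo > before.txt; sleep 30; cat /proc/meminfo > after.txt
#   meminfo_diff before.txt after.txt
```

The same approach could be applied to /proc/slabinfo, but its column layout differs, so the awk fields would need adjusting.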

2006-11-21 19:45:51

by Andrew Morton

Subject: Re: 2.6.19-rc1-mm1+ memory problem

On Tue, 21 Nov 2006 21:41:31 +0300
Michael Raskin <[email protected]> wrote:

> Andrew Morton wrote:
> > On Mon, 20 Nov 2006 09:26:29 +0300
> > Michael Raskin <[email protected]> wrote:
> >
> >> Short description: when X is loaded (maybe any heavy application is
> >> sufficient, but I don't use anything heavy in console), 'free' says used
> >> memory is growing.
> >>
> Thank you for reply.
>
> > Monitor /proc/meminfo
> Thanks for advice. I didn't think of it. I should be ashamed.
>
> Result: mysterious. All the fields that grow cannot account for even a
> third of the loss.
>
> In top I saw 2MB (~half a minute) go nowhere while none of the first 50
> processes changed its resident memory usage at all. The rest use less
> than a MB each.
>
> > If the leak is slab, monitor /proc/slabinfo and /proc/slab_allocators.
> I hope not.
> > /proc/slab_allocators needs CONFIG_DEBUG_SLAB_LEAK.
> >
> > Thanks.
> >
> I did a few cat /proc/meminfo. Two of them are here:
>
> Field          Snapshot 1    Snapshot 2
> MemTotal:       763532 kB     763532 kB
> MemFree:        445956 kB     430932 kB
> Buffers:         20908 kB      21048 kB
> Cached:          77008 kB      77212 kB
> SwapCached:          0 kB          0 kB
> Active:          65916 kB      66120 kB
> Inactive:        54748 kB      54884 kB
> SwapTotal:     1052216 kB    1052216 kB
> SwapFree:      1052216 kB    1052216 kB
> Dirty:             264 kB        324 kB
> Writeback:           0 kB          0 kB
> AnonPages:       22752 kB      22748 kB
> Mapped:          14616 kB      14628 kB
> Slab:            23108 kB      23088 kB
> SReclaimable:    15360 kB      15364 kB
> SUnreclaim:       7748 kB       7724 kB
> PageTables:       1216 kB       1216 kB
> NFS_Unstable:        0 kB          0 kB
> Bounce:              0 kB          0 kB
> CommitLimit:   1433980 kB    1433980 kB
> Committed_AS:   281456 kB     281448 kB
> VmallocTotal:   262104 kB     262104 kB
> VmallocUsed:      2876 kB       2876 kB
> VmallocChunk:   259028 kB     259028 kB

You lost 15MB and they didn't even turn up on the page LRU.

Can you try to determine exactly which activity causes this to happen? In
particular, is it due to the X server? If so, does any particular client
cause it to happen? Things which use 3d?
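
The 15MB figure follows from the two quoted snapshots: MemFree fell by 445956 - 430932 = 15024 kB, while Buffers, Cached, AnonPages, Slab and PageTables together grew by only 140 + 204 - 4 - 20 + 0 = 320 kB, leaving about 14704 kB unaccounted for. The sketch below repeats that arithmetic for any two saved snapshots. The function name unaccounted_kb and the choice of which fields to count are assumptions made for this illustration.

```shell
#!/bin/bash
# unaccounted_kb OLD NEW
# Hypothetical helper: drop in MemFree minus the combined growth of a few
# fields that normally absorb freed pages. The field list is an assumption.
unaccounted_kb() {
    awk '
        BEGIN {
            n = split("Buffers: Cached: AnonPages: Slab: PageTables:", f, " ")
            for (i = 1; i <= n; i++) tracked[f[i]] = 1
        }
        NR == FNR        { old[$1] = $2; next }          # first snapshot
        $1 == "MemFree:" { lost = old[$1] - $2 }          # kB no longer free
        $1 in tracked    { grown += $2 - old[$1] }        # kB that show up elsewhere
        END              { print lost - grown }' "$1" "$2"
}

# Example use (commented out so the block has no side effects):
#   unaccounted_kb before.txt after.txt
```

A result close to zero would mean the lost free memory is visible in the tracked fields; a large positive number, as here, means it is not.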

2006-11-21 21:19:09

by Michael Raskin

Subject: Re: 2.6.19-rc1-mm1+ memory problem

Andrew Morton wrote:
> On Tue, 21 Nov 2006 21:41:31 +0300
> Michael Raskin <[email protected]> wrote:

Sorry for leaving lkml out of "To: " in previous post.

> Can you try to determine exactly which activity causes this to happen? In
> particular, is it due to the X server? If so, does any particular client
> cause it to happen? Things which use 3d?
You were right, it's not because of X itself, but because of the
environment I use.

Simplest reproducing example:

while true; do free | cat &>/dev/null; done

This looks minimal (except for the &>/dev/null, which is there only to
keep console/xterm output out of the picture; it leaks just as well
without it).

2006-11-24 13:24:13

by Michael Raskin

Subject: Re: 2.6.19-rc1-mm1+ memory problem

Michael Raskin wrote:
Strange thing: when run from xterm,

while true; do free | cat &>/dev/null; done

causes the leak. When X is not loaded, it does not.

I have also uploaded the contents of /proc/page_owner after losing more
than 100M (220M used, 29M in page_owner, less than 50M for
processes). I will study it too.

http://bigtip.narod.ru/temp/page_owner.bz2
http://bigtip.narod.ru/temp/page_owner.gz

2006-11-24 23:08:05

by Michael Raskin

Subject: Re: 2.6.19-rc1-mm1+ memory problem

Michael Raskin wrote:
> I have also uploaded the contents of /proc/page_owner after losing more
> than 100M (220M used, 29M in page_owner, less than 50M for
> processes).

Top 3 entries:

89361 times:
Page allocated via order 0, mask 0x280d2
[0xc0159f31] __handle_mm_fault+1809
[0xc011318a] do_page_fault+314
[0xc04111c4] error_code+116
Can be anything. But if I understand anything, this memory is used
because someone has requested a page that is swapped out. So the memory
must be used, but not reflected in meminfo, and not by a process?


35560 times:
Page allocated via order 0, mask 0x201d2
[0xc0152ec2] __do_page_cache_readahead+450
[0xc015309a] do_page_cache_readahead+74
[0xc014d7b5] filemap_nopage+325
[0xc0159919] __handle_mm_fault+249
[0xc011318a] do_page_fault+314
[0xc04111c4] error_code+116
- is reflected in cache usage statistics, I guess..

6185 times:
Page allocated via order 0, mask 0x200d2
[0xc014e069] generic_file_buffered_write+329
[0xc014e814] __generic_file_aio_write_nolock+612
[0xc014eb85] generic_file_aio_write+85
[0xc01b26ff] ext3_file_write+63
[0xc016b23c] do_sync_write+204
[0xc016b9a7] vfs_write+167
[0xc016c2a7] sys_write+71
[0xc010303a] sysenter_past_esp+95
- negligible, really..
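
Counts like the ones above can be produced by grouping identical records in the dump. A rough sketch, assuming records in /proc/page_owner are separated by blank lines and that identical allocation paths yield byte-identical records (as the quoted entries suggest); page_owner_top is a hypothetical name, and this is not necessarily how the numbers in this message were obtained.

```shell
#!/bin/bash
# page_owner_top FILE [N]
# Hypothetical helper: group identical blank-line-separated records in a
# saved page_owner dump and print the N most frequent ones (default 3).
page_owner_top() {
    awk -v top="${2:-3}" '
        BEGIN { RS = "" }              # paragraph mode: one record per block
        { count[$0]++ }                # the whole record text is the key
        END {
            for (i = 0; i < top; i++) {
                best = ""; max = 0
                for (k in count)
                    if (count[k] > max) { max = count[k]; best = k }
                if (max == 0) break    # fewer distinct records than requested
                printf "%d times:\n%s\n\n", max, best
                delete count[best]
            }
        }' "$1"
}

# Example use (commented out so the block has no side effects):
#   cat /proc/page_owner > page_owner.txt
#   page_owner_top page_owner.txt 3
```

Keeping the whole record as the key is simple but memory-hungry; for very large dumps, hashing each record first would be a reasonable alternative.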

2006-11-25 19:04:10

by Andrew Morton

Subject: Re: 2.6.19-rc1-mm1+ memory problem

On Sat, 25 Nov 2006 02:07:43 +0300
Michael Raskin <[email protected]> wrote:

> Michael Raskin wrote:
> > I have also uploaded the contents of /proc/page_owner after losing more
> > than 100M (220M used, 29M in page_owner, less than 50M for
> > processes).
>
> Top 3 entries:
>
> 89361 times:
> Page allocated via order 0, mask 0x280d2
> [0xc0159f31] __handle_mm_fault+1809
> [0xc011318a] do_page_fault+314
> [0xc04111c4] error_code+116
> Can be anything. But if I understand anything, this memory is used
> because someone has requested a page that is swapped out. So the memory
> must be used, but not reflected in meminfo, and not by a process?
>
>
> 35560 times:
> Page allocated via order 0, mask 0x201d2
> [0xc0152ec2] __do_page_cache_readahead+450
> [0xc015309a] do_page_cache_readahead+74
> [0xc014d7b5] filemap_nopage+325
> [0xc0159919] __handle_mm_fault+249
> [0xc011318a] do_page_fault+314
> [0xc04111c4] error_code+116
> - is reflected in cache usage statistics, I guess..
>
> 6185 times:
> Page allocated via order 0, mask 0x200d2
> [0xc014e069] generic_file_buffered_write+329
> [0xc014e814] __generic_file_aio_write_nolock+612
> [0xc014eb85] generic_file_aio_write+85
> [0xc01b26ff] ext3_file_write+63
> [0xc016b23c] do_sync_write+204
> [0xc016b9a7] vfs_write+167
> [0xc016c2a7] sys_write+71
> [0xc010303a] sysenter_past_esp+95
> - negligible, really..

What you should do is to cause the system to free as many pages as possible
before looking at /proc/page_owner. For example, build `usemem' from
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz, run

usemem -m N (where N is the number of megabytes which the machine has)

a couple of times. Then check /proc/meminfo, and look to see which pages
are left over in /proc/page_owner.

Thanks.
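
The value of N can be read from the MemTotal line of /proc/meminfo. A small sketch, assuming MemTotal is reported in kB; mem_total_mb is a hypothetical helper name, and it rounds down, so for the machine in this thread (MemTotal: 763532 kB) it gives 745.

```shell
#!/bin/bash
# mem_total_mb [FILE]
# Hypothetical helper: print total RAM in whole megabytes (rounded down),
# read from FILE or from /proc/meminfo when no file is given.
mem_total_mb() {
    awk '$1 == "MemTotal:" { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example use (commented out so the block has no side effects):
#   usemem -m "$(mem_total_mb)"
```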

2006-11-25 21:53:54

by Michael Raskin

Subject: Re: 2.6.19-rc1-mm1+ memory problem

Andrew Morton wrote:
>> 89361 times:
>> Page allocated via order 0, mask 0x280d2
>> [0xc0159f31] __handle_mm_fault+1809
>> [0xc011318a] do_page_fault+314
>> [0xc04111c4] error_code+116
>> Can be anything. But if I understand anything, this memory is used
>> because someone has requested a page that is swapped out. So the memory
>> must be used, but not reflected in meminfo, and not by a process?

> What you should do is to cause the system to free as many pages as possible
> before looking at /proc/page_owner. For example, build `usemem' from
> http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz, run
>
> usemem -m N (where N is the number of megabytes which the machine has)
>
> a couple of times. Then check /proc/meminfo, and look to see which pages
> are left over in /proc/page_owner.

Well, I was too lazy to get this utility, so I used my own program to
allocate and fill enough memory to push the machine some 50MB deep into
swap (did I understand correctly what usemem does?). The top 3 entries
did not change, except for the exact numbers.

2006-11-29 04:30:15

by Michael Raskin

Subject: Re: 2.6.19-rc6-mm2 is ok (2.6.19-rc1-mm1+ memory problem)

Michael Raskin wrote:
> I have a strange problem with 2.6.19-rc-mm kernels. After I load X, I
> notice that memory is marked as used at a rate of tens of KB/s. Then it

Tried 2.6.19-rc6-mm2. Now the problem is gone. Sometimes memory still
gets marked as used as before, but when the loss reaches a few MB it is
all freed. After 3 hours of X + all those scripts that caused the leak +
Thunderbird, I can still shut down everything except a few processes and
have only 50MB used. The script that demonstrated the leak now runs
without problems and without eating memory.

Thanks.