LinuxLists.cc - OOM kernel behaviour

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Mon, 30 Nov 2009, Jonathan Miles wrote:

> I'm running kernel 2.6.31.

Are you using a 32 bit kernel?

Disabling swap means that the kernel cannot swap out anonymous pages. Thus
the chance of OOMs conditions is increased.

In order to find the reason for the OOM (which has not much to do with
memory, the kernel has been unable to reclaim memmory for some reason)
you need to look at the debug output that the kernel generates.

2009-12-01 09:57:52

[permalink] [raw]

Subject: Re: OOM kernel behaviour

[19233.280568] Xorg invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[19233.280574] Xorg cpuset=/ mems_allowed=0
[19233.280588] Pid: 1143, comm: Xorg Not tainted 2.6.31-14-generic #48-Ubuntu
[19233.280591] Call Trace:
[19233.280602] [<c01b5a0f>] oom_kill_process+0x9f/0x250
[19233.280606] [<c01b601e>] ? select_bad_process+0xbe/0xf0
[19233.280619] [<c01b60a1>] __out_of_memory+0x51/0xa0
[19233.280623] [<c01b6143>] out_of_memory+0x53/0xb0
[19233.280627] [<c01b83fe>] __alloc_pages_slowpath+0x42e/0x480
[19233.280631] [<c01b855f>] __alloc_pages_nodemask+0x10f/0x120
[19233.280634] [<c01b3d03>] __read_cache_page+0x43/0xc0
[19233.280638] [<c01c2ec0>] ? shmem_readpage+0x0/0x40
[19233.280651] [<c01c2ec0>] ? shmem_readpage+0x0/0x40
[19233.280655] [<c01b45c7>] read_cache_page_async+0x27/0xd0
[19233.280696] [<f824f0e4>] ? i915_gem_object_get_pages+0x54/0x130 [i915]
[19233.280700] [<c01b4682>] read_cache_page+0x12/0x60
[19233.280727] [<f824f126>] i915_gem_object_get_pages+0x96/0x130 [i915]
[19233.280751] [<f8250e2e>] i915_gem_object_bind_to_gtt+0x14e/0x270 [i915]
[19233.280765] [<f82510a5>] i915_gem_object_pin+0x155/0x190 [i915]
[19233.280798] [<f814768b>] ? drm_gem_object_lookup+0x3b/0x50 [drm]
[19233.280824] [<f8251824>] i915_gem_object_pin_and_relocate+0x34/0x340 [i915]
[19233.280850] [<f814768b>] ? drm_gem_object_lookup+0x3b/0x50 [drm]
[19233.280876] [<f8252717>] i915_gem_execbuffer+0x467/0xc80 [i915]
[19233.280890] [<f824f9ea>] ? i915_gem_object_set_to_gtt_domain+0x4a/0x80 [i915]
[19233.280917] [<f81466c0>] drm_ioctl+0x180/0x360 [drm]
[19233.280943] [<f82522b0>] ? i915_gem_execbuffer+0x0/0xc80 [i915]
[19233.280947] [<c01caa6b>] ? __do_fault+0x3b/0x470
[19233.280955] [<c01f4cf3>] vfs_ioctl+0x73/0x90
[19233.280969] [<c01f4fc1>] do_vfs_ioctl+0x71/0x310
[19233.280972] [<c01f52bf>] sys_ioctl+0x5f/0x80
[19233.280976] [<c010336c>] syscall_call+0x7/0xb
[19233.280978] Mem-Info:
[19233.280980] DMA per-cpu:
[19233.280983] CPU 0: hi: 0, btch: 1 usd: 0
[19233.280985] CPU 1: hi: 0, btch: 1 usd: 0
[19233.280987] Normal per-cpu:
[19233.280989] CPU 0: hi: 186, btch: 31 usd: 157
[19233.281001] CPU 1: hi: 186, btch: 31 usd: 45
[19233.281003] HighMem per-cpu:
[19233.281005] CPU 0: hi: 186, btch: 31 usd: 72
[19233.281008] CPU 1: hi: 186, btch: 31 usd: 36
[19233.281012] Active_anon:186856 active_file:115555 inactive_anon:55504
[19233.281013] inactive_file:115834 unevictable:0 dirty:0 writeback:0 unstable:0
[19233.281015] free:12178 slab:8264 mapped:72156 pagetables:2716 bounce:0
[19233.281019] DMA free:8076kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:144kB inactive_file:396kB unevictable:0kB present:15852kB pages_scanned:768 all_unreclaimable? yes
[19233.281033] lowmem_reserve[]: 0 865 2007 2007
[19233.281040] Normal free:40140kB min:3728kB low:4660kB high:5592kB active_anon:253552kB inactive_anon:54412kB active_file:223304kB inactive_file:223696kB unevictable:0kB present:885944kB pages_scanned:653856 all_unreclaimable? no
[19233.281044] lowmem_reserve[]: 0 0 9134 9134
[19233.281050] HighMem free:496kB min:512kB low:1740kB high:2972kB active_anon:493872kB inactive_anon:167604kB active_file:238772kB inactive_file:239244kB unevictable:0kB present:1169244kB pages_scanned:950176 all_unreclaimable? no
[19233.281064] lowmem_reserve[]: 0 0 0 0
[19233.281068] DMA: 3*4kB 2*8kB 3*16kB 2*32kB 4*64kB 2*128kB 3*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 8076kB
[19233.281079] Normal: 745*4kB 273*8kB 660*16kB 337*32kB 93*64kB 10*128kB 7*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 40140kB
[19233.281100] HighMem: 36*4kB 40*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 496kB
[19233.281111] 337345 total pagecache pages
[19233.281112] 0 pages in swap cache
[19233.281115] Swap cache stats: add 0, delete 0, find 0/0
[19233.281126] Free swap = 0kB
[19233.281128] Total swap = 0kB
[19233.292981] 521939 pages RAM
[19233.292984] 294613 pages HighMem
[19233.292986] 8809 pages reserved
[19233.292987] 376103 pages shared
[19233.292989] 268322 pages non-shared
[19233.293002] Out of memory: kill process 19652 (vmware-vmx) score 202986 or a child
[19233.293080] Killed process 19652 (vmware-vmx)

Attachments:

OOM1.txt (4.15 kB)

2009-12-01 13:21:35

[permalink] [raw]

Subject: Re: OOM kernel behaviour

[66568.252903] hald invoked oom-killer: gfp_mask=0x280da, order=0, oomkilladj=0
[66568.252924] hald cpuset=/ mems_allowed=0
[66568.252931] Pid: 844, comm: hald Not tainted 2.6.31-14-generic #48-Ubuntu
[66568.252937] Call Trace:
[66568.252963] [<c01b5a0f>] oom_kill_process+0x9f/0x250
[66568.252981] [<c01b601e>] ? select_bad_process+0xbe/0xf0
[66568.252989] [<c01b60a1>] __out_of_memory+0x51/0xa0
[66568.252997] [<c01b6143>] out_of_memory+0x53/0xb0
[66568.253045] [<c01b83fe>] __alloc_pages_slowpath+0x42e/0x480
[66568.253055] [<c01b855f>] __alloc_pages_nodemask+0x10f/0x120
[66568.253077] [<c01ca076>] do_anonymous_page+0x66/0x200
[66568.253086] [<c012d3cd>] ? kmap_atomic_prot+0xcd/0xf0
[66568.253093] [<c01cc460>] handle_mm_fault+0x330/0x380
[66568.253116] [<c0317c58>] ? vsnprintf+0xc8/0x400
[66568.253128] [<c0572cf8>] do_page_fault+0x148/0x380
[66568.253150] [<c0572bb0>] ? do_page_fault+0x0/0x380
[66568.253160] [<c0570be3>] error_code+0x73/0x80
[66568.253181] [<c0318f06>] ? copy_to_user+0x116/0x120
[66568.253194] [<c02025e7>] simple_read_from_buffer+0x67/0xb0
[66568.253217] [<c023940f>] sysfs_read_file+0x4f/0x80
[66568.253239] [<c01e78b7>] vfs_read+0x97/0x190
[66568.253249] [<c02393c0>] ? sysfs_read_file+0x0/0x80
[66568.253270] [<c01e7ecd>] sys_read+0x3d/0x70
[66568.253281] [<c010336c>] syscall_call+0x7/0xb
[66568.253289] Mem-Info:
[66568.253305] DMA per-cpu:
[66568.253311] CPU 0: hi: 0, btch: 1 usd: 0
[66568.253319] CPU 1: hi: 0, btch: 1 usd: 0
[66568.253336] Normal per-cpu:
[66568.253343] CPU 0: hi: 186, btch: 31 usd: 181
[66568.253351] CPU 1: hi: 186, btch: 31 usd: 80
[66568.253367] HighMem per-cpu:
[66568.253373] CPU 0: hi: 186, btch: 31 usd: 35
[66568.253380] CPU 1: hi: 186, btch: 31 usd: 133
[66568.253402] Active_anon:174534 active_file:116377 inactive_anon:63691
[66568.253408] inactive_file:118428 unevictable:0 dirty:0 writeback:0 unstable:0
[66568.253413] free:12183 slab:8778 mapped:71864 pagetables:2711 bounce:0
[66568.253437] DMA free:8076kB min:64kB low:80kB high:96kB active_anon:236kB inactive_anon:256kB active_file:0kB inactive_file:56kB unevictable:0kB present:15852kB pages_scanned:504 all_unreclaimable? no
[66568.253463] lowmem_reserve[]: 0 865 2007 2007
[66568.253493] Normal free:40184kB min:3728kB low:4660kB high:5592kB active_anon:196728kB inactive_anon:60324kB active_file:248012kB inactive_file:248492kB unevictable:0kB present:885944kB pages_scanned:1232480 all_unreclaimable? no
[66568.253510] lowmem_reserve[]: 0 0 9134 9134
[66568.253538] HighMem free:472kB min:512kB low:1740kB high:2972kB active_anon:501172kB inactive_anon:194184kB active_file:217496kB inactive_file:225164kB unevictable:0kB present:1169244kB pages_scanned:1912736 all_unreclaimable? yes
[66568.253565] lowmem_reserve[]: 0 0 0 0
[66568.253588] DMA: 3*4kB 4*8kB 2*16kB 2*32kB 4*64kB 2*128kB 3*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 8076kB
[66568.253631] Normal: 9050*4kB 0*8kB 1*16kB 2*32kB 3*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 40184kB
[66568.253671] HighMem: 2*4kB 54*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 472kB
[66568.253721] 340269 total pagecache pages
[66568.253727] 0 pages in swap cache
[66568.253734] Swap cache stats: add 0, delete 0, find 0/0
[66568.253752] Free swap = 0kB
[66568.253757] Total swap = 0kB
[66568.277410] 521939 pages RAM
[66568.277413] 294613 pages HighMem
[66568.277414] 8809 pages reserved
[66568.277416] 378515 pages shared
[66568.277418] 259136 pages non-shared
[66568.277431] Out of memory: kill process 7811 (run-mozilla.sh) score 169511 or a child
[66568.277508] Killed process 7815 (thunderbird-bin)

Attachments:

OOM2.txt (3.61 kB)

2009-12-01 16:04:06

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 11/30/2009 11:08 PM, Jonathan Miles wrote:
> Hi all,
>
> I'm not on the list, so please cc me.
>
> I'm trying to figure out why the OOM killer is being called on my
> workstation even though the last time there was nearly 1.2GB of memory
> used by the page cache, which IMHO could have been used instead.

<snip>

Hi,

I have started seeing the same problem too. I am running a 64-bit custom
compiled 2.6.31 kernel (Fedora 11) on a Dell Inspiron laptop. The funny
thing is my laptop has been running fine ever since 2.6.31 released, but
this problem has started only recently, about a week back or so. I
update Fedora daily, so I figure one of the updates could have triggered
this behaviour.

For instance, here is the mem info on my laptop as I type this out:

MemTotal: 2020588 kB
MemFree: 17680 kB
Buffers: 3568 kB
Cached: 1202744 kB
SwapCached: 35748 kB
Active: 1364408 kB
Inactive: 480464 kB
Active(anon): 1315120 kB
Inactive(anon): 441480 kB
Active(file): 49288 kB
Inactive(file): 38984 kB
Unevictable: 40 kB
Mlocked: 40 kB
SwapTotal: 2048248 kB
SwapFree: 603276 kB
Dirty: 72 kB
Writeback: 0 kB
AnonPages: 606484 kB
Mapped: 75716 kB
Slab: 65104 kB
SReclaimable: 20036 kB
SUnreclaim: 45068 kB
PageTables: 22960 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3058540 kB
Committed_AS: 6557484 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 357788 kB
VmallocChunk: 34359303907 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8956 kB
DirectMap2M: 2050048 kB

If I start something like Firefox now, the OOM killer is invoked and the
machine is unusable until it completes. Sometimes the killer is invoked
multiple times and the only way to make the PC usable again is
to Ctrl-Alt-Bksp out of X. If RAM is low, why isn't the cache being
reclaimed?

The issue is sporadic but almost guaranteed if I leave the machine on
for more than 18 hours. I initially thought some process was leaking
memory but that seems unlikely.

I can provide more info and do some testing (config etc.) if required.

Regards,
David.

2009-12-01 16:07:52

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Tue, 1 Dec 2009, David John wrote:

> If I start something like Firefox now, the OOM killer is invoked and the
> machine is unusable until it completes. Sometimes the killer is invoked
> multiple times and the only way to make the PC usable again is
> to Ctrl-Alt-Bksp out of X. If RAM is low, why isn't the cache being
> reclaimed?
>
> The issue is sporadic but almost guaranteed if I leave the machine on
> for more than 18 hours. I initially thought some process was leaking
> memory but that seems unlikely.
>
> I can provide more info and do some testing (config etc.) if required.

Post the kernel log output please.

2009-12-01 16:08:35

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Tue, 1 Dec 2009, Jonathan Miles wrote:

> On 30/11/09 18:57 Christoph Lameter said the following:
> > On Mon, 30 Nov 2009, Jonathan Miles wrote:
> >
> >> I'm running kernel 2.6.31.
> >
> > Are you using a 32 bit kernel?

32 bit kernels are forced to use memory below 1G for some allocations. You
may be running out of memory because the kernel is not able to use all of
memory.

> Yep, that's what I intended. I want to tune my workstation so that it
> doesn't need to use swap. With my workload you can assume that nearly
> all processes are loaded for a reason and that at some point I want to
> use any one of them *now*.

Anonymous page use may fluctuate unpredictably based on the processes
being started. A spike may take out your system.

> > In order to find the reason for the OOM (which has not much to do with
> > memory, the kernel has been unable to reclaim memmory for some reason)
> > you need to look at the debug output that the kernel generates.
>
> I've attached a good example which shows Xorg invoking the OOM killer
> when the page-cache is using ~1.3GB out of 2GB. My understanding from
> reading linux-mm.org is that the kernel actually prefers to swap than
> reclaim from the page-cache... is this accurate? However, what if there
> is no swap, or it's configured not to use it? Should the kernel then try
> to reclaim instead?

The kernel will prefer to reclaim from clean page cache over
swapping in general. The aggressiveness of swap can be controlled via
/proc/sys/vm/swappiness.

The OOM killer output shows that the DMA is zone is marked as
unreclaimable but the allocation causing it is not using DMA. Other zones
all seem to have enough memory. Is this straight 2.6.31? No use of
containers, memory policies etc right?

2009-12-01 17:10:26

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 12/01/2009 09:37 PM, Christoph Lameter wrote:
> Post the kernel log output please.
>
>

Here are three logs from three days. Log3.txt is today's log and the OOM
killer murdered Thunderbird as I was attempting to write this message.
The kernel config is also attached.

Regards,
David.

Attachments:

log1.txt (54.69 kB)
log2.txt (80.47 kB)
log3.txt (52.59 kB)
config-2.6.31 (84.92 kB)
Download all attachments

2009-12-01 17:27:16

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Tue, 1 Dec 2009, David John wrote:

> Here are three logs from three days. Log3.txt is today's log and the OOM
> killer murdered Thunderbird as I was attempting to write this message.
> The kernel config is also attached.

Hmmm... This is all caused by the HIGHMEM zone freecount going beyond min
which then triggers reclaim which for some reason fails (should not there
is sufficient material there to reclaim). There is enough memory in the
NORMAL zone. Wonder if something broke in 2.6.31 in reclaim? Mel?

2009-12-01 17:32:53

[permalink] [raw]

Subject: Re: OOM kernel behaviour

There were some fixes to reclaim in 2.6.31.6 related to improper reclaim
of executable pages. Please try the latest point release with all fixes.

2009-12-01 18:07:04

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 01/12/09 16:08 Christoph Lameter said the following:

> Anonymous page use may fluctuate unpredictably based on the processes
> being started. A spike may take out your system.

I agree that swap should be there for these times. Once I'm happy that
the system behaves properly without it, I'll re-enable swap and set a
low swappiness value.

> The kernel will prefer to reclaim from clean page cache over
> swapping in general. The aggressiveness of swap can be controlled via
> /proc/sys/vm/swappiness.
>
> The OOM killer output shows that the DMA is zone is marked as
> unreclaimable but the allocation causing it is not using DMA. Other zones
> all seem to have enough memory. Is this straight 2.6.31? No use of
> containers, memory policies etc right?

Okay, so my kernel shouldn't be behaving like this?

It's Ubuntu's (Karmic) 2.6.31 .deb and I don't know exactly what they've
done to it, but AFAIK, no containers or policies. I'll compile 2.6.31.6
and test with that. Won't be able to do this until Friday, so will check
back in then.

Thanks,

J

2009-12-01 18:42:08

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Tue, 1 Dec 2009, Jonathan Miles wrote:

> Okay, so my kernel shouldn't be behaving like this?

No it should properly do reclaim.

2009-12-02 03:16:05

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 12/01/2009 11:02 PM, Christoph Lameter wrote:
> There were some fixes to reclaim in 2.6.31.6 related to improper reclaim
> of executable pages. Please try the latest point release with all fixes.
>
>
>

I'll wait for 2.6.32 to come out. I was planning to upgrade anyway.
I'll keep you updated.

Regards,
David.

2009-12-02 04:29:30

by KOSAKI Motohiro

[permalink] [raw]

Subject: Re: OOM kernel behaviour

> On 12/01/2009 09:37 PM, Christoph Lameter wrote:
> > Post the kernel log output please.
>
> Here are three logs from three days. Log3.txt is today's log and the OOM
> killer murdered Thunderbird as I was attempting to write this message.
> The kernel config is also attached.

> Nov 19 18:56:07 avalanche kernel: hald invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
> Nov 19 18:56:08 avalanche kernel: hald cpuset=/ mems_allowed=0
> Nov 19 18:56:08 avalanche kernel: Pid: 1455, comm: hald Tainted: P 2.6.31 #6

Proprietary module is loaded.

> Nov 19 18:56:08 avalanche kernel: Call Trace:
> Nov 19 18:56:08 avalanche kernel: [<ffffffff81409fc3>] ? _spin_unlock+0x26/0x31
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a3396>] oom_kill_process+0x91/0x250
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a3808>] ? select_bad_process+0x9e/0xfd
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a38f3>] __out_of_memory+0x8c/0x9b
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a3a60>] out_of_memory+0x15e/0x195
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a700c>] __alloc_pages_nodemask+0x494/0x584
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810c96e9>] alloc_pages_current+0x90/0x99
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a10e0>] __page_cache_alloc+0x5a/0x5c
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a0e37>] ? __lock_page+0x61/0x68
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a8c23>] __do_page_cache_readahead+0xcc/0x1ae
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a8d21>] ra_submit+0x1c/0x20
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810a15d8>] filemap_fault+0x183/0x34b
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810b57fa>] __do_fault+0x50/0x388
> Nov 19 18:56:08 avalanche kernel: [<ffffffff810b780e>] handle_mm_fault+0x2c6/0x6be
> Nov 19 18:56:08 avalanche kernel: [<ffffffff811d5f48>] ? __strncpy_from_user+0x27/0x51
> Nov 19 18:56:08 avalanche kernel: [<ffffffff8140a05c>] ? _spin_unlock_irqrestore+0x28/0x33
> Nov 19 18:56:08 avalanche kernel: [<ffffffff8140c3c0>] do_page_fault+0x210/0x225
> Nov 19 18:56:08 avalanche kernel: [<ffffffff8140a49f>] page_fault+0x1f/0x30
> Nov 19 18:56:08 avalanche kernel: Mem-Info:
> Nov 19 18:56:08 avalanche kernel: Node 0 DMA per-cpu:
> Nov 19 18:56:08 avalanche kernel: CPU 0: hi: 0, btch: 1 usd: 0
> Nov 19 18:56:08 avalanche kernel: CPU 1: hi: 0, btch: 1 usd: 0
> Nov 19 18:56:08 avalanche kernel: Node 0 DMA32 per-cpu:
> Nov 19 18:56:08 avalanche kernel: CPU 0: hi: 186, btch: 31 usd: 115
> Nov 19 18:56:08 avalanche kernel: CPU 1: hi: 186, btch: 31 usd: 57
> Nov 19 18:56:08 avalanche kernel: Active_anon:343191 active_file:191 inactive_anon:116407
> Nov 19 18:56:08 avalanche kernel: inactive_file:222 unevictable:10 dirty:0 writeback:0 unstable:0
> Nov 19 18:56:08 avalanche kernel: free:3453 slab:17391 mapped:5687 pagetables:5159 bounce:0
> Nov 19 18:56:08 avalanche kernel: Node 0 DMA free:7908kB min:40kB low:48kB high:60kB active_anon:4192kB inactive_anon:3480kB active_file:12kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:3 all_unreclaimable? no
> Nov 19 18:56:08 avalanche kernel: lowmem_reserve[]: 0 1967 1967 1967

Lots anon and very few file.

> Nov 19 18:56:08 avalanche kernel: Node 0 DMA32 free:5904kB min:5652kB low:7064kB high:8476kB active_anon:1368572kB inactive_anon:462148kB active_file:752kB inactive_file:888kB unevictable:40kB present:2014692kB pages_scanned:2656 all_unreclaimable? no
> Nov 19 18:56:08 avalanche kernel: lowmem_reserve[]: 0 0 0 0
> Nov 19 18:56:08 avalanche kernel: Node 0 DMA: 3*4kB 3*8kB 8*16kB 2*32kB 2*64kB 1*128kB 3*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 7908kB
> Nov 19 18:56:08 avalanche kernel: Node 0 DMA32: 474*4kB 7*8kB 19*16kB 12*32kB 17*64kB 5*128kB 2*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 5904kB
> Nov 19 18:56:08 avalanche kernel: 380799 total pagecache pages
> Nov 19 18:56:08 avalanche kernel: 11432 pages in swap cache
>
> Nov 19 18:56:08 avalanche kernel: Swap cache stats: add 786520, delete 775088, find 78519/102287
> Nov 19 18:56:08 avalanche kernel: Free swap = 0kB
> Nov 19 18:56:08 avalanche kernel: Total swap = 2048248kB

no free swap.

Probably, some program in your system leaked memory.

> Nov 19 18:56:08 avalanche kernel: 514751 pages RAM
> Nov 19 18:56:08 avalanche kernel: 9604 pages reserved
> Nov 19 18:56:08 avalanche kernel: 46363 pages shared
> Nov 19 18:56:08 avalanche kernel: 457311 pages non-shared
> Nov 19 18:56:08 avalanche kernel: Out of memory: kill process 2879 (run-mozilla.sh) score 547203 or a child
> Nov 19 18:56:08 avalanche kernel: Killed process 2936 (firefox)

2009-12-02 05:23:11

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 12/02/2009 09:59 AM, KOSAKI Motohiro wrote:
> Proprietary module is loaded.

That's the Broadcom wireless wl driver. It isn't causing the problem.

>
> Probably, some program in your system leaked memory.
>

Is there some way to find out? I tried to analyse top and ps output
but there isn't anything that stands out.

Regards,
David.

2009-12-02 05:30:58

by KOSAKI Motohiro

[permalink] [raw]

Subject: Re: OOM kernel behaviour

> On 12/02/2009 09:59 AM, KOSAKI Motohiro wrote:
> > Proprietary module is loaded.
>
> That's the Broadcom wireless wl driver. It isn't causing the problem.
>
> >
> > Probably, some program in your system leaked memory.
> >
>
> Is there some way to find out? I tried to analyse top and ps output
> but there isn't anything that stands out.

just idea.
/proc/{pid}/smaps show what file is swapped out. usually, leaked page
is swapped because nobody touched it.

2009-12-02 15:55:38

by Mel Gorman

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Tue, Dec 01, 2009 at 11:26:37AM -0600, Christoph Lameter wrote:
> On Tue, 1 Dec 2009, David John wrote:
>
> > Here are three logs from three days. Log3.txt is today's log and the OOM
> > killer murdered Thunderbird as I was attempting to write this message.
> > The kernel config is also attached.
>
> Hmmm... This is all caused by the HIGHMEM zone freecount going beyond min
> which then triggers reclaim which for some reason fails (should not there
> is sufficient material there to reclaim). There is enough memory in the
> NORMAL zone. Wonder if something broke in 2.6.31 in reclaim? Mel?
>

I'm not aware of breakage of that level, nor do I believe the page
allocator problems are related to this bug.

However, I just took a look at the logs from the three days and I see
things like

Nov 25 23:58:53 avalanche kernel: Free swap = 0kB
Nov 25 23:58:53 avalanche kernel: Total swap = 2048248kB

Something on that system is leaking badly. Do something like

ps aux --sort vsz

and see what process has gone mental and is consuming all of swap. It's
possible that the OOM killer is triggering too easily but it's possible
that a delayed triggering of the OOM killer would have been just that -
a delay. Eventually all memory and all swap would be used.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-04 20:36:07

[permalink] [raw]

Subject: Re: OOM kernel behaviour - 2.6.32

On 01/12/09 18:07 Jonathan Miles said the following:

> It's Ubuntu's (Karmic) 2.6.31 .deb and I don't know exactly what they've
> done to it, but AFAIK, no containers or policies. I'll compile 2.6.31.6
> and test with that. Won't be able to do this until Friday, so will check
> back in then.

Everything is fine having compiled straight 2.6.32. I watched the cache
being reclaimed with top as I used more and more RAM and didn't see the
OOM killer being invoked. So it was either a problem with 2.6.31 or with
whatever Ubuntu did to make the .deb.

Thanks,

--
Jonathan Miles <[email protected]>
http://www.linkedin.com/in/jonmiles

2009-12-07 05:33:29

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 12/02/2009 09:25 PM, Mel Gorman wrote:
> On Tue, Dec 01, 2009 at 11:26:37AM -0600, Christoph Lameter wrote:
>> On Tue, 1 Dec 2009, David John wrote:
>>
>>> Here are three logs from three days. Log3.txt is today's log and the OOM
>>> killer murdered Thunderbird as I was attempting to write this message.
>>> The kernel config is also attached.
>>
>> Hmmm... This is all caused by the HIGHMEM zone freecount going beyond min
>> which then triggers reclaim which for some reason fails (should not there
>> is sufficient material there to reclaim). There is enough memory in the
>> NORMAL zone. Wonder if something broke in 2.6.31 in reclaim? Mel?
>>
>
> I'm not aware of breakage of that level, nor do I believe the page
> allocator problems are related to this bug.
>
> However, I just took a look at the logs from the three days and I see
> things like
>
> Nov 25 23:58:53 avalanche kernel: Free swap = 0kB
> Nov 25 23:58:53 avalanche kernel: Total swap = 2048248kB
>
>
> Something on that system is leaking badly. Do something like
>
> ps aux --sort vsz
>
> and see what process has gone mental and is consuming all of swap. It's
> possible that the OOM killer is triggering too easily but it's possible
> that a delayed triggering of the OOM killer would have been just that -
> a delay. Eventually all memory and all swap would be used.
>

Hi All,

It is a leak in Compiz. Killing and restarting Compiz frees up the swap.
The issue is better in 2.6.32 for some reason. The funny thing is I've
been using Compiz with 2.6.31 for a couple of months now, with no
updates to either, so I'm not sure what triggered this problem.

Regards,
David.

2009-12-07 14:59:37

by Mel Gorman

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On Mon, Dec 07, 2009 at 11:04:24AM +0530, David John wrote:
> On 12/02/2009 09:25 PM, Mel Gorman wrote:
> > On Tue, Dec 01, 2009 at 11:26:37AM -0600, Christoph Lameter wrote:
> >> On Tue, 1 Dec 2009, David John wrote:
> >>
> >>> Here are three logs from three days. Log3.txt is today's log and the OOM
> >>> killer murdered Thunderbird as I was attempting to write this message.
> >>> The kernel config is also attached.
> >>
> >> Hmmm... This is all caused by the HIGHMEM zone freecount going beyond min
> >> which then triggers reclaim which for some reason fails (should not there
> >> is sufficient material there to reclaim). There is enough memory in the
> >> NORMAL zone. Wonder if something broke in 2.6.31 in reclaim? Mel?
> >>
> >
> > I'm not aware of breakage of that level, nor do I believe the page
> > allocator problems are related to this bug.
> >
> > However, I just took a look at the logs from the three days and I see
> > things like
> >
> > Nov 25 23:58:53 avalanche kernel: Free swap = 0kB
> > Nov 25 23:58:53 avalanche kernel: Total swap = 2048248kB
> >
> >
> > Something on that system is leaking badly. Do something like
> >
> > ps aux --sort vsz
> >
> > and see what process has gone mental and is consuming all of swap. It's
> > possible that the OOM killer is triggering too easily but it's possible
> > that a delayed triggering of the OOM killer would have been just that -
> > a delay. Eventually all memory and all swap would be used.
> >
>
> It is a leak in Compiz. Killing and restarting Compiz frees up the swap.
> The issue is better in 2.6.32 for some reason. The funny thing is I've
> been using Compiz with 2.6.31 for a couple of months now, with no
> updates to either, so I'm not sure what triggered this problem.
>

This is a total stab in the dark. Is it possible there was a change in
DRM between 2.6.31 and 2.6.32 that means resources (like textures) are
no longer been freed properly? This might be particularly the case if
you were not using KMS before but you are now.

If something like that has changed, it should probably be brought to the
attention of David Airlie.

If nothing in that regard has changed, I don't have a better alternative
theory as to why it's leaking now.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2009-12-08 03:40:33

[permalink] [raw]

Subject: Re: OOM kernel behaviour

On 12/07/2009 08:29 PM, Mel Gorman wrote:
> On Mon, Dec 07, 2009 at 11:04:24AM +0530, David John wrote:
>> On 12/02/2009 09:25 PM, Mel Gorman wrote:
>>> On Tue, Dec 01, 2009 at 11:26:37AM -0600, Christoph Lameter wrote:
>>>> On Tue, 1 Dec 2009, David John wrote:
>>>>
>>>>> Here are three logs from three days. Log3.txt is today's log and the OOM
>>>>> killer murdered Thunderbird as I was attempting to write this message.
>>>>> The kernel config is also attached.
>>>>
>>>> Hmmm... This is all caused by the HIGHMEM zone freecount going beyond min
>>>> which then triggers reclaim which for some reason fails (should not there
>>>> is sufficient material there to reclaim). There is enough memory in the
>>>> NORMAL zone. Wonder if something broke in 2.6.31 in reclaim? Mel?
>>>>
>>>
>>> I'm not aware of breakage of that level, nor do I believe the page
>>> allocator problems are related to this bug.
>>>
>>> However, I just took a look at the logs from the three days and I see
>>> things like
>>>
>>> Nov 25 23:58:53 avalanche kernel: Free swap = 0kB
>>> Nov 25 23:58:53 avalanche kernel: Total swap = 2048248kB
>>>
>>>
>>> Something on that system is leaking badly. Do something like
>>>
>>> ps aux --sort vsz
>>>
>>> and see what process has gone mental and is consuming all of swap. It's
>>> possible that the OOM killer is triggering too easily but it's possible
>>> that a delayed triggering of the OOM killer would have been just that -
>>> a delay. Eventually all memory and all swap would be used.
>>>
>>
>> It is a leak in Compiz. Killing and restarting Compiz frees up the swap.
>> The issue is better in 2.6.32 for some reason. The funny thing is I've
>> been using Compiz with 2.6.31 for a couple of months now, with no
>> updates to either, so I'm not sure what triggered this problem.
>>
>
> This is a total stab in the dark. Is it possible there was a change in
> DRM between 2.6.31 and 2.6.32 that means resources (like textures) are
> no longer been freed properly? This might be particularly the case if
> you were not using KMS before but you are now.
>
> If something like that has changed, it should probably be brought to the
> attention of David Airlie.
>
> If nothing in that regard has changed, I don't have a better alternative
> theory as to why it's leaking now.
>

No, I've had KMS enabled from before 2.6.31. There aren't any major
changes like that from 2.6.31 to 32. It's basically the same config.
Note that the same leak is present on 32, it's just that it takes
much longer to trigger the OOM killer.

I'm guessing it's an internal Compiz problem that I triggered while
changing some Compiz configuration, or it could be a leak in one of the
plugins.

Regards,
David.

2009-12-09 13:52:01