2005-09-20 21:38:00

by Chris Friesen

Subject: help interpreting oom-killer output


I'm running a modified 2.6.10 on an x86 uniprocessor system. I keep
having processes killed by the oom killer at the same place while
running LTP. The system has gigs of memory, so I find this kind of odd.

Could someone help me interpret the oom-killer output? The first log
looks like this.


oom-killer: gfp_mask=0xd0
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16

Free pages: 2473820kB (2468096kB HighMem)
Active:2831 inactive:393 dirty:2 writeback:0 unstable:0 free:618551
slab:215125 mapped:1760 pagetables:107
DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:0kB
present:16384kB pages_scanned:0 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:7832kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:0kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2468096kB min:512kB low:640kB high:768kB active:11324kB
inactive:1572kB present:3276800kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 68kB
Normal: 10*4kB 8*8kB 13*16kB 1*32kB 5*64kB 1*128kB 7*256kB 3*512kB
2*1024kB 2*2048kB 0*4096kB = 10264kB
HighMem: 376*4kB 364*8kB 292*16kB 136*32kB 96*64kB 61*128kB 30*256kB
20*512kB 18*1024kB 16*2048kB 579*4096kB = 2468096kB
Out of Memory: Killed process 17664 (bash).



Thanks,

Chris


2005-09-20 23:54:05

by Robert Hancock

Subject: Re: help interpreting oom-killer output

Christopher Friesen wrote:
>
> I'm running a modified 2.6.10 on an x86 uniprocessor system. I keep
> having processes killed by the oom killer at the same place while
> running LTP. The system has gigs of memory, so I find this kind of odd.
>
> Could someone help me interpret the oom-killer output? The first log
> looks like this.

Looks like you were running out of ZONE_NORMAL memory (below 896MB).
There is lots of high memory available but the allocation could not be
satisfied from there.

I would try a newer kernel..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2005-09-21 05:03:34

by Chris Friesen

Subject: Re: help interpreting oom-killer output

Robert Hancock wrote:

> Looks like you were running out of ZONE_NORMAL memory (below 896MB).
> There is lots of high memory available but the allocation could not be
> satisfied from there.

Thanks for the interpretation.

> I would try a newer kernel..

I wish. A newer kernel is not an option; any fixes need to be back-ported.

Chris

2005-09-21 15:39:48

by Marcelo Tosatti

Subject: Re: help interpreting oom-killer output

On Tue, Sep 20, 2005 at 03:37:53PM -0600, Christopher Friesen wrote:
>
> I'm running a modified 2.6.10 on an x86 uniprocessor system. I keep
> having processes killed by the oom killer at the same place while
> running LTP. The system has gigs of memory, so I find this kind of odd.
>
> Could someone help me interpret the oom-killer output? The first log
> looks like this.
>
>
> oom-killer: gfp_mask=0xd0
> DMA per-cpu:
> cpu 0 hot: low 2, high 6, batch 1
> cpu 0 cold: low 0, high 2, batch 1
> Normal per-cpu:
> cpu 0 hot: low 32, high 96, batch 16
> cpu 0 cold: low 0, high 32, batch 16
> HighMem per-cpu:
> cpu 0 hot: low 32, high 96, batch 16
> cpu 0 cold: low 0, high 32, batch 16
>
> Free pages: 2473820kB (2468096kB HighMem)
> Active:2831 inactive:393 dirty:2 writeback:0 unstable:0 free:618551
> slab:215125 mapped:1760 pagetables:107
> DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:0kB
> present:16384kB pages_scanned:0 all_unreclaimable? yes

See that the DMA zone free count is equal to the "min" watermark. Normal
and Highmem are both above the "high" watermark.

So this must be a DMA allocation (see gfp_mask). Stick a dump_stack()
into the allocation path to find out who the allocator is.

There have been a lot of changes since v2.6.10 in the OOM killer and reclaim
path.

> protections[]: 0 0 0
> Normal free:7832kB min:3756kB low:4692kB high:5632kB active:0kB
> inactive:0kB present:901120kB pages_scanned:0 all_unreclaimable? no
> protections[]: 0 0 0
> HighMem free:2468096kB min:512kB low:640kB high:768kB active:11324kB
> inactive:1572kB present:3276800kB pages_scanned:0 all_unreclaimable? no
> protections[]: 0 0 0
> DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
> 0*2048kB 0*4096kB = 68kB
> Normal: 10*4kB 8*8kB 13*16kB 1*32kB 5*64kB 1*128kB 7*256kB 3*512kB
> 2*1024kB 2*2048kB 0*4096kB = 10264kB
> HighMem: 376*4kB 364*8kB 292*16kB 136*32kB 96*64kB 61*128kB 30*256kB
> 20*512kB 18*1024kB 16*2048kB 579*4096kB = 2468096kB
> Out of Memory: Killed process 17664 (bash).

2005-09-21 16:07:39

by Chris Friesen

Subject: Re: help interpreting oom-killer output

Marcelo Tosatti wrote:

> See that the DMA zone free count is equal to the "min" watermark. Normal
> and Highmem are both above the "high" watermark.
>
> So this must be a DMA allocation (see gfp_mask). Stick a "dump_stack()"
> to find out who is the allocator.

The final trigger may be a DMA allocation, but the initial cause is
whatever is chewing up all the NORMAL memory.

I can repeatably trigger the fault by running LTP. When it hits the
"rename14" test, the oom killer kicks in. Before running this test, I
had over 3GB of memory free, including over 800MB of normal memory.

To track it down, I started dumping /proc/slabinfo every second while
running this test. It appears the culprit is the dentry_cache, which
consumed at least 817MB of memory (and probably peaked higher than
that). As soon as the test program died, all the memory was freed.

Anyone have any ideas what's going on?

Chris

2005-09-21 17:35:37

by Chris Friesen

Subject: Re: help interpreting oom-killer output

Marcelo Tosatti wrote:
> On Tue, Sep 20, 2005 at 03:37:53PM -0600, Christopher Friesen wrote:

>>oom-killer: gfp_mask=0xd0

> So this must be a DMA allocation (see gfp_mask). Stick a "dump_stack()"
> to find out who is the allocator.

Checking in gfp.h, I see:

#define __GFP_DMA 0x01
#define __GFP_HIGHMEM 0x02
#define __GFP_WAIT 0x10 /* Can wait and reschedule? */
#define __GFP_HIGH 0x20 /* Should access emergency pools? */
#define __GFP_IO 0x40 /* Can start physical IO? */
#define __GFP_FS 0x80 /* Can call down to low-level FS? */
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

Thus, it looks like it's not a DMA allocation. By my reading, it
appears to be a standard GFP_KERNEL allocation.

Chris