2008-06-14 10:13:18

by Marco Barbero

Subject: OOM killer invoked on 2.6.24.4

Hi list

ProLiant DL360 G5, Dual Xeon E5345 @ 2.33GHz, 4 GB RAM
kernel 2.6.24.4
samba/slapd/heartbeat/drbd/mon

Last night mon triggered a failover because slapd died.
This was in messages:


Jun 14 01:00:17 pippo kernel: sendmail-mta invoked oom-killer:
gfp_mask=0x800d0, order=0, oomkilladj=0
Jun 14 01:00:17 pippo kernel: Pid: 4520, comm: sendmail-mta Not
tainted 2.6.24.4 #2
Jun 14 01:00:17 pippo kernel: [<c014a819>] oom_kill_process+0x54/0xf8
Jun 14 01:00:17 pippo kernel: [<c014ac00>] out_of_memory+0x15c/0x190
Jun 14 01:00:17 pippo kernel: [<c014c7b2>] __alloc_pages+0x239/0x2c7
Jun 14 01:00:17 pippo kernel: [<c01661d5>] cp_new_stat64+0xfc/0x10e
Jun 14 01:00:17 pippo kernel: [<c014c879>] __get_free_pages+0x39/0x47
Jun 14 01:00:17 pippo kernel: [<c018f8cd>] proc_file_read+0x78/0x237
Jun 14 01:00:17 pippo kernel: [<c018f855>] proc_file_read+0x0/0x237
Jun 14 01:00:17 pippo kernel: [<c018c3c1>] proc_reg_read+0x5c/0x6f
Jun 14 01:00:17 pippo kernel: [<c018c365>] proc_reg_read+0x0/0x6f
Jun 14 01:00:17 pippo kernel: [<c0163f99>] vfs_read+0x9f/0x121
Jun 14 01:00:17 pippo kernel: [<c0164396>] sys_read+0x41/0x67
Jun 14 01:00:17 pippo kernel: [<c0103d56>] sysenter_past_esp+0x5f/0x85
Jun 14 01:00:17 pippo kernel: =======================
Jun 14 01:00:17 pippo kernel: Mem-info:
Jun 14 01:00:17 pippo kernel: DMA per-cpu:
Jun 14 01:00:17 pippo kernel: CPU 0: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 1: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 2: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 3: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 4: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 5: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 6: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 7: Hot: hi: 0, btch: 1 usd:
0 Cold: hi: 0, btch: 1 usd: 0
Jun 14 01:00:17 pippo kernel: Normal per-cpu:
Jun 14 01:00:17 pippo kernel: CPU 0: Hot: hi: 186, btch: 31 usd:
116 Cold: hi: 62, btch: 15 usd: 61
Jun 14 01:00:17 pippo kernel: CPU 1: Hot: hi: 186, btch: 31 usd:
37 Cold: hi: 62, btch: 15 usd: 49
Jun 14 01:00:17 pippo kernel: CPU 2: Hot: hi: 186, btch: 31 usd:
107 Cold: hi: 62, btch: 15 usd: 61
Jun 14 01:00:17 pippo kernel: CPU 3: Hot: hi: 186, btch: 31 usd:
41 Cold: hi: 62, btch: 15 usd: 56
Jun 14 01:00:17 pippo kernel: CPU 4: Hot: hi: 186, btch: 31 usd:
13 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 5: Hot: hi: 186, btch: 31 usd:
167 Cold: hi: 62, btch: 15 usd: 59
Jun 14 01:00:17 pippo kernel: CPU 6: Hot: hi: 186, btch: 31 usd:
61 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 7: Hot: hi: 186, btch: 31 usd:
102 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: HighMem per-cpu:
Jun 14 01:00:17 pippo kernel: CPU 0: Hot: hi: 186, btch: 31 usd:
2 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 1: Hot: hi: 186, btch: 31 usd:
33 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 2: Hot: hi: 186, btch: 31 usd:
91 Cold: hi: 62, btch: 15 usd: 12
Jun 14 01:00:17 pippo kernel: CPU 3: Hot: hi: 186, btch: 31 usd:
172 Cold: hi: 62, btch: 15 usd: 7
Jun 14 01:00:17 pippo kernel: CPU 4: Hot: hi: 186, btch: 31 usd:
16 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 5: Hot: hi: 186, btch: 31 usd:
63 Cold: hi: 62, btch: 15 usd: 11
Jun 14 01:00:17 pippo kernel: CPU 6: Hot: hi: 186, btch: 31 usd:
83 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: CPU 7: Hot: hi: 186, btch: 31 usd:
41 Cold: hi: 62, btch: 15 usd: 0
Jun 14 01:00:17 pippo kernel: Active:63299 inactive:6580 dirty:93
writeback:0 unstable:0
Jun 14 01:00:17 pippo kernel: free:745742 slab:211387 mapped:4303
pagetables:412 bounce:0
Jun 14 01:00:17 pippo kernel: DMA free:3548kB min:68kB low:84kB
high:100kB active:0kB inactive:4kB present:16256kB pages_scanned:25
all_unreclaimable? yes
Jun 14 01:00:17 pippo kernel: lowmem_reserve[]: 0 873 4810 4810
Jun 14 01:00:17 pippo kernel: Normal free:3736kB min:3744kB low:4680kB
high:5616kB active:4616kB inactive:4672kB present:894080kB
pages_scanned:14277 all_unreclaimable? yes
Jun 14 01:00:17 pippo kernel: lowmem_reserve[]: 0 0 31496 31496
Jun 14 01:00:17 pippo kernel: HighMem free:2975684kB min:512kB
low:4736kB high:8960kB active:248580kB inactive:21644kB
present:4031488kB pages_scanned:0 all_unreclaimable? no
Jun 14 01:00:17 pippo kernel: lowmem_reserve[]: 0 0 0 0
Jun 14 01:00:17 pippo kernel: DMA: 7*4kB 2*8kB 1*16kB 1*32kB 2*64kB
0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
Jun 14 01:00:17 pippo kernel: Normal: 58*4kB 5*8kB 2*16kB 2*32kB
5*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3760kB
Jun 14 01:00:17 pippo kernel: HighMem: 7371*4kB 5859*8kB 7946*16kB
6213*32kB 4077*64kB 3064*128kB 1859*256kB 889*512kB 452*1024kB
161*2048kB 48*4096kB = 2975684kB
Jun 14 01:00:17 pippo kernel: Swap cache: add 32, delete 32, find
16/19, race 0+0
Jun 14 01:00:17 pippo kernel: Free swap = 3903752kB
Jun 14 01:00:17 pippo kernel: Total swap = 3903752kB
Jun 14 01:00:17 pippo kernel: Free swap: 3903752kB
Jun 14 01:00:17 pippo kernel: 1245183 pages of RAM
Jun 14 01:00:17 pippo kernel: 1015807 pages of HIGHMEM
Jun 14 01:00:17 pippo kernel: 207787 reserved pages
Jun 14 01:00:17 pippo kernel: 19533 pages shared
Jun 14 01:00:17 pippo kernel: 0 pages swap cached
Jun 14 01:00:17 pippo kernel: 93 pages dirty
Jun 14 01:00:17 pippo kernel: 0 pages writeback
Jun 14 01:00:17 pippo kernel: 4303 pages mapped
Jun 14 01:00:17 pippo kernel: 211387 pages slab
Jun 14 01:00:17 pippo kernel: 412 pages pagetables
Jun 14 01:00:17 pippo kernel: Out of memory: kill process 6873 (slapd)
score 5003 or a child
Jun 14 01:00:17 pippo kernel: Killed process 6873 (slapd)



Can anyone help me understand what went wrong? And do I need to
upgrade to the latest kernel version?
Thanks in advance


2008-06-14 15:31:31

by Bart Van Assche

Subject: Re: OOM killer invoked on 2.6.24.4

On Sat, Jun 14, 2008 at 12:13 PM, Marco Barbero wrote:
> samba/slapd/heartbeat/drbd/mon
...
> Jun 14 01:00:17 pippo kernel: Out of memory: kill process 6873 (slapd)
> score 5003 or a child
> Jun 14 01:00:17 pippo kernel: Killed process 6873 (slapd)
>
> Anyone can help me in understanding what went wrong? And if I need to
> upgrade to last kernel version?

One or more processes on your system used too much memory. Dumping the
output of the following command periodically (e.g. every 10 minutes)
to a file will tell you which process is using too much memory:

{ ps aux | head -n 1; ps aux | sort -n +4 | tail -n 10; }
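
A note on the command above: the `+4` key syntax is obsolescent, and on many current systems sort treats `+4` as a file name; `-k 5` selects the same field (VSZ in `ps aux` output). What follows is only a minimal sketch of the same idea as a reusable function. The name top_by_vsz and the log path in the usage note are illustrative assumptions, not part of the original suggestion.

```shell
# Print the ps header plus the N largest processes by VSZ (column 5).
# Reads `ps aux`-style text on stdin, so it also works on saved output.
top_by_vsz() {
    n="${1:-10}"
    input=$(cat)
    # Header line first, unchanged.
    printf '%s\n' "$input" | head -n 1
    # Remaining lines sorted numerically on field 5, largest last.
    printf '%s\n' "$input" | tail -n +2 | sort -k5,5n | tail -n "$n"
}
```

It could be run every 10 minutes with something like `while true; do date; ps aux | top_by_vsz 10; sleep 600; done >> /var/tmp/ps-mem.log`. Keep in mind that this only shows user-space consumers; memory held by the kernel itself (slab, socket buffers) will not appear in ps output.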

Bart.

2008-06-14 17:34:21

by Marco Barbero

Subject: Re: OOM killer invoked on 2.6.24.4

2008/6/14 Bart Van Assche:
> On Sat, Jun 14, 2008 at 12:13 PM, Marco Barbero wrote:
>> samba/slapd/heartbeat/drbd/mon
> ...
>> Jun 14 01:00:17 pippo kernel: Out of memory: kill process 6873 (slapd)
>> score 5003 or a child
>> Jun 14 01:00:17 pippo kernel: Killed process 6873 (slapd)
>>
>> Anyone can help me in understanding what went wrong? And if I need to
>> upgrade to last kernel version?
>
> One or more processes on your system used too much memory. Dumping the
> output of the following command periodically (e.g. every 10 minutes)
> to a file will tell you which process is using too much memory:

> { ps aux | head -n 1; ps aux | sort -n +4 | tail -n 10; }


I'll do it. I have to say that this box had an uptime of 1 month and
never showed any issues. Also, the OOM killer was invoked at night, when
load is at its minimum.
Could it be related to slab and the quicklist leak?
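
On the slab question, the numbers in the report are suggestive: slab:211387 pages is 845548 kB at 4 kB per page (assumed here, the usual size on 32-bit x86), while ZONE_DMA and ZONE_NORMAL together have 16256 kB + 894080 kB = 910336 kB present. That would put slab at roughly 93% of low memory. To see which caches are responsible, one rough approach is to rank the entries of /proc/slabinfo by size. The sketch below assumes the usual slabinfo column order (name, active_objs, num_objs, objsize) and estimates each cache as num_objs * objsize; slab_top is a made-up name.

```shell
# List the N largest slab caches by approximate size (num_objs * objsize).
# Reads /proc/slabinfo-format text on stdin; the two header lines are skipped.
slab_top() {
    n="${1:-10}"
    awk '$1 !~ /^#/ && $1 != "slabinfo" && NF >= 4 {
             printf "%d kB %s\n", ($3 * $4) / 1024, $1
         }' | sort -k1,1n | tail -n "$n"
}
```

Typical use would be `slab_top 10 < /proc/slabinfo` (usually as root), logged alongside the ps output so that growth over time becomes visible.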

2008-06-14 19:23:32

by Bart Van Assche

Subject: Re: OOM killer invoked on 2.6.24.4

On Sat, Jun 14, 2008 at 7:34 PM, Marco Barbero wrote:
>> One or more processes on your system used too much memory. Dumping the
>> output of the following command periodically (e.g. every 10 minutes)
>> to a file will tell you which process is using too much memory:
>
>> { ps aux | head -n 1; ps aux | sort -n +4 | tail -n 10; }
>
> I'll do it. Have to say that this box had an uptime of 1 month and
> never show issues. Also oom was invoked in the night when load is
> minimum.
> Could be related to slab and the quicklist leak?

Are you running the kernel in 32-bit mode? Wouldn't it be better to
run a 64-bit kernel with 4 GB RAM?

Bart.

2008-06-16 19:56:09

by Marcelo Tosatti

Subject: Re: OOM killer invoked on 2.6.24.4

On Sat, Jun 14, 2008 at 05:31:18PM +0200, Bart Van Assche wrote:
> On Sat, Jun 14, 2008 at 12:13 PM, Marco Barbero wrote:
> > samba/slapd/heartbeat/drbd/mon
> ...
> > Jun 14 01:00:17 pippo kernel: Out of memory: kill process 6873 (slapd)
> > score 5003 or a child
> > Jun 14 01:00:17 pippo kernel: Killed process 6873 (slapd)
> >
> > Anyone can help me in understanding what went wrong? And if I need to
> > upgrade to last kernel version?
>
> One or more processes on your system used too much memory. Dumping the
> output of the following command periodically (e.g. every 10 minutes)
> to a file will tell you which process is using too much memory:
>
> { ps aux | head -n 1; ps aux | sort -n +4 | tail -n 10; }

There was plenty of free swap:

Jun 14 01:00:17 pippo kernel: Free swap = 3903752kB
Jun 14 01:00:17 pippo kernel: Total swap = 3903752kB
Jun 14 01:00:17 pippo kernel: Free swap: 3903752kB

Something is wrong.

2008-06-16 20:47:31

by Miquel van Smoorenburg

Subject: Re: OOM killer invoked on 2.6.24.4

On Mon, 2008-06-16 at 16:55 -0300, Marcelo Tosatti wrote:
> On Sat, Jun 14, 2008 at 05:31:18PM +0200, Bart Van Assche wrote:
> > On Sat, Jun 14, 2008 at 12:13 PM, Marco Barbero wrote:
> > > samba/slapd/heartbeat/drbd/mon
> > ...
> > > Jun 14 01:00:17 pippo kernel: Out of memory: kill process 6873 (slapd)
> > > score 5003 or a child
> > > Jun 14 01:00:17 pippo kernel: Killed process 6873 (slapd)
> > >
> > > Anyone can help me in understanding what went wrong? And if I need to
> > > upgrade to last kernel version?
> >
> > One or more processes on your system used too much memory. Dumping the
> > output of the following command periodically (e.g. every 10 minutes)
> > to a file will tell you which process is using too much memory:
> >
> > { ps aux | head -n 1; ps aux | sort -n +4 | tail -n 10; }
>
> There were lots of swap free:
>
> Jun 14 01:00:17 pippo kernel: Free swap = 3903752kB
> Jun 14 01:00:17 pippo kernel: Total swap = 3903752kB
> Jun 14 01:00:17 pippo kernel: Free swap: 3903752kB
>
> Something is wrong.

It was a normal GFP_KERNEL allocation:

Jun 14 01:00:17 pippo kernel: sendmail-mta invoked oom-killer:
gfp_mask=0x800d0, order=0, oomkilladj=0
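
To make the mask easier to read, here is a small sketch that decodes its low bits. The flag values are assumed from include/linux/gfp.h of that era (they differ in current kernels), and decode_gfp is just an illustrative name. The point to notice is that __GFP_HIGHMEM is not set, so the request cannot be served from the HighMem zone no matter how much is free there. The remaining 0x80000 bit appears to be __GFP_RECLAIMABLE in 2.6.24, which would make this GFP_TEMPORARY rather than plain GFP_KERNEL, but treat that as an assumption.

```shell
# Decode the low bits of a gfp_mask. Only flags whose values are assumed
# known for 2.6-era kernels are named; the rest is printed as raw hex.
decode_gfp() {
    mask=$(( $1 ))
    names=""
    for pair in 0x01:__GFP_DMA 0x02:__GFP_HIGHMEM 0x10:__GFP_WAIT \
                0x20:__GFP_HIGH 0x40:__GFP_IO 0x80:__GFP_FS; do
        bit=${pair%%:*}
        if [ $(( mask & $bit )) -ne 0 ]; then
            names="$names ${pair#*:}"
            mask=$(( mask & ~$bit ))   # clear the bit we just named
        fi
    done
    if [ "$mask" -ne 0 ]; then
        names="$names other=$(printf '0x%x' "$mask")"
    fi
    echo "${names# }"
}

decode_gfp 0x800d0
```

The three named bits (__GFP_WAIT, __GFP_IO, __GFP_FS) are what GFP_KERNEL consists of.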

And the system /is/ low on memory:

Jun 14 01:00:17 pippo kernel: DMA free:3548kB min:68kB low:84kB
high:100kB active:0kB inactive:4kB present:16256kB pages_scanned:25
all_unreclaimable? yes
Jun 14 01:00:17 pippo kernel: lowmem_reserve[]: 0 873 4810 4810
Jun 14 01:00:17 pippo kernel: Normal free:3736kB min:3744kB low:4680kB
high:5616kB active:4616kB inactive:4672kB present:894080kB
pages_scanned:14277 all_unreclaimable? yes
Jun 14 01:00:17 pippo kernel: lowmem_reserve[]: 0 0 31496 31496
Jun 14 01:00:17 pippo kernel: HighMem free:2975684kB min:512kB
low:4736kB high:8960kB active:248580kB inactive:21644kB
present:4031488kB pages_scanned:0 all_unreclaimable? no

High memory and swap were still available, but ZONE_DMA and ZONE_NORMAL
were effectively exhausted and both report "all_unreclaimable? yes". A
GFP_KERNEL allocation cannot be satisfied from HighMem, so
__alloc_pages(GFP_KERNEL) fails.
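
The arithmetic behind this can be sketched as follows. It is a simplified model of the order-0 watermark test as I understand it for 2.6-era kernels (a zone is usable only if its free pages exceed min plus the lowmem reserve that applies to the request), not the kernel's actual code. 4 kB pages are assumed, zone_usable is a made-up name, and the choice of 873 pages (index 1 of ZONE_DMA's lowmem_reserve[] line) as the reserve for a Normal-class request is also an assumption.

```shell
# Simplified order-0 watermark check.
# Arguments: free_kB min_kB reserve_pages [page_kB, default 4]
zone_usable() {
    page_kb="${4:-4}"
    free_pages=$(( $1 / page_kb ))
    min_pages=$(( $2 / page_kb ))
    need=$(( min_pages + $3 ))
    if [ "$free_pages" -gt "$need" ]; then
        echo "usable: $free_pages free pages > $need required"
    else
        echo "blocked: $free_pages free pages <= $need required"
    fi
}

zone_usable 3548 68 873    # ZONE_DMA: free 3548kB, min 68kB, reserve 873 pages
zone_usable 3736 3744 0    # ZONE_NORMAL: free 3736kB, min 3744kB, no reserve
```

Under these assumptions both zones come out as blocked, which would explain why ZONE_DMA does not help even though its free figure is well above its own min.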

It might be TCP buffers; see
http://marc.info/?l=linux-netdev&m=121362441431941&w=2

If so, try this workaround:

echo "98304 131072 196608" > /proc/sys/net/ipv4/tcp_mem
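
For reference, the three tcp_mem values are page counts (low, pressure, high), not bytes. A quick conversion, assuming 4 kB pages (pages_to_mib is an illustrative helper):

```shell
# Convert a page count to MiB; page size in kB defaults to 4.
pages_to_mib() {
    echo $(( $1 * ${2:-4} / 1024 ))
}

for p in 98304 131072 196608; do
    echo "$p pages = $(pages_to_mib "$p") MiB"
done
```

The current values can be read with `cat /proc/sys/net/ipv4/tcp_mem`, and `sysctl -w net.ipv4.tcp_mem="98304 131072 196608"` should be equivalent to the echo above. Either way the setting is lost at reboot unless it is also added to /etc/sysctl.conf.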

Mike.