2008-01-31 13:07:23

by Claude Frantz

Subject: OOM-killer invoked but why ?

Hello !

I'm facing a problem where the OOM-killer is invoked but I cannot find
the reason why. The machine is rather powerful, the load is very moderate,
and the disk swap space is nearly unused. The only strange thing I have
observed is a slow but steady decrease of kbbuffers over many hours.

Can you help me to diagnose the problem and to find a good solution ?

Thanks a lot !

Claude


kernel: 2.6.22.14-72.fc6 (Fedora 6)

"sar -r" output:

12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
12:10:01 AM 1739920 1635056 48.45 9368 135620 8192960 148 0.00 0
12:20:01 AM 1691180 1683796 49.89 8644 162992 8192960 148 0.00 0
12:30:01 AM 1732076 1642900 48.68 8608 141168 8192960 148 0.00 0
12:40:01 AM 1766308 1608668 47.66 8128 134744 8192960 148 0.00 0
12:50:01 AM 1718156 1656820 49.09 6884 134288 8192960 148 0.00 0
01:00:01 AM 1728448 1646528 48.79 6476 137912 8192960 148 0.00 0
01:10:01 AM 1707652 1667324 49.40 5792 156572 8192960 148 0.00 0
01:20:01 AM 1736928 1638048 48.54 6368 138872 8192960 148 0.00 0
01:30:02 AM 1776288 1598688 47.37 5412 145136 8192960 148 0.00 0
01:40:01 AM 1780456 1594520 47.25 5464 150536 8192960 148 0.00 0
01:50:01 AM 1744856 1630120 48.30 4960 154732 8192960 148 0.00 0
02:00:02 AM 1687012 1687964 50.01 3996 171048 8192960 148 0.00 0
02:10:01 AM 1696020 1678956 49.75 3916 145424 8192960 148 0.00 0
02:20:02 AM 1740864 1634112 48.42 4340 142900 8192960 148 0.00 0
02:30:01 AM 1769460 1605516 47.57 3516 138056 8192960 148 0.00 0
02:40:02 AM 1764376 1610600 47.72 3184 138844 8192960 148 0.00 0
02:50:02 AM 1702100 1672876 49.57 3736 157448 8192960 148 0.00 0
03:00:01 AM 1750396 1624580 48.14 3556 141016 8192960 148 0.00 0
03:10:02 AM 1744168 1630808 48.32 1900 136612 8192960 148 0.00 0
03:20:01 AM 1749388 1625588 48.17 1012 136804 8192960 148 0.00 0
03:30:01 AM 1728028 1646948 48.80 1980 139104 8192960 148 0.00 0
03:40:01 AM 1718596 1656380 49.08 1136 156932 8192960 148 0.00 0
03:50:02 AM 1692684 1682292 49.85 768 140808 8192960 148 0.00 0
~~~~~~ OOM-killer in action. Then reboot.
07:30:01 AM 2134568 1240408 36.75 233624 506224 8193108 0 0.00 0
07:40:01 AM 2104412 1270564 37.65 252204 524220 8193108 0 0.00 0
07:50:01 AM 2049712 1325264 39.27 265368 527096 8193108 0 0.00 0
08:00:01 AM 1813652 1561324 46.26 281708 527296 8193108 0 0.00 0

The values in /proc/sys/vm :

/proc/sys/vm/overcommit_memory
0
/proc/sys/vm/panic_on_oom
0
/proc/sys/vm/overcommit_ratio
50
/proc/sys/vm/page-cluster
3
/proc/sys/vm/dirty_background_ratio
5
/proc/sys/vm/dirty_ratio
10
/proc/sys/vm/dirty_writeback_centisecs
499
/proc/sys/vm/dirty_expire_centisecs
2999
/proc/sys/vm/nr_pdflush_threads
2
/proc/sys/vm/swappiness
60
/proc/sys/vm/nr_hugepages
0
/proc/sys/vm/hugetlb_shm_group
0
/proc/sys/vm/lowmem_reserve_ratio
256 32
/proc/sys/vm/drop_caches
0
/proc/sys/vm/min_free_kbytes
3816
/proc/sys/vm/percpu_pagelist_fraction
0
/proc/sys/vm/max_map_count
65536
/proc/sys/vm/laptop_mode
0
/proc/sys/vm/block_dump
0
/proc/sys/vm/vfs_cache_pressure
100
/proc/sys/vm/legacy_va_layout
0
/proc/sys/vm/stat_interval
1
/proc/sys/vm/vdso_enabled
1

The syslog extract:

Jan 28 03:50:24 toaster kernel: ps invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Jan 28 03:50:24 toaster kernel: [<c045cf52>] out_of_memory+0x69/0x1a7
Jan 28 03:50:24 toaster kernel: [<c045e3bb>] __alloc_pages+0x216/0x2a0
Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
Jan 28 03:50:24 toaster kernel: [<c045e471>] __get_free_pages+0x2c/0x3a
Jan 28 03:50:24 toaster kernel: [<c04a6f57>] proc_info_read+0x39/0x9d
Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
Jan 28 03:50:24 toaster kernel: [<c0477dda>] vfs_read+0xa6/0x158
Jan 28 03:50:24 toaster kernel: [<c0478238>] sys_read+0x41/0x67
Jan 28 03:50:24 toaster kernel: [<c0404fa2>] syscall_call+0x7/0xb
Jan 28 03:50:24 toaster kernel: =======================
Jan 28 03:50:24 toaster kernel: Mem-info:
Jan 28 03:50:24 toaster kernel: DMA per-cpu:
Jan 28 03:50:35 toaster xinetd[3182]: START: time-dgram pid=0 from=137.193.74.3
Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Jan 28 03:50:48 toaster kernel: Normal per-cpu:
Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 58 Cold: hi: 62, btch: 15 usd: 60
Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 34 Cold: hi: 62, btch: 15 usd: 60
Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 42 Cold: hi: 62, btch: 15 usd: 52
Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 85 Cold: hi: 62, btch: 15 usd: 51
Jan 28 03:50:49 toaster kernel: HighMem per-cpu:
Jan 28 03:50:49 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 176 Cold: hi: 62, btch: 15 usd: 9
Jan 28 03:50:49 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 57 Cold: hi: 62, btch: 15 usd: 14
Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 143 Cold: hi: 62, btch: 15 usd: 11
Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 55 Cold: hi: 62, btch: 15 usd: 0
Jan 28 03:50:49 toaster kernel: Active:186294 inactive:2340 dirty:6 writeback:55 unstable:0
Jan 28 03:50:49 toaster kernel: free:431675 slab:177466 mapped:7100 pagetables:1915 bounce:0
Jan 28 03:50:49 toaster kernel: DMA free:3544kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 873 3285
Jan 28 03:50:49 toaster kernel: Normal free:3684kB min:3744kB low:4680kB high:5616kB active:212kB inactive:112kB present:894080kB pages_scanned:365 all_unreclaimable? yes
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 19300
Jan 28 03:50:49 toaster kernel: HighMem free:1719472kB min:512kB low:3100kB high:5688kB active:744964kB inactive:9248kB present:2470404kB pages_scanned:0 all_unreclaimable? no
Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 0
Jan 28 03:50:49 toaster kernel: DMA: 3*4kB 4*8kB 3*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
Jan 28 03:50:49 toaster kernel: Normal: 30*4kB 29*8kB 8*16kB 1*32kB 3*64kB 5*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3904kB
Jan 28 03:50:49 toaster kernel: HighMem: 5991*4kB 8849*8kB 18804*16kB 12622*32kB 7820*64kB 2500*128kB 354*256kB 15*512kB 1*1024kB 0*2048kB 0*4096kB = 1719332kB
Jan 28 03:50:49 toaster kernel: Swap cache: add 37, delete 37, find 0/0, race 0+0
Jan 28 03:50:49 toaster kernel: Free swap = 8192960kB
Jan 28 03:50:49 toaster kernel: Total swap = 8193108kB
Jan 28 03:50:49 toaster kernel: Free swap: 8192960kB
Jan 28 03:50:49 toaster kernel: 851840 pages of RAM
Jan 28 03:50:49 toaster kernel: 622464 pages of HIGHMEM
Jan 28 03:50:49 toaster kernel: 8096 reserved pages
Jan 28 03:50:49 toaster kernel: 638310 pages shared
Jan 28 03:50:49 toaster kernel: 0 pages swap cached
Jan 28 03:50:49 toaster kernel: 6 pages dirty
Jan 28 03:50:49 toaster kernel: 55 pages writeback
Jan 28 03:50:49 toaster kernel: 7100 pages mapped
Jan 28 03:50:49 toaster kernel: 177466 pages slab
Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)

from "lspci":

SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)


2008-01-31 14:36:58

by Peter Zijlstra

Subject: Re: OOM-killer invoked but why ?


On Thu, 2008-01-31 at 13:53 +0100, Claude Frantz wrote:
> Hello !
>
> I'm faced to a problem where the OOM-killer is invoked but I cannot find
> the reason why. The machine is rather powerfull, the load is very moderate,
> the disk swap space is nearly unused. The only strange observation which
> appears to me is the slow but progressive decreasing of kbbuffers during
> many hours.
>
> Can you help me to diagnose the problem and to find a good solution ?
>
> Thanks a lot !
>
> Claude
>
>
> kernel: 2.6.22.14-72.fc6 (Fedora 6)
>
> "sar -r" output:
>
> 12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
> 12:10:01 AM 1739920 1635056 48.45 9368 135620 8192960 148 0.00 0
> 12:20:01 AM 1691180 1683796 49.89 8644 162992 8192960 148 0.00 0
> 12:30:01 AM 1732076 1642900 48.68 8608 141168 8192960 148 0.00 0
> 12:40:01 AM 1766308 1608668 47.66 8128 134744 8192960 148 0.00 0
> 12:50:01 AM 1718156 1656820 49.09 6884 134288 8192960 148 0.00 0
> 01:00:01 AM 1728448 1646528 48.79 6476 137912 8192960 148 0.00 0
> 01:10:01 AM 1707652 1667324 49.40 5792 156572 8192960 148 0.00 0
> 01:20:01 AM 1736928 1638048 48.54 6368 138872 8192960 148 0.00 0
> 01:30:02 AM 1776288 1598688 47.37 5412 145136 8192960 148 0.00 0
> 01:40:01 AM 1780456 1594520 47.25 5464 150536 8192960 148 0.00 0
> 01:50:01 AM 1744856 1630120 48.30 4960 154732 8192960 148 0.00 0
> 02:00:02 AM 1687012 1687964 50.01 3996 171048 8192960 148 0.00 0
> 02:10:01 AM 1696020 1678956 49.75 3916 145424 8192960 148 0.00 0
> 02:20:02 AM 1740864 1634112 48.42 4340 142900 8192960 148 0.00 0
> 02:30:01 AM 1769460 1605516 47.57 3516 138056 8192960 148 0.00 0
> 02:40:02 AM 1764376 1610600 47.72 3184 138844 8192960 148 0.00 0
> 02:50:02 AM 1702100 1672876 49.57 3736 157448 8192960 148 0.00 0
> 03:00:01 AM 1750396 1624580 48.14 3556 141016 8192960 148 0.00 0
> 03:10:02 AM 1744168 1630808 48.32 1900 136612 8192960 148 0.00 0
> 03:20:01 AM 1749388 1625588 48.17 1012 136804 8192960 148 0.00 0
> 03:30:01 AM 1728028 1646948 48.80 1980 139104 8192960 148 0.00 0
> 03:40:01 AM 1718596 1656380 49.08 1136 156932 8192960 148 0.00 0
> 03:50:02 AM 1692684 1682292 49.85 768 140808 8192960 148 0.00 0
> ~~~~~~ OOM-killer in action. Then reboot.
> 07:30:01 AM 2134568 1240408 36.75 233624 506224 8193108 0 0.00 0
> 07:40:01 AM 2104412 1270564 37.65 252204 524220 8193108 0 0.00 0
> 07:50:01 AM 2049712 1325264 39.27 265368 527096 8193108 0 0.00 0
> 08:00:01 AM 1813652 1561324 46.26 281708 527296 8193108 0 0.00 0
>
> The values in /proc/sys/vm :
>
> /proc/sys/vm/overcommit_memory
> 0
> /proc/sys/vm/panic_on_oom
> 0
> /proc/sys/vm/overcommit_ratio
> 50
> /proc/sys/vm/page-cluster
> 3
> /proc/sys/vm/dirty_background_ratio
> 5
> /proc/sys/vm/dirty_ratio
> 10
> /proc/sys/vm/dirty_writeback_centisecs
> 499
> /proc/sys/vm/dirty_expire_centisecs
> 2999
> /proc/sys/vm/nr_pdflush_threads
> 2
> /proc/sys/vm/swappiness
> 60
> /proc/sys/vm/nr_hugepages
> 0
> /proc/sys/vm/hugetlb_shm_group
> 0
> /proc/sys/vm/lowmem_reserve_ratio
> 256 32
> /proc/sys/vm/drop_caches
> 0
> /proc/sys/vm/min_free_kbytes
> 3816
> /proc/sys/vm/percpu_pagelist_fraction
> 0
> /proc/sys/vm/max_map_count
> 65536
> /proc/sys/vm/laptop_mode
> 0
> /proc/sys/vm/block_dump
> 0
> /proc/sys/vm/vfs_cache_pressure
> 100
> /proc/sys/vm/legacy_va_layout
> 0
> /proc/sys/vm/stat_interval
> 1
> /proc/sys/vm/vdso_enabled
> 1
>
> The syslog extract:
>
> Jan 28 03:50:24 toaster kernel: ps invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
> Jan 28 03:50:24 toaster kernel: [<c045cf52>] out_of_memory+0x69/0x1a7
> Jan 28 03:50:24 toaster kernel: [<c045e3bb>] __alloc_pages+0x216/0x2a0
> Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
> Jan 28 03:50:24 toaster kernel: [<c045e471>] __get_free_pages+0x2c/0x3a
> Jan 28 03:50:24 toaster kernel: [<c04a6f57>] proc_info_read+0x39/0x9d
> Jan 28 03:50:24 toaster kernel: [<c04a6f1e>] proc_info_read+0x0/0x9d
> Jan 28 03:50:24 toaster kernel: [<c0477dda>] vfs_read+0xa6/0x158
> Jan 28 03:50:24 toaster kernel: [<c0478238>] sys_read+0x41/0x67
> Jan 28 03:50:24 toaster kernel: [<c0404fa2>] syscall_call+0x7/0xb
> Jan 28 03:50:24 toaster kernel: =======================
> Jan 28 03:50:24 toaster kernel: Mem-info:
> Jan 28 03:50:24 toaster kernel: DMA per-cpu:
> Jan 28 03:50:35 toaster xinetd[3182]: START: time-dgram pid=0 from=137.193.74.3
> Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> Jan 28 03:50:48 toaster kernel: CPU 2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> Jan 28 03:50:48 toaster kernel: CPU 3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> Jan 28 03:50:48 toaster kernel: Normal per-cpu:
> Jan 28 03:50:48 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 58 Cold: hi: 62, btch: 15 usd: 60
> Jan 28 03:50:48 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 34 Cold: hi: 62, btch: 15 usd: 60
> Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 42 Cold: hi: 62, btch: 15 usd: 52
> Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 85 Cold: hi: 62, btch: 15 usd: 51
> Jan 28 03:50:49 toaster kernel: HighMem per-cpu:
> Jan 28 03:50:49 toaster kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 176 Cold: hi: 62, btch: 15 usd: 9
> Jan 28 03:50:49 toaster kernel: CPU 1: Hot: hi: 186, btch: 31 usd: 57 Cold: hi: 62, btch: 15 usd: 14
> Jan 28 03:50:49 toaster kernel: CPU 2: Hot: hi: 186, btch: 31 usd: 143 Cold: hi: 62, btch: 15 usd: 11
> Jan 28 03:50:49 toaster kernel: CPU 3: Hot: hi: 186, btch: 31 usd: 55 Cold: hi: 62, btch: 15 usd: 0
> Jan 28 03:50:49 toaster kernel: Active:186294 inactive:2340 dirty:6 writeback:55 unstable:0
> Jan 28 03:50:49 toaster kernel: free:431675 slab:177466 mapped:7100 pagetables:1915 bounce:0
> Jan 28 03:50:49 toaster kernel: DMA free:3544kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? yes
> Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 873 3285
> Jan 28 03:50:49 toaster kernel: Normal free:3684kB min:3744kB low:4680kB high:5616kB active:212kB inactive:112kB present:894080kB pages_scanned:365 all_unreclaimable? yes
> Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 19300
> Jan 28 03:50:49 toaster kernel: HighMem free:1719472kB min:512kB low:3100kB high:5688kB active:744964kB inactive:9248kB present:2470404kB pages_scanned:0 all_unreclaimable? no
> Jan 28 03:50:49 toaster kernel: lowmem_reserve[]: 0 0 0
> Jan 28 03:50:49 toaster kernel: DMA: 3*4kB 4*8kB 3*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3548kB
> Jan 28 03:50:49 toaster kernel: Normal: 30*4kB 29*8kB 8*16kB 1*32kB 3*64kB 5*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3904kB
> Jan 28 03:50:49 toaster kernel: HighMem: 5991*4kB 8849*8kB 18804*16kB 12622*32kB 7820*64kB 2500*128kB 354*256kB 15*512kB 1*1024kB 0*2048kB 0*4096kB = 1719332kB
> Jan 28 03:50:49 toaster kernel: Swap cache: add 37, delete 37, find 0/0, race 0+0
> Jan 28 03:50:49 toaster kernel: Free swap = 8192960kB
> Jan 28 03:50:49 toaster kernel: Total swap = 8193108kB
> Jan 28 03:50:49 toaster kernel: Free swap: 8192960kB
> Jan 28 03:50:49 toaster kernel: 851840 pages of RAM
> Jan 28 03:50:49 toaster kernel: 622464 pages of HIGHMEM
> Jan 28 03:50:49 toaster kernel: 8096 reserved pages
> Jan 28 03:50:49 toaster kernel: 638310 pages shared
> Jan 28 03:50:49 toaster kernel: 0 pages swap cached
> Jan 28 03:50:49 toaster kernel: 6 pages dirty
> Jan 28 03:50:49 toaster kernel: 55 pages writeback
> Jan 28 03:50:49 toaster kernel: 7100 pages mapped
> Jan 28 03:50:49 toaster kernel: 177466 pages slab
> Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)


You seem to have run out of ZONE_NORMAL memory, with all of it stuck in
kernel allocations. Would you have /proc/slabinfo available?
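
To see why the large amount of free HighMem does not help the failing allocation, the numbers from the OOM report can be replayed in a small userspace sketch. This is not kernel code: the flag bit values are assumptions recalled from 2.6.22-era include/linux/gfp.h, the watermark test is a simplified order-0 form of what the page allocator does, and all names (zone_sketch, highest_zone, zone_has_room, usable_zones) are made up for illustration.

```c
#include <stdbool.h>

/* Assumed 2.6.22-era gfp flag bits; WAIT | IO | FS gives the reported 0xd0. */
#define SKETCH_GFP_DMA     0x01u
#define SKETCH_GFP_HIGHMEM 0x02u
#define SKETCH_GFP_WAIT    0x10u
#define SKETCH_GFP_IO      0x40u
#define SKETCH_GFP_FS      0x80u

enum { Z_DMA, Z_NORMAL, Z_HIGHMEM, Z_COUNT };

/* Per-zone figures copied from the OOM report, converted to 4 kB pages.
 * lowmem_reserve[] is printed by the kernel in pages already. */
struct zone_sketch {
    const char *name;
    unsigned long free_pages;
    unsigned long min_pages;
    unsigned long lowmem_reserve[Z_COUNT];
};

static const struct zone_sketch zones_at_oom[Z_COUNT] = {
    { "DMA",     3544 / 4,    68 / 4,   { 0, 873, 3285 } },
    { "Normal",  3684 / 4,    3744 / 4, { 0, 0, 19300 } },
    { "HighMem", 1719472 / 4, 512 / 4,  { 0, 0, 0 } },
};

/* Highest zone an allocation with this mask is allowed to use. */
int highest_zone(unsigned int gfp_mask)
{
    if (gfp_mask & SKETCH_GFP_HIGHMEM)
        return Z_HIGHMEM;
    if (gfp_mask & SKETCH_GFP_DMA)
        return Z_DMA;
    return Z_NORMAL;
}

/* Simplified order-0 watermark test: free pages must exceed the zone's min
 * watermark plus the reserve it holds back from this class of allocation. */
bool zone_has_room(int zone, int classzone)
{
    const struct zone_sketch *z = &zones_at_oom[zone];
    return z->free_pages > z->min_pages + z->lowmem_reserve[classzone];
}

/* Count the zones, from the classzone downwards, that could serve the request. */
int usable_zones(unsigned int gfp_mask)
{
    int classzone = highest_zone(gfp_mask);
    int usable = 0;

    for (int zone = classzone; zone >= Z_DMA; zone--)
        if (zone_has_room(zone, classzone))
            usable++;
    return usable;
}
```

Under these assumptions, usable_zones(0xd0) finds no zone with room: Normal is below its min watermark, and DMA's free pages are held back by its lowmem reserve. The same request with the HighMem bit set would find one usable zone, which is consistent with the report showing plenty of free HighMem alongside the OOM kill.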

2008-01-31 15:17:32

by Claude Frantz

Subject: Re: OOM-killer invoked but why ?

Peter Zijlstra wrote:

> You seem to have ran out of zone normal memory with all of it stuck in
> kernel allocations. Would you have /proc/slabinfo available?

Thanks Peter !

No ! There is no /proc/slabinfo available.

Claude

2008-01-31 18:14:34

by Peter Zijlstra

Subject: Re: OOM-killer invoked but why ?


On Thu, 2008-01-31 at 15:41 +0100, Claude Frantz wrote:
> Peter Zijlstra wrote:
>
> > You seem to have ran out of zone normal memory with all of it stuck in
> > kernel allocations. Would you have /proc/slabinfo available?
>
> Thanks Peter !
>
> No ! There is no /proc/slabinfo available.

If you're using SLUB there is:
Documentation/vm/slabinfo.c

2008-02-05 10:07:55

by Andrew Morton

Subject: Re: OOM-killer invoked but why ?

On Thu, 31 Jan 2008 13:53:05 +0100 Claude Frantz <[email protected]> wrote:

> Hello !
>
> I'm faced to a problem where the OOM-killer is invoked but I cannot find
> the reason why. The machine is rather powerfull, the load is very moderate,
> the disk swap space is nearly unused. The only strange observation which
> appears to me is the slow but progressive decreasing of kbbuffers during
> many hours.
>
> Can you help me to diagnose the problem and to find a good solution ?
>
> ...
>
> Jan 28 03:50:49 toaster kernel: 177466 pages slab
> Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)

slab. Maybe you've been bitten by the quicklist leak. If you're able to
patch your kernel then please try this fix:

commit 96990a4ae979df9e235d01097d6175759331e88c
Author: Christoph Lameter <[email protected]>
Date: Mon Jan 14 00:55:14 2008 -0800

quicklists: Only consider memory that can be used with GFP_KERNEL

Quicklists calculates the size of the quicklists based on the number of
free pages. This must be the number of free pages that can be allocated
with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.

Signed-off-by: Christoph Lameter <[email protected]>
Tested-by: Dhaval Giani <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/mm/quicklist.c b/mm/quicklist.c
index ae8189c..3f703f7 100644
--- a/mm/quicklist.c
+++ b/mm/quicklist.c
@@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
 static unsigned long max_pages(unsigned long min_pages)
 {
 	unsigned long node_free_pages, max;
+	struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
+
+	node_free_pages =
+#ifdef CONFIG_ZONE_DMA
+		zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
+#endif
+#ifdef CONFIG_ZONE_DMA32
+		zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
+#endif
+		zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
 
-	node_free_pages = node_page_state(numa_node_id(),
-				NR_FREE_PAGES);
 	max = node_free_pages / FRACTION_OF_NODE_MEM;
 	return max(max, min_pages);
 }


I note that this didn't have the [email protected] cc. Christoph, did we
deliberately decide not to backport?
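
As a rough illustration of what the patch changes, the arithmetic of max_pages() can be replayed in userspace with the free-page figures from the OOM report earlier in the thread. This is only a sketch: the divisor value 16 for FRACTION_OF_NODE_MEM is an assumption (the thread does not state it), the min_pages values are hypothetical, the helper names are invented, and a single snapshot taken at OOM time says nothing certain about how large the quicklists actually grew on this machine.

```c
/* Assumed value of the divisor in mm/quicklist.c; treat as illustrative. */
#define FRACTION_OF_NODE_MEM 16UL

/* Figures from the OOM report, in 4 kB pages. */
#define FREE_PAGES_ALL_ZONES 431675UL      /* "free:431675", HighMem included */
#define FREE_PAGES_DMA       (3544UL / 4)  /* "DMA free:3544kB" */
#define FREE_PAGES_NORMAL    (3684UL / 4)  /* "Normal free:3684kB" */

/* Same arithmetic as max_pages() in the patch, with the free-page count
 * passed in instead of being read from the zone counters. */
unsigned long quicklist_cap(unsigned long free_pages, unsigned long min_pages)
{
    unsigned long cap = free_pages / FRACTION_OF_NODE_MEM;

    return cap > min_pages ? cap : min_pages;
}

/* Before the patch: every free page on the node counts, HighMem too. */
unsigned long cap_before_patch(unsigned long min_pages)
{
    return quicklist_cap(FREE_PAGES_ALL_ZONES, min_pages);
}

/* After the patch: only zones a GFP_KERNEL allocation can use are counted. */
unsigned long cap_after_patch(unsigned long min_pages)
{
    return quicklist_cap(FREE_PAGES_DMA + FREE_PAGES_NORMAL, min_pages);
}
```

With these inputs and a hypothetical min_pages of 25, the pre-patch limit comes out at 26979 pages (roughly 105 MiB) while the post-patch limit is 112 pages. The old limit was sized from memory that page-table allocations cannot use, which is the situation the commit message describes.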

2008-02-05 11:03:31

by Dhaval Giani

Subject: Re: OOM-killer invoked but why ?

On Tue, Feb 05, 2008 at 02:07:37AM -0800, Andrew Morton wrote:
> On Thu, 31 Jan 2008 13:53:05 +0100 Claude Frantz <[email protected]> wrote:
>
> > Hello !
> >
> > I'm faced to a problem where the OOM-killer is invoked but I cannot find
> > the reason why. The machine is rather powerfull, the load is very moderate,
> > the disk swap space is nearly unused. The only strange observation which
> > appears to me is the slow but progressive decreasing of kbbuffers during
> > many hours.
> >
> > Can you help me to diagnose the problem and to find a good solution ?
> >
> > ...
> >
> > Jan 28 03:50:49 toaster kernel: 177466 pages slab
> > Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> > Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> > Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)
>
> slab. Maybe you've been bitten by the quicklist leak. If you're able to
> patch your kernel then please try this fix:
>
> commit 96990a4ae979df9e235d01097d6175759331e88c
> Author: Christoph Lameter <[email protected]>
> Date: Mon Jan 14 00:55:14 2008 -0800
>
> quicklists: Only consider memory that can be used with GFP_KERNEL
>
> Quicklists calculates the size of the quicklists based on the number of
> free pages. This must be the number of free pages that can be allocated
> with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
> ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
>
> Signed-off-by: Christoph Lameter <[email protected]>
> Tested-by: Dhaval Giani <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> diff --git a/mm/quicklist.c b/mm/quicklist.c
> index ae8189c..3f703f7 100644
> --- a/mm/quicklist.c
> +++ b/mm/quicklist.c
> @@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> static unsigned long max_pages(unsigned long min_pages)
> {
> unsigned long node_free_pages, max;
> + struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> +
> + node_free_pages =
> +#ifdef CONFIG_ZONE_DMA
> + zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
> +#endif
> +#ifdef CONFIG_ZONE_DMA32
> + zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
> +#endif
> + zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
>
> - node_free_pages = node_page_state(numa_node_id(),
> - NR_FREE_PAGES);
> max = node_free_pages / FRACTION_OF_NODE_MEM;
> return max(max, min_pages);
> }
>
>
> I note that this didn't have the [email protected] cc. Christoph, did we
> deliberately decide not to backport?
>

According to
http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
it's been added to the stable tree. I remember asking Greg to add it.

Thanks
--
regards,
Dhaval

2008-02-05 22:07:14

by Greg KH

Subject: Re: [stable] OOM-killer invoked but why ?

On Tue, Feb 05, 2008 at 04:33:03PM +0530, Dhaval Giani wrote:
> On Tue, Feb 05, 2008 at 02:07:37AM -0800, Andrew Morton wrote:
> > On Thu, 31 Jan 2008 13:53:05 +0100 Claude Frantz <[email protected]> wrote:
> >
> > > Hello !
> > >
> > > I'm faced to a problem where the OOM-killer is invoked but I cannot find
> > > the reason why. The machine is rather powerfull, the load is very moderate,
> > > the disk swap space is nearly unused. The only strange observation which
> > > appears to me is the slow but progressive decreasing of kbbuffers during
> > > many hours.
> > >
> > > Can you help me to diagnose the problem and to find a good solution ?
> > >
> > > ...
> > >
> > > Jan 28 03:50:49 toaster kernel: 177466 pages slab
> > > Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> > > Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> > > Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)
> >
> > slab. Maybe you've been bitten by the quicklist leak. If you're able to
> > patch your kernel then please try this fix:
> >
> > commit 96990a4ae979df9e235d01097d6175759331e88c
> > Author: Christoph Lameter <[email protected]>
> > Date: Mon Jan 14 00:55:14 2008 -0800
> >
> > quicklists: Only consider memory that can be used with GFP_KERNEL
> >
> > Quicklists calculates the size of the quicklists based on the number of
> > free pages. This must be the number of free pages that can be allocated
> > with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
> > ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
> >
> > Signed-off-by: Christoph Lameter <[email protected]>
> > Tested-by: Dhaval Giani <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > Signed-off-by: Linus Torvalds <[email protected]>
> >
> > diff --git a/mm/quicklist.c b/mm/quicklist.c
> > index ae8189c..3f703f7 100644
> > --- a/mm/quicklist.c
> > +++ b/mm/quicklist.c
> > @@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> > static unsigned long max_pages(unsigned long min_pages)
> > {
> > unsigned long node_free_pages, max;
> > + struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> > +
> > + node_free_pages =
> > +#ifdef CONFIG_ZONE_DMA
> > + zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
> > +#endif
> > +#ifdef CONFIG_ZONE_DMA32
> > + zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
> > +#endif
> > + zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> >
> > - node_free_pages = node_page_state(numa_node_id(),
> > - NR_FREE_PAGES);
> > max = node_free_pages / FRACTION_OF_NODE_MEM;
> > return max(max, min_pages);
> > }
> >
> >
> > I note that this didn't have the [email protected] cc. Christoph, did we
> > deliberately decide not to backport?
> >
>
> According to
> http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
> its been added to the stable tree. I remember asking Greg to add it.

And then Christoph told me to remove it...

thanks,

greg k-h

2008-02-05 22:13:34

by Christoph Lameter

Subject: Re: [stable] OOM-killer invoked but why ?

On Tue, 5 Feb 2008, Greg KH wrote:

> > > commit 96990a4ae979df9e235d01097d6175759331e88c
> > > Author: Christoph Lameter <[email protected]>
> > > Date: Mon Jan 14 00:55:14 2008 -0800
> > >
> > > quicklists: Only consider memory that can be used with GFP_KERNEL
> > >
> > > Quicklists calculates the size of the quicklists based on the number of
> > > free pages. This must be the number of free pages that can be allocated
> > > with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
> > > ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
> > >
> > > Signed-off-by: Christoph Lameter <[email protected]>
> > > Tested-by: Dhaval Giani <[email protected]>
> > > Signed-off-by: Andrew Morton <[email protected]>
> > > Signed-off-by: Linus Torvalds <[email protected]>
> > >
> > > diff --git a/mm/quicklist.c b/mm/quicklist.c
> > > index ae8189c..3f703f7 100644
> > > --- a/mm/quicklist.c
> > > +++ b/mm/quicklist.c
> > > @@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> > > static unsigned long max_pages(unsigned long min_pages)
> > > {
> > > unsigned long node_free_pages, max;
> > > + struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> > > +
> > > + node_free_pages =
> > > +#ifdef CONFIG_ZONE_DMA
> > > + zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
> > > +#endif
> > > +#ifdef CONFIG_ZONE_DMA32
> > > + zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
> > > +#endif
> > > + zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> > >
> > > - node_free_pages = node_page_state(numa_node_id(),
> > > - NR_FREE_PAGES);
> > > max = node_free_pages / FRACTION_OF_NODE_MEM;
> > > return max(max, min_pages);
> > > }
> > >
> > >
> > > I note that this didn't have the [email protected] cc. Christoph, did we
> > > deliberately decide not to backport?
> > >
> >
> > According to
> > http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
> > its been added to the stable tree. I remember asking Greg to add it.
>
> And then Christoph told me to remove it...

No, I asked you to add this patch and remove the earlier patch that
tinkered around with tlb flushing.

2008-02-05 22:18:24

by Oliver Pinter

Subject: Re: [stable] OOM-killer invoked but why ?

That was a different patch, not this version.

This is the BAD one:
----8<----
From [email protected] Mon Dec 17 16:32:25 2007
From: Christoph Lameter <[email protected]>
Date: Mon, 17 Dec 2007 16:20:27 -0800
Subject: quicklist: Set tlb->need_flush if pages are remaining in quicklist 0
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Message-ID: <[email protected]>


From: Christoph Lameter <[email protected]>

patch 421d99193537a6522aac2148286f08792167d5fd in mainline.

This ensures that the quicklists are drained. Otherwise draining may only
occur when the processor reaches an idle state.

Fixes fatal leakage of pgd_t's on 2.6.22 and later.

Signed-off-by: Christoph Lameter <[email protected]>
Reported-by: Dhaval Giani <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
 include/asm-generic/tlb.h | 4 ++++
 1 file changed, 4 insertions(+)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -14,6 +14,7 @@
 #define _ASM_GENERIC__TLB_H
 
 #include <linux/swap.h>
+#include <linux/quicklist.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
@@ -85,6 +86,9 @@ tlb_flush_mmu(struct mmu_gather *tlb, un
 static inline void
 tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
+#ifdef CONFIG_QUICKLIST
+	tlb->need_flush += &__get_cpu_var(quicklist)[0].nr_pages != 0;
+#endif
 	tlb_flush_mmu(tlb, start, end);
 
 	/* keep the page table cache within bounds
----8<----
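
A side note on the added line in the hunk above: in C, unary & binds more tightly than !=, so the right-hand side compares the address of nr_pages with zero instead of comparing its value. The thread does not say whether this has anything to do with the patch being dropped, so the sketch below only demonstrates the language behaviour. The struct is a stand-in modelled on the field name in the hunk, and the function names are hypothetical.

```c
/* Stand-in for the per-cpu quicklist entry; only nr_pages matters here. */
struct quicklist_model {
    void *page;
    int nr_pages;
};

/* Mirrors the expression as written in the hunk: the address of the field is
 * compared with zero, so the result is 1 for any valid entry. Some compilers
 * emit an "always true" warning for this comparison. */
int increment_as_written(struct quicklist_model *q)
{
    return &q[0].nr_pages != 0;
}

/* Compares the field's value instead: 1 only when pages are queued. */
int increment_on_count(struct quicklist_model *q)
{
    return q[0].nr_pages != 0;
}
```

For an entry with nr_pages equal to 0, the first function still returns 1 while the second returns 0. Read that way, the line in the hunk would raise need_flush on every call whenever CONFIG_QUICKLIST is set.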

On 2/5/08, Greg KH <[email protected]> wrote:
> On Tue, Feb 05, 2008 at 04:33:03PM +0530, Dhaval Giani wrote:
> > On Tue, Feb 05, 2008 at 02:07:37AM -0800, Andrew Morton wrote:
> > > On Thu, 31 Jan 2008 13:53:05 +0100 Claude Frantz <[email protected]> wrote:
> > >
> > > > Hello !
> > > >
> > > > I'm faced to a problem where the OOM-killer is invoked but I cannot find
> > > > the reason why. The machine is rather powerfull, the load is very moderate,
> > > > the disk swap space is nearly unused. The only strange observation which
> > > > appears to me is the slow but progressive decreasing of kbbuffers during
> > > > many hours.
> > > >
> > > > Can you help me to diagnose the problem and to find a good solution ?
> > > >
> > > > ...
> > > >
> > > > Jan 28 03:50:49 toaster kernel: 177466 pages slab
> > > > Jan 28 03:50:49 toaster kernel: 1915 pages pagetables
> > > > Jan 28 03:50:49 toaster kernel: Out of memory: kill process 10859 (amavisd) score 36218 or a child
> > > > Jan 28 03:50:49 toaster kernel: Killed process 19146 (amavisd)
> > >
> > > slab. Maybe you've been bitten by the quicklist leak. If you're able to
> > > patch your kernel then please try this fix:
> > >
> > > commit 96990a4ae979df9e235d01097d6175759331e88c
> > > Author: Christoph Lameter <[email protected]>
> > > Date: Mon Jan 14 00:55:14 2008 -0800
> > >
> > > quicklists: Only consider memory that can be used with GFP_KERNEL
> > >
> > > Quicklists calculates the size of the quicklists based on the number of
> > > free pages. This must be the number of free pages that can be allocated
> > > with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
> > > ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
> > >
> > > Signed-off-by: Christoph Lameter <[email protected]>
> > > Tested-by: Dhaval Giani <[email protected]>
> > > Signed-off-by: Andrew Morton <[email protected]>
> > > Signed-off-by: Linus Torvalds <[email protected]>
> > >
> > > diff --git a/mm/quicklist.c b/mm/quicklist.c
> > > index ae8189c..3f703f7 100644
> > > --- a/mm/quicklist.c
> > > +++ b/mm/quicklist.c
> > > @@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> > > static unsigned long max_pages(unsigned long min_pages)
> > > {
> > > unsigned long node_free_pages, max;
> > > + struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> > > +
> > > + node_free_pages =
> > > +#ifdef CONFIG_ZONE_DMA
> > > + zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
> > > +#endif
> > > +#ifdef CONFIG_ZONE_DMA32
> > > + zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
> > > +#endif
> > > + zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> > >
> > > - node_free_pages = node_page_state(numa_node_id(),
> > > - NR_FREE_PAGES);
> > > max = node_free_pages / FRACTION_OF_NODE_MEM;
> > > return max(max, min_pages);
> > > }
> > >
> > >
> > > I note that this didn't have the [email protected] cc. Christoph, did we
> > > deliberately decide not to backport?
> > >
> >
> > According to
> > http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
> > its been added to the stable tree. I remember asking Greg to add it.
>
> And then Christoph told me to remove it...
>
> thanks,
>
> greg k-h


--
Thanks,
Oliver

2008-02-05 22:40:50

by Greg KH

Subject: Re: [stable] OOM-killer invoked but why ?

On Tue, Feb 05, 2008 at 02:13:12PM -0800, Christoph Lameter wrote:
> On Tue, 5 Feb 2008, Greg KH wrote:
>
> > > > commit 96990a4ae979df9e235d01097d6175759331e88c
> > > > Author: Christoph Lameter <[email protected]>
> > > > Date: Mon Jan 14 00:55:14 2008 -0800
> > > >
> > > > quicklists: Only consider memory that can be used with GFP_KERNEL
> > > >
> > > > Quicklists calculates the size of the quicklists based on the number of
> > > > free pages. This must be the number of free pages that can be allocated
> > > > with GFP_KERNEL. node_page_state() includes the pages in ZONE_HIGHMEM and
> > > > ZONE_MOVABLE which may lead the quicklists to become too large causing OOM.
> > > >
> > > > Signed-off-by: Christoph Lameter <[email protected]>
> > > > Tested-by: Dhaval Giani <[email protected]>
> > > > Signed-off-by: Andrew Morton <[email protected]>
> > > > Signed-off-by: Linus Torvalds <[email protected]>
> > > >
> > > > diff --git a/mm/quicklist.c b/mm/quicklist.c
> > > > index ae8189c..3f703f7 100644
> > > > --- a/mm/quicklist.c
> > > > +++ b/mm/quicklist.c
> > > > @@ -26,9 +26,17 @@ DEFINE_PER_CPU(struct quicklist, quicklist)[CONFIG_NR_QUICK];
> > > > static unsigned long max_pages(unsigned long min_pages)
> > > > {
> > > > unsigned long node_free_pages, max;
> > > > + struct zone *zones = NODE_DATA(numa_node_id())->node_zones;
> > > > +
> > > > + node_free_pages =
> > > > +#ifdef CONFIG_ZONE_DMA
> > > > + zone_page_state(&zones[ZONE_DMA], NR_FREE_PAGES) +
> > > > +#endif
> > > > +#ifdef CONFIG_ZONE_DMA32
> > > > + zone_page_state(&zones[ZONE_DMA32], NR_FREE_PAGES) +
> > > > +#endif
> > > > + zone_page_state(&zones[ZONE_NORMAL], NR_FREE_PAGES);
> > > >
> > > > - node_free_pages = node_page_state(numa_node_id(),
> > > > - NR_FREE_PAGES);
> > > > max = node_free_pages / FRACTION_OF_NODE_MEM;
> > > > return max(max, min_pages);
> > > > }
> > > >
> > > >
> > > > I note that this didn't have the [email protected] cc. Christoph, did we
> > > > deliberately decide not to backport?
> > > >
> > >
> > > According to
> > > http://archive.netbsd.se/?ml=linux-stable-commits&a=2008-01&m=6134301 ,
> > > its been added to the stable tree. I remember asking Greg to add it.
> >
> > And then Christoph told me to remove it...
>
> No I asked you to add this patch and remove the earlier patch that
> tinkered around with tlb flushing.

Argh, I'm too confused...

As long as everyone is happy with what is currently queued up for
.22-stable and .23-stable, I'll just shut up now and get on releasing
them :)

thanks,

greg k-h