2009-12-24 23:43:01

by Arnaud Boulan

Subject: OOM killer unexpectedly called with kernel 2.6.32

Hello,

With kernel version 2.6.32.2 I have a problem where the kernel calls the OOM killer
although there is still plenty of RAM and swap available.

I can easily reproduce the problem by running a huge background file transfer between
2 disks (cp -a of several gigabytes) and then starting a few applications. In less than a minute
the kernel starts killing random processes (firefox, kmail, kdesktop, etc.), although there is still
free (or buffers/cache) memory and the swap is not used at all.

I cannot reproduce this problem with kernel 2.6.31.8 (compiled with the same compiler
and running the same userspace).

I have no idea what could cause this problem. Any help will be appreciated.
Regards,

Arnaud

(Please CC me on any reply, I'm not subscribed to the list.)


Dec 24 18:16:03 picchu kernel: X invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
Dec 24 18:16:03 picchu kernel: X cpuset=/ mems_allowed=0
Dec 24 18:16:03 picchu kernel: Pid: 10719, comm: X Not tainted 2.6.32.2 #1
Dec 24 18:16:03 picchu kernel: Call Trace:
Dec 24 18:16:03 picchu kernel: [<ffffffff8106d513>] ? cpuset_print_task_mems_allowed+0x8d/0x98
Dec 24 18:16:03 picchu kernel: [<ffffffff8108166c>] oom_kill_process+0x82/0x241
Dec 24 18:16:03 picchu kernel: [<ffffffff81052014>] ? ktime_get_ts+0xb1/0xbe
Dec 24 18:16:03 picchu kernel: [<ffffffff81081cfb>] __out_of_memory+0x134/0x14b
Dec 24 18:16:03 picchu kernel: [<ffffffff81081dff>] pagefault_out_of_memory+0x55/0x7a
Dec 24 18:16:03 picchu kernel: [<ffffffff8102594e>] mm_fault_error+0x3b/0xf6
Dec 24 18:16:03 picchu kernel: [<ffffffff81093639>] ? handle_mm_fault+0x359/0x6a4
Dec 24 18:16:03 picchu kernel: [<ffffffff81025b9d>] do_page_fault+0x194/0x1e3
Dec 24 18:16:03 picchu kernel: [<ffffffff813aed1f>] page_fault+0x1f/0x30
Dec 24 18:16:03 picchu kernel: Mem-Info:
Dec 24 18:16:03 picchu kernel: DMA per-cpu:
Dec 24 18:16:03 picchu kernel: CPU 0: hi: 0, btch: 1 usd: 0
Dec 24 18:16:03 picchu kernel: CPU 1: hi: 0, btch: 1 usd: 0
Dec 24 18:16:03 picchu kernel: DMA32 per-cpu:
Dec 24 18:16:03 picchu kernel: CPU 0: hi: 186, btch: 31 usd: 165
Dec 24 18:16:03 picchu kernel: CPU 1: hi: 186, btch: 31 usd: 64
Dec 24 18:16:03 picchu kernel: active_anon:155933 inactive_anon:54013 isolated_anon:0
Dec 24 18:16:03 picchu kernel: active_file:122843 inactive_file:129683 isolated_file:35
Dec 24 18:16:03 picchu kernel: unevictable:464 dirty:20768 writeback:18186 unstable:0
Dec 24 18:16:03 picchu kernel: free:3398 slab_reclaimable:22778 slab_unreclaimable:8060
Dec 24 18:16:03 picchu kernel: mapped:19607 shmem:63182 pagetables:8230 bounce:0
Dec 24 18:16:03 picchu kernel: DMA free:8004kB min:40kB low:48kB high:60kB active_anon:120kB inactive_anon:340kB active_file:1384kB inactive_file:3672kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15368kB mlocked:0kB dirty:0kB writeback:0kB mapped:52kB shmem:0kB slab_reclaimable:2300kB slab_unreclaimable:4kB kernel_stack:8kB pagetables:100kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 24 18:16:03 picchu kernel: lowmem_reserve[]: 0 1993 1993 1993
Dec 24 18:16:03 picchu kernel: DMA32 free:5588kB min:5692kB low:7112kB high:8536kB active_anon:623612kB inactive_anon:215712kB active_file:489988kB inactive_file:515060kB unevictable:1856kB isolated(anon):0kB isolated(file):140kB present:2041776kB mlocked:1856kB dirty:83072kB writeback:72744kB mapped:78376kB shmem:252728kB slab_reclaimable:88812kB slab_unreclaimable:32236kB kernel_stack:2416kB pagetables:32820kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:192 all_unreclaimable? no
Dec 24 18:16:03 picchu kernel: lowmem_reserve[]: 0 0 0 0
Dec 24 18:16:03 picchu kernel: DMA: 15*4kB 3*8kB 41*16kB 49*32kB 31*64kB 13*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 8004kB
Dec 24 18:16:03 picchu kernel: DMA32: 1129*4kB 14*8kB 8*16kB 2*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 5588kB
Dec 24 18:16:03 picchu kernel: 316114 total pagecache pages
Dec 24 18:16:03 picchu kernel: 0 pages in swap cache
Dec 24 18:16:03 picchu kernel: Swap cache stats: add 11, delete 11, find 1/3
Dec 24 18:16:03 picchu kernel: Free swap = 2048248kB
Dec 24 18:16:03 picchu kernel: Total swap = 2048248kB
Dec 24 18:16:03 picchu kernel: 521616 pages RAM
Dec 24 18:16:03 picchu kernel: 9625 pages reserved
Dec 24 18:16:03 picchu kernel: 382065 pages shared
Dec 24 18:16:03 picchu kernel: 296585 pages non-shared
Dec 24 18:16:03 picchu kernel: Out of memory: kill process 3418 (kdesktop) score 967315 or a child
Dec 24 18:16:03 picchu kernel: Killed process 14144 (firefox)


Attachments:
(No filename) (4.52 kB)
config-2.6.31.8 (62.44 kB)
config-2.6.32.2 (63.89 kB)
dmesg-2.6.32.2 (58.30 kB)
cpuinfo (1.47 kB)
dmesg-2.6.31.8 (57.80 kB)

2009-12-25 03:14:40

by KOSAKI Motohiro

Subject: Re: OOM killer unexpectedly called with kernel 2.6.32


We've seen a similar issue recently. Can you please try the following patch?


-----------------------------------------------------------
From: Hugh Dickins <[email protected]>

When do_nonlinear_fault() realizes that the page table must have been
corrupted for it to have been called, it does print_bad_pte() and returns
... VM_FAULT_OOM, which is hard to understand.

It made some sense when I did it for 2.6.15, when do_page_fault() just
killed the current process; but nowadays it lets the OOM killer decide who
to kill - so page table corruption in one process would be liable to kill
another.

Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee that
the process will be killed, but is good enough for such a rare
abnormality, accompanied as it is by the "BUG: Bad page map" message.

And recent HWPOISON work has copied that code into do_swap_page(), when it
finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too.

Signed-off-by: Hugh Dickins <[email protected]>
Cc: Izik Eidus <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Nick Piggin <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Andi Kleen <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Wu Fengguang <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN mm/memory.c~mm-sigbus-instead-of-abusing-oom mm/memory.c
--- a/mm/memory.c~mm-sigbus-instead-of-abusing-oom
+++ a/mm/memory.c
@@ -2527,7 +2527,7 @@ static int do_swap_page(struct mm_struct
 			ret = VM_FAULT_HWPOISON;
 		} else {
 			print_bad_pte(vma, address, orig_pte, NULL);
-			ret = VM_FAULT_OOM;
+			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
 	}
@@ -2923,7 +2923,7 @@ static int do_nonlinear_fault(struct mm_
 		 * Page table corrupted: show pte and kill process.
 		 */
 		print_bad_pte(vma, address, orig_pte, NULL);
-		return VM_FAULT_OOM;
+		return VM_FAULT_SIGBUS;
 	}
 
 	pgoff = pte_to_pgoff(orig_pte);
_