Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932653AbXAVUxJ (ORCPT ); Mon, 22 Jan 2007 15:53:09 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S932648AbXAVUxI (ORCPT ); Mon, 22 Jan 2007 15:53:08 -0500 Received: from lucidpixels.com ([75.144.35.66]:40527 "EHLO lucidpixels.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932629AbXAVUxG (ORCPT ); Mon, 22 Jan 2007 15:53:06 -0500 X-Greylist: delayed 1966 seconds by postgrey-1.27 at vger.kernel.org; Mon, 22 Jan 2007 15:53:06 EST Date: Mon, 22 Jan 2007 15:20:16 -0500 (EST) From: Justin Piszcz X-X-Sender: jpiszcz@p34.internal.lan To: Andrew Morton cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com Subject: Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2 In-Reply-To: <20070122115703.97ed54f3.akpm@osdl.org> Message-ID: References: <20070122115703.97ed54f3.akpm@osdl.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5914 Lines: 146 > What's that? Software raid or hardware raid? If the latter, which driver? Software RAID (md) On Mon, 22 Jan 2007, Andrew Morton wrote: > > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz wrote: > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke > > the OOM killer and kill all of my processes? > > What's that? Software raid or hardware raid? If the latter, which driver? > > > Doing this on a single disk 2.6.19.2 is OK, no issues. However, this > > happens every time! > > > > Anything to try? Any other output needed? Can someone shed some light on > > this situation? > > > > Thanks. > > > > > > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh) > > > > procs -----------memory---------- ---swap-- -----io---- -system-- > > ----cpu---- > > r b swpd free buff cache si so bi bo in cs us sy id > > wa > > 0 7 764 50348 12 1269988 0 0 53632 172 1902 4600 1 8 > > 29 62 > > 0 7 764 49420 12 1260004 0 0 53632 34368 1871 6357 2 11 > > 48 40 > > The wordwrapping is painful :( > > > > > The last lines of dmesg: > > [ 5947.199985] lowmem_reserve[]: 0 0 0 > > [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB > > 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB > > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB > > 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB > > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB > > 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB > > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0 > > [ 5947.200055] Free swap = 2197628kB > > [ 5947.200058] Total swap = 2200760kB > > [ 5947.200060] Free swap: 2197628kB > > [ 5947.205664] 517888 pages of RAM > > [ 5947.205671] 288512 pages of HIGHMEM > > [ 5947.205673] 5666 reserved pages > > [ 5947.205675] 257163 pages shared > > [ 5947.205678] 600 pages swap cached > > [ 5947.205680] 88876 pages dirty > > [ 5947.205682] 115111 pages writeback > > [ 5947.205684] 5608 pages mapped > > [ 5947.205686] 49367 pages slab > > [ 5947.205688] 541 pages pagetables > > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a > > child > > [ 5947.205801] Killed process 1853 (named) > > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, > > oomkilladj=0 > > [ 5947.206621] [] out_of_memory+0x17b/0x1b0 > > [ 5947.206631] [] __alloc_pages+0x29c/0x2f0 > > [ 5947.206636] [] __pte_alloc+0x1d/0x90 > > [ 5947.206643] [] copy_page_range+0x357/0x380 > > [ 5947.206649] [] copy_process+0x765/0xfc0 > > [ 5947.206655] [] alloc_pid+0x1b9/0x280 > > [ 5947.206662] [] do_fork+0x79/0x1e0 > > [ 5947.206674] [] do_pipe+0x5f/0xc0 > > [ 5947.206680] [] sys_clone+0x36/0x40 > > [ 5947.206686] [] syscall_call+0x7/0xb > > [ 5947.206691] [] __sched_text_start+0x853/0x950 > > [ 5947.206698] ======================= > > Important information from the oom-killing event is missing. Please send > it all. > > >From your earlier reports we have several hundred MB of ZONE_NORMAL memory > which has gone awol. > > Please include /proc/meminfo from after the oom-killing. > > Please work out what is using all that slab memory, via /proc/slabinfo. > > After the oom-killing, please see if you can free up the ZONE_NORMAL memory > via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if you can > work out what happened to the missing couple-of-hundred MB from > ZONE_NORMAL. > > I believe this is the first part of it (hopefully): 2908kB active:86104kB inactive:1061904kB present:1145032kB pages_scanned:0 all_unreclaimable? no [ 5947.199985] lowmem_reserve[]: 0 0 0 [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0 [ 5947.200055] Free swap = 2197628kB [ 5947.200058] Total swap = 2200760kB [ 5947.200060] Free swap: 2197628kB [ 5947.205664] 517888 pages of RAM [ 5947.205671] 288512 pages of HIGHMEM [ 5947.205673] 5666 reserved pages [ 5947.205675] 257163 pages shared [ 5947.205678] 600 pages swap cached [ 5947.205680] 88876 pages dirty [ 5947.205682] 115111 pages writeback [ 5947.205684] 5608 pages mapped [ 5947.205686] 49367 pages slab [ 5947.205688] 541 pages pagetables [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child [ 5947.205801] Killed process 1853 (named) [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0 [ 5947.206621] [] out_of_memory+0x17b/0x1b0 [ 5947.206631] [] __alloc_pages+0x29c/0x2f0 [ 5947.206636] [] __pte_alloc+0x1d/0x90 [ 5947.206643] [] copy_page_range+0x357/0x380 [ 5947.206649] [] copy_process+0x765/0xfc0 [ 5947.206655] [] alloc_pid+0x1b9/0x280 [ 5947.206662] [] do_fork+0x79/0x1e0 [ 5947.206674] [] do_pipe+0x5f/0xc0 [ 5947.206680] [] sys_clone+0x36/0x40 [ 5947.206686] [] syscall_call+0x7/0xb [ 5947.206691] [] __sched_text_start+0x853/0x950 [ 5947.206698] ======================= I will have to include the other parts when I am near the machine and can reboot it locally :) Justin. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/