Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752691AbZF1H4Z (ORCPT ); Sun, 28 Jun 2009 03:56:25 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752145AbZF1H4Q (ORCPT ); Sun, 28 Jun 2009 03:56:16 -0400
Received: from mx2.redhat.com ([66.187.237.31]:46761 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751988AbZF1H4P (ORCPT ); Sun, 28 Jun 2009 03:56:15 -0400
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
From: David Howells
In-Reply-To: <20090627125412.GA1667@cmpxchg.org>
References: <20090627125412.GA1667@cmpxchg.org> <3901.1245848839@redhat.com> <20090624023251.GA16483@localhost> <20090620043303.GA19855@localhost> <32411.1245336412@redhat.com> <20090517022327.280096109@intel.com> <2015.1245341938@redhat.com> <20090618095729.d2f27896.akpm@linux-foundation.org> <7561.1245768237@redhat.com> <26537.1246086769@redhat.com>
To: Johannes Weiner
Cc: dhowells@redhat.com, Wu Fengguang, "riel@redhat.com", "minchan.kim@gmail.com", Andrew Morton, LKML, Christoph Lameter, KOSAKI Motohiro, "peterz@infradead.org", "tytso@mit.edu", "linux-mm@kvack.org", "elladan@eskimo.com", "npiggin@suse.de"
Subject: Re: Found the commit that causes the OOMs
Date: Sun, 28 Jun 2009 08:55:48 +0100
Message-ID: <31494.1246175748@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 7670
Lines: 163

Johannes Weiner wrote:

> From: Johannes Weiner
> Subject: vmscan: keep balancing anon lists on swap-full conditions
>
> Page reclaim doesn't scan and balance the anon LRU lists when
> nr_swap_pages is zero to save the scan overhead for swapless systems.
>
> Unfortunately, this variable can reach zero when all present swap
> space is occupied as well and we don't want to stop balancing in that
> case or we encounter an unreclaimable mess of anon lists when swap
> space gets freed up and we are theoretically in the position to page
> out again.
>
> Use the total_swap_pages variable to have a better indicator when to
> scan the anon LRU lists.
>
> We still might have unbalanced anon lists when swap space is added
> during run time but it is a less dynamic change in state and we
> still save the scanning overhead for CONFIG_SWAP systems that never
> actually set up swap space.
>
> Signed-off-by: Johannes Weiner

This doesn't help. It may change the behaviour though: rather than locking up
after a couple of OOMs, it generated 42MB of OOM messages. It didn't go wrong
until its 5th pass through the LTP syscalls testsuite this time.

Attached is the first part of the log where OOM messages were generated.

David
---
msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
msgctl11 cpuset=/ mems_allowed=0
Pid: 689, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
[] ? oom_kill_process.clone.0+0xa9/0x245
[] ? __out_of_memory+0x12b/0x142
[] ? out_of_memory+0x6a/0x94
[] ? __alloc_pages_nodemask+0x42e/0x51d
[] ? cache_alloc_refill+0x353/0x69c
[] ? find_get_page+0x1a/0x72
[] ? copy_process+0x95/0x114f
[] ? kmem_cache_alloc+0x83/0xc5
[] ? copy_process+0x95/0x114f
[] ? handle_mm_fault+0x2b9/0x62f
[] ? do_fork+0x13f/0x2ba
[] ? do_page_fault+0x1f8/0x20d
[] ? stub_clone+0x13/0x20
[] ? system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 62
CPU 1: hi: 186, btch: 31 usd: 0
Active_anon:71393 active_file:1 inactive_anon:4670 inactive_file:0 unevictable:0 dirty:11 writeback:0 unstable:0 free:3987 slab:38927 mapped:451 pagetables:58190 bounce:0
DMA free:3928kB min:60kB low:72kB high:88kB active_anon:3176kB inactive_anon:256kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 968 968 968
DMA32 free:12020kB min:3948kB low:4932kB high:5920kB active_anon:282396kB inactive_anon:18424kB active_file:4kB inactive_file:0kB unevictable:0kB present:992000kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 8*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3928kB
DMA32: 2367*4kB 71*8kB 10*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 12020kB
2342 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
255744 pages RAM
5597 pages reserved
230753 pages shared
216782 pages non-shared
Out of memory: kill process 30280 (msgctl11) score 161571 or a child
Killed process 31149 (msgctl11)
msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
msgctl11 cpuset=/ mems_allowed=0
Pid: 689, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
[] ? oom_kill_process.clone.0+0xa9/0x245
[] ? __out_of_memory+0x12b/0x142
[] ? out_of_memory+0x6a/0x94
[] ? __alloc_pages_nodemask+0x42e/0x51d
[] ? cache_alloc_refill+0x353/0x69c
[] ? find_get_page+0x1a/0x72
[] ? copy_process+0x95/0x114f
[] ? kmem_cache_alloc+0x83/0xc5
[] ? copy_process+0x95/0x114f
[] ? handle_mm_fault+0x2b9/0x62f
[] ? do_fork+0x13f/0x2ba
[] ? do_page_fault+0x1f8/0x20d
[] ? stub_clone+0x13/0x20
[] ? system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
Active_anon:75955 active_file:0 inactive_anon:4990 inactive_file:2 unevictable:0 dirty:0 writeback:0 unstable:0 free:1970 slab:38326 mapped:5 pagetables:59166 bounce:0
DMA free:3932kB min:60kB low:72kB high:88kB active_anon:3172kB inactive_anon:256kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 968 968 968
DMA32 free:3948kB min:3948kB low:4932kB high:5920kB active_anon:300648kB inactive_anon:19704kB active_file:0kB inactive_file:8kB unevictable:0kB present:992000kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 9*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3932kB
DMA32: 457*4kB 39*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3948kB
36 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
255744 pages RAM
5597 pages reserved
162238 pages shared
220698 pages non-shared
Out of memory: kill process 30280 (msgctl11) score 160654 or a child
Killed process 31155 (msgctl11)
msgctl11: page allocation failure. order:1, mode:0x20
Pid: 3095, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
[] ? __alloc_pages_nodemask+0x4d4/0x51d
[] ? cache_alloc_refill+0x353/0x69c
[] ? free_pages_bulk.clone.1+0x4d/0x20d
[] ? __alloc_skb+0x38/0x148
[] ? __netdev_alloc_skb+0x15/0x2f
[] ? __kmalloc_track_caller+0xc6/0x108
[] ? __alloc_skb+0x61/0x148
[] ? __netdev_alloc_skb+0x15/0x2f
[] ? e1000_clean_rx_irq+0x1ab/0x2de
[] ? e1000_clean+0x71/0x20f
[] ? net_rx_action+0x64/0x129
[] ? process_timeout+0x0/0xb
[] ? __do_softirq+0x92/0x129
[] ? call_softirq+0x1c/0x28
[] ? do_softirq+0x2c/0x68
[] ? do_IRQ+0x9c/0xb2
[] ? ret_from_intr+0x0/0xa
[] ? shrink_zone+0x1d6/0x30f
[] ? mb_cache_shrink_fn+0x26/0x115
[] ? __up_read+0x13/0x90
[] ? shrink_slab+0x13e/0x150
[] ? try_to_free_pages+0x20d/0x362
[] ? isolate_pages_global+0x0/0x219
[] ? __alloc_pages_nodemask+0x34d/0x51d
[] ? __do_page_cache_readahead+0x9e/0x1a1
[] ? ra_submit+0x1c/0x20
[] ? filemap_fault+0x18a/0x316
[] ? __do_fault+0x54/0x3d6
[] ? handle_mm_fault+0x2b9/0x62f
[] ? do_page_fault+0x1f8/0x20d
[] ? page_fault+0x1f/0x30