Date: Fri, 2 Nov 2007 16:36:04 +0200
From: Sami Farin
To: Linux kernel Mailing List
Cc: Rik van Riel
Subject: Re: kernel 2.6.23: what IS the VM doing?
Message-ID: <20071102143604.y6v7qiax2uv2wusn@m.safari.iki.fi>
In-Reply-To: <20070914171746.7bbx2te5646zcdpr@m.safari.iki.fi>

On Fri, Sep 14, 2007 at 20:17:46 +0300, Sami Farin wrote:
> On Wed, Sep 05, 2007 at 18:48:51 -0400, Rik van Riel wrote:
> > Sami Farin wrote:
> >> On Wed, Sep 05, 2007 at 12:24:26 -0400, Rik van Riel wrote:
> >> ...
> >>>> *shrug*
> >>> The attached patch should make sure kswapd does not free an
> >>> excessive number of pages in zone_normal just because the
> >>> pages in zone_highmem are difficult to free.
> >>>
> >>> It does give kswapd a large margin to continue putting equal
> >>> pressure on all zones in normal situations.
> >>>
> >>> Sami, could you try out this patch to see if it helps your
> >>> situation?
> >>
> >> Thanks, Rik. bzImage is ready, I will probably reboot within a
> >> month for one reason or another 8-)
> >
> > The more I look at the bug, the more I see that it is probably
> > not very easy to reproduce on demand. I have, however, a full
>
> Well, I have now booted into an x86_64 kernel.
>
> I can still reproduce this. When I unload the ipset modules, the
> kernel resumes "normal" operation, i.e., it stops swapping like mad.

I now have 2 GB of extra RAM, so 3 GB in total, on an x86_64 system.
If ipset tries to allocate 512 KB or more, the kernel goes into a
swapping frenzy from which the system does not recover within 30
minutes unless I press SysRq+K.
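The trigger, then, is nothing more than a single large slab allocation.
A minimal reproducer sketch as a 2.6-era kernel module (hypothetical,
not from the original report; the module name and the hard-coded size
are mine):

/* alloc_test.c: hypothetical minimal reproducer.  One 512 KB
 * kmalloc() -- 128 contiguous 4 KB pages, i.e. an order-7 request --
 * mimicking ip_set_nethash's hash table allocation. */
#include <linux/module.h>
#include <linux/slab.h>

static void *buf;

static int __init alloc_test_init(void)
{
	/* GFP_KERNEL may sleep and enter direct reclaim
	 * (try_to_free_pages), as in the ipset trace below. */
	buf = kmalloc(512 * 1024, GFP_KERNEL);
	return buf ? 0 : -ENOMEM;
}

static void __exit alloc_test_exit(void)
{
	kfree(buf);
}

module_init(alloc_test_init);
module_exit(alloc_test_exit);
MODULE_LICENSE("GPL");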
Some kernel settings:

vm.dirty_ratio = 4
vm.dirty_background_ratio = 2
vm.swappiness = 10
vm.vfs_cache_pressure = 10
vm.overcommit_memory = 2

SysRq : Show Blocked State

 task                        PC stack   pid father
kswapd0       D 0000000000000000     0   258      2
 ffff8100be333ca0 0000000000000046 0000000000000000 0000000000000286
 ffff8100be333c50 ffffffff00000000 ffff8100be560080 ffffffff807af3c0
 0000000000000064 00000001285af6e6 00000000000000ff ffffffff802463f8
Call Trace:
 [] __mod_timer+0xb8/0xd0
 [] schedule_timeout+0x63/0xd0
 [] process_timeout+0x0/0x10
 [] io_schedule_timeout+0x28/0x40
 [] congestion_wait+0x8c/0xb0
 [] autoremove_wake_function+0x0/0x40
 [] throttle_vm_writeout+0x54/0xc0
 [] shrink_zone+0xe3/0x140
 [] kswapd+0x510/0x5b0
 [] autoremove_wake_function+0x0/0x40
 [] kswapd+0x0/0x5b0
 [] kthread+0x4d/0x80
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x80
 [] child_rip+0x0/0x12
irqbalance    D 0000000000aa7f00     0  2110      1
 ffff8100b8c4fcd8 0000000000000082 0000000000000000 0000000000000000
 ffff8100a5e57f00 0000000800700006 ffff8100be5f91c0 ffff810060182140
 ffff8100b8c4fc88 0000000000000282 ffff8100b8c4fcb8 ffffffff8040e1f6
Call Trace:
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit_lock+0x4e/0x80
 [] sync_page+0x0/0x50
 [] __lock_page+0x65/0x70
 [] wake_bit_function+0x0/0x30
 [] handle_mm_fault+0x269/0x870
 [] do_page_fault+0x1a4/0x900
 [] free_pages_and_swap_cache+0x9e/0xd0
 [] unmap_region+0x136/0x150
 [] remove_vma+0x5e/0x70
 [] __up_write+0xd0/0x130
 [] error_exit+0x0/0x84
svscan        D 0000000000df7900     0  2438      1
 ffff8100b6607cd8 0000000000000082 0000000000000000 0000000000000000
 ffff81005f81fd80 0000000800700006 ffff8100bb516180 ffff810060182140
 ffff8100b6607c88 0000000000000282 ffff8100b6607cb8 ffffffff8040e1f6
Call Trace:
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit_lock+0x4e/0x80
 [] sync_page+0x0/0x50
 [] __lock_page+0x65/0x70
 [] wake_bit_function+0x0/0x30
 [] handle_mm_fault+0x269/0x870
 [] xfs_vn_getattr+0x4c/0x140
 [] do_page_fault+0x1a4/0x900
 [] vfs_getattr+0x60/0x90
 [] vfs_fstat+0x45/0x60
 [] sys32_fstat64+0x2e/0x40
 [] error_exit+0x0/0x84
...
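Note where kswapd is stuck: throttle_vm_writeout() looping on
congestion_wait().  From memory, the 2.6.23 logic looks roughly like
the sketch below (a paraphrase, not a verbatim copy of
mm/page-writeback.c).  With vm.dirty_ratio = 4 the computed
dirty_thresh is tiny, so even a modest burst of writeback pages keeps
reclaimers parked here:

/* Paraphrased sketch of 2.6.23-era throttle_vm_writeout(); not the
 * exact source.  Reclaimers spin here while too many pages are in
 * writeback; a very low vm.dirty_ratio makes the threshold easy to
 * exceed. */
void throttle_vm_writeout(void)
{
	long background_thresh, dirty_thresh;

	for (;;) {
		get_dirty_limits(&background_thresh, &dirty_thresh, NULL);

		/* give page allocators ~10% headroom over heavy writers */
		dirty_thresh += dirty_thresh / 10;

		if (global_page_state(NR_UNSTABLE_NFS) +
		    global_page_state(NR_WRITEBACK) <= dirty_thresh)
			break;

		/* the wait kswapd0 is sitting in above */
		congestion_wait(WRITE, HZ / 10);
	}
}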
ipset         D 0000000000000000     0  3713   3574
 ffff8100566237a8 0000000000000086 ffff810056623748 ffffffff802335e2
 ffff810056623748 0000000000000010 ffff8100b085e0c0 ffff810060182140
 ffff810056623758 0000000000000282 ffff810056623788 ffffffff8040e1f6
Call Trace:
 [] enqueue_entity+0x42/0x60
 [] __up_read+0x46/0xb0
 [] io_schedule+0x28/0x40
 [] sync_page+0x37/0x50
 [] __wait_on_bit+0x55/0x80
 [] sync_page+0x0/0x50
 [] wait_on_page_bit+0x6f/0x80
 [] wake_bit_function+0x0/0x30
 [] shrink_page_list+0x164/0x680
 [] del_timer_sync+0x1a/0x30
 [] schedule_timeout+0x6b/0xd0
 [] process_timeout+0x0/0x10
 [] congestion_wait+0x9a/0xb0
 [] shrink_inactive_list+0x40a/0x420
 [] shrink_zone+0xcf/0x140
 [] try_to_free_pages+0x174/0x270
 [] __alloc_pages+0x160/0x350
 [] printk+0x67/0x70
 [] cache_alloc_refill+0x2e3/0x570
 [] __kmalloc+0x113/0x120
 [] :ip_set_nethash:retry+0x183/0x500
 [] :ip_set:__ip_set_addip+0x6f/0x90
 [] :ip_set:ip_set_sockfn_get+0x93d/0xd50
 [] nf_sockopt+0x142/0x150
 [] nf_getsockopt+0xf/0x20
 [] ip_getsockopt+0x98/0xc0
 [] raw_getsockopt+0x11/0x30
 [] sock_common_getsockopt+0xf/0x20
 [] sys_getsockopt+0x9c/0xd0
 [] tracesys+0xdc/0xe1

SysRq : Show Memory
Mem-info:
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  22   Cold: hi:   62, btch:  15 usd:  55
CPU    1: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:   4
Active:10058 inactive:2274 dirty:1 writeback:106 unstable:0
 free:691389 slab:15936 mapped:1584 pagetables:4579 bounce:0
DMA free:9108kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:8604kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2988 2988 2988
DMA32 free:2756448kB min:6984kB low:8728kB high:10476kB active:40232kB inactive:9096kB present:3060476kB pages_scanned:166 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 5*4kB 4*8kB 2*16kB 4*32kB 3*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 1*2048kB 1*4096kB = 9108kB
DMA32: 14501*4kB 15671*8kB 13250*16kB 10866*32kB 8172*64kB 4993*128kB 2039*256kB 501*512kB 51*1024kB 10*2048kB 0*4096kB = 2756396kB
Swap cache: add 320914, delete 319423, find 5260/12831, race 0+3
Free swap  = 2853404kB
Total swap = 3911784kB
Free swap:       2853404kB
780032 pages of RAM
14430 reserved pages
52938 pages shared
1491 pages swap cached

SysRq : Show Memory
Mem-info:
DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  27   Cold: hi:   62, btch:  15 usd:  54
CPU    1: Hot: hi:  186, btch:  31 usd:   6   Cold: hi:   62, btch:  15 usd:   1
Active:10223 inactive:2151 dirty:0 writeback:122 unstable:0
 free:691292 slab:16134 mapped:1645 pagetables:4494 bounce:0
DMA free:9108kB min:16kB low:20kB high:24kB active:0kB inactive:0kB present:8604kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2988 2988 2988
DMA32 free:2756060kB min:6984kB low:8728kB high:10476kB active:40892kB inactive:8604kB present:3060476kB pages_scanned:39 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 5*4kB 4*8kB 2*16kB 4*32kB 3*64kB 2*128kB 1*256kB 0*512kB 2*1024kB 1*2048kB 1*4096kB = 9108kB
DMA32: 13655*4kB 15211*8kB 12903*16kB 10596*32kB 8005*64kB 4982*128kB 2094*256kB 522*512kB 59*1024kB 10*2048kB 0*4096kB = 2756068kB
Swap cache: add 412691, delete 411114, find 12357/32547, race 0+4
Free swap  = 2853920kB
Total swap = 3911784kB
Free swap:       2853920kB
780032 pages of RAM
14430 reserved pages
52619 pages shared
1577 pages swap cached

5 min later...
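For scale: a 512 KB kmalloc() is one physically contiguous order-7
request (512 KB / 4 KB = 128 pages), which is why __kmalloc ends up in
__alloc_pages -> try_to_free_pages even though DMA32 reports ~2.7 GB
free.  A hedged sketch of the usual workaround for large hash tables
(the helper names here are mine, not ipset's):

/* Hypothetical helpers (not ipset code): fall back to vmalloc for
 * large tables, so the allocator maps single pages instead of
 * hunting for 128 contiguous ones and triggering reclaim. */
#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *table_alloc(size_t bytes)
{
	if (bytes <= PAGE_SIZE)
		return kmalloc(bytes, GFP_KERNEL);
	return vmalloc(bytes);	/* page-at-a-time, no high-order request */
}

static void table_free(void *p, size_t bytes)
{
	if (bytes <= PAGE_SIZE)
		kfree(p);
	else
		vfree(p);
}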
kswapd0       D 0000000000000000     0   258      2
 ffff8100be333d30 0000000000000046 0000000000000000 0000000000000286
 ffff8100be333ce0 ffffffff00000000 ffff8100be560080 ffffffff807af3c0
 0000000000000064 00000001286155f2 00000000000000ff ffffffff802463f8
Call Trace:
 [] __mod_timer+0xb8/0xd0
 [] schedule_timeout+0x63/0xd0
 [] process_timeout+0x0/0x10
 [] io_schedule_timeout+0x28/0x40
 [] congestion_wait+0x8c/0xb0
 [] autoremove_wake_function+0x0/0x40
 [] kswapd+0x545/0x5b0
 [] autoremove_wake_function+0x0/0x40
 [] kswapd+0x0/0x5b0
 [] kthread+0x4d/0x80
 [] child_rip+0xa/0x12
 [] kthread+0x0/0x80
 [] child_rip+0x0/0x12

Before this ipset test I had around 100 KB of swap used; after ipset
finished and I restarted Xorg, I had 500 MB.

/proc/vmstat after I restarted Xorg:

nr_free_pages 614862
nr_inactive 27488
nr_active 62666
nr_anon_pages 37065
nr_mapped 7015
nr_file_pages 66637
nr_dirty 3
nr_writeback 0
nr_slab_reclaimable 4257
nr_slab_unreclaimable 10737
nr_page_table_pages 2652
nr_unstable 0
nr_bounce 0
nr_vmscan_write 340956
pgpgin 78462562
pgpgout 67868956
pswpin 625227
pswpout 338578
pgalloc_dma 0
pgalloc_dma32 399577709
pgalloc_normal 0
pgalloc_movable 0
pgfree 400192776
pgactivate 19916454
pgdeactivate 18092870
pgfault 472557115
pgmajfault 291798
pgrefill_dma 0
pgrefill_dma32 97774325
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 0
pgsteal_dma32 24633836
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_dma32 28206694
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_dma32 574376
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 4761472
kswapd_steal 24497549
kswapd_inodesteal 232
pageoutrun 392180
allocstall 949
pgrotated 315045

--
Do what you love because life is too short for anything else.
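Putting those counters in perspective (my arithmetic, not part of the
original report), assuming 4 KB pages:

/* Quick arithmetic over the /proc/vmstat counters quoted above;
 * the values are hard-coded from this report. */
#include <stdio.h>

int main(void)
{
	const double page_kb = 4.0;
	const double pswpin  = 625227;   /* pages swapped in  */
	const double pswpout = 338578;   /* pages swapped out */
	const double scanned = 28206694; /* pgscan_kswapd_dma32 */
	const double stolen  = 24497549; /* kswapd_steal */

	printf("swapped out: %.1f MB\n", pswpout * page_kb / 1024);
	printf("swapped in : %.1f MB\n", pswpin  * page_kb / 1024);
	printf("kswapd reclaim efficiency: %.1f%%\n",
	       100.0 * stolen / scanned);
	return 0;
}

That works out to roughly 1.3 GB written to swap and 2.4 GB read back,
with kswapd scanning ~28 million pages to reclaim ~24 million, all
downstream of a single 512 KB allocation attempt.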