Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754248Ab3EIWHw (ORCPT ); Thu, 9 May 2013 18:07:52 -0400
Received: from e39.co.us.ibm.com ([32.97.110.160]:45040 "EHLO
	e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752882Ab3EIWHu (ORCPT );
	Thu, 9 May 2013 18:07:50 -0400
Date: Thu, 9 May 2013 17:07:39 -0500
From: Seth Jennings
To: Dave Hansen
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, mgorman@suse.de,
	tim.c.chen@linux.intel.com
Subject: Re: [RFC][PATCH 1/7] defer clearing of page_private() for swap cache pages
Message-ID: <20130509220739.GA14840@cerebellum>
References: <20130507211954.9815F9D1@viggo.jf.intel.com>
	<20130507211955.7DF88A4F@viggo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130507211955.7DF88A4F@viggo.jf.intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: No
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 13050922-3620-0000-0000-000002699A74
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6970
Lines: 143

On Tue, May 07, 2013 at 02:19:55PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen
>
> There are only two callers of swapcache_free() which actually
> pass in a non-NULL 'struct page'.  Both of them
> (__remove_mapping and delete_from_swap_cache()) create a
> temporary on-stack 'swp_entry_t' and set entry.val to
> page_private().
>
> They need to do this since __delete_from_swap_cache() does
> set_page_private(page, 0) and destroys the information.
>
> However, I'd like to batch a few of these operations on several
> pages together in a new version of __remove_mapping(), and I
> would prefer not to have to allocate temporary storage for
> each page.
> The code is pretty ugly, and it's a bit silly
> to create these on-stack 'swp_entry_t's when it is so easy to
> just keep the information around in 'struct page'.
>
> There should not be any danger in doing this since we are
> absolutely on the path of freeing these pages.  There is no
> turning back, and no other references can be obtained
> after it comes out of the radix tree.

I get a BUG on this one:

[   26.114818] ------------[ cut here ]------------
[   26.115282] kernel BUG at mm/memcontrol.c:4111!
[   26.115282] invalid opcode: 0000 [#1] PREEMPT SMP
[   26.115282] Modules linked in:
[   26.115282] CPU: 3 PID: 5026 Comm: cc1 Not tainted 3.9.0+ #8
[   26.115282] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   26.115282] task: ffff88007c1cdca0 ti: ffff88001b442000 task.ti: ffff88001b442000
[   26.115282] RIP: 0010:[]  [] __mem_cgroup_uncharge_common+0x255/0x2e0
[   26.115282] RSP: 0000:ffff88001b443708  EFLAGS: 00010206
[   26.115282] RAX: 4000000000090009 RBX: 0000000000000000 RCX: ffffc90000014001
[   26.115282] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffea00006e5b40
[   26.115282] RBP: ffff88001b443738 R08: 0000000000000000 R09: 0000000000000000
[   26.115282] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00006e5b40
[   26.115282] R13: 0000000000000000 R14: ffffea00006e5b40 R15: 0000000000000002
[   26.115282] FS:  00007fabd08ee700(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000
[   26.115282] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   26.115282] CR2: 00007fabce27a000 CR3: 000000001985f000 CR4: 00000000000006a0
[   26.115282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   26.115282] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   26.115282] Stack:
[   26.115282]  ffffffff810dcbae ffff880064a0a500 0000000000000001 ffffea00006e5b40
[   26.115282]  ffffea00006e5b40 0000000000000001 ffff88001b443748 ffffffff810f0d05
[   26.115282]  ffff88001b443778 ffffffff810ddb3e ffff88001b443778 ffffea00006e5b40
[   26.115282] Call Trace:
[   26.115282]  [] ? swap_info_get+0x5e/0xe0
[   26.115282]  [] mem_cgroup_uncharge_swapcache+0x15/0x20
[   26.115282]  [] swapcache_free+0x4e/0x70
[   26.115282]  [] __remove_mapping+0xd7/0x120
[   26.115282]  [] shrink_page_list+0x5c2/0x920
[   26.115282]  [] ? isolate_lru_pages.isra.37+0xae/0x120
[   26.115282]  [] shrink_inactive_list+0x13f/0x380
[   26.115282]  [] shrink_lruvec+0x240/0x4e0
[   26.115282]  [] shrink_zone+0x66/0x1a0
[   26.115282]  [] do_try_to_free_pages+0xeb/0x570
[   26.115282]  [] ? lookup_page_cgroup_used+0x9/0x20
[   26.115282]  [] try_to_free_pages+0x9f/0xc0
[   26.115282]  [] __alloc_pages_nodemask+0x5a7/0x970
[   26.115282]  [] handle_pte_fault+0x65e/0x880
[   26.115282]  [] handle_mm_fault+0x139/0x1e0
[   26.115282]  [] __do_page_fault+0x160/0x460
[   26.115282]  [] ? do_brk+0x1fc/0x360
[   26.115282]  [] ? lockdep_sys_exit_thunk+0x35/0x67
[   26.115282]  [] do_page_fault+0x9/0x10
[   26.115282]  [] page_fault+0x22/0x30
[   26.115282] Code: a9 00 00 08 00 0f 85 43 fe ff ff e9 b8 fe ff ff 66 0f 1f 44 00 00 41 8b 44 24 18 85 c0 0f 89 2b fe ff ff 0f 1f 00 e9 9d fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 49 89 9c 24 48 0f 00 00 e9 0a
[   26.115282] RIP  [] __mem_cgroup_uncharge_common+0x255/0x2e0
[   26.115282]  RSP
[   26.171597] ---[ end trace 5e49a21e51452c24 ]---

mm/memcontrol.c:4111:
	VM_BUG_ON(PageSwapCache(page));

Seems that mem_cgroup_uncharge_swapcache(), somewhat ironically, expects
the SwapCache flag to be unset already.  The fix might be as simple as
removing that VM_BUG_ON(), but there might be more to it.
There usually is :)

Seth

>
> Signed-off-by: Dave Hansen
> ---
>
>  linux.git-davehans/mm/swap_state.c |    4 ++--
>  linux.git-davehans/mm/vmscan.c     |    2 ++
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff -puN mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private mm/swap_state.c
> --- linux.git/mm/swap_state.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-07 13:48:13.698044473 -0700
> +++ linux.git-davehans/mm/swap_state.c	2013-05-07 13:48:13.703044693 -0700
> @@ -146,8 +146,6 @@ void __delete_from_swap_cache(struct pag
>  	entry.val = page_private(page);
>  	address_space = swap_address_space(entry);
>  	radix_tree_delete(&address_space->page_tree, page_private(page));
> -	set_page_private(page, 0);
> -	ClearPageSwapCache(page);
>  	address_space->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
>  	INC_CACHE_INFO(del_total);
> @@ -224,6 +222,8 @@ void delete_from_swap_cache(struct page
>  	spin_unlock_irq(&address_space->tree_lock);
>
>  	swapcache_free(entry, page);
> +	set_page_private(page, 0);
> +	ClearPageSwapCache(page);
>  	page_cache_release(page);
>  }
>
> diff -puN mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private mm/vmscan.c
> --- linux.git/mm/vmscan.c~__delete_from_swap_cache-dont-clear-page-private	2013-05-07 13:48:13.700044561 -0700
> +++ linux.git-davehans/mm/vmscan.c	2013-05-07 13:48:13.705044783 -0700
> @@ -494,6 +494,8 @@ static int __remove_mapping(struct addre
>  		__delete_from_swap_cache(page);
>  		spin_unlock_irq(&mapping->tree_lock);
>  		swapcache_free(swap, page);
> +		set_page_private(page, 0);
> +		ClearPageSwapCache(page);
>  	} else {
>  		void (*freepage)(struct page *);
>
> _
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .