Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261252AbUKCAs2 (ORCPT ); Tue, 2 Nov 2004 19:48:28 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S261892AbUKBWNn (ORCPT ); Tue, 2 Nov 2004 17:13:43 -0500 Received: from mail-relay-1.tiscali.it ([213.205.33.41]:61109 "EHLO mail-relay-1.tiscali.it") by vger.kernel.org with ESMTP id S262214AbUKBWLe (ORCPT ); Tue, 2 Nov 2004 17:11:34 -0500 Date: Tue, 2 Nov 2004 23:07:20 +0100 From: Andrea Arcangeli To: Dave Hansen Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andi Kleen , Andrew Morton Subject: Re: fix iounmap and a pageattr memleak (x86 and x86-64) Message-ID: <20041102220720.GV3571@dualathlon.random> References: <4187FA6D.3070604@us.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4187FA6D.3070604@us.ibm.com> X-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 X-PGP-Key: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 User-Agent: Mutt/1.5.6i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5902 Lines: 139 On Tue, Nov 02, 2004 at 01:21:49PM -0800, Dave Hansen wrote: > This patch: > > >From: Andrea Arcangeli > > > >- fix silent memleak in the pageattr code that I found while searching > > for the bug Andi fixed in the second patch below (basically reference > > counting in split page was done on the pmd instead of the pte). > > > >- Part of this patch is also needed to make the above work on x86 > >(otherwise > > one of my new above BUGS() will trigger signalling the fact a bug was > > there). The below patch creates a subtle dependency that (_PAGE_PCD << > > 24) > > must not be zero. It's not the cleanest thing ever, but since it's an > > hardware bitflag I doubt it's going to break. > > > >Signed-off-by: Andi Kleen > >Signed-off-by: Andrea Arcangeli > >Signed-off-by: Andrew Morton > >--- > > > > 25-akpm/arch/i386/mm/ioremap.c | 4 ++-- > > 25-akpm/arch/i386/mm/pageattr.c | 13 +++++++------ > > 25-akpm/arch/x86_64/mm/ioremap.c | 14 +++++++------- > > 25-akpm/arch/x86_64/mm/pageattr.c | 23 ++++++++++++++--------- > > 4 files changed, 30 insertions(+), 24 deletions(-) > > is hitting this BUG() during bootup: > > /* memleak and potential failed 2M page regeneration */ > BUG_ON(!page_count(kpte_page)); > > in 2.6.10-rc1-mm2. > > Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) > Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) > Memory: 511144k/524288k available (1856k kernel code, 12608k reserved, > 1186k data, 164k init, 0k highmem) > Checking if this processor honours the WP bit even in supervisor mode... Ok. > Mount-cache hash table entries: 512 (order: 0, 4096 bytes) > ------------[ cut here ]------------ > kernel BUG at arch/i386/mm/pageattr.c:136! > invalid operand: 0000 [#1] > SMP DEBUG_PAGEALLOC > Modules linked in: > CPU: 0 > EIP: 0060:[] Not tainted VLI > EFLAGS: 00010046 (2.6.10-rc1-mm2) > EIP is at __change_page_attr+0x28c/0x358 > eax: ffffffff ebx: 017ff163 ecx: 00000000 edx: c10001e0 > esi: 00000000 edi: c000fff8 ebp: c10001e0 esp: c03f9d98 > ds: 007b es: 007b ss: 0068 > Process swapper (pid: 0, threadinfo=c03f9000 task=c0345b40) > Stack: c102ffe0 00000000 00000000 00000046 c0113ceb c17f9000 c102ff20 > c102ff20 017ff163 00000000 017ff163 00000000 c0113ceb c17fe000 > c102ffc0 c102ffc0 c1000000 c17ff000 c000fff8 017ff163 00000000 > 017ff163 00000000 00000000 > Call Trace: > [] __change_page_attr+0x2f/0x358 > [] __change_page_attr+0x2f/0x358 > [] change_page_attr+0x36/0x54 > [] kernel_map_pages+0x30/0x5f > [] __alloc_pages+0x340/0x350 > [] __get_free_pages+0x1d/0x30 > [] kmem_getpages+0x26/0xd4 > [] cache_grow+0xb1/0x150 > [] cache_alloc_refill+0x232/0x280 > [] kmem_cache_alloc+0x5a/0x78 > [] idr_pre_get+0x1c/0x44 > [] sysfs_fill_super+0x0/0xa4 > [] set_anon_super+0x10/0xb8 > [] sget+0xb3/0x148 > [] sysfs_fill_super+0x0/0xa4 > [] get_sb_single+0x26/0x8c > [] compare_single+0x0/0x8 > [] set_anon_super+0x0/0xb8 > [] sysfs_get_sb+0x19/0x1d > [] sysfs_fill_super+0x0/0xa4 > [] do_kern_mount+0x4e/0xd0 > [] kern_mount+0x11/0x15 > [] sysfs_init+0x1e/0x50 > [] mnt_init+0xb4/0xc0 > [] vfs_caches_init+0x7e/0x94 > [] start_kernel+0x12d/0x150 > Code: c7 0f 75 f5 f0 ff 4d 04 eb 08 0f 0b 85 00 88 c1 2d c0 8b 45 00 89 > ea f6 c4 80 74 07 8b 55 0c 8d 74 26 00 8b 42 04 83 f8 ff 75 08 <0f> 0b > 88 00 88 c1 2d c0 a1 ac 6c 34 c0 a8 08 0f 84 aa 00 00 00 > > I'm tracking down now to see exactly what's going on. This just a > regular, plain 4-way x86 box with 4GB of RAM. Removing that BUG_ON() > lets me boot just fine. you've a debugging option enabled. I'm afraid somebody wrote common code with the hope that change_page_attr had a natural universal API. change_page_attr API has alwasy required you a certain level of symmetry to work. Now I made it completley symmetric just to make it simpler to use and I added BUG where the previous code would screwup. If you do something like this you'll trigger a BUG_ON too: change_page_attr(page, PAGE_KERNEL_NOCACHE) change_page_attr(page, PAGE_KERNEL) change_page_attr(page, PAGE_KERNEL) the last one will BUG(). At least the new API with these changes will not leak memory anymore as far as you retain simmetry (even if you run the change_page_attr on the same page from different context). I suspect that such debugging option has never worked right. pageattr will need fixing so that to provide a natural universal API that keeps track of each page. Previously this would have failed: A change_page_attr(page1, UNCACHED) -> pte page_count = 2 B change_page_attr(page1, WHATEVER) -> pte page_count still 2 C change_page_attr(page2, UNCACHED) -> pte page count = 3 A change_page_attr(page1, PAGE_KERNEL) -> pte page count = 2 B change_page_attr(page1, PAGE_KERNEL) -> pte page count = 1 ->screwup Now the above can work fine. But still examples like the above one will trigger bugs since they're simply violating the current restricted change_page_attr API. I believe if you disable the DEBUG_PAGEALLOC it should work fine again. Still I recommend investigating _why_ debug_pagealloc is violating the API. It might not be necessary to wait for the pageattr universal feature to make DEBUG_PAGEALLOC work safe. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/