Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262445AbUKZXjC (ORCPT ); Fri, 26 Nov 2004 18:39:02 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S263162AbUKZTqW (ORCPT ); Fri, 26 Nov 2004 14:46:22 -0500 Received: from mail-relay-2.tiscali.it ([213.205.33.42]:50353 "EHLO mail-relay-2.tiscali.it") by vger.kernel.org with ESMTP id S262445AbUKZT2h (ORCPT ); Fri, 26 Nov 2004 14:28:37 -0500 Date: Thu, 25 Nov 2004 21:32:48 +0100 From: Andrea Arcangeli To: Marcelo Tosatti Cc: Alan Cox , bgagnon@coradiant.com, Linux Kernel Mailing List , Andrea Arcangeli , davem@redhat.com Subject: Re: Memory leak in 2.4.27 kernel, using mmap raw packet sockets Message-ID: <20041125203248.GD5904@dualathlon.random> References: <20041015182352.GA4937@logos.cnet> <1097980764.13226.21.camel@localhost.localdomain> <20041125150206.GF16633@logos.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20041125150206.GF16633@logos.cnet> X-GPG-Key: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43 X-PGP-Key: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5 User-Agent: Mutt/1.5.6i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3726 Lines: 87 On Thu, Nov 25, 2004 at 01:02:06PM -0200, Marcelo Tosatti wrote: > On Sun, Oct 17, 2004 at 03:39:26AM +0100, Alan Cox wrote: > > On Gwe, 2004-10-15 at 19:23, Marcelo Tosatti wrote: > > > I prefer doing the "if (PageReserved(page)) put_page_testzero(page)" as > > > you propose instead of changing get_user_pages(), as there are several > > > users which rely on its behaviour. > > > > > > I have applied your fix to the 2.4 BK tree. > > > > That isnt sufficient. Consider anything else taking a reference to the > > page and the refcount going negative. And yes 2.6.x has this problem and > > far worse in some ways, but it also has the mechanism to fix it. > > > > 2.6.x uses VM_IO as a VMA flag which tells the kernel two things > > a) get_user_pages fails on it > > b) core dumping of it is forbidden > > > > 2.6.x is missing a whole pile of these (fixed in the 2.6.9-ac tree I'm > > putting together). I *think* remap_page_range() in 2.6.x can just set > > VM_IO, but older kernels didn't pass the vma so all the users would need > > fixing (OSS audio, media/video, usb audio, usb video, frame buffer > > etc). > > I dont see any practical problem with 2.4.x right now. > > get_user_pages() wont be called on driver created VMA's with PageReserved > pages because of the VM_IO bit which is set at remap_page_range(). > > Its not possible to have any vma mapped by a driver without VM_IO set. > > But the network packet mmap was an isolated case, so I'll apply Andrea's > fix just for safety, although I can't find any offender in the tree. > > > --- memory.c 2004-10-22 15:58:28.000000000 -0200 > +++ memory.c 2004-10-28 14:32:26.585813200 -0200 > @@ -499,7 +499,7 @@ > /* FIXME: call the correct function, > * depending on the type of the found page > */ > - if (!pages[i]) > + if (!pages[i] || PageReserved(pages[i])) > goto bad_page; > page_cache_get(pages[i]); > } this needs to be modified to take the ZERO_PAGE(start) into account. It's a minor detail, but we should allow that. (my 2.4-aa tree was already checking the zeropage for other reasons, so it didn't need any change) in short the above fix should be modified like this: if (!pages[i] || PageReserved(pages[i])) { if (pages[i] != ZERO_PAGE(start)) goto bad_page; } else page_cache_get(pages[i]); the zero page is guaranteed to remain reserved. the major bug is that get_user_pages was allowing _temporarily_ reserved pages to be pinned. __free_pages isn't checking the VM_IO, and as such get_user_pages should be robust against its __free_pages counterpart, this is why I believe the above fix is the right fix. ZEROPAGE is special because: 1) it's guaranteed to never be unpinned 2) it's the only reserved page that handle_mm_fault is allowed to istantiate for a filesystem data mapping The VM_IO enforcment is a nice improvement on top of the above. Checking the PageReserved bitflag is a good thing at least for the zeropage, so we don't overflow the zeropage count, which isn't nice. If you really want to fix it only using VM_IO, I still recommend to apply the above patch, and to turn 'goto bad_page' into BUG(). It'd be a bad idea not to at least add the above code as a robustness check. Comments welcome. thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/