Date: Thu, 7 May 2015 16:18:17 -0400
From: Jerome Glisse
To: Ingo Molnar
Cc: Dave Hansen, Dan Williams, Linus Torvalds,
    Linux Kernel Mailing List, Boaz Harrosh, Jan Kara, Mike Snitzer,
    Neil Brown, Benjamin Herrenschmidt, Heiko Carstens, Chris Mason,
    Paul Mackerras, "H. Peter Anvin", Christoph Hellwig,
    Alasdair Kergon, "linux-nvdimm@lists.01.org", Mel Gorman,
    Matthew Wilcox, Ross Zwisler, Rik van Riel, Martin Schwidefsky,
    Jens Axboe, "Theodore Ts'o", "Martin K. Petersen", Julia Lawall,
    Tejun Heo, linux-fsdevel, Andrew Morton,
    paulmck@linux.vnet.ibm.com
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer,
    introduce __pfn_t
Message-ID: <20150507201815.GD5966@gmail.com>
In-Reply-To: <20150507195313.GA23597@gmail.com>

On Thu, May 07, 2015 at 09:53:13PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar wrote:
>
> > > Is handling kernel pagefault on the vmemmap completely out of the
> > > picture ? So we would carveout a chunck of kernel address space
> > > for those pfn and use it for vmemmap and handle pagefault on it.
> >
> > That's pretty clever.
> > The page fault doesn't even have to do remote
> > TLB shootdown, because it only establishes mappings - so it's pretty
> > atomic, a bit like the minor vmalloc() area faults we are doing.
> >
> > Some sort of LRA (least recently allocated) scheme could unmap the
> > area in chunks if it's beyond a certain size, to keep a limit on
> > size. Done from the same context and would use remote TLB shootdown.
> >
> > The only limitation I can see is that such faults would have to be
> > able to sleep, to do the allocation. So pfn_to_page() could not be
> > used in arbitrary contexts.
>
> So another complication would be that we cannot just unmap such pages
> when we want to recycle them, because the struct page in them might be
> in use - so all struct page uses would have to refcount the underlying
> page. We don't really do that today: code just looks up struct pages
> and assumes they never go away.

I still think this is doable. Like I said in another email, I think we
should introduce a special pfn_to_page_dev|pmem|waffle|somethingyoulike()
for the places that are allowed to allocate the underlying struct page.

For instance, we can use a default page to back this whole special
vmemmap range, filled with specially crafted struct pages that say they
describe invalid memory (make this mapping read-only so that all writes
to these special struct pages are forbidden).

Now once an authorized user comes along and needs a real struct page, it
triggers a page allocation that replaces the page full of fake invalid
struct pages with a page of correct, valid struct pages that can be
manipulated by other parts of the kernel.

So regular pfn_to_page() would test against the special vmemmap and, if
the pfn is special, test the content of the struct page for some flag.
If it's the invalid page flag it returns 0. But once a proper struct
page is allocated, pfn_to_page() would return the struct page as
expected.

That way you will catch all invalid users of such pages, i.e. users that
use the page after its lifetime is done.
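
To make the two lookup paths concrete, here is a rough userspace sketch
in C. It is only an illustration under assumptions: every name in it
(sketch_page, PG_SKETCH_INVALID, sketch_pfn_to_page(),
sketch_pfn_to_page_dev(), the chunk sizes and the pfn base) is made up,
a plain array of pointers stands in for the special vmemmap, and
calloc() stands in for the page allocation. The read-only mapping and
the write fault are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes throughout; none of this is kernel API. */
#define STRUCTS_PER_CHUNK 64UL        /* entries held by one backing "page" */
#define NR_CHUNKS         4UL
#define SPECIAL_PFN_BASE  0x100000UL  /* first pfn of the special range */
#define SPECIAL_PFN_COUNT (NR_CHUNKS * STRUCTS_PER_CHUNK)
#define PG_SKETCH_INVALID (1UL << 0)  /* marks a fake, invalid entry */

struct sketch_page {
	unsigned long flags;
	unsigned long pfn;
};

/* One shared chunk of invalid entries backs the whole range by default.
 * In the proposal this page would be mapped read-only. */
static struct sketch_page invalid_chunk[STRUCTS_PER_CHUNK];
static struct sketch_page *special_vmemmap[NR_CHUNKS];

static void special_vmemmap_init(void)
{
	for (size_t i = 0; i < STRUCTS_PER_CHUNK; i++)
		invalid_chunk[i].flags = PG_SKETCH_INVALID;
	for (size_t c = 0; c < NR_CHUNKS; c++)
		special_vmemmap[c] = invalid_chunk;
}

static bool pfn_is_special(unsigned long pfn)
{
	return pfn >= SPECIAL_PFN_BASE &&
	       pfn < SPECIAL_PFN_BASE + SPECIAL_PFN_COUNT;
}

/* Address of the entry for a pfn that is known to be in the range. */
static struct sketch_page *special_entry(unsigned long pfn)
{
	assert(pfn_is_special(pfn));
	unsigned long idx = pfn - SPECIAL_PFN_BASE;
	return &special_vmemmap[idx / STRUCTS_PER_CHUNK][idx % STRUCTS_PER_CHUNK];
}

/* Regular lookup: never allocates, refuses entries still marked invalid. */
static struct sketch_page *sketch_pfn_to_page(unsigned long pfn)
{
	if (!pfn_is_special(pfn))
		return NULL;  /* the normal vmemmap path is not modelled */
	struct sketch_page *page = special_entry(pfn);
	return (page->flags & PG_SKETCH_INVALID) ? NULL : page;
}

/* Authorized lookup: may replace the invalid chunk with a real one. */
static struct sketch_page *sketch_pfn_to_page_dev(unsigned long pfn)
{
	if (!pfn_is_special(pfn))
		return NULL;
	unsigned long chunk = (pfn - SPECIAL_PFN_BASE) / STRUCTS_PER_CHUNK;
	if (special_vmemmap[chunk] == invalid_chunk) {
		struct sketch_page *fresh = calloc(STRUCTS_PER_CHUNK, sizeof(*fresh));
		if (!fresh)
			return NULL;
		/* Every entry of the new chunk becomes a valid struct. */
		for (size_t i = 0; i < STRUCTS_PER_CHUNK; i++)
			fresh[i].pfn = SPECIAL_PFN_BASE + chunk * STRUCTS_PER_CHUNK + i;
		special_vmemmap[chunk] = fresh;
	}
	return special_entry(pfn);
}
```

With this, sketch_pfn_to_page() keeps returning NULL for a pfn in the
range until some caller of sketch_pfn_to_page_dev() has populated the
chunk that covers it. As in the description above, the granularity is
the backing chunk, so neighbouring pfns in the same chunk become visible
at the same time.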

You will also limit the creation of the underlying proper struct page to
code that can legitimately ask for a proper struct page for a given pfn.
You would also get a kernel write fault on the page full of fake struct
pages, which would allow catching further wrong uses.

Anyway, this is how I envision this, and I think it would work for my
use case too (GPU it is for me :))

Cheers,
Jérôme