References: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>
Date: Wed, 6 May 2015 17:19:48 -0700
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
From: Linus Torvalds
To: Dan Williams
Cc: Linux Kernel Mailing List, Boaz Harrosh, Jan Kara, Mike Snitzer, Neil Brown, Benjamin Herrenschmidt, Dave Hansen, Heiko Carstens, Chris Mason, Paul Mackerras, "H. Peter Anvin", Christoph Hellwig, Alasdair Kergon, "linux-nvdimm@lists.01.org", Ingo Molnar, Mel Gorman, Matthew Wilcox, Ross Zwisler, Rik van Riel, Martin Schwidefsky, Jens Axboe, "Theodore Ts'o", "Martin K. Petersen", Julia Lawall, Tejun Heo, linux-fsdevel, Andrew Morton

On Wed, May 6, 2015 at 4:47 PM, Dan Williams wrote:
>
> Conceptually better, but certainly more difficult to audit if the fake
> struct page is initialized in a subtle way that breaks when/if it
> leaks to some unwitting context.

Maybe. It could go either way, though.

In particular, with the "dynamically allocated struct page" approach, if somebody uses it past the supposed lifetime of the use, things like poisoning the temporary "struct page" could be fairly effective.

You can't really poison the pfn - it's just a number, and if somebody uses it later than you think (and you have re-used that physical memory for something else), you'll never ever know.
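A minimal user-space sketch of the poisoning argument above. Everything here is hypothetical and made up for illustration: tmp_page, tmp_page_get(), tmp_page_put(), tmp_page_valid() and tmp_page_reap() are not the kernel's struct page or slab API. The poison byte 0x6b mirrors the slab POISON_FREE value. The sketch poisons a temporary descriptor when its use ends and defers the actual free, so a late user reads an obviously bad marker. A saved bare pfn, being just an unsigned long, still looks perfectly valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical markers; not kernel definitions. */
#define TMP_PAGE_LIVE        0x4c495645u  /* set while the descriptor is in use */
#define TMP_PAGE_POISON_BYTE 0x6b         /* same byte as slab POISON_FREE */
#define TMP_PAGE_POISON      0x6b6b6b6bu  /* what 'magic' reads after poisoning */

/* Hypothetical temporary descriptor for one persistent-memory page frame. */
struct tmp_page {
	uint32_t magic;     /* TMP_PAGE_LIVE while in use, poison afterwards */
	unsigned long pfn;  /* the page frame number being described */
};

/* Allocate a descriptor for the duration of one use (e.g. one I/O). */
static struct tmp_page *tmp_page_get(unsigned long pfn)
{
	struct tmp_page *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->magic = TMP_PAGE_LIVE;
	p->pfn = pfn;
	return p;
}

/* End of the use: overwrite the whole descriptor with the poison byte
 * instead of freeing it immediately, so a late user sees garbage. */
static void tmp_page_put(struct tmp_page *p)
{
	assert(p->magic == TMP_PAGE_LIVE);  /* catches a double put */
	memset(p, TMP_PAGE_POISON_BYTE, sizeof(*p));
}

/* A late user can at least detect that the descriptor is stale. */
static bool tmp_page_valid(const struct tmp_page *p)
{
	return p->magic == TMP_PAGE_LIVE;
}

/* Deferred release, e.g. after a quarantine period. */
static void tmp_page_reap(struct tmp_page *p)
{
	free(p);
}
```

After tmp_page_put(), a stale holder of the descriptor pointer sees magic equal to TMP_PAGE_POISON and tmp_page_valid() returns false. A copy of p->pfn taken before the put is still the same plausible number, which is the "you'll never ever know" case.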
I'd *assume* that most users of the dynamic "struct page" allocation have very clear lifetime rules. Those things would presumably normally get looked up by some extended version of "get_user_pages()", and there's a clear use of the result, with no longer lifetime.

Also, you do need to have some higher-level locking when you do this, to make sure that the persistent pages don't magically get re-assigned. We're presumably talking about having a filesystem in that persistent memory, so we cannot be doing IO to the pages (from some other source - whether RDMA or some special zero-copy model) while the underlying filesystem is reassigning the storage because somebody deleted the file.

IOW, there had better be other external rules about when - and how long - you can use a particular persistent page. No? So the whole "when/how to allocate the temporary 'struct page'" is just another detail in that whole thing.

And yes, some uses may not ever actually see that. If the whole of persistent memory is just assigned to a database or something, and the DB just wants to do a "flush this range of persistent memory to long-term disk storage", then there may not be much of a "lifetime" issue for the persistent memory. But even then you're going to have IO completion callbacks etc. to let the DB know that it has hit the disk, so...

What is the primary thing that is driving this need? Do we have a very concrete example?

                 Linus