Date: Thu, 7 May 2015 19:36:41 +0200
From: Ingo Molnar
To: Dan Williams
Cc: Linus Torvalds, Linux Kernel Mailing List, Boaz Harrosh, Jan Kara,
	Mike Snitzer, Neil Brown, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, Chris Mason, Paul Mackerras, "H. Peter Anvin",
	Christoph Hellwig, Alasdair Kergon, "linux-nvdimm@lists.01.org",
	Mel Gorman, Matthew Wilcox, Ross Zwisler, Rik van Riel,
	Martin Schwidefsky, Jens Axboe, "Theodore Ts'o",
	"Martin K. Petersen", Julia Lawall, Tejun Heo, linux-fsdevel,
	Andrew Morton
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
Message-ID: <20150507173641.GA21781@gmail.com>
References: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>

* Dan Williams wrote:

> > Anyway, I did want to say that while I may not be convinced about
> > the approach, I think the patches themselves don't look horrible.
> > I actually like your "__pfn_t". So while I (very obviously) have
> > some doubts about this approach, it may be that the most
> > convincing argument is just in the code.
>
> Ok, I'll keep thinking about this and come back when we have a
> better story about passing mmap'd persistent memory around in
> userspace.

So is there anything fundamentally wrong about creating struct page
backing at mmap() time (and making sure aliased mmaps share struct
page arrays)?

Because if that is done, then the DMA agent won't even know about the
memory being persistent RAM. It's just a regular struct page, that
happens to point to persistent RAM. Same goes for all the high level
VM APIs, futexes, etc. Everything will Just Work.

It will also be relatively fast: mmap() is a relative slowpath,
comparatively.

As far as RAID is concerned: that's a relatively easy situation, as
there's only a single user of the devices, the RAID context that
manages all component devices exclusively. Device to device DMA can
use the block layer directly, i.e. most of the patches you've got here
in this series, except:

  74287 C May 06 Dan Williams    ( 232) ├─>[PATCH v2 09/10] dax: convert to __pfn_t

I think DAX mmap()s need struct page backing.

I think there's a simple rule: if a page is visible to user-space via
the MMU then it needs struct page backing. If it's "hidden", like
behind a RAID abstraction, it probably doesn't.

With the remaining patches a high level RAID driver ought to be able
to send pfn-to-sector and sector-to-pfn requests to other block
drivers, without any unnecessary struct page allocation overhead,
right?

As long as the pfn concept remains a clever way to reuse our
ram<->sector interfaces to implement sector<->sector IO, in the cases
where the IO has no serialization or MMU concerns, not using struct
page and using pfn_t looks natural.

The moment it starts reaching user space APIs, like in the DAX case,
and especially if it becomes user-MMU visible, it's a mistake to not
have struct page backing, I think.

(In that sense the current DAX mmap() code is already a partial
mistake.)
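
To make the rule above concrete, here is a rough user-space sketch of
the decision it implies. This is only an illustration: the enum, its
values and needs_struct_page() are hypothetical names made up for this
example, not existing kernel APIs, and the "probably doesn't" case is
simplified to a plain "no".

```c
#include <stdbool.h>

/* Hypothetical model of how a range of persistent memory is exposed. */
enum pmem_exposure {
	PMEM_HIDDEN_BEHIND_BLOCK_DRIVER, /* e.g. RAID doing sector<->sector IO */
	PMEM_USER_API,                   /* reaches a user-space API */
	PMEM_USER_MMU_MAPPED,            /* mmap()ed: visible via the MMU */
};

/*
 * Sketch of the rule: memory that reaches user-space APIs, and
 * especially memory that is user-MMU visible, gets struct page
 * backing; memory hidden behind a block driver can stay pfn-only.
 */
bool needs_struct_page(enum pmem_exposure exposure)
{
	switch (exposure) {
	case PMEM_USER_API:
	case PMEM_USER_MMU_MAPPED:
		return true;
	case PMEM_HIDDEN_BEHIND_BLOCK_DRIVER:
	default:
		return false;
	}
}
```

Under this model a DAX mmap() falls into PMEM_USER_MMU_MAPPED, while
the RAID case stays in PMEM_HIDDEN_BEHIND_BLOCK_DRIVER and can keep
using pfns without struct page allocation overhead.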

Thanks,

	Ingo