Date: Thu, 7 May 2015 11:02:17 +0200
From: Ingo Molnar
To: Dan Williams
Cc: Linus Torvalds, Linux Kernel Mailing List, Boaz Harrosh, Jan Kara,
    Mike Snitzer, Neil Brown, Benjamin Herrenschmidt, Dave Hansen,
    Heiko Carstens, Chris Mason, Paul Mackerras, "H. Peter Anvin",
    Christoph Hellwig, Alasdair Kergon, "linux-nvdimm@lists.01.org",
    Mel Gorman, Matthew Wilcox, Ross Zwisler, Rik van Riel,
    Martin Schwidefsky, Jens Axboe, "Theodore Ts'o",
    "Martin K. Petersen", Julia Lawall, Tejun Heo, linux-fsdevel,
    Andrew Morton
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer,
    introduce __pfn_t
Message-ID: <20150507090217.GA4467@gmail.com>
References: <20150506200219.40425.74411.stgit@dwillia2-desk3.amr.corp.intel.com>

* Dan Williams wrote:

> > What is the primary thing that is driving this need? Do we have a
> > very concrete example?
>
> My pet concrete example is covered by __pfn_t.  Referencing
> persistent memory in an md/dm hierarchical storage configuration.
> Setting aside the thrash to get existing block users to do
> "bvec_set_page(page)" instead of "bvec->page = page" the onus is on
> that md/dm implementation and backing storage device driver to
> operate on __pfn_t.  That use case is simple because there is no use
> of page locking or refcounting in that path, just dma_map_page() and
> kmap_atomic().
> The more difficult use case is precisely what Al
> picked up on, O_DIRECT and RDMA.  This patchset does nothing to
> address those use cases outside of not needing a struct page when
> they eventually craft a bio.

So why not do a dual approach?

There are code paths where the 'pfn' of a persistent device is mostly
used as a sector_t equivalent of terabytes of storage, not as an index
of a memory object.

It's not an address to a cache, it's an index into a huge storage
space - which happens to be (flash) RAM. For them, using pfn_t seems
natural and using struct page * is a strained (not to mention
expensive) model.

For more complex facilities, where persistent memory is used as a
memory object, especially where the underlying device is true,
infinitely writable RAM (not flash), treating it as a memory zone, or
setting up dynamic struct page would be the natural approach. (with
the inevitable cost of setup/teardown in the latter case)

I'd say that for anything where the dynamic struct page is torn down
unconditionally after completion of only a single use, the natural API
is probably pfn_t, not struct page. Any synchronization is already
handled at the block request layer, and it's storage op
synchronization, not memory access synchronization really.

For anything more complex, that maps any of this storage to
user-space, or exposes it to higher level struct page based APIs,
etc., where references matter and it's more of a cache with
potentially multiple users, not an IO space, the natural API is
struct page.

I'd say that this particular series mostly addresses the 'pfn as
sector_t' side of the equation, where persistent memory is IO space,
not memory space, and as such it is the more natural and thus also the
cheaper/faster approach.

Linus probably disagrees?
:-)

Thanks,

	Ingo
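
As a rough illustration of the dual approach described above, here is a minimal user-space C sketch. Every name in it (sketch_page, sketch_pfn, sketch_bvec, sketch_bvec_set_pfn and so on) is hypothetical; none of it is the __pfn_t definition or the bvec helpers from the series. It only assumes that a handle can carry a frame number with an optional page behind it, so that the 'IO space' path never touches a refcount while the 'memory object' path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB frames, for illustration only */

/* Hypothetical stand-in for struct page: just a reference count. */
struct sketch_page {
	int refcount;
};

/* Hypothetical handle: a frame number plus an optional backing page. */
struct sketch_pfn {
	uint64_t pfn;			/* index into the device's frame space */
	struct sketch_page *page;	/* NULL in the "IO space" case */
};

/* Hypothetical bio_vec analogue that stores the handle, not a page pointer. */
struct sketch_bvec {
	struct sketch_pfn pfn;
	unsigned int len;
	unsigned int offset;
};

/* "pfn as sector_t" case: no page, so nothing to lock or refcount. */
static struct sketch_pfn sketch_pfn_from_index(uint64_t pfn)
{
	struct sketch_pfn p = { .pfn = pfn, .page = NULL };
	return p;
}

/* "memory object" case: a page exists and the handle holds a reference. */
static struct sketch_pfn sketch_pfn_from_page(struct sketch_page *page,
					      uint64_t pfn)
{
	struct sketch_pfn p = { .pfn = pfn, .page = page };
	page->refcount++;
	return p;
}

static bool sketch_pfn_has_page(struct sketch_pfn p)
{
	return p.page != NULL;
}

/* Setter in the spirit of going through a helper instead of "bvec->page = page". */
static void sketch_bvec_set_pfn(struct sketch_bvec *bv, struct sketch_pfn p,
				unsigned int len, unsigned int offset)
{
	bv->pfn = p;
	bv->len = len;
	bv->offset = offset;
}

/* Byte position of the segment within the device's linear space. */
static uint64_t sketch_bvec_byte_addr(const struct sketch_bvec *bv)
{
	return (bv->pfn.pfn << SKETCH_PAGE_SHIFT) + bv->offset;
}
```

A caller on the storage-style path would build the handle with sketch_pfn_from_index() and only ever use sketch_bvec_byte_addr(); a caller that needs references would use sketch_pfn_from_page() and check sketch_pfn_has_page() before any page-based operation.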