Message-ID: <554CBE17.4070904@redhat.com>
Date: Fri, 08 May 2015 09:45:59 -0400
From: Rik van Riel
To: Ingo Molnar, Dave Hansen
Cc: Dan Williams, Linus Torvalds, Linux Kernel Mailing List, Boaz Harrosh,
 Jan Kara, Mike Snitzer, Neil Brown, Benjamin Herrenschmidt,
 Heiko Carstens, Chris Mason, Paul Mackerras, "H. Peter Anvin",
 Christoph Hellwig, Alasdair Kergon, linux-nvdimm@lists.01.org,
 Mel Gorman, Matthew Wilcox, Ross Zwisler, Martin Schwidefsky,
 Jens Axboe, "Theodore Ts'o", "Martin K. Petersen", Julia Lawall,
 Tejun Heo, linux-fsdevel, Andrew Morton
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t
In-Reply-To: <20150507191107.GB22952@gmail.com>

On 05/07/2015 03:11 PM, Ingo Molnar wrote:
> Stable, global page-struct descriptors are a given for real RAM, where
> we allocate a struct page for every page in nice, large, mostly linear
> arrays.
>
> We'd really need that for pmem too, to get the full power of struct
> page: and that means allocating them in nice, large, predictable
> places - such as on the device itself ...
> It might even be 'scattered' across the device: with a 64 byte struct
> page size we can pack 64 descriptors into a single page, so every 65
> pages we could have a page-struct page.
>
> Finding a pmem page's struct page would thus involve rounding it
> modulo 65 and reading that page.
>
> The problem with that is fourfold:
>
>  - that we now turn a very kernel internal API and data structure into
>    an ABI. If struct page grows beyond 64 bytes it's a problem.
>
>  - on bootup (or device discovery time) we'd have to initialize all
>    the page structs. We could probably do this in a hierarchical way,
>    by dividing continuous pmem ranges into power-of-two groups of
>    blocks, and organizing them like the buddy allocator does.
>
>  - 1.5% of storage space lost.
>
>  - will wear-leveling properly migrate these 'hot' pages around?

MST and I have been doing some thinking about how to address some of
the issues above.

One way could be to invert the PG_compound logic we have today, by
allocating one struct page for every PMD / THP sized area (2MB on
x86), and dynamically allocating struct pages for the 4kB pages
inside, only if the area gets split. They can be freed again when the
area is no longer being accessed in 4kB chunks.

That way we would always look at the struct page for the 2MB area
first, and if the PG_split bit is set, we look at the array of
dynamically allocated struct pages for this area.

The advantages are obvious: boot-time memory overhead and
initialization time are reduced by a factor of 512. CPUs could also
take a whole 2MB area in order to do CPU-local 4kB allocations,
defragmentation policies may become a little clearer, etc...

The disadvantage is pretty obvious too: 4kB pages would no longer be
the fast case, since they would require an indirection. I do not know
how much of an issue that would be, or whether it even makes sense for
4kB pages to continue being the fast case going forward. Memory trends
point in one direction, file size trends in another.
For persistent memory, we would not need 4kB page struct pages unless
memory from a particular area was in small files AND those files were
being actively accessed. Large files (mapped in 2MB chunks) or
inactive small files would not need the 4kB page structs around.

-- 
All rights reversed