From: Dan Williams
Date: Wed, 12 Oct 2016 09:19:56 -0700
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
To: Jan Beulich
Cc: Stefano Stabellini, Arnd Bergmann, andrew.cooper3@citrix.com, David Vrabel, Haozhong Zhang, Andrew Morton, Xiao Guangrong, Ross Zwisler, xen-devel@lists.xenproject.org, "linux-nvdimm@lists.01.org", Boris Ostrovsky, Konrad Rzeszutek Wilk, Juergen Gross, Johannes Thumshirn, "linux-kernel@vger.kernel.org"

On Wed, Oct 12, 2016 at 9:01 AM, Jan Beulich wrote:
>>>> On 12.10.16 at 17:42, wrote:
>> On Wed, Oct 12, 2016 at 8:39 AM, Jan Beulich wrote:
>>>>>> On 12.10.16 at 16:58, wrote:
>>>> On 10/12/16 05:32 -0600, Jan Beulich wrote:
>>>>>>>> On 12.10.16 at 12:33, wrote:
>>>>>> The layout is shown as the following diagram.
>>>>>>
>>>>>> +---------------+-----------+-------+----------+--------------+
>>>>>> | whatever used | Partition | Super | Reserved | /dev/pmem0p1 |
>>>>>> |   by kernel   |   Table   | Block | for Xen  |              |
>>>>>> +---------------+-----------+-------+----------+--------------+
>>>>>>                 \_____________________ _______________________/
>>>>>>                                       V
>>>>>>                                   /dev/pmem0
>>>>>
>>>>>I have to admit that I dislike this, for not being OS-agnostic.
>>>>>Neither should there be any Xen-specific region, nor should the
>>>>>"whatever used by kernel" one be restricted to just Linux. What
>>>>>I could see is an OS-reserved area ahead of the partition table,
>>>>>the exact usage of which depends on which OS is currently
>>>>>running (and in the Xen case this might be both Xen _and_ the
>>>>>Dom0 kernel, arbitrated by a tbd protocol). After all, when
>>>>>running under Xen, the Dom0 may not have a need for as much
>>>>>control data as it has when running on bare hardware, for it
>>>>>controlling less (if any) of the actual memory ranges when Xen
>>>>>is present.
>>>>>
>>>>
>>>> Isn't this OS-reserved area still not OS-agnostic, as it requires OS
>>>> to know where the reserved area is? Or do you mean it's not if it's
>>>> defined by a protocol that is accepted by all OSes?
>>>
>>> The latter - we clearly won't get away without some agreement on
>>> where to retrieve position and size of this area. I was simply
>>> assuming that such a protocol already exists.
>>>
>>
>> No, we should not mix the struct page reservation that the Dom0 kernel
>> may actively use with the Xen reservation that the Dom0 kernel does
>> not consume. Explain again what is wrong with the partition approach?
>
> Not sure what was unclear in my previous reply.
> I don't think there should be a priori knowledge of whether Xen is
> (going to be) used on a system, and even if it gets used, but just
> occasionally, it would (apart from the abstract considerations already
> given) be a waste of resources to set something aside that could be
> used for other purposes while Xen is not running. Static partitioning
> should only be needed for persistent data.
>

The reservation needs to be persistent/static even if the data is
volatile, as is the case with struct page, because we can't have the
size of the device change depending on use. So, from the perspective
of wasting space while Xen is not in use, both partitions and the
intrinsic reservation approach suffer the same problem.

Setting that aside, I don't want to mix two different use cases into
the same reservation. The kernel needs to know about the struct page
reservation because it needs to manage the lifetime of page references
vs the lifetime of the device. It does not have the same relationship
with a Xen reservation, which is why I'm proposing they be managed
separately.

Note that Toshi and Mike added DAX support to device-mapper (DM). This
enabling ends up writing DM metadata on the device without adding new
reservation mechanisms to the nvdimm core. I'm struggling to see how
the Xen use case is materially different from DM. In the end it's an
application-specific metadata space.
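To make the sizing argument concrete, here is a rough, purely illustrative sketch of a namespace split into a struct page area and /dev/pmem0, with an optional Xen partition inside /dev/pmem0. The constants are assumptions, not the kernel's actual values or formula: 4 KiB pages, 64 bytes of struct page per page, an 8 KiB info block, and 2 MiB alignment. `memmap_reservation()` and `layout()` are hypothetical helpers, and the partition table and superblock are ignored. What it shows is that the struct page reservation depends only on the raw namespace size, so /dev/pmem0 keeps the same size whether or not Xen carves out a partition.

```python
# Hypothetical model of the layout discussed in this thread; not kernel code.
# Assumptions (all illustrative): 4 KiB pages, 64 bytes of struct page per
# page, an 8 KiB info block at the start of the namespace, and a 2 MiB
# alignment for the start of the data area.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64
INFO_BLOCK = 8 * 1024
ALIGN = 2 * 1024 * 1024


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def memmap_reservation(raw_size: int) -> int:
    """Bytes set aside at the front of the namespace for struct page.

    Depends only on the raw namespace size, never on who uses the device.
    """
    npfns = (raw_size - INFO_BLOCK) // PAGE_SIZE
    return align_up(INFO_BLOCK + STRUCT_PAGE_SIZE * npfns, ALIGN)


def layout(raw_size: int, xen_part_size: int) -> dict:
    """Split a namespace into the kernel's memmap area and /dev/pmem0.

    A Xen area (if any) is an ordinary partition inside /dev/pmem0, so
    its presence does not change the size of /dev/pmem0 itself.
    """
    memmap = memmap_reservation(raw_size)
    pmem0 = raw_size - memmap
    if not 0 <= xen_part_size <= pmem0:
        raise ValueError("Xen partition does not fit in /dev/pmem0")
    return {"memmap": memmap, "pmem0": pmem0, "xen_part": xen_part_size}


if __name__ == "__main__":
    raw = 16 * 1024**3  # 16 GiB namespace (example value)
    with_xen = layout(raw, 1024**3)
    without_xen = layout(raw, 0)
    # /dev/pmem0 has the same size whether or not Xen takes a partition.
    print(with_xen["pmem0"] == without_xen["pmem0"])  # True
    print(with_xen["memmap"] // 1024**2, "MiB for struct page")  # 258 MiB for struct page
```

Under these assumptions the struct page area costs about 64/4096, roughly 1.6%, of the namespace, and that cost is fixed at namespace creation time rather than depending on whether Xen is running.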