From: Dan Williams
Date: Thu, 13 Oct 2016 11:59:53 -0700
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
To: Andrew Cooper
Cc: Jan Beulich, Juergen Gross, Haozhong Zhang, Xiao Guangrong,
    Arnd Bergmann, Boris Ostrovsky, Johannes Thumshirn,
    "linux-kernel@vger.kernel.org", Stefano Stabellini, David Vrabel,
    "linux-nvdimm@lists.01.org", xen-devel@lists.xenproject.org,
    Andrew Morton, Ross Zwisler
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 13, 2016 at 9:01 AM, Andrew Cooper wrote:
> On 13/10/16 16:40, Dan Williams wrote:
>> On Thu, Oct 13, 2016 at 2:08 AM, Jan Beulich wrote:
>> [..]
>>>> I think we can do the similar for Xen, like to lay another pseudo
>>>> device on /dev/pmem and do the reservation, like 2. in my previous
>>>> reply.
>>> Well, my opinion certainly doesn't count much here, but I continue to
>>> consider this a bad idea. For entities like drivers it may well be
>>> appropriate, but I think there ought to be an independent concept
>>> of "OS reserved", and in the Xen case this could then be shared
>>> between hypervisor and Dom0 kernel. Or if we were to consider Dom0
>>> "just a guest", things should even be the other way around: Xen gets
>>> all of the OS reserved space, and Dom0 needs something custom.
>> You haven't made the case why Xen is special and other applications of
>> persistent memory are not.
>
> In a Xen system, Xen runs in the baremetal root-mode ring0, and dom0 is
> a VM running in ring1/3 with the nvdimm driver. This is the opposite
> way around to the KVM model.
>
> Dom0, being the hardware domain, has default ownership of all the
> hardware, but to gain access in the first place, it must request a
> mapping from Xen.

This is where my understanding of the Xen model breaks down. Are you
saying dom0 can't access the persistent memory range unless the ring0
agent has metadata storage space for tracking what it maps into dom0?
That can't be true because then PCI memory ranges would not work
without metadata reserve space. Dom0 still needs to map and write the
DIMMs to even set up the struct page reservation; it isn't established
by default.

> Xen therefore needs to know and cope with being able
> to give dom0 a mapping to the nvdimms, without touching the content of
> the nvdimm itself (so as to avoid corrupting data).

Is it true that this metadata only comes into use when remapping the
dom0 discovered range(s) into a guest VM?

> Once dom0 has a mapping of the nvdimm, the nvdimm driver can go to work
> and figure out what is on the DIMM, and which areas are safe to use.

I don't understand this ordering of events. Dom0 needs to have a
mapping to even write the on-media structure to indicate a
reservation. So, initial dom0 access can't depend on metadata
reservation already being present.

> At this point, a Xen subsystem in Linux could choose one or more areas
> to hand back to the hypervisor to use as RAM/other.

To me all this configuration seems to come after the fact. After dom0
sees /dev/pmemX devices, it can go to work carving them up and writing
Xen specific metadata to the range(s). The struct page reservation
never comes into the picture. In fact, a raw mode namespace (one
without a reservation) could be used in this model; the nvdimm core
never needs to know what is happening.
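
To make the "carving up" step concrete, here is a minimal sketch of the
arithmetic involved in splitting a persistent memory range into a
metadata reserve followed by a data area. It is not taken from the
patch set or from the nvdimm core. The function name carve(), the
64 bytes of tracking metadata per page, the 4 KiB page size and the
2 MiB alignment are all assumptions chosen for illustration.

```python
# Hypothetical layout helper; every constant below is an assumption.
PAGE_SIZE = 4096          # assumed page size in bytes
META_PER_PAGE = 64        # assumed bytes of tracking metadata per page
ALIGN = 2 * 1024 * 1024   # assumed alignment of the reserved area


def carve(range_start, range_size):
    """Split a pmem range into a metadata reserve followed by a data area.

    The reserve is sized to describe every page of the whole range,
    rounded up to ALIGN, and is placed at the start of the range.
    """
    if range_size <= 0 or range_size % PAGE_SIZE:
        raise ValueError("range size must be a positive multiple of the page size")

    pages = range_size // PAGE_SIZE
    raw = pages * META_PER_PAGE
    # Round the metadata size up to the next multiple of ALIGN.
    reserve = -(-raw // ALIGN) * ALIGN
    if reserve >= range_size:
        raise ValueError("range too small to hold its own metadata")

    return {
        "meta_start": range_start,
        "meta_size": reserve,
        "data_start": range_start + reserve,
        "data_size": range_size - reserve,
    }
```

Under these assumptions a 16 GiB range needs a 256 MiB reserve. The
sketch only computes offsets; actually writing such a layout to the
media requires that the range is already mapped, which is the ordering
question raised above.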