From: Dan Williams
Date: Tue, 11 Oct 2016 10:51:19 -0700
Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
To: Konrad Rzeszutek Wilk
Cc: Jan Beulich, Juergen Gross, Haozhong Zhang, Xiao Guangrong,
    Arnd Bergmann, "linux-nvdimm@lists.01.org", Boris Ostrovsky,
    andrew.cooper3@citrix.com, "linux-kernel@vger.kernel.org",
    Stefano Stabellini, David Vrabel, Johannes Thumshirn,
    xen-devel@lists.xenproject.org, Andrew Morton, Ross Zwisler
In-Reply-To: <20161011165811.GO19349@localhost.localdomain>
References: <20161010003523.4423-1-haozhong.zhang@intel.com>
    <57FCF26A02000078000F15E0@prv-mh.provo.novell.com>
    <20161011165811.GO19349@localhost.localdomain>

On Tue, Oct 11, 2016 at 9:58 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Oct 11, 2016 at 08:53:33AM -0700, Dan Williams wrote:
>> On Tue, Oct 11, 2016 at 6:08 AM, Jan Beulich wrote:
>> >>>> Andrew Cooper 10/10/16 6:44 PM >>>
>> >>On 10/10/16 01:35, Haozhong Zhang wrote:
>> >>> Xen hypervisor needs assistance from Dom0 Linux kernel for following tasks:
>> >>> 1) Reserve an area on NVDIMM devices for Xen hypervisor to place
>> >>>    memory management data structures, i.e. frame table and M2P table.
>> >>> 2) Report SPA ranges of NVDIMM devices and the reserved area to Xen
>> >>>    hypervisor.
>> >>
>> >>However, I can't see any justification for 1). Dom0 should not be
>> >>involved in Xen's management of its own frame table and m2p. The mfns
>> >>making up the pmem/pblk regions should be treated just like any other
>> >>MMIO regions, and be handed wholesale to dom0 by default.
>> >
>> > That precludes the use as RAM extension, and I thought earlier rounds of
>> > discussion had got everyone in agreement that at least for the pmem case
>> > we will need some control data in Xen.
>>
>> The missing piece for me is why this reservation for control data
>> needs to be done in the libnvdimm core? I would expect that any dax
>
> Isn't it done this way with Linux? That is say if the machine has
> 4GB of RAM and the NVDIMM is in TB range. You want to put the 'struct page'
> for the NVDIMM ranges somewhere. That place can be in regions on the
> NVDIMM that ndctl can reserve.

Yes.

>> capable file could be mapped and made available to a guest. This
>> includes /dev/ramX devices that are dax capable, but are external to
>> the libnvdimm sub-system.
>
> This is more of just keeping track of the ranges if say the DAX file is
> extremely fragmented and requires a lot of 'struct pages' to keep track of
> when stiching up the VMA.

Right, but why does the libnvdimm core need to know about this
specific Xen reservation? For example, if Xen wants some in-kernel
driver to own a pmem region and place its own metadata on the device I
would recommend something like:

    bdev = blkdev_get_by_path("/dev/pmemX", FMODE_EXCL...);
    bdev_direct_access(bdev, ...);

...in other words, I don't think we want libnvdimm to grow new device
types for every possible in-kernel user, Xen, MD, DM, etc. Instead,
just claim the resulting device.