Subject: Re: [Xen-devel] [RFC KERNEL PATCH 0/2] Add Dom0 NVDIMM support for Xen
To: Konrad Rzeszutek Wilk, Dan Williams
CC: Jan Beulich, Juergen Gross, "Haozhong Zhang", Xiao Guangrong, Arnd Bergmann, "linux-nvdimm@lists.01.org", Boris Ostrovsky, "linux-kernel@vger.kernel.org", Stefano Stabellini, David Vrabel, Johannes Thumshirn, Andrew Morton, Ross Zwisler
From: Andrew Cooper
Date: Tue, 11 Oct 2016 21:18:51 +0100

On 11/10/16 20:48, Konrad Rzeszutek Wilk wrote:
> On Tue, Oct 11, 2016 at 12:28:56PM -0700, Dan Williams wrote:
>> On Tue, Oct 11, 2016 at 11:33 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Oct 11, 2016 at 10:51:19AM -0700, Dan Williams wrote:
>> [..]
>>>> Right, but why does the libnvdimm core need to know about this
>>>> specific Xen reservation?
>>>> For example, if Xen wants some in-kernel
>>> Let me turn this around - why does the libnvdimm core need to know about
>>> Linux specific parts? Shouldn't this be OS agnostic, so that FreeBSD
>>> for example can also poke a hole in this and fill it with its
>>> OS-management meta-data?
>> Specifically the core needs to know so that it can answer the Linux
>> specific question of whether the pfn returned by ->direct_access() has
>> a corresponding struct page or not. It's tied to the lifetime of the
>> device and the usage of the reservation needs to be coordinated
>> against the references of those pages. If FreeBSD decides it needs to
>> reserve "struct page" capacity at the start of the device, I would
>> hope that it reuses the same on-device info block that Linux is using
>> and not create a new "FreeBSD-mode" device type.
> The issue here (as I understand, I may be missing something new)
> is that the size of this special namespace may be different. That is
> the 'struct page' on FreeBSD could be 256 bytes while on Linux it is
> 64 bytes (numbers pulled out of the sky).
>
> Hence one would have to expand or such to re-use this.
>> To be honest I do not yet understand what metadata Xen wants to store
>> in the device, but it seems the producer and consumer of that metadata
>> is Xen itself and not the wider Linux kernel as is the case with
>> struct page. Can you fill me in on what problem Xen solves with this
> Exactly!
>> reservation?
> The same as Linux - its variant of 'struct page'. Which I think is
> smaller than the Linux one, but perhaps it is not?

There is still a bootstrapping issue though, which looks (in its current
form) to cause data corruption. I hope I am mistaken, and apologies if I
am, but clearly we cannot build a solution that has data corruption in
anything other than an exceptional circumstance.

So far, the sequence of boot operations appears to look like this:

Xen boots, and may find some NVDIMM SPA/MFN ranges via the NFIT table.
Any ranges available only from AML need dynamically reporting back to
Xen at a later point, once OSPM is up and running.

The NVDIMMs must be mappable by dom0 so the contents can be inspected
and deemed to be safe by the nvdimm driver/host admin, before Xen starts
writing to any of it (for whatever reason).

If this isn't the case, then simply booting a Xen/dom0 combo will end up
corrupting a region before working out that it is safe to do so.

~Andrew
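
[Illustrative sketch, not part of the patch series or of Xen: the
ordering constraint above can be modelled as a small state machine in
which a range must be inspected by dom0 before the hypervisor may write
to it. Every name here (NvdimmRange, mark_inspected, reserve_metadata)
is hypothetical and exists only for this example.]

```python
from enum import Enum, auto


class RangeState(Enum):
    DISCOVERED = auto()  # found via NFIT at boot, or reported later by dom0 (AML)
    INSPECTED = auto()   # dom0 driver / host admin has deemed the contents safe
    IN_USE = auto()      # hypervisor is now allowed to write its metadata


class NvdimmRange:
    """Hypothetical model of one NVDIMM range tracked by the hypervisor."""

    def __init__(self, start_mfn, nr_mfns):
        self.start_mfn = start_mfn
        self.nr_mfns = nr_mfns
        self.state = RangeState.DISCOVERED

    def mark_inspected(self):
        # dom0 has mapped the range and confirmed it is safe to modify.
        if self.state is RangeState.DISCOVERED:
            self.state = RangeState.INSPECTED

    def reserve_metadata(self):
        # Refuse to write before inspection: this is the step that would
        # otherwise corrupt existing data on a plain Xen/dom0 boot.
        if self.state is not RangeState.INSPECTED:
            raise PermissionError("range not inspected by dom0; refusing to write")
        self.state = RangeState.IN_USE
```

[In this model, calling reserve_metadata() on a freshly discovered range
raises instead of writing; it only succeeds after mark_inspected().]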