Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753179AbcCGREh (ORCPT ); Mon, 7 Mar 2016 12:04:37 -0500
Received: from g4t3426.houston.hp.com ([15.201.208.54]:50919 "EHLO
	g4t3426.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752966AbcCGREP (ORCPT );
	Mon, 7 Mar 2016 12:04:15 -0500
Message-ID: <1457373413.15454.334.camel@hpe.com>
Subject: Re: [PATCH v2 2/3] libnvdimm, pmem: adjust for section collisions with 'System RAM'
From: Toshi Kani
To: Dan Williams
Cc: "linux-nvdimm@lists.01.org", linux-mm, Linux Kernel Mailing List
Date: Mon, 07 Mar 2016 10:56:53 -0700
In-Reply-To:
References: <20160303215304.1014.69931.stgit@dwillia2-desk3.amr.corp.intel.com>
	<20160303215315.1014.95661.stgit@dwillia2-desk3.amr.corp.intel.com>
	<1457146138.15454.277.camel@hpe.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.4 (3.18.4-1.fc23)
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5346
Lines: 126

On Fri, 2016-03-04 at 18:23 -0800, Dan Williams wrote:
> On Fri, Mar 4, 2016 at 6:48 PM, Toshi Kani wrote:
> > On Thu, 2016-03-03 at 13:53 -0800, Dan Williams wrote:
> > > On a platform where 'Persistent Memory' and 'System RAM' are mixed
> > > within a given sparsemem section, trim the namespace and notify about
> > > the sub-optimal alignment.
> > >
> > > Cc: Toshi Kani
> > > Cc: Ross Zwisler
> > > Signed-off-by: Dan Williams
> > > ---
> > >  drivers/nvdimm/namespace_devs.c |    7 ++
> > >  drivers/nvdimm/pfn.h            |   10 ++-
> > >  drivers/nvdimm/pfn_devs.c       |    5 ++
> > >  drivers/nvdimm/pmem.c           |  125 ++++++++++++++++++++++++++++-----------
> > >  4 files changed, 111 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
> > > index 8ebfcaae3f5a..463756ca2d4b 100644
> > > --- a/drivers/nvdimm/namespace_devs.c
> > > +++ b/drivers/nvdimm/namespace_devs.c
> > > @@ -133,6 +133,7 @@ bool nd_is_uuid_unique(struct device *dev, u8 *uuid)
> > >  bool pmem_should_map_pages(struct device *dev)
> > >  {
> > >       struct nd_region *nd_region = to_nd_region(dev->parent);
> > > +     struct nd_namespace_io *nsio;
> > >
> > >       if (!IS_ENABLED(CONFIG_ZONE_DEVICE))
> > >               return false;
> > > @@ -143,6 +144,12 @@ bool pmem_should_map_pages(struct device *dev)
> > >       if (is_nd_pfn(dev) || is_nd_btt(dev))
> > >               return false;
> > >
> > > +     nsio = to_nd_namespace_io(dev);
> > > +     if (region_intersects(nsio->res.start, resource_size(&nsio->res),
> > > +                             IORESOURCE_SYSTEM_RAM,
> > > +                             IORES_DESC_NONE) == REGION_MIXED)
> >
> > Should this be != REGION_DISJOINT to be safe?
>
> Actually, it's ok.  It doesn't need to be disjoint.  The problem is
> mixing an mm-zone within a given section.  If the region intersects
> system-ram then devm_memremap_pages() is a no-op and we can use the
> existing page allocation and linear mapping.

Oh, I see.
> >
> > > +             return false;
> > > +
> >
> >  :
> >
> > > @@ -304,21 +311,56 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
> > >       }
> > >
> > >       memset(pfn_sb, 0, sizeof(*pfn_sb));
> > > -     npfns = (pmem->size - SZ_8K) / SZ_4K;
> > > +
> > > +     /*
> > > +      * Check if pmem collides with 'System RAM' when section aligned and
> > > +      * trim it accordingly
> > > +      */
> > > +     nsio = to_nd_namespace_io(&ndns->dev);
> > > +     start = PHYS_SECTION_ALIGN_DOWN(nsio->res.start);
> > > +     size = resource_size(&nsio->res);
> > > +     if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +                             IORES_DESC_NONE) == REGION_MIXED) {
> > > +
> > > +             start = nsio->res.start;
> > > +             start_pad = PHYS_SECTION_ALIGN_UP(start) - start;
> > > +     }
> > > +
> > > +     start = nsio->res.start;
> > > +     size = PHYS_SECTION_ALIGN_UP(start + size) - start;
> > > +     if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM,
> > > +                             IORES_DESC_NONE) == REGION_MIXED) {
> > > +             size = resource_size(&nsio->res);
> > > +             end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size);
> > > +     }
> >
> > This check seems to assume that guest's regular memory layout does not
> > change.  That is, if there is no collision at first, there won't be any
> > later.  Is this a valid assumption?
>
> If platform firmware changes the physical alignment during the
> lifetime of the namespace there's not much we can do.

The physical alignment can be changed as long as it is large enough (see
below).

> Another problem
> not addressed by this patch is firmware choosing to hot plug system
> ram into the same section as persistent memory.

Yes, and it does not have to be a hot-plug operation.  Memory size may be
changed off-line.
A data image can be copied to different guests for instant deployment, or
may be migrated to a different guest.

> As far as I can see
> all we do is ask firmware implementations to respect Linux section
> boundaries and otherwise not change alignments.

In addition to the requirement that pmem range alignment may not change,
the code also requires that a regular memory range does not change to
intersect with a pmem section later.  This seems fragile to me since guest
config may vary / change as I mentioned above.

So, shouldn't the driver fail to attach when the range is not aligned to
the section size?  Since we need to place a requirement on firmware anyway,
we can simply state that it must be aligned to 128MiB (at least) on x86.
Then, memory and pmem physical layouts can be changed as long as this
requirement is met.

Thanks,
-Toshi
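
For illustration only, here is a minimal userspace sketch of the alignment
check suggested above, next to the pad/trunc arithmetic from the quoted
patch.  It assumes a 128MiB section size (SECTION_SIZE_BITS of 27, as on
x86_64).  All helper names below are hypothetical stand-ins and are not part
of the patch or the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128MiB sparsemem sections (SECTION_SIZE_BITS == 27). */
#define SECTION_SIZE (UINT64_C(1) << 27)

/* Userspace stand-in for the kernel's PHYS_SECTION_ALIGN_DOWN(). */
static inline uint64_t section_align_down(uint64_t addr)
{
	return addr & ~(SECTION_SIZE - 1);
}

/* Userspace stand-in for the kernel's PHYS_SECTION_ALIGN_UP(). */
static inline uint64_t section_align_up(uint64_t addr)
{
	/* Guard against wrap-around when rounding up near UINT64_MAX. */
	assert(addr <= UINT64_MAX - (SECTION_SIZE - 1));
	return (addr + SECTION_SIZE - 1) & ~(SECTION_SIZE - 1);
}

/*
 * Hypothetical attach-time check: accept the range only if both its
 * start and its end fall on section boundaries.
 */
static inline bool pmem_range_section_aligned(uint64_t start, uint64_t size)
{
	return size != 0 &&
	       section_align_down(start) == start &&
	       section_align_down(start + size) == start + size;
}

/* Same arithmetic as start_pad in the quoted patch. */
static inline uint64_t section_start_pad(uint64_t start)
{
	return section_align_up(start) - start;
}

/* Same arithmetic as end_trunc in the quoted patch. */
static inline uint64_t section_end_trunc(uint64_t start, uint64_t size)
{
	return start + size - section_align_down(start + size);
}
```

With these definitions, a 1GiB range starting at 4GiB passes the check,
while the same range shifted up by 16MiB fails it and would need 112MiB of
start padding and 16MiB of end truncation under the trimming approach.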