From: Jeff Moyer
To: Dan Williams
Cc: linux-nvdimm@lists.01.org, "Aneesh Kumar K.V", Benjamin Herrenschmidt, Paul Mackerras, vishal.l.verma@intel.com, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2 1/4] mm/memremap_pages: Introduce memremap_compat_align()
References: <158155489850.3343782.2687127373754434980.stgit@dwillia2-desk3.amr.corp.intel.com> <158155490379.3343782.10305190793306743949.stgit@dwillia2-desk3.amr.corp.intel.com>
Date: Thu, 13 Feb 2020 11:57:52 -0500
In-Reply-To: <158155490379.3343782.10305190793306743949.stgit@dwillia2-desk3.amr.corp.intel.com> (Dan Williams's message of "Wed, 12 Feb 2020 16:48:23 -0800")

Dan Williams writes:

> The "sub-section memory hotplug" facility allows memremap_pages() users
> like libnvdimm to compensate for hardware platforms like x86 that have a
> section size larger than their hardware memory mapping granularity. The
> compensation that sub-section support affords is being tolerant of
> physical memory resources shifting by units smaller (64MiB on x86) than
> the memory-hotplug section size (128 MiB). Where the platform
> physical-memory mapping granularity is limited by the number and
> capability of address-decode-registers in the memory controller.
>
> While the sub-section support allows memremap_pages() to operate on
> sub-section (2MiB) granularity, the Power architecture may still
> require 16MiB alignment on "!radix_enabled()" platforms.
>
> In order for libnvdimm to be able to detect and manage this per-arch
> limitation, introduce memremap_compat_align() as a common minimum
> alignment across all driver-facing memory-mapping interfaces, and let
> Power override it to 16MiB in the "!radix_enabled()" case.
>
> The assumption / requirement for 16MiB to be a viable
> memremap_compat_align() value is that Power does not have platforms
> where its equivalent of address-decode-registers never hardware remaps a
> persistent memory resource on smaller than 16MiB boundaries. Note that I
> tried my best to not add a new Kconfig symbol, but header include
> entanglements defeated the #ifndef memremap_compat_align design pattern
> and the need to export it defeats the __weak design pattern for arch
> overrides.
>
> Based on an initial patch by Aneesh.

I have just a couple of questions.

First, can you please add a comment above the generic implementation of
memremap_compat_align describing its purpose, and why a platform might
want to override it?

Second, I will take it at face value that the power architecture
requires a 16MB alignment, but it's not clear to me why mmu_linear_psize
was chosen to represent that. What's the relationship, there, and can
we please have a comment explaining it?

Thanks!
Jeff

>
> Link: http://lore.kernel.org/r/CAPcyv4gBGNP95APYaBcsocEa50tQj9b5h__83vgngjq3ouGX_Q@mail.gmail.com
> Reported-by: Aneesh Kumar K.V
> Reported-by: Jeff Moyer
> Cc: Benjamin Herrenschmidt
> Cc: Paul Mackerras
> Signed-off-by: Dan Williams
> ---
>  arch/powerpc/Kconfig      |  1 +
>  arch/powerpc/mm/ioremap.c | 12 ++++++++++++
>  drivers/nvdimm/pfn_devs.c |  2 +-
>  include/linux/memremap.h  |  8 ++++++++
>  include/linux/mmzone.h    |  1 +
>  lib/Kconfig               |  3 +++
>  mm/memremap.c             | 13 +++++++++++++
>  7 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 497b7d0b2d7e..e6ffe905e2b9 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -122,6 +122,7 @@ config PPC
>  	select ARCH_HAS_GCOV_PROFILE_ALL
>  	select ARCH_HAS_KCOV
>  	select ARCH_HAS_HUGEPD if HUGETLB_PAGE
> +	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN
>  	select ARCH_HAS_MMIOWB if PPC64
>  	select ARCH_HAS_PHYS_TO_DMA
>  	select ARCH_HAS_PMEM_API
> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
> index fc669643ce6a..38b5ba7d3e2d 100644
> --- a/arch/powerpc/mm/ioremap.c
> +++ b/arch/powerpc/mm/ioremap.c
> @@ -2,6 +2,7 @@
>  
>  #include
>  #include
> +#include
>  #include
>  #include
>  
> @@ -97,3 +98,14 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long size,
>  
>  	return NULL;
>  }
> +
> +#ifdef CONFIG_ZONE_DEVICE
> +/* override of the generic version in mm/memremap.c */
> +unsigned long memremap_compat_align(void)
> +{
> +	if (radix_enabled())
> +		return SUBSECTION_SIZE;
> +	return (1UL << mmu_psize_defs[mmu_linear_psize].shift);
> +}
> +EXPORT_SYMBOL_GPL(memremap_compat_align);
> +#endif
> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> index b94f7a7e94b8..a5c25cb87116 100644
> --- a/drivers/nvdimm/pfn_devs.c
> +++ b/drivers/nvdimm/pfn_devs.c
> @@ -750,7 +750,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>  	start = nsio->res.start;
>  	size = resource_size(&nsio->res);
>  	npfns = PHYS_PFN(size - SZ_8K);
> -	align = max(nd_pfn->align, (1UL << SUBSECTION_SHIFT));
> +	align = max(nd_pfn->align, SUBSECTION_SIZE);
>  	end_trunc = start + size - ALIGN_DOWN(start + size, align);
>  	if (nd_pfn->mode == PFN_MODE_PMEM) {
>  		/*
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 6fefb09af7c3..8af1cbd8f293 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -132,6 +132,7 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>  
>  unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
>  void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
> +unsigned long memremap_compat_align(void);
>  #else
>  static inline void *devm_memremap_pages(struct device *dev,
>  		struct dev_pagemap *pgmap)
> @@ -165,6 +166,12 @@ static inline void vmem_altmap_free(struct vmem_altmap *altmap,
>  		unsigned long nr_pfns)
>  {
>  }
> +
> +/* when memremap_pages() is disabled all archs can remap a single page */
> +static inline unsigned long memremap_compat_align(void)
> +{
> +	return PAGE_SIZE;
> +}
>  #endif /* CONFIG_ZONE_DEVICE */
>  
>  static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
> @@ -172,4 +179,5 @@ static inline void put_dev_pagemap(struct dev_pagemap *pgmap)
>  	if (pgmap)
>  		percpu_ref_put(pgmap->ref);
>  }
> +
>  #endif /* _LINUX_MEMREMAP_H_ */
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 462f6873905a..6b77f7239af5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1170,6 +1170,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>  #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)
>  
>  #define SUBSECTION_SHIFT 21
> +#define SUBSECTION_SIZE (1UL << SUBSECTION_SHIFT)
>  
>  #define PFN_SUBSECTION_SHIFT (SUBSECTION_SHIFT - PAGE_SHIFT)
>  #define PAGES_PER_SUBSECTION (1UL << PFN_SUBSECTION_SHIFT)
> diff --git a/lib/Kconfig b/lib/Kconfig
> index 0cf875fd627c..17dbc7bd3895 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -618,6 +618,9 @@ config ARCH_HAS_PMEM_API
>  config MEMREGION
>  	bool
>  
> +config ARCH_HAS_MEMREMAP_COMPAT_ALIGN
> +	bool
> +
>  # use memcpy to implement user copies for nommu architectures
>  config UACCESS_MEMCPY
>  	bool
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 09b5b7adc773..a6905d28fe91 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -7,6 +7,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -14,6 +15,18 @@
>  
>  static DEFINE_XARRAY(pgmap_array);
>  
> +/*
> + * Minimum compatible alignment of the resource (start, end) across
> + * memremap interfaces (i.e. memremap + memremap_pages)
> + */
> +#ifndef CONFIG_ARCH_HAS_MEMREMAP_COMPAT_ALIGN
> +unsigned long memremap_compat_align(void)
> +{
> +	return SUBSECTION_SIZE;
> +}
> +EXPORT_SYMBOL_GPL(memremap_compat_align);
> +#endif
> +
>  #ifdef CONFIG_DEV_PAGEMAP_OPS
>  DEFINE_STATIC_KEY_FALSE(devmap_managed_key);
>  EXPORT_SYMBOL(devmap_managed_key);
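
As an aside for readers following the alignment arithmetic in the
nd_pfn_init() hunk, here is a rough userspace sketch; it is not kernel
code. align_down() imitates the kernel's ALIGN_DOWN() for power-of-two
alignments, and end_trunc() is a hypothetical helper named after the
variable in that hunk. The floor parameter is an assumption made for
illustration: this patch still uses SUBSECTION_SIZE as the floor there,
whereas a caller that took the floor from memremap_compat_align() would,
per the patch description, see 16MiB when radix is not enabled on Power.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_2M  (UINT64_C(1) << 21)  /* SUBSECTION_SIZE in the patch */
#define SZ_16M (UINT64_C(1) << 24)  /* 16MiB value from the patch description */

/* Round addr down to a power-of-two boundary, like the kernel's ALIGN_DOWN(). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical helper: bytes trimmed from the end of [start, start + size)
 * so that the end lands on max(user_align, floor), mirroring the
 * align/end_trunc computation in the nd_pfn_init() hunk.
 */
static uint64_t end_trunc(uint64_t start, uint64_t size,
			  uint64_t user_align, uint64_t floor)
{
	uint64_t align = user_align > floor ? user_align : floor;

	return start + size - align_down(start + size, align);
}
```

For a 1GiB + 6MiB range starting at 4GiB, a 2MiB floor trims nothing,
while a 16MiB floor trims the trailing 6MiB.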