2021-06-02 11:12:01

by Max Gurtovoy

[permalink] [raw]
Subject: [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API

Hi all,
In hotplugged memory (from check_pfn_span function):
"
Disallow all operations smaller than a sub-section and only
allow operations smaller than a section for
SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
enforces a larger memory_block_size_bytes() granularity for
memory that will be marked online, so this check should only
fire for direct arch_{add,remove}_memory() users outside of
add_memory_resource()
"

This restriction will disqualify, for example, large NVMe CMBs that might have
non power of 2 number of pages (e.g. 32767 pages of 4KB). For these
devices, the CMB size will be rounded down from 0x7fff000 to 0x7e00000
but it's better than having un-mapped CMB.

If we all agree on the approach, this RFC can assist as-is to these NVMe
devices and other P2PMEM devices in the future and can considered for
the next merge window.

Max Gurtovoy (3):
mm,memory_hotplug: export mhp min alignment
PCI/P2PMEM: introduce pci_p2pdma_align_size API
nvme-pci: align CMB size according to P2PMEM alignment

drivers/nvme/host/pci.c | 7 +++++++
drivers/pci/p2pdma.c | 23 +++++++++++++++++++++++
include/linux/memory_hotplug.h | 5 +++++
include/linux/pci-p2pdma.h | 5 +++++
mm/memory_hotplug.c | 33 +++++++++++++++++++--------------
5 files changed, 59 insertions(+), 14 deletions(-)

--
2.18.1


2021-06-02 11:12:12

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment

Hotplugged memory has alignmet restrictions. E.g, it disallows all
operations smaller than a sub-section and only allow operations smaller
than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
for mhp users.

Signed-off-by: Max Gurtovoy <[email protected]>
---
include/linux/memory_hotplug.h | 5 +++++
mm/memory_hotplug.c | 33 +++++++++++++++++++--------------
2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 28f32fd00fe9..c55a9049b11e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -76,6 +76,7 @@ struct mhp_params {

bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
struct range mhp_get_pluggable_range(bool need_mapping);
+unsigned long mhp_get_min_align(void);

/*
* Zone resizing functions
@@ -248,6 +249,10 @@ void mem_hotplug_done(void);
___page; \
})

+static inline unsigned long mhp_get_min_align(void)
+{
+ return 0;
+}
static inline unsigned zone_span_seqbegin(struct zone *zone)
{
return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9e86e9ee0a10..161bb6704a9b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -270,24 +270,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
}
#endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */

+/*
+ * Disallow all operations smaller than a sub-section and only
+ * allow operations smaller than a section for
+ * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
+ * enforces a larger memory_block_size_bytes() granularity for
+ * memory that will be marked online, so this check should only
+ * fire for direct arch_{add,remove}_memory() users outside of
+ * add_memory_resource().
+ */
+unsigned long mhp_get_min_align(void)
+{
+ if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
+ return PAGES_PER_SUBSECTION;
+ return PAGES_PER_SECTION;
+}
+EXPORT_SYMBOL_GPL(mhp_get_min_align);
+
+
static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
const char *reason)
{
- /*
- * Disallow all operations smaller than a sub-section and only
- * allow operations smaller than a section for
- * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
- * enforces a larger memory_block_size_bytes() granularity for
- * memory that will be marked online, so this check should only
- * fire for direct arch_{add,remove}_memory() users outside of
- * add_memory_resource().
- */
- unsigned long min_align;
+ unsigned long min_align = mhp_get_min_align();

- if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
- min_align = PAGES_PER_SUBSECTION;
- else
- min_align = PAGES_PER_SECTION;
if (!IS_ALIGNED(pfn, min_align) || !IS_ALIGNED(nr_pages, min_align)) {
WARN(1, "Misaligned __%s_pages min_align: %#lx start: %#lx end: %#lx\n",
reason, min_align, pfn, pfn + nr_pages - 1);
--
2.18.1

2021-06-02 11:13:37

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment

P2PMEM is using mhp framework to connect to the memory subsystem. In
case the CMB size is not compatible to mhp alignment, the CMB mapping
will fail. Use pci_p2pdma_align_size to align CMB size in order to
successfully map non-aligned original CMB.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/nvme/host/pci.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a29b170701fc..1197263b4cd0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
u64 size, offset;
resource_size_t bar_size;
struct pci_dev *pdev = to_pci_dev(dev->dev);
+ unsigned long nr_pages, min_align;
int bar;

if (dev->cmb_size)
@@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
if (size > bar_size - offset)
size = bar_size - offset;

+ size = pci_p2pdma_align_size(size);
+ if (!size) {
+ dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
+ return;
+ }
+
if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
dev_warn(dev->ctrl.device,
"failed to register the CMB\n");
--
2.18.1

2021-06-02 11:13:41

by Max Gurtovoy

[permalink] [raw]
Subject: [PATCH 2/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API

The mhp layer has alignment restrictions that may cause a failure in
adding a large P2PMEM that has non-aligned number of pages. The new API
pci_p2pdma_align_size will align the P2PMEM size to be inline with the
mhp restrictions.

Signed-off-by: Max Gurtovoy <[email protected]>
---
drivers/pci/p2pdma.c | 23 +++++++++++++++++++++++
include/linux/pci-p2pdma.h | 5 +++++
2 files changed, 28 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 196382630363..d232bc4ada2b 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -220,6 +220,29 @@ int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
}
EXPORT_SYMBOL_GPL(pci_p2pdma_add_resource);

+/**
+ * pci_p2pdma_align_size - align p2p memory size
+ * @size: size of the initial memory to align before adding.
+ *
+ * The size will be aligned according to mph alignment rules. This function
+ * should be called before adding new p2pdma resource since non-aligned memory
+ * will not be added.
+ */
+size_t pci_p2pdma_align_size(size_t size)
+{
+ unsigned int min_align, nr_pages;
+
+ min_align = mhp_get_min_align();
+ nr_pages = size >> PAGE_SHIFT;
+ if (!IS_ALIGNED(nr_pages, min_align)) {
+ nr_pages = ALIGN_DOWN(nr_pages, min_align);
+ size = nr_pages << PAGE_SHIFT;
+ }
+
+ return size;
+}
+EXPORT_SYMBOL_GPL(pci_p2pdma_align_size);
+
/*
* Note this function returns the parent PCI device with a
* reference taken. It is the caller's responsibility to drop
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..33a8fce52bec 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -38,6 +38,7 @@ int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
bool *use_p2pdma);
ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
bool use_p2pdma);
+size_t pci_p2pdma_align_size(size_t size);
#else /* CONFIG_PCI_P2PDMA */
static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
size_t size, u64 offset)
@@ -105,6 +106,10 @@ static inline ssize_t pci_p2pdma_enable_show(char *page,
{
return sprintf(page, "none\n");
}
+static inline size_t pci_p2pdma_align_size(size_t size)
+{
+ return 0;
+}
#endif /* CONFIG_PCI_P2PDMA */


--
2.18.1

2021-06-02 12:12:37

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH 0/3] PCI/P2PMEM: introduce pci_p2pdma_align_size API

On 02.06.21 13:10, Max Gurtovoy wrote:
> Hi all,
> In hotplugged memory (from check_pfn_span function):
> "
> Disallow all operations smaller than a sub-section and only
> allow operations smaller than a section for
> SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> enforces a larger memory_block_size_bytes() granularity for
> memory that will be marked online, so this check should only
> fire for direct arch_{add,remove}_memory() users outside of
> add_memory_resource()
> "
>
> This restriction will disqualify, for example, large NVMe CMBs that might have
> non power of 2 number of pages (e.g. 32767 pages of 4KB). For these
> devices, the CMB size will be rounded down from 0x7fff000 to 0x7e00000
> but it's better than having un-mapped CMB.

Just some high-level questions:

A CMB is just a PCI BAR used for communicating with the device, to be
mapped into physical address space, right? I assume the relevant hotplug
code is:

drivers/pci/p2pdma.c: addr = devm_memremap_pages(&pdev->dev, pgmap);

correct?


Having a BAR span such weird sizes will most probably never be fully
supported. But if sub-sections work for you, great.



--
Thanks,

David / dhildenb

2021-06-02 12:16:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment

On 02.06.21 13:10, Max Gurtovoy wrote:
> Hotplugged memory has alignmet restrictions. E.g, it disallows all
> operations smaller than a sub-section and only allow operations smaller
> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
> for mhp users.
>
> Signed-off-by: Max Gurtovoy <[email protected]>
> ---
> include/linux/memory_hotplug.h | 5 +++++
> mm/memory_hotplug.c | 33 +++++++++++++++++++--------------
> 2 files changed, 24 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 28f32fd00fe9..c55a9049b11e 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -76,6 +76,7 @@ struct mhp_params {
>
> bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
> struct range mhp_get_pluggable_range(bool need_mapping);
> +unsigned long mhp_get_min_align(void);
>
> /*
> * Zone resizing functions
> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
> ___page; \
> })
>
> +static inline unsigned long mhp_get_min_align(void)
> +{
> + return 0;
> +}
> static inline unsigned zone_span_seqbegin(struct zone *zone)
> {
> return 0;
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9e86e9ee0a10..161bb6704a9b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -270,24 +270,29 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> }
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>
> +/*
> + * Disallow all operations smaller than a sub-section and only
> + * allow operations smaller than a section for
> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
> + * enforces a larger memory_block_size_bytes() granularity for
> + * memory that will be marked online, so this check should only
> + * fire for direct arch_{add,remove}_memory() users outside of
> + * add_memory_resource().
> + */
> +unsigned long mhp_get_min_align(void)
> +{
> + if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
> + return PAGES_PER_SUBSECTION;
> + return PAGES_PER_SECTION;
> +}
> +EXPORT_SYMBOL_GPL(mhp_get_min_align);

We have to main interfaces to "hotplug" memory:

a) add_memory() and friends for System RAM, which have memory block
alignment requirements.

b) memremap_pages(), which has the alignemnt requirements you mention here.

I feel like what you need would better be exposed in mm/memremap.c, for
example, via "memremap_min_alignment" so it matches the "memremap_pages"
semantics.

And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE,
which depends on SPARSEMEM_VMEMMAP. So you'll always have
PAGES_PER_SUBSECTION.

I can already spot "memremap_compat_align", maybe you can reuse that or
handle it accordingly in there?


--
Thanks,

David / dhildenb

2021-06-02 14:42:05

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment

On Wed, Jun 02, 2021 at 02:10:55PM +0300, Max Gurtovoy wrote:
> P2PMEM is using mhp framework to connect to the memory subsystem. In
> case the CMB size is not compatible to mhp alignment, the CMB mapping
> will fail. Use pci_p2pdma_align_size to align CMB size in order to
> successfully map non-aligned original CMB.
>
> Signed-off-by: Max Gurtovoy <[email protected]>
> ---
> drivers/nvme/host/pci.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a29b170701fc..1197263b4cd0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
> u64 size, offset;
> resource_size_t bar_size;
> struct pci_dev *pdev = to_pci_dev(dev->dev);
> + unsigned long nr_pages, min_align;

These new variables don't appear to be used anywhere.

> int bar;
>
> if (dev->cmb_size)
> @@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
> if (size > bar_size - offset)
> size = bar_size - offset;
>
> + size = pci_p2pdma_align_size(size);
> + if (!size) {
> + dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
> + return;
> + }
> +
> if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
> dev_warn(dev->ctrl.device,
> "failed to register the CMB\n");
> --

2021-06-02 14:53:06

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 3/3] nvme-pci: align CMB size according to P2PMEM alignment


On 6/2/2021 5:39 PM, Keith Busch wrote:
> On Wed, Jun 02, 2021 at 02:10:55PM +0300, Max Gurtovoy wrote:
>> P2PMEM is using mhp framework to connect to the memory subsystem. In
>> case the CMB size is not compatible to mhp alignment, the CMB mapping
>> will fail. Use pci_p2pdma_align_size to align CMB size in order to
>> successfully map non-aligned original CMB.
>>
>> Signed-off-by: Max Gurtovoy <[email protected]>
>> ---
>> drivers/nvme/host/pci.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index a29b170701fc..1197263b4cd0 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -1817,6 +1817,7 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>> u64 size, offset;
>> resource_size_t bar_size;
>> struct pci_dev *pdev = to_pci_dev(dev->dev);
>> + unsigned long nr_pages, min_align;
> These new variables don't appear to be used anywhere.

oh, I have some leftovers..

Sorry.


>
>> int bar;
>>
>> if (dev->cmb_size)
>> @@ -1856,6 +1857,12 @@ static void nvme_map_cmb(struct nvme_dev *dev)
>> if (size > bar_size - offset)
>> size = bar_size - offset;
>>
>> + size = pci_p2pdma_align_size(size);
>> + if (!size) {
>> + dev_warn(dev->ctrl.device, "CMB size is 0 after alignment\n");
>> + return;
>> + }
>> +
>> if (pci_p2pdma_add_resource(pdev, bar, size, offset)) {
>> dev_warn(dev->ctrl.device,
>> "failed to register the CMB\n");
>> --

2021-06-03 10:55:46

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment


On 6/2/2021 3:14 PM, David Hildenbrand wrote:
> On 02.06.21 13:10, Max Gurtovoy wrote:
>> Hotplugged memory has alignmet restrictions. E.g, it disallows all
>> operations smaller than a sub-section and only allow operations smaller
>> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
>> for mhp users.
>>
>> Signed-off-by: Max Gurtovoy <[email protected]>
>> ---
>>   include/linux/memory_hotplug.h |  5 +++++
>>   mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
>>   2 files changed, 24 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/memory_hotplug.h
>> b/include/linux/memory_hotplug.h
>> index 28f32fd00fe9..c55a9049b11e 100644
>> --- a/include/linux/memory_hotplug.h
>> +++ b/include/linux/memory_hotplug.h
>> @@ -76,6 +76,7 @@ struct mhp_params {
>>     bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
>>   struct range mhp_get_pluggable_range(bool need_mapping);
>> +unsigned long mhp_get_min_align(void);
>>     /*
>>    * Zone resizing functions
>> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
>>       ___page;                \
>>    })
>>   +static inline unsigned long mhp_get_min_align(void)
>> +{
>> +    return 0;
>> +}
>>   static inline unsigned zone_span_seqbegin(struct zone *zone)
>>   {
>>       return 0;
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9e86e9ee0a10..161bb6704a9b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -270,24 +270,29 @@ void __init
>> register_page_bootmem_info_node(struct pglist_data *pgdat)
>>   }
>>   #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>   +/*
>> + * Disallow all operations smaller than a sub-section and only
>> + * allow operations smaller than a section for
>> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
>> + * enforces a larger memory_block_size_bytes() granularity for
>> + * memory that will be marked online, so this check should only
>> + * fire for direct arch_{add,remove}_memory() users outside of
>> + * add_memory_resource().
>> + */
>> +unsigned long mhp_get_min_align(void)
>> +{
>> +    if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>> +        return PAGES_PER_SUBSECTION;
>> +    return PAGES_PER_SECTION;
>> +}
>> +EXPORT_SYMBOL_GPL(mhp_get_min_align);
>
> We have to main interfaces to "hotplug" memory:
>
> a) add_memory() and friends for System RAM, which have memory block
> alignment requirements.
>
> b) memremap_pages(), which has the alignemnt requirements you mention
> here.
>
> I feel like what you need would better be exposed in mm/memremap.c,
> for example, via "memremap_min_alignment" so it matches the
> "memremap_pages" semantics.
>
> And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE,
> which depends on SPARSEMEM_VMEMMAP. So you'll always have
> PAGES_PER_SUBSECTION.
>
> I can already spot "memremap_compat_align", maybe you can reuse that
> or handle it accordingly in there?

Yes I think that since subsection is aligned to PAGE_SIZE I can do:

size_t pci_p2pdma_align_size(size_t size)
{
        unsigned long min_align;

        min_align = memremap_compat_align();
        if (!IS_ALIGNED(size, min_align))
                return ALIGN_DOWN(size, min_align);

        return size;
}


thoughts ?

>

2021-06-21 16:14:40

by Max Gurtovoy

[permalink] [raw]
Subject: Re: [PATCH 1/3] mm,memory_hotplug: export mhp min alignment

hi David,

do we have a conclusion for this series ?

Is the below suggestion accepted by the maintainers ?

I would like to send a new series before closing 5.14 merge window.

On 6/3/2021 1:52 PM, Max Gurtovoy wrote:
>
> On 6/2/2021 3:14 PM, David Hildenbrand wrote:
>> On 02.06.21 13:10, Max Gurtovoy wrote:
>>> Hotplugged memory has alignmet restrictions. E.g, it disallows all
>>> operations smaller than a sub-section and only allow operations smaller
>>> than a section for SPARSEMEM_VMEMMAP. Export the alignment restrictions
>>> for mhp users.
>>>
>>> Signed-off-by: Max Gurtovoy <[email protected]>
>>> ---
>>>   include/linux/memory_hotplug.h |  5 +++++
>>>   mm/memory_hotplug.c            | 33 +++++++++++++++++++--------------
>>>   2 files changed, 24 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/linux/memory_hotplug.h
>>> b/include/linux/memory_hotplug.h
>>> index 28f32fd00fe9..c55a9049b11e 100644
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -76,6 +76,7 @@ struct mhp_params {
>>>     bool mhp_range_allowed(u64 start, u64 size, bool need_mapping);
>>>   struct range mhp_get_pluggable_range(bool need_mapping);
>>> +unsigned long mhp_get_min_align(void);
>>>     /*
>>>    * Zone resizing functions
>>> @@ -248,6 +249,10 @@ void mem_hotplug_done(void);
>>>       ___page;                \
>>>    })
>>>   +static inline unsigned long mhp_get_min_align(void)
>>> +{
>>> +    return 0;
>>> +}
>>>   static inline unsigned zone_span_seqbegin(struct zone *zone)
>>>   {
>>>       return 0;
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 9e86e9ee0a10..161bb6704a9b 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -270,24 +270,29 @@ void __init
>>> register_page_bootmem_info_node(struct pglist_data *pgdat)
>>>   }
>>>   #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
>>>   +/*
>>> + * Disallow all operations smaller than a sub-section and only
>>> + * allow operations smaller than a section for
>>> + * SPARSEMEM_VMEMMAP. Note that check_hotplug_memory_range()
>>> + * enforces a larger memory_block_size_bytes() granularity for
>>> + * memory that will be marked online, so this check should only
>>> + * fire for direct arch_{add,remove}_memory() users outside of
>>> + * add_memory_resource().
>>> + */
>>> +unsigned long mhp_get_min_align(void)
>>> +{
>>> +    if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>> +        return PAGES_PER_SUBSECTION;
>>> +    return PAGES_PER_SECTION;
>>> +}
>>> +EXPORT_SYMBOL_GPL(mhp_get_min_align);
>>
>> We have to main interfaces to "hotplug" memory:
>>
>> a) add_memory() and friends for System RAM, which have memory block
>> alignment requirements.
>>
>> b) memremap_pages(), which has the alignemnt requirements you mention
>> here.
>>
>> I feel like what you need would better be exposed in mm/memremap.c,
>> for example, via "memremap_min_alignment" so it matches the
>> "memremap_pages" semantics.
>>
>> And then, memremap_pages() is only available with CONFIG_ZONE_DEVICE,
>> which depends on SPARSEMEM_VMEMMAP. So you'll always have
>> PAGES_PER_SUBSECTION.
>>
>> I can already spot "memremap_compat_align", maybe you can reuse that
>> or handle it accordingly in there?
>
> Yes I think that since subsection is aligned to PAGE_SIZE I can do:
>
> size_t pci_p2pdma_align_size(size_t size)
> {
>         unsigned long min_align;
>
>         min_align = memremap_compat_align();
>         if (!IS_ALIGNED(size, min_align))
>                 return ALIGN_DOWN(size, min_align);
>
>         return size;
> }
>
>
> thoughts ?
>
>>
>
> _______________________________________________
> Linux-nvme mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-nvme