2020-11-30 03:32:29

by Anshuman Khandual

Subject: [RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform

This series adds a mechanism allowing platforms to weigh in and prevalidate
an incoming address range before proceeding further with memory hotplug.
This helps prevent potential platform errors further down the hotplug call
chain for the given address range, which would inevitably fail the hotplug
itself.

This mechanism was suggested by David Hildenbrand during another discussion
with respect to a memory hotplug fix on the arm64 platform.

https://lore.kernel.org/linux-arm-kernel/[email protected]/

This mechanism focuses on the addressability aspect and not the [sub]section
alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
have been left unchanged. Wondering if all these can still be unified into
an expanded memhp_range_allowed() check that can be called from multiple
memory hot add and remove paths.

This series applies on v5.10-rc6 and has been lightly tested on arm64,
but I am looking for some early feedback here.

Changes in RFC V2:

Incorporated all review feedback from David.

- Added additional range check in __segment_load() on s390 which was lost
- Changed is_private init in pagemap_range()
- Moved the framework into mm/memory_hotplug.c
- Made arch_get_addressable_range() a __weak function
- Renamed arch_get_addressable_range() as arch_get_mappable_range()
- Callback arch_get_mappable_range() only handles range requiring linear mapping
- Merged multiple memhp_range_allowed() checks in register_memory_resource()
- Replaced WARN() with pr_warn() in memhp_range_allowed()
- Replaced error return code ERANGE with E2BIG

Changes in RFC V1:

https://lore.kernel.org/linux-mm/[email protected]/

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Anshuman Khandual (3):
mm/hotplug: Prevalidate the address range being added with platform
arm64/mm: Define arch_get_mappable_range()
s390/mm: Define arch_get_mappable_range()

arch/arm64/mm/mmu.c | 14 +++----
arch/s390/mm/extmem.c | 5 +++
arch/s390/mm/vmem.c | 13 ++++--
include/linux/memory_hotplug.h | 2 +
mm/memory_hotplug.c | 77 +++++++++++++++++++++++++---------
mm/memremap.c | 6 ++-
6 files changed, 84 insertions(+), 33 deletions(-)

--
2.20.1


2020-11-30 03:32:34

by Anshuman Khandual

Subject: [RFC V2 2/3] arm64/mm: Define arch_get_mappable_range()

This overrides arch_get_mappable_range() on the arm64 platform so that it can
be used by the recently added generic framework. It drops inside_linear_region()
and the subsequent check in arch_add_memory(), which are no longer required.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/mm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ca692a815731..49ec8f2838f2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
}

-static bool inside_linear_region(u64 start, u64 size)
+struct range arch_get_mappable_range(void)
{
+ struct range memhp_range;
+
/*
* Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
* accommodating both its ends but excluding PAGE_END. Max physical
* range which can be mapped inside this linear mapping range, must
* also be derived from its end points.
*/
- return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
- (start + size - 1) <= __pa(PAGE_END - 1);
+ memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
+ memhp_range.end = __pa(PAGE_END - 1);
+ return memhp_range;
}

int arch_add_memory(int nid, u64 start, u64 size,
@@ -1461,11 +1464,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
{
int ret, flags = 0;

- if (!inside_linear_region(start, size)) {
- pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
- return -EINVAL;
- }
-
if (rodata_full || debug_pagealloc_enabled())
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;

--
2.20.1

2020-11-30 03:32:43

by Anshuman Khandual

Subject: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

This overrides arch_get_mappable_range() on the s390 platform and drops the
now redundant similar check in vmem_add_mapping(). It compensates by adding
a new check in __segment_load() to preserve the existing functionality.

Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/s390/mm/extmem.c | 5 +++++
arch/s390/mm/vmem.c | 13 +++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
index 5060956b8e7d..cc055a78f7b6 100644
--- a/arch/s390/mm/extmem.c
+++ b/arch/s390/mm/extmem.c
@@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
goto out_free_resource;
}

+ if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
+ rc = -ERANGE;
+ goto out_resource;
+ }
+
rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
if (rc)
goto out_resource;
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b239f2ba93b0..06dddcc0ce06 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
mutex_unlock(&vmem_mutex);
}

+struct range arch_get_mappable_range(void)
+{
+ struct range memhp_range;
+
+ memhp_range.start = 0;
+ memhp_range.end = VMEM_MAX_PHYS;
+ return memhp_range;
+}
+
int vmem_add_mapping(unsigned long start, unsigned long size)
{
int ret;

- if (start + size > VMEM_MAX_PHYS ||
- start + size < start)
- return -ERANGE;
-
mutex_lock(&vmem_mutex);
ret = vmem_add_range(start, size);
if (ret)
--
2.20.1

2020-11-30 03:35:09

by Anshuman Khandual

Subject: [RFC V2 1/3] mm/hotplug: Prevalidate the address range being added with platform

This introduces memhp_range_allowed() which can be called in various memory
hotplug paths to prevalidate, with the platform, the address range being
added. memhp_range_allowed() calls memhp_get_pluggable_range(), which
provides the applicable address range depending on whether a linear mapping
is required or not. For ranges that require a linear mapping, it calls a
new arch callback, arch_get_mappable_range(), which the platform can
override. The new callback thus gives the platform an opportunity to
configure acceptable memory hotplug address ranges in case there are
constraints.

This mechanism will help prevent platform-specific errors deep down during
hotplug calls. It drops the now redundant check_hotplug_memory_addressable()
check in __add_pages().

Cc: David Hildenbrand <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
include/linux/memory_hotplug.h | 2 +
mm/memory_hotplug.c | 77 +++++++++++++++++++++++++---------
mm/memremap.c | 6 ++-
3 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 551093b74596..047a711ab76a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -70,6 +70,8 @@ typedef int __bitwise mhp_t;
*/
#define MEMHP_MERGE_RESOURCE ((__force mhp_t)BIT(0))

+bool memhp_range_allowed(u64 start, u64 size, bool need_mapping);
+
/*
* Extended parameters for memory hotplug:
* altmap: alternative allocator for memmap array (optional)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 63b2e46b6555..9dd9db01985d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -107,6 +107,9 @@ static struct resource *register_memory_resource(u64 start, u64 size,
if (strcmp(resource_name, "System RAM"))
flags |= IORESOURCE_SYSRAM_DRIVER_MANAGED;

+ if (!memhp_range_allowed(start, size, 1))
+ return ERR_PTR(-E2BIG);
+
/*
* Make sure value parsed from 'mem=' only restricts memory adding
* while booting, so that memory hotplug won't be impacted. Please
@@ -284,22 +287,6 @@ static int check_pfn_span(unsigned long pfn, unsigned long nr_pages,
return 0;
}

-static int check_hotplug_memory_addressable(unsigned long pfn,
- unsigned long nr_pages)
-{
- const u64 max_addr = PFN_PHYS(pfn + nr_pages) - 1;
-
- if (max_addr >> MAX_PHYSMEM_BITS) {
- const u64 max_allowed = (1ull << (MAX_PHYSMEM_BITS + 1)) - 1;
- WARN(1,
- "Hotplugged memory exceeds maximum addressable address, range=%#llx-%#llx, maximum=%#llx\n",
- (u64)PFN_PHYS(pfn), max_addr, max_allowed);
- return -E2BIG;
- }
-
- return 0;
-}
-
/*
* Reasonably generic function for adding memory. It is
* expected that archs that support memory hotplug will
@@ -317,10 +304,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
if (WARN_ON_ONCE(!params->pgprot.pgprot))
return -EINVAL;

- err = check_hotplug_memory_addressable(pfn, nr_pages);
- if (err)
- return err;
-
if (altmap) {
/*
* Validate altmap is within bounds of the total request
@@ -1824,4 +1807,58 @@ int offline_and_remove_memory(int nid, u64 start, u64 size)
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+/*
+ * Platforms should define arch_get_mappable_range() that provides
+ * maximum possible addressable physical memory range for which the
+ * linear mapping could be created. The platform returned address
+ * range must adhere to these following semantics.
+ *
+ * - range.start <= range.end
+ * - Range includes both end points [range.start..range.end]
+ *
+ * There is also a fallback definition provided here, allowing the
+ * entire possible physical address range in case any platform does
+ * not define arch_get_mappable_range().
+ */
+struct range __weak arch_get_mappable_range(void)
+{
+ struct range memhp_range = {
+ .start = 0UL,
+ .end = -1ULL,
+ };
+ return memhp_range;
+}
+
+static inline struct range memhp_get_pluggable_range(bool need_mapping)
+{
+ const u64 max_phys = (1ULL << (MAX_PHYSMEM_BITS + 1)) - 1;
+ struct range memhp_range;
+
+ if (need_mapping) {
+ memhp_range = arch_get_mappable_range();
+ if (memhp_range.start > max_phys) {
+ memhp_range.start = 0;
+ memhp_range.end = 0;
+ }
+ memhp_range.end = min_t(u64, memhp_range.end, max_phys);
+ } else {
+ memhp_range.start = 0;
+ memhp_range.end = max_phys;
+ }
+ return memhp_range;
+}
+
+bool memhp_range_allowed(u64 start, u64 size, bool need_mapping)
+{
+ struct range memhp_range = memhp_get_pluggable_range(need_mapping);
+ u64 end = start + size;
+
+ if (start < end && start >= memhp_range.start && (end - 1) <= memhp_range.end)
+ return true;
+
+ pr_warn("Hotplug memory [%#llx-%#llx] exceeds maximum addressable range [%#llx-%#llx]\n",
+ start, end, memhp_range.start, memhp_range.end);
+ return false;
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
diff --git a/mm/memremap.c b/mm/memremap.c
index 16b2fb482da1..26c1825756cc 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -185,6 +185,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
int range_id, int nid)
{
+ const bool is_private = pgmap->type == MEMORY_DEVICE_PRIVATE;
struct range *range = &pgmap->ranges[range_id];
struct dev_pagemap *conflict_pgmap;
int error, is_ram;
@@ -230,6 +231,9 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
if (error)
goto err_pfn_remap;

+ if (!memhp_range_allowed(range->start, range_len(range), !is_private))
+ goto err_pfn_remap;
+
mem_hotplug_begin();

/*
@@ -243,7 +247,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
* the CPU, we do want the linear mapping and thus use
* arch_add_memory().
*/
- if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+ if (is_private) {
error = add_pages(nid, PHYS_PFN(range->start),
PHYS_PFN(range_len(range)), params);
} else {
--
2.20.1

2020-12-02 06:47:00

by Anshuman Khandual

Subject: Re: [RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform



On 11/30/20 8:59 AM, Anshuman Khandual wrote:
> This series adds a mechanism allowing platforms to weigh in and prevalidate
> an incoming address range before proceeding further with memory hotplug.
> This helps prevent potential platform errors further down the hotplug call
> chain for the given address range, which would inevitably fail the hotplug
> itself.
>
> This mechanism was suggested by David Hildenbrand during another discussion
> with respect to a memory hotplug fix on the arm64 platform.
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> This mechanism focuses on the addressability aspect and not the [sub]section
> alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
> have been left unchanged. Wondering if all these can still be unified into
> an expanded memhp_range_allowed() check that can be called from multiple
> memory hot add and remove paths.
>
> This series applies on v5.10-rc6 and has been lightly tested on arm64,
> but I am looking for some early feedback here.
>
> Changes in RFC V2:
>
> Incorporated all review feedback from David.
>
> - Added additional range check in __segment_load() on s390 which was lost
> - Changed is_private init in pagemap_range()
> - Moved the framework into mm/memory_hotplug.c
> - Made arch_get_addressable_range() a __weak function
> - Renamed arch_get_addressable_range() as arch_get_mappable_range()
> - Callback arch_get_mappable_range() only handles range requiring linear mapping
> - Merged multiple memhp_range_allowed() checks in register_memory_resource()
> - Replaced WARN() with pr_warn() in memhp_range_allowed()
> - Replaced error return code ERANGE with E2BIG

There is one build failure with MEMORY_HOTPLUG=y and MEMORY_HOTREMOVE=n.
There are also warnings on the arm64 and s390 platforms when built with W=1,
due to the lack of prototypes required by -Wmissing-prototypes. I have fixed
all these problems for the next iteration, which will follow once there is
broad agreement on the overall approach.
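
For reference, a minimal sketch of the kind of fix meant here, assuming the
failure comes from the new helpers living inside the CONFIG_MEMORY_HOTREMOVE
section of mm/memory_hotplug.c (the actual fix in the next iteration may
differ):

/*
 * include/linux/memory_hotplug.h: give the arch callback a prototype,
 * so that the arm64/s390 definitions build cleanly with W=1 and
 * -Wmissing-prototypes.
 */
struct range arch_get_mappable_range(void);

/*
 * mm/memory_hotplug.c: move arch_get_mappable_range(),
 * memhp_get_pluggable_range() and memhp_range_allowed() out of the
 * #ifdef CONFIG_MEMORY_HOTREMOVE block, so that MEMORY_HOTPLUG=y with
 * MEMORY_HOTREMOVE=n still builds.
 */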

2020-12-02 09:26:03

by David Hildenbrand

Subject: Re: [RFC V2 1/3] mm/hotplug: Prevalidate the address range being added with platform

On 30.11.20 04:29, Anshuman Khandual wrote:
> This introduces memhp_range_allowed() which can be called in various memory
> hotplug paths to prevalidate, with the platform, the address range being
> added. memhp_range_allowed() calls memhp_get_pluggable_range(), which
> provides the applicable address range depending on whether a linear mapping
> is required or not. For ranges that require a linear mapping, it calls a
> new arch callback, arch_get_mappable_range(), which the platform can
> override. The new callback thus gives the platform an opportunity to
> configure acceptable memory hotplug address ranges in case there are
> constraints.
>
> This mechanism will help prevent platform-specific errors deep down during
> hotplug calls. It drops the now redundant check_hotplug_memory_addressable()
> check in __add_pages().
>


[...]

> /*
> * Reasonably generic function for adding memory. It is
> * expected that archs that support memory hotplug will
> @@ -317,10 +304,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
> if (WARN_ON_ONCE(!params->pgprot.pgprot))
> return -EINVAL;
>
> - err = check_hotplug_memory_addressable(pfn, nr_pages);
> - if (err)
> - return err;
> -

I was wondering if we should add a VM_BUG_ON(!memhp_range_allowed())
here to make it clearer that callers are expected to check that first.
Maybe in other places as well (e.g., the arch code where we remove the
original checks).
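
A minimal sketch of that suggestion in __add_pages() (hypothetical placement;
passing "false" assumes the linear mapping requirement is enforced separately
at the arch_add_memory() level):

	/*
	 * Sketch only: assert that the caller has already validated
	 * the range via memhp_range_allowed().
	 */
	VM_BUG_ON(!memhp_range_allowed(PFN_PHYS(pfn),
				       nr_pages * PAGE_SIZE, false));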

[...]


> #endif /* CONFIG_MEMORY_HOTREMOVE */
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 16b2fb482da1..26c1825756cc 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -185,6 +185,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
> static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> int range_id, int nid)
> {
> + const bool is_private = pgmap->type == MEMORY_DEVICE_PRIVATE;
> struct range *range = &pgmap->ranges[range_id];
> struct dev_pagemap *conflict_pgmap;
> int error, is_ram;
> @@ -230,6 +231,9 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> if (error)
> goto err_pfn_remap;
>
> + if (!memhp_range_allowed(range->start, range_len(range), !is_private))
> + goto err_pfn_remap;
> +
> mem_hotplug_begin();
>
> /*
> @@ -243,7 +247,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> * the CPU, we do want the linear mapping and thus use
> * arch_add_memory().
> */
> - if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> + if (is_private) {
> error = add_pages(nid, PHYS_PFN(range->start),
> PHYS_PFN(range_len(range)), params);
> } else {
>

In general, LGTM.

--
Thanks,

David / dhildenb

2020-12-02 09:30:25

by David Hildenbrand

Subject: Re: [RFC V2 2/3] arm64/mm: Define arch_get_mappable_range()

On 30.11.20 04:29, Anshuman Khandual wrote:
> This overrides arch_get_mappable_range() on the arm64 platform so that it can
> be used by the recently added generic framework. It drops inside_linear_region()
> and the subsequent check in arch_add_memory(), which are no longer required.
>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> arch/arm64/mm/mmu.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ca692a815731..49ec8f2838f2 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
> free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
> }
>
> -static bool inside_linear_region(u64 start, u64 size)
> +struct range arch_get_mappable_range(void)
> {
> + struct range memhp_range;
> +
> /*
> * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
> * accommodating both its ends but excluding PAGE_END. Max physical
> * range which can be mapped inside this linear mapping range, must
> * also be derived from its end points.
> */
> - return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
> - (start + size - 1) <= __pa(PAGE_END - 1);
> + memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
> + memhp_range.end = __pa(PAGE_END - 1);
> + return memhp_range;
> }
>
> int arch_add_memory(int nid, u64 start, u64 size,
> @@ -1461,11 +1464,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
> {
> int ret, flags = 0;
>
> - if (!inside_linear_region(start, size)) {
> - pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
> - return -EINVAL;
> - }

As discussed, I think something like a VM_BUG_ON() here might make
sense, indicating that we require the caller to validate upfront. The same
applies to the s390x variant.

Thanks!

> -
> if (rodata_full || debug_pagealloc_enabled())
> flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>
>


--
Thanks,

David / dhildenb

2020-12-02 12:20:26

by Anshuman Khandual

Subject: Re: [RFC V2 1/3] mm/hotplug: Prevalidate the address range being added with platform


On 12/2/20 2:50 PM, David Hildenbrand wrote:
> On 30.11.20 04:29, Anshuman Khandual wrote:
>> This introduces memhp_range_allowed() which can be called in various memory
>> hotplug paths to prevalidate, with the platform, the address range being
>> added. memhp_range_allowed() calls memhp_get_pluggable_range(), which
>> provides the applicable address range depending on whether a linear mapping
>> is required or not. For ranges that require a linear mapping, it calls a
>> new arch callback, arch_get_mappable_range(), which the platform can
>> override. The new callback thus gives the platform an opportunity to
>> configure acceptable memory hotplug address ranges in case there are
>> constraints.
>>
>> This mechanism will help prevent platform-specific errors deep down during
>> hotplug calls. It drops the now redundant check_hotplug_memory_addressable()
>> check in __add_pages().
>>
>
>
> [...]
>
>> /*
>> * Reasonably generic function for adding memory. It is
>> * expected that archs that support memory hotplug will
>> @@ -317,10 +304,6 @@ int __ref __add_pages(int nid, unsigned long pfn, unsigned long nr_pages,
>> if (WARN_ON_ONCE(!params->pgprot.pgprot))
>> return -EINVAL;
>>
>> - err = check_hotplug_memory_addressable(pfn, nr_pages);
>> - if (err)
>> - return err;
>> -
>
> I was wondering if we should add a VM_BUG_ON(!memhp_range_allowed())
> here to make it clearer that callers are expected to check that first.
> Maybe in other places as well (e.g., the arch code where we remove the
> original checks).

Makes sense, will add them.

>
> [...]
>
>
>> #endif /* CONFIG_MEMORY_HOTREMOVE */
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 16b2fb482da1..26c1825756cc 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -185,6 +185,7 @@ static void dev_pagemap_percpu_release(struct percpu_ref *ref)
>> static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>> int range_id, int nid)
>> {
>> + const bool is_private = pgmap->type == MEMORY_DEVICE_PRIVATE;
>> struct range *range = &pgmap->ranges[range_id];
>> struct dev_pagemap *conflict_pgmap;
>> int error, is_ram;
>> @@ -230,6 +231,9 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>> if (error)
>> goto err_pfn_remap;
>>
>> + if (!memhp_range_allowed(range->start, range_len(range), !is_private))
>> + goto err_pfn_remap;
>> +
>> mem_hotplug_begin();
>>
>> /*
>> @@ -243,7 +247,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
>> * the CPU, we do want the linear mapping and thus use
>> * arch_add_memory().
>> */
>> - if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
>> + if (is_private) {
>> error = add_pages(nid, PHYS_PFN(range->start),
>> PHYS_PFN(range_len(range)), params);
>> } else {
>>
>
> In general, LGTM.
>

Okay

2020-12-02 12:21:35

by Anshuman Khandual

Subject: Re: [RFC V2 2/3] arm64/mm: Define arch_get_mappable_range()



On 12/2/20 2:56 PM, David Hildenbrand wrote:
> On 30.11.20 04:29, Anshuman Khandual wrote:
>> This overrides arch_get_mappable_range() on the arm64 platform so that it can
>> be used by the recently added generic framework. It drops inside_linear_region()
>> and the subsequent check in arch_add_memory(), which are no longer required.
>>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> arch/arm64/mm/mmu.c | 14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index ca692a815731..49ec8f2838f2 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1444,16 +1444,19 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
>> free_empty_tables(start, end, PAGE_OFFSET, PAGE_END);
>> }
>>
>> -static bool inside_linear_region(u64 start, u64 size)
>> +struct range arch_get_mappable_range(void)
>> {
>> + struct range memhp_range;
>> +
>> /*
>> * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
>> * accommodating both its ends but excluding PAGE_END. Max physical
>> * range which can be mapped inside this linear mapping range, must
>> * also be derived from its end points.
>> */
>> - return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
>> - (start + size - 1) <= __pa(PAGE_END - 1);
>> + memhp_range.start = __pa(_PAGE_OFFSET(vabits_actual));
>> + memhp_range.end = __pa(PAGE_END - 1);
>> + return memhp_range;
>> }
>>
>> int arch_add_memory(int nid, u64 start, u64 size,
>> @@ -1461,11 +1464,6 @@ int arch_add_memory(int nid, u64 start, u64 size,
>> {
>> int ret, flags = 0;
>>
>> - if (!inside_linear_region(start, size)) {
>> - pr_err("[%llx %llx] is outside linear mapping region\n", start, start + size);
>> - return -EINVAL;
>> - }
> As discussed, I think something like a VM_BUG_ON() here might make
> sense, indicating that we require the caller to validate upfront. The same
> applies to the s390x variant.

Sure, will do.

>
> Thanks!
>

2020-12-02 20:37:18

by Heiko Carstens

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

On Mon, Nov 30, 2020 at 08:59:52AM +0530, Anshuman Khandual wrote:
> This overrides arch_get_mappable_range() on the s390 platform and drops the
> now redundant similar check in vmem_add_mapping(). It compensates by adding
> a new check in __segment_load() to preserve the existing functionality.
>
> Cc: Heiko Carstens <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> arch/s390/mm/extmem.c | 5 +++++
> arch/s390/mm/vmem.c | 13 +++++++++----
> 2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
> index 5060956b8e7d..cc055a78f7b6 100644
> --- a/arch/s390/mm/extmem.c
> +++ b/arch/s390/mm/extmem.c
> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
> goto out_free_resource;
> }
>
> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
> + rc = -ERANGE;
> + goto out_resource;
> + }
> +
> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
> if (rc)
> goto out_resource;
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index b239f2ba93b0..06dddcc0ce06 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
> mutex_unlock(&vmem_mutex);
> }
>
> +struct range arch_get_mappable_range(void)
> +{
> + struct range memhp_range;
> +
> + memhp_range.start = 0;
> + memhp_range.end = VMEM_MAX_PHYS;
> + return memhp_range;
> +}
> +
> int vmem_add_mapping(unsigned long start, unsigned long size)
> {
> int ret;
>
> - if (start + size > VMEM_MAX_PHYS ||
> - start + size < start)
> - return -ERANGE;
> -

I really fail to see how this could be considered an improvement for
s390. Especially, I do not like that the (central) range check is now
moved to the caller (__segment_load), which would mean potential
additional future callers would have to duplicate that code as well.

2020-12-02 20:40:50

by Heiko Carstens

Subject: Re: [RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform

On Mon, Nov 30, 2020 at 08:59:49AM +0530, Anshuman Khandual wrote:
> This series adds a mechanism allowing platforms to weigh in and prevalidate
> an incoming address range before proceeding further with memory hotplug.
> This helps prevent potential platform errors further down the hotplug call
> chain for the given address range, which would inevitably fail the hotplug
> itself.
>
> This mechanism was suggested by David Hildenbrand during another discussion
> with respect to a memory hotplug fix on the arm64 platform.
>
> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> This mechanism focuses on the addressability aspect and not the [sub]section
> alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
> have been left unchanged. Wondering if all these can still be unified into
> an expanded memhp_range_allowed() check that can be called from multiple
> memory hot add and remove paths.
>
> This series applies on v5.10-rc6 and has been lightly tested on arm64,
> but I am looking for some early feedback here.
>
> Changes in RFC V2:
>
> Incorporated all review feedback from David.
>
> - Added additional range check in __segment_load() on s390 which was lost
> - Changed is_private init in pagemap_range()
> - Moved the framework into mm/memory_hotplug.c
> - Made arch_get_addressable_range() a __weak function
> - Renamed arch_get_addressable_range() as arch_get_mappable_range()
> - Callback arch_get_mappable_range() only handles range requiring linear mapping
> - Merged multiple memhp_range_allowed() checks in register_memory_resource()
> - Replaced WARN() with pr_warn() in memhp_range_allowed()
> - Replaced error return code ERANGE with E2BIG
>
> Changes in RFC V1:
>
> https://lore.kernel.org/linux-mm/[email protected]/
>
> Cc: Heiko Carstens <[email protected]>
> Cc: Vasily Gorbik <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Btw. please use git send-email's --cc-cover option to make sure that
all patches of this series will be sent to all listed cc's.
I really dislike receiving only the cover letter and maybe one patch
and then having to figure out where to find the rest.

Thanks :)

2020-12-03 00:17:23

by Anshuman Khandual

Subject: Re: [RFC V2 0/3] mm/hotplug: Pre-validate the address range with platform



On 12/3/20 2:05 AM, Heiko Carstens wrote:
> On Mon, Nov 30, 2020 at 08:59:49AM +0530, Anshuman Khandual wrote:
>> This series adds a mechanism allowing platforms to weigh in and prevalidate
>> an incoming address range before proceeding further with memory hotplug.
>> This helps prevent potential platform errors further down the hotplug call
>> chain for the given address range, which would inevitably fail the hotplug
>> itself.
>>
>> This mechanism was suggested by David Hildenbrand during another discussion
>> with respect to a memory hotplug fix on the arm64 platform.
>>
>> https://lore.kernel.org/linux-arm-kernel/[email protected]/
>>
>> This mechanism focuses on the addressability aspect and not the [sub]section
>> alignment aspect. Hence check_hotplug_memory_range() and check_pfn_span()
>> have been left unchanged. Wondering if all these can still be unified into
>> an expanded memhp_range_allowed() check that can be called from multiple
>> memory hot add and remove paths.
>>
>> This series applies on v5.10-rc6 and has been lightly tested on arm64,
>> but I am looking for some early feedback here.
>>
>> Changes in RFC V2:
>>
>> Incorporated all review feedback from David.
>>
>> - Added additional range check in __segment_load() on s390 which was lost
>> - Changed is_private init in pagemap_range()
>> - Moved the framework into mm/memory_hotplug.c
>> - Made arch_get_addressable_range() a __weak function
>> - Renamed arch_get_addressable_range() as arch_get_mappable_range()
>> - Callback arch_get_mappable_range() only handles range requiring linear mapping
>> - Merged multiple memhp_range_allowed() checks in register_memory_resource()
>> - Replaced WARN() with pr_warn() in memhp_range_allowed()
>> - Replaced error return code ERANGE with E2BIG
>>
>> Changes in RFC V1:
>>
>> https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Cc: Heiko Carstens <[email protected]>
>> Cc: Vasily Gorbik <[email protected]>
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Ard Biesheuvel <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>
> Btw. please use git send-email's --cc-cover option to make sure that
> all patches of this series will be sent to all listed cc's.
> I really dislike receiving only the cover letter and maybe one patch
> and then having to figure out where to find the rest.

Okay, will ensure that.

>
> Thanks :)
>

2020-12-03 00:38:26

by Anshuman Khandual

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()



On 12/3/20 2:02 AM, Heiko Carstens wrote:
> On Mon, Nov 30, 2020 at 08:59:52AM +0530, Anshuman Khandual wrote:
>> This overrides arch_get_mappable_range() on the s390 platform and drops the
>> now redundant similar check in vmem_add_mapping(). It compensates by adding
>> a new check in __segment_load() to preserve the existing functionality.
>>
>> Cc: Heiko Carstens <[email protected]>
>> Cc: Vasily Gorbik <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> arch/s390/mm/extmem.c | 5 +++++
>> arch/s390/mm/vmem.c | 13 +++++++++----
>> 2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
>> index 5060956b8e7d..cc055a78f7b6 100644
>> --- a/arch/s390/mm/extmem.c
>> +++ b/arch/s390/mm/extmem.c
>> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>> goto out_free_resource;
>> }
>>
>> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
>> + rc = -ERANGE;
>> + goto out_resource;
>> + }
>> +
>> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>> if (rc)
>> goto out_resource;
>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>> index b239f2ba93b0..06dddcc0ce06 100644
>> --- a/arch/s390/mm/vmem.c
>> +++ b/arch/s390/mm/vmem.c
>> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>> mutex_unlock(&vmem_mutex);
>> }
>>
>> +struct range arch_get_mappable_range(void)
>> +{
>> + struct range memhp_range;
>> +
>> + memhp_range.start = 0;
>> + memhp_range.end = VMEM_MAX_PHYS;
>> + return memhp_range;
>> +}
>> +
>> int vmem_add_mapping(unsigned long start, unsigned long size)
>> {
>> int ret;
>>
>> - if (start + size > VMEM_MAX_PHYS ||
>> - start + size < start)
>> - return -ERANGE;
>> -
>
> I really fail to see how this could be considered an improvement for
> s390. Especially, I do not like that the (central) range check is now
> moved to the caller (__segment_load), which would mean potential
> additional future callers would have to duplicate that code as well.

The physical range check is being moved to the generic hotplug code
via arch_get_mappable_range() instead, making the existing check in
vmem_add_mapping() redundant. Dropping the check there necessitates
adding back a similar check in __segment_load(). Otherwise there
would be a loss of functionality in terms of range checking.

Maybe we could just keep this existing check in vmem_add_mapping()
as well in order to avoid this movement, but then it would be a
redundant check in every hotplug path.

So I guess the choice is to either have redundant range checks in
all hotplug paths or have future internal callers of vmem_add_mapping()
take care of the range check.

2020-12-03 12:00:14

by Heiko Carstens

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
> >> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
> >> index 5060956b8e7d..cc055a78f7b6 100644
> >> --- a/arch/s390/mm/extmem.c
> >> +++ b/arch/s390/mm/extmem.c
> >> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
> >> goto out_free_resource;
> >> }
> >>
> >> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
> >> + rc = -ERANGE;
> >> + goto out_resource;
> >> + }
> >> +
> >> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
> >> if (rc)
> >> goto out_resource;
> >> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> >> index b239f2ba93b0..06dddcc0ce06 100644
> >> --- a/arch/s390/mm/vmem.c
> >> +++ b/arch/s390/mm/vmem.c
> >> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
> >> mutex_unlock(&vmem_mutex);
> >> }
> >>
> >> +struct range arch_get_mappable_range(void)
> >> +{
> >> + struct range memhp_range;
> >> +
> >> + memhp_range.start = 0;
> >> + memhp_range.end = VMEM_MAX_PHYS;
> >> + return memhp_range;
> >> +}
> >> +
> >> int vmem_add_mapping(unsigned long start, unsigned long size)
> >> {
> >> int ret;
> >>
> >> - if (start + size > VMEM_MAX_PHYS ||
> >> - start + size < start)
> >> - return -ERANGE;
> >> -
> >
> > I really fail to see how this could be considered an improvement for
> > s390. Especially, I do not like that the (central) range check is now
> > moved to the caller (__segment_load), which would mean potential
> > additional future callers would have to duplicate that code as well.
>
> The physical range check is being moved to the generic hotplug code
> via arch_get_mappable_range() instead, making the existing check in
> vmem_add_mapping() redundant. Dropping the check there necessitates
> adding back a similar check in __segment_load(). Otherwise there
> would be a loss of functionality in terms of range checking.
>
> Maybe we could just keep this existing check in vmem_add_mapping()
> as well in order to avoid this movement, but then it would be a
> redundant check in every hotplug path.
>
> So I guess the choice is to either have redundant range checks in
> all hotplug paths or have future internal callers of vmem_add_mapping()
> take care of the range check.

The problem I have with this current approach from an architecture
perspective: we end up having two completely different methods which
are doing the same thing and must be kept in sync. This might be obvious
looking at this patch, but I'm sure this will go out of sync (aka
broken) sooner or later.

Therefore I would really like to see a single method to do the range
checking. Maybe you could add a callback into architecture code, so
that such an architecture-specific function could also be used
elsewhere. Dunno.

2020-12-03 12:04:59

by David Hildenbrand

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

On 03.12.20 12:51, Heiko Carstens wrote:
> On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
>>>> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
>>>> index 5060956b8e7d..cc055a78f7b6 100644
>>>> --- a/arch/s390/mm/extmem.c
>>>> +++ b/arch/s390/mm/extmem.c
>>>> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>>>> goto out_free_resource;
>>>> }
>>>>
>>>> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
>>>> + rc = -ERANGE;
>>>> + goto out_resource;
>>>> + }
>>>> +
>>>> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>>>> if (rc)
>>>> goto out_resource;
>>>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>>>> index b239f2ba93b0..06dddcc0ce06 100644
>>>> --- a/arch/s390/mm/vmem.c
>>>> +++ b/arch/s390/mm/vmem.c
>>>> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>>>> mutex_unlock(&vmem_mutex);
>>>> }
>>>>
>>>> +struct range arch_get_mappable_range(void)
>>>> +{
>>>> + struct range memhp_range;
>>>> +
>>>> + memhp_range.start = 0;
>>>> + memhp_range.end = VMEM_MAX_PHYS;
>>>> + return memhp_range;
>>>> +}
>>>> +
>>>> int vmem_add_mapping(unsigned long start, unsigned long size)
>>>> {
>>>> int ret;
>>>>
>>>> - if (start + size > VMEM_MAX_PHYS ||
>>>> - start + size < start)
>>>> - return -ERANGE;
>>>> -
>>>
>>> I really fail to see how this could be considered an improvement for
>>> s390. Especially, I do not like that the (central) range check is now
>>> moved to the caller (__segment_load), which would mean potential
>>> additional future callers would have to duplicate that code as well.
>>
>> The physical range check is being moved to the generic hotplug code
>> via arch_get_mappable_range() instead, making the existing check in
>> vmem_add_mapping() redundant. Dropping the check there necessitates
>> adding back a similar check in __segment_load(). Otherwise there
>> would be a loss of functionality in terms of range checking.
>>
>> Maybe we could just keep this existing check in vmem_add_mapping()
>> as well in order to avoid this movement, but then it would be a
>> redundant check in every hotplug path.
>>
>> So I guess the choice is to either have redundant range checks in
>> all hotplug paths or have future internal callers of vmem_add_mapping()
>> take care of the range check.
>
> The problem I have with this current approach from an architecture
> perspective: we end up having two completely different methods which
> are doing the same thing and must be kept in sync. This might be obvious
> looking at this patch, but I'm sure this will go out of sync (aka
> broken) sooner or later.

Exactly, there should be one function only; that was the whole idea of
arch_get_mappable_range().

>
> Therefore I would really like to see a single method to do the range
> checking. Maybe you could add a callback into architecture code, so
> that such an architecture-specific function could also be used
> elsewhere. Dunno.
>

I think we can just switch to using memhp_range_allowed() here then,
after implementing arch_get_mappable_range().

It doesn't hurt to double-check in vmem_add_mapping() - especially to keep
extmem working without changes. At least for callers of memory hotplug
it's then clear which values actually won't fail deep down in arch code.
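
For instance, a sketch of what that double check could look like in
vmem_add_mapping(), assuming it derives its bounds from the same
arch_get_mappable_range() callback (with the inclusive-end semantics
documented in patch 1) instead of open-coding VMEM_MAX_PHYS:

int vmem_add_mapping(unsigned long start, unsigned long size)
{
	struct range range = arch_get_mappable_range();

	/* Same bounds as the generic hotplug check, kept for extmem. */
	if (start < range.start ||
	    start + size > range.end + 1 ||
	    start + size < start)
		return -ERANGE;

	/* ... rest of the function unchanged ... */
}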

--
Thanks,

David / dhildenb

2020-12-07 04:43:54

by Anshuman Khandual

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()



On 12/3/20 5:31 PM, David Hildenbrand wrote:
> On 03.12.20 12:51, Heiko Carstens wrote:
>> On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
>>>>> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
>>>>> index 5060956b8e7d..cc055a78f7b6 100644
>>>>> --- a/arch/s390/mm/extmem.c
>>>>> +++ b/arch/s390/mm/extmem.c
>>>>> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>>>>> goto out_free_resource;
>>>>> }
>>>>>
>>>>> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
>>>>> + rc = -ERANGE;
>>>>> + goto out_resource;
>>>>> + }
>>>>> +
>>>>> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>>>>> if (rc)
>>>>> goto out_resource;
>>>>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>>>>> index b239f2ba93b0..06dddcc0ce06 100644
>>>>> --- a/arch/s390/mm/vmem.c
>>>>> +++ b/arch/s390/mm/vmem.c
>>>>> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>>>>> mutex_unlock(&vmem_mutex);
>>>>> }
>>>>>
>>>>> +struct range arch_get_mappable_range(void)
>>>>> +{
>>>>> + struct range memhp_range;
>>>>> +
>>>>> + memhp_range.start = 0;
>>>>> + memhp_range.end = VMEM_MAX_PHYS;
>>>>> + return memhp_range;
>>>>> +}
>>>>> +
>>>>> int vmem_add_mapping(unsigned long start, unsigned long size)
>>>>> {
>>>>> int ret;
>>>>>
>>>>> - if (start + size > VMEM_MAX_PHYS ||
>>>>> - start + size < start)
>>>>> - return -ERANGE;
>>>>> -
>>>>
>>>> I really fail to see how this could be considered an improvement for
>>>> s390. Especially, I do not like that the (central) range check is now
>>>> moved to the caller (__segment_load), which would mean potential
>>>> additional future callers would have to duplicate that code as well.
>>>
>>> The physical range check is being moved to the generic hotplug code
>>> via arch_get_mappable_range() instead, making the existing check in
>>> vmem_add_mapping() redundant. Dropping the check there necessitates
>>> adding back a similar check in __segment_load(). Otherwise there
>>> would be a loss of functionality in terms of range checking.
>>>
>>> Maybe we could just keep this existing check in vmem_add_mapping()
>>> as well in order to avoid this movement, but then it would be a
>>> redundant check in every hotplug path.
>>>
>>> So I guess the choice is to either have redundant range checks in
>>> all hotplug paths or have future internal callers of vmem_add_mapping()
>>> take care of the range check.
>>
>> The problem I have with this current approach from an architecture
>> perspective: we end up having two completely different methods which
>> are doing the same thing and must be kept in sync. This might be obvious
>> looking at this patch, but I'm sure this will go out of sync (aka
>> broken) sooner or later.
>
> Exactly, there should be one function only; that was the whole idea of
> arch_get_mappable_range().
>
>>
>> Therefore I would really like to see a single method to do the range
>> checking. Maybe you could add a callback into architecture code, so
>> that such an architecture-specific function could also be used
>> elsewhere. Dunno.
>>
>
> I think we can just switch to using memhp_range_allowed() here then,
> after implementing arch_get_mappable_range().
>
> It doesn't hurt to double-check in vmem_add_mapping() - especially to keep
> extmem working without changes. At least for callers of memory hotplug
> it's then clear which values actually won't fail deep down in arch code.

But there is a small problem here. memhp_range_allowed() is now defined
and available with CONFIG_MEMORY_HOTPLUG, whereas vmem_add_mapping() and
__segment_load() are generally available without any config dependency.
So if CONFIG_MEMORY_HOTPLUG is not enabled, there will be a build failure
in vmem_add_mapping() for the memhp_range_allowed() symbol.

We could just move the VM_BUG_ON(!memhp_range_allowed(start, size, 1)) check
from vmem_add_mapping() to arch_add_memory() like on the arm64 platform. But
then __segment_load() would need that additional new check to compensate,
as proposed earlier.

Also, leaving vmem_add_mapping() and __segment_load() unchanged will cause
the address range check to be called three times on the hotplug path, i.e.

1. register_memory_resource()
2. arch_add_memory()
3. vmem_add_mapping()

Moving the memhp_range_allowed() check inside arch_add_memory() seems better
and consistent with arm64. Also, in the future, any platform which chooses
to override arch_get_mappable_range() will have this additional VM_BUG_ON()
in its arch_add_memory().
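
A sketch of that placement on s390, mirroring the arm64 variant (hypothetical
code, not the posted patch):

int arch_add_memory(int nid, u64 start, u64 size,
		    struct mhp_params *params)
{
	/*
	 * Assert that generic hotplug code has already validated the
	 * range against arch_get_mappable_range().
	 */
	VM_BUG_ON(!memhp_range_allowed(start, size, true));

	/* ... existing arch_add_memory() body continues here ... */
}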

2020-12-07 09:07:57

by David Hildenbrand

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

On 07.12.20 05:38, Anshuman Khandual wrote:
>
>
> On 12/3/20 5:31 PM, David Hildenbrand wrote:
>> On 03.12.20 12:51, Heiko Carstens wrote:
>>> On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
>>>>>> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
>>>>>> index 5060956b8e7d..cc055a78f7b6 100644
>>>>>> --- a/arch/s390/mm/extmem.c
>>>>>> +++ b/arch/s390/mm/extmem.c
>>>>>> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>>>>>> goto out_free_resource;
>>>>>> }
>>>>>>
>>>>>> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
>>>>>> + rc = -ERANGE;
>>>>>> + goto out_resource;
>>>>>> + }
>>>>>> +
>>>>>> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>>>>>> if (rc)
>>>>>> goto out_resource;
>>>>>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>>>>>> index b239f2ba93b0..06dddcc0ce06 100644
>>>>>> --- a/arch/s390/mm/vmem.c
>>>>>> +++ b/arch/s390/mm/vmem.c
>>>>>> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>>>>>> mutex_unlock(&vmem_mutex);
>>>>>> }
>>>>>>
>>>>>> +struct range arch_get_mappable_range(void)
>>>>>> +{
>>>>>> + struct range memhp_range;
>>>>>> +
>>>>>> + memhp_range.start = 0;
>>>>>> + memhp_range.end = VMEM_MAX_PHYS;
>>>>>> + return memhp_range;
>>>>>> +}
>>>>>> +
>>>>>> int vmem_add_mapping(unsigned long start, unsigned long size)
>>>>>> {
>>>>>> int ret;
>>>>>>
>>>>>> - if (start + size > VMEM_MAX_PHYS ||
>>>>>> - start + size < start)
>>>>>> - return -ERANGE;
>>>>>> -
>>>>>
>>>>> I really fail to see how this could be considered an improvement for
>>>>> s390. Especially, I do not like that the (central) range check is now
>>>>> moved to the caller (__segment_load), which would mean potential
>>>>> additional future callers would have to duplicate that code as well.
>>>>
>>>> The physical range check is being moved to the generic hotplug code
>>>> via arch_get_mappable_range() instead, making the existing check in
>>>> vmem_add_mapping() redundant. Dropping the check there necessitates
>>>> adding back a similar check in __segment_load(). Otherwise there
>>>> would be a loss of functionality in terms of range checking.
>>>>
>>>> Maybe we could just keep this existing check in vmem_add_mapping()
>>>> as well in order to avoid this movement, but then it would be a
>>>> redundant check in every hotplug path.
>>>>
>>>> So I guess the choice is to either have redundant range checks in
>>>> all hotplug paths or have future internal callers of vmem_add_mapping()
>>>> take care of the range check.
>>>
>>> The problem I have with this current approach from an architecture
>>> perspective: we end up having two completely different methods which
>>> are doing the same thing and must be kept in sync. This might be obvious
>>> looking at this patch, but I'm sure this will go out of sync (aka
>>> broken) sooner or later.
>>
>> Exactly, there should be one function only; that was the whole idea of
>> arch_get_mappable_range().
>>
>>>
>>> Therefore I would really like to see a single method to do the range
>>> checking. Maybe you could add a callback into architecture code, so
>>> that such an architecture-specific function could also be used
>>> elsewhere. Dunno.
>>>
>>
>> I think we can just switch to using memhp_range_allowed() here then,
>> after implementing arch_get_mappable_range().
>>
>> It doesn't hurt to double-check in vmem_add_mapping() - especially to keep
>> extmem working without changes. At least for callers of memory hotplug
>> it's then clear which values actually won't fail deep down in arch code.
>
> But there is a small problem here. memhp_range_allowed() is now defined
> and available with CONFIG_MEMORY_HOTPLUG, whereas vmem_add_mapping() and
> __segment_load() are generally available without any config dependency.
> So if CONFIG_MEMORY_HOTPLUG is not enabled, there will be a build failure
> in vmem_add_mapping() for the memhp_range_allowed() symbol.
>
> We could just move the VM_BUG_ON(!memhp_range_allowed(start, size, 1)) check
> from vmem_add_mapping() to arch_add_memory() like on the arm64 platform. But
> then __segment_load() would need that additional new check to compensate,
> as proposed earlier.
>
> Also, leaving vmem_add_mapping() and __segment_load() unchanged will cause
> the address range check to be called three times on the hotplug path, i.e.
>
> 1. register_memory_resource()
> 2. arch_add_memory()
> 3. vmem_add_mapping()
>
> Moving the memhp_range_allowed() check inside arch_add_memory() seems better
> and consistent with arm64. Also, in the future, any platform which chooses
> to override arch_get_mappable_range() will have this additional VM_BUG_ON()
> in its arch_add_memory().

Yeah, it might not make sense to add these checks all over the place.
The important part is that

1. There is a check somewhere (even if it's deep down in arch code).
2. There is an obvious way for callers to find out what the valid values are.


I guess it would be good enough to

a) Factor out getting the arch ranges into arch_get_mappable_range()
b) Provide memhp_get_pluggable_range()

Both changes only make sense with an in-tree user. I'm planning on using
this functionality in the virtio-mem code. I can pick up your patches, drop
the superfluous checks, and use it from the virtio-mem code. Makes sense?
(BTW, looks like we'll see aarch64 support for virtio-mem soon.)
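
For illustration, a driver like virtio-mem could then clamp its device memory
region against the exported helper; a rough sketch with hypothetical
driver-side names (vm->addr, vm->region_size):

	struct range pluggable = memhp_get_pluggable_range(true);

	/* Reject device memory that could never be hotplugged. */
	if (vm->addr < pluggable.start ||
	    vm->addr + vm->region_size - 1 > pluggable.end)
		return -E2BIG;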

--
Thanks,

David / dhildenb

2020-12-08 05:35:43

by Anshuman Khandual

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()



On 12/7/20 2:33 PM, David Hildenbrand wrote:
> On 07.12.20 05:38, Anshuman Khandual wrote:
>>
>>
>> On 12/3/20 5:31 PM, David Hildenbrand wrote:
>>> On 03.12.20 12:51, Heiko Carstens wrote:
>>>> On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
>>>>>>> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
>>>>>>> index 5060956b8e7d..cc055a78f7b6 100644
>>>>>>> --- a/arch/s390/mm/extmem.c
>>>>>>> +++ b/arch/s390/mm/extmem.c
>>>>>>> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>>>>>>> goto out_free_resource;
>>>>>>> }
>>>>>>>
>>>>>>> + if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
>>>>>>> + rc = -ERANGE;
>>>>>>> + goto out_resource;
>>>>>>> + }
>>>>>>> +
>>>>>>> rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>>>>>>> if (rc)
>>>>>>> goto out_resource;
>>>>>>> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
>>>>>>> index b239f2ba93b0..06dddcc0ce06 100644
>>>>>>> --- a/arch/s390/mm/vmem.c
>>>>>>> +++ b/arch/s390/mm/vmem.c
>>>>>>> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>>>>>>> mutex_unlock(&vmem_mutex);
>>>>>>> }
>>>>>>>
>>>>>>> +struct range arch_get_mappable_range(void)
>>>>>>> +{
>>>>>>> + struct range memhp_range;
>>>>>>> +
>>>>>>> + memhp_range.start = 0;
>>>>>>> + memhp_range.end = VMEM_MAX_PHYS;
>>>>>>> + return memhp_range;
>>>>>>> +}
>>>>>>> +
>>>>>>> int vmem_add_mapping(unsigned long start, unsigned long size)
>>>>>>> {
>>>>>>> int ret;
>>>>>>>
>>>>>>> - if (start + size > VMEM_MAX_PHYS ||
>>>>>>> - start + size < start)
>>>>>>> - return -ERANGE;
>>>>>>> -
>>>>>>
>>>>>> I really fail to see how this could be considered an improvement for
>>>>>> s390. Especially, I do not like that the (central) range check is now
>>>>>> moved to the caller (__segment_load), which would mean potential
>>>>>> additional future callers would have to duplicate that code as well.
>>>>>
>>>>> The physical range check is being moved to the generic hotplug code
>>>>> via arch_get_mappable_range() instead, making the existing check in
>>>>> vmem_add_mapping() redundant. Dropping the check there necessitates
>>>>> adding back a similar check in __segment_load(). Otherwise there
>>>>> would be a loss of functionality in terms of range checking.
>>>>>
>>>>> Maybe we could just keep this existing check in vmem_add_mapping()
>>>>> as well in order to avoid this movement, but then it would be a
>>>>> redundant check in every hotplug path.
>>>>>
>>>>> So I guess the choice is to either have redundant range checks in
>>>>> all hotplug paths or have future internal callers of vmem_add_mapping()
>>>>> take care of the range check.
>>>>
>>>> The problem I have with this current approach from an architecture
>>>> perspective: we end up having two completely different methods which
>>>> are doing the same thing and must be kept in sync. This might be obvious
>>>> looking at this patch, but I'm sure this will go out of sync (aka
>>>> broken) sooner or later.
>>>
>>> Exactly, there should be one function only; that was the whole idea of
>>> arch_get_mappable_range().
>>>
>>>>
>>>> Therefore I would really like to see a single method to do the range
>>>> checking. Maybe you could add a callback into architecture code, so
>>>> that such an architecture-specific function could also be used
>>>> elsewhere. Dunno.
>>>>
>>>
>>> I think we can just switch to using memhp_range_allowed() here then,
>>> after implementing arch_get_mappable_range().
>>>
>>> It doesn't hurt to double-check in vmem_add_mapping() - especially to keep
>>> extmem working without changes. At least for callers of memory hotplug
>>> it's then clear which values actually won't fail deep down in arch code.
>>
>> But there is a small problem here. memhp_range_allowed() is now defined
>> and available with CONFIG_MEMORY_HOTPLUG, whereas vmem_add_mapping() and
>> __segment_load() are generally available without any config dependency.
>> So if CONFIG_MEMORY_HOTPLUG is not enabled, there will be a build failure
>> in vmem_add_mapping() for the memhp_range_allowed() symbol.
>>
>> We could just move the VM_BUG_ON(!memhp_range_allowed(start, size, 1)) check
>> from vmem_add_mapping() to arch_add_memory() like on the arm64 platform. But
>> then __segment_load() would need that additional new check to compensate,
>> as proposed earlier.
>>
>> Also, leaving vmem_add_mapping() and __segment_load() unchanged will cause
>> the address range check to be called three times on the hotplug path, i.e.
>>
>> 1. register_memory_resource()
>> 2. arch_add_memory()
>> 3. vmem_add_mapping()
>>
>> Moving the memhp_range_allowed() check inside arch_add_memory() seems better
>> and consistent with arm64. Also, in the future, any platform which chooses
>> to override arch_get_mappable_range() will have this additional VM_BUG_ON()
>> in its arch_add_memory().
>
> Yeah, it might not make sense to add these checks all over the place.
> The important part is that
>
> 1. There is a check somewhere (even if it's deep down in arch code).
> 2. There is an obvious way for callers to find out what the valid values are.
>
>
> I guess it would be good enough to
>
> a) Factor out getting the arch ranges into arch_get_mappable_range()
> b) Provide memhp_get_pluggable_range()

I have posted V1 earlier today, which hopefully accommodates all previous
suggestions, but otherwise do let me know if anything else still needs to
be improved upon.

https://lore.kernel.org/linux-mm/[email protected]/

>
> Both changes only make sense with an in-tree user. I'm planning on using
> this functionality in the virtio-mem code. I can pick up your patches, drop
> the superfluous checks, and use it from the virtio-mem code. Makes sense?
> (BTW, looks like we'll see aarch64 support for virtio-mem soon.)

I have not been following virtio-mem closely. But is there something pending
on the arm64 platform which prevents virtio-mem enablement?

2020-12-08 08:42:24

by David Hildenbrand

Subject: Re: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()

>>
>> Both changes only make sense with an in-tree user. I'm planning on using
>> this functionality in the virtio-mem code. I can pick up your patches, drop
>> the superfluous checks, and use it from the virtio-mem code. Makes sense?
>> (BTW, looks like we'll see aarch64 support for virtio-mem soon.)
>
> I have not been following virtio-mem closely. But is there something pending
> on the arm64 platform which prevents virtio-mem enablement?

Regarding enablement, I expect things to mostly work out of the box.
Jonathan is currently doing some testing and wants to send a
simple unlock patch once done. [1]


Now, there are some things to improve in the future. virtio-mem
adds/removes individual Linux memory blocks and logically plugs/unplugs
MAX_ORDER - 1/pageblock_order pages inside Linux memory blocks.

1. memblock

On arm64 and powerpc, we create/delete memblocks when adding/removing
memory, which is suboptimal (and the code is quite fragile as we don't
handle errors ...). Hotplugged memory never has holes, so we can tweak
relevant code to not check via the memblock api.

For example, pfn_valid() only has to check for memblock_is_map_memory()
in case of early_section() - otherwise it can just fall back to our
generic pfn_valid() function.
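
A rough sketch of that idea (not an actual patch; modelled on the generic
SPARSEMEM helpers):

int pfn_valid(unsigned long pfn)
{
	phys_addr_t addr = PFN_PHYS(pfn);
	struct mem_section *ms;

	if (PHYS_PFN(addr) != pfn)
		return 0;
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	ms = __pfn_to_section(pfn);
	if (!valid_section(ms))
		return 0;
	/* Hotplugged memory never has holes, the generic check suffices. */
	if (!early_section(ms))
		return pfn_section_valid(ms, pfn);
	/* Early (boot) memory may contain holes / nomap regions. */
	return memblock_is_map_memory(addr);
}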

2. MAX_ORDER - 1 / pageblock_order

With 64k base pages, virtio-mem can only logically plug/unplug in 512MB
granularity, which is sub-optimal and inflexible. 4/2MB would be much
better - however this would require always using 2MB THP on arm64 (IIRC
via "cont" bits). Luckily, only some distributions use 64k base pages as
default nowadays ... :)

3. Section size

virtio-mem benefits from small section sizes. Currently, we have 1G.
With 4k base pages we could easily reduce it to something like what x86 has
(128 MB) - and I remember discussions regarding that already in other
(IIRC NVDIMM / DIMM) contexts. Again, with 64k base pages we cannot go
below 512 MB right now.

[1] https://lkml.kernel.org/r/[email protected]

--
Thanks,

David / dhildenb