2023-11-20 07:24:16

by Sumanth Korikkar

Subject: [PATCH 0/3] mm/memory_hotplug: fixes for memory hotplug altmap support

Patch 1 adds the missing lock in memory hotplug operations,
ensuring that the mem_hotplug_lock is held during critical operations
like mhp_init_memmap_on_memory() and mhp_deinit_memmap_on_memory().

Patch 2 fixes the error handling in add_memory_resource() so that it
accounts for altmap support. This ensures that struct pages are
deallocated in the same way they were allocated.

Patch 3 relocates the vmem_altmap code to sparse-vmemmap.c so that
vmem_altmap_free() and vmem_altmap_offset() can be used without
depending on CONFIG_ZONE_DEVICE.

Changes since patch series "implement "memmap on memory" feature on
s390":
Feedback from David:
Patch 1:
* Fixed subject for memory hotplug lock.
* Added locking comments before
memory_block_online()/memory_block_offline().
Patch 2:
* Fixed the commit referenced in the Fixes: tag.

lkp report:
Patch 3:
* Resolved the undefined reference to `vmem_altmap_offset' error that
occurs when CONFIG_SPARSEMEM_VMEMMAP is disabled.
* Requesting Reviewed-by once again due to the change in patch 3.

Thank you

Sumanth Korikkar (3):
mm/memory_hotplug: add missing mem_hotplug_lock
mm/memory_hotplug: fix error handling in add_memory_resource()
mm: use vmem_altmap code without CONFIG_ZONE_DEVICE

drivers/base/memory.c | 18 +++++++++++++++---
include/linux/memremap.h | 12 ------------
include/linux/mm.h | 15 +++++++++++++++
mm/memory_hotplug.c | 15 +++++++--------
mm/memremap.c | 14 +-------------
mm/sparse-vmemmap.c | 13 +++++++++++++
6 files changed, 51 insertions(+), 36 deletions(-)

--
2.41.0


2023-11-20 07:24:27

by Sumanth Korikkar

Subject: [PATCH 3/3] mm: use vmem_altmap code without CONFIG_ZONE_DEVICE

vmem_altmap_free() and vmem_altmap_offset() could be utilized without
CONFIG_ZONE_DEVICE enabled. For example,
mm/memory_hotplug.c:__add_pages() relies on that. The altmap is no
longer restricted to ZONE_DEVICE handling, but instead depends on
CONFIG_SPARSEMEM_VMEMMAP.

When CONFIG_SPARSEMEM_VMEMMAP is disabled, these functions are defined
as inline stubs, ensuring compatibility with configurations that do not
use sparsemem vmemmap. Without these stubs, lkp reported the following:

ld: arch/x86/mm/init_64.o: in function `remove_pagetable':
init_64.c:(.meminit.text+0xfc7): undefined reference to
`vmem_altmap_free'
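
The stub arrangement described above can be sketched in userspace C. This
is only an illustration, not kernel code: the struct is a reduced stand-in
for struct vmem_altmap, DEMO_SPARSEMEM_VMEMMAP is a made-up macro standing
in for CONFIG_SPARSEMEM_VMEMMAP, and demo_release() is a hypothetical
caller. The point is that the caller compiles and links whether or not the
macro is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for the kernel's struct vmem_altmap (assumption:
 * only the fields used in this thread are modelled). */
struct vmem_altmap {
	unsigned long reserve;	/* pfns reserved at the start of the range */
	unsigned long free;	/* pfns set aside for the memmap */
	unsigned long alloc;	/* pfns handed out so far */
};

/* DEMO_SPARSEMEM_VMEMMAP is hypothetical; build with
 * -DDEMO_SPARSEMEM_VMEMMAP to select the real bodies. */
#ifdef DEMO_SPARSEMEM_VMEMMAP
unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
{
	/* number of pfns from base where pfn_to_page() is valid */
	if (altmap)
		return altmap->reserve + altmap->free;
	return 0;
}

void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
{
	altmap->alloc -= nr_pfns;
}
#else
/* Stubs: callers still compile and link, but nothing happens. */
static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
{
	(void)altmap;
	return 0;
}

static inline void vmem_altmap_free(struct vmem_altmap *altmap,
				    unsigned long nr_pfns)
{
	(void)altmap;
	(void)nr_pfns;
}
#endif

/* Hypothetical caller that must build in both configurations. */
unsigned long demo_release(struct vmem_altmap *altmap, unsigned long nr_pfns)
{
	vmem_altmap_free(altmap, nr_pfns);
	return vmem_altmap_offset(altmap);
}
```

Without the #else branch, a build that leaves the macro undefined would
fail at link time with an undefined reference, which is the class of error
shown in the lkp report.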

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Reviewed-by: Gerald Schaefer <[email protected]>
Signed-off-by: Sumanth Korikkar <[email protected]>
---
include/linux/memremap.h | 12 ------------
include/linux/mm.h | 15 +++++++++++++++
mm/memremap.c | 14 +-------------
mm/sparse-vmemmap.c | 13 +++++++++++++
4 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1314d9c5f05b..744c830f4b13 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -196,8 +196,6 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
struct dev_pagemap *pgmap);
bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);

-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
unsigned long memremap_compat_align(void);
#else
static inline void *devm_memremap_pages(struct device *dev,
@@ -228,16 +226,6 @@ static inline bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn)
return false;
}

-static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
- return 0;
-}
-
-static inline void vmem_altmap_free(struct vmem_altmap *altmap,
- unsigned long nr_pfns)
-{
-}
-
/* when memremap_pages() is disabled all archs can remap a single page */
static inline unsigned long memremap_compat_align(void)
{
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf5d0b1b16f4..a90f939d5d57 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3786,6 +3786,21 @@ void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap);
#endif

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+unsigned long vmem_altmap_offset(struct vmem_altmap *altmap);
+void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns);
+#else
+static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+ return 0;
+}
+
+static inline void vmem_altmap_free(struct vmem_altmap *altmap,
+ unsigned long nr_pfns)
+{
+}
+#endif
+
#define VMEMMAP_RESERVE_NR 2
#ifdef CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
static inline bool __vmemmap_can_optimize(struct vmem_altmap *altmap,
diff --git a/mm/memremap.c b/mm/memremap.c
index bee85560a243..9531faa92a7c 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -7,6 +7,7 @@
#include <linux/memremap.h>
#include <linux/pfn_t.h>
#include <linux/swap.h>
+#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/swapops.h>
#include <linux/types.h>
@@ -422,19 +423,6 @@ void devm_memunmap_pages(struct device *dev, struct dev_pagemap *pgmap)
}
EXPORT_SYMBOL_GPL(devm_memunmap_pages);

-unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
-{
- /* number of pfns from base where pfn_to_page() is valid */
- if (altmap)
- return altmap->reserve + altmap->free;
- return 0;
-}
-
-void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
-{
- altmap->alloc -= nr_pfns;
-}
-
/**
* get_dev_pagemap() - take a new live reference on the dev_pagemap for @pfn
* @pfn: page frame number to lookup page_map
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index a2cbe44c48e1..bd1b9a137f93 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -46,6 +46,19 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
MEMBLOCK_ALLOC_ACCESSIBLE, node);
}

+unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
+{
+ /* number of pfns from base where pfn_to_page() is valid */
+ if (altmap)
+ return altmap->reserve + altmap->free;
+ return 0;
+}
+
+void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
+{
+ altmap->alloc -= nr_pfns;
+}
+
void * __meminit vmemmap_alloc_block(unsigned long size, int node)
{
/* If the main allocator is up use that, fallback to bootmem. */
--
2.41.0

2023-11-20 07:24:39

by Sumanth Korikkar

Subject: [PATCH 2/3] mm/memory_hotplug: fix error handling in add_memory_resource()

In add_memory_resource(), memory block devices are created after a
successful call to arch_add_memory(). However, creation of memory block
devices could fail. In that case, arch_remove_memory() is called to
perform the necessary cleanup.

Currently, with or without altmap support, arch_remove_memory() is always
called with altmap set to NULL during error handling. This leads to
struct pages being freed using free_pages(), even though the allocation
might have been performed with altmap support via
altmap_alloc_block_buf().

Fix the error handling by passing altmap to arch_remove_memory(). This
ensures the following:
* When altmap is disabled, deallocation of the struct pages array occurs
via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().
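
The allocation/deallocation pairing can be modelled with a small
userspace sketch. Everything here is hypothetical (the demo_* names, the
page counters, the reduced altmap struct); it only illustrates why the
error path has to hand the same altmap to the teardown that the setup
used.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical model of an altmap: a budget and a usage count. */
struct demo_altmap {
	unsigned long free;	/* pfns available for the memmap */
	unsigned long alloc;	/* pfns currently handed out */
};

/* Pages currently taken from the general allocator (models free_pages()). */
static unsigned long demo_heap_pages;

/* Take nr pages for the memmap from the altmap if given, else the heap. */
int demo_memmap_populate(struct demo_altmap *altmap, unsigned long nr)
{
	if (altmap) {
		if (altmap->alloc + nr > altmap->free)
			return -1;
		altmap->alloc += nr;
	} else {
		demo_heap_pages += nr;
	}
	return 0;
}

/* Give nr pages back; must use the same source as the allocation. */
void demo_memmap_depopulate(struct demo_altmap *altmap, unsigned long nr)
{
	if (altmap) {
		assert(altmap->alloc >= nr);
		altmap->alloc -= nr;
	} else {
		/* Trips if the pages actually came from an altmap. */
		assert(demo_heap_pages >= nr);
		demo_heap_pages -= nr;
	}
}

/* Shape of the fixed error path: teardown gets the altmap, not NULL. */
int demo_add_memory(struct demo_altmap *altmap, unsigned long nr,
		    int devices_fail)
{
	if (demo_memmap_populate(altmap, nr))
		return -1;

	if (devices_fail) {
		demo_memmap_depopulate(altmap, nr);
		return -1;
	}
	return 0;
}
```

In this model, calling demo_memmap_depopulate(NULL, nr) after an altmap
allocation would hit the heap assertion and leave altmap->alloc stale,
which is the mismatch the patch removes.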

Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <[email protected]>
Signed-off-by: Sumanth Korikkar <[email protected]>
---
mm/memory_hotplug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c8238fc5edcb..4f476a970e84 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1458,7 +1458,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* create memory block devices after memory was added */
ret = create_memory_block_devices(start, size, params.altmap, group);
if (ret) {
- arch_remove_memory(start, size, NULL);
+ arch_remove_memory(start, size, params.altmap);
goto error_free;
}

--
2.41.0

2023-11-20 07:24:39

by Sumanth Korikkar

Subject: [PATCH 1/3] mm/memory_hotplug: add missing mem_hotplug_lock

From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).

mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called w/o the
mem_hotplug_lock.

When a memory block is being offlined while kmemleak goes through each
populated zone, the following theoretical race condition could occur:
CPU 0:                                        | CPU 1:
memory_offline()                              |
-> offline_pages()                            |
    -> mem_hotplug_begin()                    |
       ...                                    |
    -> mem_hotplug_done()                     |
                                              | kmemleak_scan()
                                              | -> get_online_mems()
                                              |    ...
-> mhp_deinit_memmap_on_memory()              |
  [not protected by mem_hotplug_begin/done()] |
  Marks memory section as offline,            | Retrieves zone_start_pfn
  poisons vmemmap struct pages and updates    | and struct page members.
  the zone related data                       |
                                              |    ...
                                              | -> put_online_mems()

Fix this by ensuring mem_hotplug_lock is taken before performing
mhp_init_memmap_on_memory(). Also ensure that
mhp_deinit_memmap_on_memory() holds the lock.

online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.
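
The restructuring can be sketched as a single-threaded userspace model.
This is an illustration under stated assumptions: the demo_* functions are
hypothetical, and a boolean flag stands in for mem_hotplug_lock, so the
sketch only checks that every helper runs with the lock held and that
every exit path releases it. It does not model real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "mem_hotplug_lock is held in write mode". */
static bool demo_lock_held;

static void demo_hotplug_begin(void)
{
	assert(!demo_lock_held);
	demo_lock_held = true;
}

static void demo_hotplug_done(void)
{
	assert(demo_lock_held);
	demo_lock_held = false;
}

/* Helpers no longer lock themselves; they expect the caller to do it. */
static int demo_init_memmap(void)
{
	assert(demo_lock_held);
	return 0;
}

static void demo_deinit_memmap(void)
{
	assert(demo_lock_held);
}

static int demo_online_pages(bool fail)
{
	assert(demo_lock_held);
	return fail ? -1 : 0;
}

/* Caller takes the lock once and funnels all exits through "out". */
int demo_block_online(bool has_vmemmap, bool online_fails)
{
	int ret = 0;

	demo_hotplug_begin();
	if (has_vmemmap) {
		ret = demo_init_memmap();
		if (ret)
			goto out;
	}

	ret = demo_online_pages(online_fails);
	if (ret) {
		if (has_vmemmap)
			demo_deinit_memmap();
		goto out;
	}
out:
	demo_hotplug_done();
	return ret;
}
```

Replacing the early returns with a common exit label is what keeps the
begin/done pair balanced once the lock lives in the caller.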

Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
Reviewed-by: Gerald Schaefer <[email protected]>
Signed-off-by: Sumanth Korikkar <[email protected]>
---
drivers/base/memory.c | 18 +++++++++++++++---
mm/memory_hotplug.c | 13 ++++++-------
2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f3b9a4d0fa3b..8a13babd826c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -180,6 +180,9 @@ static inline unsigned long memblk_nr_poison(struct memory_block *mem)
}
#endif

+/*
+ * Must acquire mem_hotplug_lock in write mode.
+ */
static int memory_block_online(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -204,10 +207,11 @@ static int memory_block_online(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;

+ mem_hotplug_begin();
if (nr_vmemmap_pages) {
ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
if (ret)
- return ret;
+ goto out;
}

ret = online_pages(start_pfn + nr_vmemmap_pages,
@@ -215,7 +219,7 @@ static int memory_block_online(struct memory_block *mem)
if (ret) {
if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
- return ret;
+ goto out;
}

/*
@@ -227,9 +231,14 @@ static int memory_block_online(struct memory_block *mem)
nr_vmemmap_pages);

mem->zone = zone;
+out:
+ mem_hotplug_done();
return ret;
}

+/*
+ * Must acquire mem_hotplug_lock in write mode.
+ */
static int memory_block_offline(struct memory_block *mem)
{
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
@@ -247,6 +256,7 @@ static int memory_block_offline(struct memory_block *mem)
if (mem->altmap)
nr_vmemmap_pages = mem->altmap->free;

+ mem_hotplug_begin();
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn), mem->group,
-nr_vmemmap_pages);
@@ -258,13 +268,15 @@ static int memory_block_offline(struct memory_block *mem)
if (nr_vmemmap_pages)
adjust_present_page_count(pfn_to_page(start_pfn),
mem->group, nr_vmemmap_pages);
- return ret;
+ goto out;
}

if (nr_vmemmap_pages)
mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);

mem->zone = NULL;
+out:
+ mem_hotplug_done();
return ret;
}

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1b03f4ec6fd2..c8238fc5edcb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1129,6 +1129,9 @@ void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
}

+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
@@ -1149,7 +1152,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
!IS_ALIGNED(pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;

- mem_hotplug_begin();

/* associate pfn range with the zone */
move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
@@ -1208,7 +1210,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
writeback_set_ratelimit();

memory_notify(MEM_ONLINE, &arg);
- mem_hotplug_done();
return 0;

failed_addition:
@@ -1217,7 +1218,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
(((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
memory_notify(MEM_CANCEL_ONLINE, &arg);
remove_pfn_range_from_zone(zone, pfn, nr_pages);
- mem_hotplug_done();
return ret;
}

@@ -1863,6 +1863,9 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
return 0;
}

+/*
+ * Must be called with mem_hotplug_lock in write mode.
+ */
int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
struct zone *zone, struct memory_group *group)
{
@@ -1885,8 +1888,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
!IS_ALIGNED(start_pfn + nr_pages, PAGES_PER_SECTION)))
return -EINVAL;

- mem_hotplug_begin();
-
/*
* Don't allow to offline memory blocks that contain holes.
* Consequently, memory blocks with holes can never get onlined
@@ -2027,7 +2028,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,

memory_notify(MEM_OFFLINE, &arg);
remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
- mem_hotplug_done();
return 0;

failed_removal_isolated:
@@ -2042,7 +2042,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
(unsigned long long) start_pfn << PAGE_SHIFT,
((unsigned long long) end_pfn << PAGE_SHIFT) - 1,
reason);
- mem_hotplug_done();
return ret;
}

--
2.41.0

2023-11-20 08:57:43

by David Hildenbrand

Subject: Re: [PATCH 1/3] mm/memory_hotplug: add missing mem_hotplug_lock

On 20.11.23 08:23, Sumanth Korikkar wrote:
> From Documentation/core-api/memory-hotplug.rst:
> When adding/removing/onlining/offlining memory or adding/removing
> heterogeneous/device memory, we should always hold the mem_hotplug_lock
> in write mode to serialise memory hotplug (e.g. access to global/zone
> variables).
>
> mhp_(de)init_memmap_on_memory() functions can change zone stats and
> struct page content, but they are currently called w/o the
> mem_hotplug_lock.
>
> When memory block is being offlined and when kmemleak goes through each
> populated zone, the following theoretical race conditions could occur:
> CPU 0: | CPU 1:
> memory_offline() |
> -> offline_pages() |
> -> mem_hotplug_begin() |
> ... |
> -> mem_hotplug_done() |
> | kmemleak_scan()
> | -> get_online_mems()
> | ...
> -> mhp_deinit_memmap_on_memory() |
> [not protected by mem_hotplug_begin/done()]|
> Marks memory section as offline, | Retrieves zone_start_pfn
> poisons vmemmap struct pages and updates | and struct page members.
> the zone related data |
> | ...
> | -> put_online_mems()
>
> Fix this by ensuring mem_hotplug_lock is taken before performing
> mhp_init_memmap_on_memory(). Also ensure that
> mhp_deinit_memmap_on_memory() holds the lock.
>
> online/offline_pages() are currently only called from
> memory_block_online/offline(), so it is safe to move the locking there.
>
> Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
> Reviewed-by: Gerald Schaefer <[email protected]>
> Signed-off-by: Sumanth Korikkar <[email protected]>
> ---

Acked-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb

2023-11-20 09:00:08

by David Hildenbrand

Subject: Re: [PATCH 2/3] mm/memory_hotplug: fix error handling in add_memory_resource()

On 20.11.23 08:23, Sumanth Korikkar wrote:
> In add_memory_resource(), creation of memory block devices occurs after
> successful call to arch_add_memory(). However, creation of memory block
> devices could fail. In that case, arch_remove_memory() is called to
> perform necessary cleanup.
>
> Currently with or without altmap support, arch_remove_memory() is always
> passed with altmap set to NULL during error handling. This leads to
> freeing of struct pages using free_pages(), even though the allocation
> might have been performed with altmap support via
> altmap_alloc_block_buf().
>
> Fix the error handling by passing altmap in arch_remove_memory(). This
> ensures the following:
> * When altmap is disabled, deallocation of the struct pages array occurs
> via free_pages().
> * When altmap is enabled, deallocation occurs via vmem_altmap_free().
>
> Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range")
> Reviewed-by: Gerald Schaefer <[email protected]>
> Signed-off-by: Sumanth Korikkar <[email protected]>
> ---
> mm/memory_hotplug.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c8238fc5edcb..4f476a970e84 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1458,7 +1458,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> /* create memory block devices after memory was added */
> ret = create_memory_block_devices(start, size, params.altmap, group);
> if (ret) {
> - arch_remove_memory(start, size, NULL);
> + arch_remove_memory(start, size, params.altmap);
> goto error_free;
> }
>

We should probably CC stable on this one.

Acked-by: David Hildenbrand <[email protected]>

--
Cheers,

David / dhildenb

2023-11-20 09:08:42

by David Hildenbrand

Subject: Re: [PATCH 3/3] mm: use vmem_altmap code without CONFIG_ZONE_DEVICE

On 20.11.23 08:23, Sumanth Korikkar wrote:
> vmem_altmap_free() and vmem_altmap_offset() could be utilized without
> CONFIG_ZONE_DEVICE enabled. For example,
> mm/memory_hotplug.c:__add_pages() relies on that. The altmap is no
> longer restricted to ZONE_DEVICE handling, but instead depends on
> CONFIG_SPARSEMEM_VMEMMAP.
>
> When CONFIG_SPARSEMEM_VMEMMAP is disabled, these functions are defined
> as inline stubs, ensuring compatibility with configurations that do not
> use sparsemem vmemmap. Without it, lkp reported the following:
>
> ld: arch/x86/mm/init_64.o: in function `remove_pagetable':
> init_64.c:(.meminit.text+0xfc7): undefined reference to
> `vmem_altmap_free'
>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
> Reviewed-by: Gerald Schaefer <[email protected]>
> Signed-off-by: Sumanth Korikkar <[email protected]>
> ---


[...]

>
> +unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
> +{
> + /* number of pfns from base where pfn_to_page() is valid */
> + if (altmap)
> + return altmap->reserve + altmap->free;
> + return 0;
> +}
> +
> +void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
> +{
> + altmap->alloc -= nr_pfns;
> +}

What speaks against just moving them to the header instead? They surely
are tiny ...

--
Cheers,

David / dhildenb

2023-11-20 09:40:12

by Sumanth Korikkar

Subject: Re: [PATCH 3/3] mm: use vmem_altmap code without CONFIG_ZONE_DEVICE

On Mon, Nov 20, 2023 at 10:01:13AM +0100, David Hildenbrand wrote:
>
> What speaks against just moving them to the header instead? They surely are
> tiny ...
>
I followed the previous convention. Sure, I will make them inline and
put the definitions in the header file.
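
For illustration only, a header-style version of the two helpers might
look like the sketch below. This is not the follow-up patch: the struct is
a reduced stand-in for struct vmem_altmap, and which kernel header would
host the definitions is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct vmem_altmap (assumption: only the
 * fields touched by the two helpers are modelled). */
struct vmem_altmap {
	unsigned long reserve;
	unsigned long free;
	unsigned long alloc;
};

/* Defined inline in the header, so no out-of-line symbol is needed. */
static inline unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
{
	/* number of pfns from base where pfn_to_page() is valid */
	if (altmap)
		return altmap->reserve + altmap->free;
	return 0;
}

static inline void vmem_altmap_free(struct vmem_altmap *altmap,
				    unsigned long nr_pfns)
{
	altmap->alloc -= nr_pfns;
}
```

With the bodies in a header, every translation unit that includes it gets
its own copy, so the link no longer depends on which .c file is built.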

Thanks