2020-03-07 08:43:47

by Baoquan He

Subject: [PATCH v3 0/7] mm/hotplug: Only use subsection map for VMEMMAP

Memory sub-section hotplug was added to fix the issue that an nvdimm could
be mapped at a non-section-aligned starting address. A subsection map was
added to struct mem_section_usage to implement it.

However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP, so the
subsection map only makes sense when SPARSEMEM_VMEMMAP is enabled. For
classic sparse, the subsection map is meaningless and confusing.

As for why classic sparse doesn't support subsection hotplug, Dan
said it's mainly because the effort and maintenance burden outweigh the
benefit. Besides, all current 64-bit arches enable
SPARSEMEM_VMEMMAP_ENABLE by default.

In this patchset, patches 2~4 make the sub-section map and the
relevant operations available only for VMEMMAP.

Patch 1 fixes a hot remove failure when the classic sparse is enabled.

Patches 5~7 are for document adding and doc/code cleanup.

Changelog

v2->v3:
David spotted a code bug in the old patch 1: the old local variable
subsection_map becomes invalid once ms->usage is reset. Add a local
variable 'empty' to cache whether subsection_map is empty.

Remove the kernel-doc comments for the newly added functions
fill_subsection_map() and clear_subsection_map(). Michal and David
suggested this.

Add a new static function is_subsection_map_empty() to check whether the
handled section's subsection map is empty, instead of returning that
information from clear_subsection_map(). David suggested this.

Add documentation stating that only VMEMMAP supports sub-section hotplug,
and that check_pfn_span() gates the alignment and size. Michal helped
rephrase the wording.

v1->v2:
Move the hot remove fix patch to the front so that people can
backport it more easily. Suggested by David.

Split the old patch which invalidates the sub-section map in the
!VMEMMAP case into two patches, patch 4/7 and patch 6/7. This
makes patch review easier. Suggested by David.

Split Wei Yang's fix out to be posted separately, since it has already
been reviewed and acked. Suggested by Andrew.

Fix a code comment mistake in the current patch 2/7. Found by
Wei Yang during review.

Baoquan He (7):
mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
mm/sparse.c: introduce new function fill_subsection_map()
mm/sparse.c: introduce a new function clear_subsection_map()
mm/sparse.c: only use subsection map in VMEMMAP case
mm/sparse.c: add note about only VMEMMAP supporting sub-section
support
mm/sparse.c: move subsection_map related codes together
mm/sparse.c: Use __get_free_pages() instead in
populate_section_memmap()

include/linux/mmzone.h | 2 +
mm/sparse.c | 159 +++++++++++++++++++++++++++--------------
2 files changed, 107 insertions(+), 54 deletions(-)

--
2.17.2


2020-03-07 08:43:47

by Baoquan He

Subject: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

In section_deactivate(), pfn_to_page() no longer works after
ms->section_mem_map has been reset to NULL in the SPARSEMEM|!VMEMMAP
case. It caused a hot remove failure:

kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
__remove_pages+0x99/0xc0
arch_remove_memory+0x23/0x4d
try_remove_memory+0xc8/0x130
? walk_memory_blocks+0x72/0xa0
__remove_memory+0xa/0x11
acpi_memory_device_remove+0x72/0x100
acpi_bus_trim+0x55/0x90
acpi_device_hotplug+0x2eb/0x3d0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x1a7/0x370
worker_thread+0x30/0x380
? flush_rcu_work+0x30/0x30
kthread+0x112/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x35/0x40

Let's move the ->section_mem_map reset to after depopulate_section_memmap()
to fix it.

Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <[email protected]>
Cc: [email protected]
---
mm/sparse.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 42c18a38ffaa..1b50c15677d7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
struct mem_section *ms = __pfn_to_section(pfn);
bool section_is_early = early_section(ms);
struct page *memmap = NULL;
+ bool empty = false;
unsigned long *subsection_map = ms->usage
? &ms->usage->subsection_map[0] : NULL;

@@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
* For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
*/
bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
- if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
+ empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
+ if (empty) {
unsigned long section_nr = pfn_to_section_nr(pfn);

/*
@@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
- ms->section_mem_map = (unsigned long)NULL;
}

if (section_is_early && memmap)
free_map_bootmem(memmap);
else
depopulate_section_memmap(pfn, nr_pages, altmap);
+
+ if (empty)
+ ms->section_mem_map = (unsigned long)NULL;
}

static struct page * __meminit section_activate(int nid, unsigned long pfn,
--
2.17.2

2020-03-07 08:44:05

by Baoquan He

Subject: [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case

Currently, to support adding subsection-aligned memory regions for pmem,
a subsection map is added to track which subsections are present.

However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP, so the
subsection map only makes sense when SPARSEMEM_VMEMMAP is enabled. For
classic sparse, the subsection map is meaningless and confusing.

As for why classic sparse doesn't support subsection hotplug, Dan
said it's mainly because the effort and maintenance burden outweigh the
benefit. Besides, all current 64-bit arches enable
SPARSEMEM_VMEMMAP_ENABLE by default.

For the above reasons, there is no need to provide the subsection map and
the relevant handling for classic sparse. Handle it with this patch.

Signed-off-by: Baoquan He <[email protected]>
---
include/linux/mmzone.h | 2 ++
mm/sparse.c | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 42b77d3b68e8..f3f264826423 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1143,7 +1143,9 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
#define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)

struct mem_section_usage {
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
+#endif
/* See declaration of similar field in struct zone */
unsigned long pageblock_flags[0];
};
diff --git a/mm/sparse.c b/mm/sparse.c
index d9dcd58d5c1d..2142045ab5c5 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -209,6 +209,7 @@ static inline unsigned long first_present_section_nr(void)
return next_present_section_nr(-1);
}

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
static void subsection_mask_set(unsigned long *map, unsigned long pfn,
unsigned long nr_pages)
{
@@ -243,6 +244,11 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
nr_pages -= pfns;
}
}
+#else
+void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
+{
+}
+#endif

/* Record a memory area against a node. */
void __init memory_present(int nid, unsigned long start, unsigned long end)
@@ -726,6 +732,7 @@ static void free_map_bootmem(struct page *memmap)
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
{
DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
@@ -753,6 +760,17 @@ static bool is_subsection_map_empty(struct mem_section *ms)
return bitmap_empty(&ms->usage->subsection_map[0],
SUBSECTIONS_PER_SECTION);
}
+#else
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+ return true;
+}
+#endif

static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
struct vmem_altmap *altmap)
@@ -809,6 +827,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
ms->section_mem_map = (unsigned long)NULL;
}

+#ifdef CONFIG_SPARSEMEM_VMEMMAP
static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
{
struct mem_section *ms = __pfn_to_section(pfn);
@@ -830,6 +849,12 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)

return rc;
}
+#else
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ return 0;
+}
+#endif

static struct page * __meminit section_activate(int nid, unsigned long pfn,
unsigned long nr_pages, struct vmem_altmap *altmap)
--
2.17.2

2020-03-07 08:45:02

by Baoquan He

Subject: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()

This removes the unnecessary goto and simplifies the code.

Signed-off-by: Baoquan He <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
---
mm/sparse.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index fde651ab8741..266f7f5040fb 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
struct page * __meminit populate_section_memmap(unsigned long pfn,
unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
{
- struct page *page, *ret;
+ struct page *ret;
unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;

- page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
- if (page)
- goto got_map_page;
+ ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
+ get_order(memmap_size));
+ if (ret)
+ return ret;

ret = vmalloc(memmap_size);
if (ret)
- goto got_map_ptr;
+ return ret;

return NULL;
-got_map_page:
- ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
-got_map_ptr:
-
- return ret;
}

static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
--
2.17.2

2020-03-07 08:45:05

by Baoquan He

Subject: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support

Also note that check_pfn_span() gates the proper alignment and size of
the hot-added memory region.

Also move the code comments from inside section_deactivate() to above
it. The comments apply to the whole function, and moving them makes
the code cleaner.

Signed-off-by: Baoquan He <[email protected]>
---
mm/sparse.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 2142045ab5c5..0fbd79c4ad81 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
}
#endif

+/*
+ * To deactivate a memory region, there are 3 cases to handle across
+ * two configurations (SPARSEMEM_VMEMMAP={y,n}):
+ *
+ * 1. deactivation of a partial hot-added section (only possible in
+ * the SPARSEMEM_VMEMMAP=y case).
+ * a) section was present at memory init.
+ * b) section was hot-added post memory init.
+ * 2. deactivation of a complete hot-added section.
+ * 3. deactivation of a complete section from memory init.
+ *
+ * For 1, when subsection_map does not empty we will not be freeing the
+ * usage map, but still need to free the vmemmap range.
+ *
+ * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
+ */
static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
struct vmem_altmap *altmap)
{
@@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
return;

empty = is_subsection_map_empty(ms);
- /*
- * There are 3 cases to handle across two configurations
- * (SPARSEMEM_VMEMMAP={y,n}):
- *
- * 1/ deactivation of a partial hot-added section (only possible
- * in the SPARSEMEM_VMEMMAP=y case).
- * a/ section was present at memory init
- * b/ section was hot-added post memory init
- * 2/ deactivation of a complete hot-added section
- * 3/ deactivation of a complete section from memory init
- *
- * For 1/, when subsection_map does not empty we will not be
- * freeing the usage map, but still need to free the vmemmap
- * range.
- *
- * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
- */
if (empty) {
unsigned long section_nr = pfn_to_section_nr(pfn);

@@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
*
* This is only intended for hotplug.
*
+ * Note that only VMEMMAP supports sub-section aligned hotplug,
+ * the proper alignment and size are gated by check_pfn_span().
+ *
+ *
* Return:
* * 0 - On success.
* * -EEXIST - Section has been present.
--
2.17.2

2020-03-07 08:45:24

by Baoquan He

Subject: [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together

No functional change.

Signed-off-by: Baoquan He <[email protected]>
---
mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
1 file changed, 65 insertions(+), 69 deletions(-)

diff --git a/mm/sparse.c b/mm/sparse.c
index 0fbd79c4ad81..fde651ab8741 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -244,10 +244,75 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
nr_pages -= pfns;
}
}
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+ DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
+ struct mem_section *ms = __pfn_to_section(pfn);
+ unsigned long *subsection_map = ms->usage
+ ? &ms->usage->subsection_map[0] : NULL;
+
+ subsection_mask_set(map, pfn, nr_pages);
+ if (subsection_map)
+ bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+ if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
+ "section already deactivated (%#lx + %ld)\n",
+ pfn, nr_pages))
+ return -EINVAL;
+
+ bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
+
+ return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+ return bitmap_empty(&ms->usage->subsection_map[0],
+ SUBSECTIONS_PER_SECTION);
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ struct mem_section *ms = __pfn_to_section(pfn);
+ DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
+ unsigned long *subsection_map;
+ int rc = 0;
+
+ subsection_mask_set(map, pfn, nr_pages);
+
+ subsection_map = &ms->usage->subsection_map[0];
+
+ if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
+ rc = -EINVAL;
+ else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
+ rc = -EEXIST;
+ else
+ bitmap_or(subsection_map, map, subsection_map,
+ SUBSECTIONS_PER_SECTION);
+
+ return rc;
+}
#else
void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
{
}
+
+static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ return 0;
+}
+
+static bool is_subsection_map_empty(struct mem_section *ms)
+{
+ return true;
+}
+
+static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
+{
+ return 0;
+}
#endif

/* Record a memory area against a node. */
@@ -732,46 +797,6 @@ static void free_map_bootmem(struct page *memmap)
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
- DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
- DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
- struct mem_section *ms = __pfn_to_section(pfn);
- unsigned long *subsection_map = ms->usage
- ? &ms->usage->subsection_map[0] : NULL;
-
- subsection_mask_set(map, pfn, nr_pages);
- if (subsection_map)
- bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
- if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
- "section already deactivated (%#lx + %ld)\n",
- pfn, nr_pages))
- return -EINVAL;
-
- bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
-
- return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
- return bitmap_empty(&ms->usage->subsection_map[0],
- SUBSECTIONS_PER_SECTION);
-}
-#else
-static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
- return 0;
-}
-
-static bool is_subsection_map_empty(struct mem_section *ms)
-{
- return true;
-}
-#endif
-
/*
* To deactivate a memory region, there are 3 cases to handle across
* two configurations (SPARSEMEM_VMEMMAP={y,n}):
@@ -826,35 +851,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
ms->section_mem_map = (unsigned long)NULL;
}

-#ifdef CONFIG_SPARSEMEM_VMEMMAP
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
- struct mem_section *ms = __pfn_to_section(pfn);
- DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
- unsigned long *subsection_map;
- int rc = 0;
-
- subsection_mask_set(map, pfn, nr_pages);
-
- subsection_map = &ms->usage->subsection_map[0];
-
- if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
- rc = -EINVAL;
- else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
- rc = -EEXIST;
- else
- bitmap_or(subsection_map, map, subsection_map,
- SUBSECTIONS_PER_SECTION);
-
- return rc;
-}
-#else
-static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
-{
- return 0;
-}
-#endif
-
static struct page * __meminit section_activate(int nid, unsigned long pfn,
unsigned long nr_pages, struct vmem_altmap *altmap)
{
--
2.17.2

2020-03-07 11:57:07

by Baoquan He

Subject: Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support

On 03/07/20 at 04:42pm, Baoquan He wrote:

Sorry, the subject should be:

mm/sparse.c: add note about only VMEMMAP supporting sub-section hotplug

> And tell check_pfn_span() gating the porper alignment and size of
> hot added memory region.
>
> And also move the code comments from inside section_deactivate()
> to being above it. The code comments are reasonable for the whole
> function, and the moving makes code cleaner.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> mm/sparse.c | 37 ++++++++++++++++++++-----------------
> 1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2142045ab5c5..0fbd79c4ad81 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
> }
> #endif
>
> +/*
> + * To deactivate a memory region, there are 3 cases to handle across
> + * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> + *
> + * 1. deactivation of a partial hot-added section (only possible in
> + * the SPARSEMEM_VMEMMAP=y case).
> + * a) section was present at memory init.
> + * b) section was hot-added post memory init.
> + * 2. deactivation of a complete hot-added section.
> + * 3. deactivation of a complete section from memory init.
> + *
> + * For 1, when subsection_map does not empty we will not be freeing the
> + * usage map, but still need to free the vmemmap range.
> + *
> + * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> + */
> static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct vmem_altmap *altmap)
> {
> @@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> return;
>
> empty = is_subsection_map_empty(ms);
> - /*
> - * There are 3 cases to handle across two configurations
> - * (SPARSEMEM_VMEMMAP={y,n}):
> - *
> - * 1/ deactivation of a partial hot-added section (only possible
> - * in the SPARSEMEM_VMEMMAP=y case).
> - * a/ section was present at memory init
> - * b/ section was hot-added post memory init
> - * 2/ deactivation of a complete hot-added section
> - * 3/ deactivation of a complete section from memory init
> - *
> - * For 1/, when subsection_map does not empty we will not be
> - * freeing the usage map, but still need to free the vmemmap
> - * range.
> - *
> - * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> - */
> if (empty) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> @@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
> *
> * This is only intended for hotplug.
> *
> + * Note that only VMEMMAP supports sub-section aligned hotplug,
> + * the proper alignment and size are gated by check_pfn_span().
> + *
> + *
> * Return:
> * * 0 - On success.
> * * -EEXIST - Section has been present.
> --
> 2.17.2
>

2020-03-07 21:01:14

by Andrew Morton

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On Sat, 7 Mar 2020 16:42:23 +0800 Baoquan He <[email protected]> wrote:

> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.

Thanks. I think I'll cherrypick this fix and shall await more
review/testing input on the rest of the series.

2020-03-07 22:57:42

by Baoquan He

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On 03/07/20 at 12:59pm, Andrew Morton wrote:
> On Sat, 7 Mar 2020 16:42:23 +0800 Baoquan He <[email protected]> wrote:
>
> > In section_deactivate(), pfn_to_page() doesn't work any more after
> > ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> > It caused hot remove failure:
> >
> > kernel BUG at mm/page_alloc.c:4806!
> > invalid opcode: 0000 [#1] SMP PTI
> > CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> > Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> > RIP: 0010:free_pages+0x85/0xa0
> > Call Trace:
> > __remove_pages+0x99/0xc0
> > arch_remove_memory+0x23/0x4d
> > try_remove_memory+0xc8/0x130
> > ? walk_memory_blocks+0x72/0xa0
> > __remove_memory+0xa/0x11
> > acpi_memory_device_remove+0x72/0x100
> > acpi_bus_trim+0x55/0x90
> > acpi_device_hotplug+0x2eb/0x3d0
> > acpi_hotplug_work_fn+0x1a/0x30
> > process_one_work+0x1a7/0x370
> > worker_thread+0x30/0x380
> > ? flush_rcu_work+0x30/0x30
> > kthread+0x112/0x130
> > ? kthread_create_on_node+0x60/0x60
> > ret_from_fork+0x35/0x40
> >
> > Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> > to fix it.
>
> Thanks. I think I'll cherrypick this fix and shall await more
> review/testing input on the rest of the series.

Sure, thanks.

2020-03-09 08:58:07

by David Hildenbrand

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On 07.03.20 09:42, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <[email protected]>
> Cc: [email protected]
> ---
> mm/sparse.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct mem_section *ms = __pfn_to_section(pfn);
> bool section_is_early = early_section(ms);
> struct page *memmap = NULL;
> + bool empty = false;
> unsigned long *subsection_map = ms->usage
> ? &ms->usage->subsection_map[0] : NULL;
>
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> */
> bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> - if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> + empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> + if (empty) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - ms->section_mem_map = (unsigned long)NULL;
> }
>
> if (section_is_early && memmap)
> free_map_bootmem(memmap);
> else
> depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> + if (empty)
> + ms->section_mem_map = (unsigned long)NULL;
> }
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

2020-03-09 08:59:19

by David Hildenbrand

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On 07.03.20 09:42, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case.
> It caused hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <[email protected]>
> Cc: [email protected]
> ---
> mm/sparse.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct mem_section *ms = __pfn_to_section(pfn);
> bool section_is_early = early_section(ms);
> struct page *memmap = NULL;
> + bool empty = false;

Oh, one NIT: no need to initialize empty to false.


--
Thanks,

David / dhildenb

2020-03-09 09:02:35

by David Hildenbrand

Subject: Re: [PATCH v3 4/7] mm/sparse.c: only use subsection map in VMEMMAP case

On 07.03.20 09:42, Baoquan He wrote:
> Currently, to support subsection aligned memory region adding for pmem,
> subsection map is added to track which subsection is present.
>
> However, config ZONE_DEVICE depends on SPARSEMEM_VMEMMAP. It means
> subsection map only makes sense when SPARSEMEM_VMEMMAP enabled. For the
> classic sparse, subsection map is meaningless and confusing.
>
> About the classic sparse which doesn't support subsection hotplug, Dan
> said it's more because the effort and maintenance burden outweighs the
> benefit. Besides, the current 64 bit ARCHes all enable
> SPARSEMEM_VMEMMAP_ENABLE by default.
>
> Combining the above reasons, no need to provide subsection map and the
> relevant handling for the classic sparse. Handle it with this patch.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> include/linux/mmzone.h | 2 ++
> mm/sparse.c | 25 +++++++++++++++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 42b77d3b68e8..f3f264826423 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1143,7 +1143,9 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
> #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
>
> struct mem_section_usage {
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
> +#endif
> /* See declaration of similar field in struct zone */
> unsigned long pageblock_flags[0];
> };
> diff --git a/mm/sparse.c b/mm/sparse.c
> index d9dcd58d5c1d..2142045ab5c5 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -209,6 +209,7 @@ static inline unsigned long first_present_section_nr(void)
> return next_present_section_nr(-1);
> }
>
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> static void subsection_mask_set(unsigned long *map, unsigned long pfn,
> unsigned long nr_pages)
> {
> @@ -243,6 +244,11 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> nr_pages -= pfns;
> }
> }
> +#else
> +void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> +{
> +}
> +#endif
>
> /* Record a memory area against a node. */
> void __init memory_present(int nid, unsigned long start, unsigned long end)
> @@ -726,6 +732,7 @@ static void free_map_bootmem(struct page *memmap)
> }
> #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> {
> DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> @@ -753,6 +760,17 @@ static bool is_subsection_map_empty(struct mem_section *ms)
> return bitmap_empty(&ms->usage->subsection_map[0],
> SUBSECTIONS_PER_SECTION);
> }
> +#else
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> + return true;
> +}
> +#endif
>
> static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct vmem_altmap *altmap)
> @@ -809,6 +827,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->section_mem_map = (unsigned long)NULL;
> }
>
> +#ifdef CONFIG_SPARSEMEM_VMEMMAP
> static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> {
> struct mem_section *ms = __pfn_to_section(pfn);
> @@ -830,6 +849,12 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
>
> return rc;
> }
> +#else
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + return 0;
> +}
> +#endif
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
> unsigned long nr_pages, struct vmem_altmap *altmap)
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

2020-03-09 09:09:13

by David Hildenbrand

Subject: Re: [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together

On 07.03.20 09:42, Baoquan He wrote:
> No functional change.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
> 1 file changed, 65 insertions(+), 69 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 0fbd79c4ad81..fde651ab8741 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -244,10 +244,75 @@ void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> nr_pages -= pfns;
> }
> }
> +
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> + DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> + struct mem_section *ms = __pfn_to_section(pfn);
> + unsigned long *subsection_map = ms->usage
> + ? &ms->usage->subsection_map[0] : NULL;
> +
> + subsection_mask_set(map, pfn, nr_pages);
> + if (subsection_map)
> + bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> + if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> + "section already deactivated (%#lx + %ld)\n",
> + pfn, nr_pages))
> + return -EINVAL;
> +
> + bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> +
> + return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> + return bitmap_empty(&ms->usage->subsection_map[0],
> + SUBSECTIONS_PER_SECTION);
> +}
> +
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + struct mem_section *ms = __pfn_to_section(pfn);
> + DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> + unsigned long *subsection_map;
> + int rc = 0;
> +
> + subsection_mask_set(map, pfn, nr_pages);
> +
> + subsection_map = &ms->usage->subsection_map[0];
> +
> + if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> + rc = -EINVAL;
> + else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> + rc = -EEXIST;
> + else
> + bitmap_or(subsection_map, map, subsection_map,
> + SUBSECTIONS_PER_SECTION);
> +
> + return rc;
> +}
> #else
> void __init subsection_map_init(unsigned long pfn, unsigned long nr_pages)
> {
> }
> +
> +static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + return 0;
> +}
> +
> +static bool is_subsection_map_empty(struct mem_section *ms)
> +{
> + return true;
> +}
> +
> +static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> +{
> + return 0;
> +}
> #endif
>
> /* Record a memory area against a node. */
> @@ -732,46 +797,6 @@ static void free_map_bootmem(struct page *memmap)
> }
> #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> - DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> - DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> - struct mem_section *ms = __pfn_to_section(pfn);
> - unsigned long *subsection_map = ms->usage
> - ? &ms->usage->subsection_map[0] : NULL;
> -
> - subsection_mask_set(map, pfn, nr_pages);
> - if (subsection_map)
> - bitmap_and(tmp, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -
> - if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION),
> - "section already deactivated (%#lx + %ld)\n",
> - pfn, nr_pages))
> - return -EINVAL;
> -
> - bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> -
> - return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> - return bitmap_empty(&ms->usage->subsection_map[0],
> - SUBSECTIONS_PER_SECTION);
> -}
> -#else
> -static int clear_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> - return 0;
> -}
> -
> -static bool is_subsection_map_empty(struct mem_section *ms)
> -{
> - return true;
> -}
> -#endif
> -
> /*
> * To deactivate a memory region, there are 3 cases to handle across
> * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> @@ -826,35 +851,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->section_mem_map = (unsigned long)NULL;
> }
>
> -#ifdef CONFIG_SPARSEMEM_VMEMMAP
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> - struct mem_section *ms = __pfn_to_section(pfn);
> - DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> - unsigned long *subsection_map;
> - int rc = 0;
> -
> - subsection_mask_set(map, pfn, nr_pages);
> -
> - subsection_map = &ms->usage->subsection_map[0];
> -
> - if (bitmap_empty(map, SUBSECTIONS_PER_SECTION))
> - rc = -EINVAL;
> - else if (bitmap_intersects(map, subsection_map, SUBSECTIONS_PER_SECTION))
> - rc = -EEXIST;
> - else
> - bitmap_or(subsection_map, map, subsection_map,
> - SUBSECTIONS_PER_SECTION);
> -
> - return rc;
> -}
> -#else
> -static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
> -{
> - return 0;
> -}
> -#endif
> -
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
> unsigned long nr_pages, struct vmem_altmap *altmap)
> {
>

IMHO, we don't need this patch - but just my personal opinion. Change
itself looks good on a quick glance.

--
Thanks,

David / dhildenb

2020-03-09 10:13:54

by Pankaj Gupta

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

>
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
> It caused a hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <[email protected]>
> Cc: [email protected]
> ---
> mm/sparse.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct mem_section *ms = __pfn_to_section(pfn);
> bool section_is_early = early_section(ms);
> struct page *memmap = NULL;
> + bool empty = false;
> unsigned long *subsection_map = ms->usage
> ? &ms->usage->subsection_map[0] : NULL;
>
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> */
> bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> - if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> + empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> + if (empty) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - ms->section_mem_map = (unsigned long)NULL;
> }
>
> if (section_is_early && memmap)
> free_map_bootmem(memmap);
> else
> depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> + if (empty)
> + ms->section_mem_map = (unsigned long)NULL;
> }
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
> --

Reviewed-by: Pankaj Gupta <[email protected]>

> 2.17.2
>
>

2020-03-09 12:56:51

by Michal Hocko

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On Sat 07-03-20 16:42:23, Baoquan He wrote:
> In section_deactivate(), pfn_to_page() doesn't work any more after
> ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
> It caused a hot remove failure:
>
> kernel BUG at mm/page_alloc.c:4806!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> RIP: 0010:free_pages+0x85/0xa0
> Call Trace:
> __remove_pages+0x99/0xc0
> arch_remove_memory+0x23/0x4d
> try_remove_memory+0xc8/0x130
> ? walk_memory_blocks+0x72/0xa0
> __remove_memory+0xa/0x11
> acpi_memory_device_remove+0x72/0x100
> acpi_bus_trim+0x55/0x90
> acpi_device_hotplug+0x2eb/0x3d0
> acpi_hotplug_work_fn+0x1a/0x30
> process_one_work+0x1a7/0x370
> worker_thread+0x30/0x380
> ? flush_rcu_work+0x30/0x30
> kthread+0x112/0x130
> ? kthread_create_on_node+0x60/0x60
> ret_from_fork+0x35/0x40
>
> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> to fix it.
>
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Signed-off-by: Baoquan He <[email protected]>
> Cc: [email protected]

Acked-by: Michal Hocko <[email protected]>

> ---
> mm/sparse.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 42c18a38ffaa..1b50c15677d7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct mem_section *ms = __pfn_to_section(pfn);
> bool section_is_early = early_section(ms);
> struct page *memmap = NULL;
> + bool empty = false;
> unsigned long *subsection_map = ms->usage
> ? &ms->usage->subsection_map[0] : NULL;
>
> @@ -764,7 +765,8 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> */
> bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION);
> - if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> + empty = bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION);
> + if (empty) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> /*
> @@ -779,13 +781,15 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> ms->usage = NULL;
> }
> memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
> - ms->section_mem_map = (unsigned long)NULL;
> }
>
> if (section_is_early && memmap)
> free_map_bootmem(memmap);
> else
> depopulate_section_memmap(pfn, nr_pages, altmap);
> +
> + if (empty)
> + ms->section_mem_map = (unsigned long)NULL;
> }
>
> static struct page * __meminit section_activate(int nid, unsigned long pfn,
> --
> 2.17.2
>

--
Michal Hocko
SUSE Labs

2020-03-09 13:19:32

by Baoquan He

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On 03/09/20 at 09:58am, David Hildenbrand wrote:
> On 07.03.20 09:42, Baoquan He wrote:
> > In section_deactivate(), pfn_to_page() doesn't work any more after
> > ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
> > It caused a hot remove failure:
> >
> > kernel BUG at mm/page_alloc.c:4806!
> > invalid opcode: 0000 [#1] SMP PTI
> > CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> > Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> > RIP: 0010:free_pages+0x85/0xa0
> > Call Trace:
> > __remove_pages+0x99/0xc0
> > arch_remove_memory+0x23/0x4d
> > try_remove_memory+0xc8/0x130
> > ? walk_memory_blocks+0x72/0xa0
> > __remove_memory+0xa/0x11
> > acpi_memory_device_remove+0x72/0x100
> > acpi_bus_trim+0x55/0x90
> > acpi_device_hotplug+0x2eb/0x3d0
> > acpi_hotplug_work_fn+0x1a/0x30
> > process_one_work+0x1a7/0x370
> > worker_thread+0x30/0x380
> > ? flush_rcu_work+0x30/0x30
> > kthread+0x112/0x130
> > ? kthread_create_on_node+0x60/0x60
> > ret_from_fork+0x35/0x40
> >
> > Let's move the ->section_mem_map resetting after depopulate_section_memmap()
> > to fix it.
> >
> > Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> > Signed-off-by: Baoquan He <[email protected]>
> > Cc: [email protected]
> > ---
> > mm/sparse.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index 42c18a38ffaa..1b50c15677d7 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> > struct mem_section *ms = __pfn_to_section(pfn);
> > bool section_is_early = early_section(ms);
> > struct page *memmap = NULL;
> > + bool empty = false;
>
> Oh, one NIT: no need to initialize empty to false.

Thanks for the careful review, David.

I am not very sure about this one. Do you have a doc or discussion thread
about not initializing local variables? Maybe Andrew can help update the
patch when merging, if the initialization is discouraged.

2020-03-09 13:23:28

by David Hildenbrand

Subject: Re: [PATCH v3 1/7] mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case

On 09.03.20 14:18, Baoquan He wrote:
> On 03/09/20 at 09:58am, David Hildenbrand wrote:
>> On 07.03.20 09:42, Baoquan He wrote:
>>> In section_deactivate(), pfn_to_page() doesn't work any more after
>>> ms->section_mem_map is reset to NULL in the SPARSEMEM|!VMEMMAP case.
>>> It caused a hot remove failure:
>>>
>>> kernel BUG at mm/page_alloc.c:4806!
>>> invalid opcode: 0000 [#1] SMP PTI
>>> CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>> RIP: 0010:free_pages+0x85/0xa0
>>> Call Trace:
>>> __remove_pages+0x99/0xc0
>>> arch_remove_memory+0x23/0x4d
>>> try_remove_memory+0xc8/0x130
>>> ? walk_memory_blocks+0x72/0xa0
>>> __remove_memory+0xa/0x11
>>> acpi_memory_device_remove+0x72/0x100
>>> acpi_bus_trim+0x55/0x90
>>> acpi_device_hotplug+0x2eb/0x3d0
>>> acpi_hotplug_work_fn+0x1a/0x30
>>> process_one_work+0x1a7/0x370
>>> worker_thread+0x30/0x380
>>> ? flush_rcu_work+0x30/0x30
>>> kthread+0x112/0x130
>>> ? kthread_create_on_node+0x60/0x60
>>> ret_from_fork+0x35/0x40
>>>
>>> Let's move the ->section_mem_map resetting after depopulate_section_memmap()
>>> to fix it.
>>>
>>> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>>> Signed-off-by: Baoquan He <[email protected]>
>>> Cc: [email protected]
>>> ---
>>> mm/sparse.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index 42c18a38ffaa..1b50c15677d7 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -734,6 +734,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>> struct mem_section *ms = __pfn_to_section(pfn);
>>> bool section_is_early = early_section(ms);
>>> struct page *memmap = NULL;
>>> + bool empty = false;
>>
>> Oh, one NIT: no need to initialize empty to false.
>
> Thanks for the careful review, David.
>
> I am not very sure about this one. Do you have a doc or discussion thread
> about not initializing local variables? Maybe Andrew can help update the
> patch when merging, if the initialization is discouraged.

The general rule is to not initialize what will always be initialized
later. Compare with most other code in-tree, e.g., sparse_init_nid.

It usually makes the code easier to follow.

--
Thanks,

David / dhildenb

2020-03-09 13:42:57

by Baoquan He

Subject: Re: [PATCH v3 6/7] mm/sparse.c: move subsection_map related codes together

On 03/09/20 at 10:08am, David Hildenbrand wrote:
> On 07.03.20 09:42, Baoquan He wrote:
> > No functional change.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > mm/sparse.c | 134 +++++++++++++++++++++++++---------------------------
> > 1 file changed, 65 insertions(+), 69 deletions(-)

>
> IMHO, we don't need this patch - but just my personal opinion. Change
> itself looks good on a quick glance.

I personally like seeing the set of functions operating on one data
structure put together. I use vi+ctags+cscope to jump to called functions
easily, and when trying to get a picture of a data structure and its
handling, e.g. here the subsection map and the relevant functions, having
them together makes the code easier to understand. I am also fine with
dropping this patch; no other patch in this series depends on it, so it is
easy to leave out if nobody likes it.

2020-03-10 14:46:54

by Michal Hocko

Subject: Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support

On Sat 07-03-20 16:42:27, Baoquan He wrote:
> And tell check_pfn_span() gating the proper alignment and size of
> the hot-added memory region.
>
> And also move the code comments from inside section_deactivate()
> to being above it. The code comments are reasonable for the whole
> function, and the moving makes code cleaner.
>
> Signed-off-by: Baoquan He <[email protected]>

Acked-by: Michal Hocko <[email protected]>

I have glanced through other patches and they seem sane but I do not
have time to go deeper to give an ack. I like this one though because it
really makes the intention clearer.

> ---
> mm/sparse.c | 37 ++++++++++++++++++++-----------------
> 1 file changed, 20 insertions(+), 17 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 2142045ab5c5..0fbd79c4ad81 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -772,6 +772,22 @@ static bool is_subsection_map_empty(struct mem_section *ms)
> }
> #endif
>
> +/*
> + * To deactivate a memory region, there are 3 cases to handle across
> + * two configurations (SPARSEMEM_VMEMMAP={y,n}):
> + *
> + * 1. deactivation of a partial hot-added section (only possible in
> + * the SPARSEMEM_VMEMMAP=y case).
> + * a) section was present at memory init.
> + * b) section was hot-added post memory init.
> + * 2. deactivation of a complete hot-added section.
> + * 3. deactivation of a complete section from memory init.
> + *
> + * For 1, when subsection_map does not empty we will not be freeing the
> + * usage map, but still need to free the vmemmap range.
> + *
> + * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
> + */
> static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> struct vmem_altmap *altmap)
> {
> @@ -784,23 +800,6 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
> return;
>
> empty = is_subsection_map_empty(ms);
> - /*
> - * There are 3 cases to handle across two configurations
> - * (SPARSEMEM_VMEMMAP={y,n}):
> - *
> - * 1/ deactivation of a partial hot-added section (only possible
> - * in the SPARSEMEM_VMEMMAP=y case).
> - * a/ section was present at memory init
> - * b/ section was hot-added post memory init
> - * 2/ deactivation of a complete hot-added section
> - * 3/ deactivation of a complete section from memory init
> - *
> - * For 1/, when subsection_map does not empty we will not be
> - * freeing the usage map, but still need to free the vmemmap
> - * range.
> - *
> - * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified
> - */
> if (empty) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> @@ -907,6 +906,10 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
> *
> * This is only intended for hotplug.
> *
> + * Note that only VMEMMAP supports sub-section aligned hotplug,
> + * the proper alignment and size are gated by check_pfn_span().
> + *
> + *
> * Return:
> * * 0 - On success.
> * * -EEXIST - Section has been present.
> --
> 2.17.2
>

--
Michal Hocko
SUSE Labs

2020-03-10 14:57:22

by Michal Hocko

Subject: Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()

On Sat 07-03-20 16:42:29, Baoquan He wrote:
> This removes the unnecessary goto and simplifies the code.
>
> Signed-off-by: Baoquan He <[email protected]>
> Reviewed-by: Wei Yang <[email protected]>
> ---
> mm/sparse.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fde651ab8741..266f7f5040fb 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
> struct page * __meminit populate_section_memmap(unsigned long pfn,
> unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
> {
> - struct page *page, *ret;
> + struct page *ret;
> unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
>
> - page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> - if (page)
> - goto got_map_page;
> + ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
> + get_order(memmap_size));
> + if (ret)
> + return ret;
>
> ret = vmalloc(memmap_size);
> if (ret)
> - goto got_map_ptr;
> + return ret;
>
> return NULL;
> -got_map_page:
> - ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> -got_map_ptr:
> -
> - return ret;
> }

Boy this code is ugly. Is there any reason we cannot simply use
kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)

And if we care about locality then go even one step further
kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)

--
Michal Hocko
SUSE Labs

2020-03-10 15:01:01

by David Hildenbrand

Subject: Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()

On 10.03.20 15:56, Michal Hocko wrote:
> On Sat 07-03-20 16:42:29, Baoquan He wrote:
>> This removes the unnecessary goto and simplifies the code.
>>
>> Signed-off-by: Baoquan He <[email protected]>
>> Reviewed-by: Wei Yang <[email protected]>
>> ---
>> mm/sparse.c | 16 ++++++----------
>> 1 file changed, 6 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index fde651ab8741..266f7f5040fb 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
>> struct page * __meminit populate_section_memmap(unsigned long pfn,
>> unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
>> {
>> - struct page *page, *ret;
>> + struct page *ret;
>> unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
>>
>> - page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
>> - if (page)
>> - goto got_map_page;
>> + ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
>> + get_order(memmap_size));
>> + if (ret)
>> + return ret;
>>
>> ret = vmalloc(memmap_size);
>> if (ret)
>> - goto got_map_ptr;
>> + return ret;
>>
>> return NULL;
>> -got_map_page:
>> - ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
>> -got_map_ptr:
>> -
>> - return ret;
>> }
>
> Boy this code is ugly. Is there any reason we cannot simply use
> kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)
>
> And if we care about locality then go even one step further
> kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)
>

Makes perfect sense to me.

--
Thanks,

David / dhildenb

2020-03-11 04:21:38

by Baoquan He

Subject: Re: [PATCH v3 5/7] mm/sparse.c: add note about only VMEMMAP supporting sub-section support

On 03/10/20 at 03:46pm, Michal Hocko wrote:
> On Sat 07-03-20 16:42:27, Baoquan He wrote:
> > And tell check_pfn_span() gating the proper alignment and size of
> > the hot-added memory region.
> >
> > And also move the code comments from inside section_deactivate()
> > to being above it. The code comments are reasonable for the whole
> > function, and the moving makes code cleaner.
> >
> > Signed-off-by: Baoquan He <[email protected]>
>
> Acked-by: Michal Hocko <[email protected]>
>
> I have glanced through other patches and they seem sane but I do not
> have time to go deeper to give an ack. I like this one though because it
> really makes the intention clearer.

Thanks for reviewing and acking this patch.

I will post a new version rebased on top of patch 1 and its appended
fix, then address David's concerns.

2020-03-11 09:33:15

by Baoquan He

Subject: Re: [PATCH v3 7/7] mm/sparse.c: Use __get_free_pages() instead in populate_section_memmap()

On 03/10/20 at 03:56pm, Michal Hocko wrote:
> On Sat 07-03-20 16:42:29, Baoquan He wrote:
> > This removes the unnecessary goto and simplifies the code.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > Reviewed-by: Wei Yang <[email protected]>
> > ---
> > mm/sparse.c | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/sparse.c b/mm/sparse.c
> > index fde651ab8741..266f7f5040fb 100644
> > --- a/mm/sparse.c
> > +++ b/mm/sparse.c
> > @@ -735,23 +735,19 @@ static void free_map_bootmem(struct page *memmap)
> > struct page * __meminit populate_section_memmap(unsigned long pfn,
> > unsigned long nr_pages, int nid, struct vmem_altmap *altmap)
> > {
> > - struct page *page, *ret;
> > + struct page *ret;
> > unsigned long memmap_size = sizeof(struct page) * PAGES_PER_SECTION;
> >
> > - page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
> > - if (page)
> > - goto got_map_page;
> > + ret = (void*)__get_free_pages(GFP_KERNEL|__GFP_NOWARN,
> > + get_order(memmap_size));
> > + if (ret)
> > + return ret;
> >
> > ret = vmalloc(memmap_size);
> > if (ret)
> > - goto got_map_ptr;
> > + return ret;
> >
> > return NULL;
> > -got_map_page:
> > - ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
> > -got_map_ptr:
> > -
> > - return ret;
> > }
>
> Boy this code is ugly. Is there any reason we cannot simply use
> kvmalloc_array(PAGES_PER_SECTION, sizeof(struct page), GFP_KERNEL | __GFP_NOWARN)
>
> And if we care about locality then go even one step further
> kvmalloc_node(PAGES_PER_SECTION * sizeof(struct page), GFP_KERNEL | __GFP_NOWARN, nid)

Yes, this looks better. I will use this to make a new version. Thanks.