2023-07-30 16:27:34

by Usama Arif

Subject: [v2 0/6] mm/memblock: Skip prep and initialization of struct pages freed later by HVO

If the region is for gigantic hugepages and if HVO is enabled, then those
struct pages which will be freed later by HVO don't need to be prepared and
initialized. This can save significant time when a large number of hugepages
are allocated at boot time.

For a 1G hugepage, this series avoids initializing and preparing
262144 - 64 = 262080 struct pages per hugepage.
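
For reference, the arithmetic behind these numbers (assuming x86-64 with 4K
base pages and a 64-byte struct page; shown for illustration only):

        struct pages per 1G hugepage: 1G / 4K                         = 262144
        struct pages kept by HVO:     HUGETLB_VMEMMAP_RESERVE_SIZE /
                                      sizeof(struct page) = 4096 / 64 =     64
        struct pages skipped:                             262144 - 64 = 262080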

When tested on a 512G system (which can allocate a maximum of 500 1G
hugepages), the kexec-boot time to running init, with HVO and
DEFERRED_STRUCT_PAGE_INIT enabled, is 3.9 seconds without this patch series
and 1.2 seconds with it. This represents an approximately 70% reduction in
boot time and will significantly reduce server downtime when using a large
number of gigantic pages.

Thanks,
Usama

[v1->v2]:
- (Mike Rapoport) Code quality improvements (function names, arguments,
comments).

[RFC->v1]:
- (Mike Rapoport) Change from passing hugepage_size in
memblock_alloc_try_nid_raw for skipping struct page initialization to
using MEMBLOCK_RSRV_NOINIT flag



Usama Arif (6):
mm: hugetlb: Skip prep of tail pages when HVO is enabled
mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
memblock: pass memblock_type to memblock_setclr_flag
memblock: introduce MEMBLOCK_RSRV_NOINIT flag
mm: move allocation of gigantic hstates to the start of mm_core_init
mm: hugetlb: Skip initialization of struct pages freed later by HVO

include/linux/memblock.h | 9 +++++
mm/hugetlb.c | 71 +++++++++++++++++++++++++---------------
mm/hugetlb_vmemmap.c | 6 ++--
mm/hugetlb_vmemmap.h | 18 +++++++---
mm/internal.h | 9 +++++
mm/memblock.c | 45 +++++++++++++++++--------
mm/mm_init.c | 6 ++++
7 files changed, 118 insertions(+), 46 deletions(-)

--
2.25.1



2023-07-30 17:28:54

by Usama Arif

Subject: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO

This is done by marking the region for which struct page initialization
should be skipped with the MEMBLOCK_RSRV_NOINIT flag.
If the region is for hugepages and HVO is enabled, then those
struct pages which will be freed later don't need to be initialized.
This can save significant time when a large number of hugepages are
allocated at boot time. The first HUGETLB_VMEMMAP_RESERVE_SIZE /
sizeof(struct page) struct pages of each hugepage still need to be
initialized.
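
A rough sketch (not part of this patch) of the range handed to
memblock_reserved_mark_noinit() for each bootmem hugepage, mirroring the
__alloc_bootmem_huge_page() hunk below; the struct pages covering this
range are the ones whose initialization is skipped:

        phys_addr_t reserve = HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);

        memblock_reserved_mark_noinit(virt_to_phys((void *)m) + reserve,
                                      huge_page_size(h) - reserve);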

Signed-off-by: Usama Arif <[email protected]>
---
mm/hugetlb.c | 21 +++++++++++++++++++++
mm/hugetlb_vmemmap.c | 2 +-
mm/hugetlb_vmemmap.h | 3 +++
3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bf60545496d7..8434100f60ae 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3151,6 +3151,9 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
{
struct huge_bootmem_page *m = NULL; /* initialize for clang */
int nr_nodes, node;
+ phys_addr_t hugetlb_vmemmap_reserve_size =
+ HUGETLB_VMEMMAP_RESERVE_SIZE * sizeof(struct page);
+ phys_addr_t noinit_base;

/* do node specific alloc */
if (nid != NUMA_NO_NODE) {
@@ -3158,6 +3161,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
if (!m)
return 0;
+
+ if (vmemmap_optimize_enabled && hugetlb_vmemmap_optimizable(h)) {
+ noinit_base = virt_to_phys(
+ (void *)((phys_addr_t) m + hugetlb_vmemmap_reserve_size));
+ memblock_reserved_mark_noinit(
+ noinit_base,
+ huge_page_size(h) - hugetlb_vmemmap_reserve_size);
+ }
+
goto found;
}
/* allocate from next node when distributing huge pages */
@@ -3172,6 +3184,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
*/
if (!m)
return 0;
+
+ if (vmemmap_optimize_enabled && hugetlb_vmemmap_optimizable(h)) {
+ noinit_base = virt_to_phys(
+ (void *)((phys_addr_t) m + hugetlb_vmemmap_reserve_size));
+ memblock_reserved_mark_noinit(
+ noinit_base,
+ huge_page_size(h) - hugetlb_vmemmap_reserve_size);
+ }
+
goto found;
}

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index bdf750a4786b..b5b7834e0f42 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -443,7 +443,7 @@ static int vmemmap_remap_alloc(unsigned long start, unsigned long end,
DEFINE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
EXPORT_SYMBOL(hugetlb_optimize_vmemmap_key);

-static bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
+bool vmemmap_optimize_enabled = IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON);
core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);

/**
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 3e7978a9af73..3fff6f611c19 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -64,4 +64,7 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
{
return hugetlb_vmemmap_optimizable_size(h) != 0;
}
+
+extern bool vmemmap_optimize_enabled;
+
#endif /* _LINUX_HUGETLB_VMEMMAP_H */
--
2.25.1


2023-07-30 17:32:32

by Usama Arif

Subject: [v2 3/6] memblock: pass memblock_type to memblock_setclr_flag

This allows setting flags on both memblock types and is in preparation for
setting flags (e.g. to not initialize struct pages) on reserved
memory regions.
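
For example, a later patch in the series can then add a reserved-region
variant along these lines (a sketch only; the actual helper is introduced
by the MEMBLOCK_RSRV_NOINIT patch and may differ in detail):

        int __init_memblock memblock_reserved_mark_noinit(phys_addr_t base,
                                                          phys_addr_t size)
        {
                return memblock_setclr_flag(&memblock.reserved, base, size, 1,
                                            MEMBLOCK_RSRV_NOINIT);
        }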

Signed-off-by: Usama Arif <[email protected]>
---
mm/memblock.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index f9e61e565a53..43cb4404d94c 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -896,10 +896,9 @@ int __init_memblock memblock_physmem_add(phys_addr_t base, phys_addr_t size)
*
* Return: 0 on success, -errno on failure.
*/
-static int __init_memblock memblock_setclr_flag(phys_addr_t base,
- phys_addr_t size, int set, int flag)
+static int __init_memblock memblock_setclr_flag(struct memblock_type *type,
+ phys_addr_t base, phys_addr_t size, int set, int flag)
{
- struct memblock_type *type = &memblock.memory;
int i, ret, start_rgn, end_rgn;

ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn);
@@ -928,7 +927,7 @@ static int __init_memblock memblock_setclr_flag(phys_addr_t base,
*/
int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
{
- return memblock_setclr_flag(base, size, 1, MEMBLOCK_HOTPLUG);
+ return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_HOTPLUG);
}

/**
@@ -940,7 +939,7 @@ int __init_memblock memblock_mark_hotplug(phys_addr_t base, phys_addr_t size)
*/
int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
{
- return memblock_setclr_flag(base, size, 0, MEMBLOCK_HOTPLUG);
+ return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_HOTPLUG);
}

/**
@@ -957,7 +956,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)

system_has_some_mirror = true;

- return memblock_setclr_flag(base, size, 1, MEMBLOCK_MIRROR);
+ return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_MIRROR);
}

/**
@@ -977,7 +976,7 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
*/
int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
{
- return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
+ return memblock_setclr_flag(&memblock.memory, base, size, 1, MEMBLOCK_NOMAP);
}

/**
@@ -989,7 +988,7 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
*/
int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
{
- return memblock_setclr_flag(base, size, 0, MEMBLOCK_NOMAP);
+ return memblock_setclr_flag(&memblock.memory, base, size, 0, MEMBLOCK_NOMAP);
}

static bool should_skip_region(struct memblock_type *type,
--
2.25.1


2023-07-30 17:37:06

by Usama Arif

Subject: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled

When vmemmap is optimizable, it will free all the duplicated tail
pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
Hence, there is no need to prepare them.

For 1G x86 hugepages, it avoids preparing
262144 - 64 = 262080 struct pages per hugepage.

The indirection of using __prep_compound_gigantic_folio is also removed,
as it just creates extra functions to indicate demote which can be done
with the argument.

Signed-off-by: Usama Arif <[email protected]>
---
mm/hugetlb.c | 32 ++++++++++++++------------------
mm/hugetlb_vmemmap.c | 2 +-
mm/hugetlb_vmemmap.h | 15 +++++++++++----
3 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64a3239b6407..541c07b6d60f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
spin_unlock_irq(&hugetlb_lock);
}

-static bool __prep_compound_gigantic_folio(struct folio *folio,
- unsigned int order, bool demote)
+static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
{
int i, j;
+ int order = huge_page_order(h);
int nr_pages = 1 << order;
struct page *p;

__folio_clear_reserved(folio);
+
+ /*
+ * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
+ * Hence, reduce nr_pages to the pages that will be kept.
+ */
+ if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
+ vmemmap_should_optimize(h, &folio->page))
+ nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
+
for (i = 0; i < nr_pages; i++) {
p = folio_page(folio, i);

@@ -2019,18 +2028,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
return false;
}

-static bool prep_compound_gigantic_folio(struct folio *folio,
- unsigned int order)
-{
- return __prep_compound_gigantic_folio(folio, order, false);
-}
-
-static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
- unsigned int order)
-{
- return __prep_compound_gigantic_folio(folio, order, true);
-}
-
/*
* PageHuge() only returns true for hugetlbfs pages, but not for normal or
* transparent huge pages. See the PageTransHuge() documentation for more
@@ -2185,7 +2182,7 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
if (!folio)
return NULL;
if (hstate_is_gigantic(h)) {
- if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+ if (!prep_compound_gigantic_folio(folio, h, false)) {
/*
* Rare failure to convert pages to compound page.
* Free pages and try again - ONCE!
@@ -3201,7 +3198,7 @@ static void __init gather_bootmem_prealloc(void)

VM_BUG_ON(!hstate_is_gigantic(h));
WARN_ON(folio_ref_count(folio) != 1);
- if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
+ if (prep_compound_gigantic_folio(folio, h, false)) {
WARN_ON(folio_test_reserved(folio));
prep_new_hugetlb_folio(h, folio, folio_nid(folio));
free_huge_page(page); /* add to the hugepage allocator */
@@ -3624,8 +3621,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
subpage = folio_page(folio, i);
inner_folio = page_folio(subpage);
if (hstate_is_gigantic(target_hstate))
- prep_compound_gigantic_folio_for_demote(inner_folio,
- target_hstate->order);
+ prep_compound_gigantic_folio(inner_folio, target_hstate, true);
else
prep_compound_page(subpage, target_hstate->order);
folio_change_private(inner_folio, NULL);
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..b721e87de2b3 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -486,7 +486,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
}

/* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
-static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
{
if (!READ_ONCE(vmemmap_optimize_enabled))
return false;
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 25bd0e002431..3e7978a9af73 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -10,16 +10,17 @@
#define _LINUX_HUGETLB_VMEMMAP_H
#include <linux/hugetlb.h>

-#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
-int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
-void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
-
/*
* Reserve one vmemmap page, all vmemmap addresses are mapped to it. See
* Documentation/vm/vmemmap_dedup.rst.
*/
#define HUGETLB_VMEMMAP_RESERVE_SIZE PAGE_SIZE

+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
+void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
+bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
+
static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h)
{
return pages_per_huge_page(h) * sizeof(struct page);
@@ -51,6 +52,12 @@ static inline unsigned int hugetlb_vmemmap_optimizable_size(const struct hstate
{
return 0;
}
+
+static inline bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
+{
+ return false;
+}
+
#endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */

static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
--
2.25.1


2023-07-30 17:50:40

by Usama Arif

Subject: [v2 2/6] mm: hugetlb_vmemmap: Use nid of the head page to reallocate it

If tail page prep and initialization is skipped, then the "start"
page will not contain the correct nid. Use the nid from the first
vmemmap (reuse) page instead.

Signed-off-by: Usama Arif <[email protected]>
---
mm/hugetlb_vmemmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index b721e87de2b3..bdf750a4786b 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -324,7 +324,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
.reuse_addr = reuse,
.vmemmap_pages = &vmemmap_pages,
};
- int nid = page_to_nid((struct page *)start);
+ int nid = page_to_nid((struct page *)reuse);
gfp_t gfp_mask = GFP_KERNEL | __GFP_THISNODE | __GFP_NORETRY |
__GFP_NOWARN;

--
2.25.1


2023-07-30 20:38:56

by kernel test robot

Subject: Re: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO

Hi Usama,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url: https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230730-231750
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230730151606.2871391-7-usama.arif%40bytedance.com
patch subject: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO
config: riscv-randconfig-r014-20230730 (https://download.01.org/0day-ci/archive/20230731/[email protected]/config)
compiler: riscv64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230731/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

riscv64-linux-ld: mm/hugetlb.o: in function `.LVL1074':
>> hugetlb.c:(.init.text+0xf4): undefined reference to `vmemmap_optimize_enabled'
riscv64-linux-ld: mm/hugetlb.o: in function `.LVL1075':
hugetlb.c:(.init.text+0xf8): undefined reference to `vmemmap_optimize_enabled'
riscv64-linux-ld: mm/hugetlb.o: in function `.L0 ':
hugetlb.c:(.init.text+0x16e): undefined reference to `vmemmap_optimize_enabled'
>> riscv64-linux-ld: hugetlb.c:(.init.text+0x172): undefined reference to `vmemmap_optimize_enabled'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-07-30 23:47:59

by Usama Arif

Subject: Re: [v2 0/6] mm/memblock: Skip prep and initialization of struct pages freed later by HVO



On 30/07/2023 16:16, Usama Arif wrote:
> If the region is for gigantic hugepages and if HVO is enabled, then those
> struct pages which will be freed later by HVO don't need to be prepared and
> initialized. This can save significant time when a large number of hugepages
> are allocated at boot time.
>
> For a 1G hugepage, this series avoids initializing and preparing
> 262144 - 64 = 262080 struct pages per hugepage.
>
> When tested on a 512G system (which can allocate a maximum of 500 1G
> hugepages), the kexec-boot time to running init, with HVO and
> DEFERRED_STRUCT_PAGE_INIT enabled, is 3.9 seconds without this patch series
> and 1.2 seconds with it. This represents an approximately 70% reduction in
> boot time and will significantly reduce server downtime when using a large
> number of gigantic pages.
>
> Thanks,
> Usama
>

There were build errors reported by the kernel test robot when
CONFIG_HUGETLBFS/CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is disabled, due to
patches 5 and 6, which should be fixed by the diff below. Will wait for
review and include it in the next revision as it's a trivial diff.

diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 3fff6f611c19..285b59b71203 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -38,6 +38,8 @@ static inline unsigned int hugetlb_vmemmap_optimizable_size(const struct hstate
return 0;
return size > 0 ? size : 0;
}
+
+extern bool vmemmap_optimize_enabled;
#else
static inline int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
{
@@ -58,6 +60,8 @@ static inline bool vmemmap_should_optimize(const struct hstate *h, const struct
return false;
}

+static bool vmemmap_optimize_enabled = false;
+
#endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */

static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
@@ -65,6 +69,4 @@ static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
return hugetlb_vmemmap_optimizable_size(h) != 0;
}

-extern bool vmemmap_optimize_enabled;
-
#endif /* _LINUX_HUGETLB_VMEMMAP_H */
diff --git a/mm/internal.h b/mm/internal.h
index 692bb1136a39..c3321afa36cb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1106,7 +1106,7 @@ struct vma_prepare {
#ifdef CONFIG_HUGETLBFS
void __init hugetlb_hstate_alloc_gigantic_pages(void);
#else
-static inline void __init hugetlb_hstate_alloc_gigantic_pages(void);
+static inline void __init hugetlb_hstate_alloc_gigantic_pages(void)
{
}
#endif /* CONFIG_HUGETLBFS */


> [v1->v2]:
> - (Mike Rapoport) Code quality improvements (function names, arguments,
> comments).
>
> [RFC->v1]:
> - (Mike Rapoport) Change from passing hugepage_size in
> memblock_alloc_try_nid_raw for skipping struct page initialization to
> using MEMBLOCK_RSRV_NOINIT flag
>
>
>
> Usama Arif (6):
> mm: hugetlb: Skip prep of tail pages when HVO is enabled
> mm: hugetlb_vmemmap: Use nid of the head page to reallocate it
> memblock: pass memblock_type to memblock_setclr_flag
> memblock: introduce MEMBLOCK_RSRV_NOINIT flag
> mm: move allocation of gigantic hstates to the start of mm_core_init
> mm: hugetlb: Skip initialization of struct pages freed later by HVO
>
> include/linux/memblock.h | 9 +++++
> mm/hugetlb.c | 71 +++++++++++++++++++++++++---------------
> mm/hugetlb_vmemmap.c | 6 ++--
> mm/hugetlb_vmemmap.h | 18 +++++++---
> mm/internal.h | 9 +++++
> mm/memblock.c | 45 +++++++++++++++++--------
> mm/mm_init.c | 6 ++++
> 7 files changed, 118 insertions(+), 46 deletions(-)
>

2023-07-31 01:32:32

by kernel test robot

Subject: Re: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO

Hi Usama,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url: https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230730-231750
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230730151606.2871391-7-usama.arif%40bytedance.com
patch subject: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20230731/[email protected]/config)
compiler: aarch64-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230731/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

aarch64-linux-ld: mm/hugetlb.o: in function `__alloc_bootmem_huge_page':
hugetlb.c:(.init.text+0x574): undefined reference to `vmemmap_optimize_enabled'
aarch64-linux-ld: mm/hugetlb.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `vmemmap_optimize_enabled' which may bind externally can not be used when making a shared object; recompile with -fPIC
hugetlb.c:(.init.text+0x574): dangerous relocation: unsupported relocation
>> aarch64-linux-ld: hugetlb.c:(.init.text+0x594): undefined reference to `vmemmap_optimize_enabled'
aarch64-linux-ld: hugetlb.c:(.init.text+0x59c): undefined reference to `vmemmap_optimize_enabled'
aarch64-linux-ld: hugetlb.c:(.init.text+0x634): undefined reference to `vmemmap_optimize_enabled'
aarch64-linux-ld: mm/hugetlb.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `vmemmap_optimize_enabled' which may bind externally can not be used when making a shared object; recompile with -fPIC
hugetlb.c:(.init.text+0x634): dangerous relocation: unsupported relocation
aarch64-linux-ld: hugetlb.c:(.init.text+0x650): undefined reference to `vmemmap_optimize_enabled'
aarch64-linux-ld: mm/hugetlb.o:hugetlb.c:(.init.text+0x658): more undefined references to `vmemmap_optimize_enabled' follow

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-07-31 07:21:15

by Oliver Sang

Subject: Re: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO



Hello,

kernel test robot noticed "BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)" on:

commit: 336e2de6db9bad4d9a2d2f6bc8eb1d8beb8312f6 ("[v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO")
url: https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230730-231750
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [v2 6/6] mm: hugetlb: Skip initialization of struct pages freed later by HVO

in testcase: boot

compiler: clang-16
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+--------------------------------------------------------------------------------------+------------+------------+
| | 83e1b781c0 | 336e2de6db |
+--------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 18 | 0 |
| BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#) | 0 | 18 |
+--------------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


trampoline_32bit: 0x000000000009d000

Decompressing Linux... Parsing ELF... done.
Booting the kernel (entry_offset: 0x0000000000000000).
convert early boot stage from reboot-without-warning to failed
BUG: kernel failed in early-boot stage, last printk: Booting the kernel (entry_offset: 0x0000000000000000).
Linux version 6.5.0-rc3-00386-g336e2de6db9b #9
Command line: ip=::::vm-meta-191::dhcp root=/dev/ram0 RESULT_ROOT=/result/boot/1/vm-snb/yocto-x86_64-minimal-20190520.cgz/x86_64-randconfig-x014-20230730/clang-16/336e2de6db9bad4d9a2d2f6bc8eb1d8beb8312f6/5 BOOT_IMAGE=/pkg/linux/x86_64-randconfig-x014-20230730/clang-16/336e2de6db9bad4d9a2d2f6bc8eb1d8beb8312f6/vmlinuz-6.5.0-rc3-00386-g336e2de6db9b branch=linux-review/Usama-Arif/mm-hugetlb-Skip-prep-of-tail-pages-when-HVO-is-enabled/20230730-231750 job=/lkp/jobs/scheduled/vm-meta-191/boot-1-yocto-x86_64-minimal-20190520.cgz-x86_64-randconfig-x014-20230730-336e2de6db9b-20230731-8942-51bxa7-4.yaml user=lkp ARCH=x86_64 kconfig=x86_64-randconfig-x014-20230730 commit=336e2de6db9bad4d9a2d2f6bc8eb1d8beb8312f6 nmi_watchdog=0 vmalloc=256M initramfs_async=0 page_owner=on max_uptime=600 LKP_SERVER=internal-lkp-server selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=240

Kboot worker: lkp-worker27
Elapsed time: 60


To reproduce:

# build kernel
cd linux
cp config-6.5.0-rc3-00386-g336e2de6db9b .config
make HOSTCC=clang-16 CC=clang-16 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=clang-16 CC=clang-16 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachments:
(No filename) (4.12 kB)
config-6.5.0-rc3-00386-g336e2de6db9b (170.90 kB)
job-script (5.02 kB)
dmesg.xz (1.39 kB)

2023-08-01 00:22:12

by Mike Kravetz

Subject: Re: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled

On 07/30/23 16:16, Usama Arif wrote:
> When vmemmap is optimizable, it will free all the duplicated tail
> pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
> Hence, there is no need to prepare them.
>
> For 1G x86 hugepages, it avoids preparing
> 262144 - 64 = 262080 struct pages per hugepage.
>
> The indirection of using __prep_compound_gigantic_folio is also removed,
> as it just creates extra functions to indicate demote which can be done
> with the argument.
>
> Signed-off-by: Usama Arif <[email protected]>
> ---
> mm/hugetlb.c | 32 ++++++++++++++------------------
> mm/hugetlb_vmemmap.c | 2 +-
> mm/hugetlb_vmemmap.h | 15 +++++++++++----
> 3 files changed, 26 insertions(+), 23 deletions(-)

Thanks,

I just started looking at this series. Adding Muchun on Cc:

>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 64a3239b6407..541c07b6d60f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
> spin_unlock_irq(&hugetlb_lock);
> }
>
> -static bool __prep_compound_gigantic_folio(struct folio *folio,
> - unsigned int order, bool demote)
> +static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
> {
> int i, j;
> + int order = huge_page_order(h);
> int nr_pages = 1 << order;
> struct page *p;
>
> __folio_clear_reserved(folio);
> +
> + /*
> + * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
> + * Hence, reduce nr_pages to the pages that will be kept.
> + */
> + if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
> + vmemmap_should_optimize(h, &folio->page))

IIUC, vmemmap_optimize_enabled (checked in vmemmap_should_optimize) can be
modified at runtime via sysctl. If so, what prevents it from being changed
after this check and the later call to hugetlb_vmemmap_optimize()?
--
Mike Kravetz

> + nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
> +
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -2019,18 +2028,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> return false;
> }
>
> -static bool prep_compound_gigantic_folio(struct folio *folio,
> - unsigned int order)
> -{
> - return __prep_compound_gigantic_folio(folio, order, false);
> -}
> -
> -static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
> - unsigned int order)
> -{
> - return __prep_compound_gigantic_folio(folio, order, true);
> -}
> -
> /*
> * PageHuge() only returns true for hugetlbfs pages, but not for normal or
> * transparent huge pages. See the PageTransHuge() documentation for more
> @@ -2185,7 +2182,7 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
> if (!folio)
> return NULL;
> if (hstate_is_gigantic(h)) {
> - if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
> + if (!prep_compound_gigantic_folio(folio, h, false)) {
> /*
> * Rare failure to convert pages to compound page.
> * Free pages and try again - ONCE!
> @@ -3201,7 +3198,7 @@ static void __init gather_bootmem_prealloc(void)
>
> VM_BUG_ON(!hstate_is_gigantic(h));
> WARN_ON(folio_ref_count(folio) != 1);
> - if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
> + if (prep_compound_gigantic_folio(folio, h, false)) {
> WARN_ON(folio_test_reserved(folio));
> prep_new_hugetlb_folio(h, folio, folio_nid(folio));
> free_huge_page(page); /* add to the hugepage allocator */
> @@ -3624,8 +3621,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
> subpage = folio_page(folio, i);
> inner_folio = page_folio(subpage);
> if (hstate_is_gigantic(target_hstate))
> - prep_compound_gigantic_folio_for_demote(inner_folio,
> - target_hstate->order);
> + prep_compound_gigantic_folio(inner_folio, target_hstate, true);
> else
> prep_compound_page(subpage, target_hstate->order);
> folio_change_private(inner_folio, NULL);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c2007ef5e9b0..b721e87de2b3 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -486,7 +486,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> }
>
> /* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
> -static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> +bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> {
> if (!READ_ONCE(vmemmap_optimize_enabled))
> return false;
> diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> index 25bd0e002431..3e7978a9af73 100644
> --- a/mm/hugetlb_vmemmap.h
> +++ b/mm/hugetlb_vmemmap.h
> @@ -10,16 +10,17 @@
> #define _LINUX_HUGETLB_VMEMMAP_H
> #include <linux/hugetlb.h>
>
> -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> -int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
> -void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
> -
> /*
> * Reserve one vmemmap page, all vmemmap addresses are mapped to it. See
> * Documentation/vm/vmemmap_dedup.rst.
> */
> #define HUGETLB_VMEMMAP_RESERVE_SIZE PAGE_SIZE
>
> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> +int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
> +void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
> +bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
> +
> static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h)
> {
> return pages_per_huge_page(h) * sizeof(struct page);
> @@ -51,6 +52,12 @@ static inline unsigned int hugetlb_vmemmap_optimizable_size(const struct hstate
> {
> return 0;
> }
> +
> +static inline bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> +{
> + return false;
> +}
> +
> #endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */
>
> static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)
> --
> 2.25.1
>

2023-08-01 02:37:17

by Muchun Song

[permalink] [raw]
Subject: Re: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled



On 2023/7/30 23:16, Usama Arif wrote:
> When vmemmap is optimizable, it will free all the duplicated tail
> pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
> Hence, there is no need to prepare them.
>
> For 1G x86 hugepages, it avoids preparing
> 262144 - 64 = 262080 struct pages per hugepage.
>
> The indirection of using __prep_compound_gigantic_folio is also removed,
> as it just creates extra functions to indicate demote which can be done
> with the argument.
>
> Signed-off-by: Usama Arif <[email protected]>
> ---
> mm/hugetlb.c | 32 ++++++++++++++------------------
> mm/hugetlb_vmemmap.c | 2 +-
> mm/hugetlb_vmemmap.h | 15 +++++++++++----
> 3 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 64a3239b6407..541c07b6d60f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
> spin_unlock_irq(&hugetlb_lock);
> }
>
> -static bool __prep_compound_gigantic_folio(struct folio *folio,
> - unsigned int order, bool demote)
> +static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
> {
> int i, j;
> + int order = huge_page_order(h);
> int nr_pages = 1 << order;
> struct page *p;
>
> __folio_clear_reserved(folio);
> +
> + /*
> + * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
> + * Hence, reduce nr_pages to the pages that will be kept.
> + */
> + if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
> + vmemmap_should_optimize(h, &folio->page))
> + nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);

We need to initialize the refcount of the tail pages to zero (see the big
comment below in this function). Given that someone (maybe GUP) could get
a ref on the tail pages while the vmemmap is being optimized, what
prevents this from happening?

Thanks.

> +
> for (i = 0; i < nr_pages; i++) {
> p = folio_page(folio, i);
>
> @@ -2019,18 +2028,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> return false;
> }
>
> -static bool prep_compound_gigantic_folio(struct folio *folio,
> - unsigned int order)
> -{
> - return __prep_compound_gigantic_folio(folio, order, false);
> -}
> -
> -static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
> - unsigned int order)
> -{
> - return __prep_compound_gigantic_folio(folio, order, true);
> -}
> -
> /*
> * PageHuge() only returns true for hugetlbfs pages, but not for normal or
> * transparent huge pages. See the PageTransHuge() documentation for more
> @@ -2185,7 +2182,7 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
> if (!folio)
> return NULL;
> if (hstate_is_gigantic(h)) {
> - if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
> + if (!prep_compound_gigantic_folio(folio, h, false)) {
> /*
> * Rare failure to convert pages to compound page.
> * Free pages and try again - ONCE!
> @@ -3201,7 +3198,7 @@ static void __init gather_bootmem_prealloc(void)
>
> VM_BUG_ON(!hstate_is_gigantic(h));
> WARN_ON(folio_ref_count(folio) != 1);
> - if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
> + if (prep_compound_gigantic_folio(folio, h, false)) {
> WARN_ON(folio_test_reserved(folio));
> prep_new_hugetlb_folio(h, folio, folio_nid(folio));
> free_huge_page(page); /* add to the hugepage allocator */
> @@ -3624,8 +3621,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
> subpage = folio_page(folio, i);
> inner_folio = page_folio(subpage);
> if (hstate_is_gigantic(target_hstate))
> - prep_compound_gigantic_folio_for_demote(inner_folio,
> - target_hstate->order);
> + prep_compound_gigantic_folio(inner_folio, target_hstate, true);
> else
> prep_compound_page(subpage, target_hstate->order);
> folio_change_private(inner_folio, NULL);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c2007ef5e9b0..b721e87de2b3 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -486,7 +486,7 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
> }
>
> /* Return true iff a HugeTLB whose vmemmap should and can be optimized. */
> -static bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> +bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> {
> if (!READ_ONCE(vmemmap_optimize_enabled))
> return false;
> diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> index 25bd0e002431..3e7978a9af73 100644
> --- a/mm/hugetlb_vmemmap.h
> +++ b/mm/hugetlb_vmemmap.h
> @@ -10,16 +10,17 @@
> #define _LINUX_HUGETLB_VMEMMAP_H
> #include <linux/hugetlb.h>
>
> -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> -int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
> -void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
> -
> /*
> * Reserve one vmemmap page, all vmemmap addresses are mapped to it. See
> * Documentation/vm/vmemmap_dedup.rst.
> */
> #define HUGETLB_VMEMMAP_RESERVE_SIZE PAGE_SIZE
>
> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> +int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head);
> +void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head);
> +bool vmemmap_should_optimize(const struct hstate *h, const struct page *head);
> +
> static inline unsigned int hugetlb_vmemmap_size(const struct hstate *h)
> {
> return pages_per_huge_page(h) * sizeof(struct page);
> @@ -51,6 +52,12 @@ static inline unsigned int hugetlb_vmemmap_optimizable_size(const struct hstate
> {
> return 0;
> }
> +
> +static inline bool vmemmap_should_optimize(const struct hstate *h, const struct page *head)
> +{
> + return false;
> +}
> +
> #endif /* CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP */
>
> static inline bool hugetlb_vmemmap_optimizable(const struct hstate *h)


2023-08-02 10:09:33

by Usama Arif

Subject: Re: [External] Re: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled



On 01/08/2023 00:18, Mike Kravetz wrote:
> On 07/30/23 16:16, Usama Arif wrote:
>> When vmemmap is optimizable, it will free all the duplicated tail
>> pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
>> Hence, there is no need to prepare them.
>>
>> For 1G x86 hugepages, it avoids preparing
>> 262144 - 64 = 262080 struct pages per hugepage.
>>
>> The indirection of using __prep_compound_gigantic_folio is also removed,
>> as it just creates extra functions to indicate demote which can be done
>> with the argument.
>>
>> Signed-off-by: Usama Arif <[email protected]>
>> ---
>> mm/hugetlb.c | 32 ++++++++++++++------------------
>> mm/hugetlb_vmemmap.c | 2 +-
>> mm/hugetlb_vmemmap.h | 15 +++++++++++----
>> 3 files changed, 26 insertions(+), 23 deletions(-)
>
> Thanks,
>
> I just started looking at this series. Adding Muchun on Cc:
>
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 64a3239b6407..541c07b6d60f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
>> spin_unlock_irq(&hugetlb_lock);
>> }
>>
>> -static bool __prep_compound_gigantic_folio(struct folio *folio,
>> - unsigned int order, bool demote)
>> +static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
>> {
>> int i, j;
>> + int order = huge_page_order(h);
>> int nr_pages = 1 << order;
>> struct page *p;
>>
>> __folio_clear_reserved(folio);
>> +
>> + /*
>> + * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
>> + * Hence, reduce nr_pages to the pages that will be kept.
>> + */
>> + if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
>> + vmemmap_should_optimize(h, &folio->page))
>
> IIUC, vmemmap_optimize_enabled (checked in vmemmap_should_optimize) can be
> modified at runtime via sysctl. If so, what prevents it from being changed
> after this check and the later call to hugetlb_vmemmap_optimize()?

Hi,

Thanks for the review.

Yes, that's a good catch. The solution for this issue would be to turn
hugetlb_free_vmemmap into a callback core_param and have a lock around
the write and around its use in gather_bootmem_prealloc, etc.

But the bigger issue at runtime is what Muchun pointed out: the
struct page refcount is not frozen to 0.

My main use case (and maybe for others as well?) is reserving these
gigantic pages at boot time. I thought the runtime improvement might
come for free with it, but it doesn't look like it. Both issues could be
solved by just limiting the optimization to boot time, as
vmemmap_optimize_enabled cannot be changed during boot, and a reference
to those pages cannot be taken by anything else either (they aren't even
initialized by memblock after patch 6), so I will include the diff below
to solve both issues in the next revision.


diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8434100f60ae..790842a6f978 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1956,7 +1956,8 @@ static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h,
* Hence, reduce nr_pages to the pages that will be kept.
*/
if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
- vmemmap_should_optimize(h, &folio->page))
+ vmemmap_should_optimize(h, &folio->page) &&
+ system_state == SYSTEM_BOOTING)
nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
for (i = 0; i < nr_pages; i++) {

Thanks,
Usama

2023-08-02 10:41:26

by Usama Arif

Subject: Re: [External] Re: [v2 1/6] mm: hugetlb: Skip prep of tail pages when HVO is enabled



On 01/08/2023 03:04, Muchun Song wrote:
>
>
> On 2023/7/30 23:16, Usama Arif wrote:
>> When vmemmap is optimizable, it will free all the duplicated tail
>> pages in hugetlb_vmemmap_optimize while preparing the new hugepage.
>> Hence, there is no need to prepare them.
>>
>> For 1G x86 hugepages, it avoids preparing
>> 262144 - 64 = 262080 struct pages per hugepage.
>>
>> The indirection of using __prep_compound_gigantic_folio is also removed,
>> as it just creates extra functions to indicate demote which can be done
>> with the argument.
>>
>> Signed-off-by: Usama Arif <[email protected]>
>> ---
>>   mm/hugetlb.c         | 32 ++++++++++++++------------------
>>   mm/hugetlb_vmemmap.c |  2 +-
>>   mm/hugetlb_vmemmap.h | 15 +++++++++++----
>>   3 files changed, 26 insertions(+), 23 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 64a3239b6407..541c07b6d60f 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1942,14 +1942,23 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
>>       spin_unlock_irq(&hugetlb_lock);
>>   }
>> -static bool __prep_compound_gigantic_folio(struct folio *folio,
>> -                    unsigned int order, bool demote)
>> +static bool prep_compound_gigantic_folio(struct folio *folio, struct hstate *h, bool demote)
>>   {
>>       int i, j;
>> +    int order = huge_page_order(h);
>>       int nr_pages = 1 << order;
>>       struct page *p;
>>       __folio_clear_reserved(folio);
>> +
>> +    /*
>> +     * No need to prep pages that will be freed later by hugetlb_vmemmap_optimize.
>> +     * Hence, reduce nr_pages to the pages that will be kept.
>> +     */
>> +    if (IS_ENABLED(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP) &&
>> +            vmemmap_should_optimize(h, &folio->page))
>> +        nr_pages = HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page);
>
> We need to initialize the refcount of the tail pages to zero (see the big
> comment below in this function). Given that someone (maybe GUP) could get
> a ref on the tail pages while the vmemmap is being optimized, what
> prevents this from happening?
>
> Thanks.
>

Thanks for pointing this out, will limit this to boot time to solve it
in the next version.