2024-02-27 13:53:33

by Baolin Wang

Subject: [PATCH 0/3] make the hugetlb migration strategy consistent

Hi,

As discussed in the previous thread [1], there is an inconsistency when
handling hugetlb migration. When handling the migration of a freed hugetlb,
fallback to other NUMA nodes is prevented in alloc_and_dissolve_hugetlb_folio().
However, when dealing with an in-use hugetlb, fallback to other NUMA nodes is
allowed in alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb
pool and might result in unexpected failures when node-bound workloads don't
get what they assume is available.
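
For reference, a condensed illustration of the two code paths (a hand-written
summary for this cover letter, not a verbatim quote of mm/hugetlb.c):

	/* Replacing a free hugetlb in alloc_and_dissolve_hugetlb_folio():
	 * the replacement folio is pinned to the original node. */
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	/* Migrating an in-use hugetlb via alloc_hugetlb_folio_nodemask():
	 * the caller's gfp_mask is used as-is, so the allocation may fall
	 * back to any node in the nodemask. */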

This patch set tries to make the hugetlb migration strategy clearer and
more consistent. Please find the details in each patch.

[1]
https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/

Changes from RFC:
- Move the gfp modification into alloc_migrate_hugetlb_folio().
- Add more comments.

Baolin Wang (3):
mm: record the migration reason for struct migration_target_control
mm: hugetlb: make the hugetlb migration strategy consistent
docs: hugetlbpage.rst: add hugetlb migration description

Documentation/admin-guide/mm/hugetlbpage.rst | 7 ++++
include/linux/hugetlb.h | 4 +-
mm/gup.c | 1 +
mm/hugetlb.c | 39 ++++++++++++++++++--
mm/internal.h | 1 +
mm/memory-failure.c | 1 +
mm/memory_hotplug.c | 1 +
mm/mempolicy.c | 3 +-
mm/migrate.c | 3 +-
mm/page_alloc.c | 1 +
mm/vmscan.c | 3 +-
11 files changed, 55 insertions(+), 9 deletions(-)

--
2.39.3



2024-02-27 13:53:34

by Baolin Wang

Subject: [PATCH 3/3] docs: hugetlbpage.rst: add hugetlb migration description

Add some description of the hugetlb migration strategy.

Signed-off-by: Baolin Wang <[email protected]>
---
Documentation/admin-guide/mm/hugetlbpage.rst | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index e4d4b4a8dc97..f34a0d798d5b 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -376,6 +376,13 @@ Note that the number of overcommit and reserve pages remain global quantities,
as we don't know until fault time, when the faulting task's mempolicy is
applied, from which node the huge page allocation will be attempted.

+A hugetlb page may be migrated between the per-node hugepage pools in the
+following scenarios: memory offline, memory failure, longterm pinning,
+syscalls (mbind, migrate_pages and move_pages), alloc_contig_range() and
+alloc_contig_pages(). Currently only memory offline, memory failure and the
+syscalls allow fallback to a different node when the current node cannot
+allocate a new hugetlb page; these 3 cases can break the per-node pools.
+
.. _using_huge_pages:

Using Huge Pages
--
2.39.3


2024-02-27 13:53:50

by Baolin Wang

Subject: [PATCH 1/3] mm: record the migration reason for struct migration_target_control

To support different hugetlb allocation strategies during hugetlb migration
based on various migration reasons, recording the migration reason for the
migration_target_control structure as a preparation.

Signed-off-by: Baolin Wang <[email protected]>
---
mm/gup.c | 1 +
mm/internal.h | 1 +
mm/memory-failure.c | 1 +
mm/memory_hotplug.c | 1 +
mm/mempolicy.c | 1 +
mm/migrate.c | 1 +
mm/page_alloc.c | 1 +
mm/vmscan.c | 3 ++-
8 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..959a1a05b059 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2132,6 +2132,7 @@ static int migrate_longterm_unpinnable_pages(
struct migration_target_control mtc = {
.nid = NUMA_NO_NODE,
.gfp_mask = GFP_USER | __GFP_NOWARN,
+ .reason = MR_LONGTERM_PIN,
};

if (migrate_pages(movable_page_list, alloc_migration_target,
diff --git a/mm/internal.h b/mm/internal.h
index 2b7efffbe4d7..47edf69b6ee6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -959,6 +959,7 @@ struct migration_target_control {
int nid; /* preferred node id */
nodemask_t *nmask;
gfp_t gfp_mask;
+ enum migrate_reason reason;
};

/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 9349948f1abf..780bb2aee0af 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2666,6 +2666,7 @@ static int soft_offline_in_use_page(struct page *page)
struct migration_target_control mtc = {
.nid = NUMA_NO_NODE,
.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ .reason = MR_MEMORY_FAILURE,
};

if (!huge && folio_test_large(folio)) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a444e2d7dd2b..b79ba36e09e0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1841,6 +1841,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
struct migration_target_control mtc = {
.nmask = &nmask,
.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ .reason = MR_MEMORY_HOTPLUG,
};
int ret;

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f60b4c99f130..98ceb12e0e17 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1070,6 +1070,7 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
struct migration_target_control mtc = {
.nid = dest,
.gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+ .reason = MR_SYSCALL,
};

nodes_clear(nmask);
diff --git a/mm/migrate.c b/mm/migrate.c
index 73a052a382f1..bde63010a3cf 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2060,6 +2060,7 @@ static int do_move_pages_to_node(struct list_head *pagelist, int node)
struct migration_target_control mtc = {
.nid = node,
.gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+ .reason = MR_SYSCALL,
};

err = migrate_pages(pagelist, alloc_migration_target, NULL,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 96839b210abe..8e6dd3a1028b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6266,6 +6266,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
struct migration_target_control mtc = {
.nid = zone_to_nid(cc->zone),
.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+ .reason = MR_CONTIG_RANGE,
};

lru_cache_disable();
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 402c290fbf5a..510f438bb9e0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -978,7 +978,8 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
__GFP_NOMEMALLOC | GFP_NOWAIT,
.nid = target_nid,
- .nmask = &allowed_mask
+ .nmask = &allowed_mask,
+ .reason = MR_DEMOTION,
};

if (list_empty(demote_folios))
--
2.39.3


2024-02-27 14:22:17

by Baolin Wang

Subject: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent

As discussed in the previous thread [1], there is an inconsistency when
handling hugetlb migration. When handling the migration of a freed hugetlb,
fallback to other NUMA nodes is prevented in alloc_and_dissolve_hugetlb_folio().
However, when dealing with an in-use hugetlb, fallback to other NUMA nodes is
allowed in alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb
pool and might result in unexpected failures when node-bound workloads don't
get what they assume is available.

To make the hugetlb migration strategy clearer, we should list all the
scenarios of hugetlb migration and analyze whether allocation fallback is
permitted:
1) Memory offline: will call dissolve_free_huge_pages() to dissolve the free
hugetlb, and call do_migrate_range() to migrate the in-use hugetlb. Both can
break the per-node hugetlb pool, but as this is an explicit offlining
operation there is no better choice, so hugetlb allocation fallback should be
allowed.
2) Memory failure: same as memory offline. Allowing fallback to a different
node might be the only option to handle it, otherwise the impact of the
poisoned memory can be amplified.
3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to migrate
in-use and not-longterm-pinnable hugetlb, which can break the per-node pool.
But longterm pinning should fail if a new hugetlb cannot be allocated on the
current node, to avoid breaking the per-node pool.
4) Syscalls (mbind, migrate_pages, move_pages): these are explicit user
operations to move pages to other nodes, so fallback to other nodes should
not be prohibited.
5) alloc_contig_range: used by CMA allocation and virtio-mem fake-offline to
allocate a given range of pages. Migration of free hugetlb is currently not
allowed to fall back, so to keep consistency, migration of in-use hugetlb
should not be allowed to fall back either.
6) alloc_contig_pages: used by kfence, pgtable_debug etc. The strategy should
be consistent with that of alloc_contig_range().

Based on the analysis of the various scenarios above, determine whether fallback is
permitted according to the migration reason in alloc_migrate_hugetlb_folio().
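
To make the resulting policy concrete, here is a standalone userspace model
(illustrative only, not kernel code; the reason values are assumed to mirror
enum migrate_reason):

#include <stdbool.h>
#include <stdio.h>

enum migrate_reason {
	MR_COMPACTION,
	MR_MEMORY_FAILURE,
	MR_MEMORY_HOTPLUG,
	MR_SYSCALL,		/* migrate_pages() and move_pages() */
	MR_MEMPOLICY_MBIND,
	MR_NUMA_MISPLACED,
	MR_CONTIG_RANGE,
	MR_LONGTERM_PIN,
	MR_DEMOTION,
};

/* Mirrors the switch added to alloc_migrate_hugetlb_folio() below. */
static bool hugetlb_allow_node_fallback(enum migrate_reason reason)
{
	switch (reason) {
	case MR_MEMORY_HOTPLUG:		/* scenario 1: memory offline */
	case MR_MEMORY_FAILURE:		/* scenario 2: soft offline */
	case MR_SYSCALL:		/* scenario 4: migrate_pages/move_pages */
	case MR_MEMPOLICY_MBIND:	/* scenario 4: mbind */
		return true;
	default:			/* scenarios 3, 5 and 6: no fallback */
		return false;
	}
}

int main(void)
{
	printf("longterm pin may fall back: %d\n",
	       hugetlb_allow_node_fallback(MR_LONGTERM_PIN));
	printf("memory offline may fall back: %d\n",
	       hugetlb_allow_node_fallback(MR_MEMORY_HOTPLUG));
	return 0;
}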

[1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@linux.alibaba.com/
Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/hugetlb.h | 4 ++--
mm/hugetlb.c | 39 +++++++++++++++++++++++++++++++++++----
mm/mempolicy.c | 2 +-
mm/migrate.c | 2 +-
4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 77b30a8c6076..fa122dc509cf 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -747,7 +747,7 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask);
+ nodemask_t *nmask, gfp_t gfp_mask, int reason);
int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
pgoff_t idx);
void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
@@ -1065,7 +1065,7 @@ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,

static inline struct folio *
alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, int reason)
{
return NULL;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 418d66953224..e8eb08bbc688 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2567,13 +2567,38 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
}

static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
- int nid, nodemask_t *nmask)
+ int nid, nodemask_t *nmask, int reason)
{
struct folio *folio;
+ bool allowed_fallback = false;

if (hstate_is_gigantic(h))
return NULL;

+ if (gfp_mask & __GFP_THISNODE)
+ goto alloc_new;
+
+ /*
+ * Note: the memory offline, memory failure and migration syscalls will
+ * be allowed to fall back to other nodes due to lack of a better choice,
+ * which might break the per-node hugetlb pool. Other cases will set
+ * __GFP_THISNODE to avoid breaking the per-node hugetlb pool.
+ */
+ switch (reason) {
+ case MR_MEMORY_HOTPLUG:
+ case MR_MEMORY_FAILURE:
+ case MR_SYSCALL:
+ case MR_MEMPOLICY_MBIND:
+ allowed_fallback = true;
+ break;
+ default:
+ break;
+ }
+
+ if (!allowed_fallback)
+ gfp_mask |= __GFP_THISNODE;
+
+alloc_new:
folio = alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
if (!folio)
return NULL;
@@ -2621,7 +2646,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,

/* folio migration callback function */
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, int reason)
{
spin_lock_irq(&hugetlb_lock);
if (available_huge_pages(h)) {
@@ -2636,7 +2661,7 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
}
spin_unlock_irq(&hugetlb_lock);

- return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
+ return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask, reason);
}

/*
@@ -6653,7 +6678,13 @@ static struct folio *alloc_hugetlb_folio_vma(struct hstate *h,

gfp_mask = htlb_alloc_mask(h);
node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
- folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask);
+ /*
+ * This is used to allocate a temporary hugetlb to hold the copied
+ * content, which will then be copied again to the final hugetlb
+ * consuming a reservation. Set the migrate reason to -1 to indicate
+ * that breaking the per-node hugetlb pool is not allowed in this case.
+ */
+ folio = alloc_hugetlb_folio_nodemask(h, node, nodemask, gfp_mask, -1);
mpol_cond_put(mpol);

return folio;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 98ceb12e0e17..436e817eeaeb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1228,7 +1228,7 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
h = folio_hstate(src);
gfp = htlb_alloc_mask(h);
nodemask = policy_nodemask(gfp, pol, ilx, &nid);
- return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp);
+ return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp, MR_MEMPOLICY_MBIND);
}

if (folio_test_large(src))
diff --git a/mm/migrate.c b/mm/migrate.c
index bde63010a3cf..0c2b70800da3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2022,7 +2022,7 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)

gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
return alloc_hugetlb_folio_nodemask(h, nid,
- mtc->nmask, gfp_mask);
+ mtc->nmask, gfp_mask, mtc->reason);
}

if (folio_test_large(src)) {
--
2.39.3


2024-02-27 15:12:25

by Oscar Salvador

Subject: Re: [PATCH 1/3] mm: record the migration reason for struct migration_target_control

On Tue, Feb 27, 2024 at 09:52:25PM +0800, Baolin Wang wrote:
> To support different hugetlb allocation strategies during hugetlb migration
> based on various migration reasons, recording the migration reason for the
> migration_target_control structure as a preparation.

"... record the migration reason in the migration_target_control
structure..." ?


> Signed-off-by: Baolin Wang <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>


--
Oscar Salvador
SUSE Labs

2024-02-27 15:16:13

by Oscar Salvador

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent

On Tue, Feb 27, 2024 at 09:52:26PM +0800, Baolin Wang wrote:

> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2567,13 +2567,38 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
> }
>
> static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
> - int nid, nodemask_t *nmask)
> + int nid, nodemask_t *nmask, int reason)

I still dislike taking the reason argument this far, and I'd rather have
this as a boolean specifying whether we allow fallback to other nodes.
That would mean parsing the reason in alloc_migration_target().
If we don't add a new helper e.g: gfp_allow_fallback(), we can just do
it right there and open-code it with e.g. a macro etc.

Although doing it in an inline helper might help hiding these details.

That's my take on this, but let's see what others have to say.

--
Oscar Salvador
SUSE Labs

2024-02-28 07:40:46

by Baolin Wang

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent



On 2024/2/27 23:17, Oscar Salvador wrote:
> On Tue, Feb 27, 2024 at 09:52:26PM +0800, Baolin Wang wrote:
>
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2567,13 +2567,38 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
>> }
>>
>> static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
>> - int nid, nodemask_t *nmask)
>> + int nid, nodemask_t *nmask, int reason)
>
> I still dislike taking the reason argument this far, and I'd rather have
> this as a boolean specifying whether we allow fallback to other nodes.
> That would mean parsing the reason in alloc_migration_target().
> If we don't add a new helper e.g: gfp_allow_fallback(), we can just do
> it right there and open-code it with e.g. a macro etc.
>
> Although doing it in an inline helper might help hiding these details.
>
> That's my take on this, but let's see what others have to say.

Sure. I also expressed my preference for hiding these details within the
hugetlb core as much as possible.

Muchun, what do you think? Thanks.

2024-02-28 07:46:41

by Baolin Wang

Subject: Re: [PATCH 1/3] mm: record the migration reason for struct migration_target_control



On 2024/2/27 23:10, Oscar Salvador wrote:
> On Tue, Feb 27, 2024 at 09:52:25PM +0800, Baolin Wang wrote:
>> To support different hugetlb allocation strategies during hugetlb migration
>> based on various migration reasons, recording the migration reason for the
>> migration_target_control structure as a preparation.
>
> "... record the migration reason in the migration_target_control
> structure..." ?

Sure, will do.

>> Signed-off-by: Baolin Wang <[email protected]>
>
> Reviewed-by: Oscar Salvador <[email protected]>

Thanks.

2024-02-28 08:40:03

by Oscar Salvador

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent

On Wed, Feb 28, 2024 at 03:40:08PM +0800, Baolin Wang wrote:
>
>
> On 2024/2/27 23:17, Oscar Salvador wrote:
> > On Tue, Feb 27, 2024 at 09:52:26PM +0800, Baolin Wang wrote:
> >
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -2567,13 +2567,38 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
> > > }
> > > static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
> > > - int nid, nodemask_t *nmask)
> > > + int nid, nodemask_t *nmask, int reason)
> >
> > I still dislike taking the reason argument this far, and I'd rather have
> > this as a boolean specifying whether we allow fallback to other nodes.
> > That would mean parsing the reason in alloc_migration_target().
> > If we don't add a new helper e.g: gfp_allow_fallback(), we can just do
> > it right there and open-code it with e.g. a macro etc.
> >
> > Although doing it in an inline helper might help hiding these details.
> >
> > That's my take on this, but let's see what others have to say.
>
> Sure. I also expressed my preference for hiding these details within the
> hugetlb core as much as possible.
>
> Muchun, what do you think? Thanks.

JFTR: I'm talking about https://lore.kernel.org/linux-mm/[email protected]/
or maybe something cleaner which doesn't need a new helper (we could if
we want though):

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c1ee640d87b1..ddd794e861e6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -73,6 +73,16 @@ struct resv_map {
#endif
};

+#define MIGRATE_MEMORY_HOTPLUG (1UL << MR_MEMORY_HOTPLUG)
+#define MIGRATE_MEMORY_FAILURE (1UL << MR_MEMORY_FAILURE)
+#define MIGRATE_SYSCALL (1UL << MR_SYSCALL)
+#define MIGRATE_MBIND (1UL << MR_MEMPOLICY_MBIND)
+#define HTLB_ALLOW_FALLBACK (MIGRATE_MEMORY_HOTPLUG | \
+ MIGRATE_MEMORY_FAILURE | \
+ MIGRATE_SYSCALL | \
+ MIGRATE_MBIND)
+
+
/*
* Region tracking -- allows tracking of reservations and instantiated pages
* across the pages in a mapping.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1581b670d4..7e8d6b5885d6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2619,7 +2619,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,

/* folio migration callback function */
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
- nodemask_t *nmask, gfp_t gfp_mask)
+ nodemask_t *nmask, gfp_t gfp_mask, bool allow_fallback)
{
spin_lock_irq(&hugetlb_lock);
if (available_huge_pages(h)) {
@@ -2634,6 +2634,12 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
}
spin_unlock_irq(&hugetlb_lock);

+ /*
+ * We cannot fall back to other nodes, as that could break the per-node pool
+ */
+ if (!allow_fallback)
+ gfp_mask |= __GFP_THISNODE;
+
return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
}

diff --git a/mm/migrate.c b/mm/migrate.c
index cc9f2bcd73b4..c1f1d011629d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2016,10 +2016,15 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)

if (folio_test_hugetlb(src)) {
struct hstate *h = folio_hstate(src);
+ bool allow_fallback = false;
+
+ if ((1UL << mtc->reason) & HTLB_ALLOW_FALLBACK)
+ allow_fallback = true;

gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
return alloc_hugetlb_folio_nodemask(h, nid,
- mtc->nmask, gfp_mask);
+ mtc->nmask, gfp_mask,
+ allow_fallback);
}

if (folio_test_large(src)) {

--
Oscar Salvador
SUSE Labs

2024-03-06 08:43:17

by Baolin Wang

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent



On 2024/2/28 16:41, Oscar Salvador wrote:
> On Wed, Feb 28, 2024 at 03:40:08PM +0800, Baolin Wang wrote:
>>
>>
>> On 2024/2/27 23:17, Oscar Salvador wrote:
>>> On Tue, Feb 27, 2024 at 09:52:26PM +0800, Baolin Wang wrote:
>>>
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -2567,13 +2567,38 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
>>>> }
>>>> static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
>>>> - int nid, nodemask_t *nmask)
>>>> + int nid, nodemask_t *nmask, int reason)
>>>
>>> I still dislike taking the reason argument this far, and I'd rather have
>>> this as a boolean specifying whether we allow fallback to other nodes.
>>> That would mean parsing the reason in alloc_migration_target().
>>> If we don't add a new helper e.g: gfp_allow_fallback(), we can just do
>>> it right there and open-code it with e.g. a macro etc.
>>>
>>> Although doing it in an inline helper might help hiding these details.
>>>
>>> That's my take on this, but let's see what others have to say.
>>
>> Sure. I also expressed my preference for hiding these details within the
>> hugetlb core as much as possible.
>>
>> Muchun, what do you think? Thanks.
>
> JFTR: I'm talking about https://lore.kernel.org/linux-mm/[email protected]/
> or maybe something cleaner which doesn't need a new helper (we could if
> we want though):
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index c1ee640d87b1..ddd794e861e6 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -73,6 +73,16 @@ struct resv_map {
> #endif
> };
>
> +#define MIGRATE_MEMORY_HOTPLUG (1UL << MR_MEMORY_HOTPLUG)
> +#define MIGRATE_MEMORY_FAILURE (1UL << MR_MEMORY_FAILURE)
> +#define MIGRATE_SYSCALL (1UL << MR_SYSCALL)
> +#define MIGRATE_MBIND (1UL << MR_MEMPOLICY_MBIND)
> +#define HTLB_ALLOW_FALLBACK (MIGRATE_MEMORY_HOTPLUG | \
> + MIGRATE_MEMORY_FAILURE | \
> + MIGRATE_SYSCALL | \
> + MIGRATE_MBIND)
> +
> +
> /*
> * Region tracking -- allows tracking of reservations and instantiated pages
> * across the pages in a mapping.
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ed1581b670d4..7e8d6b5885d6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2619,7 +2619,7 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
>
> /* folio migration callback function */
> struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> - nodemask_t *nmask, gfp_t gfp_mask)
> + nodemask_t *nmask, gfp_t gfp_mask, bool allow_fallback)
> {
> spin_lock_irq(&hugetlb_lock);
> if (available_huge_pages(h)) {
> @@ -2634,6 +2634,12 @@ struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> }
> spin_unlock_irq(&hugetlb_lock);
>
> + /*
> + * We cannot fall back to other nodes, as that could break the per-node pool
> + */
> + if (!allow_fallback)
> + gfp_mask |= __GFP_THISNODE;
> +
> return alloc_migrate_hugetlb_folio(h, gfp_mask, preferred_nid, nmask);
> }
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index cc9f2bcd73b4..c1f1d011629d 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2016,10 +2016,15 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
>
> if (folio_test_hugetlb(src)) {
> struct hstate *h = folio_hstate(src);
> + bool allow_fallback = false;
> +
> > + if ((1UL << mtc->reason) & HTLB_ALLOW_FALLBACK)
> + allow_fallback = true;

IMHO, users also should not need to be aware of this hugetlb logic.

>
> gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> return alloc_hugetlb_folio_nodemask(h, nid,
> - mtc->nmask, gfp_mask);
> + mtc->nmask, gfp_mask,
> + allow_fallback);

'allow_fallback' can be confusing: it means 'allow fallback' for a new
temporary hugetlb allocation, but not for an available hugetlb
allocation in alloc_hugetlb_folio_nodemask().

2024-03-06 09:08:18

by Oscar Salvador

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent

On Wed, Mar 06, 2024 at 04:35:26PM +0800, Baolin Wang wrote:
> On 2024/2/28 16:41, Oscar Salvador wrote:

> > if (folio_test_hugetlb(src)) {
> > struct hstate *h = folio_hstate(src);
> > + bool allow_fallback = false;
> > +
> > + if ((1UL << mtc->reason) & HTLB_ALLOW_FALLBACK)
> > + allow_fallback = true;
>
> IMHO, users also should not need to be aware of this hugetlb logic.

Note that what I wrote there was ugly, because it was just a PoC.

It could be a helper e.g:

if (hugetlb_reason_allow_alloc_fallback(reason)) (or whatever)
allow_fallback_alloc = true
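
A fuller sketch of such a helper (hypothetical, not from the posted patches;
it reuses the HTLB_ALLOW_FALLBACK mask from the PoC above):

/* Assumes HTLB_ALLOW_FALLBACK and enum migrate_reason are in scope. */
static inline bool hugetlb_reason_allow_alloc_fallback(enum migrate_reason reason)
{
	return !!((1UL << reason) & HTLB_ALLOW_FALLBACK);
}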

> >
> > gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> > return alloc_hugetlb_folio_nodemask(h, nid,
> > - mtc->nmask, gfp_mask);
> > + mtc->nmask, gfp_mask,
> > + allow_fallback);
>
> 'allow_fallback' can be confusing: it means 'allow fallback' for a new
> temporary hugetlb allocation, but not for an available hugetlb
> allocation in alloc_hugetlb_folio_nodemask().

Well, you can pick "allow_fallback_on_alloc" which is more descriptive I
guess.

Bottom line is that I do not think that choosing to allow
falling back or not here is spreading more logic than having the
htlb_modify_alloc_mask() here and not directly in
alloc_hugetlb_folio_nodemask().

As I said, code-wise looks fine, it is just that having to pass
the 'reason' all the way down and making the decision there makes
me go "meh..".


--
Oscar Salvador
SUSE Labs

2024-03-06 09:38:55

by Baolin Wang

Subject: Re: [PATCH 2/3] mm: hugetlb: make the hugetlb migration strategy consistent



On 2024/3/6 16:46, Oscar Salvador wrote:
> On Wed, Mar 06, 2024 at 04:35:26PM +0800, Baolin Wang wrote:
>> On 2024/2/28 16:41, Oscar Salvador wrote:
>
>>> if (folio_test_hugetlb(src)) {
>>> struct hstate *h = folio_hstate(src);
>>> + bool allow_fallback = false;
>>> +
>>> + if ((1UL << mtc->reason) & HTLB_ALLOW_FALLBACK)
>>> + allow_fallback = true;
>>
>> IMHO, users also should not need to be aware of this hugetlb logic.
>
> Note that what I wrote there was ugly, because it was just a PoC.
>
> It could be a helper e.g:
>
> if (hugetlb_reason_allow_alloc_fallback(reason)) (or whatever)
> allow_fallback_alloc = true
>
>>>
>>> gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
>>> return alloc_hugetlb_folio_nodemask(h, nid,
>>> - mtc->nmask, gfp_mask);
>>> + mtc->nmask, gfp_mask,
>>> + allow_fallback);
>>
>> 'allow_fallback' can be confusing: it means 'allow fallback' for a new
>> temporary hugetlb allocation, but not for an available hugetlb
>> allocation in alloc_hugetlb_folio_nodemask().
>
> Well, you can pick "allow_fallback_on_alloc" which is more descriptive I
> guess.

Seems better.

>
> Bottom line is that I do not think that choosing to allow
> falling back or not here is spreading more logic than having the
> htlb_modify_alloc_mask() here and not directly in
> alloc_hugetlb_folio_nodemask().
>
> As I said, code-wise looks fine, it is just that having to pass
> the 'reason' all the way down and making the decision there makes
> me go "meh..".

Well, fair enough :) Let me respin it. Thanks.