2021-02-08 10:51:01

by Oscar Salvador

Subject: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

Free hugetlb pages are trickier to handle: in order to guarantee that
no userspace application is disrupted, we need to replace the
current free hugepage with a new one.

To do that, a new function called alloc_and_dissolve_huge_page
is introduced.
This function will first try to get a new fresh hugetlb page, and if it
succeeds, it will dissolve the old one.

Signed-off-by: Oscar Salvador <[email protected]>
---
 include/linux/hugetlb.h |  6 ++++++
 mm/compaction.c         | 11 +++++++++++
 mm/hugetlb.c            | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..f81afcb86e89 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,6 +505,7 @@ struct huge_bootmem_page {
         struct hstate *hstate;
 };
 
+bool alloc_and_dissolve_huge_page(struct page *page);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
                         unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
@@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 #else /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
+static inline bool alloc_and_dissolve_huge_page(struct page *page)
+{
+        return false;
+}
+
 static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
                                            unsigned long addr,
                                            int avoid_reserve)
diff --git a/mm/compaction.c b/mm/compaction.c
index 89cd2e60da29..7969ddc10856 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
                                 low_pfn += compound_nr(page) - 1;
                                 goto isolate_success_no_list;
                         }
+                } else {
+                        /*
+                         * Free hugetlb page. Allocate a new one and
+                         * dissolve this one if we succeed.
+                         */
+                        if (alloc_and_dissolve_huge_page(page)) {
+                                unsigned long order = buddy_order_unsafe(page);
+
+                                low_pfn += (1UL << order) - 1;
+                                continue;
+                        }
                 }
                 goto isolate_fail;
         }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 18f6ee317900..79ffbb64c4ee 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2253,6 +2253,41 @@ static void restore_reserve_on_error(struct hstate *h,
         }
 }
 
+bool alloc_and_dissolve_huge_page(struct page *page)
+{
+        NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL);
+        struct page *head;
+        struct hstate *h;
+        bool ret = false;
+        int nid;
+
+        if (!nodes_allowed)
+                return ret;
+
+        spin_lock(&hugetlb_lock);
+        head = compound_head(page);
+        h = page_hstate(head);
+        nid = page_to_nid(head);
+        spin_unlock(&hugetlb_lock);
+
+        init_nodemask_of_node(nodes_allowed, nid);
+
+        /*
+         * Before dissolving the page, we need to allocate a new one,
+         * so the pool remains stable.
+         */
+        if (alloc_pool_huge_page(h, nodes_allowed, NULL)) {
+                /*
+                 * Ok, we have a free hugetlb-page to replace this
+                 * one. Dissolve the old page.
+                 */
+                if (!dissolve_free_huge_page(page))
+                        ret = true;
+        }
+
+        return ret;
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
                              unsigned long addr, int avoid_reserve)
 {
--
2.16.3


2021-02-10 08:46:53

by David Hildenbrand

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On 08.02.21 11:38, Oscar Salvador wrote:
> Free hugetlb pages are trickier to handle: in order to guarantee that
> no userspace application is disrupted, we need to replace the
> current free hugepage with a new one.
>
> To do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugetlb page, and if it
> succeeds, it will dissolve the old one.
>

Thanks for looking into this! Can we move this patch to #1 in the
series? It is the easier case.

I also wonder if we should, on the memory unplug path, try to keep
nr_pages intact by at least trying to allocate a new page if required,
and printing a warning if that fails (after all, we're messing with
something configured by the admin - "nr_pages"). Note that gigantic
pages are special (below).

> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> include/linux/hugetlb.h |  6 ++++++
> mm/compaction.c         | 11 +++++++++++
> mm/hugetlb.c            | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..f81afcb86e89 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
> struct hstate *hstate;
> };
>
> +bool alloc_and_dissolve_huge_page(struct page *page);
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve);
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> #else /* CONFIG_HUGETLB_PAGE */
> struct hstate {};
>
> +static inline bool alloc_and_dissolve_huge_page(struct page *page)
> +{
> + return false;
> +}
> +
> static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr,
> int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 89cd2e60da29..7969ddc10856 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> low_pfn += compound_nr(page) - 1;
> goto isolate_success_no_list;
> }
> + } else {

} else if (alloc_and_dissolve_huge_page(page)) {

...

> + /*
> + * Free hugetlb page. Allocate a new one and
> + * dissolve this one if we succeed.
> + */
> + if (alloc_and_dissolve_huge_page(page)) {
> + unsigned long order = buddy_order_unsafe(page);
> +
> + low_pfn += (1UL << order) - 1;
> + continue;
> + }
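
That is, roughly (untested, same body as in the patch, just folded into
the branch):

                } else if (alloc_and_dissolve_huge_page(page)) {
                        unsigned long order = buddy_order_unsafe(page);

                        /*
                         * The free hugetlb page was replaced with a
                         * fresh one and dissolved into buddy pages;
                         * skip past it.
                         */
                        low_pfn += (1UL << order) - 1;
                        continue;
                }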



Note that there is a very ugly corner case we will have to handle
gracefully (I think also in patch #1):

Assume you allocated a gigantic page (and assume that we are not using
CMA for gigantic pages for simplicity). Assume you want to allocate
another one. alloc_pool_huge_page()->...->alloc_contig_pages() will
stumble over the first allocated page. It will try to
alloc_and_dissolve_huge_page() the existing gigantic page. To do that,
it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on.
Bad.

We really don't want to mess with gigantic pages (migrate, dissolve)
while allocating a gigantic page. I think the easiest (and cleanest) way
forward is to not mess (isolate, migrate, dissolve) with gigantic pages
at all.

Gigantic pages are not movable, so they won't be placed on random CMA /
ZONE_MOVABLE.

Some hstate_is_gigantic(h) calls (maybe inside
alloc_and_dissolve_huge_page() ? ) along with a nice comment might be
good enough to avoid having to pass down some kind of alloc_contig
context. I even think that should be handled inside
alloc_and_dissolve_huge_page() itself.
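
For example, something like this early in alloc_and_dissolve_huge_page(),
right after h has been looked up (just a sketch; freeing nodes_allowed
on that path is left aside):

        /*
         * Never mess with gigantic pages here: allocating a fresh
         * gigantic page goes through alloc_contig_pages() itself,
         * which could end up right back in this function.
         */
        if (hstate_is_gigantic(h))
                return false;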

(the main issue is that in contrast to CMA, plain alloc_contig_pages()
has no memory about which parts were allocated and will simply try
re-allocating what it previously allocated and never freed - which is
usually fine, unless we're dealing with such special cases)

Apart from that, not messing with gigantic pages feels like the right
approach (allocating/migrating gigantic pages is just horribly slow and
most probably not worth it anyway).

--
Thanks,

David / dhildenb

2021-02-10 14:27:57

by Oscar Salvador

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On Wed, Feb 10, 2021 at 09:23:59AM +0100, David Hildenbrand wrote:
> On 08.02.21 11:38, Oscar Salvador wrote:
> > Free hugetlb pages are trickier to handle: in order to guarantee that
> > no userspace application is disrupted, we need to replace the
> > current free hugepage with a new one.
> >
> > To do that, a new function called alloc_and_dissolve_huge_page
> > is introduced.
> > This function will first try to get a new fresh hugetlb page, and if it
> > succeeds, it will dissolve the old one.
> >
>
> Thanks for looking into this! Can we move this patch to #1 in the series? It
> is the easier case.
>
> I also wonder if we should, on the memory unplug path, try to keep
> nr_pages intact by at least trying to allocate a new page if required,
> and printing a warning if that fails (after all, we're messing with
> something configured by the admin - "nr_pages"). Note that gigantic
> pages are special (below).

So, do you mean to allocate a new fresh hugepage in case we have a free
hugetlb page within the range we are trying to offline? That makes some
sense I guess.

I can have a look at that, and make hotplug code use the new
alloc_and_dissolve().

Thanks for bringing this up, it is something I did not think about.

> > + /*
> > + * Free hugetlb page. Allocate a new one and
> > + * dissolve this one if we succeed.
> > + */
> > + if (alloc_and_dissolve_huge_page(page)) {
> > + unsigned long order = buddy_order_unsafe(page);
> > +
> > + low_pfn += (1UL << order) - 1;
> > + continue;
> > + }
>
>
>
> Note that there is a very ugly corner case we will have to handle gracefully
> (I think also in patch #1):
>
> Assume you allocated a gigantic page (and assume that we are not using CMA
> for gigantic pages for simplicity). Assume you want to allocate another one.
> alloc_pool_huge_page()->...->alloc_contig_pages() will stumble over the
> first allocated page. It will try to alloc_and_dissolve_huge_page() the
> existing gigantic page. To do that, it will
> alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. Bad.

Heh, I was too naive. I have to confess I completely forgot about
gigantic pages and this cyclic dependency.

> We really don't want to mess with gigantic pages (migrate, dissolve) while
> allocating a gigantic page. I think the easiest (and cleanest) way forward
> is to not mess (isolate, migrate, dissolve) with gigantic pages at all.
>
> Gigantic pages are not movable, so they won't be placed on random CMA /
> ZONE_MOVABLE.
>
> Some hstate_is_gigantic(h) calls (maybe inside
> alloc_and_dissolve_huge_page() ? ) along with a nice comment might be good
> enough to avoid having to pass down some kind of alloc_contig context. I
> even think that should be handled inside alloc_and_dissolve_huge_page()
> itself.
>
> (the main issue is that in contrast to CMA, plain alloc_contig_pages() has
> no memory about which parts were allocated and will simply try re-allocating
> what it previously allocated and never freed - which is usually fine, unless
> we're dealing with such special cases)
>
> Apart from that, not messing with gigantic pages feels like the right
> approach (allocating/migrating gigantic pages is just horribly slow and most
> probably not worth it anyway).

Yes, I also agree that we should leave out gigantic pages, at least for
now.
We might make it work in the future but I cannot come up with a fancy
way to work around that right now, so it makes sense to cut down the
complexity here.

Thanks David for the insight!

--
Oscar Salvador
SUSE L3

2021-02-10 14:39:12

by David Hildenbrand

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On 10.02.21 15:24, Oscar Salvador wrote:
> On Wed, Feb 10, 2021 at 09:23:59AM +0100, David Hildenbrand wrote:
>> On 08.02.21 11:38, Oscar Salvador wrote:
>>> Free hugetlb pages are trickier to handle: in order to guarantee that
>>> no userspace application is disrupted, we need to replace the
>>> current free hugepage with a new one.
>>>
>>> To do that, a new function called alloc_and_dissolve_huge_page
>>> is introduced.
>>> This function will first try to get a new fresh hugetlb page, and if it
>>> succeeds, it will dissolve the old one.
>>>
>>
>> Thanks for looking into this! Can we move this patch to #1 in the series? It
>> is the easier case.
>>
>> I also wonder if we should, on the memory unplug path, try to keep
>> nr_pages intact by at least trying to allocate a new page if required,
>> and printing a warning if that fails (after all, we're messing with
>> something configured by the admin - "nr_pages"). Note that gigantic
>> pages are special (below).
>
> So, do you mean to allocate a new fresh hugepage in case we have a free
> hugetlb page within the range we are trying to offline? That makes some
> sense I guess.
>
> I can have a look at that, and make hotplug code use the new
> alloc_and_dissolve().

Yes, with the difference that hotplug code most probably wants to
continue even if the allocation failed (printing a warning) - mimicking
existing behavior. For alloc_contig, I'd say, fail if we cannot
"relocate free huge pages that are still required, so as to not modify
nr_pages".

alloc_and_dissolve() should only allocate a page if really required
(e.g., not sure if we could skip the allocation in some cases - like
with surplus pages; needs some investigation), such that the
admin-configured nr_pages stays unchanged.
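
Roughly, I'd imagine the offlining code doing something like this when it
stumbles over a free huge page in the range (wild sketch only; the exact
hook and the checks are made up for illustration):

        /*
         * Sketch: try to keep the admin-configured nr_pages stable by
         * replacing the free huge page before offlining takes it away.
         * In contrast to alloc_contig, only warn on failure and continue.
         */
        if (PageHuge(page) && !page_count(page) &&
            !alloc_and_dissolve_huge_page(page))
                pr_warn("Could not replace free huge page at pfn %lx, nr_hugepages will shrink\n",
                        page_to_pfn(page));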

--
Thanks,

David / dhildenb

2021-02-11 01:22:14

by Mike Kravetz

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On 2/8/21 2:38 AM, Oscar Salvador wrote:
> Free hugetlb pages are trickier to handle: in order to guarantee that
> no userspace application is disrupted, we need to replace the
> current free hugepage with a new one.
>
> To do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugetlb page, and if it
> succeeds, it will dissolve the old one.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> include/linux/hugetlb.h |  6 ++++++
> mm/compaction.c         | 11 +++++++++++
> mm/hugetlb.c            | 35 +++++++++++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index ebca2ef02212..f81afcb86e89 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
> struct hstate *hstate;
> };
>
> +bool alloc_and_dissolve_huge_page(struct page *page);
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve);
> struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -773,6 +774,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
> #else /* CONFIG_HUGETLB_PAGE */
> struct hstate {};
>
> +static inline bool alloc_and_dissolve_huge_page(struct page *page)
> +{
> + return false;
> +}
> +
> static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr,
> int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 89cd2e60da29..7969ddc10856 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> low_pfn += compound_nr(page) - 1;
> goto isolate_success_no_list;
> }
> + } else {
> + /*
> + * Free hugetlb page. Allocate a new one and
> + * dissolve this one if we succeed.
> + */
> + if (alloc_and_dissolve_huge_page(page)) {
> + unsigned long order = buddy_order_unsafe(page);
> +
> + low_pfn += (1UL << order) - 1;
> + continue;
> + }
> }
> goto isolate_fail;
> }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 18f6ee317900..79ffbb64c4ee 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2253,6 +2253,41 @@ static void restore_reserve_on_error(struct hstate *h,
> }
> }
>
> +bool alloc_and_dissolve_huge_page(struct page *page)
> +{
> + NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL);
> + struct page *head;
> + struct hstate *h;
> + bool ret = false;
> + int nid;
> +
> + if (!nodes_allowed)
> + return ret;
> +
> + spin_lock(&hugetlb_lock);
> + head = compound_head(page);
> + h = page_hstate(head);
> + nid = page_to_nid(head);
> + spin_unlock(&hugetlb_lock);
> +
> + init_nodemask_of_node(nodes_allowed, nid);
> +
> + /*
> + * Before dissolving the page, we need to allocate a new one,
> + * so the pool remains stable.
> + */
> + if (alloc_pool_huge_page(h, nodes_allowed, NULL)) {
> + /*
> + * Ok, we have a free hugetlb-page to replace this
> + * one. Dissolve the old page.
> + */
> + if (!dissolve_free_huge_page(page))
> + ret = true;

Should probably check for -EBUSY, as this means someone started using
the page while we were allocating a new one. It would complicate the
code to try and do the 'right thing'. The right thing might be
dissolving the new pool page and then trying to isolate this now in-use
page. Of course things could change again while you are doing that. :(
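
Perhaps something along these lines (completely untested sketch; what to
do with the no longer needed fresh pool page is hand-waved):

        int rc = dissolve_free_huge_page(page);

        if (!rc) {
                ret = true;
        } else if (rc == -EBUSY) {
                /*
                 * Someone started using the old page while we were
                 * allocating the fresh one. We could try to dissolve
                 * the now-surplus fresh page, or just leave it in the
                 * pool and report failure so the caller falls back to
                 * isolating the in-use page.
                 */
        }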

Might be good enough as is. As noted previously, this code is racy.
--
Mike Kravetz

> + }
> +
> + return ret;
> +}
> +
> struct page *alloc_huge_page(struct vm_area_struct *vma,
> unsigned long addr, int avoid_reserve)
> {
>

2021-02-11 21:41:03

by Oscar Salvador

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On Wed, Feb 10, 2021 at 05:16:29PM -0800, Mike Kravetz wrote:
> Should probably check for -EBUSY, as this means someone started using
> the page while we were allocating a new one. It would complicate the
> code to try and do the 'right thing'. The right thing might be
> dissolving the new pool page and then trying to isolate this now in-use
> page. Of course things could change again while you are doing that. :(

Yeah, I kept the error handling rather minimal just to be clear about
the approach I was leaning towards, but yes, we should definitely check
for -EBUSY from dissolve_free_huge_page().

And it might be that dissolve_free_huge_page() returns -EBUSY for the old
page, so we need to dissolve the freshly allocated one as it is not
going to be used - and that might fail as well, due to reserves for
instance, or because someone started using it.
I have to confess that I need to look at the reservation code more
closely to be aware of the corner cases.

We used to try to be clever in such situations in memory-failure code,
but at some point you end up thinking "ok, how many retries are
considered enough?".
That code was trickier as we were handling uncorrected/corrected memory
errors, so we strived to do our best, but here we can be more flexible
as the whole thing is racy per se, and just fail if we find too many
obstacles.

I shall resume work early next week.

Thanks for the tips ;-)

--
Oscar Salvador
SUSE L3

2021-02-25 21:49:27

by Mike Kravetz

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

On 2/10/21 12:23 AM, David Hildenbrand wrote:
> On 08.02.21 11:38, Oscar Salvador wrote:
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> low_pfn += compound_nr(page) - 1;
>> goto isolate_success_no_list;
>> }
>> + } else {
>
>> } else if (alloc_and_dissolve_huge_page(page)) {
>
> ...
>
>> + /*
>> + * Free hugetlb page. Allocate a new one and
>> + * dissolve this one if we succeed.
>> + */
>> + if (alloc_and_dissolve_huge_page(page)) {
>> + unsigned long order = buddy_order_unsafe(page);
>> +
>> + low_pfn += (1UL << order) - 1;
>> + continue;
>> + }
>
>
>
> Note that there is a very ugly corner case we will have to handle gracefully (I think also in patch #1):
>
> Assume you allocated a gigantic page (and assume that we are not using CMA for gigantic pages for simplicity). Assume you want to allocate another one. alloc_pool_huge_page()->...->alloc_contig_pages() will stumble over the first allocated page. It will try to alloc_and_dissolve_huge_page() the existing gigantic page. To do that, it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. Bad.
>

Sorry for resurrecting an old thread.
While looking at V3 of these patches, I was exploring all the calling
sequences looking for races and other issues. It 'may' be that the
concern about infinitely allocating and freeing gigantic pages is not
actually an issue. Of course, I could be mistaken. Here is my reasoning:

alloc_and_dissolve_huge_page (now isolate_or_dissolve_huge_page) will be
called from __alloc_contig_migrate_range() within alloc_contig_range().
Before calling __alloc_contig_migrate_range, we call start_isolate_page_range
to isolate all page blocks in the range. Because all the page blocks in
the range are isolated, another invocation of alloc_contig_range will
not operate on any part of that range. See the comments for
start_isolate_page_range or commit 2c7452a075d4. So, when
alloc_pool_huge_page() goes to allocate another gigantic page, it will
never notice/operate on the existing gigantic page.

Again, this is confusing and I might be missing something.

In any case, I agree that gigantic pages are tricky and we should leave
them out of the discussion for now. We can rethink this later if
necessary.
--
Mike Kravetz

2021-02-25 22:24:24

by David Hildenbrand

Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle free hugetlb pages


> On 25.02.2021 at 22:43, Mike Kravetz <[email protected]> wrote:
>
> On 2/10/21 12:23 AM, David Hildenbrand wrote:
>>> On 08.02.21 11:38, Oscar Salvador wrote:
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -952,6 +952,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>> low_pfn += compound_nr(page) - 1;
>>> goto isolate_success_no_list;
>>> }
>>> + } else {
>>
>> } else if (alloc_and_dissolve_huge_page(page)) {
>>
>> ...
>>
>>> + /*
>>> + * Free hugetlb page. Allocate a new one and
>>> + * dissolve this one if we succeed.
>>> + */
>>> + if (alloc_and_dissolve_huge_page(page)) {
>>> + unsigned long order = buddy_order_unsafe(page);
>>> +
>>> + low_pfn += (1UL << order) - 1;
>>> + continue;
>>> + }
>>
>>
>>
>> Note that there is a very ugly corner case we will have to handle gracefully (I think also in patch #1):
>>
>> Assume you allocated a gigantic page (and assume that we are not using CMA for gigantic pages for simplicity). Assume you want to allocate another one. alloc_pool_huge_page()->...->alloc_contig_pages() will stumble over the first allocated page. It will try to alloc_and_dissolve_huge_page() the existing gigantic page. To do that, it will alloc_pool_huge_page()->...->alloc_contig_pages() ... and so on. Bad.
>>
>
> Sorry for resurrecting an old thread.
> While looking at V3 of these patches, I was exploring all the calling
> sequences looking for races and other issues. It 'may' be that the
> concern about infinitely allocating and freeing gigantic pages is not
> actually an issue. Of course, I could be mistaken. Here is my reasoning:
>
> alloc_and_dissolve_huge_page (now isolate_or_dissolve_huge_page) will be
> called from __alloc_contig_migrate_range() within alloc_contig_range().
> Before calling __alloc_contig_migrate_range, we call start_isolate_page_range
> to isolate all page blocks in the range. Because all the page blocks in
> the range are isolated, another invocation of alloc_contig_range will
> not operate on any part of that range. See the comments for
> start_isolate_page_range or commit 2c7452a075d4. So, when
> alloc_pool_huge_page() goes to allocate another gigantic page, it will
> never notice/operate on the existing gigantic page.
>
> Again, this is confusing and I might be missing something.

I think you are right that the endless loop is blocked. But I think the whole thing could cascade once we have multiple gigantic pages allocated.

Try allocating a new gpage. We find an existing gpage, isolate it and try to migrate it. To do that, we try allocating a new gpage. We find yet another existing gpage, isolate and try to migrate it ... until we have isolated all gpages on our way to an actual usable area. Then we have to actually migrate all of these in reverse order ...

Of course this only works if we can actually isolate a gigantic page - which should be the case I think (they are migratable and should be marked as movable).

>
> In any case, I agree that gigantic pages are tricky and we should leave
> them out of the discussion for now. We can rethink this later if
> necessary.

Yes, it's tricky and not strictly required right now because we never place them on ZONE_MOVABLE. And as I said, actual use cases might be rare.