2024-04-26 17:16:15

by Sidhartha Kumar

[permalink] [raw]
Subject: [PATCH] mm/memory-failure: remove shake_page()

Use a folio in get_any_page() to save 5 calls to compound head and
convert the last user of shake_page() to shake_folio(). This allows us
to remove the shake_page() definition.

Signed-off-by: Sidhartha Kumar <[email protected]>
---
mm/memory-failure.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 16ada4fb02b79..273f6fef29f25 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -385,11 +385,6 @@ void shake_folio(struct folio *folio)
}
EXPORT_SYMBOL_GPL(shake_folio);

-static void shake_page(struct page *page)
-{
- shake_folio(page_folio(page));
-}
-
static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
unsigned long address)
{
@@ -1433,6 +1428,7 @@ static int get_any_page(struct page *p, unsigned long flags)
{
int ret = 0, pass = 0;
bool count_increased = false;
+ struct folio *folio = page_folio(p);

if (flags & MF_COUNT_INCREASED)
count_increased = true;
@@ -1446,7 +1442,7 @@ static int get_any_page(struct page *p, unsigned long flags)
if (pass++ < 3)
goto try_again;
ret = -EBUSY;
- } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
+ } else if (!folio_test_hugetlb(folio) && !is_free_buddy_page(p)) {
/* We raced with put_page, retry. */
if (pass++ < 3)
goto try_again;
@@ -1459,7 +1455,7 @@ static int get_any_page(struct page *p, unsigned long flags)
* page, retry.
*/
if (pass++ < 3) {
- shake_page(p);
+ shake_folio(folio);
goto try_again;
}
ret = -EIO;
@@ -1467,7 +1463,7 @@ static int get_any_page(struct page *p, unsigned long flags)
}
}

- if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
+ if (folio_test_hugetlb(folio) || HWPoisonHandlable(p, flags)) {
ret = 1;
} else {
/*
@@ -1475,12 +1471,12 @@ static int get_any_page(struct page *p, unsigned long flags)
* it into something we can handle.
*/
if (pass++ < 3) {
- put_page(p);
- shake_page(p);
+ folio_put(folio);
+ shake_folio(folio);
count_increased = false;
goto try_again;
}
- put_page(p);
+ folio_put(folio);
ret = -EIO;
}
out:
@@ -1643,7 +1639,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,

/*
* try_to_unmap() might put mlocked page in lru cache, so call
- * shake_page() again to ensure that it's flushed.
+ * shake_folio() again to ensure that it's flushed.
*/
if (mlocked)
shake_folio(folio);
--
2.44.0



2024-04-26 17:59:18

by Sidhartha Kumar

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On 4/26/24 10:34 AM, Matthew Wilcox wrote:
> On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
>> Use a folio in get_any_page() to save 5 calls to compound head and
>> convert the last user of shake_page() to shake_folio(). This allows us
>> to remove the shake_page() definition.
>
> So I didn't do this before because I wasn't convinced it was safe.
> We don't have a refcount on the folio, so the page might no longer
> be part of this folio by the time we get the refcount on the folio.
>
> I'd really like to see some argumentation for why this is safe.

If I moved down the folio = page_folio() line to after we verify
__get_hwpoison_page() has returned 1, which indicates the reference count was
successfully incremented via foliO_try_get(), that means the folio conversion
would happen after we have a refcount. In the case we don't call
__get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This means
the page has existing users so that path would be safe as well. So I think this
is safe after moving page_folio() after __get_hwpoison_page().

Does that seem correct?

Thanks,
Sid




>
>> Signed-off-by: Sidhartha Kumar <[email protected]>
>> ---
>> mm/memory-failure.c | 20 ++++++++------------
>> 1 file changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 16ada4fb02b79..273f6fef29f25 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -385,11 +385,6 @@ void shake_folio(struct folio *folio)
>> }
>> EXPORT_SYMBOL_GPL(shake_folio);
>>
>> -static void shake_page(struct page *page)
>> -{
>> - shake_folio(page_folio(page));
>> -}
>> -
>> static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
>> unsigned long address)
>> {
>> @@ -1433,6 +1428,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>> {
>> int ret = 0, pass = 0;
>> bool count_increased = false;
>> + struct folio *folio = page_folio(p);
>>
>> if (flags & MF_COUNT_INCREASED)
>> count_increased = true;
>> @@ -1446,7 +1442,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>> if (pass++ < 3)
>> goto try_again;
>> ret = -EBUSY;
>> - } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
>> + } else if (!folio_test_hugetlb(folio) && !is_free_buddy_page(p)) {
>> /* We raced with put_page, retry. */
>> if (pass++ < 3)
>> goto try_again;
>> @@ -1459,7 +1455,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>> * page, retry.
>> */
>> if (pass++ < 3) {
>> - shake_page(p);
>> + shake_folio(folio);
>> goto try_again;
>> }
>> ret = -EIO;
>> @@ -1467,7 +1463,7 @@ static int get_any_page(struct page *p, unsigned long flags)
>> }
>> }
>>
>> - if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
>> + if (folio_test_hugetlb(folio) || HWPoisonHandlable(p, flags)) {
>> ret = 1;
>> } else {
>> /*
>> @@ -1475,12 +1471,12 @@ static int get_any_page(struct page *p, unsigned long flags)
>> * it into something we can handle.
>> */
>> if (pass++ < 3) {
>> - put_page(p);
>> - shake_page(p);
>> + folio_put(folio);
>> + shake_folio(folio);
>> count_increased = false;
>> goto try_again;
>> }
>> - put_page(p);
>> + folio_put(folio);
>> ret = -EIO;
>> }
>> out:
>> @@ -1643,7 +1639,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>>
>> /*
>> * try_to_unmap() might put mlocked page in lru cache, so call
>> - * shake_page() again to ensure that it's flushed.
>> + * shake_folio() again to ensure that it's flushed.
>> */
>> if (mlocked)
>> shake_folio(folio);
>> --
>> 2.44.0
>>
>


2024-04-26 18:27:36

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
> On 4/26/24 10:34 AM, Matthew Wilcox wrote:
> > On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
> > > Use a folio in get_any_page() to save 5 calls to compound head and
> > > convert the last user of shake_page() to shake_folio(). This allows us
> > > to remove the shake_page() definition.
> >
> > So I didn't do this before because I wasn't convinced it was safe.
> > We don't have a refcount on the folio, so the page might no longer
> > be part of this folio by the time we get the refcount on the folio.
> >
> > I'd really like to see some argumentation for why this is safe.
>
> If I moved down the folio = page_folio() line to after we verify
> __get_hwpoison_page() has returned 1, which indicates the reference count
> was successfully incremented via foliO_try_get(), that means the folio
> conversion would happen after we have a refcount. In the case we don't call
> __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
> means the page has existing users so that path would be safe as well. So I
> think this is safe after moving page_folio() after __get_hwpoison_page().

See if you can find a hole in this chain of reasoning ...

memory_failure()
p = pfn_to_online_page(pfn);
res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
(not a hugetlb)
if (TestSetPageHWPoison(p)) {
(not already poisoned)
if (!(flags & MF_COUNT_INCREASED)) {
res = get_hwpoison_page(p, flags);

get_hwpoison_page()
ret = get_any_page(p, flags);

get_any_page()
folio = page_folio(page)

Because we don't have a reference on the folio at this point (how could
we?), the folio might be split, and now we have a pointer to a folio
which no longer contains the page (assuming we had a hwerror in what
was a tail page at this time).

2024-04-26 18:37:12

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
> Use a folio in get_any_page() to save 5 calls to compound head and
> convert the last user of shake_page() to shake_folio(). This allows us
> to remove the shake_page() definition.

So I didn't do this before because I wasn't convinced it was safe.
We don't have a refcount on the folio, so the page might no longer
be part of this folio by the time we get the refcount on the folio.

I'd really like to see some argumentation for why this is safe.

> Signed-off-by: Sidhartha Kumar <[email protected]>
> ---
> mm/memory-failure.c | 20 ++++++++------------
> 1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 16ada4fb02b79..273f6fef29f25 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -385,11 +385,6 @@ void shake_folio(struct folio *folio)
> }
> EXPORT_SYMBOL_GPL(shake_folio);
>
> -static void shake_page(struct page *page)
> -{
> - shake_folio(page_folio(page));
> -}
> -
> static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
> unsigned long address)
> {
> @@ -1433,6 +1428,7 @@ static int get_any_page(struct page *p, unsigned long flags)
> {
> int ret = 0, pass = 0;
> bool count_increased = false;
> + struct folio *folio = page_folio(p);
>
> if (flags & MF_COUNT_INCREASED)
> count_increased = true;
> @@ -1446,7 +1442,7 @@ static int get_any_page(struct page *p, unsigned long flags)
> if (pass++ < 3)
> goto try_again;
> ret = -EBUSY;
> - } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> + } else if (!folio_test_hugetlb(folio) && !is_free_buddy_page(p)) {
> /* We raced with put_page, retry. */
> if (pass++ < 3)
> goto try_again;
> @@ -1459,7 +1455,7 @@ static int get_any_page(struct page *p, unsigned long flags)
> * page, retry.
> */
> if (pass++ < 3) {
> - shake_page(p);
> + shake_folio(folio);
> goto try_again;
> }
> ret = -EIO;
> @@ -1467,7 +1463,7 @@ static int get_any_page(struct page *p, unsigned long flags)
> }
> }
>
> - if (PageHuge(p) || HWPoisonHandlable(p, flags)) {
> + if (folio_test_hugetlb(folio) || HWPoisonHandlable(p, flags)) {
> ret = 1;
> } else {
> /*
> @@ -1475,12 +1471,12 @@ static int get_any_page(struct page *p, unsigned long flags)
> * it into something we can handle.
> */
> if (pass++ < 3) {
> - put_page(p);
> - shake_page(p);
> + folio_put(folio);
> + shake_folio(folio);
> count_increased = false;
> goto try_again;
> }
> - put_page(p);
> + folio_put(folio);
> ret = -EIO;
> }
> out:
> @@ -1643,7 +1639,7 @@ static bool hwpoison_user_mappings(struct folio *folio, struct page *p,
>
> /*
> * try_to_unmap() might put mlocked page in lru cache, so call
> - * shake_page() again to ensure that it's flushed.
> + * shake_folio() again to ensure that it's flushed.
> */
> if (mlocked)
> shake_folio(folio);
> --
> 2.44.0
>

2024-04-26 18:53:53

by Sidhartha Kumar

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On 4/26/24 11:27 AM, Matthew Wilcox wrote:
> On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
>> On 4/26/24 10:34 AM, Matthew Wilcox wrote:
>>> On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
>>>> Use a folio in get_any_page() to save 5 calls to compound head and
>>>> convert the last user of shake_page() to shake_folio(). This allows us
>>>> to remove the shake_page() definition.
>>>
>>> So I didn't do this before because I wasn't convinced it was safe.
>>> We don't have a refcount on the folio, so the page might no longer
>>> be part of this folio by the time we get the refcount on the folio.
>>>
>>> I'd really like to see some argumentation for why this is safe.
>>
>> If I moved down the folio = page_folio() line to after we verify
>> __get_hwpoison_page() has returned 1, which indicates the reference count
>> was successfully incremented via foliO_try_get(), that means the folio
>> conversion would happen after we have a refcount. In the case we don't call
>> __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
>> means the page has existing users so that path would be safe as well. So I
>> think this is safe after moving page_folio() after __get_hwpoison_page().
>
> See if you can find a hole in this chain of reasoning ...
>
> memory_failure()
> p = pfn_to_online_page(pfn);
> res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
> (not a hugetlb)
> if (TestSetPageHWPoison(p)) {
> (not already poisoned)
> if (!(flags & MF_COUNT_INCREASED)) {
> res = get_hwpoison_page(p, flags);
>
> get_hwpoison_page()
> ret = get_any_page(p, flags);
>
> get_any_page()
> folio = page_folio(page)

That would be unsafe, the safe way would be if we moved page_folio() after the
call to __get_hw_poison() in get_any_page() and there would still be one
remaining user of shake_page() that we can't convert. A safe version of this
patch would result in a removal of one use of PageHuge() and two uses of
put_page(), would that be worth submitting?

get_any_page()
if(__get_hwpoison_page())
folio = page_folio() /* folio_try_get() returned 1, safe */

>
> Because we don't have a reference on the folio at this point (how could
> we?), the folio might be split, and now we have a pointer to a folio
> which no longer contains the page (assuming we had a hwerror in what
> was a tail page at this time).


2024-04-26 19:05:32

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On Fri, Apr 26, 2024 at 11:53:01AM -0700, Sidhartha Kumar wrote:
> On 4/26/24 11:27 AM, Matthew Wilcox wrote:
> > On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
> > > On 4/26/24 10:34 AM, Matthew Wilcox wrote:
> > > > On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
> > > > > Use a folio in get_any_page() to save 5 calls to compound head and
> > > > > convert the last user of shake_page() to shake_folio(). This allows us
> > > > > to remove the shake_page() definition.
> > > >
> > > > So I didn't do this before because I wasn't convinced it was safe.
> > > > We don't have a refcount on the folio, so the page might no longer
> > > > be part of this folio by the time we get the refcount on the folio.
> > > >
> > > > I'd really like to see some argumentation for why this is safe.
> > >
> > > If I moved down the folio = page_folio() line to after we verify
> > > __get_hwpoison_page() has returned 1, which indicates the reference count
> > > was successfully incremented via foliO_try_get(), that means the folio
> > > conversion would happen after we have a refcount. In the case we don't call
> > > __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
> > > means the page has existing users so that path would be safe as well. So I
> > > think this is safe after moving page_folio() after __get_hwpoison_page().
> >
> > See if you can find a hole in this chain of reasoning ...
> >
> > memory_failure()
> > p = pfn_to_online_page(pfn);
> > res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
> > (not a hugetlb)
> > if (TestSetPageHWPoison(p)) {
> > (not already poisoned)
> > if (!(flags & MF_COUNT_INCREASED)) {
> > res = get_hwpoison_page(p, flags);
> >
> > get_hwpoison_page()
> > ret = get_any_page(p, flags);
> >
> > get_any_page()
> > folio = page_folio(page)
>
> That would be unsafe, the safe way would be if we moved page_folio() after
> the call to __get_hw_poison() in get_any_page() and there would still be one
> remaining user of shake_page() that we can't convert. A safe version of this
> patch would result in a removal of one use of PageHuge() and two uses of
> put_page(), would that be worth submitting?
>
> get_any_page()
> if(__get_hwpoison_page())
> folio = page_folio() /* folio_try_get() returned 1, safe */

I think we should convert __get_hwpoison_page() to return either the folio
or an ERR_PTR or NULL. Also, I think we should delete the "cannot catch
tail" part and just loop in __get_hwpoison_page() until we do catch it.
See try_get_folio() in mm/gup.c for inspiration (although you can't use
it exactly because that code knows that the page is mapped into a page
table, so has a refcount).

But that's just an immediate assessment; you might find a reason that
doesn't work.

2024-04-26 19:54:59

by Jane Chu

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

On 4/26/2024 12:05 PM, Matthew Wilcox wrote:

> On Fri, Apr 26, 2024 at 11:53:01AM -0700, Sidhartha Kumar wrote:
>> On 4/26/24 11:27 AM, Matthew Wilcox wrote:
>>> On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
>>>> On 4/26/24 10:34 AM, Matthew Wilcox wrote:
>>>>> On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
>>>>>> Use a folio in get_any_page() to save 5 calls to compound head and
>>>>>> convert the last user of shake_page() to shake_folio(). This allows us
>>>>>> to remove the shake_page() definition.
>>>>> So I didn't do this before because I wasn't convinced it was safe.
>>>>> We don't have a refcount on the folio, so the page might no longer
>>>>> be part of this folio by the time we get the refcount on the folio.
>>>>>
>>>>> I'd really like to see some argumentation for why this is safe.
>>>> If I moved down the folio = page_folio() line to after we verify
>>>> __get_hwpoison_page() has returned 1, which indicates the reference count
>>>> was successfully incremented via foliO_try_get(), that means the folio
>>>> conversion would happen after we have a refcount. In the case we don't call
>>>> __get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
>>>> means the page has existing users so that path would be safe as well. So I
>>>> think this is safe after moving page_folio() after __get_hwpoison_page().
>>> See if you can find a hole in this chain of reasoning ...
>>>
>>> memory_failure()
>>> p = pfn_to_online_page(pfn);
>>> res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
>>> (not a hugetlb)
>>> if (TestSetPageHWPoison(p)) {
>>> (not already poisoned)
>>> if (!(flags & MF_COUNT_INCREASED)) {
>>> res = get_hwpoison_page(p, flags);
>>>
>>> get_hwpoison_page()
>>> ret = get_any_page(p, flags);
>>>
>>> get_any_page()
>>> folio = page_folio(page)
>> That would be unsafe, the safe way would be if we moved page_folio() after
>> the call to __get_hw_poison() in get_any_page() and there would still be one
>> remaining user of shake_page() that we can't convert. A safe version of this
>> patch would result in a removal of one use of PageHuge() and two uses of
>> put_page(), would that be worth submitting?
>>
>> get_any_page()
>> if(__get_hwpoison_page())
>> folio = page_folio() /* folio_try_get() returned 1, safe */
> I think we should convert __get_hwpoison_page() to return either the folio
> or an ERR_PTR or NULL. Also, I think we should delete the "cannot catch
> tail" part and just loop in __get_hwpoison_page() until we do catch it.
> See try_get_folio() in mm/gup.c for inspiration (although you can't use
> it exactly because that code knows that the page is mapped into a page
> table, so has a refcount).
>
> But that's just an immediate assessment; you might find a reason that
> doesn't work.
Besides, in a possible hugetlb demote scenario, it seems to me that we
should retry
get_hwpoison_hugetlb_folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/get_hwpoison_hugetlb_folio>()
instead of falling thru to folio_try_get().

staticint__get_hwpoison_page
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/__get_hwpoison_page>(structpage
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>*page
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>,unsignedlongflags)
{
structfolio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>*folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>=page_folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>);
intret=0;
bool <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/bool>hugetlb
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>=false
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/false>;

ret=get_hwpoison_hugetlb_folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/get_hwpoison_hugetlb_folio>(folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>,&hugetlb
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>,false
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/false>);
if(hugetlb <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>){
/* Make sure hugetlb demotion did not happen from under us. */
if(folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>==page_folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>))
returnret;
if(ret>0){ <--------- folio changes due to demote
folio_put
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio_put>(folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>);
folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>=page_folio
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page
<https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>);


thanks,
-jane


2024-04-26 20:34:01

by Jane Chu

[permalink] [raw]
Subject: Re: [PATCH] mm/memory-failure: remove shake_page()

My apology for the gobbled message earlier.

On 4/26/2024 12:52 PM, Jane Chu wrote:
> On 4/26/2024 12:05 PM, Matthew Wilcox wrote:
> [..]
>> That would be unsafe, the safe way would be if we moved page_folio()
>> after
>>> the call to __get_hw_poison() in get_any_page() and there would
>>> still be one
>>> remaining user of shake_page() that we can't convert. A safe version
>>> of this
>>> patch would result in a removal of one use of PageHuge() and two
>>> uses of
>>> put_page(), would that be worth submitting?
>>>
>>> get_any_page()
>>>     if(__get_hwpoison_page())
>>>         folio = page_folio() /* folio_try_get() returned 1, safe */
>> I think we should convert __get_hwpoison_page() to return either the
>> folio
>> or an ERR_PTR or NULL.  Also, I think we should delete the "cannot catch
>> tail" part and just loop in __get_hwpoison_page() until we do catch it.
>> See try_get_folio() in mm/gup.c for inspiration (although you can't use
>> it exactly because that code knows that the page is mapped into a page
>> table, so has a refcount).
>>
>> But that's just an immediate assessment; you might find a reason that
>> doesn't work.
>
Besides, in a possible hugetlb demote scenario, it seems to me that we
should retry
get_hwpoison_hugetlb_folio() instead of falling thru to folio_try_get().

static int __get_hwpoison_page(struct page *page, unsigned long flags)
{
        struct folio *folio = page_folio(page);
        int ret = 0;
        bool hugetlb = false;

        ret = get_hwpoison_hugetlb_folio(folio, &hugetlb, false);
        if (hugetlb) {
                /* Make sure hugetlb demotion did not happen from under
us. */
                if (folio == page_folio(page))
                        return ret;
                if (ret > 0) {      <===== still hugetlb, don't fall
thru, retry
                        folio_put(folio);
                        folio = page_folio(page);
                }

        }

thanks!
-jane