2021-05-11 15:12:08

by Naoya Horiguchi

Subject: [PATCH v3 0/2] hwpoison: fix race with compound page allocation

Hi everyone,

I updated the patch for the issue reported by Muchun Song in [1]. I
separated it into two patches (to reduce the size of the core fix for stable):
patch 1/2 focuses on preventing the VM_BUG_ON_PAGE() in the reported race.
Patch 2/2 includes some code improvements such as calling the existing retry
logic, revising the function comment, and some refactoring.

Any comments and testing would be appreciated.

Thanks,
Naoya Horiguchi

[1] https://lore.kernel.org/linux-mm/[email protected]/T/
---
Summary:

Naoya Horiguchi (2):
mm,hwpoison: fix race with compound page allocation
mm,hwpoison: make get_hwpoison_page call get_any_page()

mm/memory-failure.c | 169 +++++++++++++++++++++++++++++-----------------------
1 file changed, 96 insertions(+), 73 deletions(-)


2021-05-11 15:13:17

by Naoya Horiguchi

Subject: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation

From: Naoya Horiguchi <[email protected]>

When a hugetlb page fault (under an overcommitting situation) races with
memory_failure(), VM_BUG_ON_PAGE() is triggered by the following sequence:

CPU0:                                   CPU1:

                                        gather_surplus_pages()
                                          page = alloc_surplus_huge_page()
memory_failure_hugetlb()
  get_hwpoison_page(page)
    __get_hwpoison_page(page)
      get_page_unless_zero(page)
                                          zero = put_page_testzero(page)
                                          VM_BUG_ON_PAGE(!zero, page)
                                          enqueue_huge_page(h, page)
  put_page(page)

__get_hwpoison_page() only checks the page refcount before taking an
additional one for memory error handling, which is wrong because there
are time windows where a compound page has a non-zero refcount during
initialization.

So make __get_hwpoison_page() check the page status a bit more carefully
for a few types of compound pages. The PageSlab() check is added because
otherwise the "non anonymous thp" path would be wrongly chosen.

Fixes: ead07f6a867b ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Muchun Song <[email protected]>
Cc: [email protected] # 5.12+
---
changelog v3:
- recheck PageHuge after holding hugetlb_lock,
---
mm/memory-failure.c | 55 ++++++++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 21 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index a3659619d293..02668b24e512 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
 {
         struct page *head = compound_head(page);
 
-        if (!PageHuge(head) && PageTransHuge(head)) {
-                /*
-                 * Non anonymous thp exists only in allocation/free time. We
-                 * can't handle such a case correctly, so let's give it up.
-                 * This should be better than triggering BUG_ON when kernel
-                 * tries to touch the "partially handled" page.
-                 */
-                if (!PageAnon(head)) {
-                        pr_err("Memory failure: %#lx: non anonymous thp\n",
-                               page_to_pfn(page));
-                        return 0;
+        if (PageCompound(page)) {
+                if (PageSlab(page)) {
+                        return get_page_unless_zero(page);
+                } else if (PageHuge(head)) {
+                        int ret = 0;
+
+                        spin_lock(&hugetlb_lock);
+                        if (!PageHuge(head))
+                                ret = -EBUSY;
+                        else if (HPageFreed(head) || HPageMigratable(head))
+                                ret = get_page_unless_zero(head);
+                        spin_unlock(&hugetlb_lock);
+                        return ret;
+                } else if (PageTransHuge(head)) {
+                        /*
+                         * Non anonymous thp exists only in allocation/free time. We
+                         * can't handle such a case correctly, so let's give it up.
+                         * This should be better than triggering BUG_ON when kernel
+                         * tries to touch the "partially handled" page.
+                         */
+                        if (!PageAnon(head)) {
+                                pr_err("Memory failure: %#lx: non anonymous thp\n",
+                                       page_to_pfn(page));
+                                return 0;
+                        }
+                        if (get_page_unless_zero(head)) {
+                                if (head == compound_head(page))
+                                        return 1;
+                                pr_info("Memory failure: %#lx cannot catch tail\n",
+                                        page_to_pfn(page));
+                                put_page(head);
+                        }
                 }
+                return 0;
         }
 
-        if (get_page_unless_zero(head)) {
-                if (head == compound_head(page))
-                        return 1;
-
-                pr_info("Memory failure: %#lx cannot catch tail\n",
-                        page_to_pfn(page));
-                put_page(head);
-        }
-
-        return 0;
+        return get_page_unless_zero(page);
 }
 
 /*
--
2.25.1

2021-05-11 15:13:23

by Naoya Horiguchi

Subject: [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()

From: Naoya Horiguchi <[email protected]>

Now that __get_hwpoison_page() can return -EBUSY in a race condition,
it's helpful to handle it by retrying. We already have such retry
logic in get_any_page(), so make get_hwpoison_page() call get_any_page()
when called from memory_failure().
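
The resulting call flow is roughly:

  memory_failure() / soft_offline_page()
    get_hwpoison_page()
      get_any_page()            <- retry loop, -EBUSY/-EIO handling
        __get_hwpoison_page()   <- page state checks + refcount grab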

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 114 ++++++++++++++++++++++++--------------------
1 file changed, 62 insertions(+), 52 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index 02668b24e512..d6f470e9ee05 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1084,13 +1084,6 @@ static int page_action(struct page_state *ps, struct page *p,
         return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
-/**
- * __get_hwpoison_page() - Get refcount for memory error handling:
- * @page: raw error page (hit by memory error)
- *
- * Return: return 0 if failed to grab the refcount, otherwise true (some
- * non-zero value.)
- */
 static int __get_hwpoison_page(struct page *page)
 {
         struct page *head = compound_head(page);
@@ -1134,15 +1127,6 @@ static int __get_hwpoison_page(struct page *page)
         return get_page_unless_zero(page);
 }
 
-/*
- * Safely get reference count of an arbitrary page.
- *
- * Returns 0 for a free page, 1 for an in-use page,
- * -EIO for a page-type we cannot handle and -EBUSY if we raced with an
- * allocation.
- * We only incremented refcount in case the page was already in-use and it
- * is a known type we can handle.
- */
 static int get_any_page(struct page *p, unsigned long flags)
 {
         int ret = 0, pass = 0;
@@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
                 count_increased = true;
 
 try_again:
-        if (!count_increased && !__get_hwpoison_page(p)) {
-                if (page_count(p)) {
-                        /* We raced with an allocation, retry. */
-                        if (pass++ < 3)
-                                goto try_again;
-                        ret = -EBUSY;
-                } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
-                        /* We raced with put_page, retry. */
-                        if (pass++ < 3)
-                                goto try_again;
-                        ret = -EIO;
+        if (!count_increased) {
+                ret = __get_hwpoison_page(p);
+                if (!ret) {
+                        if (page_count(p)) {
+                                /* We raced with an allocation, retry. */
+                                if (pass++ < 3)
+                                        goto try_again;
+                                ret = -EBUSY;
+                        } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
+                                /* We raced with put_page, retry. */
+                                if (pass++ < 3)
+                                        goto try_again;
+                                ret = -EIO;
+                        }
+                        goto out;
                 }
+        }
+
+        if (ret == -EBUSY) {
+                /* We raced with freeing huge page to buddy, retry. */
+                if (pass++ < 3)
+                        goto try_again;
+        } else if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
+                ret = 1;
         } else {
-                if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
-                        ret = 1;
-                } else {
-                        /*
-                         * A page we cannot handle. Check whether we can turn
-                         * it into something we can handle.
-                         */
-                        if (pass++ < 3) {
-                                put_page(p);
-                                shake_page(p, 1);
-                                count_increased = false;
-                                goto try_again;
-                        }
+                /*
+                 * A page we cannot handle. Check whether we can turn
+                 * it into something we can handle.
+                 */
+                if (pass++ < 3) {
                         put_page(p);
-                        ret = -EIO;
+                        shake_page(p, 1);
+                        count_increased = false;
+                        goto try_again;
                 }
+                put_page(p);
+                ret = -EIO;
         }
-
+out:
         return ret;
 }
 
-static int get_hwpoison_page(struct page *p, unsigned long flags,
-                        enum mf_flags ctxt)
+/**
+ * get_hwpoison_page() - Get refcount for memory error handling
+ * @p: Raw error page (hit by memory error)
+ * @flags: Flags controlling behavior of error handling
+ *
+ * get_hwpoison_page() takes a page refcount of an error page to handle memory
+ * error on it, after checking that the error page is in a well-defined state
+ * (defined as a page-type we can successfully handle the memory error on it,
+ * such as LRU page and hugetlb page).
+ *
+ * Memory error handling could be triggered at any time on any type of page,
+ * so it's prone to race with typical memory management lifecycle (like
+ * allocation and free). So to avoid such races, get_hwpoison_page() takes
+ * extra care for the error page's state (as done in __get_hwpoison_page()),
+ * and has some retry logic in get_any_page().
+ *
+ * Return: 0 for buddy free pages,
+ *         1 on success for in-use pages in a well-defined state,
+ *         -EIO for pages on which we can not handle memory errors,
+ *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
+ *         operations like allocation and free.
+ */
+static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
         int ret;
 
         zone_pcp_disable(page_zone(p));
-        if (ctxt == MF_SOFT_OFFLINE)
-                ret = get_any_page(p, flags);
-        else
-                ret = __get_hwpoison_page(p);
+        ret = get_any_page(p, flags);
         zone_pcp_enable(page_zone(p));
 
         return ret;
@@ -1384,7 +1394,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 
         num_poisoned_pages_inc();
 
-        if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+        if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
                 /*
                  * Check "filter hit" and "race with other subpage."
                  */
@@ -1608,7 +1618,7 @@ int memory_failure(unsigned long pfn, int flags)
          * In fact it's dangerous to directly bump up page count from 0,
          * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
          */
-        if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags, 0)) {
+        if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p, flags)) {
                 if (is_free_buddy_page(p)) {
                         if (take_page_off_buddy(p)) {
                                 page_ref_inc(p);
@@ -1900,7 +1910,7 @@ int unpoison_memory(unsigned long pfn)
                 return 0;
         }
 
-        if (!get_hwpoison_page(p, flags, 0)) {
+        if (!get_hwpoison_page(p, flags)) {
                 if (TestClearPageHWPoison(p))
                         num_poisoned_pages_dec();
                 unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
@@ -2116,7 +2126,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 retry:
         get_online_mems();
-        ret = get_hwpoison_page(page, flags, MF_SOFT_OFFLINE);
+        ret = get_hwpoison_page(page, flags);
         put_online_mems();
 
         if (ret > 0) {
--
2.25.1

2021-05-12 08:34:58

by Oscar Salvador

Subject: Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation

On Wed, May 12, 2021 at 12:10:15AM +0900, Naoya Horiguchi wrote:
> @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
>  {
>          struct page *head = compound_head(page);
>
> -        if (!PageHuge(head) && PageTransHuge(head)) {
> -                /*
> -                 * Non anonymous thp exists only in allocation/free time. We
> -                 * can't handle such a case correctly, so let's give it up.
> -                 * This should be better than triggering BUG_ON when kernel
> -                 * tries to touch the "partially handled" page.
> -                 */
> -                if (!PageAnon(head)) {
> -                        pr_err("Memory failure: %#lx: non anonymous thp\n",
> -                               page_to_pfn(page));
> -                        return 0;
> +        if (PageCompound(page)) {
> +                if (PageSlab(page)) {
> +                        return get_page_unless_zero(page);
> +                } else if (PageHuge(head)) {
> +                        int ret = 0;
> +
> +                        spin_lock(&hugetlb_lock);
> +                        if (!PageHuge(head))
> +                                ret = -EBUSY;
> +                        else if (HPageFreed(head) || HPageMigratable(head))
> +                                ret = get_page_unless_zero(head);
> +                        spin_unlock(&hugetlb_lock);
> +                        return ret;

Uhm, I am having a hard time with that -EBUSY.
At this stage, we expect __get_hwpoison_page() to either return true or false,
depending on whether it could grab a page's refcount or not. Returning -EBUSY
here seems wrong (plus it is inconsistent with the comment above the function).
It might be useful for the latter patch, I do not know as I have yet to check
that one, but if anything, let us stay consistent here in this one.
So, if hugetlb vanished under us, let us return "we could not grab the
refcount". Does it make sense?
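
IOW, something along these lines (just a rough sketch of what I mean; ret
simply stays 0 when the PageHuge() recheck fails under the lock, so the
caller sees "could not grab the refcount"):

                } else if (PageHuge(head)) {
                        int ret = 0;

                        spin_lock(&hugetlb_lock);
                        /* hugetlb vanished under us: leave ret at 0 */
                        if (PageHuge(head) &&
                            (HPageFreed(head) || HPageMigratable(head)))
                                ret = get_page_unless_zero(head);
                        spin_unlock(&hugetlb_lock);
                        return ret;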

--
Oscar Salvador
SUSE L3

2021-05-12 08:58:48

by Oscar Salvador

Subject: Re: [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()

On Wed, May 12, 2021 at 12:10:16AM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> Now __get_hwpoison_page() could return -EBUSY in a race condition,
> so it's helpful to handle it by retrying. We already have retry
> logic, so make get_hwpoison_page() call get_any_page() when called
> from memory_failure().

As I stated in your previous patch, I think you should start returning -EBUSY
from this patch onwards.

>  static int get_any_page(struct page *p, unsigned long flags)
>  {
>          int ret = 0, pass = 0;
> @@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
>                  count_increased = true;
>
>  try_again:
> -        if (!count_increased && !__get_hwpoison_page(p)) {
> -                if (page_count(p)) {
> -                        /* We raced with an allocation, retry. */
> -                        if (pass++ < 3)
> -                                goto try_again;
> -                        ret = -EBUSY;
> -                } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> -                        /* We raced with put_page, retry. */
> -                        if (pass++ < 3)
> -                                goto try_again;
> -                        ret = -EIO;
> +        if (!count_increased) {
> +                ret = __get_hwpoison_page(p);
> +                if (!ret) {
> +                        if (page_count(p)) {
> +                                /* We raced with an allocation, retry. */
> +                                if (pass++ < 3)
> +                                        goto try_again;
> +                                ret = -EBUSY;
> +                        } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> +                                /* We raced with put_page, retry. */
> +                                if (pass++ < 3)
> +                                        goto try_again;
> +                                ret = -EIO;
> +                        }
> +                        goto out;
>                  }
> +        }

I think this can be improved.

We cannot have -EBUSY unless we come from __get_hwpoison_page() (!count_increased),
so I think a much more natural approach would be to stuff the hunk below in there,
and then place the other stuff in an else, so something like:

if (!count_increased) {
        ret = __get_hwpoison_page(p);
        if (!ret) {
                if (page_count(p)) {
                        /* We raced with an allocation, retry. */
                        if (pass++ < 3)
                                goto try_again;
                        ret = -EBUSY;
                } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
                        /* We raced with put_page, retry. */
                        if (pass++ < 3)
                                goto try_again;
                        ret = -EIO;
                }
                goto out;
        } else if (ret == -EBUSY) {
                /* We raced with freeing huge page to buddy, retry. */
                if (pass++ < 3)
                        goto try_again;
        }
} else {
        /* We do already have a refcount for this page, see if we can
         * handle it.
         */
        if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
                ret = 1;
        } else {
                /* A page we cannot handle. Check...
        }
}

Other than that, looks good to me.

> +
> +        if (ret == -EBUSY) {
> +                /* We raced with freeing huge page to buddy, retry. */
> +                if (pass++ < 3)
> +                        goto try_again;

--
Oscar Salvador
SUSE L3

2021-05-12 12:22:14

by Michal Hocko

Subject: Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation

On Wed 12-05-21 00:10:15, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> When hugetlb page fault (under overcommiting situation) and
> memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
>
> CPU0:                                   CPU1:
>
>                                         gather_surplus_pages()
>                                           page = alloc_surplus_huge_page()
> memory_failure_hugetlb()
>   get_hwpoison_page(page)
>     __get_hwpoison_page(page)
>       get_page_unless_zero(page)
>                                           zero = put_page_testzero(page)
>                                           VM_BUG_ON_PAGE(!zero, page)
>                                           enqueue_huge_page(h, page)
>   put_page(page)
>
> __get_hwpoison_page() only checks page refcount before taking additional
> one for memory error handling, which is wrong because there's time
> windows where compound pages have non-zero refcount during initialization.
>
> So makes __get_hwpoison_page() check page status a bit more for a few
> types of compound pages. PageSlab() check is added because otherwise
> "non anonymous thp" path is wrongly chosen.

This should really describe the fix in more detail. E.g.
[...]
> @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
>  {
>          struct page *head = compound_head(page);
>
> -        if (!PageHuge(head) && PageTransHuge(head)) {
> -                /*
> -                 * Non anonymous thp exists only in allocation/free time. We
> -                 * can't handle such a case correctly, so let's give it up.
> -                 * This should be better than triggering BUG_ON when kernel
> -                 * tries to touch the "partially handled" page.
> -                 */
> -                if (!PageAnon(head)) {
> -                        pr_err("Memory failure: %#lx: non anonymous thp\n",
> -                               page_to_pfn(page));
> -                        return 0;
> +        if (PageCompound(page)) {

So you do rely on PageCompound to be true. Which is prone to races as
well. All you need is to check before prep_compound_page and run into
get_page_unless_zero (down in this function) before hugetlb reaches
put_page_testzero. Or am I missing something that would prevent that?
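
IIUC the problematic interleaving would be something like:

CPU0:                                   CPU1:

memory_failure_hugetlb()
  get_hwpoison_page(page)
    __get_hwpoison_page(page)
      PageCompound(page)                <- false, prep_compound_page() not yet run
                                        gather_surplus_pages()
                                          page = alloc_surplus_huge_page()
      get_page_unless_zero(page)
                                          zero = put_page_testzero(page)
                                          VM_BUG_ON_PAGE(!zero, page)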
--
Michal Hocko
SUSE Labs

Subject: Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation

On Wed, May 12, 2021 at 02:19:32PM +0200, Michal Hocko wrote:
> On Wed 12-05-21 00:10:15, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <[email protected]>
> >
> > When hugetlb page fault (under overcommiting situation) and
> > memory_failure() race, VM_BUG_ON_PAGE() is triggered by the following race:
> >
> > CPU0:                                   CPU1:
> >
> >                                         gather_surplus_pages()
> >                                           page = alloc_surplus_huge_page()
> > memory_failure_hugetlb()
> >   get_hwpoison_page(page)
> >     __get_hwpoison_page(page)
> >       get_page_unless_zero(page)
> >                                           zero = put_page_testzero(page)
> >                                           VM_BUG_ON_PAGE(!zero, page)
> >                                           enqueue_huge_page(h, page)
> >   put_page(page)
> >
> > __get_hwpoison_page() only checks page refcount before taking additional
> > one for memory error handling, which is wrong because there's time
> > windows where compound pages have non-zero refcount during initialization.
> >
> > So makes __get_hwpoison_page() check page status a bit more for a few
> > types of compound pages. PageSlab() check is added because otherwise
> > "non anonymous thp" path is wrongly chosen.
>
> This should really describe the fix in more details. E.g.
> [...]
> > @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
> >  {
> >          struct page *head = compound_head(page);
> >
> > -        if (!PageHuge(head) && PageTransHuge(head)) {
> > -                /*
> > -                 * Non anonymous thp exists only in allocation/free time. We
> > -                 * can't handle such a case correctly, so let's give it up.
> > -                 * This should be better than triggering BUG_ON when kernel
> > -                 * tries to touch the "partially handled" page.
> > -                 */
> > -                if (!PageAnon(head)) {
> > -                        pr_err("Memory failure: %#lx: non anonymous thp\n",
> > -                               page_to_pfn(page));
> > -                        return 0;
> > +        if (PageCompound(page)) {
>
> So you do rely on PageCompound to be true. Which is prone to races as
> well. All you need is to check before prep_compound_page and run into
> get_page_unless_zero (down in this function) before hugetlb reaches
> put_page_testzero. Or do I miss something that would prevent from that?

No, you're right, the race can still happen (I only considered the case where
a compound page is already allocated, which was insufficient). Any check done
outside the lock seems unreliable, so does a check like the one below make
more sense?

static int __get_hwpoison_page(struct page *page)
{
        struct page *head = compound_head(page);
        int ret = 0;

#ifdef CONFIG_HUGETLB_PAGE      // this #ifdef is needed for hugetlb_lock
        spin_lock(&hugetlb_lock);
        if (PageHuge(head) && (HPageFreed(head) || HPageMigratable(head)))
                ret = get_page_unless_zero(head);
        spin_unlock(&hugetlb_lock);
        if (ret > 0)
                return ret;
#endif

        // other types of compound page.
        ...

        return get_page_unless_zero(page);
}

- Naoya

Subject: Re: [PATCH v3 1/2] mm,hwpoison: fix race with compound page allocation

On Wed, May 12, 2021 at 10:33:24AM +0200, Oscar Salvador wrote:
> On Wed, May 12, 2021 at 12:10:15AM +0900, Naoya Horiguchi wrote:
> > @@ -1095,30 +1095,43 @@ static int __get_hwpoison_page(struct page *page)
> >  {
> >          struct page *head = compound_head(page);
> >
> > -        if (!PageHuge(head) && PageTransHuge(head)) {
> > -                /*
> > -                 * Non anonymous thp exists only in allocation/free time. We
> > -                 * can't handle such a case correctly, so let's give it up.
> > -                 * This should be better than triggering BUG_ON when kernel
> > -                 * tries to touch the "partially handled" page.
> > -                 */
> > -                if (!PageAnon(head)) {
> > -                        pr_err("Memory failure: %#lx: non anonymous thp\n",
> > -                               page_to_pfn(page));
> > -                        return 0;
> > +        if (PageCompound(page)) {
> > +                if (PageSlab(page)) {
> > +                        return get_page_unless_zero(page);
> > +                } else if (PageHuge(head)) {
> > +                        int ret = 0;
> > +
> > +                        spin_lock(&hugetlb_lock);
> > +                        if (!PageHuge(head))
> > +                                ret = -EBUSY;
> > +                        else if (HPageFreed(head) || HPageMigratable(head))
> > +                                ret = get_page_unless_zero(head);
> > +                        spin_unlock(&hugetlb_lock);
> > +                        return ret;
>
> Uhm, I am having a hard time with that -EBUSY.
> At this stage, we expect __get_hwpoison_page() to either return true or false,
> depending on whether it could grab a page's refcount or not. Returning -EBUSY
> here seems wrong (plus it is inconsistent with the comment above the function).
> It might be useful for the latter patch, I do not know as I yet have to check
> that one, but if anything, let us stay consistent here in this one.
> So, if hugetlb vanished under us, let us return "we could not grab the
> refcount". Does it make sense?

Yes, you are totally right. I failed to properly split the patch.
-EBUSY is non-zero, so it's considered as "successfully pinned", which is
not true. I should've set ret to 0.

- Naoya

Subject: Re: [PATCH v3 2/2] mm,hwpoison: make get_hwpoison_page call get_any_page()

On Wed, May 12, 2021 at 10:55:22AM +0200, Oscar Salvador wrote:
> On Wed, May 12, 2021 at 12:10:16AM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <[email protected]>
> >
> > Now __get_hwpoison_page() could return -EBUSY in a race condition,
> > so it's helpful to handle it by retrying. We already have retry
> > logic, so make get_hwpoison_page() call get_any_page() when called
> > from memory_failure().
>
> As I stated in your previous patch, I think you should start returning -EBUSY
> from this patch onwards.
>
> >  static int get_any_page(struct page *p, unsigned long flags)
> >  {
> >          int ret = 0, pass = 0;
> > @@ -1152,50 +1136,76 @@ static int get_any_page(struct page *p, unsigned long flags)
> >                  count_increased = true;
> >
> >  try_again:
> > -        if (!count_increased && !__get_hwpoison_page(p)) {
> > -                if (page_count(p)) {
> > -                        /* We raced with an allocation, retry. */
> > -                        if (pass++ < 3)
> > -                                goto try_again;
> > -                        ret = -EBUSY;
> > -                } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> > -                        /* We raced with put_page, retry. */
> > -                        if (pass++ < 3)
> > -                                goto try_again;
> > -                        ret = -EIO;
> > +        if (!count_increased) {
> > +                ret = __get_hwpoison_page(p);
> > +                if (!ret) {
> > +                        if (page_count(p)) {
> > +                                /* We raced with an allocation, retry. */
> > +                                if (pass++ < 3)
> > +                                        goto try_again;
> > +                                ret = -EBUSY;
> > +                        } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
> > +                                /* We raced with put_page, retry. */
> > +                                if (pass++ < 3)
> > +                                        goto try_again;
> > +                                ret = -EIO;
> > +                        }
> > +                        goto out;
> >                  }
> > +        }
>
> I think this can be improved.
>
> We cannot have -EBUSY unless we come from __get_hwpoison_page() (!count_increased),
> so I think a much more natural approach would be to stuff the hunk below in there,
> and then place the other stuff in an else, so something like:
>
> if (!count_increased) {
>         ret = __get_hwpoison_page(p);
>         if (!ret) {
>                 if (page_count(p)) {
>                         /* We raced with an allocation, retry. */
>                         if (pass++ < 3)
>                                 goto try_again;
>                         ret = -EBUSY;
>                 } else if (!PageHuge(p) && !is_free_buddy_page(p)) {
>                         /* We raced with put_page, retry. */
>                         if (pass++ < 3)
>                                 goto try_again;
>                         ret = -EIO;
>                 }
>                 goto out;
>         } else if (ret == -EBUSY) {
>                 /* We raced with freeing huge page to buddy, retry. */
>                 if (pass++ < 3)
>                         goto try_again;
>         }

Moving "if (ret == -EBUSY)" block to here makes sense to me. Thank you.

> } else {

In the original logic, if __get_hwpoison_page() returns 1, we fall into the
"if (PageHuge(p) || PageLRU(p) || __PageMovable(p))" check. I guess this
"else" is not necessary? (A rough sketch of what I mean follows the quote.)

>         /* We do already have a refcount for this page, see if we can
>          * handle it.
>          */
>         if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
>                 ret = 1;
>         } else {
>                 /* A page we cannot handle. Check...
>         }
> }
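
A rough sketch of what I mean (no extra "else"; when we already hold the
refcount we simply fall through to the existing page type check):

if (!count_increased) {
        ret = __get_hwpoison_page(p);
        if (!ret) {
                /* -EBUSY/-EIO and retry handling as above */
                goto out;
        } else if (ret == -EBUSY) {
                /* We raced with freeing huge page to buddy, retry. */
                if (pass++ < 3)
                        goto try_again;
                goto out;
        }
}

if (PageHuge(p) || PageLRU(p) || __PageMovable(p)) {
        ret = 1;
} else {
        /* A page we cannot handle. Check... */
}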

Thanks,
Naoya Horiguchi