2021-03-05 00:04:43

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

From: Naoya Horiguchi <[email protected]>

Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later. My testcase (which calls hard-offline on some
tail page in a hugetlb, then accesses the address of the hugetlb range)
showed that page allocation code detects the page lock on buddy page and
printed out "BUG: Bad page state" message. PG_hwpoison does not prevent
it because PG_hwpoison flag is set on any subpage of the hugetlb page
but the 2nd page lock is on the head page.

This patch suggests to drop the 2nd page lock to fix the issue.

Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Cc: [email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git v5.11/mm/memory-failure.c v5.11_patched/mm/memory-failure.c
index e9481632fcd1..d8aba15295c5 100644
--- v5.11/mm/memory-failure.c
+++ v5.11_patched/mm/memory-failure.c
@@ -830,7 +830,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
page_ref_inc(p);
res = MF_RECOVERED;
}
- lock_page(hpage);
}

return res;
@@ -1286,7 +1285,8 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)

res = identify_page_state(pfn, p, page_flags);
out:
- unlock_page(head);
+ if (PageLocked(head))
+ unlock_page(head);
return res;
}

--
2.25.1


2021-03-05 07:27:31

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v1] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

On Thu, Mar 04, 2021 at 03:44:37PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>

Hi Naoya,

good catch!

> Currently me_huge_page() temporary unlocks page to perform some actions
> then locks it again later. My testcase (which calls hard-offline on some
> tail page in a hugetlb, then accesses the address of the hugetlb range)
> showed that page allocation code detects the page lock on buddy page and
> printed out "BUG: Bad page state" message. PG_hwpoison does not prevent
> it because PG_hwpoison flag is set on any subpage of the hugetlb page
> but the 2nd page lock is on the head page.

I am having difficulties to parse "PG_hwpoison does not prevent it because
PG_hwpoison flag is set on any subpage of the hugetlb page".

What do you mean by that?

>
> This patch suggests to drop the 2nd page lock to fix the issue.
>
> Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
> Cc: [email protected]
> Signed-off-by: Naoya Horiguchi <[email protected]>

The fix looks fine to me:

Reviewed-by: Oscar Salvador <[email protected]>



--
Oscar Salvador
SUSE L3

2021-03-05 12:54:29

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v2] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

Hello Oscar,

On Fri, Mar 05, 2021 at 08:26:58AM +0100, Oscar Salvador wrote:
> On Thu, Mar 04, 2021 at 03:44:37PM +0900, Naoya Horiguchi wrote:
> > From: Naoya Horiguchi <[email protected]>
>
> Hi Naoya,
>
> good catch!
>
> > Currently me_huge_page() temporary unlocks page to perform some actions
> > then locks it again later. My testcase (which calls hard-offline on some
> > tail page in a hugetlb, then accesses the address of the hugetlb range)
> > showed that page allocation code detects the page lock on buddy page and
> > printed out "BUG: Bad page state" message. PG_hwpoison does not prevent
> > it because PG_hwpoison flag is set on any subpage of the hugetlb page
> > but the 2nd page lock is on the head page.
>
> I am having difficulties to parse "PG_hwpoison does not prevent it because
> PG_hwpoison flag is set on any subpage of the hugetlb page".
>
> What do you mean by that?

What was in my mind is that check_new_page_bad() does not consider
a page with __PG_HWPOISON as bad page, so this flag works as kind of
filter, but this filtering doesn't work in my case because the
"bad page" is not the actual hwpoisoned page.

Thank for nice comment, I've updated the patch below with this description.

>
> >
> > This patch suggests to drop the 2nd page lock to fix the issue.
> >
> > Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
> > Cc: [email protected]
> > Signed-off-by: Naoya Horiguchi <[email protected]>
>
> The fix looks fine to me:
>
> Reviewed-by: Oscar Salvador <[email protected]>

Thank you!

Have a nice weekend.
- Naoya

---
From eaaaab05750c13fe9b637190410289a3168b097e Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <[email protected]>
Date: Fri, 5 Mar 2021 21:44:47 +0900
Subject: [PATCH v2] mm, hwpoison: do not lock page again when me_huge_page()
successfully recovers

Currently me_huge_page() temporary unlocks page to perform some actions
then locks it again later. My testcase (which calls hard-offline on some
tail page in a hugetlb, then accesses the address of the hugetlb range)
showed that page allocation code detects this page lock on buddy page and
printed out "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as bad
page, so this flag works as kind of filter, but this filtering doesn't work
in this case because the "bad page" is not the actual hwpoisoned page.

This patch suggests to drop the 2nd page lock to fix the issue.

Fixes: commit 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Cc: [email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e9481632fcd1..d8aba15295c5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -830,7 +830,6 @@ static int me_huge_page(struct page *p, unsigned long pfn)
page_ref_inc(p);
res = MF_RECOVERED;
}
- lock_page(hpage);
}

return res;
@@ -1286,7 +1285,8 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)

res = identify_page_state(pfn, p, page_flags);
out:
- unlock_page(head);
+ if (PageLocked(head))
+ unlock_page(head);
return res;
}

--
2.25.1