2022-06-02 09:42:46

by Naoya Horiguchi

Subject: [PATCH v1 0/5] mm, hwpoison: enable 1GB hugepage support

Hi,

This patchset enables memory error handling on 1GB hugepages.

The "Save raw error page" patch (1/4 of patchset [1]) is necessary, so it's
included in this series (the remaining hotplug-related parts are still in
progress). Patch 2/5 solves issues in a corner case of hugepage handling,
which might not be the main target of this patchset but is slightly
related. It was posted separately [2] but depends on 1/5, so I've grouped
them together.

Patches 3/5 to 5/5 are the main part of this series and fix a small issue in
handling 1GB hugepages; I hope the approach is workable.

[1]: https://lore.kernel.org/linux-mm/[email protected]/T/#u

[2]: https://lore.kernel.org/linux-mm/[email protected]/T/

Please let me know if you have any suggestions and comments.

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (5):
mm, hwpoison, hugetlb: introduce SUBPAGE_INDEX_HWPOISON to save raw error page
mm,hwpoison: set PG_hwpoison for busy hugetlb pages
mm, hwpoison: make __page_handle_poison returns int
mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
mm, hwpoison: enable memory error handling on 1GB hugepage

include/linux/hugetlb.h | 24 ++++++++++++++++++++++++
include/linux/mm.h | 2 +-
include/ras/ras_event.h | 1 -
mm/hugetlb.c | 9 +++++++++
mm/memory-failure.c | 48 +++++++++++++++++++++---------------------------
5 files changed, 55 insertions(+), 29 deletions(-)


2022-06-02 12:12:06

by Naoya Horiguchi

Subject: [PATCH v1 3/5] mm, hwpoison: make __page_handle_poison returns int

From: Naoya Horiguchi <[email protected]>

__page_handle_poison() currently returns a bool that shows whether
take_page_off_buddy() has succeeded or not. But we will want to
distinguish another case, "dissolve has passed but taking off failed",
by its return value. So change the return type to int.
No functional change.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fe6a7961dc66..f149a7864c81 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -68,7 +68,13 @@ int sysctl_memory_failure_recovery __read_mostly = 1;

atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);

-static bool __page_handle_poison(struct page *page)
+/*
+ * Return values:
+ * 1: the page is dissolved (if needed) and taken off from buddy,
+ * 0: the page is dissolved (if needed) and not taken off from buddy,
+ * < 0: failed to dissolve.
+ */
+static int __page_handle_poison(struct page *page)
{
int ret;

@@ -78,7 +84,7 @@ static bool __page_handle_poison(struct page *page)
ret = take_page_off_buddy(page);
zone_pcp_enable(page_zone(page));

- return ret > 0;
+ return ret;
}

static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
@@ -88,7 +94,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
* Doing this check for free pages is also fine since dissolve_free_huge_page
* returns 0 for non-hugetlb pages as well.
*/
- if (!__page_handle_poison(page))
+ if (__page_handle_poison(page) <= 0)
/*
* We could fail to take off the target page from buddy
* for example due to racy page allocation, but that's
@@ -1045,7 +1051,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
* save healthy subpages.
*/
put_page(hpage);
- if (__page_handle_poison(p)) {
+ if (__page_handle_poison(p) > 0) {
page_ref_inc(p);
res = MF_RECOVERED;
}
@@ -1595,8 +1601,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
*/
if (res == 0) {
unlock_page(head);
- res = MF_FAILED;
- if (__page_handle_poison(p)) {
+ if (__page_handle_poison(p) > 0) {
page_ref_inc(p);
res = MF_RECOVERED;
}
--
2.25.1


Subject: Re: [PATCH v1 3/5] mm, hwpoison: make __page_handle_poison returns int

On Tue, Jun 07, 2022 at 08:54:24PM +0800, Miaohe Lin wrote:
> On 2022/6/2 13:06, Naoya Horiguchi wrote:
> > [...]
> > @@ -1595,8 +1601,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> > */
> > if (res == 0) {
> > unlock_page(head);
> > - res = MF_FAILED;
>
> This looks like an unexpected change. res will be 0 instead of MF_FAILED if
> __page_handle_poison() failed to dissolve the page or did not take it off from
> buddy. But this is fixed by a later patch in this series, so it should be fine.

Ah, you're right. This patch is described as "no functional change", but
that is not true because of this. So I'll move this line deletion to 4/5 in
the next version.

>
> Reviewed-by: Miaohe Lin <[email protected]>
>
> Thanks!

Thank you :)

- Naoya Horiguchi