2022-07-14 04:59:44

by Naoya Horiguchi

Subject: [mm-unstable PATCH v7 0/8] mm, hwpoison: enable 1GB hugepage support (v7)

Here is v7 of the "enabling memory error handling on 1GB hugepage" patchset.

I applied the feedback provided for v6 (thank you, Miaohe).
There are a few improvements on 3/8 and 4/8.

- v1: https://lore.kernel.org/linux-mm/[email protected]/T/#u
- v2: https://lore.kernel.org/linux-mm/[email protected]/T/#u
- v3: https://lore.kernel.org/linux-mm/[email protected]/T/#u
- v4: https://lore.kernel.org/linux-mm/[email protected]/T/#u
- v5: https://lore.kernel.org/linux-mm/[email protected]/T/#u
- v6: https://lore.kernel.org/linux-mm/[email protected]/T/#u

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (8):
mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
mm, hwpoison, hugetlb: support saving mechanism of raw error pages
mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
mm, hwpoison: set PG_hwpoison for busy hugetlb pages
mm, hwpoison: make __page_handle_poison returns int
mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
mm, hwpoison: enable memory error handling on 1GB hugepage

arch/x86/mm/hugetlbpage.c | 8 ++-
include/linux/hugetlb.h | 17 ++++-
include/linux/mm.h | 2 +-
include/linux/swapops.h | 9 +++
include/ras/ras_event.h | 1 -
mm/hugetlb.c | 58 +++++++++++----
mm/memory-failure.c | 179 ++++++++++++++++++++++++++++++++++++++--------
7 files changed, 226 insertions(+), 48 deletions(-)


2022-07-14 05:00:34

by Naoya Horiguchi

Subject: [mm-unstable PATCH v7 8/8] mm, hwpoison: enable memory error handling on 1GB hugepage

From: Naoya Horiguchi <[email protected]>

Now that the error handling code is prepared, remove the blocking code and
enable memory error handling on 1GB hugepages.

Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
---
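For readers new to this code, the "blocking code" removed below can be
pictured with the following userspace sketch. It is only an illustration:
the function names are hypothetical, and the PMD_SIZE/PUD_SIZE values
assume x86-64 with 4kB base pages.

```c
#include <errno.h>

/* Assumed x86-64 values with 4kB base pages; other configurations differ. */
#define PMD_SIZE (1UL << 21)	/* 2MB */
#define PUD_SIZE (1UL << 30)	/* 1GB */

/*
 * Hypothetical model of the removed check: any hugepage larger than
 * PMD_SIZE was ignored and reported as -EBUSY.
 */
int size_gate_before_model(unsigned long hugepage_size)
{
	return hugepage_size > PMD_SIZE ? -EBUSY : 0;
}

/* With this patch applied, the size no longer blocks error handling. */
int size_gate_after_model(unsigned long hugepage_size)
{
	(void)hugepage_size;
	return 0;
}
```

In this model a 2MB hugepage passes both gates, while a 1GB hugepage
passes only the second one.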
include/linux/mm.h | 1 -
include/ras/ras_event.h | 1 -
mm/memory-failure.c | 16 ----------------
3 files changed, 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7668831c919f..b0e83835184e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3241,7 +3241,6 @@ enum mf_action_page_type {
MF_MSG_DIFFERENT_COMPOUND,
MF_MSG_HUGE,
MF_MSG_FREE_HUGE,
- MF_MSG_NON_PMD_HUGE,
MF_MSG_UNMAP_FAILED,
MF_MSG_DIRTY_SWAPCACHE,
MF_MSG_CLEAN_SWAPCACHE,
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index d0337a41141c..cbd3ddd7c33d 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -360,7 +360,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
EM ( MF_MSG_HUGE, "huge page" ) \
EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
- EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" ) \
EM ( MF_MSG_UNMAP_FAILED, "unmapping failed page" ) \
EM ( MF_MSG_DIRTY_SWAPCACHE, "dirty swapcache page" ) \
EM ( MF_MSG_CLEAN_SWAPCACHE, "clean swapcache page" ) \
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3721de624b98..d86b5acd5754 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -765,7 +765,6 @@ static const char * const action_page_types[] = {
[MF_MSG_DIFFERENT_COMPOUND] = "different compound page after locking",
[MF_MSG_HUGE] = "huge page",
[MF_MSG_FREE_HUGE] = "free huge page",
- [MF_MSG_NON_PMD_HUGE] = "non-pmd-sized huge page",
[MF_MSG_UNMAP_FAILED] = "unmapping failed page",
[MF_MSG_DIRTY_SWAPCACHE] = "dirty swapcache page",
[MF_MSG_CLEAN_SWAPCACHE] = "clean swapcache page",
@@ -1887,21 +1886,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb

page_flags = head->flags;

- /*
- * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
- * simply disable it. In order to make it work properly, we need
- * make sure that:
- * - conversion of a pud that maps an error hugetlb into hwpoison
- * entry properly works, and
- * - other mm code walking over page table is aware of pud-aligned
- * hwpoison entries.
- */
- if (huge_page_size(page_hstate(head)) > PMD_SIZE) {
- action_result(pfn, MF_MSG_NON_PMD_HUGE, MF_IGNORED);
- res = -EBUSY;
- goto out;
- }
-
if (!hwpoison_user_mappings(p, pfn, flags, head)) {
action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
res = -EBUSY;
--
2.25.1

2022-07-14 05:00:41

by Naoya Horiguchi

Subject: [mm-unstable PATCH v7 7/8] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage

From: Naoya Horiguchi <[email protected]>

Currently, if memory_failure() (modified to remove the blocking code by a
subsequent patch) is called on a page in a 1GB hugepage, memory error
handling fails and the raw error page is leaked. The impact is small in
production systems (just a single leaked 4kB page), but this limits
testability because unpoison doesn't work for it.
We can no longer create a 1GB hugepage on a 1GB physical address range
that contains such a leaked page, which is inconvenient when testing on
small systems.

When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the 1GB hugepage is
broken down into raw error pages before coming to this point:

    if (unlikely(PageHWPoison(page)) && !order) {
            ...
            return false;
    }

Then, the page is not sent to buddy and the page refcount is left at 0.

Originally, this check was supposed to work when the error page is freed
from page_handle_poison() (which is called from soft-offline), but now we
are opening another path to call it, so the callers of
__page_handle_poison() need to handle this case by treating a return value
of 0 as success. Then the page refcount for the hwpoison page is properly
incremented, so unpoison works.

Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
---
v2 -> v3:
- remove "res = MF_FAILED" in try_memory_failure_hugetlb (by Miaohe)
---
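The change in how callers interpret the return value of
__page_handle_poison() can be sketched in userspace as below. This is a
simplified model, not the kernel code: struct page_model, the MODEL_*
names and both functions are hypothetical, and the only thing taken from
the patch is the switch from "> 0" to ">= 0".

```c
#include <errno.h>

/* Hypothetical stand-ins for MF_FAILED / MF_RECOVERED. */
enum mf_result_model { MODEL_MF_FAILED, MODEL_MF_RECOVERED };

/* Hypothetical stand-in for struct page; only the refcount matters here. */
struct page_model {
	int refcount;
};

/* Old caller logic: only a positive return value counts as success. */
enum mf_result_model handle_result_old(struct page_model *p, int handle_ret)
{
	if (handle_ret > 0) {
		p->refcount++;
		return MODEL_MF_RECOVERED;
	}
	return MODEL_MF_FAILED;
}

/*
 * New caller logic: 0 (the page was not sent to buddy) also counts as
 * success, so the refcount is raised from 0 and unpoison can work later.
 */
enum mf_result_model handle_result_new(struct page_model *p, int handle_ret)
{
	if (handle_ret >= 0) {
		p->refcount++;
		return MODEL_MF_RECOVERED;
	}
	return MODEL_MF_FAILED;
}
```

With a return value of 0, the old logic reports failure and leaves the
refcount at 0, while the new logic reports recovery and raises it to 1.
Negative values are treated as failure in both versions.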
mm/memory-failure.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c8fa3643791c..3721de624b98 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1084,7 +1084,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
unlock_page(hpage);
} else {
- res = MF_FAILED;
unlock_page(hpage);
/*
* migration entry prevents later access on error hugepage,
@@ -1092,9 +1091,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
* subpages.
*/
put_page(hpage);
- if (__page_handle_poison(p) > 0) {
+ if (__page_handle_poison(p) >= 0) {
page_ref_inc(p);
res = MF_RECOVERED;
+ } else {
+ res = MF_FAILED;
}
}

@@ -1874,10 +1875,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
*/
if (res == 0) {
unlock_page(head);
- res = MF_FAILED;
- if (__page_handle_poison(p) > 0) {
+ if (__page_handle_poison(p) >= 0) {
page_ref_inc(p);
res = MF_RECOVERED;
+ } else {
+ res = MF_FAILED;
}
action_result(pfn, MF_MSG_FREE_HUGE, res);
return res == MF_RECOVERED ? 0 : -EBUSY;
--
2.25.1

2022-07-14 05:01:59

by Naoya Horiguchi

Subject: [mm-unstable PATCH v7 5/8] mm, hwpoison: set PG_hwpoison for busy hugetlb pages

From: Naoya Horiguchi <[email protected]>

If memory_failure() fails to grab the page refcount on a hugetlb page
because it's busy, it returns without setting PG_hwpoison on it.
This not only loses a chance of error containment, but also breaks the
rule that action_result() should be called only when memory_failure() does
some handling work (even if that's just setting PG_hwpoison).
This inconsistency could harm code maintainability.

So set PG_hwpoison and call hugetlb_set_page_hwpoison() in such a case.

Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Miaohe Lin <[email protected]>
---
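The retry-once behavior carried by the new flag can be sketched in
userspace as follows. This is a simplified model under stated
assumptions: struct hpage_model, MODEL_MF_NO_RETRY and both functions are
hypothetical stand-ins, and details of the real code such as locking and
the return value of hugetlb_set_page_hwpoison() are left out.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_MF_NO_RETRY (1 << 6)	/* same bit position as MF_NO_RETRY */

struct hpage_model {
	bool busy;	/* the refcount cannot be taken */
	bool hwpoison;	/* stands in for PG_hwpoison */
};

/*
 * Models __get_huge_page_for_hwpoison(): on a busy page, return early
 * unless the caller has already used up its retry.
 */
int get_huge_page_model(struct hpage_model *hp, int flags)
{
	int ret = 0;

	if (hp->busy) {
		ret = -EBUSY;
		if (!(flags & MODEL_MF_NO_RETRY))
			return ret;	/* first attempt: page left untouched */
	}
	hp->hwpoison = true;	/* last attempt: mark it even though busy */
	return ret;
}

/* Models the retry logic in try_memory_failure_hugetlb(). */
int try_memory_failure_model(struct hpage_model *hp, int flags, int *attempts)
{
	int res;

	*attempts = 0;
retry:
	(*attempts)++;
	res = get_huge_page_model(hp, flags);
	if (res == -EBUSY && !(flags & MODEL_MF_NO_RETRY)) {
		flags |= MODEL_MF_NO_RETRY;	/* replaces the local "retry" bool */
		goto retry;
	}
	return res;
}
```

In this model a busy page is tried twice and still ends up with -EBUSY,
but the second attempt sets the hwpoison marker before returning.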
include/linux/mm.h | 1 +
mm/memory-failure.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4287bec50c28..7668831c919f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3188,6 +3188,7 @@ enum mf_flags {
MF_SOFT_OFFLINE = 1 << 3,
MF_UNPOISON = 1 << 4,
MF_SW_SIMULATED = 1 << 5,
+ MF_NO_RETRY = 1 << 6,
};
int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
unsigned long count, int mf_flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8b9c0d228549..f15d521c3f1f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1802,7 +1802,8 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
count_increased = true;
} else {
ret = -EBUSY;
- goto out;
+ if (!(flags & MF_NO_RETRY))
+ goto out;
}

if (hugetlb_set_page_hwpoison(head, page)) {
@@ -1829,7 +1830,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
struct page *p = pfn_to_page(pfn);
struct page *head;
unsigned long page_flags;
- bool retry = true;

*hugetlb = 1;
retry:
@@ -1845,8 +1845,8 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
}
return res;
} else if (res == -EBUSY) {
- if (retry) {
- retry = false;
+ if (!(flags & MF_NO_RETRY)) {
+ flags |= MF_NO_RETRY;
goto retry;
}
action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
--
2.25.1