From: Naoya Horiguchi <[email protected]>
After a memory error happens on a clean folio, a process unexpectedly
receives SIGBUS when it accesses to the error page. This SIGBUS killing
is pointless and simply degrades the level of RAS of the system, because
the clean folio can be dropped without any data lost on memory error
handling as we do for a clean pagecache.
When memory_failure() is called on a clean folio, try_to_unmap() is called
twice (one from split_huge_page() and one from hwpoison_user_mappings()).
The root cause of the issue is that pte conversion to hwpoisoned entry is
now done in the first call of try_to_unmap() because PageHWPoison is already
set at this point, while it's actually expected to be done in the second
call. This behavior disturbs the error handling operation like removing
pagecache, which results in the malfunction described above.
So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
when we really intend to convert pte to hwpoison entry. This can prevent
other callers of try_to_unmap() from accidentally converting to hwpoison
entries.
Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()")
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: [email protected] # 5.18+
---
include/linux/rmap.h | 2 +-
mm/memory-failure.c | 8 ++++----
mm/rmap.c | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a4570da03e58..b87d01660412 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -94,7 +94,7 @@ enum ttu_flags {
TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */
TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */
TTU_SYNC = 0x10, /* avoid racy checks with PVMW_SYNC */
- TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
+ TTU_HWPOISON = 0x20, /* do convert pte to hwpoison entry */
TTU_BATCH_FLUSH = 0x40, /* Batch TLB flushes where possible
* and caller guarantees they will
* do a final flush if necessary */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a1ede7bdce95..fae9baf3be16 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1069,7 +1069,7 @@ static int me_pagecache_dirty(struct page_state *ps, struct page *p)
* cache and swap cache(ie. page is freshly swapped in). So it could be
* referenced concurrently by 2 types of PTEs:
* normal PTEs and swap PTEs. We try to handle them consistently by calling
- * try_to_unmap(TTU_IGNORE_HWPOISON) to convert the normal PTEs to swap PTEs,
+ * try_to_unmap(!TTU_HWPOISON) to convert the normal PTEs to swap PTEs,
* and then
* - clear dirty bit to prevent IO
* - remove from LRU
@@ -1486,7 +1486,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
int flags, struct page *hpage)
{
struct folio *folio = page_folio(hpage);
- enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC;
+ enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
struct address_space *mapping;
LIST_HEAD(tokill);
bool unmap_success;
@@ -1516,7 +1516,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
if (PageSwapCache(p)) {
pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
- ttu |= TTU_IGNORE_HWPOISON;
+ ttu &= ~TTU_HWPOISON;
}
/*
@@ -1531,7 +1531,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
if (page_mkclean(hpage)) {
SetPageDirty(hpage);
} else {
- ttu |= TTU_IGNORE_HWPOISON;
+ ttu &= ~TTU_HWPOISON;
pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
pfn);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 15ae24585fc4..8632e02661ac 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1602,7 +1602,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
/* Update high watermark before we lower rss */
update_hiwater_rss(mm);
- if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
+ if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
if (folio_test_hugetlb(folio)) {
hugetlb_count_sub(folio_nr_pages(folio), mm);
--
2.25.1
On Tue, Feb 21, 2023 at 05:59:05PM +0900, Naoya Horiguchi wrote:
> After a memory error happens on a clean folio, a process unexpectedly
> receives SIGBUS when it accesses to the error page. This SIGBUS killing
> is pointless and simply degrades the level of RAS of the system, because
> the clean folio can be dropped without any data lost on memory error
> handling as we do for a clean pagecache.
>
> When memory_failure() is called on a clean folio, try_to_unmap() is called
> twice (one from split_huge_page() and one from hwpoison_user_mappings()).
> The root cause of the issue is that pte conversion to hwpoisoned entry is
> now done in the first call of try_to_unmap() because PageHWPoison is already
> set at this point, while it's actually expected to be done in the second
> call. This behavior disturbs the error handling operation like removing
> pagecache, which results in the malfunction described above.
>
> So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
> when we really intend to convert pte to hwpoison entry. This can prevent
> other callers of try_to_unmap() from accidentally converting to hwpoison
> entries.
>
> Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()")
How did you choose this Fixes tag?
On Tue, Feb 21, 2023 at 01:35:39PM +0000, Matthew Wilcox wrote:
> On Tue, Feb 21, 2023 at 05:59:05PM +0900, Naoya Horiguchi wrote:
> > After a memory error happens on a clean folio, a process unexpectedly
> > receives SIGBUS when it accesses to the error page. This SIGBUS killing
> > is pointless and simply degrades the level of RAS of the system, because
> > the clean folio can be dropped without any data lost on memory error
> > handling as we do for a clean pagecache.
> >
> > When memory_failure() is called on a clean folio, try_to_unmap() is called
> > twice (one from split_huge_page() and one from hwpoison_user_mappings()).
> > The root cause of the issue is that pte conversion to hwpoisoned entry is
> > now done in the first call of try_to_unmap() because PageHWPoison is already
> > set at this point, while it's actually expected to be done in the second
> > call. This behavior disturbs the error handling operation like removing
> > pagecache, which results in the malfunction described above.
> >
> > So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only
> > when we really intend to convert pte to hwpoison entry. This can prevent
> > other callers of try_to_unmap() from accidentally converting to hwpoison
> > entries.
> >
> > Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()")
>
> How did you choose this Fixes tag?
I thought that before this commit thps are anonymous thps or shmem thps,
both of which are considered as dirty thps (with no backup on storage).
The reported problem affects the case of clean folio, so I thought that
it got visible since we can have clean folios.
But in my second thought, the wrong pte conversion could also happen on
generic thp split (that happened to have no effect on visible behavior),
so I should've set Fixes tag to older commit?
Thanks,
Naoya Horiguchi