2021-06-14 02:15:05

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 0/6] mm/hwpoison: fix unpoison_memory()

Hi,

This patchset adjusts unpoison_memory() to the new way of isolating and
refcounting hwpoisoned pages.

As mentioned in [1], NR_FREE_PAGES is incremented via a newly added function
that undoes take_page_off_buddy(), so the counter stays consistent with this series.

This series depends on some other hwpoison-related patches (not yet mainlined,
but in linux-mm), so I list the dependencies here:

- mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
- mm/memory-failure: make sure wait for page writeback in memory_failure
- mm,hwpoison: make get_hwpoison_page() call get_any_page()
- mm,hwpoison: fix race with hugetlb page allocation
- mm,hwpoison: Send SIGBUS with error virtual address
- mm,hwpoison: Return -EHWPOISON to denote that the page has already been poisoned
- mm/memory-failure: Use a mutex to avoid memory_failure() races

Alternatively, the GitHub branch I pushed might be helpful for reviewers who
want easy access to the patches:

git fetch https://github.com/nhoriguchi/linux hwpoison

[1] https://lore.kernel.org/linux-mm/[email protected]/

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (6):
mm/hwpoison: mf_mutex for soft offline and unpoison
mm/hwpoison: remove race consideration
mm/hwpoison: introduce MF_MSG_PAGETABLE
mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
mm/hwpoison: make some kernel pages handlable
mm/hwpoison: fix unpoison_memory()

include/linux/mm.h | 4 +-
include/linux/page-flags.h | 4 ++
include/ras/ras_event.h | 3 +-
mm/memory-failure.c | 176 +++++++++++++++++++++++++--------------------
mm/page_alloc.c | 19 +++++
5 files changed, 126 insertions(+), 80 deletions(-)


2021-06-14 02:15:33

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 4/6] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE

From: Naoya Horiguchi <[email protected]>

These action_page_types are no longer used, so remove them.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/mm.h | 2 --
include/ras/ras_event.h | 2 --
mm/memory-failure.c | 2 --
3 files changed, 6 deletions(-)

diff --git v5.13-rc5/include/linux/mm.h v5.13-rc5_patched/include/linux/mm.h
index 45008654f695..f1e3b82e1a93 100644
--- v5.13-rc5/include/linux/mm.h
+++ v5.13-rc5_patched/include/linux/mm.h
@@ -3105,7 +3105,6 @@ enum mf_action_page_type {
MF_MSG_SLAB,
MF_MSG_PAGETABLE,
MF_MSG_DIFFERENT_COMPOUND,
- MF_MSG_POISONED_HUGE,
MF_MSG_HUGE,
MF_MSG_FREE_HUGE,
MF_MSG_NON_PMD_HUGE,
@@ -3120,7 +3119,6 @@ enum mf_action_page_type {
MF_MSG_CLEAN_LRU,
MF_MSG_TRUNCATED_LRU,
MF_MSG_BUDDY,
- MF_MSG_BUDDY_2ND,
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
MF_MSG_UNKNOWN,
diff --git v5.13-rc5/include/ras/ras_event.h v5.13-rc5_patched/include/ras/ras_event.h
index 2f459f6f87fb..23306428f5e6 100644
--- v5.13-rc5/include/ras/ras_event.h
+++ v5.13-rc5_patched/include/ras/ras_event.h
@@ -359,7 +359,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_SLAB, "kernel slab page" ) \
EM ( MF_MSG_PAGETABLE, "page table page page" ) \
EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
- EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" ) \
EM ( MF_MSG_HUGE, "huge page" ) \
EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" ) \
@@ -374,7 +373,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \
EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \
EM ( MF_MSG_BUDDY, "free buddy page" ) \
- EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" ) \
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index 30d6519ce203..b986936e50eb 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -710,7 +710,6 @@ static const char * const action_page_types[] = {
[MF_MSG_SLAB] = "kernel slab page",
[MF_MSG_PAGETABLE] = "page table page",
[MF_MSG_DIFFERENT_COMPOUND] = "different compound page after locking",
- [MF_MSG_POISONED_HUGE] = "huge page already hardware poisoned",
[MF_MSG_HUGE] = "huge page",
[MF_MSG_FREE_HUGE] = "free huge page",
[MF_MSG_NON_PMD_HUGE] = "non-pmd-sized huge page",
@@ -725,7 +724,6 @@ static const char * const action_page_types[] = {
[MF_MSG_CLEAN_LRU] = "clean LRU page",
[MF_MSG_TRUNCATED_LRU] = "already truncated LRU page",
[MF_MSG_BUDDY] = "free buddy page",
- [MF_MSG_BUDDY_2ND] = "free buddy page (2nd try)",
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
[MF_MSG_UNKNOWN] = "unknown page",
--
2.25.1

2021-06-14 02:15:45

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 5/6] mm/hwpoison: make some kernel pages handlable

From: Naoya Horiguchi <[email protected]>

HWPoisonHandlable() introduced by patch "mm,hwpoison: fix race with hugetlb
page allocation" filters error events by page type, and only limited events
reach get_page_unless_zero() to avoid race.

Actually this is too restrictive, because get_hwpoison_page() always fails
to take a refcount for any type of kernel page, leading to
MF_MSG_KERNEL_HIGH_ORDER. This is not critical (no panic), but less
informative than MF_MSG_SLAB or MF_MSG_PAGETABLE, so extend
HWPoisonHandlable() to some basic types of kernel pages (slab, pgtable,
and reserved pages).

The "handling" for these types is still primitive (just taking a refcount
and setting PG_hwpoison), and more aggressive actions for memory
error containment are possible and wanted. But compared to the older code,
these cases never enter the code block under the page lock (note that
page locking is not well-defined on these pages), so it's a little safer
for functions intended for user pages not to be called on kernel pages.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index b986936e50eb..0d51067f0129 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -1113,7 +1113,8 @@ static int page_action(struct page_state *ps, struct page *p,
*/
static inline bool HWPoisonHandlable(struct page *page)
{
- return PageLRU(page) || __PageMovable(page);
+ return PageLRU(page) || __PageMovable(page) ||
+ PageSlab(page) || PageTable(page) || PageReserved(page);
}

static int __get_hwpoison_page(struct page *page)
@@ -1260,12 +1261,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
struct page *hpage = *hpagep;
bool mlocked = PageMlocked(hpage);

- /*
- * Here we are interested only in user-mapped pages, so skip any
- * other types of pages.
- */
- if (PageReserved(p) || PageSlab(p))
- return true;
if (!(PageLRU(hpage) || PageHuge(p)))
return true;

@@ -1670,7 +1665,10 @@ int memory_failure(unsigned long pfn, int flags)
action_result(pfn, MF_MSG_BUDDY, res);
res = res == MF_RECOVERED ? 0 : -EBUSY;
} else {
- action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
+ if (PageCompound(p))
+ action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
+ else
+ action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
res = -EBUSY;
}
goto unlock_mutex;
@@ -1681,6 +1679,20 @@ int memory_failure(unsigned long pfn, int flags)
}
}

+ if (PageSlab(p)) {
+ action_result(pfn, MF_MSG_SLAB, MF_IGNORED);
+ res = -EBUSY;
+ goto unlock_mutex;
+ } else if (PageTable(p)) {
+ action_result(pfn, MF_MSG_PAGETABLE, MF_IGNORED);
+ res = -EBUSY;
+ goto unlock_mutex;
+ } else if (PageReserved(p)) {
+ action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+ res = -EBUSY;
+ goto unlock_mutex;
+ }
+
if (PageTransHuge(hpage)) {
if (try_to_split_thp_page(p, "Memory Failure") < 0) {
action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
--
2.25.1

2021-06-14 02:15:45

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 1/6] mm/hwpoison: mf_mutex for soft offline and unpoison

From: Naoya Horiguchi <[email protected]>

mf_mutex was originally introduced to serialize multiple MCE events, but
it is also helpful for excluding races among soft_offline_page() and
unpoison_memory(). So apply mf_mutex to them.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)

diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index ae30fd6d575a..280eb6d6dd15 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -1583,6 +1583,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
return rc;
}

+static DEFINE_MUTEX(mf_mutex);
+
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -1609,7 +1611,6 @@ int memory_failure(unsigned long pfn, int flags)
int res = 0;
unsigned long page_flags;
bool retry = true;
- static DEFINE_MUTEX(mf_mutex);

if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -1918,6 +1919,7 @@ int unpoison_memory(unsigned long pfn)
struct page *page;
struct page *p;
int freeit = 0;
+ int ret = 0;
unsigned long flags = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
@@ -1928,28 +1930,30 @@ int unpoison_memory(unsigned long pfn)
p = pfn_to_page(pfn);
page = compound_head(p);

+ mutex_lock(&mf_mutex);
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_count(page) > 1) {
unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_mapped(page)) {
unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_mapping(page)) {
unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

/*
@@ -1960,7 +1964,7 @@ int unpoison_memory(unsigned long pfn)
if (!PageHuge(page) && PageTransHuge(page)) {
unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (!get_hwpoison_page(p, flags)) {
@@ -1968,7 +1972,7 @@ int unpoison_memory(unsigned long pfn)
num_poisoned_pages_dec();
unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

lock_page(page);
@@ -1990,7 +1994,9 @@ int unpoison_memory(unsigned long pfn)
if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
put_page(page);

- return 0;
+unlock_mutex:
+ mutex_unlock(&mf_mutex);
+ return ret;
}
EXPORT_SYMBOL(unpoison_memory);

@@ -2171,6 +2177,8 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}

+ mutex_lock(&mf_mutex);
+
if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
put_ref_page(ref_page);
@@ -2194,5 +2202,7 @@ int soft_offline_page(unsigned long pfn, int flags)
__func__, pfn, page->flags, &page->flags);
}

+ mutex_unlock(&mf_mutex);
+
return ret;
}
--
2.25.1

2021-06-14 02:16:03

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 2/6] mm/hwpoison: remove race consideration

From: Naoya Horiguchi <[email protected]>

Now memory_failure() and unpoison_memory() are protected by mf_mutex,
so there is no need to explicitly check for races between them. Remove
those checks.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/memory-failure.c | 37 -------------------------------------
1 file changed, 37 deletions(-)

diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index 280eb6d6dd15..e7910386fc9c 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -1461,14 +1461,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
lock_page(head);
page_flags = head->flags;

- if (!PageHWPoison(head)) {
- pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
- num_poisoned_pages_dec();
- unlock_page(head);
- put_page(head);
- return 0;
- }
-
/*
* TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
* simply disable it. In order to make it work properly, we need
@@ -1730,16 +1722,6 @@ int memory_failure(unsigned long pfn, int flags)
*/
page_flags = p->flags;

- /*
- * unpoison always clear PG_hwpoison inside page lock
- */
- if (!PageHWPoison(p)) {
- pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
- num_poisoned_pages_dec();
- unlock_page(p);
- put_page(p);
- goto unlock_mutex;
- }
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
@@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}

- /*
- * unpoison_memory() can encounter thp only when the thp is being
- * worked by memory_failure() and the page lock is not held yet.
- * In such case, we yield to memory_failure() and make unpoison fail.
- */
- if (!PageHuge(page) && PageTransHuge(page)) {
- unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
- pfn, &unpoison_rs);
- goto unlock_mutex;
- }
-
if (!get_hwpoison_page(p, flags)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
@@ -1975,20 +1946,12 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}

- lock_page(page);
- /*
- * This test is racy because PG_hwpoison is set outside of page lock.
- * That's acceptable because that won't trigger kernel panic. Instead,
- * the PG_hwpoison page will be caught and isolated on the entrance to
- * the free buddy page pool.
- */
if (TestClearPageHWPoison(page)) {
unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
pfn, &unpoison_rs);
num_poisoned_pages_dec();
freeit = 1;
}
- unlock_page(page);

put_page(page);
if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
--
2.25.1

2021-06-14 02:16:15

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 3/6] mm/hwpoison: introduce MF_MSG_PAGETABLE

From: Naoya Horiguchi <[email protected]>

Page table pages can account for a significant share of system memory,
so define a separate page type message for them.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/mm.h | 1 +
include/ras/ras_event.h | 1 +
mm/memory-failure.c | 1 +
3 files changed, 3 insertions(+)

diff --git v5.13-rc5/include/linux/mm.h v5.13-rc5_patched/include/linux/mm.h
index c274f75efcf9..45008654f695 100644
--- v5.13-rc5/include/linux/mm.h
+++ v5.13-rc5_patched/include/linux/mm.h
@@ -3103,6 +3103,7 @@ enum mf_action_page_type {
MF_MSG_KERNEL,
MF_MSG_KERNEL_HIGH_ORDER,
MF_MSG_SLAB,
+ MF_MSG_PAGETABLE,
MF_MSG_DIFFERENT_COMPOUND,
MF_MSG_POISONED_HUGE,
MF_MSG_HUGE,
diff --git v5.13-rc5/include/ras/ras_event.h v5.13-rc5_patched/include/ras/ras_event.h
index 0bdbc0d17d2f..2f459f6f87fb 100644
--- v5.13-rc5/include/ras/ras_event.h
+++ v5.13-rc5_patched/include/ras/ras_event.h
@@ -357,6 +357,7 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
EM ( MF_MSG_SLAB, "kernel slab page" ) \
+ EM ( MF_MSG_PAGETABLE, "page table page page" ) \
EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" ) \
EM ( MF_MSG_HUGE, "huge page" ) \
diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index e7910386fc9c..30d6519ce203 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -708,6 +708,7 @@ static const char * const action_page_types[] = {
[MF_MSG_KERNEL] = "reserved kernel page",
[MF_MSG_KERNEL_HIGH_ORDER] = "high-order kernel page",
[MF_MSG_SLAB] = "kernel slab page",
+ [MF_MSG_PAGETABLE] = "page table page",
[MF_MSG_DIFFERENT_COMPOUND] = "different compound page after locking",
[MF_MSG_POISONED_HUGE] = "huge page already hardware poisoned",
[MF_MSG_HUGE] = "huge page",
--
2.25.1

2021-06-14 02:16:43

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v1 6/6] mm/hwpoison: fix unpoison_memory()

From: Naoya Horiguchi <[email protected]>

After the recent soft-offline rework, error pages can be taken off the
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation. Moreover, due to the recent change to
__get_hwpoison_page(), get_page_unless_zero() is hardly ever called for
hwpoisoned pages. So __get_hwpoison_page() mostly returns zero (meaning
it fails to grab the page refcount) and unpoison just clears PG_hwpoison
without releasing a refcount. That does not lead to a critical issue
like a kernel panic, but unpoisoned pages never get back to buddy (they
are leaked permanently), which is not good.

To fix this, we need to identify "taken off" pages from other types of
hwpoisoned pages. We can't use refcount or page flags for this purpose,
so a pseudo flag is defined by hacking ->private field.

Sometimes hwpoisoned pages are still in use, in which case the refcount
is greater than 1, so we can't unpoison them immediately and need to wait
until all users release their references.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
include/linux/mm.h | 1 +
include/linux/page-flags.h | 4 ++
mm/memory-failure.c | 84 ++++++++++++++++++++++++++++----------
mm/page_alloc.c | 19 +++++++++
4 files changed, 86 insertions(+), 22 deletions(-)

diff --git v5.13-rc5/include/linux/mm.h v5.13-rc5_patched/include/linux/mm.h
index f1e3b82e1a93..0baf3fc97415 100644
--- v5.13-rc5/include/linux/mm.h
+++ v5.13-rc5_patched/include/linux/mm.h
@@ -3077,6 +3077,7 @@ enum mf_flags {
MF_ACTION_REQUIRED = 1 << 1,
MF_MUST_KILL = 1 << 2,
MF_SOFT_OFFLINE = 1 << 3,
+ MF_UNPOISON = 1 << 4,
};
extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git v5.13-rc5/include/linux/page-flags.h v5.13-rc5_patched/include/linux/page-flags.h
index 04a34c08e0a6..00bddb3a058d 100644
--- v5.13-rc5/include/linux/page-flags.h
+++ v5.13-rc5_patched/include/linux/page-flags.h
@@ -430,7 +430,11 @@ PAGEFLAG_FALSE(Uncached)
PAGEFLAG(HWPoison, hwpoison, PF_ANY)
TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
#define __PG_HWPOISON (1UL << PG_hwpoison)
+#define MAGIC_HWPOISON 0x4857504f49534f4e
+extern void SetPageHWPoisonTakenOff(struct page *page);
+extern void ClearPageHWPoisonTakenOff(struct page *page);
extern bool take_page_off_buddy(struct page *page);
+extern void take_page_back_buddy(struct page *page);
#else
PAGEFLAG_FALSE(HWPoison)
#define __PG_HWPOISON 0
diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
index 0d51067f0129..41b0ef96e2aa 100644
--- v5.13-rc5/mm/memory-failure.c
+++ v5.13-rc5_patched/mm/memory-failure.c
@@ -1105,6 +1105,22 @@ static int page_action(struct page_state *ps, struct page *p,
return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}

+static inline bool PageHWPoisonTakenOff(struct page *page)
+{
+ return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
+}
+
+void SetPageHWPoisonTakenOff(struct page *page)
+{
+ set_page_private(page, MAGIC_HWPOISON);
+}
+
+void ClearPageHWPoisonTakenOff(struct page *page)
+{
+ if (PageHWPoison(page))
+ set_page_private(page, 0);
+}
+
/*
* Return true if a page type of a given page is supported by hwpoison
* mechanism (while handling could fail), otherwise false. This function
@@ -1117,7 +1133,7 @@ static inline bool HWPoisonHandlable(struct page *page)
PageSlab(page) || PageTable(page) || PageReserved(page);
}

-static int __get_hwpoison_page(struct page *page)
+static int __get_hwpoison_page(struct page *page, unsigned long flags)
{
struct page *head = compound_head(page);
int ret = 0;
@@ -1127,12 +1143,19 @@ static int __get_hwpoison_page(struct page *page)
if (hugetlb)
return ret;

+ /*
+ * Finding taken-off pages, so it could be true only when called
+ * from unpoison_memory().
+ */
+ if (PageHWPoisonTakenOff(head))
+ return -EHWPOISON;
+
/*
* This check prevents from calling get_hwpoison_unless_zero()
* for any unsupported type of page in order to reduce the risk of
* unexpected races caused by taking a page refcount.
*/
- if (!HWPoisonHandlable(head))
+ if (!(flags & MF_UNPOISON) && !HWPoisonHandlable(head))
return 0;

if (PageTransHuge(head)) {
@@ -1171,7 +1194,7 @@ static int get_any_page(struct page *p, unsigned long flags)

try_again:
if (!count_increased) {
- ret = __get_hwpoison_page(p);
+ ret = __get_hwpoison_page(p, flags);
if (!ret) {
if (page_count(p)) {
/* We raced with an allocation, retry. */
@@ -1190,6 +1213,8 @@ static int get_any_page(struct page *p, unsigned long flags)
if (pass++ < 3)
goto try_again;
goto out;
+ } else if (ret == -EHWPOISON) {
+ goto out;
}
}

@@ -1233,14 +1258,18 @@ static int get_any_page(struct page *p, unsigned long flags)
* 1 on success for in-use pages in a well-defined state,
* -EIO for pages on which we can not handle memory errors,
* -EBUSY when get_hwpoison_page() has raced with page lifecycle
- * operations like allocation and free.
+ * operations like allocation and free,
+ * -EHWPOISON when the page is hwpoisoned and taken off from buddy.
*/
static int get_hwpoison_page(struct page *p, unsigned long flags)
{
int ret;

zone_pcp_disable(page_zone(p));
- ret = get_any_page(p, flags);
+ if (flags & MF_UNPOISON)
+ ret = __get_hwpoison_page(p, flags);
+ else
+ ret = get_any_page(p, flags);
zone_pcp_enable(page_zone(p));

return ret;
@@ -1895,6 +1924,17 @@ core_initcall(memory_failure_init);
pr_info(fmt, pfn); \
})

+static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
+{
+ if (TestClearPageHWPoison(p)) {
+ unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+ page_to_pfn(p), rs);
+ num_poisoned_pages_dec();
+ return 1;
+ }
+ return 0;
+}
+
/**
* unpoison_memory - Unpoison a previously poisoned page
* @pfn: Page number of the to be unpoisoned page
@@ -1911,9 +1951,7 @@ int unpoison_memory(unsigned long pfn)
{
struct page *page;
struct page *p;
- int freeit = 0;
int ret = 0;
- unsigned long flags = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

@@ -1949,24 +1987,26 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}

- if (!get_hwpoison_page(p, flags)) {
- if (TestClearPageHWPoison(p))
- num_poisoned_pages_dec();
- unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
- pfn, &unpoison_rs);
+ ret = get_hwpoison_page(p, MF_UNPOISON);
+ if (!ret) {
+ clear_page_hwpoison(&unpoison_rs, p);
goto unlock_mutex;
- }
-
- if (TestClearPageHWPoison(page)) {
- unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
- pfn, &unpoison_rs);
- num_poisoned_pages_dec();
- freeit = 1;
- }
+ } else if (ret < 0) {
+ if (ret == -EHWPOISON) {
+ clear_page_hwpoison(&unpoison_rs, p);
+ take_page_back_buddy(p);
+ ret = 0;
+ } else
+ unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
+ pfn, &unpoison_rs);
+ goto unlock_mutex;
+ } else {
+ int freeit = clear_page_hwpoison(&unpoison_rs, p);

- put_page(page);
- if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
put_page(page);
+ if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
+ put_page(page);
+ }

unlock_mutex:
mutex_unlock(&mf_mutex);
diff --git v5.13-rc5/mm/page_alloc.c v5.13-rc5_patched/mm/page_alloc.c
index d1f5de1c1283..325e91d92b7e 100644
--- v5.13-rc5/mm/page_alloc.c
+++ v5.13-rc5_patched/mm/page_alloc.c
@@ -9158,6 +9158,7 @@ bool take_page_off_buddy(struct page *page)
del_page_from_free_list(page_head, zone, page_order);
break_down_buddy_pages(zone, page_head, page, 0,
page_order, migratetype);
+ SetPageHWPoisonTakenOff(page);
if (!is_migrate_isolate(migratetype))
__mod_zone_freepage_state(zone, -1, migratetype);
ret = true;
@@ -9169,4 +9170,22 @@ bool take_page_off_buddy(struct page *page)
spin_unlock_irqrestore(&zone->lock, flags);
return ret;
}
+
+/*
+ * Cancel takeoff done by take_page_off_buddy().
+ */
+void take_page_back_buddy(struct page *page)
+{
+ struct zone *zone = page_zone(page);
+ unsigned long pfn = page_to_pfn(page);
+ unsigned long flags;
+ int migratetype = get_pfnblock_migratetype(page, pfn);
+
+ spin_lock_irqsave(&zone->lock, flags);
+ if (put_page_testzero(page)) {
+ ClearPageHWPoisonTakenOff(page);
+ __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+}
#endif
--
2.25.1

2021-06-14 03:09:25

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 3/6] mm/hwpoison: introduce MF_MSG_PAGETABLE

On Mon, Jun 14, 2021 at 11:12:09AM +0900, Naoya Horiguchi wrote:
> +++ v5.13-rc5_patched/include/ras/ras_event.h
> @@ -357,6 +357,7 @@ TRACE_EVENT(aer_event,
> EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
> EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
> EM ( MF_MSG_SLAB, "kernel slab page" ) \
> + EM ( MF_MSG_PAGETABLE, "page table page page" ) \

Did you intend to double the word "page"?

> +++ v5.13-rc5_patched/mm/memory-failure.c
> @@ -708,6 +708,7 @@ static const char * const action_page_types[] = {
> [MF_MSG_KERNEL] = "reserved kernel page",
> [MF_MSG_KERNEL_HIGH_ORDER] = "high-order kernel page",
> [MF_MSG_SLAB] = "kernel slab page",
> + [MF_MSG_PAGETABLE] = "page table page",

... you didn't here.

2021-06-15 11:50:35

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 1/6] mm/hwpoison: mf_mutex for soft offline and unpoison

On 2021/6/14 10:12, Naoya Horiguchi wrote:

>
> @@ -2171,6 +2177,8 @@ int soft_offline_page(unsigned long pfn, int flags)
> return -EIO;
> }
>
> + mutex_lock(&mf_mutex);
> +
> if (PageHWPoison(page)) {
> pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
> put_ref_page(ref_page);

Did you miss mutex_unlock() here when page already poisoned ?

> @@ -2194,5 +2202,7 @@ int soft_offline_page(unsigned long pfn, int flags)
> __func__, pfn, page->flags, &page->flags);
> }
>
> + mutex_unlock(&mf_mutex);
> +
> return ret;
> }
>

--
Thanks,
- Ding Hui

2021-06-15 12:57:58

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 2/6] mm/hwpoison: remove race consideration

On 2021/6/14 10:12, Naoya Horiguchi wrote:
> @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn)
> goto unlock_mutex;
> }
>
> - /*
> - * unpoison_memory() can encounter thp only when the thp is being
> - * worked by memory_failure() and the page lock is not held yet.
> - * In such case, we yield to memory_failure() and make unpoison fail.
> - */
> - if (!PageHuge(page) && PageTransHuge(page)) {
> - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
> - pfn, &unpoison_rs);
> - goto unlock_mutex;
> - }
> -

If a huge page is in the process of being allocated or freed,
HUGETLB_PAGE_DTOR can be set after __SetPageHead() or cleared before
__ClearPageHead(), so this condition may be true under that race.

Do we still need this check for that racy situation?

> if (!get_hwpoison_page(p, flags)) {
> if (TestClearPageHWPoison(p))
> num_poisoned_pages_dec();
--
Thanks,
- Ding Hui

2021-06-16 00:41:52

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 2/6] mm/hwpoison: remove race consideration

On 2021/6/16 8:11, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Jun 15, 2021 at 08:57:06PM +0800, Ding Hui wrote:
>> On 2021/6/14 10:12, Naoya Horiguchi wrote:
>>> @@ -1956,17 +1938,6 @@ int unpoison_memory(unsigned long pfn)
>>> goto unlock_mutex;
>>> }
>>> - /*
>>> - * unpoison_memory() can encounter thp only when the thp is being
>>> - * worked by memory_failure() and the page lock is not held yet.
>>> - * In such case, we yield to memory_failure() and make unpoison fail.
>>> - */
>>> - if (!PageHuge(page) && PageTransHuge(page)) {
>>> - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
>>> - pfn, &unpoison_rs);
>>> - goto unlock_mutex;
>>> - }
>>> -
>>
>> if a huge page is in process of alloc or free, HUGETLB_PAGE_DTOR can be set
>> after __SetPageHead() or be cleared before __ClearPageHead(), so this
>> condition may be true in racy.
>
> Hi Ding,
>
> We confirm PageHWPoison() before reaching this if-block and hwpoisoned pages
> are prohibited from allocation, so it seems to me that this check never
> races with hugetlb allocation.
>
> And according to the original patch introduced this if-block (0cea3fdc416d:
> "mm/hwpoison: fix race against poison thp"), this if-block intended to close
> the race between memory_failure() and unpoison_memory(), so that's no longer
> necessary due to mf_mutex.
>

I got it and thanks for your explanation.

>> Do we need the racy test for this situation?
>
> I'm not sure, but I think that we need more stress/fuzz testing focusing on
> this subsystem, and "unpoison vs allocation" race can be covered in the topic.
>
> Thank you,
> Naoya Horiguchi
>


--
Thanks,
- Ding Hui

2021-06-17 10:16:13

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 6/6] mm/hwpoison: fix unpoison_memory()

On 2021/6/14 10:12, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> After recent soft-offline rework, error pages can be taken off from
> buddy allocator, but the existing unpoison_memory() does not properly
> undo the operation. Moreover, due to the recent change on
> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> hwpoisoned pages. So __get_hwpoison_page() mostly returns zero (meaning
> to fail to grab page refcount) and unpoison just clears PG_hwpoison
> without releasing a refcount. That does not lead to a critical issue
> like kernel panic, but unpoisoned pages never get back to buddy (leaked
> permanently), which is not good.

As I mentioned in [1], I'm not sure about the exact meaning of "broken"
in unpoison_memory().

Maybe the misunderstanding is:

I think __get_hwpoison_page() mostly returns one for hwpoisoned pages.
In 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages"),
page_handle_poison() was introduced; it takes a refcount on every
soft-offlined hwpoison page.
In memory_failure() for hard-offline, page_ref_inc() is called on free
pages too, and for used pages we do not call put_page() after
get_hwpoison_page() != 0.
So every hwpoisoned page's refcount must be greater than zero at
unpoison_memory() time, races aside.

Recently I tested loop soft-offlining random pages and unpoisoning them
for days; it works fine for me (with bac9c6fa1f92 applied).

[1]:
https://lore.kernel.org/lkml/[email protected]/

>
> To fix this, we need to identify "taken off" pages from other types of
> hwpoisoned pages. We can't use refcount or page flags for this purpose,
> so a pseudo flag is defined by hacking ->private field.
>
> Sometimes hwpoisoned pages can be still in-use, where the refcount should
> be more than 1, so we can't unpoison them immediately and need to wait
> until the all users release their refcount.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---


--
Thanks,
- Ding Hui

2021-06-19 20:38:40

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 6/6] mm/hwpoison: fix unpoison_memory()

On 2021/6/18 4:36 PM, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Jun 17, 2021 at 06:00:21PM +0800, Ding Hui wrote:
>> On 2021/6/14 10:12, Naoya Horiguchi wrote:
>>> From: Naoya Horiguchi <[email protected]>
>>>
>>> After recent soft-offline rework, error pages can be taken off from
>>> buddy allocator, but the existing unpoison_memory() does not properly
>>> undo the operation. Moreover, due to the recent change on
>>> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
>>> hwpoisoned pages. So __get_hwpoison_page() mostly returns zero (meaning
>>> to fail to grab page refcount) and unpoison just clears PG_hwpoison
>>> without releasing a refcount. That does not lead to a critical issue
>>> like kernel panic, but unpoisoned pages never get back to buddy (leaked
>>> permanently), which is not good.
>>
>> As I mention in [1], I'm not sure about the exactly meaning of "broken" in
>> unpoison_memory().
>>
>> Maybe the misunderstanding is:
>>
>> I think __get_hwpoison_page() mostly returns one for hwpoisoned pages.
>> 06be6ff3d2ec ("mm,hwpoison: rework soft offline for free pages")
>> introduced page_handle_poison(), which takes a refcount on every
>> soft-offlined hwpoison page.
>> In memory_failure() for hard offline, page_ref_inc() is called on free
>> pages too, and for in-use pages we do not call put_page() after
>> get_hwpoison_page() returns non-zero.
>> So every hwpoisoned page's refcount must be greater than zero when
>> unpoison_memory() is called, regardless of races.
>
> Hi, Ding,
>
> Thanks for the comment. I feel that I failed to define the exact issue in
> unpoison. Maybe I saw some random error while developing other hwpoison
> patches and misinterpreted it as an unpoison issue, so please don't take
> my previous wrong word "broken" too seriously, sorry about that.
>
> Anyway, I will reconsider how to handle this 6/6; it should get a clearer
> description of the problem and be simplified.
>
>>
>> Recently I have been looping a test that soft-offlines random pages and
>> unpoisons them for days, and it works fine for me (with bac9c6fa1f92
>> applied).
>
> Thank you for testing,
>

Hi Naoya,

I'm afraid my description of the testing is ambiguous for others, so let
me clarify: I ran the stress soft-offline test case from the mce-test
project (https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git) for
days to verify my NR_FREE_PAGES change (bac9c6fa1f92), without your
current patchset. The case soft-offlines random pages in a loop and
unpoisons them, and it works basically fine for me.

--
Thanks,
-dinghui

2021-07-28 11:10:05

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v1 5/6] mm/hwpoison: make some kernel pages handlable

On 2021/6/14 10:12, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> HWPoisonHandlable(), introduced by the patch "mm,hwpoison: fix race with
> hugetlb page allocation", filters error events by page type, and only
> limited events reach get_page_unless_zero() to avoid races.
>

I want to report a bug related to "mm,hwpoison: fix race with hugetlb
page allocation", hugetlb PMD sharing, and also this patch.

Recently, while testing hugetlb and soft offline, I encountered a crash
like this:
[449901.638605] huge_test[16596]: segfault at 8 ip 00007f5f64c39a12 sp
00007fff2105c020 error 4 in ld-2.23.so[7f5f64c2a000+26000]
[449901.638612] Code: 48 8d 35 2c 03 01 00 48 8d 3d 31 03 01 00 ba b5 00
00 00 e8 f0 a5 00 00 53 49 89 fa 89 f6 48 8d 14 76 48 83 ec 10 48 8b 47
68 <48> 8b 78 08 49 8b 82 f8 00 00 00 48 8b 40 08 4c 8d 04 d0 49 8b 42
[449901.638885] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:0 val:358
[449901.638894] ------------[ cut here ]------------
[449901.638962] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:1 val:26
[449901.638966] BUG: non-zero pgtables_bytes on freeing mm: 28672
[449901.639045] kernel BUG at fs/hugetlbfs/inode.c:443!
[449901.639193] invalid opcode: 0000 [#1] SMP NOPTI

After a few days of digging and reproducing, it turns out that there is a
mechanism conflict between get_hwpoison_page() and hugetlb PMD sharing:

In huge_pmd_unshare(), the page count is used to determine whether the
PMD page is shared, which is not safe.

In my case, the page's refcount was increased by get_hwpoison_page()
shortly before the if (page_count(virt_to_page(ptep)) == 1) check in
huge_pmd_unshare(), so huge_pmd_unshare() took the wrong branch.

> Actually this is too restrictive because get_hwpoison_page always fails
> to take a refcount for any type of kernel page, leading to
> MF_MSG_KERNEL_HIGH_ORDER. This is not critical (no panic), but less
> informative than MF_MSG_SLAB or MF_MSG_PAGETABLE, so extend
> HWPoisonHandlable() to some basic types of kernel pages (slab, pgtable,
> and reserved pages).
>

After "mm,hwpoison: fix race with hugetlb page allocation", PageTable
pages were blocked from reaching get_page_unless_zero() by that
"restrictive" check, so this bug happened to be fixed as a side effect.

> The "handling" for these types is still primitive (just taking a refcount
> and setting PG_hwpoison), and some more aggressive actions for memory
> error containment are possible and wanted. But compared to the older code,
> these cases never enter the code block under page locks (note that
> page locking is not well-defined on these pages), so it's a little safer
> that functions intended for user pages are not called for kernel pages.
>

But the root cause still exists, and the bug may come back unnoticed at
any time. For example, with this patch, if PageTable pages are allowed
into get_page_unless_zero(), the risk comes back.

I'm not sure whether there is another way to determine if the PMD page
is shared, so I'm adding Mike Kravetz here to report the risk to you.

> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> mm/memory-failure.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
> index b986936e50eb..0d51067f0129 100644
> --- v5.13-rc5/mm/memory-failure.c
> +++ v5.13-rc5_patched/mm/memory-failure.c
> @@ -1113,7 +1113,8 @@ static int page_action(struct page_state *ps, struct page *p,
> */
> static inline bool HWPoisonHandlable(struct page *page)
> {
> - return PageLRU(page) || __PageMovable(page);
> + return PageLRU(page) || __PageMovable(page) ||
> + PageSlab(page) || PageTable(page) || PageReserved(page);
> }
>
> static int __get_hwpoison_page(struct page *page)
> @@ -1260,12 +1261,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> struct page *hpage = *hpagep;
> bool mlocked = PageMlocked(hpage);
>
> - /*
> - * Here we are interested only in user-mapped pages, so skip any
> - * other types of pages.
> - */
> - if (PageReserved(p) || PageSlab(p))
> - return true;
> if (!(PageLRU(hpage) || PageHuge(p)))
> return true;
>
> @@ -1670,7 +1665,10 @@ int memory_failure(unsigned long pfn, int flags)
> action_result(pfn, MF_MSG_BUDDY, res);
> res = res == MF_RECOVERED ? 0 : -EBUSY;
> } else {
> - action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> + if (PageCompound(p))
> + action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> + else
> + action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> res = -EBUSY;
> }
> goto unlock_mutex;
> @@ -1681,6 +1679,20 @@ int memory_failure(unsigned long pfn, int flags)
> }
> }
>
> + if (PageSlab(p)) {
> + action_result(pfn, MF_MSG_SLAB, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + } else if (PageTable(p)) {
> + action_result(pfn, MF_MSG_PAGETABLE, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + } else if (PageReserved(p)) {
> + action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + }
> +
> if (PageTransHuge(hpage)) {
> if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>


--
Thanks,
- Ding Hui