2021-11-15 08:41:13

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v4 0/3] mm/hwpoison: fix unpoison_memory()

Hi,

I updated the unpoison patchset based on feedbacks for v3.

I fixed a typo in patch description and closed raced in
unpoison_taken_off_page() reported by Yang Shi. As for the build
failure reported from build bot, I commented about disabling
X86_SUPPORTS_MEMORY_FAILURE for i386 at first, but I'm not 100% sure
that someone could use the subsystem for 32 bit system, so I shift to
the easier way (setting MAGIC_HWPOISON to 32-bit integer value).

----- (cover letter copied from v2) -----
Main purpose of this series is to sync unpoison code to recent changes
around how hwpoison code takes page refcount. Unpoison should work or
simply fail (without crash) if impossible.

The recent works of keeping hwpoison pages in shmem pagecache introduce
a new state of hwpoisoned pages, but unpoison for such pages is not
supported yet with this series.

It seems that soft-offline and unpoison can be used as general purpose
page offline/online mechanism (not in the context of memory error). I
think that we need some additional works to realize it because currently
soft-offline and unpoison are assumed not to happen so frequently
(print out too many messages for aggressive usecases). But anyway this
could be another interesting next topic.

v1: https://lore.kernel.org/linux-mm/[email protected]/
v2: https://lore.kernel.org/linux-mm/[email protected]/
v3: https://lore.kernel.org/linux-mm/[email protected]/

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (3):
mm/hwpoison: mf_mutex for soft offline and unpoison
mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE
mm/hwpoison: fix unpoison_memory()

include/linux/mm.h | 3 +-
include/linux/page-flags.h | 4 ++
include/ras/ras_event.h | 2 -
mm/memory-failure.c | 171 ++++++++++++++++++++++++++++-----------------
mm/page_alloc.c | 27 +++++++
5 files changed, 139 insertions(+), 68 deletions(-)


2021-11-15 08:41:29

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v4 1/3] mm/hwpoison: mf_mutex for soft offline and unpoison

From: Naoya Horiguchi <[email protected]>

Originally mf_mutex is introduced to serialize multiple MCE events, but
it is not that useful to allow unpoison to run in parallel with memory_failure()
and soft offline. So apply mf_mutex to soft offline and unpoison.
The memory failure handler and soft offline handler get simpler with this.

Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
---
ChangeLog v4:
- fix type in commit description.

ChangeLog v3:
- merge with "mm/hwpoison: remove race consideration"
- update description

ChangeLog v2:
- add mutex_unlock() in "page already poisoned" path in soft_offline_page().
(Thanks to Ding Hui)
---
mm/memory-failure.c | 62 +++++++++++++--------------------------------
1 file changed, 18 insertions(+), 44 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e8c38e27b753..d29c79de6034 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
lock_page(head);
page_flags = head->flags;

- if (!PageHWPoison(head)) {
- pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
- num_poisoned_pages_dec();
- unlock_page(head);
- put_page(head);
- return 0;
- }
-
/*
* TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
* simply disable it. In order to make it work properly, we need
@@ -1628,6 +1620,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
return rc;
}

+static DEFINE_MUTEX(mf_mutex);
+
/**
* memory_failure - Handle memory failure of a page.
* @pfn: Page Number of the corrupted page
@@ -1654,7 +1648,6 @@ int memory_failure(unsigned long pfn, int flags)
int res = 0;
unsigned long page_flags;
bool retry = true;
- static DEFINE_MUTEX(mf_mutex);

if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
@@ -1788,16 +1781,6 @@ int memory_failure(unsigned long pfn, int flags)
*/
page_flags = p->flags;

- /*
- * unpoison always clear PG_hwpoison inside page lock
- */
- if (!PageHWPoison(p)) {
- pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
- num_poisoned_pages_dec();
- unlock_page(p);
- put_page(p);
- goto unlock_mutex;
- }
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
@@ -1978,6 +1961,7 @@ int unpoison_memory(unsigned long pfn)
struct page *page;
struct page *p;
int freeit = 0;
+ int ret = 0;
unsigned long flags = 0;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
@@ -1988,39 +1972,30 @@ int unpoison_memory(unsigned long pfn)
p = pfn_to_page(pfn);
page = compound_head(p);

+ mutex_lock(&mf_mutex);
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_count(page) > 1) {
unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_mapped(page)) {
unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (page_mapping(page)) {
unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n",
pfn, &unpoison_rs);
- return 0;
- }
-
- /*
- * unpoison_memory() can encounter thp only when the thp is being
- * worked by memory_failure() and the page lock is not held yet.
- * In such case, we yield to memory_failure() and make unpoison fail.
- */
- if (!PageHuge(page) && PageTransHuge(page)) {
- unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n",
- pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

if (!get_hwpoison_page(p, flags)) {
@@ -2028,29 +2003,23 @@ int unpoison_memory(unsigned long pfn)
num_poisoned_pages_dec();
unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
pfn, &unpoison_rs);
- return 0;
+ goto unlock_mutex;
}

- lock_page(page);
- /*
- * This test is racy because PG_hwpoison is set outside of page lock.
- * That's acceptable because that won't trigger kernel panic. Instead,
- * the PG_hwpoison page will be caught and isolated on the entrance to
- * the free buddy page pool.
- */
if (TestClearPageHWPoison(page)) {
unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
pfn, &unpoison_rs);
num_poisoned_pages_dec();
freeit = 1;
}
- unlock_page(page);

put_page(page);
if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
put_page(page);

- return 0;
+unlock_mutex:
+ mutex_unlock(&mf_mutex);
+ return ret;
}
EXPORT_SYMBOL(unpoison_memory);

@@ -2231,9 +2200,12 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}

+ mutex_lock(&mf_mutex);
+
if (PageHWPoison(page)) {
pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
put_ref_page(ref_page);
+ mutex_unlock(&mf_mutex);
return 0;
}

@@ -2251,5 +2223,7 @@ int soft_offline_page(unsigned long pfn, int flags)
}
}

+ mutex_unlock(&mf_mutex);
+
return ret;
}
--
2.25.1


2021-11-15 08:41:49

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v4 3/3] mm/hwpoison: fix unpoison_memory()

From: Naoya Horiguchi <[email protected]>

After recent soft-offline rework, error pages can be taken off from
buddy allocator, but the existing unpoison_memory() does not properly
undo the operation. Moreover, due to the recent change on
__get_hwpoison_page(), get_page_unless_zero() is hardly called for
hwpoisoned pages. So __get_hwpoison_page() highly likely returns -EBUSY
(meaning to fail to grab page refcount) and unpoison just clears
PG_hwpoison without releasing a refcount. That does not lead to a
critical issue like kernel panic, but unpoisoned pages never get back to
buddy (leaked permanently), which is not good.

To (partially) fix this, we need to identify "taken off" pages from
other types of hwpoisoned pages. We can't use refcount or page flags
for this purpose, so a pseudo flag is defined by hacking ->private
field. Someone might think that put_page() is enough to cancel
taken-off pages, but the normal free path contains some operations not
suitable for the current purpose, and can fire VM_BUG_ON().

Note that unpoison_memory() is now supposed to be cancel hwpoison events
injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
not by MCE injection, so please don't try to use unpoison when testing
with MCE injection.

[[email protected]: report build failure for ARCH=i386]
Signed-off-by: Naoya Horiguchi <[email protected]>
---
ChangeLog v4:
- use integer value for MAGIC_HWPOISON to avoid compile error for ARCH=i386
- close race in unpoison_taken_off_page()

ChangeLog v3:
- fix description
- add PageTable check in unpoison_memory()
- fix return value of clear_page_hwpoison()
- pass page instead head to PageHWPoisonTakenOff check.
- rename take_page_back_buddy() with put_page_back_buddy()

ChangeLog v2:
- unpoison_memory() returns as commented
- explicitly avoids unpoisoning slab pages
- separates internal pinning function into __get_unpoison_page()
---
include/linux/mm.h | 1 +
include/linux/page-flags.h | 4 ++
mm/memory-failure.c | 109 ++++++++++++++++++++++++++++++-------
mm/page_alloc.c | 27 +++++++++
4 files changed, 122 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7941bca871dc..8bbb7205ef9f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3220,6 +3220,7 @@ enum mf_flags {
MF_ACTION_REQUIRED = 1 << 1,
MF_MUST_KILL = 1 << 2,
MF_SOFT_OFFLINE = 1 << 3,
+ MF_UNPOISON = 1 << 4,
};
extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 52ec4b5e5615..a4fe056910bb 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
PAGEFLAG(HWPoison, hwpoison, PF_ANY)
TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
#define __PG_HWPOISON (1UL << PG_hwpoison)
+#define MAGIC_HWPOISON 0x48575053U /* HWPS */
+extern void SetPageHWPoisonTakenOff(struct page *page);
+extern void ClearPageHWPoisonTakenOff(struct page *page);
extern bool take_page_off_buddy(struct page *page);
+extern bool put_page_back_buddy(struct page *page);
#else
PAGEFLAG_FALSE(HWPoison, hwpoison)
#define __PG_HWPOISON 0
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 722036539b44..0f8b798cba69 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}

+static inline bool PageHWPoisonTakenOff(struct page *page)
+{
+ return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
+}
+
+void SetPageHWPoisonTakenOff(struct page *page)
+{
+ set_page_private(page, MAGIC_HWPOISON);
+}
+
+void ClearPageHWPoisonTakenOff(struct page *page)
+{
+ if (PageHWPoison(page))
+ set_page_private(page, 0);
+}
+
/*
* Return true if a page type of a given page is supported by hwpoison
* mechanism (while handling could fail), otherwise false. This function
@@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
return ret;
}

+static int __get_unpoison_page(struct page *page)
+{
+ struct page *head = compound_head(page);
+ int ret = 0;
+ bool hugetlb = false;
+
+ ret = get_hwpoison_huge_page(head, &hugetlb);
+ if (hugetlb)
+ return ret;
+
+ /*
+ * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
+ * but also isolated from buddy freelist, so need to identify the
+ * state and have to cancel both operations to unpoison.
+ */
+ if (PageHWPoisonTakenOff(page))
+ return -EHWPOISON;
+
+ return get_page_unless_zero(page) ? 1 : 0;
+}
+
/**
* get_hwpoison_page() - Get refcount for memory error handling
* @p: Raw error page (hit by memory error)
@@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
* extra care for the error page's state (as done in __get_hwpoison_page()),
* and has some retry logic in get_any_page().
*
+ * When called from unpoison_memory(), the caller should already ensure that
+ * the given page has PG_hwpoison. So it's never reused for other page
+ * allocations, and __get_unpoison_page() never races with them.
+ *
* Return: 0 on failure,
* 1 on success for in-use pages in a well-defined state,
* -EIO for pages on which we can not handle memory errors,
* -EBUSY when get_hwpoison_page() has raced with page lifecycle
- * operations like allocation and free.
+ * operations like allocation and free,
+ * -EHWPOISON when the page is hwpoisoned and taken off from buddy.
*/
static int get_hwpoison_page(struct page *p, unsigned long flags)
{
int ret;

zone_pcp_disable(page_zone(p));
- ret = get_any_page(p, flags);
+ if (flags & MF_UNPOISON)
+ ret = __get_unpoison_page(p);
+ else
+ ret = get_any_page(p, flags);
zone_pcp_enable(page_zone(p));

return ret;
@@ -1942,6 +1987,28 @@ core_initcall(memory_failure_init);
pr_info(fmt, pfn); \
})

+static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
+{
+ if (TestClearPageHWPoison(p)) {
+ unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+ page_to_pfn(p), rs);
+ num_poisoned_pages_dec();
+ return 1;
+ }
+ return 0;
+}
+
+static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
+ struct page *p)
+{
+ if (put_page_back_buddy(p)) {
+ unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
+ page_to_pfn(p), rs);
+ return 0;
+ }
+ return -EBUSY;
+}
+
/**
* unpoison_memory - Unpoison a previously poisoned page
* @pfn: Page number of the to be unpoisoned page
@@ -1958,9 +2025,7 @@ int unpoison_memory(unsigned long pfn)
{
struct page *page;
struct page *p;
- int freeit = 0;
- int ret = 0;
- unsigned long flags = 0;
+ int ret = -EBUSY;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

@@ -1996,24 +2061,30 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}

- if (!get_hwpoison_page(p, flags)) {
- if (TestClearPageHWPoison(p))
- num_poisoned_pages_dec();
- unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
- pfn, &unpoison_rs);
+ if (PageSlab(page) || PageTable(page))
goto unlock_mutex;
- }

- if (TestClearPageHWPoison(page)) {
- unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
- pfn, &unpoison_rs);
- num_poisoned_pages_dec();
- freeit = 1;
- }
+ ret = get_hwpoison_page(p, MF_UNPOISON);
+ if (!ret) {
+ if (clear_page_hwpoison(&unpoison_rs, page))
+ ret = 0;
+ else
+ ret = -EBUSY;
+ } else if (ret < 0) {
+ if (ret == -EHWPOISON) {
+ ret = unpoison_taken_off_page(&unpoison_rs, p);
+ } else
+ unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
+ pfn, &unpoison_rs);
+ } else {
+ int freeit = clear_page_hwpoison(&unpoison_rs, p);

- put_page(page);
- if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
put_page(page);
+ if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
+ put_page(page);
+ ret = 0;
+ }
+ }

unlock_mutex:
mutex_unlock(&mf_mutex);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..d9ba6c1b9f43 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -19,6 +19,7 @@
#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/swap.h>
+#include <linux/swapops.h>
#include <linux/interrupt.h>
#include <linux/pagemap.h>
#include <linux/jiffies.h>
@@ -9448,6 +9449,7 @@ bool take_page_off_buddy(struct page *page)
del_page_from_free_list(page_head, zone, page_order);
break_down_buddy_pages(zone, page_head, page, 0,
page_order, migratetype);
+ SetPageHWPoisonTakenOff(page);
if (!is_migrate_isolate(migratetype))
__mod_zone_freepage_state(zone, -1, migratetype);
ret = true;
@@ -9459,4 +9461,29 @@ bool take_page_off_buddy(struct page *page)
spin_unlock_irqrestore(&zone->lock, flags);
return ret;
}
+
+/*
+ * Cancel takeoff done by take_page_off_buddy().
+ */
+bool put_page_back_buddy(struct page *page)
+{
+ struct zone *zone = page_zone(page);
+ unsigned long pfn = page_to_pfn(page);
+ unsigned long flags;
+ int migratetype = get_pfnblock_migratetype(page, pfn);
+ bool ret = false;
+
+ spin_lock_irqsave(&zone->lock, flags);
+ if (put_page_testzero(page)) {
+ ClearPageHWPoisonTakenOff(page);
+ __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
+ if (TestClearPageHWPoison(page)) {
+ num_poisoned_pages_dec();
+ ret = true;
+ }
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+
+ return ret;
+}
#endif
--
2.25.1


2021-11-15 08:42:01

by Naoya Horiguchi

[permalink] [raw]
Subject: [PATCH v4 2/3] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE

From: Naoya Horiguchi <[email protected]>

These action_page_types are no longer used, so remove them.

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Yang Shi <[email protected]>
---
include/linux/mm.h | 2 --
include/ras/ras_event.h | 2 --
mm/memory-failure.c | 2 --
3 files changed, 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e7d807..7941bca871dc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3247,7 +3247,6 @@ enum mf_action_page_type {
MF_MSG_KERNEL_HIGH_ORDER,
MF_MSG_SLAB,
MF_MSG_DIFFERENT_COMPOUND,
- MF_MSG_POISONED_HUGE,
MF_MSG_HUGE,
MF_MSG_FREE_HUGE,
MF_MSG_NON_PMD_HUGE,
@@ -3262,7 +3261,6 @@ enum mf_action_page_type {
MF_MSG_CLEAN_LRU,
MF_MSG_TRUNCATED_LRU,
MF_MSG_BUDDY,
- MF_MSG_BUDDY_2ND,
MF_MSG_DAX,
MF_MSG_UNSPLIT_THP,
MF_MSG_UNKNOWN,
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 0bdbc0d17d2f..d0337a41141c 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -358,7 +358,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \
EM ( MF_MSG_SLAB, "kernel slab page" ) \
EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \
- EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" ) \
EM ( MF_MSG_HUGE, "huge page" ) \
EM ( MF_MSG_FREE_HUGE, "free huge page" ) \
EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" ) \
@@ -373,7 +372,6 @@ TRACE_EVENT(aer_event,
EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \
EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \
EM ( MF_MSG_BUDDY, "free buddy page" ) \
- EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" ) \
EM ( MF_MSG_DAX, "dax page" ) \
EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
EMe ( MF_MSG_UNKNOWN, "unknown page" )
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d29c79de6034..722036539b44 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -723,7 +723,6 @@ static const char * const action_page_types[] = {
[MF_MSG_KERNEL_HIGH_ORDER] = "high-order kernel page",
[MF_MSG_SLAB] = "kernel slab page",
[MF_MSG_DIFFERENT_COMPOUND] = "different compound page after locking",
- [MF_MSG_POISONED_HUGE] = "huge page already hardware poisoned",
[MF_MSG_HUGE] = "huge page",
[MF_MSG_FREE_HUGE] = "free huge page",
[MF_MSG_NON_PMD_HUGE] = "non-pmd-sized huge page",
@@ -738,7 +737,6 @@ static const char * const action_page_types[] = {
[MF_MSG_CLEAN_LRU] = "clean LRU page",
[MF_MSG_TRUNCATED_LRU] = "already truncated LRU page",
[MF_MSG_BUDDY] = "free buddy page",
- [MF_MSG_BUDDY_2ND] = "free buddy page (2nd try)",
[MF_MSG_DAX] = "dax page",
[MF_MSG_UNSPLIT_THP] = "unsplit thp",
[MF_MSG_UNKNOWN] = "unknown page",
--
2.25.1


2021-11-16 00:04:31

by Yang Shi

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] mm/hwpoison: fix unpoison_memory()

On Mon, Nov 15, 2021 at 12:40 AM Naoya Horiguchi
<[email protected]> wrote:
>
> From: Naoya Horiguchi <[email protected]>
>
> After recent soft-offline rework, error pages can be taken off from
> buddy allocator, but the existing unpoison_memory() does not properly
> undo the operation. Moreover, due to the recent change on
> __get_hwpoison_page(), get_page_unless_zero() is hardly called for
> hwpoisoned pages. So __get_hwpoison_page() highly likely returns -EBUSY
> (meaning to fail to grab page refcount) and unpoison just clears
> PG_hwpoison without releasing a refcount. That does not lead to a
> critical issue like kernel panic, but unpoisoned pages never get back to
> buddy (leaked permanently), which is not good.
>
> To (partially) fix this, we need to identify "taken off" pages from
> other types of hwpoisoned pages. We can't use refcount or page flags
> for this purpose, so a pseudo flag is defined by hacking ->private
> field. Someone might think that put_page() is enough to cancel
> taken-off pages, but the normal free path contains some operations not
> suitable for the current purpose, and can fire VM_BUG_ON().
>
> Note that unpoison_memory() is now supposed to be cancel hwpoison events
> injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page,
> not by MCE injection, so please don't try to use unpoison when testing
> with MCE injection.
>
> [[email protected]: report build failure for ARCH=i386]
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> ChangeLog v4:
> - use integer value for MAGIC_HWPOISON to avoid compile error for ARCH=i386
> - close race in unpoison_taken_off_page()

Reviewed-by: Yang Shi <[email protected]>

>
> ChangeLog v3:
> - fix description
> - add PageTable check in unpoison_memory()
> - fix return value of clear_page_hwpoison()
> - pass page instead head to PageHWPoisonTakenOff check.
> - rename take_page_back_buddy() with put_page_back_buddy()
>
> ChangeLog v2:
> - unpoison_memory() returns as commented
> - explicitly avoids unpoisoning slab pages
> - separates internal pinning function into __get_unpoison_page()
> ---
> include/linux/mm.h | 1 +
> include/linux/page-flags.h | 4 ++
> mm/memory-failure.c | 109 ++++++++++++++++++++++++++++++-------
> mm/page_alloc.c | 27 +++++++++
> 4 files changed, 122 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7941bca871dc..8bbb7205ef9f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3220,6 +3220,7 @@ enum mf_flags {
> MF_ACTION_REQUIRED = 1 << 1,
> MF_MUST_KILL = 1 << 2,
> MF_SOFT_OFFLINE = 1 << 3,
> + MF_UNPOISON = 1 << 4,
> };
> extern int memory_failure(unsigned long pfn, int flags);
> extern void memory_failure_queue(unsigned long pfn, int flags);
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 52ec4b5e5615..a4fe056910bb 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached)
> PAGEFLAG(HWPoison, hwpoison, PF_ANY)
> TESTSCFLAG(HWPoison, hwpoison, PF_ANY)
> #define __PG_HWPOISON (1UL << PG_hwpoison)
> +#define MAGIC_HWPOISON 0x48575053U /* HWPS */
> +extern void SetPageHWPoisonTakenOff(struct page *page);
> +extern void ClearPageHWPoisonTakenOff(struct page *page);
> extern bool take_page_off_buddy(struct page *page);
> +extern bool put_page_back_buddy(struct page *page);
> #else
> PAGEFLAG_FALSE(HWPoison, hwpoison)
> #define __PG_HWPOISON 0
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 722036539b44..0f8b798cba69 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p,
> return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
> }
>
> +static inline bool PageHWPoisonTakenOff(struct page *page)
> +{
> + return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON;
> +}
> +
> +void SetPageHWPoisonTakenOff(struct page *page)
> +{
> + set_page_private(page, MAGIC_HWPOISON);
> +}
> +
> +void ClearPageHWPoisonTakenOff(struct page *page)
> +{
> + if (PageHWPoison(page))
> + set_page_private(page, 0);
> +}
> +
> /*
> * Return true if a page type of a given page is supported by hwpoison
> * mechanism (while handling could fail), otherwise false. This function
> @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags)
> return ret;
> }
>
> +static int __get_unpoison_page(struct page *page)
> +{
> + struct page *head = compound_head(page);
> + int ret = 0;
> + bool hugetlb = false;
> +
> + ret = get_hwpoison_huge_page(head, &hugetlb);
> + if (hugetlb)
> + return ret;
> +
> + /*
> + * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison,
> + * but also isolated from buddy freelist, so need to identify the
> + * state and have to cancel both operations to unpoison.
> + */
> + if (PageHWPoisonTakenOff(page))
> + return -EHWPOISON;
> +
> + return get_page_unless_zero(page) ? 1 : 0;
> +}
> +
> /**
> * get_hwpoison_page() - Get refcount for memory error handling
> * @p: Raw error page (hit by memory error)
> @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags)
> * extra care for the error page's state (as done in __get_hwpoison_page()),
> * and has some retry logic in get_any_page().
> *
> + * When called from unpoison_memory(), the caller should already ensure that
> + * the given page has PG_hwpoison. So it's never reused for other page
> + * allocations, and __get_unpoison_page() never races with them.
> + *
> * Return: 0 on failure,
> * 1 on success for in-use pages in a well-defined state,
> * -EIO for pages on which we can not handle memory errors,
> * -EBUSY when get_hwpoison_page() has raced with page lifecycle
> - * operations like allocation and free.
> + * operations like allocation and free,
> + * -EHWPOISON when the page is hwpoisoned and taken off from buddy.
> */
> static int get_hwpoison_page(struct page *p, unsigned long flags)
> {
> int ret;
>
> zone_pcp_disable(page_zone(p));
> - ret = get_any_page(p, flags);
> + if (flags & MF_UNPOISON)
> + ret = __get_unpoison_page(p);
> + else
> + ret = get_any_page(p, flags);
> zone_pcp_enable(page_zone(p));
>
> return ret;
> @@ -1942,6 +1987,28 @@ core_initcall(memory_failure_init);
> pr_info(fmt, pfn); \
> })
>
> +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p)
> +{
> + if (TestClearPageHWPoison(p)) {
> + unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> + page_to_pfn(p), rs);
> + num_poisoned_pages_dec();
> + return 1;
> + }
> + return 0;
> +}
> +
> +static inline int unpoison_taken_off_page(struct ratelimit_state *rs,
> + struct page *p)
> +{
> + if (put_page_back_buddy(p)) {
> + unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> + page_to_pfn(p), rs);
> + return 0;
> + }
> + return -EBUSY;
> +}
> +
> /**
> * unpoison_memory - Unpoison a previously poisoned page
> * @pfn: Page number of the to be unpoisoned page
> @@ -1958,9 +2025,7 @@ int unpoison_memory(unsigned long pfn)
> {
> struct page *page;
> struct page *p;
> - int freeit = 0;
> - int ret = 0;
> - unsigned long flags = 0;
> + int ret = -EBUSY;
> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>
> @@ -1996,24 +2061,30 @@ int unpoison_memory(unsigned long pfn)
> goto unlock_mutex;
> }
>
> - if (!get_hwpoison_page(p, flags)) {
> - if (TestClearPageHWPoison(p))
> - num_poisoned_pages_dec();
> - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
> - pfn, &unpoison_rs);
> + if (PageSlab(page) || PageTable(page))
> goto unlock_mutex;
> - }
>
> - if (TestClearPageHWPoison(page)) {
> - unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
> - pfn, &unpoison_rs);
> - num_poisoned_pages_dec();
> - freeit = 1;
> - }
> + ret = get_hwpoison_page(p, MF_UNPOISON);
> + if (!ret) {
> + if (clear_page_hwpoison(&unpoison_rs, page))
> + ret = 0;
> + else
> + ret = -EBUSY;
> + } else if (ret < 0) {
> + if (ret == -EHWPOISON) {
> + ret = unpoison_taken_off_page(&unpoison_rs, p);
> + } else
> + unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
> + pfn, &unpoison_rs);
> + } else {
> + int freeit = clear_page_hwpoison(&unpoison_rs, p);
>
> - put_page(page);
> - if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1))
> put_page(page);
> + if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) {
> + put_page(page);
> + ret = 0;
> + }
> + }
>
> unlock_mutex:
> mutex_unlock(&mf_mutex);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c5952749ad40..d9ba6c1b9f43 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -19,6 +19,7 @@
> #include <linux/mm.h>
> #include <linux/highmem.h>
> #include <linux/swap.h>
> +#include <linux/swapops.h>
> #include <linux/interrupt.h>
> #include <linux/pagemap.h>
> #include <linux/jiffies.h>
> @@ -9448,6 +9449,7 @@ bool take_page_off_buddy(struct page *page)
> del_page_from_free_list(page_head, zone, page_order);
> break_down_buddy_pages(zone, page_head, page, 0,
> page_order, migratetype);
> + SetPageHWPoisonTakenOff(page);
> if (!is_migrate_isolate(migratetype))
> __mod_zone_freepage_state(zone, -1, migratetype);
> ret = true;
> @@ -9459,4 +9461,29 @@ bool take_page_off_buddy(struct page *page)
> spin_unlock_irqrestore(&zone->lock, flags);
> return ret;
> }
> +
> +/*
> + * Cancel takeoff done by take_page_off_buddy().
> + */
> +bool put_page_back_buddy(struct page *page)
> +{
> + struct zone *zone = page_zone(page);
> + unsigned long pfn = page_to_pfn(page);
> + unsigned long flags;
> + int migratetype = get_pfnblock_migratetype(page, pfn);
> + bool ret = false;
> +
> + spin_lock_irqsave(&zone->lock, flags);
> + if (put_page_testzero(page)) {
> + ClearPageHWPoisonTakenOff(page);
> + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE);
> + if (TestClearPageHWPoison(page)) {
> + num_poisoned_pages_dec();
> + ret = true;
> + }
> + }
> + spin_unlock_irqrestore(&zone->lock, flags);
> +
> + return ret;
> +}
> #endif
> --
> 2.25.1
>

2022-01-22 00:25:48

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm/hwpoison: mf_mutex for soft offline and unpoison

On 11/15/21 00:40, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> Originally mf_mutex is introduced to serialize multiple MCE events, but
> it is not that useful to allow unpoison to run in parallel with memory_failure()
> and soft offline. So apply mf_mutex to soft offline and unpoison.
> The memory failure handler and soft offline handler get simpler with this.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Reviewed-by: Yang Shi <[email protected]>
> ---
> ChangeLog v4:
> - fix type in commit description.
>
> ChangeLog v3:
> - merge with "mm/hwpoison: remove race consideration"
> - update description
>
> ChangeLog v2:
> - add mutex_unlock() in "page already poisoned" path in soft_offline_page().
> (Thanks to Ding Hui)
> ---
> mm/memory-failure.c | 62 +++++++++++++--------------------------------
> 1 file changed, 18 insertions(+), 44 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e8c38e27b753..d29c79de6034 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c

Thanks for working on this. I tried to exercise memory error handling for
hugetlb pages and ran into issues addressed by these patches.

> @@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> lock_page(head);
> page_flags = head->flags;
>
> - if (!PageHWPoison(head)) {
> - pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
> - num_poisoned_pages_dec();
> - unlock_page(head);
> - put_page(head);
> - return 0;
> - }
> -
> /*
> * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> * simply disable it. In order to make it work properly, we need
> @@ -1628,6 +1620,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> return rc;
> }
>
> +static DEFINE_MUTEX(mf_mutex);

There are only two places other places where PageHWPoison is modified without
the mutex. They are:
- test_and_clear_pmem_poison
I 'think' pmem error handling is done separately so this does not apply.
- clear_hwpoisoned_pages
Called before removing memory (and deleting memmap) to reconcile count
of poisoned pages. Should not be an issue and technically I do not think
the ClearPageHWPoison() is actually needed in this routine.

Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz

2022-01-22 00:27:53

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v4 2/3] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE

On 11/15/21 00:40, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <[email protected]>
>
> These action_page_types are no longer used, so remove them.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Acked-by: Yang Shi <[email protected]>
> ---
> include/linux/mm.h | 2 --
> include/ras/ras_event.h | 2 --
> mm/memory-failure.c | 2 --
> 3 files changed, 6 deletions(-)

Reviewed-by: Mike Kravetz <[email protected]>

--
Mike Kravetz

2022-01-22 02:12:15

by Mike Kravetz

[permalink] [raw]
Subject: Re: [PATCH v4 1/3] mm/hwpoison: mf_mutex for soft offline and unpoison

On 1/20/22 16:09, Mike Kravetz wrote:
> On 11/15/21 00:40, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <[email protected]>
>>
>> Originally mf_mutex is introduced to serialize multiple MCE events, but
>> it is not that useful to allow unpoison to run in parallel with memory_failure()
>> and soft offline. So apply mf_mutex to soft offline and unpoison.
>> The memory failure handler and soft offline handler get simpler with this.
>>

Sorry for the late question.

It is not directly part of this change, but can we also remove the check in
__soft_offline_page after this comment?

/*
* Check PageHWPoison again inside page lock because PageHWPoison
* is set by memory_failure() outside page lock. Note that
* memory_failure() also double-checks PageHWPoison inside page lock,
* so there's no race between soft_offline_page() and memory_failure().
*/

--
Mike Kravetz