2022-06-14 04:39:32

by zhenwei pi

Subject: [PATCH v4 0/2] mm/memory-failure: don't allow to unpoison hw corrupted page

v3 -> v4:
- Add debug entry "hwpoisoned-pages" to show the number of hwpoisoned
pages.
- Disable unpoison when a real HW memory failure occurs.

v2 -> v3:
- David pointed out that virt_to_kpte() is broken (no pmd_large() test
on a PMD), so drop this API in this patch and walk kmap instead.

v1 -> v2:
- this change gets protected by mf_mutex
- use -EOPNOTSUPP instead of -EPERM

v1:
- check KPTE to avoid unpoisoning a hardware corrupted page

zhenwei pi (2):
mm/memory-failure: introduce "hwpoisoned-pages" entry
mm/memory-failure: disable unpoison once hw error happens

Documentation/vm/hwpoison.rst | 7 ++++++-
mm/hwpoison-inject.c | 25 ++++++++++++++++++++++++-
mm/memory-failure.c | 1 +
3 files changed, 31 insertions(+), 2 deletions(-)

--
2.20.1


2022-06-14 04:41:13

by zhenwei pi

Subject: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

Add a new debug entry to show the number of hwpoisoned pages. Also
use module_get/module_put to manage this kernel module: do not allow
the module to be removed unless hwpoisoned-pages is zero.

Signed-off-by: zhenwei pi <[email protected]>
---
Documentation/vm/hwpoison.rst | 4 ++++
mm/hwpoison-inject.c | 19 ++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c742de1769d1..c832a8b192d4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -155,6 +155,10 @@ Testing
flag bits are defined in include/linux/kernel-page-flags.h and
documented in Documentation/admin-guide/mm/pagemap.rst

+ hwpoisoned-pages
+ The number of hwpoisoned pages. The hwpoison kernel module can not be
+ removed unless this count is zero.
+
* Architecture specific MCE injector

x86 has mce-inject, mce-test
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 5c0cddd81505..9e522ecedeef 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -10,6 +10,7 @@
#include "internal.h"

static struct dentry *hwpoison_dir;
+static atomic_t hwpoisoned_pages;

static int hwpoison_inject(void *data, u64 val)
{
@@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
err = memory_failure(pfn, 0);
+ if (!err) {
+ WARN_ON(!try_module_get(THIS_MODULE));
+ atomic_inc(&hwpoisoned_pages);
+ }
+
return (err == -EOPNOTSUPP) ? 0 : err;
}

static int hwpoison_unpoison(void *data, u64 val)
{
+ int ret;
+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;

- return unpoison_memory(val);
+ ret = unpoison_memory(val);
+ if (!ret) {
+ atomic_dec(&hwpoisoned_pages);
+ module_put(THIS_MODULE);
+ }
+
+ return ret;
}

DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
@@ -99,6 +113,9 @@ static int pfn_inject_init(void)
debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
&hwpoison_filter_flags_value);

+ debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
+ &hwpoisoned_pages);
+
#ifdef CONFIG_MEMCG
debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
&hwpoison_filter_memcg);
--
2.20.1

2022-06-14 04:42:44

by zhenwei pi

Subject: [PATCH v4 2/2] mm/memory-failure: disable unpoison once hw error happens

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on an x86 platform once hardware memory corruption occurs.

Unpoisoning a hardware corrupted page only puts the page back into the
buddy allocator, so the kernel may later access the page through a
*NOT PRESENT* KPTE. This leads to a BUG when the page is accessed.

Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
to avoid a BUG like this:

Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
RIP: 0010:clear_page_erms+0x7/0x10
Code: ...
RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
prep_new_page+0x151/0x170
get_page_from_freelist+0xca0/0xe20
? sysvec_apic_timer_interrupt+0xab/0xc0
? asm_sysvec_apic_timer_interrupt+0x1b/0x20
__alloc_pages+0x17e/0x340
__folio_alloc+0x17/0x40
vma_alloc_folio+0x84/0x280
__handle_mm_fault+0x8d4/0xeb0
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x27/0x30

As suggested by David and Naoya, disable the unpoison mechanism when a
real HW error happens.

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Naoya Horiguchi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
---
Documentation/vm/hwpoison.rst | 3 ++-
mm/hwpoison-inject.c | 6 ++++++
mm/memory-failure.c | 1 +
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c832a8b192d4..ac439381cad4 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
unpoison-pfn
Software-unpoison page at PFN echoed into this file. This way
a page can be reused again. This only works for Linux
- injected failures, not for real memory failures.
+ injected failures, not for real memory failures. Once any hardware
+ memory failure happens, the feature is disabled.

Note these injection interfaces are not stable and might change between
kernel versions
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 9e522ecedeef..787d2daf41e8 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -7,6 +7,7 @@
#include <linux/swap.h>
#include <linux/pagemap.h>
#include <linux/hugetlb.h>
+#include <linux/swapops.h>
#include "internal.h"

static struct dentry *hwpoison_dir;
@@ -65,6 +66,11 @@ static int hwpoison_unpoison(void *data, u64 val)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;

+ if (atomic_read(&hwpoisoned_pages) != atomic_long_read(&num_poisoned_pages)) {
+ pr_info("Unpoison is disabled after hardware memory failure happened\n");
+ return -EOPNOTSUPP;
+ }
+
ret = unpoison_memory(val);
if (!ret) {
atomic_dec(&hwpoisoned_pages);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..a3e6bd4b5528 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -68,6 +68,7 @@ int sysctl_memory_failure_early_kill __read_mostly = 0;
int sysctl_memory_failure_recovery __read_mostly = 1;

atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
+EXPORT_SYMBOL_GPL(num_poisoned_pages);

static bool __page_handle_poison(struct page *page)
{
--
2.20.1

2022-06-14 05:13:55

by Muchun Song

Subject: Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also
> use module_get/module_put to manage this kernel module: do not allow
> the module to be removed unless hwpoisoned-pages is zero.
>
> Signed-off-by: zhenwei pi <[email protected]>
> ---
> Documentation/vm/hwpoison.rst | 4 ++++
> mm/hwpoison-inject.c | 19 ++++++++++++++++++-
> 2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
> flag bits are defined in include/linux/kernel-page-flags.h and
> documented in Documentation/admin-guide/mm/pagemap.rst
>
> + hwpoisoned-pages

This looks a bit weird to me. IIUC, this is the number of **software** poisoned
pages rather than **hardware** ones, so the prefix "hw" may not be suitable. How
about "poisoned-pages" (a little simplified), "poisoned-pfns" (keeps the
name consistent with "corrupt-pfn" and "unpoison-pfn") or "swpoisoned-pages"
(sw means software)?

> + The number of hwpoisoned pages. The hwpoison kernel module can not be
> + removed unless this count is zero.
> +
> * Architecture specific MCE injector
>
> x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
> #include "internal.h"
>
> static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>
> static int hwpoison_inject(void *data, u64 val)
> {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
> inject:
> pr_info("Injecting memory failure at pfn %#lx\n", pfn);
> err = memory_failure(pfn, 0);
> + if (!err) {
> + WARN_ON(!try_module_get(THIS_MODULE));

__module_get() is enough since we already hold a refcount at open time.
This WARN_ON() will not be triggered unless something unexpected happens.

> + atomic_inc(&hwpoisoned_pages);
> + }
> +
> return (err == -EOPNOTSUPP) ? 0 : err;
> }
>
> static int hwpoison_unpoison(void *data, u64 val)
> {
> + int ret;
> +
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> - return unpoison_memory(val);
> + ret = unpoison_memory(val);
> + if (!ret) {
> + atomic_dec(&hwpoisoned_pages);
> + module_put(THIS_MODULE);
> + }
> +
> + return ret;
> }
>
> DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
> debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
> &hwpoison_filter_flags_value);
>
> + debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> + &hwpoisoned_pages);
> +
> #ifdef CONFIG_MEMCG
> debugfs_create_u64("corrupt-filter-memcg", 0600, hwpoison_dir,
> &hwpoison_filter_memcg);
> --
> 2.20.1
>
>

Subject: Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

On Tue, Jun 14, 2022 at 12:38:29PM +0800, zhenwei pi wrote:
> Add a new debug entry to show the number of hwpoisoned pages. Also
> use module_get/module_put to manage this kernel module: do not allow
> the module to be removed unless hwpoisoned-pages is zero.
>
> Signed-off-by: zhenwei pi <[email protected]>
> ---
> Documentation/vm/hwpoison.rst | 4 ++++
> mm/hwpoison-inject.c | 19 ++++++++++++++++++-
> 2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
> index c742de1769d1..c832a8b192d4 100644
> --- a/Documentation/vm/hwpoison.rst
> +++ b/Documentation/vm/hwpoison.rst
> @@ -155,6 +155,10 @@ Testing
> flag bits are defined in include/linux/kernel-page-flags.h and
> documented in Documentation/admin-guide/mm/pagemap.rst
>
> + hwpoisoned-pages
> + The number of hwpoisoned pages. The hwpoison kernel module can not be
> + removed unless this count is zero.
> +
> * Architecture specific MCE injector
>
> x86 has mce-inject, mce-test
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index 5c0cddd81505..9e522ecedeef 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -10,6 +10,7 @@
> #include "internal.h"
>
> static struct dentry *hwpoison_dir;
> +static atomic_t hwpoisoned_pages;
>
> static int hwpoison_inject(void *data, u64 val)
> {
> @@ -49,15 +50,28 @@ static int hwpoison_inject(void *data, u64 val)
> inject:
> pr_info("Injecting memory failure at pfn %#lx\n", pfn);
> err = memory_failure(pfn, 0);
> + if (!err) {
> + WARN_ON(!try_module_get(THIS_MODULE));
> + atomic_inc(&hwpoisoned_pages);
> + }

There are a few other interfaces that generate a "software-simulated memory error"
event, i.e. madvise_inject_error() and hard_offline_page_store(). So you need to
handle those code paths as well.
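
To illustrate the concern, here is a small userspace model (not kernel code).
It assumes, as in this series, that only the debugfs inject path bumps the
module-local counter while every memory_failure() caller bumps the global one.
All model_* names are hypothetical stand-ins.

```c
#include <errno.h>

/* Userspace model only; the names mirror the patch but nothing here is kernel code. */
static long num_poisoned_pages; /* global count, raised by every memory_failure() call */
static int hwpoisoned_pages;    /* module-local count introduced by patch 1/2 */

/* Stand-in for memory_failure(): any caller raises the global count. */
static int model_memory_failure(void)
{
	num_poisoned_pages++;
	return 0;
}

/* debugfs corrupt-pfn path: the only path that also raises the module count. */
static int model_hwpoison_inject(void)
{
	int err = model_memory_failure();

	if (!err)
		hwpoisoned_pages++;
	return err;
}

/* madvise-style injection: software-simulated, but invisible to the module. */
static int model_madvise_inject(void)
{
	return model_memory_failure();
}

/* The counter comparison added by patch 2/2. */
static int model_hwpoison_unpoison(void)
{
	if (hwpoisoned_pages != num_poisoned_pages)
		return -EOPNOTSUPP;
	hwpoisoned_pages--;
	num_poisoned_pages--;
	return 0;
}
```

In this model, one model_madvise_inject() call is enough to make
model_hwpoison_unpoison() return -EOPNOTSUPP, even though no hardware error
was involved.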

> +
> return (err == -EOPNOTSUPP) ? 0 : err;
> }
>
> static int hwpoison_unpoison(void *data, u64 val)
> {
> + int ret;
> +
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> - return unpoison_memory(val);
> + ret = unpoison_memory(val);
> + if (!ret) {
> + atomic_dec(&hwpoisoned_pages);
> + module_put(THIS_MODULE);
> + }
> +
> + return ret;
> }
>
> DEFINE_DEBUGFS_ATTRIBUTE(hwpoison_fops, NULL, hwpoison_inject, "%lli\n");
> @@ -99,6 +113,9 @@ static int pfn_inject_init(void)
> debugfs_create_u64("corrupt-filter-flags-value", 0600, hwpoison_dir,
> &hwpoison_filter_flags_value);
>
> + debugfs_create_atomic_t("hwpoisoned-pages", 0400, hwpoison_dir,
> + &hwpoisoned_pages);

I'm not sure how useful this interface is from userspace (controlling a test process
with this?). Do we really need to expose this to userspace?


TBH I feel that another approach like below is more desirable:

- define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
- set the flag when calling memory_failure() from the three callers
mentioned above,
- define a global variable (typed bool) in mm/memory-failure.c to show that
the system has experienced a real hardware memory error event.
- once memory_failure() is called without MF_SW_SIMULATED, the new global
bool variable is set, and afterward unpoison_memory always fails with
-EOPNOTSUPP.
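
The steps above can be sketched as a small userspace model. This is only an
illustration: the flag value, the variable name and the sketch_* functions are
assumptions, and the real memory_failure()/unpoison_memory() do far more work.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real value would be picked inside enum mf_flags. */
enum sketch_mf_flags {
	SKETCH_MF_SW_SIMULATED = 1 << 0,
};

/* Becomes true once a failure is reported without the software-simulated flag. */
static bool sketch_hw_memory_failure;

/* Stand-in for memory_failure(): only records whether the event was real. */
static int sketch_memory_failure(unsigned long pfn, int flags)
{
	(void)pfn; /* the model does not track individual pages */
	if (!(flags & SKETCH_MF_SW_SIMULATED))
		sketch_hw_memory_failure = true;
	return 0;
}

/* Stand-in for unpoison_memory(): refuses forever after a real HW error. */
static int sketch_unpoison_memory(unsigned long pfn)
{
	(void)pfn;
	if (sketch_hw_memory_failure)
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape no per-module counter is needed, and every software injection
path is covered simply by passing the flag.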

Thanks,
Naoya Horiguchi

2022-06-14 08:11:37

by David Hildenbrand

Subject: Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

&hwpoisoned_pages);
>
> I'm not sure how useful this interface from userspace (controlling test process
> with this?). Do we really need to expose this to userspace?
>
>
> TBH I feel that another approach like below is more desirable:
>
> - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
> - set the flag when calling memory_failure() from the three callers
> mentioned above,
> - define a global variable (typed bool) in mm/memory-failure.c to show that
> the system has experienced a real hardware memory error event.
> - once memory_failure() is called without MF_SW_SIMULATED, the new global
> bool variable is set, and afterward unpoison_memory always fails with
> -EOPNOTSUPP.

Exactly what I had in mind.

--
Thanks,

David / dhildenb

2022-06-14 08:37:59

by Miaohe Lin

Subject: Re: [PATCH v4 1/2] mm/memory-failure: introduce "hwpoisoned-pages" entry

On 2022/6/14 15:13, David Hildenbrand wrote:
> &hwpoisoned_pages);
>>
>> I'm not sure how useful this interface from userspace (controlling test process
>> with this?). Do we really need to expose this to userspace?
>>
>>
>> TBH I feel that another approach like below is more desirable:
>>
>> - define a new flag in "enum mf_flags" (for example named MF_SW_SIMULATED),
>> - set the flag when calling memory_failure() from the three callers
>> mentioned above,
>> - define a global variable (typed bool) in mm/memory-failure.c to show that
>> the system has experienced a real hardware memory error event.
>> - once memory_failure() is called without MF_SW_SIMULATED, the new global
>> bool variable is set, and afterward unpoison_memory always fails with
>> -EOPNOTSUPP.
>
> Exactly what I had in mind.

This approach should be more straightforward. ;)

>