2022-06-15 02:38:07

by zhenwei pi

[permalink] [raw]
Subject: [PATCH v5 0/1] mm/memory-failure: don't allow to unpoison hw corrupted page

v4 -> v5:
- Add mf_flags 'MF_SW_SIMULATED' to distinguish SW/HW memory failure,
and use a global variable to record HW memory failure, once HW
memory failure happens, disable unpoison.

v3 -> v4:
- Add debug entry "hwpoisoned-pages" to show the number of hwpoisoned
pages.
- Disable unpoison when a read HW memory failure occurs.

v2 -> v3:
- David pointed out that virt_to_kpte() is broken(no pmd_large() test
on a PMD), so drop this API in this patch, walk kmap instead.

v1 -> v2:
- this change gets protected by mf_mutex
- use -EOPNOTSUPP instead of -EPERM

v1:
- check KPTE to avoid to unpoison hardware corrupted page

zhenwei pi (1):
mm/memory-failure: disable unpoison once hw error happens

Documentation/vm/hwpoison.rst | 3 ++-
drivers/base/memory.c | 2 +-
include/linux/mm.h | 1 +
mm/hwpoison-inject.c | 2 +-
mm/madvise.c | 2 +-
mm/memory-failure.c | 12 ++++++++++++
6 files changed, 18 insertions(+), 4 deletions(-)

--
2.20.1


2022-06-15 02:39:31

by zhenwei pi

[permalink] [raw]
Subject: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
cleared on a x86 platform once hardware memory corrupts.

Unpoisoning a hardware corrupted page puts page back buddy only,
the kernel has a chance to access the page with *NOT PRESENT* KPTE.
This leads BUG during accessing on the corrupted KPTE.

Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:

Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
RIP: 0010:clear_page_erms+0x7/0x10
Code: ...
RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
prep_new_page+0x151/0x170
get_page_from_freelist+0xca0/0xe20
? sysvec_apic_timer_interrupt+0xab/0xc0
? asm_sysvec_apic_timer_interrupt+0x1b/0x20
__alloc_pages+0x17e/0x340
__folio_alloc+0x17/0x40
vma_alloc_folio+0x84/0x280
__handle_mm_fault+0x8d4/0xeb0
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x27/0x30

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Naoya Horiguchi <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: zhenwei pi <[email protected]>
---
Documentation/vm/hwpoison.rst | 3 ++-
drivers/base/memory.c | 2 +-
include/linux/mm.h | 1 +
mm/hwpoison-inject.c | 2 +-
mm/madvise.c | 2 +-
mm/memory-failure.c | 12 ++++++++++++
6 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index c742de1769d1..b9d5253c1305 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -120,7 +120,8 @@ Testing
unpoison-pfn
Software-unpoison page at PFN echoed into this file. This way
a page can be reused again. This only works for Linux
- injected failures, not for real memory failures.
+ injected failures, not for real memory failures. Once any hardware
+ memory failure happens, this feature is disabled.

Note these injection interfaces are not stable and might change between
kernel versions
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 084d67fd55cc..bc60c9cd3230 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -558,7 +558,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
if (kstrtoull(buf, 0, &pfn) < 0)
return -EINVAL;
pfn >>= PAGE_SHIFT;
- ret = memory_failure(pfn, 0);
+ ret = memory_failure(pfn, MF_SW_SIMULATED);
if (ret == -EOPNOTSUPP)
ret = 0;
return ret ? ret : count;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..4346e51484ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3232,6 +3232,7 @@ enum mf_flags {
MF_MUST_KILL = 1 << 2,
MF_SOFT_OFFLINE = 1 << 3,
MF_UNPOISON = 1 << 4,
+ MF_SW_SIMULATED = 1 << 5,
};
extern int memory_failure(unsigned long pfn, int flags);
extern void memory_failure_queue(unsigned long pfn, int flags);
diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
index 5c0cddd81505..65e242b5a432 100644
--- a/mm/hwpoison-inject.c
+++ b/mm/hwpoison-inject.c
@@ -48,7 +48,7 @@ static int hwpoison_inject(void *data, u64 val)

inject:
pr_info("Injecting memory failure at pfn %#lx\n", pfn);
- err = memory_failure(pfn, 0);
+ err = memory_failure(pfn, MF_SW_SIMULATED);
return (err == -EOPNOTSUPP) ? 0 : err;
}

diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..0316bbc6441b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1112,7 +1112,7 @@ static int madvise_inject_error(int behavior,
} else {
pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
pfn, start);
- ret = memory_failure(pfn, MF_COUNT_INCREASED);
+ ret = memory_failure(pfn, MF_COUNT_INCREASED | MF_SW_SIMULATED);
if (ret == -EOPNOTSUPP)
ret = 0;
}
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..385b5e99bfc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;

atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);

+static bool hw_memory_failure;
+
static bool __page_handle_poison(struct page *page)
{
int ret;
@@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)

mutex_lock(&mf_mutex);

+ if (!(flags & MF_SW_SIMULATED))
+ hw_memory_failure = true;
+
p = pfn_to_online_page(pfn);
if (!p) {
res = arch_memory_failure(pfn, flags);
@@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)

mutex_lock(&mf_mutex);

+ if (hw_memory_failure) {
+ unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
+ pfn, &unpoison_rs);
+ ret = -EOPNOTSUPP;
+ goto unlock_mutex;
+ }
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
--
2.20.1

2022-06-15 04:30:55

by Oscar Salvador

[permalink] [raw]
Subject: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
> cleared on a x86 platform once hardware memory corrupts.
>
> Unpoisoning a hardware corrupted page puts page back buddy only,
> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
> This leads BUG during accessing on the corrupted KPTE.
>
> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
> happens to avoid BUG like this:
>
> Unpoison: Software-unpoisoned page 0x61234
> BUG: unable to handle page fault for address: ffff888061234000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
> RIP: 0010:clear_page_erms+0x7/0x10
> Code: ...
> RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
> RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
> RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
> R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
> FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> prep_new_page+0x151/0x170
> get_page_from_freelist+0xca0/0xe20
> ? sysvec_apic_timer_interrupt+0xab/0xc0
> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
> __alloc_pages+0x17e/0x340
> __folio_alloc+0x17/0x40
> vma_alloc_folio+0x84/0x280
> __handle_mm_fault+0x8d4/0xeb0
> handle_mm_fault+0xd5/0x2a0
> do_user_addr_fault+0x1d0/0x680
> ? kvm_read_and_reset_apf_flags+0x3b/0x50
> exc_page_fault+0x78/0x170
> asm_exc_page_fault+0x27/0x30
>
> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
> Cc: Naoya Horiguchi <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Signed-off-by: zhenwei pi <[email protected]>
> ---
...
...
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b85661cbdc4a..385b5e99bfc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>
> atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
> +static bool hw_memory_failure;
> +
> static bool __page_handle_poison(struct page *page)
> {
> int ret;
> @@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)
>
> mutex_lock(&mf_mutex);
>
> + if (!(flags & MF_SW_SIMULATED))
> + hw_memory_failure = true;
> +
> p = pfn_to_online_page(pfn);
> if (!p) {
> res = arch_memory_failure(pfn, flags);
> @@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
>
> mutex_lock(&mf_mutex);
>
> + if (hw_memory_failure) {
> + unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
> + pfn, &unpoison_rs);
> + ret = -EOPNOTSUPP;
> + goto unlock_mutex;
> + }

If we disable this, I would move this at the beginning of the function.
We do not really care whether the pfn is valid or getting the head of
the page, etc.
So, unless I'm missing something, this should be enough?

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 385b5e99bfc1..ece15f07dee7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2100,6 +2100,12 @@ int unpoison_memory(unsigned long pfn)
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);

+ if (hw_memory_failure) {
+ unpoison_pr_info("Unpoison: Disabled after HW memory failure\n",
+ &unpoison_rs);
+ ret -EOPNOTSUPP;
+ }
+
if (!pfn_valid(pfn))
return -ENXIO;


--
Oscar Salvador
SUSE Labs

2022-06-15 05:53:22

by zhenwei pi

[permalink] [raw]
Subject: Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens



On 6/15/22 12:23, Oscar Salvador wrote:
> On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
>> cleared on a x86 platform once hardware memory corrupts.
>>
>> Unpoisoning a hardware corrupted page puts page back buddy only,
>> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
>> This leads BUG during accessing on the corrupted KPTE.
>>
>> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
>> happens to avoid BUG like this:
>>
>> Unpoison: Software-unpoisoned page 0x61234
>> BUG: unable to handle page fault for address: ffff888061234000
>> #PF: supervisor write access in kernel mode
>> #PF: error_code(0x0002) - not-present page
>> PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
>> Oops: 0002 [#1] PREEMPT SMP NOPTI
>> CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
>> RIP: 0010:clear_page_erms+0x7/0x10
>> Code: ...
>> RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
>> RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
>> RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
>> R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
>> R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
>> FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> PKRU: 55555554
>> Call Trace:
>> <TASK>
>> prep_new_page+0x151/0x170
>> get_page_from_freelist+0xca0/0xe20
>> ? sysvec_apic_timer_interrupt+0xab/0xc0
>> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
>> __alloc_pages+0x17e/0x340
>> __folio_alloc+0x17/0x40
>> vma_alloc_folio+0x84/0x280
>> __handle_mm_fault+0x8d4/0xeb0
>> handle_mm_fault+0xd5/0x2a0
>> do_user_addr_fault+0x1d0/0x680
>> ? kvm_read_and_reset_apf_flags+0x3b/0x50
>> exc_page_fault+0x78/0x170
>> asm_exc_page_fault+0x27/0x30
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>> Cc: Naoya Horiguchi <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Signed-off-by: zhenwei pi <[email protected]>
>> ---
> ...
> ...
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b85661cbdc4a..385b5e99bfc1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>
>> atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>
>> +static bool hw_memory_failure;
>> +
>> static bool __page_handle_poison(struct page *page)
>> {
>> int ret;
>> @@ -1768,6 +1770,9 @@ int memory_failure(unsigned long pfn, int flags)
>>
>> mutex_lock(&mf_mutex);
>>
>> + if (!(flags & MF_SW_SIMULATED))
>> + hw_memory_failure = true;
>> +
>> p = pfn_to_online_page(pfn);
>> if (!p) {
>> res = arch_memory_failure(pfn, flags);
>> @@ -2103,6 +2108,13 @@ int unpoison_memory(unsigned long pfn)
>>
>> mutex_lock(&mf_mutex);
>>
>> + if (hw_memory_failure) {
>> + unpoison_pr_info("Unpoison: Disabled after HW memory failure %#lx\n",
>> + pfn, &unpoison_rs);
>> + ret = -EOPNOTSUPP;
>> + goto unlock_mutex;
>> + }
>
> If we disable this, I would move this at the beginning of the function.
> We do not really care whether the pfn is valid or getting the head of
> the page, etc.
> So, unless I'm missing something, this should be enough?
>

Hi,

Because memory_failure() may be called by hardware error randomly,
hw_memory_failure should be protected by mf_mutex to avoid this case:
int unpoison_memory(unsigned long pfn)
{
...
if (hw_memory_failure) {
}
... --> memory_failure() happens, and mark hw_memory_failure as true
mutex_lock(&mf_mutex);
}

> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 385b5e99bfc1..ece15f07dee7 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2100,6 +2100,12 @@ int unpoison_memory(unsigned long pfn)
> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> DEFAULT_RATELIMIT_BURST);
>
> + if (hw_memory_failure) {
> + unpoison_pr_info("Unpoison: Disabled after HW memory failure\n",
> + &unpoison_rs);
> + ret -EOPNOTSUPP;
> + }
> +
> if (!pfn_valid(pfn))
> return -ENXIO;
>
>

--
zhenwei pi

2022-06-15 05:55:57

by Oscar Salvador

[permalink] [raw]
Subject: Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On Wed, Jun 15, 2022 at 01:18:23PM +0800, zhenwei pi wrote:
> Hi,
>
> Because memory_failure() may be called by hardware error randomly,
> hw_memory_failure should be protected by mf_mutex to avoid this case:
> int unpoison_memory(unsigned long pfn)
> {
> ...
> if (hw_memory_failure) {
> }
> ... --> memory_failure() happens, and mark hw_memory_failure as true
> mutex_lock(&mf_mutex);

Yeah, I am aware of that.
But once memory_failure() sets hw_memory_failure to true, it does not really matter
whether unpoison_memory() checks that while holding or not the lock, does it?

Note that it does not really matter in the end, but I am just curious whether
there is any strong impediment to that.


--
Oscar Salvador
SUSE Labs

Subject: Re: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On Wed, Jun 15, 2022 at 07:52:06AM +0200, Oscar Salvador wrote:
> On Wed, Jun 15, 2022 at 01:18:23PM +0800, zhenwei pi wrote:
> > Hi,
> >
> > Because memory_failure() may be called by hardware error randomly,
> > hw_memory_failure should be protected by mf_mutex to avoid this case:
> > int unpoison_memory(unsigned long pfn)
> > {
> > ...
> > if (hw_memory_failure) {
> > }
> > ... --> memory_failure() happens, and mark hw_memory_failure as true
> > mutex_lock(&mf_mutex);

I think that this race can cause the reported problem (hw_memory_failure is
unreliable outside mf_mutex), so we need put the check in mf_mutex for the proper fix.

Thanks,
Naoya Horiguchi

>
> Yeah, I am aware of that.
> But once memory_failure() sets hw_memory_failure to true, it does not really matter
> whether unpoison_memory() checks that while holding or not the lock, does it?
>
> Note that it does not really matter in the end, but I am just curious whether
> there is any strong impediment to that.
>
>
> --
> Oscar Salvador
> SUSE Labs

Subject: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
> cleared on a x86 platform once hardware memory corrupts.
>
> Unpoisoning a hardware corrupted page puts page back buddy only,
> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
> This leads BUG during accessing on the corrupted KPTE.
>
> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
> happens to avoid BUG like this:
...

>
> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
> Cc: Naoya Horiguchi <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Signed-off-by: zhenwei pi <[email protected]>

Cc to stable?
I think that the current approach seems predictable to me than earlier versions,
so I can agree with sending this to stable a little more confidently.

> ---
> Documentation/vm/hwpoison.rst | 3 ++-
> drivers/base/memory.c | 2 +-
> include/linux/mm.h | 1 +
> mm/hwpoison-inject.c | 2 +-
> mm/madvise.c | 2 +-
> mm/memory-failure.c | 12 ++++++++++++
> 6 files changed, 18 insertions(+), 4 deletions(-)
>

...

> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b85661cbdc4a..385b5e99bfc1 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>
> atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>
> +static bool hw_memory_failure;

Could you set the initial value explicitly? Using a default value is good,
but doing as the surrounding code do is better for consistency. And this
variable can be updated only once, so adding __read_mostly macro is also fine.

Thanks,
Naoya Horiguchi

2022-06-15 08:37:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On 15.06.22 10:15, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets
>> cleared on a x86 platform once hardware memory corrupts.
>>
>> Unpoisoning a hardware corrupted page puts page back buddy only,
>> the kernel has a chance to access the page with *NOT PRESENT* KPTE.
>> This leads BUG during accessing on the corrupted KPTE.
>>
>> Suggested by David&Naoya, disable unpoison mechanism when a real HW error
>> happens to avoid BUG like this:
> ...
>
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>> Cc: Naoya Horiguchi <[email protected]>
>> Cc: David Hildenbrand <[email protected]>
>> Signed-off-by: zhenwei pi <[email protected]>
>
> Cc to stable?
> I think that the current approach seems predictable to me than earlier versions,
> so I can agree with sending this to stable a little more confidently.
>
>> ---
>> Documentation/vm/hwpoison.rst | 3 ++-
>> drivers/base/memory.c | 2 +-
>> include/linux/mm.h | 1 +
>> mm/hwpoison-inject.c | 2 +-
>> mm/madvise.c | 2 +-
>> mm/memory-failure.c | 12 ++++++++++++
>> 6 files changed, 18 insertions(+), 4 deletions(-)
>>
>
> ...
>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b85661cbdc4a..385b5e99bfc1 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
>>
>> atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
>>
>> +static bool hw_memory_failure;
>
> Could you set the initial value explicitly? Using a default value is good,
> but doing as the surrounding code do is better for consistency. And this
> variable can be updated only once, so adding __read_mostly macro is also fine.

No strong opinion. __read_mostly makes sense, but I assume we don't
really care about performance that much when dealing with HW errors.

With or without changes around this initialization

Acked-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

Subject: Re: [PATCH v5 1/1] mm/memory-failure: disable unpoison once hw error happens

On Wed, Jun 15, 2022 at 10:21:09AM +0200, David Hildenbrand wrote:
> On 15.06.22 10:15, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Wed, Jun 15, 2022 at 10:00:05AM +0800, zhenwei pi wrote:
...
> >
> > ...
> >
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index b85661cbdc4a..385b5e99bfc1 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -69,6 +69,8 @@ int sysctl_memory_failure_recovery __read_mostly = 1;
> >>
> >> atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
> >>
> >> +static bool hw_memory_failure;
> >
> > Could you set the initial value explicitly? Using a default value is good,
> > but doing as the surrounding code do is better for consistency. And this
> > variable can be updated only once, so adding __read_mostly macro is also fine.
>
> No strong opinion. __read_mostly makes sense, but I assume we don't
> really care about performance that much when dealing with HW errors.

That's right, mm/memory-failure.c should be mostly performance insensitive.

- Naoya Horiguchi