LinuxLists.cc - [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

2022-06-05 06:24:06

Subject: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

Currently unpoison_memory(unsigned long pfn) is designed for soft
poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
only puts the page back into the buddy allocator, which leads to a BUG
when the corrupted KPTE is accessed.

Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
to avoid a BUG like this:

Unpoison: Software-unpoisoned page 0x61234
BUG: unable to handle page fault for address: ffff888061234000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
RIP: 0010:clear_page_erms+0x7/0x10
Code: ...
RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
prep_new_page+0x151/0x170
get_page_from_freelist+0xca0/0xe20
? sysvec_apic_timer_interrupt+0xab/0xc0
? asm_sysvec_apic_timer_interrupt+0x1b/0x20
__alloc_pages+0x17e/0x340
__folio_alloc+0x17/0x40
vma_alloc_folio+0x84/0x280
__handle_mm_fault+0x8d4/0xeb0
handle_mm_fault+0xd5/0x2a0
do_user_addr_fault+0x1d0/0x680
? kvm_read_and_reset_apf_flags+0x3b/0x50
exc_page_fault+0x78/0x170
asm_exc_page_fault+0x27/0x30

Signed-off-by: zhenwei pi <[email protected]>
---
mm/memory-failure.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b85661cbdc4a..ec49571924f4 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
 {
 	struct page *page;
 	struct page *p;
+	pte_t *kpte;
 	int ret = -EBUSY;
 	int freeit = 0;
 	static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
@@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
 	p = pfn_to_page(pfn);
 	page = compound_head(p);
 
+	kpte = virt_to_kpte((unsigned long)page_to_virt(p));
+	if (kpte && !pte_present(*kpte)) {
+		unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
+				 pfn, &unpoison_rs);
+		return -EPERM;
+	}
+
 	mutex_lock(&mf_mutex);
 
 	if (!PageHWPoison(p)) {
--
2.20.1
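The faulting address in the oops above (CR2/RDI = ffff888061234000) is consistent with the kernel direct-map address of the unpoisoned PFN 0x61234, and the patch refuses to unpoison when the kernel PTE for that address is no longer present. A minimal userspace model of both ideas follows. It assumes the default x86-64 4-level PAGE_OFFSET of 0xffff888000000000 (i.e. KASLR did not move it) and 4 KiB pages; model_direct_map_addr, struct model_kpte and model_unpoison_check are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model, NOT kernel code.
 * ASSUMPTION: default x86-64 4-level PAGE_OFFSET, not randomized by KASLR.
 */
#define MODEL_PAGE_SHIFT  12
#define MODEL_PAGE_OFFSET UINT64_C(0xffff888000000000)

/* Direct-map address of a PFN under the assumption above. */
uint64_t model_direct_map_addr(uint64_t pfn)
{
	return MODEL_PAGE_OFFSET + (pfn << MODEL_PAGE_SHIFT);
}

/* Stand-in for pte_t: only the present bit matters here. */
struct model_kpte {
	bool present;	/* models pte_present(*kpte) */
};

/*
 * Mirrors the test added by the patch: a NULL pointer models
 * virt_to_kpte() returning NULL, in which case the check is skipped;
 * a non-present KPTE means the page was unmapped by the MCE handler,
 * so unpoisoning is refused.
 */
int model_unpoison_check(const struct model_kpte *kpte)
{
	if (kpte && !kpte->present)
		return -EPERM;
	return 0;
}
```

For example, model_direct_map_addr(0x61234) yields 0xffff888061234000, the address the kernel faulted on in clear_page_erms().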

2022-06-06 03:50:17

by Andrew Morton

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:

> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
> only puts the page back into the buddy allocator, which leads to a BUG
> when the corrupted KPTE is accessed.
>
> Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
> to avoid a BUG like this:
>
> Unpoison: Software-unpoisoned page 0x61234
> BUG: unable to handle page fault for address: ffff888061234000

Thanks.

> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> {
> struct page *page;
> struct page *p;
> + pte_t *kpte;
> int ret = -EBUSY;
> int freeit = 0;
> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> p = pfn_to_page(pfn);
> page = compound_head(p);
>
> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> + if (kpte && !pte_present(*kpte)) {
> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> + pfn, &unpoison_rs);
> + return -EPERM;
> + }
> +
> mutex_lock(&mf_mutex);
>
> if (!PageHWPoison(p)) {

I guess we don't want to let fault injection crash the kernel, so a
cc:stable seems appropriate here.

Can we think up a suitable Fixes: commit? I'm suspecting this bug has
been there for a long time?

2022-06-06 05:23:32

by zhenwei pi

Subject: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 6/5/22 02:56, Andrew Morton wrote:
> On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:
>
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
>> only puts the page back into the buddy allocator, which leads to a BUG
>> when the corrupted KPTE is accessed.
>>
>> Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
>> to avoid a BUG like this:
>>
>> Unpoison: Software-unpoisoned page 0x61234
>> BUG: unable to handle page fault for address: ffff888061234000
>
> Thanks.
>
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
>> {
>> struct page *page;
>> struct page *p;
>> + pte_t *kpte;
>> int ret = -EBUSY;
>> int freeit = 0;
>> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
>> p = pfn_to_page(pfn);
>> page = compound_head(p);
>>
>> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
>> + if (kpte && !pte_present(*kpte)) {
>> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
>> + pfn, &unpoison_rs);
>> + return -EPERM;
>> + }
>> +
>> mutex_lock(&mf_mutex);
>>
>> if (!PageHWPoison(p)) {
>
> I guess we don't want to let fault injection crash the kernel, so a
> cc:stable seems appropriate here.
>
> Can we think up a suitable Fixes: commit? I'm suspecting this bug has
> been there for a long time?
>

Sure!

On 2009-Dec-16, hwpoison_unpoison() was introduced into Linux by commit
847ce401df392 ("HWPOISON: Add unpoisoning support"):
...
There is no hardware level unpoisioning, so this cannot be used for real
memory errors, only for software injected errors.
...

Both that commit log and the comment in the source code say this function
is meant for software-level unpoisoning only. Unfortunately, there is no
check for this in hwpoison_unpoison().

On 2020-May-20, commit 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page
if the whole page is affected and poisoned") started clearing the KPTE of a
wholly poisoned page, which leads to the BUG described in this patch when
the hardware corrupted page is unpoisoned.

Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")

Cc: Wu Fengguang <[email protected]>
Cc: Tony Luck <[email protected]>

--
zhenwei pi

2022-06-06 06:11:09

by HORIGUCHI NAOYA(堀口　直也)

Subject: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
>
>
> On 6/5/22 02:56, Andrew Morton wrote:
> > On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:
> >
> > > Currently unpoison_memory(unsigned long pfn) is designed for soft
> > > poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
> > > only puts the page back into the buddy allocator, which leads to a BUG
> > > when the corrupted KPTE is accessed.

Thank you for the patch. I think this will be helpful for integration testing.

You mention "hardware corrupted page" as the condition of this bug, and I
think that it means a real hardware error, but this BUG seems to be
triggered when we use mce-inject or APEI (these are also software injection
methods that don't physically corrupt the memory). So the actual condition
is "when memory_failure() is called by the MCE handler"?

> > >
> > > Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
> > > to avoid a BUG like this:
> > >
> > > Unpoison: Software-unpoisoned page 0x61234
> > > BUG: unable to handle page fault for address: ffff888061234000
> >
> > Thanks.
> >
> > > --- a/mm/memory-failure.c
> > > +++ b/mm/memory-failure.c
> > > @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> > > {
> > > struct page *page;
> > > struct page *p;
> > > + pte_t *kpte;
> > > int ret = -EBUSY;
> > > int freeit = 0;
> > > static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> > > p = pfn_to_page(pfn);
> > > page = compound_head(p);
> > > + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> > > + if (kpte && !pte_present(*kpte)) {
> > > + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> > > + pfn, &unpoison_rs);

This prevents unpoisoning hwpoisoned 4kB pages, but not hugetlb pages,
where I see a similar BUG (even with your patch applied):

[ 917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
[ 917.810144] #PF: supervisor write access in kernel mode
[ 917.812588] #PF: error_code(0x0002) - not-present page
[ 917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
[ 917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
[ 917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
[ 917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[ 917.829762] RIP: 0010:clear_page_erms+0x7/0x10
[ 917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
[ 917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
[ 917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
[ 917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
[ 917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
[ 917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
[ 917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
[ 917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
[ 917.863433] Call Trace:
[ 917.864266] <TASK>
[ 917.864961] clear_huge_page+0x147/0x270
[ 917.866236] hugetlb_fault+0x440/0xad0
[ 917.867366] handle_mm_fault+0x270/0x290
[ 917.868532] do_user_addr_fault+0x1c3/0x680
[ 917.869768] exc_page_fault+0x6c/0x160
[ 917.870912] ? asm_exc_page_fault+0x8/0x30
[ 917.872082] asm_exc_page_fault+0x1e/0x30
[ 917.873220] RIP: 0033:0x7f2aeb8ba367

I can't think of a workaround for this right now...

> > > + return -EPERM;

Is -EOPNOTSUPP a better error code?
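The choice matters to testers because, as we understand debugfs simple attributes, a negative value returned by the set callback (here, ultimately unpoison_memory()) becomes the errno of write(2) on the debugfs file. Since EPERM conventionally also signals a privilege problem, a distinct code such as EOPNOTSUPP would be easier to tell apart. A minimal test-side sketch follows; unpoison_pfn is a hypothetical helper, and it assumes debugfs is mounted at /sys/kernel/debug and that the hwpoison-inject file accepts a 0x-prefixed PFN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* ASSUMPTION: debugfs mounted at /sys/kernel/debug, hwpoison-inject loaded. */
#define UNPOISON_PFN_PATH "/sys/kernel/debug/hwpoison/unpoison-pfn"

/*
 * Hypothetical helper: write a PFN to an unpoison-pfn style file.
 * Returns 0 on success or -errno, so a caller can distinguish a refusal
 * (-EPERM as posted, -EOPNOTSUPP if the suggestion is taken) from other
 * failures such as a missing file.
 */
int unpoison_pfn(const char *path, uint64_t pfn)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "0x%" PRIx64 "\n", pfn);
	int fd = open(path, O_WRONLY);
	int err = 0;

	if (fd < 0)
		return -errno;
	if (write(fd, buf, (size_t)len) < 0)
		err = -errno;	/* saved before close() can change errno */
	close(fd);
	return err;
}
```

Usage inside the VM, as root: `int ret = unpoison_pfn(UNPOISON_PFN_PATH, 0x61234);` and then compare ret against -EPERM or -EOPNOTSUPP.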

> > > + }
> > > +
> > > mutex_lock(&mf_mutex);
> > > if (!PageHWPoison(p)) {
> >
> > I guess we don't want to let fault injection crash the kernel, so a
> > cc:stable seems appropriate here.
> >
> > Can we think up a suitable Fixes: commit? I'm suspecting this bug has
> > been there for a long time?
> >
>
> Sure!
>
> On 2009-Dec-16, hwpoison_unpoison() was introduced into Linux by commit
> 847ce401df392 ("HWPOISON: Add unpoisoning support"):
> ...
> There is no hardware level unpoisioning, so this cannot be used for real
> memory errors, only for software injected errors.
> ...
>
> Both that commit log and the comment in the source code say this function
> is meant for software-level unpoisoning only. Unfortunately, there is no
> check for this in hwpoison_unpoison().
>
> On 2020-May-20, commit 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page
> if the whole page is affected and poisoned") started clearing the KPTE of a
> wholly poisoned page, which leads to the BUG described in this patch when
> the hardware corrupted page is unpoisoned.
>
> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>
> Cc: Wu Fengguang <[email protected]>
> Cc: Tony Luck <[email protected]>

Thanks for checking the history. I agree with sending this to stable.

Thanks,
Naoya Horiguchi

2022-06-06 07:49:55

by zhenwei pi

Subject: Re: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 6/6/22 12:32, HORIGUCHI NAOYA(堀口直也) wrote:
> On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
>>
>>
>> On 6/5/22 02:56, Andrew Morton wrote:
>>> On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:
>>>
>>>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>>>> poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
>>>> only puts the page back into the buddy allocator, which leads to a BUG
>>>> when the corrupted KPTE is accessed.
>
> Thank you for the patch. I think this will be helpful for integration testing.
>
> You mention "hardware corrupted page" as the condition of this bug, and I
> think that it means a real hardware error, but this BUG seems to be
> triggered when we use mce-inject or APEI (these are also software injection
> methods that don't physically corrupt the memory). So the actual condition
> is "when memory_failure() is called by the MCE handler"?
>

Yes. I use QEMU to emulate a 'real hardware error' with this command:
virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd 0x61234000 0x8c
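QEMU's HMP `mce` command takes, in order, cpu, bank, status, mcg_status, addr and misc, so the command above injects into bank 9 of CPU 0 at guest physical address 0x61234000 (PFN 0x61234). A small decoder of the status and misc values helps to see what kind of error is being emulated. This is a sketch: the bit positions follow the x86 MCi_STATUS and MCi_MISC layout as we read it in the Intel SDM, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Selected MCi_STATUS fields (ASSUMPTION: Intel SDM bit layout). */
struct mci_status_bits {
	unsigned val, over, uc, en, miscv, addrv, pcc, s, ar;
	uint16_t mca_code;	/* bits 15:0, MCA error code */
};

static unsigned bit(uint64_t v, unsigned n)
{
	return (unsigned)((v >> n) & 1);
}

struct mci_status_bits decode_mci_status(uint64_t status)
{
	struct mci_status_bits b = {
		.val   = bit(status, 63),	/* register holds a valid error */
		.over  = bit(status, 62),	/* error overflow */
		.uc    = bit(status, 61),	/* uncorrected error */
		.en    = bit(status, 60),	/* error reporting enabled */
		.miscv = bit(status, 59),	/* MCi_MISC is valid */
		.addrv = bit(status, 58),	/* MCi_ADDR is valid */
		.pcc   = bit(status, 57),	/* processor context corrupt */
		.s     = bit(status, 56),	/* signaled (software recoverable) */
		.ar    = bit(status, 55),	/* action required */
		.mca_code = (uint16_t)(status & 0xffff),
	};
	return b;
}

/* MCi_MISC bits 5:0: lowest valid address bit (12 => 4 KiB granularity). */
unsigned misc_addr_lsb(uint64_t misc)
{
	return (unsigned)(misc & 0x3f);
}

/* MCi_MISC bits 8:6: address mode (2 => physical address). */
unsigned misc_addr_mode(uint64_t misc)
{
	return (unsigned)((misc >> 6) & 0x7);
}
```

Decoding 0xbd000000000000c0 this way gives VAL, UC, EN, MISCV, ADDRV and S set with PCC and AR clear, i.e. an uncorrected, software-recoverable, action-optional (SRAO) error, and misc 0x8c describes a physical address with 4 KiB granularity. That matches the mce/uc/srao recipe Naoya uses in his test tool.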

>>>>
>>>> Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
>>>> to avoid a BUG like this:
>>>>
>>>> Unpoison: Software-unpoisoned page 0x61234
>>>> BUG: unable to handle page fault for address: ffff888061234000
>>>
>>> Thanks.
>>>
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
>>>> {
>>>> struct page *page;
>>>> struct page *p;
>>>> + pte_t *kpte;
>>>> int ret = -EBUSY;
>>>> int freeit = 0;
>>>> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>>>> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
>>>> p = pfn_to_page(pfn);
>>>> page = compound_head(p);
>>>> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
>>>> + if (kpte && !pte_present(*kpte)) {
>>>> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
>>>> + pfn, &unpoison_rs);
>
> This prevents unpoisoning hwpoisoned 4kB pages, but not hugetlb pages,
> where I see a similar BUG (even with your patch applied):
>
> [ 917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
> [ 917.810144] #PF: supervisor write access in kernel mode
> [ 917.812588] #PF: error_code(0x0002) - not-present page
> [ 917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
> [ 917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
> [ 917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [ 917.829762] RIP: 0010:clear_page_erms+0x7/0x10
> [ 917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
> [ 917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
> [ 917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
> [ 917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
> [ 917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
> [ 917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
> [ 917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
> [ 917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
> [ 917.863433] Call Trace:
> [ 917.864266] <TASK>
> [ 917.864961] clear_huge_page+0x147/0x270
> [ 917.866236] hugetlb_fault+0x440/0xad0
> [ 917.867366] handle_mm_fault+0x270/0x290
> [ 917.868532] do_user_addr_fault+0x1c3/0x680
> [ 917.869768] exc_page_fault+0x6c/0x160
> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
> [ 917.872082] asm_exc_page_fault+0x1e/0x30
> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
>
> I can't think of a workaround for this right now...
>

Could you please tell me how to reproduce this issue?

>>>> + return -EPERM;
>
> Is -EOPNOTSUPP a better error code?
>

OK!

>>>> + }
>>>> +
>>>> mutex_lock(&mf_mutex);
>>>> if (!PageHWPoison(p)) {
>>>
>>> I guess we don't want to let fault injection crash the kernel, so a
>>> cc:stable seems appropriate here.
>>>
>>> Can we think up a suitable Fixes: commit? I'm suspecting this bug has
>>> been there for a long time?
>>>
>>
>> Sure!
>>
>> On 2009-Dec-16, hwpoison_unpoison() was introduced into Linux by commit
>> 847ce401df392 ("HWPOISON: Add unpoisoning support"):
>> ...
>> There is no hardware level unpoisioning, so this cannot be used for real
>> memory errors, only for software injected errors.
>> ...
>>
>> Both that commit log and the comment in the source code say this function
>> is meant for software-level unpoisoning only. Unfortunately, there is no
>> check for this in hwpoison_unpoison().
>>
>> On 2020-May-20, commit 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page
>> if the whole page is affected and poisoned") started clearing the KPTE of a
>> wholly poisoned page, which leads to the BUG described in this patch when
>> the hardware corrupted page is unpoisoned.
>>
>> Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
>> Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
>>
>> Cc: Wu Fengguang <[email protected]>
>> Cc: Tony Luck <[email protected]>
>
> Thanks for checking the history. I agree with sending this to stable.
>
> Thanks,
> Naoya Horiguchi

--
zhenwei pi

2022-06-06 09:35:38

by HORIGUCHI NAOYA(堀口　直也)

Subject: Re: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On Mon, Jun 06, 2022 at 03:20:27PM +0800, zhenwei pi wrote:
>
>
> On 6/6/22 12:32, HORIGUCHI NAOYA(堀口直也) wrote:
> > On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
> > >
> > >
> > > On 6/5/22 02:56, Andrew Morton wrote:
> > > > On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:
> > > >
> > > > > Currently unpoison_memory(unsigned long pfn) is designed for soft
> > > > > poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
> > > > > only puts the page back into the buddy allocator, which leads to a BUG
> > > > > when the corrupted KPTE is accessed.
> >
> > Thank you for the patch. I think this will be helpful for integration testing.
> >
> > You mention "hardware corrupted page" as the condition of this bug, and I
> > think that it means a real hardware error, but this BUG seems to be
> > triggered when we use mce-inject or APEI (these are also software injection
> > methods that don't physically corrupt the memory). So the actual condition
> > is "when memory_failure() is called by the MCE handler"?
> >
>
> Yes. I use QEMU to emulate a 'real hardware error' with this command:
> virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd 0x61234000 0x8c
>
> > > > >
> > > > > Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
> > > > > to avoid a BUG like this:
> > > > >
> > > > > Unpoison: Software-unpoisoned page 0x61234
> > > > > BUG: unable to handle page fault for address: ffff888061234000
> > > >
> > > > Thanks.
> > > >
> > > > > --- a/mm/memory-failure.c
> > > > > +++ b/mm/memory-failure.c
> > > > > @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> > > > > {
> > > > > struct page *page;
> > > > > struct page *p;
> > > > > + pte_t *kpte;
> > > > > int ret = -EBUSY;
> > > > > int freeit = 0;
> > > > > static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> > > > > @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> > > > > p = pfn_to_page(pfn);
> > > > > page = compound_head(p);
> > > > > + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> > > > > + if (kpte && !pte_present(*kpte)) {
> > > > > + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> > > > > + pfn, &unpoison_rs);
> >
> > This prevents unpoisoning hwpoisoned 4kB pages, but not hugetlb pages,
> > where I see a similar BUG (even with your patch applied):
> >
> > [ 917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
> > [ 917.810144] #PF: supervisor write access in kernel mode
> > [ 917.812588] #PF: error_code(0x0002) - not-present page
> > [ 917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
> > [ 917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
> > [ 917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
> > [ 917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> > [ 917.829762] RIP: 0010:clear_page_erms+0x7/0x10
> > [ 917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
> > [ 917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
> > [ 917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
> > [ 917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
> > [ 917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [ 917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
> > [ 917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
> > [ 917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
> > [ 917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
> > [ 917.863433] Call Trace:
> > [ 917.864266] <TASK>
> > [ 917.864961] clear_huge_page+0x147/0x270
> > [ 917.866236] hugetlb_fault+0x440/0xad0
> > [ 917.867366] handle_mm_fault+0x270/0x290
> > [ 917.868532] do_user_addr_fault+0x1c3/0x680
> > [ 917.869768] exc_page_fault+0x6c/0x160
> > [ 917.870912] ? asm_exc_page_fault+0x8/0x30
> > [ 917.872082] asm_exc_page_fault+0x1e/0x30
> > [ 917.873220] RIP: 0033:0x7f2aeb8ba367
> >
> > I can't think of a workaround for this right now...
> >
>
> Could you please tell me how to reproduce this issue?

You are familiar with qemu-monitor-command, so the following procedure
should work for you:

- run a process using hugepages on your VM,
- check the guest physical address of the hugepage (page-types.c is helpful for this),
- inject an MCE with virsh qemu-monitor-command at the guest physical address, then
- unpoison the injected physical address.
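The second step can also be done without page-types by reading /proc/self/pagemap, whose 64-bit entry format (PFN in bits 0-54, "present" in bit 63) is described in the kernel's pagemap documentation. A minimal sketch, assuming a 64-bit Linux guest and root privileges (without CAP_SYS_ADMIN the kernel reports the PFN field as 0); the helper names are hypothetical and not part of page-types.c or mm_regression.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* /proc/<pid>/pagemap entry layout. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)	/* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)		/* bit 63: page present in RAM */

/* Returns 1 and stores the PFN if the entry is present, else 0. */
int pagemap_entry_pfn(uint64_t entry, uint64_t *pfn)
{
	if (!(entry & PM_PRESENT))
		return 0;
	*pfn = entry & PM_PFN_MASK;
	return 1;
}

/* Physical address = PFN * page size + offset of vaddr within its page. */
uint64_t pfn_to_phys(uint64_t pfn, uintptr_t vaddr, uint64_t page_size)
{
	return pfn * page_size + (vaddr % page_size);
}

/* PFN backing vaddr in the calling process, or -1 on error/not present. */
int64_t vaddr_to_pfn(uintptr_t vaddr)
{
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t entry, pfn;
	off_t off = (off_t)(vaddr / page_size * sizeof(entry));
	int fd = open("/proc/self/pagemap", O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, &entry, sizeof(entry), off);
	close(fd);
	if (n != (ssize_t)sizeof(entry) || !pagemap_entry_pfn(entry, &pfn))
		return -1;
	return (int64_t)pfn;
}
```

The address from pfn_to_phys() is what the `mce` command takes, and the PFN itself is what gets unpoisoned (e.g. PFN 0x61234 for address 0x61234000 with 4 KiB pages).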

The above may be enough, but just in case, let me share my procedure using
my own test tool.

$ git clone https://github.com/nhoriguchi/mm_regression
$ cd mm_regression
$ ... # Make sure the prerequisites (see README.md) are met.
$ make  # Some files may fail to build, but it's OK as long as
        # test_alloc_generic.c is built.
$ ./run.sh prepare debug
$ ./run.sh recipe list | grep mce/uc/srao/backend-hugetlb > work/debug/recipelist
$ RUN_MODE=all ./run.sh project run
$ RUN_MODE=all ./run.sh project run -a # when you want to rerun

I don't want to bother you with learning this tool, so if something goes
wrong, feel free to let me know.

Thanks,
Naoya Horiguchi

2022-06-08 00:51:20

by Miaohe Lin

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 2022/6/4 18:32, zhenwei pi wrote:
> Currently unpoison_memory(unsigned long pfn) is designed for soft
> poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
> only puts the page back into the buddy allocator, which leads to a BUG
> when the corrupted KPTE is accessed.
>
> Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
> to avoid a BUG like this:
>
> Unpoison: Software-unpoisoned page 0x61234
> BUG: unable to handle page fault for address: ffff888061234000
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
> RIP: 0010:clear_page_erms+0x7/0x10
> Code: ...
> RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
> RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
> RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
> R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
> FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> prep_new_page+0x151/0x170
> get_page_from_freelist+0xca0/0xe20
> ? sysvec_apic_timer_interrupt+0xab/0xc0
> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
> __alloc_pages+0x17e/0x340
> __folio_alloc+0x17/0x40
> vma_alloc_folio+0x84/0x280
> __handle_mm_fault+0x8d4/0xeb0
> handle_mm_fault+0xd5/0x2a0
> do_user_addr_fault+0x1d0/0x680
> ? kvm_read_and_reset_apf_flags+0x3b/0x50
> exc_page_fault+0x78/0x170
> asm_exc_page_fault+0x27/0x30
>

Thanks for fixing this issue.

> Signed-off-by: zhenwei pi <[email protected]>
> ---
> mm/memory-failure.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b85661cbdc4a..ec49571924f4 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
> {
> struct page *page;
> struct page *p;
> + pte_t *kpte;
> int ret = -EBUSY;
> int freeit = 0;
> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
> p = pfn_to_page(pfn);
> page = compound_head(p);
>
> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
> + if (kpte && !pte_present(*kpte)) {

It seems this bug is specific to x86? IIUC, not all architectures unmap the entire page
when the whole page is affected and poisoned. So the virt_to_kpte() + !pte_present() check
above cannot reliably detect a hardware corrupted page: if the page is not unmapped as a
*whole* (e.g. *possibly* a hugetlb page), we will still unpoison a hardware corrupted page.
Or am I missing something?
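One narrow aspect of this concern can be modeled in isolation: the posted check inspects only the KPTE of the requested PFN, so if the unmapped 4 KiB piece of a compound page is a different subpage, a single-PFN check does not see it. (In Naoya's hugetlb trace, the faulting address ffff9f7bb3201000 lies 4 KiB into a 2 MiB-aligned region.) The sketch below is a hypothetical userspace model, not a proposed kernel change, and it does nothing for architectures that never unmap poisoned pages, which is Miaohe's main point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: kpte_present[i] is true if subpage i of a
 * compound page still has a present kernel direct-map PTE.
 */

/* What a single-PFN check sees: only the requested subpage. */
bool model_requested_pfn_unmapped(const bool *kpte_present, size_t idx)
{
	return !kpte_present[idx];
}

/* A stricter variant that scans every subpage of the compound page. */
bool model_any_subpage_unmapped(const bool *kpte_present, size_t nr_subpages)
{
	for (size_t i = 0; i < nr_subpages; i++)
		if (!kpte_present[i])
			return true;
	return false;
}
```

With a 2 MiB page (512 subpages) whose subpage 1 is unmapped, checking subpage 0 alone reports nothing, while the full scan does.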

> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
> + pfn, &unpoison_rs);
> + return -EPERM;
> + }
> +

I think -EOPNOTSUPP might be a better error code too as Naoya pointed out.

> mutex_lock(&mf_mutex);
>
> if (!PageHWPoison(p)) {
>

Thanks!

2022-06-08 04:19:50

by David Hildenbrand

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 06.06.22 11:15, HORIGUCHI NAOYA(堀口直也) wrote:
> On Mon, Jun 06, 2022 at 03:20:27PM +0800, zhenwei pi wrote:
>>
>>
>> On 6/6/22 12:32, HORIGUCHI NAOYA(堀口直也) wrote:
>>> On Sun, Jun 05, 2022 at 12:24:24PM +0800, zhenwei pi wrote:
>>>>
>>>>
>>>> On 6/5/22 02:56, Andrew Morton wrote:
>>>>> On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <[email protected]> wrote:
>>>>>
>>>>>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>>>>>> poison (hwpoison-inject) only. Unpoisoning a hardware corrupted page
>>>>>> only puts the page back into the buddy allocator, which leads to a BUG
>>>>>> when the corrupted KPTE is accessed.
>>>
>>> Thank you for the patch. I think this will be helpful for integration testing.
>>>
>>> You mention "hardware corrupted page" as the condition of this bug, and I
>>> think that it means a real hardware error, but this BUG seems to be
>>> triggered when we use mce-inject or APEI (these are also software injection
>>> methods that don't physically corrupt the memory). So the actual condition
>>> is "when memory_failure() is called by the MCE handler"?
>>>
>>
>> Yes. I use QEMU to emulate a 'real hardware error' with this command:
>> virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd 0x61234000 0x8c
>>
>>>>>>
>>>>>> Do not allow unpoisoning a hardware corrupted page in unpoison_memory(),
>>>>>> to avoid a BUG like this:
>>>>>>
>>>>>> Unpoison: Software-unpoisoned page 0x61234
>>>>>> BUG: unable to handle page fault for address: ffff888061234000
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> --- a/mm/memory-failure.c
>>>>>> +++ b/mm/memory-failure.c
>>>>>> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
>>>>>> {
>>>>>> struct page *page;
>>>>>> struct page *p;
>>>>>> + pte_t *kpte;
>>>>>> int ret = -EBUSY;
>>>>>> int freeit = 0;
>>>>>> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>>>>>> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
>>>>>> p = pfn_to_page(pfn);
>>>>>> page = compound_head(p);
>>>>>> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
>>>>>> + if (kpte && !pte_present(*kpte)) {
>>>>>> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
>>>>>> + pfn, &unpoison_rs);
>>>
>>> This prevents unpoisoning hwpoisoned 4kB pages, but not hugetlb pages,
>>> where I see a similar BUG (even with your patch applied):
>>>
>>> [ 917.806712] BUG: unable to handle page fault for address: ffff9f7bb3201000
>>> [ 917.810144] #PF: supervisor write access in kernel mode
>>> [ 917.812588] #PF: error_code(0x0002) - not-present page
>>> [ 917.815007] PGD 104801067 P4D 104801067 PUD 10006b063 PMD 1052d0063 PTE 800ffffeccdfe062
>>> [ 917.818768] Oops: 0002 [#1] PREEMPT SMP PTI
>>> [ 917.820759] CPU: 0 PID: 7774 Comm: test_alloc_gene Tainted: G M OE 5.18.0-v5.18-220606-0942-029-ge4dcc+ #47
>>> [ 917.825720] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
>>> [ 917.829762] RIP: 0010:clear_page_erms+0x7/0x10
>>> [ 917.831867] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 48 85 ff 0f 84 d3 00 00 00 0f b6 0f 4c
>>> [ 917.840540] RSP: 0000:ffffab49c25ebdf0 EFLAGS: 00010246
>>> [ 917.842839] RAX: 0000000000000000 RBX: ffffd538c4cc8000 RCX: 0000000000001000
>>> [ 917.845835] RDX: 0000000080000000 RSI: 00007f2aeb600000 RDI: ffff9f7bb3201000
>>> [ 917.848687] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>> [ 917.851377] R10: 0000000000000002 R11: ffff9f7b87e3a2a0 R12: 0000000000000000
>>> [ 917.854035] R13: 0000000000000001 R14: ffffd538c4cc8000 R15: ffff9f7bc002a5d8
>>> [ 917.856539] FS: 00007f2aebad3740(0000) GS:ffff9f7bbbc00000(0000) knlGS:0000000000000000
>>> [ 917.859229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 917.861149] CR2: ffff9f7bb3201000 CR3: 0000000107726003 CR4: 0000000000170ef0
>>> [ 917.863433] Call Trace:
>>> [ 917.864266] <TASK>
>>> [ 917.864961] clear_huge_page+0x147/0x270
>>> [ 917.866236] hugetlb_fault+0x440/0xad0
>>> [ 917.867366] handle_mm_fault+0x270/0x290
>>> [ 917.868532] do_user_addr_fault+0x1c3/0x680
>>> [ 917.869768] exc_page_fault+0x6c/0x160
>>> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
>>> [ 917.872082] asm_exc_page_fault+0x1e/0x30
>>> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
>>>
>>> I can't think of a workaround for this right now...
>>>
>>
>> Could you please tell me how to reproduce this issue?
>
> You are familiar with qemu-monitor-command, so the following procedure
> should work for you:
>
> - run a process using hugepages on your VM,
> - check the guest physical address of the hugepage (page-types.c is helpful for this),
> - inject a MCE with virsh qemu-monitor-command on the guest physical address, then
> - unpoison the injected physical address.
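The second step of the procedure is finding the physical address of the hugepage. The test process can also do this itself through /proc/self/pagemap, which holds one 64-bit entry per virtual page. In each entry, bits 0-54 are the PFN and bit 63 is the present bit, as described in the kernel's pagemap documentation.

A minimal sketch follows. Keep in mind:

- The helper names are hypothetical.
- Reading real PFNs needs CAP_SYS_ADMIN; unprivileged readers see zero.
- page-types.c remains the more complete tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PFN_MASK ((1ull << 55) - 1) /* bits 0..54: page frame number */
#define PM_PRESENT  (1ull << 63)       /* bit 63: page present in RAM */

/* PFN from a raw pagemap entry, or 0 if the page is not present. */
static uint64_t pagemap_entry_to_pfn(uint64_t entry)
{
    if (!(entry & PM_PRESENT))
        return 0;
    return entry & PM_PFN_MASK;
}

/* Read the pagemap entry for vaddr in the calling process; 0 on success, -1 on error. */
static int read_pagemap_entry(uintptr_t vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    off_t off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

For a hugepage mapping, the entry for each 4 KiB virtual page yields the PFN of the corresponding subpage.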

That's triggered via debugfs / HWPOISON_INJECT, right?

That's a DEBUG_KERNEL option, so I'm not 100% sure if we really want to
cc stable.
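The unpoison step goes through the hwpoison-inject debugfs interface referred to here. With CONFIG_HWPOISON_INJECT enabled and debugfs mounted at /sys/kernel/debug, the kernel exposes an unpoison-pfn file under hwpoison/.

The sketch below converts a physical address to a PFN and writes it to that file. It assumes 4 KiB pages and root privileges. It also assumes the attribute parses the value with base auto-detection, so a 0x-prefixed hex PFN is accepted. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

static const char UNPOISON_PATH[] = "/sys/kernel/debug/hwpoison/unpoison-pfn";

/* Format the PFN covering paddr as "0x...\n"; returns its length, or -1 if buf is too small. */
static int format_pfn(uint64_t paddr, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%#llx\n", (unsigned long long)(paddr >> PAGE_SHIFT));
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Ask the kernel to unpoison the page at paddr; returns 0 or a negative errno. */
static int unpoison_paddr(uint64_t paddr)
{
    char buf[32];
    int n = format_pfn(paddr, buf, sizeof(buf));
    if (n < 0)
        return -EINVAL;
    int fd = open(UNPOISON_PATH, O_WRONLY);
    if (fd < 0)
        return -errno;
    ssize_t w = write(fd, buf, (size_t)n);
    int err = (w < 0) ? -errno : 0;
    close(fd);
    return err;
}
```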

--
Thanks,

David / dhildenb

2022-06-08 05:55:20

by Andrew Morton

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On Tue, 7 Jun 2022 14:36:00 +0200 David Hildenbrand wrote:

> On 06.06.22 11:15, HORIGUCHI NAOYA(堀口直也) wrote:
> >>> [ 917.864266] <TASK>
> >>> [ 917.864961] clear_huge_page+0x147/0x270
> >>> [ 917.866236] hugetlb_fault+0x440/0xad0
> >>> [ 917.867366] handle_mm_fault+0x270/0x290
> >>> [ 917.868532] do_user_addr_fault+0x1c3/0x680
> >>> [ 917.869768] exc_page_fault+0x6c/0x160
> >>> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
> >>> [ 917.872082] asm_exc_page_fault+0x1e/0x30
> >>> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
> >>>
> >>> I don't think of a workaround for this now ...
> >>>
> >>
> >> Could you please tell me how to reproduce this issue?
> >
> > You are familiar with qemu-monitor-command, so the following procedure
> > should work for you:
> >
> > - run a process using hugepages on your VM,
> > - check the guest physical address of the hugepage (page-types.c is helpful for this),
> > - inject a MCE with virsh qemu-monitor-command on the guest physical address, then
> > - unpoison the injected physical address.
>
> That's triggered via debugfs / HWPOISON_INJECT, right?
>
> That's a DEBUG_KERNEL option, so I'm not 100% sure if we really want to
> cc stable.

Sure, it's hardly a must-have. But let's also take the patch
complexity&risk into account. This is one dang simple patch.

Or is it. Should these things be happening outside mf_mutex? What the
heck is the role of mf_mutex anyway?

2022-06-08 06:20:50

by HORIGUCHI NAOYA(堀口　直也)

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On Tue, Jun 07, 2022 at 02:59:59PM -0700, Andrew Morton wrote:
> On Tue, 7 Jun 2022 14:36:00 +0200 David Hildenbrand wrote:
>
> > On 06.06.22 11:15, HORIGUCHI NAOYA(堀口直也) wrote:
> > >>> [ 917.864266] <TASK>
> > >>> [ 917.864961] clear_huge_page+0x147/0x270
> > >>> [ 917.866236] hugetlb_fault+0x440/0xad0
> > >>> [ 917.867366] handle_mm_fault+0x270/0x290
> > >>> [ 917.868532] do_user_addr_fault+0x1c3/0x680
> > >>> [ 917.869768] exc_page_fault+0x6c/0x160
> > >>> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
> > >>> [ 917.872082] asm_exc_page_fault+0x1e/0x30
> > >>> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
> > >>>
> > >>> I don't think of a workaround for this now ...
> > >>>
> > >>
> > >> Could you please tell me how to reproduce this issue?
> > >
> > > You are familiar with qemu-monitor-command, so the following procedure
> > > should work for you:
> > >
> > > - run a process using hugepages on your VM,
> > > - check the guest physical address of the hugepage (page-types.c is helpful for this),
> > > - inject a MCE with virsh qemu-monitor-command on the guest physical address, then
> > > - unpoison the injected physical address.
> >
> > That's triggered via debugfs / HWPOISON_INJECT, right?
> >
> > That's a DEBUG_KERNEL option, so I'm not 100% sure if we really want to
> > cc stable.

Sure, the impact of the bug is limited.

>
> Sure, it's hardly a must-have. But let's also take the patch
> complexity&risk into account. This is one dang simple patch.
>
> Or is it. Should these things be happening outside mf_mutex? What the
> heck is the role of mf_mutex anyway?

mf_mutex ensures that only one error-handling thread handles a given
pfn at a time, but set_mce_nospec() is currently called outside it.
So if we want to prevent the race with unmap, both set_mce_nospec()
and the new KPTE check might need to be done under mf_mutex.

- Naoya Horiguchi
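The locking point can be illustrated with a user-space model. Suppose the "page is poisoned" update and the direct-map unmap (set_mce_nospec() in the kernel) happen in separate critical sections. An unpoison racing in between could then see a poisoned page whose KPTE still looks mapped and release it, after which the KPTE is cleared underneath the freed page.

Doing both updates, and the unpoison-side check, under one lock makes them atomic with respect to each other. Everything below (toy_page, toy_mf_mutex and the helpers) is hypothetical. It models the locking idea, not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-page state standing in for PageHWPoison() and the KPTE. */
struct toy_page {
    bool poisoned;     /* models PageHWPoison() */
    bool kpte_mapped;  /* models the direct-map PTE being present */
};

/* One lock for all error handling, playing the role of mf_mutex. */
static pthread_mutex_t toy_mf_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Software injection: marks the page but leaves the KPTE mapped. */
static void toy_soft_poison(struct toy_page *p)
{
    pthread_mutex_lock(&toy_mf_mutex);
    p->poisoned = true;
    pthread_mutex_unlock(&toy_mf_mutex);
}

/* Hardware error: mark the page AND unmap its KPTE in one critical section,
 * i.e. with the set_mce_nospec() step moved inside the lock. */
static void toy_hw_poison(struct toy_page *p)
{
    pthread_mutex_lock(&toy_mf_mutex);
    p->poisoned = true;
    p->kpte_mapped = false;
    pthread_mutex_unlock(&toy_mf_mutex);
}

/* Unpoison: the KPTE check runs under the same lock, so it never observes
 * "poisoned but KPTE still mapped" for a hardware error.
 * Returns 0 on success, -1 if the page must stay poisoned. */
static int toy_unpoison(struct toy_page *p)
{
    int ret = 0;
    pthread_mutex_lock(&toy_mf_mutex);
    if (p->poisoned) {
        if (p->kpte_mapped)
            p->poisoned = false;
        else
            ret = -1;
    }
    pthread_mutex_unlock(&toy_mf_mutex);
    return ret;
}
```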

2022-06-08 07:32:30

by zhenwei pi

Subject: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 6/8/22 07:43, HORIGUCHI NAOYA(堀口直也) wrote:
> On Tue, Jun 07, 2022 at 02:59:59PM -0700, Andrew Morton wrote:
>> On Tue, 7 Jun 2022 14:36:00 +0200 David Hildenbrand wrote:
>>
>>> On 06.06.22 11:15, HORIGUCHI NAOYA(堀口直也) wrote:
>>>>>> [ 917.864266] <TASK>
>>>>>> [ 917.864961] clear_huge_page+0x147/0x270
>>>>>> [ 917.866236] hugetlb_fault+0x440/0xad0
>>>>>> [ 917.867366] handle_mm_fault+0x270/0x290
>>>>>> [ 917.868532] do_user_addr_fault+0x1c3/0x680
>>>>>> [ 917.869768] exc_page_fault+0x6c/0x160
>>>>>> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
>>>>>> [ 917.872082] asm_exc_page_fault+0x1e/0x30
>>>>>> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
>>>>>>
>>>>>> I don't think of a workaround for this now ...
>>>>>>
>>>>>
>>>>> Could you please tell me how to reproduce this issue?
>>>>
>>>> You are familiar with qemu-monitor-command, so the following procedure
>>>> should work for you:
>>>>
>>>> - run a process using hugepages on your VM,
>>>> - check the guest physical address of the hugepage (page-types.c is helpful for this),
>>>> - inject a MCE with virsh qemu-monitor-command on the guest physical address, then
>>>> - unpoison the injected physical address.
>>>
>>> That's triggered via debugfs / HWPOISON_INJECT, right?
>>>
>>> That's a DEBUG_KERNEL option, so I'm not 100% sure if we really want to
>>> cc stable.
>
> Sure, the impact of the bug is limited.
>
>>
>> Sure, it's hardly a must-have. But let's also take the patch
>> complexity&risk into account. This is one dang simple patch.
>>
>> Or is it. Should these things be happening outside mf_mutex? What the
>> heck is the role of mf_mutex anyway?
>
> mf_mutex is to ensure that only one error handling thread can handle
> the pfn at one time, but set_mce_nospec() is called outside it now.
> So if we want to prevent the race with unmap, both of set_mce_nospec()
> and the new kpte check might need to be done in mf_mutex.
>
> - Naoya Horiguchi

OK, I'll send a v2 patch that:
- protects this change with mf_mutex
- uses -EOPNOTSUPP instead of -EPERM
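The v2 plan changes the error code returned when an unpoison is refused. The sketch below models that decision and how a caller writing to unpoison-pfn might report it.

In user space, -EOPNOTSUPP would presumably surface as "Operation not supported". That arguably describes "this page cannot be unpoisoned" better than EPERM, which is usually read as a missing privilege.

The enum and helper names are hypothetical, and the actual check in the v2 patch may differ.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of a page's poison state. */
enum toy_poison_src {
    TOY_NOT_POISONED,
    TOY_SW_INJECTED,   /* via hwpoison-inject: the KPTE was never unmapped */
    TOY_HW_CORRUPTED,  /* real MCE: memory and/or KPTE are unusable */
};

/* Kernel-side decision: only software-injected poison may be undone. */
static int toy_unpoison_check(enum toy_poison_src src)
{
    switch (src) {
    case TOY_NOT_POISONED:
    case TOY_SW_INJECTED:
        return 0;
    case TOY_HW_CORRUPTED:
        return -EOPNOTSUPP;  /* v2; v1 returned -EPERM */
    }
    return -EINVAL;
}

/* User-side interpretation of the errno from write(2) on unpoison-pfn. */
static const char *toy_explain(int err)
{
    switch (err) {
    case 0:          return "unpoisoned (or not poisoned)";
    case EOPNOTSUPP: return "refused: page has a hardware error";
    case EPERM:      return "not permitted (often means missing privileges)";
    default:         return "other error";
    }
}
```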

By the way, I suspect that the similar trace (provided by Naoya) is not
the same issue. It looks like an undissolved huge page with a corrupted
KPTE. I'm trying to fix it...

--
zhenwei pi

2022-06-08 10:40:23

by David Hildenbrand

Subject: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw corrupted page

On 07.06.22 23:59, Andrew Morton wrote:
> On Tue, 7 Jun 2022 14:36:00 +0200 David Hildenbrand wrote:
>
>> On 06.06.22 11:15, HORIGUCHI NAOYA(堀口直也) wrote:
>>>>> [ 917.864266] <TASK>
>>>>> [ 917.864961] clear_huge_page+0x147/0x270
>>>>> [ 917.866236] hugetlb_fault+0x440/0xad0
>>>>> [ 917.867366] handle_mm_fault+0x270/0x290
>>>>> [ 917.868532] do_user_addr_fault+0x1c3/0x680
>>>>> [ 917.869768] exc_page_fault+0x6c/0x160
>>>>> [ 917.870912] ? asm_exc_page_fault+0x8/0x30
>>>>> [ 917.872082] asm_exc_page_fault+0x1e/0x30
>>>>> [ 917.873220] RIP: 0033:0x7f2aeb8ba367
>>>>>
>>>>> I don't think of a workaround for this now ...
>>>>>
>>>>
>>>> Could you please tell me how to reproduce this issue?
>>>
>>> You are familiar with qemu-monitor-command, so the following procedure
>>> should work for you:
>>>
>>> - run a process using hugepages on your VM,
>>> - check the guest physical address of the hugepage (page-types.c is helpful for this),
>>> - inject a MCE with virsh qemu-monitor-command on the guest physical address, then
>>> - unpoison the injected physical address.
>>
>> That's triggered via debugfs / HWPOISON_INJECT, right?
>>
>> That's a DEBUG_KERNEL option, so I'm not 100% sure if we really want to
>> cc stable.
>
> Sure, it's hardly a must-have. But let's also take the patch
> complexity&risk into account. This is one dang simple patch.
>
> Or is it. Should these things be happening outside mf_mutex? What the
> heck is the role of mf_mutex anyway?

For example, I'm not even sure if we're allowed to use virt_to_kpte()
from arbitrary context at all.

If we have a PMD direct map, why should it be okay to use virt_to_kpte()?

Maybe I'm just wrong; I asked that question on the next patch version
as well.
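The PMD concern can be made concrete with a toy page-table walk. In kernels of this era, virt_to_kpte() appears to check only pmd_none() before calling pte_offset_kernel(). If the direct map covers the address with a 2 MiB leaf PMD, the "PTE table" it indexes is really the huge page's data.

A level-aware walk, in the spirit of x86's lookup_address(), stops at the leaf instead. The two-level model below is hypothetical. toy_mem and the TOY_* bits mirror x86's present and PSE bits but are not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES 512          /* entries per table, as on x86-64 */
#define TOY_FRAMES  4            /* tiny "physical memory" */
#define TOY_PRESENT (1ull << 0)  /* mirrors x86 _PAGE_PRESENT */
#define TOY_PSE     (1ull << 7)  /* mirrors x86 _PAGE_PSE: PMD maps 2 MiB directly */

static uint64_t toy_mem[TOY_FRAMES][TOY_ENTRIES];

enum toy_level { TOY_LEVEL_NONE, TOY_LEVEL_2M, TOY_LEVEL_4K };

/* Frame referenced by a toy entry (bits 12 and up), or NULL if out of range. */
static uint64_t *toy_frame(uint64_t entry)
{
    uint64_t frame = entry >> 12;
    return frame < TOY_FRAMES ? toy_mem[frame] : NULL;
}

/* Shape of virt_to_kpte(): checks only "none", then indexes a PTE table. */
static uint64_t *toy_virt_to_kpte(uint64_t pmd, unsigned idx)
{
    uint64_t *table;
    if (!(pmd & TOY_PRESENT))
        return NULL;
    table = toy_frame(pmd);  /* for a leaf PMD this is the data page itself */
    return table ? &table[idx % TOY_ENTRIES] : NULL;
}

/* Level-aware walk: stop at a leaf PMD and report where the mapping ends. */
static enum toy_level toy_lookup(uint64_t pmd, unsigned idx, uint64_t **pte)
{
    uint64_t *table;
    *pte = NULL;
    if (!(pmd & TOY_PRESENT))
        return TOY_LEVEL_NONE;
    if (pmd & TOY_PSE)
        return TOY_LEVEL_2M;  /* caller must operate on the PMD itself */
    table = toy_frame(pmd);
    if (!table)
        return TOY_LEVEL_NONE;
    *pte = &table[idx % TOY_ENTRIES];
    return TOY_LEVEL_4K;
}
```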

--
Thanks,

David / dhildenb