LinuxLists.cc - [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-28 12:04:56

Subject: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

when one page is already hwpoisoned by AO action, process may not be
killed, the process mapping this page may make a syscall include this
page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
mode it may be fixed by fixup_exception, current code will just return
error code to user process.

This is not suffient, we should send a SIGBUS to the process and log the
info to console, as we can't trust the process will handle the error
correctly.

Suggested-by: Feng Yang <[email protected]>
Signed-off-by: Aili Yao <[email protected]>
---
arch/x86/mm/fault.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..36d1e385512b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
* In this case we need to make sure we're not recursively
* faulting through the emulate_vsyscall() logic.
*/
+#ifdef CONFIG_MEMORY_FAILURE
+ if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
+ pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+ current->comm, current->pid, address);
+
+ if ((current->thread.sig_on_uaccess_err && signal) ||
+ (si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
+#else
if (current->thread.sig_on_uaccess_err && signal) {
+#endif
sanitize_error_code(address, &error_code);

set_signal_archinfo(address, error_code);
@@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
{
/* Kernel mode? Handle exceptions or die: */
if (!(error_code & X86_PF_USER)) {
+#ifdef CONFIG_MEMORY_FAILURE
+ if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
+ no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
+ else
+ no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#else
no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#endif
return;
}

--
2.25.1

2021-01-28 17:45:58

by Luck, Tony

[permalink] [raw]

Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?
>
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right
thing. But I'm not sure how this happens. Can you provide a bit more
detail on the steps

-Tony

P.S. Typo: s/suffient/sufficient/

>
> Suggested-by: Feng Yang <[email protected]>
> Signed-off-by: Aili Yao <[email protected]>
> ---
> arch/x86/mm/fault.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> * In this case we need to make sure we're not recursively
> * faulting through the emulate_vsyscall() logic.
> */
> +#ifdef CONFIG_MEMORY_FAILURE
> + if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> + pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> + current->comm, current->pid, address);
> +
> + if ((current->thread.sig_on_uaccess_err && signal) ||
> + (si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
> if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
> sanitize_error_code(address, &error_code);
>
> set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
> {
> /* Kernel mode? Handle exceptions or die: */
> if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> + if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> + no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> + else
> + no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
> no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
> return;
> }
>
> --
> 2.25.1
>

2021-01-29 03:38:37

by yaoaili [么爱利]

[permalink] [raw]

Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

On Thu, 28 Jan 2021 09:43:52 -0800
"Luck, Tony" <[email protected]> wrote:

> On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> > when one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> > mode it may be fixed by fixup_exception, current code will just return
> > error code to user process.
>
> Shouldn't the AO action that poisoned the page have also unmapped it?

Yes, The page has been unmapped in the code mm/rmap.c:

1567 if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
1568 pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
1569 if (PageHuge(page)) {
1570 hugetlb_count_sub(compound_nr(page), mm);
1571 set_huge_swap_pte_at(mm, address,
1572 pvmw.pte, pteval,
1573 vma_mmu_pagesize(vma));
1574 } else {
1575 dec_mm_counter(mm, mm_counter(page));
1576 set_pte_at(mm, address, pvmw.pte, pteval);
1577 }
1578
1579 }

The pte for this page of processes mapping it should be marked with SWP_HWPOISON.
But the process may not be informed and may continue with the address which has been
ummaped, Then accessing the content in the page will trigger a page fault.

Normally, it will hit the code in arch/x86/mm/fault.c:

945 #ifdef CONFIG_MEMORY_FAILURE
946 if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
947 struct task_struct *tsk = current;
948 unsigned lsb = 0;
949
950 pr_err(
951 "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
952 tsk->comm, tsk->pid, address);
953 if (fault & VM_FAULT_HWPOISON_LARGE)
954 lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
955 if (fault & VM_FAULT_HWPOISON)
956 lsb = PAGE_SHIFT;
957 force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
958 return;
959 }
960 #endif
961 force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
962 }

But when the user processes do a syscall and make a copyin action in kernel space,
the page fault triggered by this action will not got the the above code, it actually
go to the code in arch/x86/mm/fault.c:

650 if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
674 /*
675 * Barring that, we can do the fixup and be happy.
676 */
677 return;
678 }

> >
> > This is not suffient, we should send a SIGBUS to the process and log the
> > info to console, as we can't trust the process will handle the error
> > correctly.
>
> I agree with this part ... few apps check for -EFAULT and do the right
> thing. But I'm not sure how this happens. Can you provide a bit more
> detail on the steps
>

Attachment is a simple code to test this, you can try this test with:
./einj_mem_uc -f copyin2

In my enviroment, the stack will be:

[ 1583.063050] Memory failure: 0x1030254: recovery action for dirty LRU page: Recovered
[ 1583.071724] MCE: Killing einj_mem_uc:11139 due to hardware memory corruption fault at 7f4d59032000
[ 1583.081732] CPU: 38 PID: 11139 Comm: einj_mem_uc Kdump: loaded Not tainted 5.11.0-rc2+ #43
[ 1583.102607] Call Trace:
[ 1583.105338] dump_stack+0x57/0x6a
[ 1583.109041] no_context.cold+0xf6/0x284
[ 1583.113315] mm_fault_error+0xc3/0x1b0
[ 1583.117503] exc_page_fault+0x57/0x110
[ 1583.121690] asm_exc_page_fault+0x1e/0x30
[ 1583.126159] RIP: 0010:__get_user_nocheck_8+0x10/0x13
[ 1583.131704] Code: 0f b7 10 31 c0 0f 01 ca c3 90 0f 01 cb 0f ae e8 8b 10 31 c0 0f 01 ca c3 66 90 0f 01 cb 0f ae e8 48 8b 10 31 c0 0f 01 ca c3 90 <0f> 01 ca 31 d2 48 c7 c0 f2 ff ff ff c3 cc cc cc 0f 1f 44 00 00 40
[ 1583.152659] RSP: 0018:ffffb9e462917d90 EFLAGS: 00050293
[ 1583.158490] RAX: 00007f4d59032000 RBX: 0000000000000000 RCX: 00007f4d59032000
[ 1583.166453] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 00007f4d590321ff
[ 1583.174418] RBP: 0000000000000200 R08: 0000000000000200 R09: ffffb9e462917e50
[ 1583.182382] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb9e462917e60
[ 1583.190345] R13: ffff941470e93058 R14: 0000000000001000 R15: ffffffffc0626540
[ 1583.198310] iov_iter_fault_in_readable+0x4f/0x120
[ 1583.203657] generic_perform_write+0x83/0x1c0
[ 1583.208520] ext4_buffered_write_iter+0x93/0x150 [ext4]
[ 1583.214378] new_sync_write+0x11f/0x1b0
[ 1583.218661] vfs_write+0x1c0/0x280
[ 1583.222455] ksys_write+0x5f/0xe0
[ 1583.226153] do_syscall_64+0x33/0x40
[ 1583.230142] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1583.235778] RIP: 0033:0x7f4d58b35cd0

> P.S. Typo: s/suffient/sufficient/

Thanks for correction!

--
Best Regards!

Aili Yao

Attachments:

(No filename) (5.21 kB)
einj_mem_uc.c (15.51 kB)
Download all attachments

2021-01-29 22:59:22

by Luck, Tony

[permalink] [raw]

Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

Thanks for the explanation and test code. I think I see better what
is going on here.

[I took your idea for using madvise(...MADV_HWPOISON) and added a new "-S"
option to my einj_mem_uc test program to use madvise instead of ACPI/EINJ
for injections. Update pushed here:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git ]

There have been some small changes to arch/x86/mm/fault.c since you wrote
the patch. Can you rebase to v5.11-rc5?

Also maybe this might be a case to use IS_ENABLED() instead of #ifdef to
make the code a little less ugly. At least for the 2nd hunk in your patch
this would work well:

if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
(fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)))
no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
else
no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);

The first hunk might need a bit more thought.

-Tony

2021-02-01 07:25:38

by yaoaili [么爱利]

[permalink] [raw]

Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

On Fri, 29 Jan 2021 14:55:29 -0800
"Luck, Tony" <[email protected]> wrote:

> Thanks for the explanation and test code. I think I see better what
> is going on here.
>
> [I took your idea for using madvise(...MADV_HWPOISON) and added a new "-S"
> option to my einj_mem_uc test program to use madvise instead of ACPI/EINJ
> for injections. Update pushed here:
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git ]
>
> There have been some small changes to arch/x86/mm/fault.c since you wrote
> the patch. Can you rebase to v5.11-rc5?

Yes, I will rebase it to newest version.

> Also maybe this might be a case to use IS_ENABLED() instead of #ifdef to
> make the code a little less ugly. At least for the 2nd hunk in your patch
> this would work well:
>
> if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)))
> no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> else
> no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
>
> The first hunk might need a bit more thought.
>
Do you mean the force_sig_mceerr and force_sig_fault difference? I see a hwpoison related comment
there, but it's better to follow the usual way force_sig_mceerr, I will modify this in a v2 patch.

Or something other, you may post a better one.

Thanks

--
Best Regards!

Aili Yao