2022-01-24 18:51:43

by 罗飞

Subject: [PATCH] x86/mce: Always call kill_me_maybe() to handle memory failure in user mode

Just killing the current process is not enough; it is also necessary
to offload the faulty page.

In the virtualization scenario, qemu does not set MCG_STATUS_RIPV by
default. When an SRAR error is injected into the virtual machine, only
the current process is killed. The faulty page is not handled; it is
released and reused, which is very likely to cause the virtual
machine to crash.

Signed-off-by: luofei <[email protected]>
---
arch/x86/kernel/cpu/mce/core.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5818b837fd4d..bc6c353b9250 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1519,10 +1519,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		BUG_ON(!on_thread_stack() || !user_mode(regs));

 		if (kill_current_task)
-			queue_task_work(&m, msg, kill_me_now);
-		else
-			queue_task_work(&m, msg, kill_me_maybe);
-
+			force_sig(SIGBUS);
+		queue_task_work(&m, msg, kill_me_maybe);
 	} else {
 		/*
 		 * Handle an MCE which has happened in kernel space but from
--
2.27.0


2022-01-24 19:01:27

by Borislav Petkov

Subject: Re: [PATCH] x86/mce: Always call kill_me_maybe() to handle memory failure in user mode

On Mon, Jan 24, 2022 at 03:15:01AM -0500, luofei wrote:
> Just killing the current process is not enough; it is also necessary
> to offload the faulty page.
>
> In the virtualization scenario, qemu does not set MCG_STATUS_RIPV by
> default.

Yes, we've had this before. Fix qemu.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette