2020-06-10 20:18:13

by David P. Reed

Subject: [PATCH] Fix undefined operation VMXOFF during reboot and crash

If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
done on all CPUS, to allow the INIT IPI to function, since
INIT is suppressed when CPUs are in VMX root operation.
However, VMXOFF causes an undefined operation fault if the CPU is not
in VMX operation, that is, if VMXON has not been executed, or VMXOFF
has been executed but VMX is still enabled. This fix makes the reboot
work more reliably by modifying the #UD handler to skip the VMXOFF
if VMX is enabled on the CPU and the VMXOFF is executed as part
of cpu_emergency_vmxoff().
The logic in reboot.c is also corrected, since the point of forcing
the processor out of VMX root operation is that when VMX root
operation is enabled, the processor INIT signal is always masked.
See Intel SDM section on differences between VMX Root operation and normal
operation. Thus every CPU must be forced out of VMX operation.
Since the CPU will hang rather than restart, a manual "reset" is the
only way out of this state (or if there is a BMC, it can issue a RESET
to the chip).

Signed-off-by: David P. Reed <[email protected]>
---
arch/x86/include/asm/virtext.h | 24 ++++++++++++----
arch/x86/kernel/reboot.c | 13 ++-------
arch/x86/kernel/traps.c | 52 ++++++++++++++++++++++++++++++++--
3 files changed, 71 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..ea2d67191684 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -13,12 +13,16 @@
#ifndef _ASM_X86_VIRTEX_H
#define _ASM_X86_VIRTEX_H

+#include <linux/percpu.h>
+
#include <asm/processor.h>

#include <asm/vmx.h>
#include <asm/svm.h>
#include <asm/tlbflush.h>

+DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff);
+
/*
* VMX functions:
*/
@@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void)
/** Disable VMX on the current CPU
*
* vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * on the CPU previously. Only call this function directly if you know VMX
+ * is enabled *and* CPU is in VMX root operation.
*/
static inline void cpu_vmxoff(void)
{
@@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
return __read_cr4() & X86_CR4_VMXE;
}

-/** Disable VMX if it is enabled on the current CPU
+/** Force disable VMX if it is enabled on the current CPU.
+ * Note that if CPU is not in VMX root operation this
+ * VMXOFF will fault an undefined operation fault.
+ * So the 'doing_emergency_vmxoff' percpu flag is set,
+ * the trap handler for just restarts execution after
+ * the VMXOFF instruction.
*
- * You shouldn't call this if cpu_has_vmx() returns 0.
+ * You shouldn't call this directly if cpu_has_vmx() returns 0.
*/
static inline void __cpu_emergency_vmxoff(void)
{
- if (cpu_vmx_enabled())
+ if (cpu_vmx_enabled()) {
+ this_cpu_write(doing_emergency_vmxoff, 1);
cpu_vmxoff();
+ this_cpu_write(doing_emergency_vmxoff, 0);
+ }
}

-/** Disable VMX if it is supported and enabled on the current CPU
+/** Force disable VMX if it is supported and enabled on the current CPU
*/
static inline void cpu_emergency_vmxoff(void)
{
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 3ca43be4f9cf..abc8b51a57c7 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
*
* For safety, we will avoid running the nmi_shootdown_cpus()
* stuff unnecessarily, but we don't have a way to check
- * if other CPUs have VMX enabled. So we will call it only if the
- * CPU we are running on has VMX enabled.
- *
- * We will miss cases where VMX is not enabled on all CPUs. This
- * shouldn't do much harm because KVM always enable VMX on all
- * CPUs anyway. But we can miss it on the small window where KVM
- * is still enabling VMX.
+ * if other CPUs have VMX enabled.
*/
- if (cpu_has_vmx() && cpu_vmx_enabled()) {
+ if (cpu_has_vmx()) {
/* Disable VMX on this CPU. */
- cpu_vmxoff();
+ cpu_emergency_vmxoff();

/* Halt and disable VMX on the other CPUs */
nmi_shootdown_cpus(vmxoff_nmi);
-
}
}

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..2dcf57ef467e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
#include <linux/io.h>
#include <linux/hardirq.h>
#include <linux/atomic.h>
+#include <linux/percpu.h>

#include <asm/stacktrace.h>
#include <asm/processor.h>
@@ -59,6 +60,7 @@
#include <asm/umip.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/virtext.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -70,6 +72,8 @@
#include <asm/proto.h>
#endif

+DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0;
+
DECLARE_BITMAP(system_vectors, NR_VECTORS);

static inline void cond_local_irq_enable(struct pt_regs *regs)
@@ -115,6 +119,43 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
return 0;
}

+/*
+ * Fix any unwanted undefined operation fault due to VMXOFF instruction that
+ * is needed to ensure that CPU is not in VMX root operation at time of
+ * a reboot/panic CPU reset. There is no safe and reliable way to know
+ * if a processor is in VMX root operation, other than to skip the
+ * VMXOFF. It is safe to just skip any VMXOFF that might generate this
+ * exception, when VMX operation is enabled in CR4. In the extremely
+ * rare case that a VMXOFF is erroneously executed while VMX is enabled,
+ * but VMXON has not been executed yet, the undefined opcode fault
+ * should not be missed by valid code, though it would be an error.
+ * To detect this, we could somehow restrict the instruction address
+ * to the specific use during reboot/panic.
+ */
+static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
+{
+ const static u8 insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 };
+ u8 ud[3];
+
+ if (trapnr != X86_TRAP_UD)
+ return 0;
+ if (!cpu_vmx_enabled())
+ return 0;
+ if (!this_cpu_read(doing_emergency_vmxoff))
+ return 0;
+
+ /* undefined instruction must be in kernel and be VMXOFF */
+ if (regs->ip < TASK_SIZE_MAX)
+ return 0;
+ if (probe_kernel_address((u8 *)regs->ip, ud))
+ return 0;
+ if (memcmp(ud, insn_vmxoff, sizeof(insn_vmxoff)))
+ return 0;
+
+ regs->ip += sizeof(insn_vmxoff);
+ return 1;
+}
+
static nokprobe_inline int
do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
struct pt_regs *regs, long error_code)
@@ -193,9 +234,16 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
/*
* WARN*()s end up here; fix them up before we call the
* notifier chain.
+ * Also, VMXOFF causes unwanted fault during reboot
+ * if VMX is enabled, but not in VMX root operation. Fix
+ * before calling notifier chain.
*/
- if (!user_mode(regs) && fixup_bug(regs, trapnr))
- return;
+ if (!user_mode(regs)) {
+ if (fixup_bug(regs, trapnr))
+ return;
+ if (fixup_emergency_vmxoff(regs, trapnr))
+ return;
+ }

if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
NOTIFY_STOP) {
--
2.26.2


2020-06-10 20:20:34

by Randy Dunlap

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

Hi David,

If you happen to make a v2 of this patch, there are a few comments
that begin with "/**" but they are not kernel-doc comments, so they
should instead begin with just "/*". Please see below.

(and you did not introduce this comment style here.)

On 6/10/20 11:12 AM, David P. Reed wrote:
>
> Signed-off-by: David P. Reed <[email protected]>
> ---
> arch/x86/include/asm/virtext.h | 24 ++++++++++++----
> arch/x86/kernel/reboot.c | 13 ++-------
> arch/x86/kernel/traps.c | 52 ++++++++++++++++++++++++++++++++--
> 3 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
> index 9aad0e0876fb..ea2d67191684 100644
> --- a/arch/x86/include/asm/virtext.h
> +++ b/arch/x86/include/asm/virtext.h
> @@ -13,12 +13,16 @@
> #ifndef _ASM_X86_VIRTEX_H
> #define _ASM_X86_VIRTEX_H
>
> +#include <linux/percpu.h>
> +
> #include <asm/processor.h>
>
> #include <asm/vmx.h>
> #include <asm/svm.h>
> #include <asm/tlbflush.h>
>
> +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff);
> +
> /*
> * VMX functions:
> */
> @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void)
> /** Disable VMX on the current CPU

just
/* Disable VMX on the current CPU

> *
> * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * on the CPU previously. Only call this function directly if you know VMX
> + * is enabled *and* CPU is in VMX root operation.
> */
> static inline void cpu_vmxoff(void)
> {
> @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
> return __read_cr4() & X86_CR4_VMXE;
> }
>
> -/** Disable VMX if it is enabled on the current CPU
> +/** Force disable VMX if it is enabled on the current CPU.

just
/* Force disable VMX if it is enabled on the current CPU.

> + * Note that if CPU is not in VMX root operation this
> + * VMXOFF will fault an undefined operation fault.
> + * So the 'doing_emergency_vmxoff' percpu flag is set,
> + * the trap handler for just restarts execution after
> + * the VMXOFF instruction.
> *
> - * You shouldn't call this if cpu_has_vmx() returns 0.
> + * You shouldn't call this directly if cpu_has_vmx() returns 0.
> */
> static inline void __cpu_emergency_vmxoff(void)
> {
> - if (cpu_vmx_enabled())
> + if (cpu_vmx_enabled()) {
> + this_cpu_write(doing_emergency_vmxoff, 1);
> cpu_vmxoff();
> + this_cpu_write(doing_emergency_vmxoff, 0);
> + }
> }
>
> -/** Disable VMX if it is supported and enabled on the current CPU
> +/** Force disable VMX if it is supported and enabled on the current CPU

ditto.


> */
> static inline void cpu_emergency_vmxoff(void)
> {


thanks.
--
~Randy

2020-06-10 21:39:04

by Thomas Gleixner

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

"David P. Reed" <[email protected]> writes:
> +/*
> + * Fix any unwanted undefined operation fault due to VMXOFF instruction that
> + * is needed to ensure that CPU is not in VMX root operation at time of
> + * a reboot/panic CPU reset. There is no safe and reliable way to know
> + * if a processor is in VMX root operation, other than to skip the
> + * VMXOFF. It is safe to just skip any VMXOFF that might generate this
> + * exception, when VMX operation is enabled in CR4. In the extremely
> + * rare case that a VMXOFF is erroneously executed while VMX is enabled,
> + * but VMXON has not been executed yet, the undefined opcode fault
> + * should not be missed by valid code, though it would be an error.
> + * To detect this, we could somehow restrict the instruction address
> + * to the specific use during reboot/panic.
> + */
> +static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
> +{
> + const static u8 insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 };
> + u8 ud[3];
> +
> + if (trapnr != X86_TRAP_UD)
> + return 0;
> + if (!cpu_vmx_enabled())
> + return 0;
> + if (!this_cpu_read(doing_emergency_vmxoff))
> + return 0;
> +
> + /* undefined instruction must be in kernel and be VMXOFF */
> + if (regs->ip < TASK_SIZE_MAX)
> + return 0;
> + if (probe_kernel_address((u8 *)regs->ip, ud))
> + return 0;
> + if (memcmp(ud, insn_vmxoff, sizeof(insn_vmxoff)))
> + return 0;
> +
> + regs->ip += sizeof(insn_vmxoff);
> + return 1;

We have exception fixups to avoid exactly that kind of horrible
workarounds all over the place.

static inline int cpu_vmxoff_safe(void)
{
int err;

asm volatile("2: vmxoff; xor %[err],%[err]\n"
"1:\n\t"
".section .fixup,\"ax\"\n\t"
"3: mov %[fault],%[err] ; jmp 1b\n\t"
".previous\n\t"
_ASM_EXTABLE(2b, 3b)
: [err] "=a" (err)
: [fault] "i" (-EFAULT)
: "memory");
return err;
}

static inline void __cpu_emergency_vmxoff(void)
{
if (!cpu_vmx_enabled())
return;
if (!cpu_vmxoff_safe())
cr4_clear_bits(X86_CR4_VMXE);
}

Problem solved.

Thanks,

tglx

2020-06-10 21:39:23

by Sean Christopherson

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Wed, Jun 10, 2020 at 02:12:50PM -0400, David P. Reed wrote:
> If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> done on all CPUS, to allow the INIT IPI to function, since
> INIT is suppressed when CPUs are in VMX root operation.
> However, VMXOFF causes an undefined operation fault if the CPU is not
> in VMX operation, that is, if VMXON has not been executed, or VMXOFF
> has been executed but VMX is still enabled. This fix makes the reboot
> work more reliably by modifying the #UD handler to skip the VMXOFF
> if VMX is enabled on the CPU and the VMXOFF is executed as part
> of cpu_emergency_vmxoff().
> The logic in reboot.c is also corrected, since the point of forcing
> the processor out of VMX root operation is that when VMX root
> operation is enabled, the processor INIT signal is always masked.
> See Intel SDM section on differences between VMX Root operation and normal
> operation. Thus every CPU must be forced out of VMX operation.
> Since the CPU will hang rather than restart, a manual "reset" is the
> only way out of this state (or if there is a BMC, it can issue a RESET
> to the chip).
>
> Signed-off-by: David P. Reed <[email protected]>
> ---
> @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
> return __read_cr4() & X86_CR4_VMXE;
> }
>
> -/** Disable VMX if it is enabled on the current CPU
> +/** Force disable VMX if it is enabled on the current CPU.
> + * Note that if CPU is not in VMX root operation this
> + * VMXOFF will fault an undefined operation fault.
> + * So the 'doing_emergency_vmxoff' percpu flag is set,
> + * the trap handler for just restarts execution after
> + * the VMXOFF instruction.
> *
> - * You shouldn't call this if cpu_has_vmx() returns 0.
> + * You shouldn't call this directly if cpu_has_vmx() returns 0.
> */
> static inline void __cpu_emergency_vmxoff(void)
> {
> - if (cpu_vmx_enabled())
> + if (cpu_vmx_enabled()) {
> + this_cpu_write(doing_emergency_vmxoff, 1);
> cpu_vmxoff();
> + this_cpu_write(doing_emergency_vmxoff, 0);
> + }
> }

...

> +/*
> + * Fix any unwanted undefined operation fault due to VMXOFF instruction that
> + * is needed to ensure that CPU is not in VMX root operation at time of
> + * a reboot/panic CPU reset. There is no safe and reliable way to know
> + * if a processor is in VMX root operation, other than to skip the
> + * VMXOFF. It is safe to just skip any VMXOFF that might generate this
> + * exception, when VMX operation is enabled in CR4. In the extremely
> + * rare case that a VMXOFF is erroneously executed while VMX is enabled,
> + * but VMXON has not been executed yet, the undefined opcode fault
> + * should not be missed by valid code, though it would be an error.
> + * To detect this, we could somehow restrict the instruction address
> + * to the specific use during reboot/panic.
> + */
> +static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
> +{
> + const static u8 insn_vmxoff[3] = { 0x0f, 0x01, 0xc4 };
> + u8 ud[3];
> +
> + if (trapnr != X86_TRAP_UD)
> + return 0;
> + if (!cpu_vmx_enabled())
> + return 0;
> + if (!this_cpu_read(doing_emergency_vmxoff))
> + return 0;
> +
> + /* undefined instruction must be in kernel and be VMXOFF */
> + if (regs->ip < TASK_SIZE_MAX)
> + return 0;
> + if (probe_kernel_address((u8 *)regs->ip, ud))
> + return 0;
> + if (memcmp(ud, insn_vmxoff, sizeof(insn_vmxoff)))
> + return 0;
> +
> + regs->ip += sizeof(insn_vmxoff);
> + return 1;
> +}
> +
> static nokprobe_inline int
> do_trap_no_signal(struct task_struct *tsk, int trapnr, const char *str,
> struct pt_regs *regs, long error_code)
> @@ -193,9 +234,16 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
> /*
> * WARN*()s end up here; fix them up before we call the
> * notifier chain.
> + * Also, VMXOFF causes unwanted fault during reboot
> + * if VMX is enabled, but not in VMX root operation. Fix
> + * before calling notifier chain.
> */
> - if (!user_mode(regs) && fixup_bug(regs, trapnr))
> - return;
> + if (!user_mode(regs)) {
> + if (fixup_bug(regs, trapnr))
> + return;
> + if (fixup_emergency_vmxoff(regs, trapnr))
> + return;
> + }

Isn't this just a really kludgy way of doing fixup on vmxoff? E.g. wouldn't
the below patch do the trick?

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..54bc84d7028d 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -32,13 +32,15 @@ static inline int cpu_has_vmx(void)

/** Disable VMX on the current CPU
*
- * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * VMXOFF causes a #UD if the CPU is not post-VMXON, eat any #UDs to handle
+ * races with a hypervisor doing VMXOFF, e.g. if an NMI arrived between VMXOFF
+ * and clearing CR4.VMXE.
*/
static inline void cpu_vmxoff(void)
{
- asm volatile ("vmxoff");
+ asm volatile("1: vmxoff\n\t"
+ "2:\n\t"
+ _ASM_EXTABLE(1b, 2b));
cr4_clear_bits(X86_CR4_VMXE);
}


2020-06-10 21:45:03

by Sean Christopherson

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

Gah, I typed too slow :-)

On Wed, Jun 10, 2020 at 11:34:21PM +0200, Thomas Gleixner wrote:
> We have exception fixups to avoid exactly that kind of horrible
> workarounds all over the place.
>
> static inline int cpu_vmxoff_safe(void)
> {
> int err;
>
> asm volatile("2: vmxoff; xor %[err],%[err]\n"
> "1:\n\t"
> ".section .fixup,\"ax\"\n\t"
> "3: mov %[fault],%[err] ; jmp 1b\n\t"
> ".previous\n\t"
> _ASM_EXTABLE(2b, 3b)
> : [err] "=a" (err)
> : [fault] "i" (-EFAULT)
> : "memory");
> return err;
> }
>
> static inline void __cpu_emergency_vmxoff(void)
> {
> if (!cpu_vmx_enabled())
> return;
> if (!cpu_vmxoff_safe())
> cr4_clear_bits(X86_CR4_VMXE);

This bit is wrong, CR4.VMXE should be cleared even if VMXOFF faults, e.g.
if this is called in NMI context and the NMI arrived in KVM code between
VMXOFF and clearing CR4.VMXE.

All other VMXOFF faults are mode related, i.e. any fault is guaranteed to
be due to the !post-VMXON check unless we're magically in RM, VM86, compat
mode, or at CPL>0.
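
A minimal sketch with that correction folded in, reusing cpu_vmxoff_safe()
from above but ignoring its return value (illustrative, untested):

static inline void __cpu_emergency_vmxoff(void)
{
	if (!cpu_vmx_enabled())
		return;
	/* Sketch: a fault here just means the CPU was already post-VMXOFF. */
	cpu_vmxoff_safe();
	/* Clear CR4.VMXE unconditionally, even if VMXOFF faulted. */
	cr4_clear_bits(X86_CR4_VMXE);
}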

> }
>
> Problem solved.
>
> Thanks,
>
> tglx

2020-06-10 22:04:20

by Andy Lutomirski

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash



> On Jun 10, 2020, at 11:21 AM, David P. Reed <[email protected]> wrote:
>
> If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> done on all CPUS, to allow the INIT IPI to function, since
> INIT is suppressed when CPUs are in VMX root operation.
> However, VMXOFF causes an undefined operation fault if the CPU is not
> in VMX operation, that is, if VMXON has not been executed, or VMXOFF
> has been executed but VMX is still enabled.

I’m surprised. Wouldn’t this mean that emergency reboots always fail if a VM is running? I would think someone would have noticed before.

> This fix makes the reboot
> work more reliably by modifying the #UD handler to skip the VMXOFF
> if VMX is enabled on the CPU and the VMXOFF is executed as part
> of cpu_emergency_vmxoff().

NAK. See below.

> The logic in reboot.c is also corrected, since the point of forcing
> the processor out of VMX root operation is that when VMX root
> operation is enabled, the processor INIT signal is always masked.
> See Intel SDM section on differences between VMX Root operation and normal
> operation. Thus every CPU must be forced out of VMX operation.
> Since the CPU will hang rather than restart, a manual "reset" is the
> only way out of this state (or if there is a BMC, it can issue a RESET
> to the chip).
>
> Signed-off-by: David P. Reed <[email protected]>
> ---
> arch/x86/include/asm/virtext.h | 24 ++++++++++++----
> arch/x86/kernel/reboot.c | 13 ++-------
> arch/x86/kernel/traps.c | 52 ++++++++++++++++++++++++++++++++--
> 3 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
> index 9aad0e0876fb..ea2d67191684 100644
> --- a/arch/x86/include/asm/virtext.h
> +++ b/arch/x86/include/asm/virtext.h
> @@ -13,12 +13,16 @@
> #ifndef _ASM_X86_VIRTEX_H
> #define _ASM_X86_VIRTEX_H
>
> +#include <linux/percpu.h>
> +
> #include <asm/processor.h>
>
> #include <asm/vmx.h>
> #include <asm/svm.h>
> #include <asm/tlbflush.h>
>
> +DECLARE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff);
> +
> /*
> * VMX functions:
> */
> @@ -33,8 +37,8 @@ static inline int cpu_has_vmx(void)
> /** Disable VMX on the current CPU
> *
> * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * on the CPU previously. Only call this function directly if you know VMX
> + * is enabled *and* CPU is in VMX root operation.
> */

So presumably the bug is someone calling this inappropriately?

> static inline void cpu_vmxoff(void)
> {
> @@ -47,17 +51,25 @@ static inline int cpu_vmx_enabled(void)
> return __read_cr4() & X86_CR4_VMXE;
> }
>
> -/** Disable VMX if it is enabled on the current CPU
> +/** Force disable VMX if it is enabled on the current CPU.
> + * Note that if CPU is not in VMX root operation this
> + * VMXOFF will fault an undefined operation fault.
> + * So the 'doing_emergency_vmxoff' percpu flag is set,
> + * the trap handler for just restarts execution after
> + * the VMXOFF instruction.
> *
> - * You shouldn't call this if cpu_has_vmx() returns 0.
> + * You shouldn't call this directly if cpu_has_vmx() returns 0.
> */
> static inline void __cpu_emergency_vmxoff(void)
> {
> - if (cpu_vmx_enabled())
> + if (cpu_vmx_enabled()) {
> + this_cpu_write(doing_emergency_vmxoff, 1);
> cpu_vmxoff();
> + this_cpu_write(doing_emergency_vmxoff, 0);
> + }
> }

NAK. Just write this in asm with an exception handler that does the right thing.

Please also try to identify the actual bug. Because I have a sneaking suspicion that you are running an out of tree module that has issues. If so, the patch should explain this.

>
> -/** Disable VMX if it is supported and enabled on the current CPU
> +/** Force disable VMX if it is supported and enabled on the current CPU
> */
> static inline void cpu_emergency_vmxoff(void)
> {
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 3ca43be4f9cf..abc8b51a57c7 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
> *
> * For safety, we will avoid running the nmi_shootdown_cpus()
> * stuff unnecessarily, but we don't have a way to check
> - * if other CPUs have VMX enabled. So we will call it only if the
> - * CPU we are running on has VMX enabled.
> - *
> - * We will miss cases where VMX is not enabled on all CPUs. This
> - * shouldn't do much harm because KVM always enable VMX on all
> - * CPUs anyway. But we can miss it on the small window where KVM
> - * is still enabling VMX.
> + * if other CPUs have VMX enabled.
> */
> - if (cpu_has_vmx() && cpu_vmx_enabled()) {
> + if (cpu_has_vmx()) {
> /* Disable VMX on this CPU. */
> - cpu_vmxoff();
> + cpu_emergency_vmxoff();
>
> /* Halt and disable VMX on the other CPUs */
> nmi_shootdown_cpus(vmxoff_nmi);
> -
> }
> }
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4cc541051994..2dcf57ef467e 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -39,6 +39,7 @@
> #include <linux/io.h>
> #include <linux/hardirq.h>
> #include <linux/atomic.h>
> +#include <linux/percpu.h>
>
> #include <asm/stacktrace.h>
> #include <asm/processor.h>
> @@ -59,6 +60,7 @@
> #include <asm/umip.h>
> #include <asm/insn.h>
> #include <asm/insn-eval.h>
> +#include <asm/virtext.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
> @@ -70,6 +72,8 @@
> #include <asm/proto.h>
> #endif
>
> +DEFINE_PER_CPU_READ_MOSTLY(int, doing_emergency_vmxoff) = 0;
> +
> DECLARE_BITMAP(system_vectors, NR_VECTORS);
>
> static inline void cond_local_irq_enable(struct pt_regs *regs)
> @@ -115,6 +119,43 @@ int fixup_bug(struct pt_regs *regs, int trapnr)
> return 0;
> }
>
> +/*
> + * Fix any unwanted undefined operation fault due to VMXOFF instruction that
> + * is needed to ensure that CPU is not in VMX root operation at time of
> + * a reboot/panic CPU reset. There is no safe and reliable way to know
> + * if a processor is in VMX root operation, other than to skip the
> + * VMXOFF. It is safe to just skip any VMXOFF that might generate this
> + * exception, when VMX operation is enabled in CR4. In the extremely
> + * rare case that a VMXOFF is erroneously executed while VMX is enabled,
> + * but VMXON has not been executed yet, the undefined opcode fault
> + * should not be missed by valid code, though it would be an error.
> + * To detect this, we could somehow restrict the instruction address
> + * to the specific use during reboot/panic.
> + */
> +static int fixup_emergency_vmxoff(struct pt_regs *regs, int trapnr)
> +{

NAK.

2020-06-10 22:13:25

by Thomas Gleixner

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

Sean Christopherson <[email protected]> writes:
> Gah, I typed too slow :-)

Haha. I had the same thought.

> On Wed, Jun 10, 2020 at 11:34:21PM +0200, Thomas Gleixner wrote:
>> We have exception fixups to avoid exactly that kind of horrible
>> workarounds all over the place.
>>
>> static inline int cpu_vmxoff_safe(void)
>> {
>> int err;
>>
>> asm volatile("2: vmxoff; xor %[err],%[err]\n"
>> "1:\n\t"
>> ".section .fixup,\"ax\"\n\t"
>> "3: mov %[fault],%[err] ; jmp 1b\n\t"
>> ".previous\n\t"
>> _ASM_EXTABLE(2b, 3b)
>> : [err] "=a" (err)
>> : [fault] "i" (-EFAULT)
>> : "memory");
>> return err;
>> }
>>
>> static inline void __cpu_emergency_vmxoff(void)
>> {
>> if (!cpu_vmx_enabled())
>> return;
>> if (!cpu_vmxoff_safe())
>> cr4_clear_bits(X86_CR4_VMXE);
>
> This bit is wrong, CR4.VMXE should be cleared even if VMXOFF faults, e.g.
> if this is called in NMI context and the NMI arrived in KVM code between
> VMXOFF and clearing CR4.VMXE.

Oh, right.

> All other VMXOFF faults are mode related, i.e. any fault is guaranteed to
> be due to the !post-VMXON check unless we're magically in RM, VM86, compat
> mode, or at CPL>0.

Your patch is simpler indeed.

Thanks,

tglx

2020-06-11 00:02:42

by Sean Christopherson

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Wed, Jun 10, 2020 at 02:59:19PM -0700, Andy Lutomirski wrote:
>
>
> > On Jun 10, 2020, at 11:21 AM, David P. Reed <[email protected]> wrote:
> >
> > If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> > done on all CPUS, to allow the INIT IPI to function, since
> > INIT is suppressed when CPUs are in VMX root operation.
> > However, VMXOFF causes an undefined operation fault if the CPU is not
> > in VMX operation, that is, if VMXON has not been executed, or VMXOFF
> > has been executed but VMX is still enabled.
>
> I’m surprised. Wouldn’t this mean that emergency reboots always fail if a VM
> is running? I would think someone would have noticed before.

The call to cpu_vmxoff() is conditioned on CR4.VMXE==1, which KVM toggles in
tandem with VMXON and VMXOFF. Out of tree hypervisors presumably do the
same. That's obviously not atomic though, e.g. VMXOFF will #UD if the
vmxoff_nmi() NMI arrives between CR4.VMXE=1 and VMXON, or between VMXOFF
and CR4.VMXE=0.
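
To illustrate the two windows, consider a hypothetical hypervisor
enable/disable path (the names and structure here are an illustrative
sketch, not KVM's actual code):

static void hv_vmx_enable_sketch(u64 vmxon_region_pa)
{
	cr4_set_bits(X86_CR4_VMXE);
	/* Window 1: an emergency VMXOFF NMI here gets #UD, VMXON not yet done. */
	asm volatile("vmxon %0" : : "m" (vmxon_region_pa) : "cc", "memory");
}

static void hv_vmx_disable_sketch(void)
{
	asm volatile("vmxoff" : : : "cc", "memory");
	/* Window 2: an emergency VMXOFF NMI here gets #UD, already post-VMXOFF. */
	cr4_clear_bits(X86_CR4_VMXE);
}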

2020-06-11 00:18:37

by Andy Lutomirski

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Wed, Jun 10, 2020 at 5:00 PM Sean Christopherson
<[email protected]> wrote:
>
> On Wed, Jun 10, 2020 at 02:59:19PM -0700, Andy Lutomirski wrote:
> >
> >
> > > On Jun 10, 2020, at 11:21 AM, David P. Reed <[email protected]> wrote:
> > >
> > > If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
> > > done on all CPUS, to allow the INIT IPI to function, since
> > > INIT is suppressed when CPUs are in VMX root operation.
> > > However, VMXOFF causes an undefined operation fault if the CPU is not
> > > in VMX operation, that is, if VMXON has not been executed, or VMXOFF
> > > has been executed but VMX is still enabled.
> >
> > I’m surprised. Wouldn’t this mean that emergency reboots always fail if a VM
> > is running? I would think someone would have noticed before.
>
> The call to cpu_vmxoff() is conditioned on CR4.VMXE==1, which KVM toggles in
> tandem with VMXON and VMXOFF. Out of tree hypervisors presumably do the
> same. That's obviously not atomic though, e.g. VMXOFF will #UD if the
> vmxoff_nmi() NMI arrives between CR4.VMXE=1 and VMXON, or between VMXOFF
> and CR4.VMXE=0.

It would be nice for the commit message to say "this happens when
nmxoff_nmi() races with KVM's VMXON/VMXOFF toggling". Or the commit
message should say something else if the bug happens for a different
reason.

The race with KVM should be quite unusual, since it involves rebooting
concurrently with loading or unloading KVM.

2020-06-11 17:50:18

by Andy Lutomirski

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Thu, Jun 11, 2020 at 10:00 AM Sean Christopherson
<[email protected]> wrote:
>
> On Thu, Jun 11, 2020 at 12:33:20PM -0400, David P. Reed wrote:
> > To respond to Thomas Gleixner's suggestion about exception masking mechanism
> > - it may well be a better fix, but a) I used "BUG" as a model, and b) the
> > exception masking is undocumented anywhere I can find. These are "static
> > inline" routines, and only the "emergency" version needs protection, because
> > you'd want a random VMXOFF to actually trap.
>
> The only in-kernel usage of cpu_vmxoff() are for emergencies. And, the only
> reasonable source of faults on VMXOFF is that VMX is already off, i.e. for
> the kernel's usage, the goal is purely to ensure VMX is disabled, how we get
> there doesn't truly matter.
>
> > In at least one of the calls to emergency, it is stated that no locks may be
> > taken at all because of where it was.
> >
> > Further, I have a different patch that requires a scratch page per processor
> > to exist, but which never takes a UD fault. (basically, it attempts VMXON
> > first, and then does VMXOFF after VMXON, which ensures exit from VMX root
> > mode, but VMXON needs a blank page to either succeed or fail without GP
> > fault). If someone prefers that, it's local to the routine, but requires a
> > new scratch page per processor be allocated. So after testing it, I decided
> > in the interest of memory reduction that the masking of UD was preferable.
>
> Please no, doing VMXON, even temporarily, could cause breakage. The CPU's
> VMCS cache isn't cleared on VMXOFF. Doing VMXON after kdump_nmi_callback()
> invokes cpu_crash_vmclear_loaded_vmcss() would create a window where VMPTRLD
> could succeed in a hypervisor and lead to memory corruption in the new
> kernel when the VMCS is evicted from the non-coherent VMCS cache.
>
> > I'm happy to resubmit the masking exception patch as version 2, if it works
> > in my test case.
> >
> > Advice?
>
> Please test the below, which simply eats any exception on VMXOFF.
>
> diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
> index 9aad0e0876fb..54bc84d7028d 100644
> --- a/arch/x86/include/asm/virtext.h
> +++ b/arch/x86/include/asm/virtext.h
> @@ -32,13 +32,15 @@ static inline int cpu_has_vmx(void)
>
> /** Disable VMX on the current CPU
> *
> - * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * VMXOFF causes a #UD if the CPU is not post-VMXON, eat any #UDs to handle
> + * races with a hypervisor doing VMXOFF, e.g. if an NMI arrived between VMXOFF
> + * and clearing CR4.VMXE.
> */
> static inline void cpu_vmxoff(void)
> {
> - asm volatile ("vmxoff");
> + asm volatile("1: vmxoff\n\t"
> + "2:\n\t"
> + _ASM_EXTABLE(1b, 2b));
> cr4_clear_bits(X86_CR4_VMXE);
> }

I think that just eating exceptions like this is asking for trouble.
How about having a separate cpu_emergency_vmxoff() that eats
exceptions and leaving cpu_vmxoff() alone? Or make cpu_vmxoff()
return an error on failure and have the normal caller WARN if there's
an error.

Silently eating exceptions in the non-emergency path makes it too easy
to regress something without noticing.
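
A rough sketch of that error-returning variant (illustrative and untested,
reusing the fixup machinery from the snippets above):

/* Sketch of the error-reporting variant; not actual kernel code. */
static inline int cpu_vmxoff(void)
{
	int err;

	asm volatile("1: vmxoff\n\t"
		     "   xor %[err], %[err]\n\t"
		     "2:\n\t"
		     ".section .fixup,\"ax\"\n\t"
		     "3: mov %[fault], %[err]\n\t"
		     "   jmp 2b\n\t"
		     ".previous\n\t"
		     _ASM_EXTABLE(1b, 3b)
		     : [err] "=r" (err)
		     : [fault] "i" (-EFAULT)
		     : "cc", "memory");
	cr4_clear_bits(X86_CR4_VMXE);
	return err;
}

A normal caller could then WARN_ON(cpu_vmxoff()), while the emergency path
would ignore the return value.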

2020-06-11 17:50:38

by Sean Christopherson

Subject: Re: [PATCH] Fix undefined operation VMXOFF during reboot and crash

On Thu, Jun 11, 2020 at 12:33:20PM -0400, David P. Reed wrote:
> To respond to Thomas Gleixner's suggestion about exception masking mechanism
> - it may well be a better fix, but a) I used "BUG" as a model, and b) the
> exception masking is undocumented anywhere I can find. These are "static
> inline" routines, and only the "emergency" version needs protection, because
> you'd want a random VMXOFF to actually trap.

The only in-kernel usage of cpu_vmxoff() are for emergencies. And, the only
reasonable source of faults on VMXOFF is that VMX is already off, i.e. for
the kernel's usage, the goal is purely to ensure VMX is disabled, how we get
there doesn't truly matter.

> In at least one of the calls to emergency, it is stated that no locks may be
> taken at all because of where it was.
>
> Further, I have a different patch that requires a scratch page per processor
> to exist, but which never takes a UD fault. (basically, it attempts VMXON
> first, and then does VMXOFF after VMXON, which ensures exit from VMX root
> mode, but VMXON needs a blank page to either succeed or fail without GP
> fault). If someone prefers that, it's local to the routine, but requires a
> new scratch page per processor be allocated. So after testing it, I decided
> in the interest of memory reduction that the masking of UD was preferable.

Please no, doing VMXON, even temporarily, could cause breakage. The CPU's
VMCS cache isn't cleared on VMXOFF. Doing VMXON after kdump_nmi_callback()
invokes cpu_crash_vmclear_loaded_vmcss() would create a window where VMPTRLD
could succeed in a hypervisor and lead to memory corruption in the new
kernel when the VMCS is evicted from the non-coherent VMCS cache.

> I'm happy to resubmit the masking exception patch as version 2, if it works
> in my test case.
>
> Advice?

Please test the below, which simply eats any exception on VMXOFF.

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..54bc84d7028d 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -32,13 +32,15 @@ static inline int cpu_has_vmx(void)

/** Disable VMX on the current CPU
*
- * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * VMXOFF causes a #UD if the CPU is not post-VMXON, eat any #UDs to handle
+ * races with a hypervisor doing VMXOFF, e.g. if an NMI arrived between VMXOFF
+ * and clearing CR4.VMXE.
*/
static inline void cpu_vmxoff(void)
{
- asm volatile ("vmxoff");
+ asm volatile("1: vmxoff\n\t"
+ "2:\n\t"
+ _ASM_EXTABLE(1b, 2b));
cr4_clear_bits(X86_CR4_VMXE);
}

2020-06-11 19:49:47

by David P. Reed

Subject: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

If a panic/reboot occurs when CR4 has VMX enabled, a VMXOFF is
done on all CPUS, to allow the INIT IPI to function, since
INIT is suppressed when CPUs are in VMX root operation.
The problem is that VMXOFF causes an undefined operation fault when the CPU is not
in VMX operation, that is, VMXON has not been executed yet, or VMXOFF
has been executed but VMX is still enabled. This patch makes the reboot
work more reliably by masking the exception on VMXOFF in the
crash/panic/reboot path, which uses cpu_emergency_vmxoff().
This can happen with KVM due to a race, but that race is rare today.
The problem was discovered doing out-of-tree x-visor development that uses
VMX in a novel way for kernel performance analysis.
The logic in reboot.c is also corrected, since the point of forcing
the processor out of VMX root operation is to allow the INIT signal
to be unmasked. See Intel SDM section on differences between VMX Root
operation and normal operation. Thus every CPU must be forced out of
VMX operation. Since the CPU may hang rather than restart if INIT fails,
a manual hardware "reset" is the only way out of this state in a
lights-out datacenter (well, if there is a BMC, it can issue a
hardware RESET to the chip).
Style errors in original file fixed, at request of Randy Dunlap:
eliminate '/**' in non-kernel-doc comments.

Fixes: 208067 <https://bugzilla.kernel.org/show_bug.cgi?id=208067>
Reported-by: David P. Reed <[email protected]>
Reported-by: Randy Dunlap <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: David P. Reed <[email protected]>
---
arch/x86/include/asm/virtext.h | 40 ++++++++++++++++++++++++----------
arch/x86/kernel/reboot.c | 13 +++--------
2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 9aad0e0876fb..ed22c1983da8 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -30,15 +30,15 @@ static inline int cpu_has_vmx(void)
}


-/** Disable VMX on the current CPU
+/* Disable VMX on the current CPU
*
- * vmxoff causes a undefined-opcode exception if vmxon was not run
- * on the CPU previously. Only call this function if you know VMX
- * is enabled.
+ * vmxoff causes an undefined-opcode exception if vmxon was not run
+ * on the CPU previously. Only call this function directly if you know VMX
+ * is enabled *and* CPU is in VMX root operation.
*/
static inline void cpu_vmxoff(void)
{
- asm volatile ("vmxoff");
+ asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success */
cr4_clear_bits(X86_CR4_VMXE);
}

@@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
return __read_cr4() & X86_CR4_VMXE;
}

-/** Disable VMX if it is enabled on the current CPU
- *
- * You shouldn't call this if cpu_has_vmx() returns 0.
+/*
+ * Safely disable VMX root operation if active
+ * Note that if CPU is not in VMX root operation this
+ * VMXOFF will fault an undefined operation fault,
+ * so use the exception masking facility to handle that RARE
+ * case.
+ * You shouldn't call this directly if cpu_has_vmx() returns 0
+ */
+static inline void cpu_vmxoff_safe(void)
+{
+ asm volatile("1:vmxoff\n\t" /* clears all flags on success */
+ "2:\n\t"
+ _ASM_EXTABLE(1b, 2b)
+ ::: "cc", "memory");
+ cr4_clear_bits(X86_CR4_VMXE);
+}
+
+/*
+ * Force disable VMX if it is enabled on the current CPU,
+ * when it is unknown whether CPU is in VMX operation.
*/
static inline void __cpu_emergency_vmxoff(void)
{
- if (cpu_vmx_enabled())
- cpu_vmxoff();
+ if (!cpu_vmx_enabled())
+ return;
+ cpu_vmxoff_safe();
}

-/** Disable VMX if it is supported and enabled on the current CPU
+/* Force disable VMX if it is supported on current CPU
*/
static inline void cpu_emergency_vmxoff(void)
{
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e040ba6be27b..b0e6b106a67e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
*
* For safety, we will avoid running the nmi_shootdown_cpus()
* stuff unnecessarily, but we don't have a way to check
- * if other CPUs have VMX enabled. So we will call it only if the
- * CPU we are running on has VMX enabled.
- *
- * We will miss cases where VMX is not enabled on all CPUs. This
- * shouldn't do much harm because KVM always enable VMX on all
- * CPUs anyway. But we can miss it on the small window where KVM
- * is still enabling VMX.
+ * if other CPUs have VMX enabled.
*/
- if (cpu_has_vmx() && cpu_vmx_enabled()) {
+ if (cpu_has_vmx()) {
/* Disable VMX on this CPU. */
- cpu_vmxoff();
+ cpu_emergency_vmxoff();

/* Halt and disable VMX on the other CPUs */
nmi_shootdown_cpus(vmxoff_nmi);
-
}
}

--
2.26.2


2020-06-25 06:35:02

by Sean Christopherson

Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
> -/** Disable VMX on the current CPU
> +/* Disable VMX on the current CPU
> *
> - * vmxoff causes a undefined-opcode exception if vmxon was not run
> - * on the CPU previously. Only call this function if you know VMX
> - * is enabled.
> + * vmxoff causes an undefined-opcode exception if vmxon was not run
> + * on the CPU previously. Only call this function directly if you know VMX
> + * is enabled *and* CPU is in VMX root operation.
> */
> static inline void cpu_vmxoff(void)
> {
> - asm volatile ("vmxoff");
> + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success */
> cr4_clear_bits(X86_CR4_VMXE);
> }
>
> @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
> return __read_cr4() & X86_CR4_VMXE;
> }
>
> -/** Disable VMX if it is enabled on the current CPU
> - *
> - * You shouldn't call this if cpu_has_vmx() returns 0.
> +/*
> + * Safely disable VMX root operation if active
> + * Note that if CPU is not in VMX root operation this
> + * VMXOFF will fault an undefined operation fault,
> + * so use the exception masking facility to handle that RARE
> + * case.
> + * You shouldn't call this directly if cpu_has_vmx() returns 0
> + */
> +static inline void cpu_vmxoff_safe(void)
> +{
> + asm volatile("1:vmxoff\n\t" /* clears all flags on success */

Eh, I wouldn't bother with the comment, there are a million other caveats
with VMXOFF that are far more interesting.

> + "2:\n\t"
> + _ASM_EXTABLE(1b, 2b)
> + ::: "cc", "memory");

Adding the memory and flags clobber should be a separate patch.

> + cr4_clear_bits(X86_CR4_VMXE);
> +}


I don't see any value in safe/unsafe variants. The only in-kernel user of
VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF
helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add
the exception fixup to cpu_vmxoff() and call it good.
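
I.e. something like this (sketch only, folding the fixup into cpu_vmxoff()
and keeping the "cc"/"memory" clobbers this patch adds):

static inline void cpu_vmxoff(void)
{
	/* Sketch: eat the #UD if not post-VMXON; VMX ends up off either way. */
	asm volatile("1: vmxoff\n\t"
		     "2:\n\t"
		     _ASM_EXTABLE(1b, 2b)
		     : : : "cc", "memory");
	cr4_clear_bits(X86_CR4_VMXE);
}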

> +
> +/*
> + * Force disable VMX if it is enabled on the current CPU,
> + * when it is unknown whether CPU is in VMX operation.
> */
> static inline void __cpu_emergency_vmxoff(void)
> {
> - if (cpu_vmx_enabled())
> - cpu_vmxoff();
> + if (!cpu_vmx_enabled())
> + return;
> + cpu_vmxoff_safe();

Unnecessary churn.

> }
>
> -/** Disable VMX if it is supported and enabled on the current CPU
> +/* Force disable VMX if it is supported on current CPU
> */
> static inline void cpu_emergency_vmxoff(void)
> {
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index e040ba6be27b..b0e6b106a67e 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
> *
> * For safety, we will avoid running the nmi_shootdown_cpus()
> * stuff unnecessarily, but we don't have a way to check
> - * if other CPUs have VMX enabled. So we will call it only if the
> - * CPU we are running on has VMX enabled.
> - *
> - * We will miss cases where VMX is not enabled on all CPUs. This
> - * shouldn't do much harm because KVM always enable VMX on all
> - * CPUs anyway. But we can miss it on the small window where KVM
> - * is still enabling VMX.
> + * if other CPUs have VMX enabled.
> */
> - if (cpu_has_vmx() && cpu_vmx_enabled()) {
> + if (cpu_has_vmx()) {
> /* Disable VMX on this CPU. */
> - cpu_vmxoff();
> + cpu_emergency_vmxoff();

This also needs to be in a separate patch. And it should use
__cpu_emergency_vmxoff() instead of cpu_emergency_vmxoff().

>
> /* Halt and disable VMX on the other CPUs */
> nmi_shootdown_cpus(vmxoff_nmi);
> -
> }
> }
>
> --
> 2.26.2
>

2020-06-25 14:47:14

by David P. Reed

Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

[Sorry: this is resent because my mailer included HTML, rejected by LKML]
Thanks for the response, Sean ... I had thought everyone was too busy to follow up from the first version.

I confess I'm not sure why this should be broken up into a patch series, given that it is so very small and is all aimed at the same category of bug.

And the "emergency" path pre-existed, I didn't want to propose removing it, since I assumed it was there for a reason. I didn't want to include my own judgement as to whether there should only be one path. (I'm pretty sure I didn't find a VMXOFF in KVM separately from the instance in this include file, but I will check).

A question: if I make it a series, I have to test each patch doesn't break something individually, in order to handle the case where one patch is accepted and the others are not. Do I need to test each individual patch thoroughly as an independent patch against all those cases?
I know the combination doesn't break anything and fixes the issues I've discovered by testing all combinations (and I've done some thorough testing of panics, oopses, crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash source to verify the fix fixes the problem's manifestations and to verify that it doesn't break any of the working paths).

That said, I'm willing to do a v3 "series" based on these suggestions if it will smooth its acceptance. If it's not going to get accepted after doing that, my motivation is flagging.
On Thursday, June 25, 2020 2:06am, "Sean Christopherson" <[email protected]> said:



> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
> > -/** Disable VMX on the current CPU
> > +/* Disable VMX on the current CPU
> > *
> > - * vmxoff causes a undefined-opcode exception if vmxon was not run
> > - * on the CPU previously. Only call this function if you know VMX
> > - * is enabled.
> > + * vmxoff causes an undefined-opcode exception if vmxon was not run
> > + * on the CPU previously. Only call this function directly if you know VMX
> > + * is enabled *and* CPU is in VMX root operation.
> > */
> > static inline void cpu_vmxoff(void)
> > {
> > - asm volatile ("vmxoff");
> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success
> */
> > cr4_clear_bits(X86_CR4_VMXE);
> > }
> >
> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
> > return __read_cr4() & X86_CR4_VMXE;
> > }
> >
> > -/** Disable VMX if it is enabled on the current CPU
> > - *
> > - * You shouldn't call this if cpu_has_vmx() returns 0.
> > +/*
> > + * Safely disable VMX root operation if active
> > + * Note that if CPU is not in VMX root operation this
> > + * VMXOFF will fault an undefined operation fault,
> > + * so use the exception masking facility to handle that RARE
> > + * case.
> > + * You shouldn't call this directly if cpu_has_vmx() returns 0
> > + */
> > +static inline void cpu_vmxoff_safe(void)
> > +{
> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */
>
> Eh, I wouldn't bother with the comment, there are a million other caveats
> with VMXOFF that are far more interesting.
>
> > + "2:\n\t"
> > + _ASM_EXTABLE(1b, 2b)
> > + ::: "cc", "memory");
>
> Adding the memory and flags clobber should be a separate patch.
>
> > + cr4_clear_bits(X86_CR4_VMXE);
> > +}
>
>
> I don't see any value in safe/unsafe variants. The only in-kernel user of
> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF
> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add
> the exception fixup to cpu_vmxoff() and call it good.
>
> > +
> > +/*
> > + * Force disable VMX if it is enabled on the current CPU,
> > + * when it is unknown whether CPU is in VMX operation.
> > */
> > static inline void __cpu_emergency_vmxoff(void)
> > {
> > - if (cpu_vmx_enabled())
> > - cpu_vmxoff();
> > + if (!cpu_vmx_enabled())
> > + return;
> > + cpu_vmxoff_safe();
>
> Unnecessary churn.
>
> > }
> >
> > -/** Disable VMX if it is supported and enabled on the current CPU
> > +/* Force disable VMX if it is supported on current CPU
> > */
> > static inline void cpu_emergency_vmxoff(void)
> > {
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index e040ba6be27b..b0e6b106a67e 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
> > *
> > * For safety, we will avoid running the nmi_shootdown_cpus()
> > * stuff unnecessarily, but we don't have a way to check
> > - * if other CPUs have VMX enabled. So we will call it only if the
> > - * CPU we are running on has VMX enabled.
> > - *
> > - * We will miss cases where VMX is not enabled on all CPUs. This
> > - * shouldn't do much harm because KVM always enable VMX on all
> > - * CPUs anyway. But we can miss it on the small window where KVM
> > - * is still enabling VMX.
> > + * if other CPUs have VMX enabled.
> > */
> > - if (cpu_has_vmx() && cpu_vmx_enabled()) {
> > + if (cpu_has_vmx()) {
> > /* Disable VMX on this CPU. */
> > - cpu_vmxoff();
> > + cpu_emergency_vmxoff();
>
> This also needs to be in a separate patch. And it should use
> __cpu_emergency_vmxoff() instead of cpu_emergency_vmxoff().
>
> >
> > /* Halt and disable VMX on the other CPUs */
> > nmi_shootdown_cpus(vmxoff_nmi);
> > -
> > }
> > }
> >
> > --
> > 2.26.2
> >
>

2020-06-25 15:00:25

by David P. Reed

Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

Correction to my comment below.
On Thursday, June 25, 2020 10:45am, "David P. Reed" <[email protected]> said:

> [Sorry: this is resent because my mailer included HTML, rejected by LKML]
> Thanks for the response, Sean ... I had thought everyone was too busy to follow up
> from the first version.
>  
> I confess I'm not sure why this should be broken up into a patch series, given
> that it is so very small and is all aimed at the same category of bug.
>  
> And the "emergency" path pre-existed, I didn't want to propose removing it, since
> I assumed it was there for a reason. I didn't want to include my own judgement as
> to whether there should only be one path. (I'm pretty sure I didn't find a VMXOFF
> in KVM separately from the instance in this include file, but I will check).
Just checked. Yes, the kvm code's handling of VMXOFF is separate, and though it uses exception masking, it seems to do other things, perhaps related to nested KVM; I haven't studied the deep logic of KVM nesting.

>  
> A question: if I make it a series, I have to test each patch doesn't break
> something individually, in order to handle the case where one patch is accepted
> and the others are not. Do I need to test each individual patch thoroughly as an
> independent patch against all those cases?
> I know the combination doesn't break anything and fixes the issues I've discovered
> by testing all combinations (and I've done some thorough testing of panics, oopses,
> crashes, kexec, ... under all combinations of CR4.VMXE enablement and crash source
> to verify the fix fixes the problem's manifestations and to verify that it doesn't
> break any of the working paths).
>  
> That said, I'm willing to do a v3 "series" based on these suggestions if it will
> smooth its acceptance. If it's not going to get accepted after doing that, my
> motivation is flagging.
> On Thursday, June 25, 2020 2:06am, "Sean Christopherson"
> <[email protected]> said:
>
>
>
>> On Thu, Jun 11, 2020 at 03:48:18PM -0400, David P. Reed wrote:
>> > -/** Disable VMX on the current CPU
>> > +/* Disable VMX on the current CPU
>> > *
>> > - * vmxoff causes a undefined-opcode exception if vmxon was not run
>> > - * on the CPU previously. Only call this function if you know VMX
>> > - * is enabled.
>> > + * vmxoff causes an undefined-opcode exception if vmxon was not run
>> > + * on the CPU previously. Only call this function directly if you know VMX
>> > + * is enabled *and* CPU is in VMX root operation.
>> > */
>> > static inline void cpu_vmxoff(void)
>> > {
>> > - asm volatile ("vmxoff");
>> > + asm volatile ("vmxoff" ::: "cc", "memory"); /* clears all flags on success
>> */
>> > cr4_clear_bits(X86_CR4_VMXE);
>> > }
>> >
>> > @@ -47,17 +47,35 @@ static inline int cpu_vmx_enabled(void)
>> > return __read_cr4() & X86_CR4_VMXE;
>> > }
>> >
>> > -/** Disable VMX if it is enabled on the current CPU
>> > - *
>> > - * You shouldn't call this if cpu_has_vmx() returns 0.
>> > +/*
>> > + * Safely disable VMX root operation if active
>> > + * Note that if CPU is not in VMX root operation this
>> > + * VMXOFF will fault an undefined operation fault,
>> > + * so use the exception masking facility to handle that RARE
>> > + * case.
>> > + * You shouldn't call this directly if cpu_has_vmx() returns 0
>> > + */
>> > +static inline void cpu_vmxoff_safe(void)
>> > +{
>> > + asm volatile("1:vmxoff\n\t" /* clears all flags on success */
>>
>> Eh, I wouldn't bother with the comment, there are a million other caveats
>> with VMXOFF that are far more interesting.
>>
>> > + "2:\n\t"
>> > + _ASM_EXTABLE(1b, 2b)
>> > + ::: "cc", "memory");
>>
>> Adding the memory and flags clobber should be a separate patch.
>>
>> > + cr4_clear_bits(X86_CR4_VMXE);
>> > +}
>>
>>
>> I don't see any value in safe/unsafe variants. The only in-kernel user of
>> VMXOFF outside of the emergency flows is KVM, which has its own VMXOFF
>> helper, i.e. all users of cpu_vmxoff() want the "safe" variant. Just add
>> the exception fixup to cpu_vmxoff() and call it good.
>>
>> > +
>> > +/*
>> > + * Force disable VMX if it is enabled on the current CPU,
>> > + * when it is unknown whether CPU is in VMX operation.
>> > */
>> > static inline void __cpu_emergency_vmxoff(void)
>> > {
>> > - if (cpu_vmx_enabled())
>> > - cpu_vmxoff();
>> > + if (!cpu_vmx_enabled())
>> > + return;
>> > + cpu_vmxoff_safe();
>>
>> Unnecessary churn.
>>
>> > }
>> >
>> > -/** Disable VMX if it is supported and enabled on the current CPU
>> > +/* Force disable VMX if it is supported on current CPU
>> > */
>> > static inline void cpu_emergency_vmxoff(void)
>> > {
>> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
>> > index e040ba6be27b..b0e6b106a67e 100644
>> > --- a/arch/x86/kernel/reboot.c
>> > +++ b/arch/x86/kernel/reboot.c
>> > @@ -540,21 +540,14 @@ static void emergency_vmx_disable_all(void)
>> > *
>> > * For safety, we will avoid running the nmi_shootdown_cpus()
>> > * stuff unnecessarily, but we don't have a way to check
>> > - * if other CPUs have VMX enabled. So we will call it only if the
>> > - * CPU we are running on has VMX enabled.
>> > - *
>> > - * We will miss cases where VMX is not enabled on all CPUs. This
>> > - * shouldn't do much harm because KVM always enable VMX on all
>> > - * CPUs anyway. But we can miss it on the small window where KVM
>> > - * is still enabling VMX.
>> > + * if other CPUs have VMX enabled.
>> > */
>> > - if (cpu_has_vmx() && cpu_vmx_enabled()) {
>> > + if (cpu_has_vmx()) {
>> > /* Disable VMX on this CPU. */
>> > - cpu_vmxoff();
>> > + cpu_emergency_vmxoff();
>>
>> This also needs to be in a separate patch. And it should use
>> __cpu_emergency_vmxoff() instead of cpu_emergency_vmxoff().
>>
>> >
>> > /* Halt and disable VMX on the other CPUs */
>> > nmi_shootdown_cpus(vmxoff_nmi);
>> > -
>> > }
>> > }
>> >
>> > --
>> > 2.26.2
>> >
>>
>
>


2020-06-29 20:56:20

by David P. Reed

[permalink] [raw]
Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

Simple question for those on the To: and CC: list here. Should I abandon any hope of this patch being accepted? It's been a long time.

The non-response after I acknowledged that this was discovered when working on a personal, non-commercial research project - which is "out-of-tree" (apparently dirty words on LKML) - has me thinking my contribution is unwanted. That's fine, I suppose. I can maintain this patch out-of-tree as well.
I did incorporate all the helpful suggestions I received in this second patch, and given some encouragement, will happily submit a revised v3 if there is any likelihood of acceptance. I'm wary of doing more radical changes (like combining emergency and normal paths).




2020-06-29 21:24:42

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash



> On Jun 29, 2020, at 1:54 PM, David P. Reed <[email protected]> wrote:
>
> Simple question for those on the To: and CC: list here. Should I abandon any hope of this patch being accepted? It's been a long time.
>
> The non-response after I acknowledged that this was discovered when working on a personal, non-commercial research project - which is "out-of-tree" (apparently dirty words on LKML) - has me thinking my contribution is unwanted. That's fine, I suppose. I can maintain this patch out-of-tree as well.
> I did incorporate all the helpful suggestions I received in this second patch, and given some encouragement, will happily submit a revised v3 if there is any likelihood of acceptance. I'm wary of doing more radical changes (like combining emergency and normal paths).
>

Sorry about being slow and less actively encouraging than we should be. We absolutely welcome personal contributions. The actual problem is that everyone is overworked and we’re all slow. Also, you may be hitting a corner case in the process: is this a KVM patch or an x86 patch?


2020-06-29 21:51:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash

On Mon, Jun 29, 2020 at 02:22:45PM -0700, Andy Lutomirski wrote:
> Sorry about being slow and less actively encouraging than we should be. We
> absolutely welcome personal contributions. The actual problem is that
> everyone is overworked and we’re all slow. Also, you may be hitting a corner case
> in the process: is this a KVM patch or an x86 patch?

It's an x86 patch as it's not KVM specific, e.g. this code also helps play
nice with out of tree hypervisors.

The code change is mostly good, but it needs to be split up as there are
three separate fixes:

1. Handle #UD on VMXOFF due to a race.
2. Mark memory and flags as clobbered by VMXOFF.
3. Change emergency_vmx_disable_all() to not manually check cpu_vmx_enabled().

Yes, the changes are tiny, but if for example #3 introduces a bug then we
don't have to revert #1 and #2. Or perhaps older kernels are only subject
to #1 and #2, and thus dumping all three changes into a single patch makes
it all harder to backport. In other words, all the usual "one change per
patch" reasons.

2020-06-29 22:48:13

by David P. Reed

[permalink] [raw]
Subject: Re: [PATCH v2] Fix undefined operation VMXOFF during reboot and crash



On Monday, June 29, 2020 5:49pm, "Sean Christopherson" <[email protected]> said:

> It's an x86 patch as it's not KVM specific, e.g. this code also helps play
> nice with out of tree hypervisors.
>
> The code change is mostly good, but it needs to be split up as there are
> three separate fixes:
>
> 1. Handle #UD on VMXOFF due to a race.
> 2. Mark memory and flags as clobbered by VMXOFF.
> 3. Change emergency_vmx_disable_all() to not manually check cpu_vmx_enabled().
>
> Yes, the changes are tiny, but if for example #3 introduces a bug then we
> don't have to revert #1 and #2. Or perhaps older kernels are only subject
> to #1 and #2, and thus dumping all three changes into a single patch makes
> it all harder to backport. In other words, all the usual "one change per
> patch" reasons.
>
Thanks. If no one else responds with additional suggestions, I will make it into 3 patches.
I'm happy to learn the nuances of the kernel patch regimen.



2020-07-04 20:50:53

by David P. Reed

[permalink] [raw]
Subject: [PATCH v3 0/3] Fix undefined operation VMXOFF during reboot and crash

At the request of Sean Christopherson, the original patch was split
into three patches, each fixing a distinct issue related to the original
bug: a hang due to VMXOFF causing an undefined operation fault
when the kernel reboots with CR4.VMXE set. Together, the patches
completely fix the reported bug and also correct a lurking error
in the asm statement's declared side effects.

David P. Reed (3):
Correct asm VMXOFF side effects
Fix undefined operation fault that can hang a cpu on crash or panic
Force all cpus to exit VMX root operation on crash/panic reliably

arch/x86/include/asm/virtext.h | 24 ++++++++++++++++--------
arch/x86/kernel/reboot.c | 20 +++++++-------------
2 files changed, 23 insertions(+), 21 deletions(-)

--
2.26.2
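
For readers following the thread, the end state the series aims at in
arch/x86/include/asm/virtext.h is roughly the following (assembled from
the v2 diff plus the review feedback; a sketch, not the literal v3 code):

/* With the exception fixup folded into cpu_vmxoff(), the emergency
 * helpers need no extra machinery: VMXOFF can no longer fault through. */
static inline void __cpu_emergency_vmxoff(void)
{
	if (cpu_vmx_enabled())		/* CR4.VMXE set on this CPU? */
		cpu_vmxoff();		/* safe even if VMXON never ran */
}

static inline void cpu_emergency_vmxoff(void)
{
	if (cpu_has_vmx())		/* CPU supports VMX at all? */
		__cpu_emergency_vmxoff();
}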