2023-03-10 21:42:43

by Sean Christopherson

Subject: [PATCH v2 00/18] x86/reboot: KVM: Clean up "emergency" virt code

Instead of having the reboot code blindly try to disable virtualization
during an emergency, use the existing callback into KVM to disable virt
as "needed". In quotes because KVM still somewhat blindly attempts to
disable virt, e.g. if KVM is loaded but doesn't have active VMs and thus
hasn't enabled hardware. That could theoretically be "fixed", but due to
the callback being invoked from NMI context, I'm not convinced it would
be worth the complexity. E.g. false positives would still be possible,
and KVM would have to play games with the per-CPU hardware_enabled flag
to ensure there are no false negatives.

The callback is currently used only to VMCLEAR the per-CPU list of VMCSes;
that it isn't also used to disable virtualization was never a deliberate
design choice. Arguably, a
callback should have been used in the initial "disable virt" code added by
commit d176720d34c7 ("x86: disable VMX on all CPUs on reboot"). And the
kexec logic added (much later) by commit f23d1f4a1160 ("x86/kexec: VMCLEAR
VMCSs loaded on all cpus if necessary") simply missed the opportunity to
use the callback for all virtualization needs.

Once KVM handles disabling virt, move all of the helpers provided by
virtext.h into KVM proper.

There's one outlier patch, "Make KVM_AMD depend on CPU_SUP_AMD or
CPU_SUP_HYGON", that I included here because it felt weird to pull in the
"must be AMD or Hygon" check without KVM demanding that at build time.

Note, there have been conversations at various times about supporting
additional in-tree users of virtualization. Somewhat counter-intuitively,
giving KVM full ownership of virt actually makes it _easier_ to support
additional virt users as having all of the code in one place makes it
easier to extract the bits that need to be shared.

v2:
- Disable task migration when probing basic SVM and VMX support to avoid
logging misleading info (wrong CPU) if probing fails.

v1: https://lore.kernel.org/all/[email protected]

Sean Christopherson (18):
x86/reboot: VMCLEAR active VMCSes before emergency reboot
x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is
enabled
x86/reboot: Harden virtualization hooks for emergency reboot
x86/reboot: KVM: Handle VMXOFF in KVM's reboot callback
x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot
callback
x86/reboot: Hoist "disable virt" helpers above "emergency reboot" path
x86/reboot: Disable virtualization during reboot iff callback is
registered
x86/reboot: Assert that IRQs are disabled when turning off
virtualization
x86/virt: KVM: Open code cpu_has_vmx() in KVM VMX
x86/virt: KVM: Move VMXOFF helpers into KVM VMX
KVM: SVM: Make KVM_AMD depend on CPU_SUP_AMD or CPU_SUP_HYGON
x86/virt: Drop unnecessary check on extended CPUID level in
cpu_has_svm()
x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()
KVM: SVM: Check that the current CPU supports SVM in
kvm_is_svm_supported()
KVM: VMX: Ensure CPU is stable when probing basic VMX support
x86/virt: KVM: Move "disable SVM" helper into KVM SVM
KVM: x86: Force kvm_rebooting=true during emergency reboot/crash
KVM: SVM: Use "standard" stgi() helper when disabling SVM

arch/x86/include/asm/kexec.h | 2 -
arch/x86/include/asm/reboot.h | 7 ++
arch/x86/include/asm/virtext.h | 154 ---------------------------------
arch/x86/kernel/crash.c | 31 -------
arch/x86/kernel/reboot.c | 65 ++++++++++----
arch/x86/kvm/Kconfig | 2 +-
arch/x86/kvm/svm/svm.c | 70 ++++++++++++---
arch/x86/kvm/vmx/vmx.c | 68 +++++++++++----
8 files changed, 168 insertions(+), 231 deletions(-)
delete mode 100644 arch/x86/include/asm/virtext.h


base-commit: 45dd9bc75d9adc9483f0c7d662ba6e73ed698a0b
--
2.40.0.rc1.284.g88254d51c5-goog



2023-03-10 21:42:46

by Sean Christopherson

Subject: [PATCH v2 01/18] x86/reboot: VMCLEAR active VMCSes before emergency reboot

VMCLEAR active VMCSes before any emergency reboot, not just if the kernel
may kexec into a new kernel after a crash. Per Intel's SDM, the VMX
architecture doesn't require the CPU to flush the VMCS cache on INIT. If
an emergency reboot doesn't RESET CPUs, cached VMCSes could theoretically
be kept and only be written back to memory after the new kernel is booted,
i.e. could effectively corrupt memory after reboot.

Opportunistically remove the setting of the global pointer to NULL to make
checkpatch happy.

Cc: Andrew Cooper <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kexec.h | 2 --
arch/x86/include/asm/reboot.h | 2 ++
arch/x86/kernel/crash.c | 31 -------------------------------
arch/x86/kernel/reboot.c | 22 ++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 10 +++-------
5 files changed, 27 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index a3760ca796aa..256eee99afc8 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -208,8 +208,6 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image);
#endif
#endif

-typedef void crash_vmclear_fn(void);
-extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index bc5b4d788c08..2551baec927d 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,6 +25,8 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1

+typedef void crash_vmclear_fn(void);
+extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
void cpu_emergency_disable_virtualization(void);

typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..54cd959cb316 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -48,38 +48,12 @@ struct crash_memmap_data {
unsigned int type;
};

-/*
- * This is used to VMCLEAR all VMCSs loaded on the
- * processor. And when loading kvm_intel module, the
- * callback function pointer will be assigned.
- *
- * protected by rcu.
- */
-crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss = NULL;
-EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
-
-static inline void cpu_crash_vmclear_loaded_vmcss(void)
-{
- crash_vmclear_fn *do_vmclear_operation = NULL;
-
- rcu_read_lock();
- do_vmclear_operation = rcu_dereference(crash_vmclear_loaded_vmcss);
- if (do_vmclear_operation)
- do_vmclear_operation();
- rcu_read_unlock();
-}
-
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
{
crash_save_cpu(regs, cpu);

- /*
- * VMCLEAR VMCSs loaded on all cpus if needed.
- */
- cpu_crash_vmclear_loaded_vmcss();
-
/*
* Disable Intel PT to stop its logging
*/
@@ -133,11 +107,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)

crash_smp_send_stop();

- /*
- * VMCLEAR VMCSs loaded on this cpu if needed.
- */
- cpu_crash_vmclear_loaded_vmcss();
-
cpu_emergency_disable_virtualization();

/*
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index d03c551defcc..299b970e5f82 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -787,6 +787,26 @@ void machine_crash_shutdown(struct pt_regs *regs)
}
#endif

+/*
+ * This is used to VMCLEAR all VMCSs loaded on the
+ * processor. And when loading kvm_intel module, the
+ * callback function pointer will be assigned.
+ *
+ * protected by rcu.
+ */
+crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
+EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
+
+static inline void cpu_crash_vmclear_loaded_vmcss(void)
+{
+ crash_vmclear_fn *do_vmclear_operation = NULL;
+
+ rcu_read_lock();
+ do_vmclear_operation = rcu_dereference(crash_vmclear_loaded_vmcss);
+ if (do_vmclear_operation)
+ do_vmclear_operation();
+ rcu_read_unlock();
+}

/* This is the CPU performing the emergency shutdown work. */
int crashing_cpu = -1;
@@ -798,6 +818,8 @@ int crashing_cpu = -1;
*/
void cpu_emergency_disable_virtualization(void)
{
+ cpu_crash_vmclear_loaded_vmcss();
+
cpu_emergency_vmxoff();
cpu_emergency_svm_disable();
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcac3efcde41..302086255be6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -41,7 +41,7 @@
#include <asm/idtentry.h>
#include <asm/io.h>
#include <asm/irq_remapping.h>
-#include <asm/kexec.h>
+#include <asm/reboot.h>
#include <asm/perf_event.h>
#include <asm/mmu_context.h>
#include <asm/mshyperv.h>
@@ -743,7 +743,6 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
return ret;
}

-#ifdef CONFIG_KEXEC_CORE
static void crash_vmclear_local_loaded_vmcss(void)
{
int cpu = raw_smp_processor_id();
@@ -753,7 +752,6 @@ static void crash_vmclear_local_loaded_vmcss(void)
loaded_vmcss_on_cpu_link)
vmcs_clear(v->vmcs);
}
-#endif /* CONFIG_KEXEC_CORE */

static void __loaded_vmcs_clear(void *arg)
{
@@ -8553,10 +8551,9 @@ static void __vmx_exit(void)
{
allow_smaller_maxphyaddr = false;

-#ifdef CONFIG_KEXEC_CORE
RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
synchronize_rcu();
-#endif
+
vmx_cleanup_l1d_flush();
}

@@ -8605,10 +8602,9 @@ static int __init vmx_init(void)
pi_init_cpu(cpu);
}

-#ifdef CONFIG_KEXEC_CORE
rcu_assign_pointer(crash_vmclear_loaded_vmcss,
crash_vmclear_local_loaded_vmcss);
-#endif
+
vmx_check_vmcs12_offsets();

/*
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:42:51

by Sean Christopherson

Subject: [PATCH v2 02/18] x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is enabled

Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
only if there's a potential in-tree user, KVM_INTEL.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/reboot.h | 2 ++
arch/x86/kernel/reboot.c | 4 ++++
2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 2551baec927d..33c8e911e0de 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,8 +25,10 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1

+#if IS_ENABLED(CONFIG_KVM_INTEL)
typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
+#endif
void cpu_emergency_disable_virtualization(void);

typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 299b970e5f82..6c0b1634b884 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -787,6 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
}
#endif

+#if IS_ENABLED(CONFIG_KVM_INTEL)
/*
* This is used to VMCLEAR all VMCSs loaded on the
* processor. And when loading kvm_intel module, the
@@ -807,6 +808,7 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
do_vmclear_operation();
rcu_read_unlock();
}
+#endif

/* This is the CPU performing the emergency shutdown work. */
int crashing_cpu = -1;
@@ -818,7 +820,9 @@ int crashing_cpu = -1;
*/
void cpu_emergency_disable_virtualization(void)
{
+#if IS_ENABLED(CONFIG_KVM_INTEL)
cpu_crash_vmclear_loaded_vmcss();
+#endif

cpu_emergency_vmxoff();
cpu_emergency_svm_disable();
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:42:54

by Sean Christopherson

Subject: [PATCH v2 03/18] x86/reboot: Harden virtualization hooks for emergency reboot

Provide dedicated helpers to (un)register virt hooks used during an
emergency crash/reboot, and WARN if there is an attempt to overwrite
the registered callback, or an attempt to do an unpaired unregister.

Opportunistically use rcu_assign_pointer() instead of RCU_INIT_POINTER(),
mainly so that the set/unset paths are more symmetrical, but also because
any performance gains from using RCU_INIT_POINTER() are meaningless for
this code.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/reboot.h | 5 +++--
arch/x86/kernel/reboot.c | 30 ++++++++++++++++++++++++------
arch/x86/kvm/vmx/vmx.c | 6 ++----
3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 33c8e911e0de..1d098a7d329a 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -26,8 +26,9 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_APM 1

#if IS_ENABLED(CONFIG_KVM_INTEL)
-typedef void crash_vmclear_fn(void);
-extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
+typedef void (cpu_emergency_virt_cb)(void);
+void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
+void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
#endif
void cpu_emergency_disable_virtualization(void);

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 6c0b1634b884..78182b2969db 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -795,17 +795,35 @@ void machine_crash_shutdown(struct pt_regs *regs)
*
* protected by rcu.
*/
-crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
-EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
+static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
+
+void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
+{
+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback)))
+ return;
+
+ rcu_assign_pointer(cpu_emergency_virt_callback, callback);
+}
+EXPORT_SYMBOL_GPL(cpu_emergency_register_virt_callback);
+
+void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
+{
+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback) != callback))
+ return;
+
+ rcu_assign_pointer(cpu_emergency_virt_callback, NULL);
+ synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);

static inline void cpu_crash_vmclear_loaded_vmcss(void)
{
- crash_vmclear_fn *do_vmclear_operation = NULL;
+ cpu_emergency_virt_cb *callback;

rcu_read_lock();
- do_vmclear_operation = rcu_dereference(crash_vmclear_loaded_vmcss);
- if (do_vmclear_operation)
- do_vmclear_operation();
+ callback = rcu_dereference(cpu_emergency_virt_callback);
+ if (callback)
+ callback();
rcu_read_unlock();
}
#endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 302086255be6..41095fc864f3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8551,8 +8551,7 @@ static void __vmx_exit(void)
{
allow_smaller_maxphyaddr = false;

- RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
- synchronize_rcu();
+ cpu_emergency_unregister_virt_callback(crash_vmclear_local_loaded_vmcss);

vmx_cleanup_l1d_flush();
}
@@ -8602,8 +8601,7 @@ static int __init vmx_init(void)
pi_init_cpu(cpu);
}

- rcu_assign_pointer(crash_vmclear_loaded_vmcss,
- crash_vmclear_local_loaded_vmcss);
+ cpu_emergency_register_virt_callback(crash_vmclear_local_loaded_vmcss);

vmx_check_vmcs12_offsets();

--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:05

by Sean Christopherson

Subject: [PATCH v2 04/18] x86/reboot: KVM: Handle VMXOFF in KVM's reboot callback

Use KVM VMX's reboot/crash callback to do VMXOFF in an emergency instead
of manually and blindly doing VMXOFF. There's no need to attempt VMXOFF
if KVM (or some other out-of-tree hypervisor) isn't loaded/active, i.e.
if the CPU can't possibly be post-VMXON.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 10 ----------
arch/x86/kernel/reboot.c | 30 +++++++++---------------------
arch/x86/kvm/vmx/vmx.c | 8 +++++---
3 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 3b12e6b99412..5bc29fab15da 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -70,16 +70,6 @@ static inline void __cpu_emergency_vmxoff(void)
cpu_vmxoff();
}

-/** Disable VMX if it is supported and enabled on the current CPU
- */
-static inline void cpu_emergency_vmxoff(void)
-{
- if (cpu_has_vmx())
- __cpu_emergency_vmxoff();
-}
-
-
-

/*
* SVM functions:
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 78182b2969db..5fb1fbf14c82 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -788,13 +788,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
#endif

#if IS_ENABLED(CONFIG_KVM_INTEL)
-/*
- * This is used to VMCLEAR all VMCSs loaded on the
- * processor. And when loading kvm_intel module, the
- * callback function pointer will be assigned.
- *
- * protected by rcu.
- */
+/* RCU-protected callback to disable virtualization prior to reboot. */
static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;

void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
@@ -815,17 +809,6 @@ void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
synchronize_rcu();
}
EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);
-
-static inline void cpu_crash_vmclear_loaded_vmcss(void)
-{
- cpu_emergency_virt_cb *callback;
-
- rcu_read_lock();
- callback = rcu_dereference(cpu_emergency_virt_callback);
- if (callback)
- callback();
- rcu_read_unlock();
-}
#endif

/* This is the CPU performing the emergency shutdown work. */
@@ -839,10 +822,15 @@ int crashing_cpu = -1;
void cpu_emergency_disable_virtualization(void)
{
#if IS_ENABLED(CONFIG_KVM_INTEL)
- cpu_crash_vmclear_loaded_vmcss();
-#endif
+ cpu_emergency_virt_cb *callback;

- cpu_emergency_vmxoff();
+ rcu_read_lock();
+ callback = rcu_dereference(cpu_emergency_virt_callback);
+ if (callback)
+ callback();
+ rcu_read_unlock();
+#endif
+ /* KVM_AMD doesn't yet utilize the common callback. */
cpu_emergency_svm_disable();
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 41095fc864f3..9e196b9fe183 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -743,7 +743,7 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
return ret;
}

-static void crash_vmclear_local_loaded_vmcss(void)
+static void vmx_emergency_disable(void)
{
int cpu = raw_smp_processor_id();
struct loaded_vmcs *v;
@@ -751,6 +751,8 @@ static void crash_vmclear_local_loaded_vmcss(void)
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
loaded_vmcss_on_cpu_link)
vmcs_clear(v->vmcs);
+
+ __cpu_emergency_vmxoff();
}

static void __loaded_vmcs_clear(void *arg)
@@ -8551,7 +8553,7 @@ static void __vmx_exit(void)
{
allow_smaller_maxphyaddr = false;

- cpu_emergency_unregister_virt_callback(crash_vmclear_local_loaded_vmcss);
+ cpu_emergency_unregister_virt_callback(vmx_emergency_disable);

vmx_cleanup_l1d_flush();
}
@@ -8601,7 +8603,7 @@ static int __init vmx_init(void)
pi_init_cpu(cpu);
}

- cpu_emergency_register_virt_callback(crash_vmclear_local_loaded_vmcss);
+ cpu_emergency_register_virt_callback(vmx_emergency_disable);

vmx_check_vmcs12_offsets();

--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:09

by Sean Christopherson

Subject: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

Use the virt callback to disable SVM (and set GIF=1) during an emergency
instead of blindly attempting to disable SVM. Like the VMX case, if KVM
(or an out-of-tree hypervisor) isn't loaded/active, SVM can't be in use.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/reboot.h | 2 +-
arch/x86/include/asm/virtext.h | 8 --------
arch/x86/kernel/reboot.c | 6 ++----
arch/x86/kvm/svm/svm.c | 19 +++++++++++++++++--
4 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 1d098a7d329a..dc2b77e6704b 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,7 +25,7 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1

-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
typedef void (cpu_emergency_virt_cb)(void);
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 5bc29fab15da..aaed66249ccf 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -133,12 +133,4 @@ static inline void cpu_svm_disable(void)
}
}

-/** Makes sure SVM is disabled, if it is supported on the CPU
- */
-static inline void cpu_emergency_svm_disable(void)
-{
- if (cpu_has_svm(NULL))
- cpu_svm_disable();
-}
-
#endif /* _ASM_X86_VIRTEX_H */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 5fb1fbf14c82..db535542b7ab 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -787,7 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
}
#endif

-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
/* RCU-protected callback to disable virtualization prior to reboot. */
static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;

@@ -821,7 +821,7 @@ int crashing_cpu = -1;
*/
void cpu_emergency_disable_virtualization(void)
{
-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
cpu_emergency_virt_cb *callback;

rcu_read_lock();
@@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
callback();
rcu_read_unlock();
#endif
- /* KVM_AMD doesn't yet utilize the common callback. */
- cpu_emergency_svm_disable();
}

#if defined(CONFIG_SMP)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b43775490074..541dd978a94b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -38,6 +38,7 @@
#include <asm/spec-ctrl.h>
#include <asm/cpu_device_id.h>
#include <asm/traps.h>
+#include <asm/reboot.h>
#include <asm/fpu/api.h>

#include <asm/virtext.h>
@@ -565,6 +566,11 @@ void __svm_write_tsc_multiplier(u64 multiplier)
preempt_enable();
}

+static void svm_emergency_disable(void)
+{
+ cpu_svm_disable();
+}
+
static void svm_hardware_disable(void)
{
/* Make sure we clean up behind us */
@@ -5092,6 +5098,13 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
.pmu_ops = &amd_pmu_ops,
};

+static void __svm_exit(void)
+{
+ kvm_x86_vendor_exit();
+
+ cpu_emergency_unregister_virt_callback(svm_emergency_disable);
+}
+
static int __init svm_init(void)
{
int r;
@@ -5105,6 +5118,8 @@ static int __init svm_init(void)
if (r)
return r;

+ cpu_emergency_register_virt_callback(svm_emergency_disable);
+
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
@@ -5117,14 +5132,14 @@ static int __init svm_init(void)
return 0;

err_kvm_init:
- kvm_x86_vendor_exit();
+ __svm_exit();
return r;
}

static void __exit svm_exit(void)
{
kvm_exit();
- kvm_x86_vendor_exit();
+ __svm_exit();
}

module_init(svm_init)
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:15

by Sean Christopherson

Subject: [PATCH v2 06/18] x86/reboot: Hoist "disable virt" helpers above "emergency reboot" path

Move the various "disable virtualization" helpers above the emergency
reboot path so that emergency_reboot_disable_virtualization() can be
stubbed out in a future patch if neither KVM_INTEL nor KVM_AMD is enabled,
i.e. if there is no in-tree user of CPU virtualization.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/reboot.h | 4 +-
arch/x86/kernel/reboot.c | 82 +++++++++++++++++------------------
2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index dc2b77e6704b..2be5b89c9a05 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -29,8 +29,10 @@ void __noreturn machine_real_restart(unsigned int type);
typedef void (cpu_emergency_virt_cb)(void);
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
-#endif
void cpu_emergency_disable_virtualization(void);
+#else
+static inline void cpu_emergency_disable_virtualization(void) {}
+#endif

typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
void nmi_panic_self_stop(struct pt_regs *regs);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index db535542b7ab..cb268ec7ce85 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -530,6 +530,46 @@ static inline void kb_wait(void)

static inline void nmi_shootdown_cpus_on_restart(void);

+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
+/* RCU-protected callback to disable virtualization prior to reboot. */
+static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
+
+void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
+{
+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback)))
+ return;
+
+ rcu_assign_pointer(cpu_emergency_virt_callback, callback);
+}
+EXPORT_SYMBOL_GPL(cpu_emergency_register_virt_callback);
+
+void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
+{
+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback) != callback))
+ return;
+
+ rcu_assign_pointer(cpu_emergency_virt_callback, NULL);
+ synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);
+
+/*
+ * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during
+ * reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if
+ * GIF=0, i.e. if the crash occurred between CLGI and STGI.
+ */
+void cpu_emergency_disable_virtualization(void)
+{
+ cpu_emergency_virt_cb *callback;
+
+ rcu_read_lock();
+ callback = rcu_dereference(cpu_emergency_virt_callback);
+ if (callback)
+ callback();
+ rcu_read_unlock();
+}
+#endif /* CONFIG_KVM_INTEL || CONFIG_KVM_AMD */
+
static void emergency_reboot_disable_virtualization(void)
{
/* Just make sure we won't change CPUs while doing this */
@@ -787,51 +827,9 @@ void machine_crash_shutdown(struct pt_regs *regs)
}
#endif

-#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
-/* RCU-protected callback to disable virtualization prior to reboot. */
-static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
-
-void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
-{
- if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback)))
- return;
-
- rcu_assign_pointer(cpu_emergency_virt_callback, callback);
-}
-EXPORT_SYMBOL_GPL(cpu_emergency_register_virt_callback);
-
-void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
-{
- if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback) != callback))
- return;
-
- rcu_assign_pointer(cpu_emergency_virt_callback, NULL);
- synchronize_rcu();
-}
-EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);
-#endif
-
/* This is the CPU performing the emergency shutdown work. */
int crashing_cpu = -1;

-/*
- * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during
- * reboot. VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if
- * GIF=0, i.e. if the crash occurred between CLGI and STGI.
- */
-void cpu_emergency_disable_virtualization(void)
-{
-#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
- cpu_emergency_virt_cb *callback;
-
- rcu_read_lock();
- callback = rcu_dereference(cpu_emergency_virt_callback);
- if (callback)
- callback();
- rcu_read_unlock();
-#endif
-}
-
#if defined(CONFIG_SMP)

static nmi_shootdown_cb shootdown_callback;
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:35

by Sean Christopherson

Subject: [PATCH v2 07/18] x86/reboot: Disable virtualization during reboot iff callback is registered

Attempt to disable virtualization during an emergency reboot if and only
if there is a registered virt callback, i.e. iff a hypervisor (KVM) is
active. If there's no active hypervisor, then the CPU can't be operating
with VMX or SVM enabled (barring an egregious bug).

Note, IRQs are disabled, which prevents KVM from coming along and enabling
virtualization after the fact.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/reboot.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index cb268ec7ce85..dd7def3d4144 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -22,7 +22,6 @@
#include <asm/reboot_fixups.h>
#include <asm/reboot.h>
#include <asm/pci_x86.h>
-#include <asm/virtext.h>
#include <asm/cpu.h>
#include <asm/nmi.h>
#include <asm/smp.h>
@@ -568,7 +567,6 @@ void cpu_emergency_disable_virtualization(void)
callback();
rcu_read_unlock();
}
-#endif /* CONFIG_KVM_INTEL || CONFIG_KVM_AMD */

static void emergency_reboot_disable_virtualization(void)
{
@@ -585,7 +583,7 @@ static void emergency_reboot_disable_virtualization(void)
* Do the NMI shootdown even if virtualization is off on _this_ CPU, as
* other CPUs may have virtualization enabled.
*/
- if (cpu_has_vmx() || cpu_has_svm(NULL)) {
+ if (rcu_access_pointer(cpu_emergency_virt_callback)) {
/* Safely force _this_ CPU out of VMX/SVM operation. */
cpu_emergency_disable_virtualization();

@@ -593,6 +591,9 @@ static void emergency_reboot_disable_virtualization(void)
nmi_shootdown_cpus_on_restart();
}
}
+#else
+static void emergency_reboot_disable_virtualization(void) { }
+#endif /* CONFIG_KVM_INTEL || CONFIG_KVM_AMD */


void __attribute__((weak)) mach_reboot_fixups(void)
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:41

by Sean Christopherson

Subject: [PATCH v2 08/18] x86/reboot: Assert that IRQs are disabled when turning off virtualization

Assert that IRQs are disabled when turning off virtualization in an
emergency. KVM enables hardware via on_each_cpu(), i.e. could re-enable
hardware if a pending IPI were delivered after disabling virtualization.

Remove a misleading comment from emergency_reboot_disable_virtualization()
about "just" needing to guarantee the CPU is stable (see above).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kernel/reboot.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index dd7def3d4144..f0d405bc718e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -561,6 +561,13 @@ void cpu_emergency_disable_virtualization(void)
{
cpu_emergency_virt_cb *callback;

+ /*
+ * IRQs must be disabled as KVM enables virtualization in hardware via
+ * function call IPIs, i.e. IRQs need to be disabled to guarantee
+ * virtualization stays disabled.
+ */
+ lockdep_assert_irqs_disabled();
+
rcu_read_lock();
callback = rcu_dereference(cpu_emergency_virt_callback);
if (callback)
@@ -570,7 +577,6 @@ void cpu_emergency_disable_virtualization(void)

static void emergency_reboot_disable_virtualization(void)
{
- /* Just make sure we won't change CPUs while doing this */
local_irq_disable();

/*
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:54

by Sean Christopherson

Subject: [PATCH v2 09/18] x86/virt: KVM: Open code cpu_has_vmx() in KVM VMX

Fold the raw CPUID check for VMX into kvm_is_vmx_supported(), its sole
user. Keep the check even though KVM also checks X86_FEATURE_VMX, as the
intent is to provide a unique error message if VMX is unsupported by
hardware, whereas X86_FEATURE_VMX may be clear due to firmware and/or
kernel actions.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 10 ----------
arch/x86/kvm/vmx/vmx.c | 2 +-
2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index aaed66249ccf..b1171a5ad452 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -22,14 +22,6 @@
/*
* VMX functions:
*/
-
-static inline int cpu_has_vmx(void)
-{
- unsigned long ecx = cpuid_ecx(1);
- return test_bit(5, &ecx); /* CPUID.1:ECX.VMX[bit 5] -> VT */
-}
-
-
/**
* cpu_vmxoff() - Disable VMX on the current CPU
*
@@ -61,8 +53,6 @@ static inline int cpu_vmx_enabled(void)
}

/** Disable VMX if it is enabled on the current CPU
- *
- * You shouldn't call this if cpu_has_vmx() returns 0.
*/
static inline void __cpu_emergency_vmxoff(void)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9e196b9fe183..58856e196536 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2747,7 +2747,7 @@ static bool kvm_is_vmx_supported(void)
{
int cpu = raw_smp_processor_id();

- if (!cpu_has_vmx()) {
+ if (!(cpuid_ecx(1) & feature_bit(VMX))) {
pr_err("VMX not supported by CPU %d\n", cpu);
return false;
}
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:43:59

by Sean Christopherson

Subject: [PATCH v2 10/18] x86/virt: KVM: Move VMXOFF helpers into KVM VMX

Now that VMX is disabled in emergencies via the virt callbacks, move the
VMXOFF helpers into KVM, the only remaining user.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 42 ----------------------------------
arch/x86/kvm/vmx/vmx.c | 29 ++++++++++++++++++++---
2 files changed, 26 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index b1171a5ad452..a27801f2bc71 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -19,48 +19,6 @@
#include <asm/svm.h>
#include <asm/tlbflush.h>

-/*
- * VMX functions:
- */
-/**
- * cpu_vmxoff() - Disable VMX on the current CPU
- *
- * Disable VMX and clear CR4.VMXE (even if VMXOFF faults)
- *
- * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossible to
- * atomically track post-VMXON state, e.g. this may be called in NMI context.
- * Eat all faults as all other faults on VMXOFF faults are mode related, i.e.
- * faults are guaranteed to be due to the !post-VMXON check unless the CPU is
- * magically in RM, VM86, compat mode, or at CPL>0.
- */
-static inline int cpu_vmxoff(void)
-{
- asm_volatile_goto("1: vmxoff\n\t"
- _ASM_EXTABLE(1b, %l[fault])
- ::: "cc", "memory" : fault);
-
- cr4_clear_bits(X86_CR4_VMXE);
- return 0;
-
-fault:
- cr4_clear_bits(X86_CR4_VMXE);
- return -EIO;
-}
-
-static inline int cpu_vmx_enabled(void)
-{
- return __read_cr4() & X86_CR4_VMXE;
-}
-
-/** Disable VMX if it is enabled on the current CPU
- */
-static inline void __cpu_emergency_vmxoff(void)
-{
- if (cpu_vmx_enabled())
- cpu_vmxoff();
-}
-
-
/*
* SVM functions:
*/
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 58856e196536..158853ab0d1b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -47,7 +47,6 @@
#include <asm/mshyperv.h>
#include <asm/mwait.h>
#include <asm/spec-ctrl.h>
-#include <asm/virtext.h>
#include <asm/vmx.h>

#include "capabilities.h"
@@ -743,6 +742,29 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
return ret;
}

+/*
+ * Disable VMX and clear CR4.VMXE (even if VMXOFF faults)
+ *
+ * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossible to
+ * atomically track post-VMXON state, e.g. this may be called in NMI context.
+ * Eat all faults as all other faults on VMXOFF faults are mode related, i.e.
+ * faults are guaranteed to be due to the !post-VMXON check unless the CPU is
+ * magically in RM, VM86, compat mode, or at CPL>0.
+ */
+static int kvm_cpu_vmxoff(void)
+{
+ asm_volatile_goto("1: vmxoff\n\t"
+ _ASM_EXTABLE(1b, %l[fault])
+ ::: "cc", "memory" : fault);
+
+ cr4_clear_bits(X86_CR4_VMXE);
+ return 0;
+
+fault:
+ cr4_clear_bits(X86_CR4_VMXE);
+ return -EIO;
+}
+
static void vmx_emergency_disable(void)
{
int cpu = raw_smp_processor_id();
@@ -752,7 +774,8 @@ static void vmx_emergency_disable(void)
loaded_vmcss_on_cpu_link)
vmcs_clear(v->vmcs);

- __cpu_emergency_vmxoff();
+ if (__read_cr4() & X86_CR4_VMXE)
+ kvm_cpu_vmxoff();
}

static void __loaded_vmcs_clear(void *arg)
@@ -2848,7 +2871,7 @@ static void vmx_hardware_disable(void)
{
vmclear_local_loaded_vmcss();

- if (cpu_vmxoff())
+ if (kvm_cpu_vmxoff())
kvm_spurious_fault();

hv_reset_evmcs();
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:44:16

by Sean Christopherson

Subject: [PATCH v2 11/18] KVM: SVM: Make KVM_AMD depend on CPU_SUP_AMD or CPU_SUP_HYGON

Make building KVM SVM support depend on support for AMD or Hygon. KVM
already effectively restricts SVM support to AMD and Hygon by virtue of
the vendor string checks in cpu_has_svm(), and KVM VMX support depends
on one of its three known vendors (Intel, Centaur, or Zhaoxin).

Add the CPU_SUP_HYGON clause even though CPU_SUP_HYGON selects CPU_SUP_AMD
to document that KVM SVM support isn't just for AMD CPUs, and to prevent
breakage should Hygon support ever become a standalone thing.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8e578311ca9d..0d403e9b6a47 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -102,7 +102,7 @@ config X86_SGX_KVM

config KVM_AMD
tristate "KVM for AMD processors support"
- depends on KVM
+ depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON)
help
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:44:32

by Sean Christopherson

Subject: [PATCH v2 12/18] x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()

Drop the explicit check on the extended CPUID level in cpu_has_svm(); the
kernel's cached CPUID info will leave the entire SVM leaf unset if said
leaf is not supported by hardware. Prior to using cached information,
the check was needed to avoid false positives due to Intel's rather crazy
CPUID behavior of returning the values of the maximum supported leaf if
the specified leaf is unsupported.

Fixes: 682a8108872f ("x86/kvm/svm: Simplify cpu_has_svm()")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 6 ------
1 file changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index a27801f2bc71..be50c414efe4 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -39,12 +39,6 @@ static inline int cpu_has_svm(const char **msg)
return 0;
}

- if (boot_cpu_data.extended_cpuid_level < SVM_CPUID_FUNC) {
- if (msg)
- *msg = "can't execute cpuid_8000000a";
- return 0;
- }
-
if (!boot_cpu_has(X86_FEATURE_SVM)) {
if (msg)
*msg = "svm not available";
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:44:34

by Sean Christopherson

Subject: [PATCH v2 13/18] x86/virt: KVM: Open code cpu_has_svm() into kvm_is_svm_supported()

Fold the guts of cpu_has_svm() into kvm_is_svm_supported(), its sole
remaining user.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 28 ----------------------------
arch/x86/kvm/svm/svm.c | 11 ++++++++---
2 files changed, 8 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index be50c414efe4..632575e257d8 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -22,35 +22,7 @@
/*
* SVM functions:
*/
-
-/** Check if the CPU has SVM support
- *
- * You can use the 'msg' arg to get a message describing the problem,
- * if the function returns zero. Simply pass NULL if you are not interested
- * on the messages; gcc should take care of not generating code for
- * the messages on this case.
- */
-static inline int cpu_has_svm(const char **msg)
-{
- if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
- boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) {
- if (msg)
- *msg = "not amd or hygon";
- return 0;
- }
-
- if (!boot_cpu_has(X86_FEATURE_SVM)) {
- if (msg)
- *msg = "svm not available";
- return 0;
- }
- return 1;
-}
-
-
/** Disable SVM on the current CPU
- *
- * You should call this only if cpu_has_svm() returned true.
*/
static inline void cpu_svm_disable(void)
{
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 541dd978a94b..2934f185960d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -523,11 +523,16 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
static bool kvm_is_svm_supported(void)
{
int cpu = raw_smp_processor_id();
- const char *msg;
u64 vm_cr;

- if (!cpu_has_svm(&msg)) {
- pr_err("SVM not supported by CPU %d, %s\n", cpu, msg);
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
+ boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) {
+ pr_err("CPU %d isn't AMD or Hygon\n", cpu);
+ return false;
+ }
+
+ if (!boot_cpu_has(X86_FEATURE_SVM)) {
+ pr_err("SVM not supported by CPU %d\n", cpu);
return false;
}

--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:44:44

by Sean Christopherson

Subject: [PATCH v2 14/18] KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()

Check "this" CPU instead of the boot CPU when querying SVM support so that
the per-CPU checks done during hardware enabling actually function as
intended, i.e. will detect issues where SVM isn't supported on all CPUs.

Disable migration for the use from svm_init() mostly so that the standard
accessors for the per-CPU data can be used without getting yelled at by
CONFIG_DEBUG_PREEMPT=y sanity checks. Preventing the "disabled by BIOS"
error message from reporting the wrong CPU is largely a bonus, as ensuring
a stable CPU during module load is a non-goal for KVM.

Link: https://lore.kernel.org/all/[email protected]
Cc: Kai Huang <[email protected]>
Cc: Chao Gao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2934f185960d..f04b61c3d9d8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -520,18 +520,20 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
vcpu->arch.osvw.status |= 1;
}

-static bool kvm_is_svm_supported(void)
+static bool __kvm_is_svm_supported(void)
{
- int cpu = raw_smp_processor_id();
+ int cpu = smp_processor_id();
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+
u64 vm_cr;

- if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
- boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) {
+ if (c->x86_vendor != X86_VENDOR_AMD &&
+ c->x86_vendor != X86_VENDOR_HYGON) {
pr_err("CPU %d isn't AMD or Hygon\n", cpu);
return false;
}

- if (!boot_cpu_has(X86_FEATURE_SVM)) {
+ if (!cpu_has(c, X86_FEATURE_SVM)) {
pr_err("SVM not supported by CPU %d\n", cpu);
return false;
}
@@ -550,9 +552,20 @@ static bool kvm_is_svm_supported(void)
return true;
}

+static bool kvm_is_svm_supported(void)
+{
+ bool supported;
+
+ migrate_disable();
+ supported = __kvm_is_svm_supported();
+ migrate_enable();
+
+ return supported;
+}
+
static int svm_check_processor_compat(void)
{
- if (!kvm_is_svm_supported())
+ if (!__kvm_is_svm_supported())
return -EIO;

return 0;
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:45:06

by Sean Christopherson

Subject: [PATCH v2 15/18] KVM: VMX: Ensure CPU is stable when probing basic VMX support

Disable migration when probing VMX support during module load to ensure
the CPU is stable, mostly to match similar SVM logic, where allowing
migration effectively requires deliberately writing buggy code. As a bonus,
KVM won't report the wrong CPU to userspace if VMX is unsupported, but in
practice that is a very, very minor bonus as the only way that reporting
the wrong CPU would actually matter is if hardware is broken or if the
system is misconfigured, i.e. if KVM gets migrated from a CPU that _does_
support VMX to a CPU that does _not_ support VMX.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 158853ab0d1b..374e3ddbd476 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2766,9 +2766,9 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
return 0;
}

-static bool kvm_is_vmx_supported(void)
+static bool __kvm_is_vmx_supported(void)
{
- int cpu = raw_smp_processor_id();
+ int cpu = smp_processor_id();

if (!(cpuid_ecx(1) & feature_bit(VMX))) {
pr_err("VMX not supported by CPU %d\n", cpu);
@@ -2784,13 +2784,24 @@ static bool kvm_is_vmx_supported(void)
return true;
}

+static bool kvm_is_vmx_supported(void)
+{
+ bool supported;
+
+ migrate_disable();
+ supported = __kvm_is_vmx_supported();
+ migrate_enable();
+
+ return supported;
+}
+
static int vmx_check_processor_compat(void)
{
int cpu = raw_smp_processor_id();
struct vmcs_config vmcs_conf;
struct vmx_capability vmx_cap;

- if (!kvm_is_vmx_supported())
+ if (!__kvm_is_vmx_supported())
return -EIO;

if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:45:11

by Sean Christopherson

Subject: [PATCH v2 16/18] x86/virt: KVM: Move "disable SVM" helper into KVM SVM

Move cpu_svm_disable() into KVM proper now that all hardware
virtualization management is routed through KVM. Remove the now-empty
virtext.h.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/virtext.h | 50 ----------------------------------
arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++--
2 files changed, 25 insertions(+), 53 deletions(-)
delete mode 100644 arch/x86/include/asm/virtext.h

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
deleted file mode 100644
index 632575e257d8..000000000000
--- a/arch/x86/include/asm/virtext.h
+++ /dev/null
@@ -1,50 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/* CPU virtualization extensions handling
- *
- * This should carry the code for handling CPU virtualization extensions
- * that needs to live in the kernel core.
- *
- * Author: Eduardo Habkost <[email protected]>
- *
- * Copyright (C) 2008, Red Hat Inc.
- *
- * Contains code from KVM, Copyright (C) 2006 Qumranet, Inc.
- */
-#ifndef _ASM_X86_VIRTEX_H
-#define _ASM_X86_VIRTEX_H
-
-#include <asm/processor.h>
-
-#include <asm/vmx.h>
-#include <asm/svm.h>
-#include <asm/tlbflush.h>
-
-/*
- * SVM functions:
- */
-/** Disable SVM on the current CPU
- */
-static inline void cpu_svm_disable(void)
-{
- uint64_t efer;
-
- wrmsrl(MSR_VM_HSAVE_PA, 0);
- rdmsrl(MSR_EFER, efer);
- if (efer & EFER_SVME) {
- /*
- * Force GIF=1 prior to disabling SVM to ensure INIT and NMI
- * aren't blocked, e.g. if a fatal error occurred between CLGI
- * and STGI. Note, STGI may #UD if SVM is disabled from NMI
- * context between reading EFER and executing STGI. In that
- * case, GIF must already be set, otherwise the NMI would have
- * been blocked, so just eat the fault.
- */
- asm_volatile_goto("1: stgi\n\t"
- _ASM_EXTABLE(1b, %l[fault])
- ::: "memory" : fault);
-fault:
- wrmsrl(MSR_EFER, efer & ~EFER_SVME);
- }
-}
-
-#endif /* _ASM_X86_VIRTEX_H */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f04b61c3d9d8..2db03991dcdf 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -41,7 +41,6 @@
#include <asm/reboot.h>
#include <asm/fpu/api.h>

-#include <asm/virtext.h>
#include "trace.h"

#include "svm.h"
@@ -584,9 +583,32 @@ void __svm_write_tsc_multiplier(u64 multiplier)
preempt_enable();
}

+static inline void kvm_cpu_svm_disable(void)
+{
+ uint64_t efer;
+
+ wrmsrl(MSR_VM_HSAVE_PA, 0);
+ rdmsrl(MSR_EFER, efer);
+ if (efer & EFER_SVME) {
+ /*
+ * Force GIF=1 prior to disabling SVM to ensure INIT and NMI
+ * aren't blocked, e.g. if a fatal error occurred between CLGI
+ * and STGI. Note, STGI may #UD if SVM is disabled from NMI
+ * context between reading EFER and executing STGI. In that
+ * case, GIF must already be set, otherwise the NMI would have
+ * been blocked, so just eat the fault.
+ */
+ asm_volatile_goto("1: stgi\n\t"
+ _ASM_EXTABLE(1b, %l[fault])
+ ::: "memory" : fault);
+fault:
+ wrmsrl(MSR_EFER, efer & ~EFER_SVME);
+ }
+}
+
static void svm_emergency_disable(void)
{
- cpu_svm_disable();
+ kvm_cpu_svm_disable();
}

static void svm_hardware_disable(void)
@@ -595,7 +617,7 @@ static void svm_hardware_disable(void)
if (tsc_scaling)
__svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT);

- cpu_svm_disable();
+ kvm_cpu_svm_disable();

amd_pmu_disable_virt();
}
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:45:15

by Sean Christopherson

Subject: [PATCH v2 17/18] KVM: x86: Force kvm_rebooting=true during emergency reboot/crash

Set kvm_rebooting when virtualization is disabled in an emergency so that
KVM eats faults on virtualization instructions even if kvm_reboot() isn't
reached.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2db03991dcdf..30f7840151be 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -608,6 +608,8 @@ static inline void kvm_cpu_svm_disable(void)

static void svm_emergency_disable(void)
{
+ kvm_rebooting = true;
+
kvm_cpu_svm_disable();
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 374e3ddbd476..8626010f5d54 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -770,6 +770,8 @@ static void vmx_emergency_disable(void)
int cpu = raw_smp_processor_id();
struct loaded_vmcs *v;

+ kvm_rebooting = true;
+
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
loaded_vmcss_on_cpu_link)
vmcs_clear(v->vmcs);
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-10 21:45:18

by Sean Christopherson

Subject: [PATCH v2 18/18] KVM: SVM: Use "standard" stgi() helper when disabling SVM

Now that kvm_rebooting is guaranteed to be true prior to disabling SVM
in an emergency, use the existing stgi() helper instead of open coding
STGI. In effect, eat faults on STGI if and only if kvm_rebooting==true.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 30f7840151be..420b35770f0a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -591,17 +591,10 @@ static inline void kvm_cpu_svm_disable(void)
rdmsrl(MSR_EFER, efer);
if (efer & EFER_SVME) {
/*
- * Force GIF=1 prior to disabling SVM to ensure INIT and NMI
- * aren't blocked, e.g. if a fatal error occurred between CLGI
- * and STGI. Note, STGI may #UD if SVM is disabled from NMI
- * context between reading EFER and executing STGI. In that
- * case, GIF must already be set, otherwise the NMI would have
- * been blocked, so just eat the fault.
+ * Force GIF=1 prior to disabling SVM, e.g. to ensure INIT and
+ * NMI aren't blocked.
*/
- asm_volatile_goto("1: stgi\n\t"
- _ASM_EXTABLE(1b, %l[fault])
- ::: "memory" : fault);
-fault:
+ stgi();
wrmsrl(MSR_EFER, efer & ~EFER_SVME);
}
}
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-13 00:31:59

by Kai Huang

Subject: Re: [PATCH v2 02/18] x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is enabled

Hi Sean,

Thanks for copying me.

On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
> only if there's a potential in-tree user, KVM_INTEL.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/reboot.h | 2 ++
> arch/x86/kernel/reboot.c | 4 ++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index 2551baec927d..33c8e911e0de 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,8 +25,10 @@ void __noreturn machine_real_restart(unsigned int type);
> #define MRR_BIOS 0
> #define MRR_APM 1
>
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
> typedef void crash_vmclear_fn(void);
> extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
> +#endif
> void cpu_emergency_disable_virtualization(void);
>
> typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 299b970e5f82..6c0b1634b884 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -787,6 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> }
> #endif
>
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
> /*
> * This is used to VMCLEAR all VMCSs loaded on the
> * processor. And when loading kvm_intel module, the
> @@ -807,6 +808,7 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
> do_vmclear_operation();
> rcu_read_unlock();
> }
> +#endif
>
> /* This is the CPU performing the emergency shutdown work. */
> int crashing_cpu = -1;
> @@ -818,7 +820,9 @@ int crashing_cpu = -1;
> */
> void cpu_emergency_disable_virtualization(void)
> {
> +#if IS_ENABLED(CONFIG_KVM_INTEL)
> cpu_crash_vmclear_loaded_vmcss();
> +#endif
>
> cpu_emergency_vmxoff();

In the changelog you mention exposing the *hooks* (plural) used to do
"VMCLEAR+VMXOFF" only when KVM_INTEL is on, but here only the "VMCLEAR" part is
wrapped in CONFIG_KVM_INTEL. So either the changelog needs improvement, or the
code should be adjusted?

Personally, I think it's better to move the VMXOFF part inside
CONFIG_KVM_INTEL too, if you want to do this.

But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
relevant code). In later patches, you mentioned the case of out-of-tree
hypervisor, for instance, below in the changelog of patch 04:

There's no need to attempt VMXOFF if KVM (or some other out-of-tree 
hypervisor) isn't loaded/active...

This means we want to handle VMCLEAR+VMXOFF in the out-of-tree hypervisor case
too. So shouldn't the hooks always exist, rather than only being available when
KVM_INTEL or KVM_AMD is on, so that an out-of-tree hypervisor can register its
callbacks?


> cpu_emergency_svm_disable();
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>

2023-03-13 00:53:32

by Kai Huang

Subject: Re: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> Use the virt callback to disable SVM (and set GIF=1) during an emergency
> instead of blindly attempting to disable SVM.  Like the VMX case, if KVM
> (or an out-of-tree hypervisor) isn't loaded/active, SVM can't be in use.
>
> Signed-off-by: Sean Christopherson <[email protected]>

[...]

> -#if IS_ENABLED(CONFIG_KVM_INTEL)
> +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
>  /* RCU-protected callback to disable virtualization prior to reboot. */
>  static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
>  
> @@ -821,7 +821,7 @@ int crashing_cpu = -1;
>   */
>  void cpu_emergency_disable_virtualization(void)
>  {
> -#if IS_ENABLED(CONFIG_KVM_INTEL)
> +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
>   cpu_emergency_virt_cb *callback;
>  
>   rcu_read_lock();
> @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
>   callback();
>   rcu_read_unlock();
>  #endif
> - /* KVM_AMD doesn't yet utilize the common callback. */
> - cpu_emergency_svm_disable();
>  }

Shouldn't the callback always be present, since you want to consider the
'out-of-tree' hypervisor case?

2023-03-13 00:54:39

by Kai Huang

Subject: Re: [PATCH v2 07/18] x86/reboot: Disable virtualization during reboot iff callback is registered

On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> Attempt to disable virtualization during an emergency reboot if and only
> if there is a registered virt callback, i.e. iff a hypervisor (KVM) is
> active. If there's no active hypervisor, then the CPU can't be operating
> with VMX or SVM enabled (barring an egregious bug).

IIUC, this patch is the final one establishing how the "disable
virtualization" callback should work in the non-KVM core kernel (the remaining
patches move the VMXOFF code into KVM, as the core kernel now just calls the
callback, etc).

There are intermediate patches (2-7) to eventually get to this point. But to
be honest, personally, I am not sure those patches are necessary; to me they
actually cost more time to review, since I have to think about whether each
intermediate state is reasonable or not. I am wondering whether we can just
squash those patches into a single one, so it's easy to see the final goal?

Just my 2cents, of course.

>
> Note, IRQs are disabled, which prevents KVM from coming along and enabling
> virtualization after the fact.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kernel/reboot.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index cb268ec7ce85..dd7def3d4144 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -22,7 +22,6 @@
> #include <asm/reboot_fixups.h>
> #include <asm/reboot.h>
> #include <asm/pci_x86.h>
> -#include <asm/virtext.h>
> #include <asm/cpu.h>
> #include <asm/nmi.h>
> #include <asm/smp.h>
> @@ -568,7 +567,6 @@ void cpu_emergency_disable_virtualization(void)
> callback();
> rcu_read_unlock();
> }
> -#endif /* CONFIG_KVM_INTEL || CONFIG_KVM_AMD */
>
> static void emergency_reboot_disable_virtualization(void)
> {
> @@ -585,7 +583,7 @@ static void emergency_reboot_disable_virtualization(void)
> * Do the NMI shootdown even if virtualization is off on _this_ CPU, as
> * other CPUs may have virtualization enabled.
> */
> - if (cpu_has_vmx() || cpu_has_svm(NULL)) {
> + if (rcu_access_pointer(cpu_emergency_virt_callback)) {
> /* Safely force _this_ CPU out of VMX/SVM operation. */
> cpu_emergency_disable_virtualization();
>
> @@ -593,6 +591,9 @@ static void emergency_reboot_disable_virtualization(void)
> nmi_shootdown_cpus_on_restart();
> }
> }
> +#else
> +static void emergency_reboot_disable_virtualization(void) { }
> +#endif /* CONFIG_KVM_INTEL || CONFIG_KVM_AMD */
>
>
> void __attribute__((weak)) mach_reboot_fixups(void)
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>

2023-03-13 02:47:11

by Kai Huang

Subject: Re: [PATCH v2 14/18] KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()

On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> Check "this" CPU instead of the boot CPU when querying SVM support so that
> the per-CPU checks done during hardware enabling actually function as
> intended, i.e. will detect issues where SVM isn't supported on all CPUs.
>
> Disable migration for the use from svm_init() mostly so that the standard
> accessors for the per-CPU data can be used without getting yelled at by
> CONFIG_DEBUG_PREEMPT=y sanity checks. Preventing the "disabled by BIOS"
> error message from reporting the wrong CPU is largely a bonus, as ensuring
> a stable CPU during module load is a non-goal for KVM.
>
> Link: https://lore.kernel.org/all/[email protected]
> Cc: Kai Huang <[email protected]>
> Cc: Chao Gao <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>

Should we add:

Fixes: c82a5c5c53c5 ("KVM: x86: Do compatibility checks when onlining CPU")

As that commit introduced using raw_smp_processor_id() to get the CPU id in
kvm_is_svm_supported() and printing the CPU id in the error message?

> ---
> arch/x86/kvm/svm/svm.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 2934f185960d..f04b61c3d9d8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -520,18 +520,20 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
> vcpu->arch.osvw.status |= 1;
> }
>
> -static bool kvm_is_svm_supported(void)
> +static bool __kvm_is_svm_supported(void)
> {
> - int cpu = raw_smp_processor_id();
> + int cpu = smp_processor_id();

Since we have made sure __kvm_is_svm_supported() is always performed on a
stable CPU, should we keep using raw_smp_processor_id()?

It is faster than smp_processor_id() when CONFIG_DEBUG_PREEMPT=y, but yes, the
latter can help catch bugs.

> + struct cpuinfo_x86 *c = &cpu_data(cpu);
> +
> u64 vm_cr;
>
> - if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
> - boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) {
> + if (c->x86_vendor != X86_VENDOR_AMD &&
> + c->x86_vendor != X86_VENDOR_HYGON) {
> pr_err("CPU %d isn't AMD or Hygon\n", cpu);
> return false;
> }
>
> - if (!boot_cpu_has(X86_FEATURE_SVM)) {
> + if (!cpu_has(c, X86_FEATURE_SVM)) {
> pr_err("SVM not supported by CPU %d\n", cpu);
> return false;
> }
> @@ -550,9 +552,20 @@ static bool kvm_is_svm_supported(void)
> return true;
> }
>
> +static bool kvm_is_svm_supported(void)
> +{
> + bool supported;
> +
> + migrate_disable();
> + supported = __kvm_is_svm_supported();
> + migrate_enable();
> +
> + return supported;
> +}
> +
> static int svm_check_processor_compat(void)
> {
> - if (!kvm_is_svm_supported())
> + if (!__kvm_is_svm_supported())
> return -EIO;
>
> return 0;
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>

2023-03-13 08:26:41

by Chao Gao

Subject: Re: [PATCH v2 03/18] x86/reboot: Harden virtualization hooks for emergency reboot

On Fri, Mar 10, 2023 at 01:42:17PM -0800, Sean Christopherson wrote:
>+void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
>+{
>+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback)))
>+ return;
>+
>+ rcu_assign_pointer(cpu_emergency_virt_callback, callback);

Was it intentional to not call synchronize_rcu() (in the original
code), different from the un-registration path?

>+}
>+EXPORT_SYMBOL_GPL(cpu_emergency_register_virt_callback);
>+
>+void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback)
>+{
>+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback) != callback))
>+ return;
>+
>+ rcu_assign_pointer(cpu_emergency_virt_callback, NULL);
>+ synchronize_rcu();
>+}
>+EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);
>
> static inline void cpu_crash_vmclear_loaded_vmcss(void)
> {
>- crash_vmclear_fn *do_vmclear_operation = NULL;
>+ cpu_emergency_virt_cb *callback;
>
> rcu_read_lock();
>- do_vmclear_operation = rcu_dereference(crash_vmclear_loaded_vmcss);
>- if (do_vmclear_operation)
>- do_vmclear_operation();
>+ callback = rcu_dereference(cpu_emergency_virt_callback);
>+ if (callback)
>+ callback();
> rcu_read_unlock();
> }
> #endif
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 302086255be6..41095fc864f3 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -8551,8 +8551,7 @@ static void __vmx_exit(void)
> {
> allow_smaller_maxphyaddr = false;
>
>- RCU_INIT_POINTER(crash_vmclear_loaded_vmcss, NULL);
>- synchronize_rcu();
>+ cpu_emergency_unregister_virt_callback(crash_vmclear_local_loaded_vmcss);
>
> vmx_cleanup_l1d_flush();
> }
>@@ -8602,8 +8601,7 @@ static int __init vmx_init(void)
> pi_init_cpu(cpu);
> }
>
>- rcu_assign_pointer(crash_vmclear_loaded_vmcss,
>- crash_vmclear_local_loaded_vmcss);
>+ cpu_emergency_register_virt_callback(crash_vmclear_local_loaded_vmcss);
>
> vmx_check_vmcs12_offsets();
>
>--
>2.40.0.rc1.284.g88254d51c5-goog
>

2023-03-13 17:10:32

by Sean Christopherson

Subject: Re: [PATCH v2 03/18] x86/reboot: Harden virtualization hooks for emergency reboot

On Mon, Mar 13, 2023, Chao Gao wrote:
> On Fri, Mar 10, 2023 at 01:42:17PM -0800, Sean Christopherson wrote:
> >+void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback)
> >+{
> >+ if (WARN_ON_ONCE(rcu_access_pointer(cpu_emergency_virt_callback)))
> >+ return;
> >+
> >+ rcu_assign_pointer(cpu_emergency_virt_callback, callback);
>
> Was it intentional to not call synchronize_rcu() (in the original
> code), different from the un-registration path?

Yes, synchronize_rcu() is needed when removing a callback to ensure any in-flight
invocations of the callback complete before KVM is unloaded, i.e. to prevent
use-after-free of the KVM module code. Registering a callback doesn't have the
same concerns, and adding a synchronize_rcu() wouldn't do anything in terms of
ensuring virtualization isn't enabled after an emergency restart/shutdown is
initiated.

2023-03-13 17:21:32

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

On Mon, Mar 13, 2023, Huang, Kai wrote:
> On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > Use the virt callback to disable SVM (and set GIF=1) during an emergency
> > instead of blindly attempting to disable SVM.  Like the VMX case, if KVM
> > (or an out-of-tree hypervisor) isn't loaded/active, SVM can't be in use.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> [...]
>
> > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> >  /* RCU-protected callback to disable virtualization prior to reboot. */
> >  static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
> >
> > @@ -821,7 +821,7 @@ int crashing_cpu = -1;
> >   */
> >  void cpu_emergency_disable_virtualization(void)
> >  {
> > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> >  cpu_emergency_virt_cb *callback;
> >
> >  rcu_read_lock();
> > @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
> >  callback();
> >  rcu_read_unlock();
> >  #endif
> > - /* KVM_AMD doesn't yet utilize the common callback. */
> > - cpu_emergency_svm_disable();
> >  }
>
> Shouldn't the callback be always present since you want to consider 'out-of-
> tree' hypervisor case?

No? The kernel doesn't provide any guarantees for out-of-tree code. I don't have
a super strong preference, though I do like the effective documentation the checks
provide. But more importantly, my understanding is that the x86 maintainers want
to limit the exposure for these types of interfaces, e.g. `git grep IS_ENABLED\(CONFIG_KVM`
for a variety of hooks that are enabled iff KVM is enabled in the kernel config.

2023-03-13 17:30:35

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 14/18] KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()

On Mon, Mar 13, 2023, Huang, Kai wrote:
> On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > Check "this" CPU instead of the boot CPU when querying SVM support so that
> > the per-CPU checks done during hardware enabling actually function as
> > intended, i.e. will detect issues where SVM isn't supported on all CPUs.
> >
> > Disable migration for the use from svm_init() mostly so that the standard
> > accessors for the per-CPU data can be used without getting yelled at by
> > CONFIG_DEBUG_PREEMPT=y sanity checks. Preventing the "disabled by BIOS"
> > error message from reporting the wrong CPU is largely a bonus, as ensuring
> > a stable CPU during module load is a non-goal for KVM.
> >
> > Link: https://lore.kernel.org/all/[email protected]
> > Cc: Kai Huang <[email protected]>
> > Cc: Chao Gao <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
>
> Should we add:
>
> Fixes: c82a5c5c53c5 ("KVM: x86: Do compatibility checks when onlining CPU")
>
> As that commit introduced using raw_smp_processor_id() to get CPU id in
> kvm_is_svm_supported() and print the CPU id out in error message?

My vote is to not add a Fixes because using raw_smp_processor_id() and not disabling
migration for module probe case was deliberate and is safe. I don't want to give the
impression that the existing code is functionally broken. The only quirk is that
the reporting could be misleading.

That said, I'm not against adding a Fixes tag, because I certainly can't argue
against the reporting being flawed.

> > ---
> > arch/x86/kvm/svm/svm.c | 25 +++++++++++++++++++------
> > 1 file changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 2934f185960d..f04b61c3d9d8 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -520,18 +520,20 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
> > vcpu->arch.osvw.status |= 1;
> > }
> >
> > -static bool kvm_is_svm_supported(void)
> > +static bool __kvm_is_svm_supported(void)
> > {
> > - int cpu = raw_smp_processor_id();
> > + int cpu = smp_processor_id();
>
> Since we have made sure __kvm_is_svm_supported() is always performed on a stable
> cpu, should we keep using raw_smp_processor_id()?
>
> It is faster than smp_processor_id() when CONFIG_DEBUG_PREEMPT=y, but yes the
> latter can help to catch bugs.

Most kernels with any amount of CONFIG_DEBUG_* options enabled are comically slow
anyways, I much prefer having the sanity checks than the performance.

2023-03-13 18:33:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 02/18] x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is enabled

On Mon, Mar 13, 2023, Huang, Kai wrote:
> Hi Sean,
>
> Thanks for copying me.
>
> On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > Expose the crash/reboot hooks used by KVM to do VMCLEAR+VMXOFF if and
> > only if there's a potential in-tree user, KVM_INTEL.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---

...

> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 299b970e5f82..6c0b1634b884 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -787,6 +787,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> > }
> > #endif
> >
> > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> > /*
> > * This is used to VMCLEAR all VMCSs loaded on the
> > * processor. And when loading kvm_intel module, the
> > @@ -807,6 +808,7 @@ static inline void cpu_crash_vmclear_loaded_vmcss(void)
> > do_vmclear_operation();
> > rcu_read_unlock();
> > }
> > +#endif
> >
> > /* This is the CPU performing the emergency shutdown work. */
> > int crashing_cpu = -1;
> > @@ -818,7 +820,9 @@ int crashing_cpu = -1;
> > */
> > void cpu_emergency_disable_virtualization(void)
> > {
> > +#if IS_ENABLED(CONFIG_KVM_INTEL)
> > cpu_crash_vmclear_loaded_vmcss();
> > +#endif
> >
> > cpu_emergency_vmxoff();
>
> In the changelog you mentioned exposing the *hooks* (plural) used to do
> "VMCLEAR+VMXOFF" only when KVM_INTEL is on, but here only "VMCLEAR" is embraced
> with CONFIG_KVM_INTEL. So either the changelog needs improvement, or the code
> should be adjusted?

I'll reword the changelog; "hooks" in my head was referring to the register and
unregister "hooks", not the callback itself.

> Personally, I think it's better to move VMXOFF part within CONFIG_KVM_INTEL too,
> if you want to do this.

That happens eventually in the final third of this series.

> But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
> relevant code). In later patches, you mentioned the case of out-of-tree
> hypervisor, for instance, below in the changelog of patch 04:
>
> There's no need to attempt VMXOFF if KVM (or some other out-of-tree
> hypervisor) isn't loaded/active...
>
> This means we want to handle VMCLEAR+VMXOFF in the case of an out-of-tree
> hypervisor too. So, shouldn't the hooks always exist, not only when KVM_INTEL
> or KVM_AMD is on, so the out-of-tree hypervisor can register their callbacks?

Ah, I see how I confused things with that statement. My intent was only to call
out that, technically, a non-NULL callback doesn't mean KVM is loaded. I didn't
intend to sign the kernel up for going out of its way to support out-of-tree hypervisors.

Does it read better if I add a "that piggybacked the callback" qualifier?

There's no need to attempt VMXOFF if KVM (or some other out-of-tree hypervisor
that piggybacked the callback) isn't loaded/active, i.e. if the CPU can't
possibly be post-VMXON.

2023-03-13 18:42:13

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 07/18] x86/reboot: Disable virtualization during reboot iff callback is registered

On Mon, Mar 13, 2023, Huang, Kai wrote:
> On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > Attempt to disable virtualization during an emergency reboot if and only
> > if there is a registered virt callback, i.e. iff a hypervisor (KVM) is
> > active. If there's no active hypervisor, then the CPU can't be operating
> > with VMX or SVM enabled (barring an egregious bug).
>
> IIUC, this patch is the final one that you want to achieve how the "disable
> virtualization" callback should work in the non-KVM core kernel (the rest
> patches are related to moving VMXOFF code to KVM as the core-kernel now just
> calls the callback, etc).
>
> There are middle step patches (2-7) to eventually help to get to this point.
> But to be honest, personally, I am not sure whether those patches are necessary,
> i.e. to me they actually cost more time to review since I have to think whether
> such intermediate status is reasonable or not. I am wondering whether we can
> just merge those patches together as single one, so it's easy to see what is the
> final goal to achieve?

I agree that the fine granularity makes it difficult to see the final form, but
from a bisection perspective I really, really want each change to be isolated as
much as possible. This code is extremely difficult, if not practically impossible,
to exhaustively test due to multiple points of entry from "this should never happen!"
types of flows. If any of these changes breaks someone's deployment, I want to
make it as easy as possible for that someone to determine exactly what broke.

2023-03-14 00:17:45

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v2 14/18] KVM: SVM: Check that the current CPU supports SVM in kvm_is_svm_supported()

On Mon, 2023-03-13 at 10:29 -0700, Sean Christopherson wrote:
> On Mon, Mar 13, 2023, Huang, Kai wrote:
> > On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > > Check "this" CPU instead of the boot CPU when querying SVM support so that
> > > the per-CPU checks done during hardware enabling actually function as
> > > intended, i.e. will detect issues where SVM isn't supported on all CPUs.
> > >
> > > Disable migration for the use from svm_init() mostly so that the standard
> > > accessors for the per-CPU data can be used without getting yelled at by
> > > CONFIG_DEBUG_PREEMPT=y sanity checks. Preventing the "disabled by BIOS"
> > > error message from reporting the wrong CPU is largely a bonus, as ensuring
> > > a stable CPU during module load is a non-goal for KVM.
> > >
> > > Link: https://lore.kernel.org/all/[email protected]
> > > Cc: Kai Huang <[email protected]>
> > > Cc: Chao Gao <[email protected]>
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > Should we add:
> >
> > Fixes: c82a5c5c53c5 ("KVM: x86: Do compatibility checks when onlining CPU")
> >
> > As that commit introduced using raw_smp_processor_id() to get CPU id in
> > kvm_is_svm_supported() and print the CPU id out in error message?
>
> My vote is to not add a Fixes because using raw_smp_processor_id() and not disabling
> migration for module probe case was deliberate and is safe. I don't want to give the
> impression that the existing code is functionally broken. The only quirk is that
> the reporting could be misleading.
>
> That said, I'm not against adding a Fixes tag, because I certainly can't argue
> against the reporting being flawed.

Yeah the only issue is the reporting.

And I will leave this to others.

>
> > > ---
> > > arch/x86/kvm/svm/svm.c | 25 +++++++++++++++++++------
> > > 1 file changed, 19 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 2934f185960d..f04b61c3d9d8 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -520,18 +520,20 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
> > > vcpu->arch.osvw.status |= 1;
> > > }
> > >
> > > -static bool kvm_is_svm_supported(void)
> > > +static bool __kvm_is_svm_supported(void)
> > > {
> > > - int cpu = raw_smp_processor_id();
> > > + int cpu = smp_processor_id();
> >
> > Since we have made sure __kvm_is_svm_supported() is always performed on a stable
> > cpu, should we keep using raw_smp_processor_id()?
> >
> > It is faster than smp_processor_id() when CONFIG_DEBUG_PREEMPT=y, but yes the
> > latter can help to catch bugs.
>
> Most kernels with any amount of CONFIG_DEBUG_* options enabled are comically slow
> anyways, I much prefer having the sanity checks than the performance.

Yeah fine to me.

2023-03-14 00:44:02

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

On Mon, 2023-03-13 at 10:18 -0700, Sean Christopherson wrote:
> On Mon, Mar 13, 2023, Huang, Kai wrote:
> > On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > > Use the virt callback to disable SVM (and set GIF=1) during an emergency
> > > instead of blindly attempting to disable SVM.  Like the VMX case, if KVM
> > > (or an out-of-tree hypervisor) isn't loaded/active, SVM can't be in use.
> > >
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > [...]
> >
> > > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> > >  /* RCU-protected callback to disable virtualization prior to reboot. */
> > >  static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
> > >
> > > @@ -821,7 +821,7 @@ int crashing_cpu = -1;
> > >   */
> > >  void cpu_emergency_disable_virtualization(void)
> > >  {
> > > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> > >  cpu_emergency_virt_cb *callback;
> > >
> > >  rcu_read_lock();
> > > @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
> > >  callback();
> > >  rcu_read_unlock();
> > >  #endif
> > > - /* KVM_AMD doesn't yet utilize the common callback. */
> > > - cpu_emergency_svm_disable();
> > >  }
> >
> > Shouldn't the callback be always present since you want to consider 'out-of-
> > tree' hypervisor case?
>
> No? The kernel doesn't provide any guarantees for out-of-tree code. I don't have
> a super strong preference, though I do like the effective documentation the checks
> provide. But more importantly, my understanding is that the x86 maintainers want
> to limit the exposure for these types of interfaces, e.g. `git grep IS_ENABLED\(CONFIG_KVM`
> for a variety of hooks that are enabled iff KVM is enabled in the kernel config.

How about making the #ifdef embracing of the emergency virt callback code the
last step?

I like the "cleanup" work in this series regardless of whether we should guard
the emergency virt callback with CONFIG_KVM_INTEL || CONFIG_KVM_AMD. If we do
the actual "cleanup" work first, and put the CONFIG_KVM_INTEL || CONFIG_KVM_AMD
guarding as the last step, it is also easier to review I guess, because it's
kinda separate logic, independent of the actual "cleanup" work.

Also, personally I don't particularly like the middle state in patch 04:

void cpu_emergency_disable_virtualization(void)
{
#if IS_ENABLED(CONFIG_KVM_INTEL)
- cpu_crash_vmclear_loaded_vmcss();
-#endif
+ cpu_emergency_virt_cb *callback;

- cpu_emergency_vmxoff();
+ rcu_read_lock();
+ callback = rcu_dereference(cpu_emergency_virt_callback);
+ if (callback)
+ callback();
+ rcu_read_unlock();
+#endif
+ /* KVM_AMD doesn't yet utilize the common callback. */
cpu_emergency_svm_disable();
}

Which eventually got fixed up in patch 05:

void cpu_emergency_disable_virtualization(void)
{
-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
cpu_emergency_virt_cb *callback;

rcu_read_lock();
@@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
callback();
rcu_read_unlock();
#endif
- /* KVM_AMD doesn't yet utilize the common callback. */
- cpu_emergency_svm_disable();
}

Could we just merge the two patches together?

2023-03-14 00:51:01

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v2 07/18] x86/reboot: Disable virtualization during reboot iff callback is registered

On Mon, 2023-03-13 at 11:40 -0700, Sean Christopherson wrote:
> On Mon, Mar 13, 2023, Huang, Kai wrote:
> > On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > > Attempt to disable virtualization during an emergency reboot if and only
> > > if there is a registered virt callback, i.e. iff a hypervisor (KVM) is
> > > active. If there's no active hypervisor, then the CPU can't be operating
> > > with VMX or SVM enabled (barring an egregious bug).
> >
> > IIUC, this patch is the final one that you want to achieve how the "disable
> > virtualization" callback should work in the non-KVM core kernel (the rest
> > patches are related to moving VMXOFF code to KVM as the core-kernel now just
> > calls the callback, etc).
> >
> > There are middle step patches (2-7) to eventually help to get to this point.
> > But to be honest, personally, I am not sure whether those patches are necessary,
> > i.e. to me they actually cost more time to review since I have to think whether
> > such intermediate status is reasonable or not. I am wondering whether we can
> > just merge those patches together as single one, so it's easy to see what is the
> > final goal to achieve?
>
> I agree that the fine granularity makes it difficult to see the final form, but
> from a bisection perspective I really, really want each change to be isolated as
> much as possible. This code is extremely difficult, if not practically impossible,
> to exhaustively test due to multiple points of entry from "this should never happen!"
> types of flows. If any of these changes breaks someone's deployment, I want to
> make it as easy as possible for that someone to determine exactly what broke.

Yeah sure.

Yes, in general I agree we should make bisection easy so we can pinpoint the
exact code which breaks something, but I think over-splitting is also
unnecessary, especially when the code change is small ;)

2023-03-14 01:21:46

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v2 02/18] x86/reboot: Expose VMCS crash hooks if and only if KVM_INTEL is enabled


>
> > But I am not sure whether we want to do this (having CONFIG_KVM_INTEL around the
> > relevant code). In later patches, you mentioned the case of out-of-tree
> > hypervisor, for instance, below in the changelog of patch 04:
> >
> > There's no need to attempt VMXOFF if KVM (or some other out-of-tree
> > hypervisor) isn't loaded/active...
> >
> > This means we want to handle VMCLEAR+VMXOFF in the case of an out-of-tree
> > hypervisor too. So, shouldn't the hooks always exist, not only when KVM_INTEL
> > or KVM_AMD is on, so the out-of-tree hypervisor can register their callbacks?
>
> Ah, I see how I confused things with that statement. My intent was only to call
> out that, technically, a non-NULL callback doesn't mean KVM is loaded. I didn't
> intend to sign the kernel up for going out of its way to support out-of-tree hypervisors.

I interpret this as:

The kernel doesn't officially support out-of-tree hypervisors, but it provides a
callback which an out-of-tree hypervisor can utilize to handle emergency virt
disable. But such a callback is only available when KVM is turned on in the
Kconfig.

?

>
> Does it read better if I add a "that piggybacked the callback" qualifier?
>
> There's no need to attempt VMXOFF if KVM (or some other out-of-tree hypervisor
> that piggybacked the callback) isn't loaded/active, i.e. if the CPU can't
> possibly be post-VMXON.

I think so?

But overall I just think having to consider an out-of-tree hypervisor (we are
talking about a loadable module, right?) only makes things more confusing. I
guess we can either:

1) Don't mention out-of-tree hypervisors at all. This means the kernel doesn't
officially provide mechanisms to support an out-of-tree hypervisor (a module).
If someone wants to do that, then that someone takes the risk.

2) The kernel officially provides the callback to handle emergency virt disable
for an out-of-tree hypervisor (module) to use. But then the callback should be
available even when KVM is off.

?

2023-03-15 00:47:50

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

On Tue, Mar 14, 2023, Huang, Kai wrote:
> On Mon, 2023-03-13 at 10:18 -0700, Sean Christopherson wrote:
> > On Mon, Mar 13, 2023, Huang, Kai wrote:
> > > On Fri, 2023-03-10 at 13:42 -0800, Sean Christopherson wrote:
> > > > Use the virt callback to disable SVM (and set GIF=1) during an emergency
> > > > instead of blindly attempting to disable SVM.  Like the VMX case, if KVM
> > > > (or an out-of-tree hypervisor) isn't loaded/active, SVM can't be in use.
> > > >
> > > > Signed-off-by: Sean Christopherson <[email protected]>
> > >
> > > [...]
> > >
> > > > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > > > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> > > >  /* RCU-protected callback to disable virtualization prior to reboot. */
> > > >  static cpu_emergency_virt_cb __rcu *cpu_emergency_virt_callback;
> > > >
> > > > @@ -821,7 +821,7 @@ int crashing_cpu = -1;
> > > >   */
> > > >  void cpu_emergency_disable_virtualization(void)
> > > >  {
> > > > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > > > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> > > >  cpu_emergency_virt_cb *callback;
> > > >
> > > >  rcu_read_lock();
> > > > @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
> > > >  callback();
> > > >  rcu_read_unlock();
> > > >  #endif
> > > > - /* KVM_AMD doesn't yet utilize the common callback. */
> > > > - cpu_emergency_svm_disable();
> > > >  }
> > >
> > > Shouldn't the callback be always present since you want to consider 'out-of-
> > > tree' hypervisor case?
> >
> > No? The kernel doesn't provide any guarantees for out-of-tree code. I don't have
> > a super strong preference, though I do like the effective documentation the checks
> > provide. But more importantly, my understanding is that the x86 maintainers want
> > to limit the exposure for these types of interfaces, e.g. `git grep IS_ENABLED\(CONFIG_KVM`
> > for a variety of hooks that are enabled iff KVM is enabled in the kernel config.
>
> How about making the #ifdef embracing of the emergency virt callback code the
> last step?

I like that idea, it also makes a few other patches a bit cleaner.

> I like the "cleanup" work in this series regardless whether we should guard the
> emergency virt callback with CONFIG_KVM_INTEL || CONFIG_KVM_AMD. If we do the
> actual "cleanup" work first, and put the CONFIG_KVM_INTEL || CONFIG_KVM_AMD as
> the last step, it is also easier to review I guess, because it's kinda a
> separate logic independent to the actual "cleanup" work.
>
> Also, personally I don't particularly like the middle state in patch 04:
>
> void cpu_emergency_disable_virtualization(void)
> {
> #if IS_ENABLED(CONFIG_KVM_INTEL)
> - cpu_crash_vmclear_loaded_vmcss();
> -#endif
> + cpu_emergency_virt_cb *callback;
>
> - cpu_emergency_vmxoff();
> + rcu_read_lock();
> + callback = rcu_dereference(cpu_emergency_virt_callback);
> + if (callback)
> + callback();
> + rcu_read_unlock();
> +#endif
> + /* KVM_AMD doesn't yet utilize the common callback. */
> cpu_emergency_svm_disable();
> }
>
> Which eventually got fixed up in patch 05:
>
> void cpu_emergency_disable_virtualization(void)
> {
> -#if IS_ENABLED(CONFIG_KVM_INTEL)
> +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> cpu_emergency_virt_cb *callback;
>
> rcu_read_lock();
> @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
> callback();
> rcu_read_unlock();
> #endif
> - /* KVM_AMD doesn't yet utilize the common callback. */
> - cpu_emergency_svm_disable();
> }
>
> Could we just merge the two patches together?

I'd prefer not to squash the two. I agree it's ugly, but I dislike converting
VMX and SVM at the same time. I'm not totally opposed to moving everything in
one fell swoop, but my preference is to keep them separate.

2023-03-15 01:31:53

by Kai Huang

[permalink] [raw]
Subject: Re: [PATCH v2 05/18] x86/reboot: KVM: Disable SVM during reboot via virt/KVM reboot callback

On Tue, 2023-03-14 at 17:47 -0700, Sean Christopherson wrote:
> > Also, personally I don't particularly like the middle state in patch 04:
> >
> > void cpu_emergency_disable_virtualization(void)
> > {
> > #if IS_ENABLED(CONFIG_KVM_INTEL)
> > - cpu_crash_vmclear_loaded_vmcss();
> > -#endif
> > + cpu_emergency_virt_cb *callback;
> >
> > - cpu_emergency_vmxoff();
> > + rcu_read_lock();
> > + callback = rcu_dereference(cpu_emergency_virt_callback);
> > + if (callback)
> > + callback();
> > + rcu_read_unlock();
> > +#endif
> > + /* KVM_AMD doesn't yet utilize the common callback. */
> > cpu_emergency_svm_disable();
> > }
> >
> > Which eventually got fixed up in patch 05:
> >
> > void cpu_emergency_disable_virtualization(void)
> > {
> > -#if IS_ENABLED(CONFIG_KVM_INTEL)
> > +#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
> > cpu_emergency_virt_cb *callback;
> >
> > rcu_read_lock();
> > @@ -830,8 +830,6 @@ void cpu_emergency_disable_virtualization(void)
> > callback();
> > rcu_read_unlock();
> > #endif
> > - /* KVM_AMD doesn't yet utilize the common callback. */
> > - cpu_emergency_svm_disable();
> > }
> > Could we just merge the two patches together?
>
> I'd prefer not to squash the two.  I agree it's ugly, but I dislike converting
> VMX and SVM at the same time.  I'm not totally opposed to moving everything in
> one fell swoop, but my preference is to keep them separate.

Sure.