From: Tom Lendacky <[email protected]>
This patch series provides support for running SEV-ES guests under KVM.
Secure Encrypted Virtualization - Encrypted State (SEV-ES) expands on the
SEV support to protect the guest register state from the hypervisor. See
"AMD64 Architecture Programmer's Manual Volume 2: System Programming",
section "15.35 Encrypted State (SEV-ES)" [1].
In order to allow a hypervisor to perform functions on behalf of a guest,
there is architectural support for notifying a guest's operating system
when certain types of VMEXITs are about to occur. This allows the guest to
selectively share information with the hypervisor to satisfy the requested
function. The notification is performed using a new exception, the VMM
Communication exception (#VC). The information is shared through the
Guest-Hypervisor Communication Block (GHCB) using the VMGEXIT instruction.
The GHCB format and the protocol for using it are documented in "SEV-ES
Guest-Hypervisor Communication Block Standardization" [2].
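As a rough illustration of the flow (guest-side code, which belongs to
the separate SEV-ES guest support series and is shown here only as a
hedged sketch; it assumes the ghcb_set_*()/ghcb_get_*() accessors from
<asm/svm.h> and omits GHCB registration via the GHCB MSR):

        static void vc_handle_cpuid(struct ghcb *ghcb, struct pt_regs *regs)
        {
                /* Share only the registers this exit needs */
                ghcb_set_rax(ghcb, regs->ax);
                ghcb_set_rcx(ghcb, regs->cx);
                ghcb_set_sw_exit_code(ghcb, SVM_EXIT_CPUID);
                ghcb_set_sw_exit_info_1(ghcb, 0);
                ghcb_set_sw_exit_info_2(ghcb, 0);

                /* VMGEXIT is encoded as "rep; vmmcall" */
                asm volatile("rep; vmmcall" ::: "memory");

                /* The hypervisor has placed the CPUID results in the GHCB */
                regs->ax = ghcb_get_rax(ghcb);
                regs->bx = ghcb_get_rbx(ghcb);
                regs->cx = ghcb_get_rcx(ghcb);
                regs->dx = ghcb_get_rdx(ghcb);
        }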
Under SEV-ES, a vCPU save area (VMSA) must be encrypted. SVM is updated to
build the initial VMSA and then encrypt it before running the guest. Once
encrypted, it must not be modified by the hypervisor. Modification of the
VMSA will result in the VMRUN instruction failing with a SHUTDOWN exit
code. KVM must support the VMGEXIT exit code in order to perform the
necessary functions on behalf of the guest. The GHCB is used to exchange
the information needed by both the hypervisor and the guest.
Register data from the GHCB is copied into the KVM register variables and
accessed as usual during handling of the exit. Upon return to the guest,
updated registers are copied back to the GHCB for the guest to act upon.
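As a simplified sketch of the GHCB-to-KVM direction (the actual sync
code in this series handles more registers and validates the GHCB; the
_if_valid accessors are added by a later patch in this series and
return 0 for fields the guest did not mark valid):

        static void sync_regs_from_ghcb(struct vcpu_svm *svm, struct ghcb *ghcb)
        {
                struct kvm_vcpu *vcpu = &svm->vcpu;

                vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb);
                vcpu->arch.regs[VCPU_REGS_RBX] = ghcb_get_rbx_if_valid(ghcb);
                vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx_if_valid(ghcb);
                vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb);
        }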
There are changes to some of the intercepts that are needed under SEV-ES.
For example, CR0 writes cannot be intercepted, so the code needs to ensure
that the intercept is not enabled during execution or that the hypervisor
does not try to read the register as part of exit processing. Another
example is shutdown processing, where the vCPU cannot be directly reset.
Support is added to handle VMGEXIT events and implement the GHCB protocol.
This ranges from standard exit events, such as a CPUID instruction
intercept, to new support for things like AP booting. Much of
the existing SVM intercept support can be re-used by setting the exit
code information from the VMGEXIT and calling the appropriate intercept
handlers.
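Conceptually, that re-use looks like the following (a hedged sketch
only; the real handler in this series also validates the exit code and
handles the non-standard VMGEXIT events before dispatching):

        static int handle_vmgexit_sketch(struct vcpu_svm *svm)
        {
                struct ghcb *ghcb = svm->ghcb;
                u64 exit_code = ghcb_get_sw_exit_code(ghcb);

                /* Propagate the GHCB-provided exit information ... */
                svm->vmcb->control.exit_code = exit_code;
                svm->vmcb->control.exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
                svm->vmcb->control.exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);

                /* ... and dispatch to the existing intercept handlers */
                return svm_exit_handlers[exit_code](svm);
        }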
Finally, launching and running an SEV-ES guest requires changes to vCPU
initialization, loading and execution.
[1] https://www.amd.com/system/files/TechDocs/24593.pdf
[2] https://developer.amd.com/wp-content/resources/56421.pdf
---
These patches are based on the KVM queue branch:
https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
dc924b062488 ("KVM: SVM: check CR4 changes against vcpu->arch")
A version of the tree can also be found at:
https://github.com/AMDESE/linux/tree/sev-es-v5
This tree has one additional patch, not yet part of the queue tree,
that is required to run any SEV guest:
[PATCH] KVM: x86: adjust SEV for commit 7e8e6eed75e
https://lore.kernel.org/kvm/[email protected]/
Changes from v4:
- Updated the tracking support for CR0/CR4
Changes from v3:
- Some kernel test robot fixes.
- Some checkpatch cleanups.
Changes from v2:
- Update the freeing of the VMSA page to account for the encrypted memory
cache coherency feature as well as the VM page flush feature.
- Update the GHCB dump function with a bit more detail.
- Don't check for RAX being present as part of a string IO operation.
- Include RSI when syncing from GHCB to support KVM hypercall arguments.
- Add GHCB usage field validation check.
Changes from v1:
- Removed the VMSA indirection support:
- On LAUNCH_UPDATE_VMSA, sync traditional VMSA over to the new SEV-ES
VMSA area to be encrypted.
- On VMGEXIT VMEXIT, directly copy valid registers into vCPU arch
register array from GHCB. On VMRUN (following a VMGEXIT), directly
copy dirty vCPU arch registers to GHCB.
- Removed reg_read_override()/reg_write_override() KVM ops.
- Added VMGEXIT exit-reason validation.
- Changed kvm_vcpu_arch variable vmsa_encrypted to guest_state_protected
- Updated the tracking support for EFER/CR0/CR4/CR8 to minimize changes
to the x86.c code
- Updated __set_sregs to not set any register values (previously supported
setting the tracked values of EFER/CR0/CR4/CR8)
- Added support for reporting SMM capability at the VM-level. This allows
an SEV-ES guest to indicate SMM is not supported
- Updated FPU support to check for a guest FPU save area before using it.
Updated SVM to free guest FPU for an SEV-ES guest during KVM create_vcpu
op.
- Removed changes to the kvm_skip_emulated_instruction()
- Added VMSA validity checks before invoking LAUNCH_UPDATE_VMSA
- Minor code restructuring in areas for better readability
Cc: Paolo Bonzini <[email protected]>
Cc: Jim Mattson <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vitaly Kuznetsov <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Brijesh Singh <[email protected]>
Tom Lendacky (34):
x86/cpu: Add VM page flush MSR availability as a CPUID feature
KVM: SVM: Remove the call to sev_platform_status() during setup
KVM: SVM: Add support for SEV-ES capability in KVM
KVM: SVM: Add GHCB accessor functions for retrieving fields
KVM: SVM: Add support for the SEV-ES VMSA
KVM: x86: Mark GPRs dirty when written
KVM: SVM: Add required changes to support intercepts under SEV-ES
KVM: SVM: Prevent debugging under SEV-ES
KVM: SVM: Do not allow instruction emulation under SEV-ES
KVM: SVM: Cannot re-initialize the VMCB after shutdown with SEV-ES
KVM: SVM: Prepare for SEV-ES exit handling in the sev.c file
KVM: SVM: Add initial support for a VMGEXIT VMEXIT
KVM: SVM: Create trace events for VMGEXIT processing
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x002
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x004
KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Add support for booting APs for an SEV-ES guest
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support to launch and run an SEV-ES guest
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_host.h | 12 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/svm.h | 40 +-
arch/x86/include/uapi/asm/svm.h | 28 +
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/cpu/vmware.c | 12 +-
arch/x86/kvm/Kconfig | 3 +-
arch/x86/kvm/kvm_cache_regs.h | 51 +-
arch/x86/kvm/svm/sev.c | 933 +++++++++++++++++++++++++++--
arch/x86/kvm/svm/svm.c | 446 +++++++++++---
arch/x86/kvm/svm/svm.h | 166 ++++-
arch/x86/kvm/svm/vmenter.S | 50 ++
arch/x86/kvm/trace.h | 97 +++
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/x86.c | 344 +++++++++--
arch/x86/kvm/x86.h | 9 +
17 files changed, 1962 insertions(+), 238 deletions(-)
base-commit: dc924b062488a0376aae41d3e0a27dc99f852a5e
--
2.28.0
From: Tom Lendacky <[email protected]>
When performing VMGEXIT processing for an SEV-ES guest, register values
will be synced between KVM and the GHCB. Prepare for detecting when a GPR
has been updated (marked dirty) in order to determine whether to sync the
register to the GHCB.
Signed-off-by: Tom Lendacky <[email protected]>
---
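Not part of the diff below, but as a usage sketch: with dirty tracking
in place, the VMGEXIT return path can sync back only the GPRs an exit
handler actually wrote, e.g.:

        if (kvm_register_is_dirty(vcpu, VCPU_REGS_RAX))
                ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
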
arch/x86/kvm/kvm_cache_regs.h | 51 ++++++++++++++++++-----------------
1 file changed, 26 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index a889563ad02d..f15bc16de07c 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -9,6 +9,31 @@
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
| X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
+static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+}
+
+static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
+ enum kvm_reg reg)
+{
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
+ __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
+}
+
#define BUILD_KVM_GPR_ACCESSORS(lname, uname) \
static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
{ \
@@ -18,6 +43,7 @@ static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \
unsigned long val) \
{ \
vcpu->arch.regs[VCPU_REGS_##uname] = val; \
+ kvm_register_mark_dirty(vcpu, VCPU_REGS_##uname); \
}
BUILD_KVM_GPR_ACCESSORS(rax, RAX)
BUILD_KVM_GPR_ACCESSORS(rbx, RBX)
@@ -37,31 +63,6 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
BUILD_KVM_GPR_ACCESSORS(r15, R15)
#endif
-static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
-}
-
-static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
-}
-
-static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
-}
-
-static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
- enum kvm_reg reg)
-{
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
- __set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
-}
-
static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
{
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
--
2.28.0
From: Tom Lendacky <[email protected]>
Add support to KVM for determining whether a system is capable of
supporting SEV-ES and whether a given guest is an SEV-ES guest.
Signed-off-by: Tom Lendacky <[email protected]>
---
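Not part of the diff below, but as a usage sketch: the rest of the
series can now branch on the guest type via the new helper, e.g.:

        if (sev_es_guest(vcpu->kvm)) {
                /* guest register state is encrypted; use the GHCB/traps */
        }
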
arch/x86/kvm/Kconfig | 3 ++-
arch/x86/kvm/svm/sev.c | 47 ++++++++++++++++++++++++++++++++++--------
arch/x86/kvm/svm/svm.c | 20 +++++++++---------
arch/x86/kvm/svm/svm.h | 17 ++++++++++++++-
4 files changed, 66 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f92dfd8ef10d..7ac592664c52 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -100,7 +100,8 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
help
- Provides support for launching Encrypted VMs on AMD processors.
+ Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
+ with Encrypted State (SEV-ES) on AMD processors.
config KVM_MMU_AUDIT
bool "Audit KVM MMU"
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4ba5476bf42..9bf5e9dadff5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -932,7 +932,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
struct kvm_sev_cmd sev_cmd;
int r;
- if (!svm_sev_enabled())
+ if (!svm_sev_enabled() || !sev)
return -ENOTTY;
if (!argp)
@@ -1125,29 +1125,58 @@ void sev_vm_destroy(struct kvm *kvm)
sev_asid_free(sev->asid);
}
-int __init sev_hardware_setup(void)
+void __init sev_hardware_setup(void)
{
+ unsigned int eax, ebx, ecx, edx;
+ bool sev_es_supported = false;
+ bool sev_supported = false;
+
+ /* Does the CPU support SEV? */
+ if (!boot_cpu_has(X86_FEATURE_SEV))
+ goto out;
+
+ /* Retrieve SEV CPUID information */
+ cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+
/* Maximum number of encrypted guests supported simultaneously */
- max_sev_asid = cpuid_ecx(0x8000001F);
+ max_sev_asid = ecx;
if (!svm_sev_enabled())
- return 1;
+ goto out;
/* Minimum ASID value that should be used for SEV guest */
- min_sev_asid = cpuid_edx(0x8000001F);
+ min_sev_asid = edx;
/* Initialize SEV ASID bitmaps */
sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_asid_bitmap)
- return 1;
+ goto out;
sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_reclaim_asid_bitmap)
- return 1;
+ goto out;
- pr_info("SEV supported\n");
+ pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);
+ sev_supported = true;
- return 0;
+ /* SEV-ES support requested? */
+ if (!sev_es)
+ goto out;
+
+ /* Does the CPU support SEV-ES? */
+ if (!boot_cpu_has(X86_FEATURE_SEV_ES))
+ goto out;
+
+ /* Has the system been allocated ASIDs for SEV-ES? */
+ if (min_sev_asid == 1)
+ goto out;
+
+ pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
+ sev_es_supported = true;
+
+out:
+ sev = sev_supported;
+ sev_es = sev_es_supported;
}
void sev_hardware_teardown(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6dc337b9c231..a1ea30c98629 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -187,9 +187,13 @@ static int vgif = true;
module_param(vgif, int, 0444);
/* enable/disable SEV support */
-static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
module_param(sev, int, 0444);
+/* enable/disable SEV-ES support */
+int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_es, int, 0444);
+
static bool __read_mostly dump_invalid_vmcb = 0;
module_param(dump_invalid_vmcb, bool, 0644);
@@ -959,15 +963,11 @@ static __init int svm_hardware_setup(void)
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}
- if (sev) {
- if (boot_cpu_has(X86_FEATURE_SEV) &&
- IS_ENABLED(CONFIG_KVM_AMD_SEV)) {
- r = sev_hardware_setup();
- if (r)
- sev = false;
- } else {
- sev = false;
- }
+ if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
+ sev_hardware_setup();
+ } else {
+ sev = false;
+ sev_es = false;
}
svm_adjust_mmio_mask();
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fdff76eb6ceb..56d950df82e5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -61,6 +61,7 @@ enum {
struct kvm_sev_info {
bool active; /* SEV enabled guest */
+ bool es_active; /* SEV-ES enabled guest */
unsigned int asid; /* ASID used for this guest */
unsigned int handle; /* SEV firmware handle */
int fd; /* SEV device fd */
@@ -352,6 +353,9 @@ static inline bool gif_set(struct vcpu_svm *svm)
#define MSR_CR3_LONG_MBZ_MASK 0xfff0000000000000U
#define MSR_INVALID 0xffffffffU
+extern int sev;
+extern int sev_es;
+
u32 svm_msrpm_offset(u32 msr);
u32 *svm_vcpu_alloc_msrpm(void);
void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
@@ -484,6 +488,17 @@ static inline bool sev_guest(struct kvm *kvm)
#endif
}
+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+ struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+ return sev_guest(kvm) && sev->es_active;
+#else
+ return false;
+#endif
+}
+
static inline bool svm_sev_enabled(void)
{
return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
@@ -496,7 +511,7 @@ int svm_register_enc_region(struct kvm *kvm,
int svm_unregister_enc_region(struct kvm *kvm,
struct kvm_enc_region *range);
void pre_sev_run(struct vcpu_svm *svm, int cpu);
-int __init sev_hardware_setup(void);
+void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);
#endif
--
2.28.0
From: Tom Lendacky <[email protected]>
When a guest is running under SEV-ES, the hypervisor cannot access the
guest register state. There are numerous places in the KVM code where
registers that can no longer be accessed are read or written (e.g.
RIP, CR0). Add checks to prevent these register accesses and add intercept
update support at various points within the KVM code.
Also, when handling a VMGEXIT, exceptions are passed back through the
GHCB. Since the RDMSR/WRMSR intercepts may inject a #GP on error,
update the SVM intercepts to handle this for SEV-ES guests.
Signed-off-by: Tom Lendacky <[email protected]>
---
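Not part of the diff below, but a hedged sketch of the contract this
creates for the guest side (which lives in the guest #VC series): the
guest checks sw_exit_info_1 for an error indication and decodes the
event from sw_exit_info_2 using the SVM_EVTINJ_* encoding:

        if (ghcb_get_sw_exit_info_1(ghcb) & 1) {
                u64 evt = ghcb_get_sw_exit_info_2(ghcb);
                u8 vector = evt & SVM_EVTINJ_VEC_MASK;

                /* e.g. vector == X86_TRAP_GP after a failed MSR access */
        }
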
arch/x86/include/asm/svm.h | 3 +-
arch/x86/kvm/svm/svm.c | 111 +++++++++++++++++++++++++++++++++----
arch/x86/kvm/x86.c | 6 +-
3 files changed, 107 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1edf24f51b53..bce28482d63d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -178,7 +178,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define LBR_CTL_ENABLE_MASK BIT_ULL(0)
#define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
-#define SVM_INTERRUPT_SHADOW_MASK 1
+#define SVM_INTERRUPT_SHADOW_MASK BIT_ULL(0)
+#define SVM_GUEST_INTERRUPT_MASK BIT_ULL(1)
#define SVM_IOIO_STR_SHIFT 2
#define SVM_IOIO_REP_SHIFT 3
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd4c9884e5a8..857d0d3f2752 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -36,6 +36,7 @@
#include <asm/mce.h>
#include <asm/spec-ctrl.h>
#include <asm/cpu_device_id.h>
+#include <asm/traps.h>
#include <asm/virtext.h>
#include "trace.h"
@@ -340,6 +341,13 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ /*
+ * SEV-ES does not expose the next RIP. The RIP update is controlled by
+ * the type of exit and the #VC handler in the guest.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ goto done;
+
if (nrips && svm->vmcb->control.next_rip != 0) {
WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
svm->next_rip = svm->vmcb->control.next_rip;
@@ -351,6 +359,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
} else {
kvm_rip_write(vcpu, svm->next_rip);
}
+
+done:
svm_set_interrupt_shadow(vcpu, 0);
return 1;
@@ -1652,9 +1662,18 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt)
static void update_cr0_intercept(struct vcpu_svm *svm)
{
- ulong gcr0 = svm->vcpu.arch.cr0;
- u64 *hcr0 = &svm->vmcb->save.cr0;
+ ulong gcr0;
+ u64 *hcr0;
+
+ /*
+ * SEV-ES guests must always keep the CR intercepts cleared. CR
+ * tracking is done using the CR write traps.
+ */
+ if (sev_es_guest(svm->vcpu.kvm))
+ return;
+ gcr0 = svm->vcpu.arch.cr0;
+ hcr0 = &svm->vmcb->save.cr0;
*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
| (gcr0 & SVM_CR0_SELECTIVE_MASK);
@@ -1674,7 +1693,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
struct vcpu_svm *svm = to_svm(vcpu);
#ifdef CONFIG_X86_64
- if (vcpu->arch.efer & EFER_LME) {
+ if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
vcpu->arch.efer |= EFER_LMA;
svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
@@ -2608,7 +2627,29 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static int rdmsr_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_rdmsr(&svm->vcpu);
+ u32 ecx;
+ u64 data;
+
+ if (!sev_es_guest(svm->vcpu.kvm))
+ return kvm_emulate_rdmsr(&svm->vcpu);
+
+ ecx = kvm_rcx_read(&svm->vcpu);
+ if (kvm_get_msr(&svm->vcpu, ecx, &data)) {
+ trace_kvm_msr_read_ex(ecx);
+ ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+ ghcb_set_sw_exit_info_2(svm->ghcb,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ return 1;
+ }
+
+ trace_kvm_msr_read(ecx, data);
+
+ kvm_rax_write(&svm->vcpu, data & -1u);
+ kvm_rdx_write(&svm->vcpu, (data >> 32) & -1u);
+
+ return kvm_skip_emulated_instruction(&svm->vcpu);
}
static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
@@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
static int wrmsr_interception(struct vcpu_svm *svm)
{
- return kvm_emulate_wrmsr(&svm->vcpu);
+ u32 ecx;
+ u64 data;
+
+ if (!sev_es_guest(svm->vcpu.kvm))
+ return kvm_emulate_wrmsr(&svm->vcpu);
+
+ ecx = kvm_rcx_read(&svm->vcpu);
+ data = kvm_read_edx_eax(&svm->vcpu);
+ if (kvm_set_msr(&svm->vcpu, ecx, data)) {
+ trace_kvm_msr_write_ex(ecx, data);
+ ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+ ghcb_set_sw_exit_info_2(svm->ghcb,
+ X86_TRAP_GP |
+ SVM_EVTINJ_TYPE_EXEPT |
+ SVM_EVTINJ_VALID);
+ return 1;
+ }
+
+ trace_kvm_msr_write(ecx, data);
+
+ return kvm_skip_emulated_instruction(&svm->vcpu);
}
static int msr_interception(struct vcpu_svm *svm)
@@ -2827,7 +2888,14 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
static int pause_interception(struct vcpu_svm *svm)
{
struct kvm_vcpu *vcpu = &svm->vcpu;
- bool in_kernel = (svm_get_cpl(vcpu) == 0);
+ bool in_kernel;
+
+ /*
+ * CPL is not made available for an SEV-ES guest, so just set in_kernel
+ * to true.
+ */
+ in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
+ : (svm_get_cpl(vcpu) == 0);
if (!kvm_pause_in_guest(vcpu->kvm))
grow_ple_window(vcpu);
@@ -3090,10 +3158,13 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM);
- if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
- vcpu->arch.cr0 = svm->vmcb->save.cr0;
- if (npt_enabled)
- vcpu->arch.cr3 = svm->vmcb->save.cr3;
+ /* SEV-ES guests must use the CR write traps to track CR registers. */
+ if (!sev_es_guest(vcpu->kvm)) {
+ if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE))
+ vcpu->arch.cr0 = svm->vmcb->save.cr0;
+ if (npt_enabled)
+ vcpu->arch.cr3 = svm->vmcb->save.cr3;
+ }
if (is_guest_mode(vcpu)) {
int vmexit;
@@ -3205,6 +3276,13 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ /*
+ * SEV-ES guests must always keep the CR intercepts cleared. CR
+ * tracking is done using the CR write traps.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ return;
+
if (nested_svm_virtualize_tpr(vcpu))
return;
@@ -3273,6 +3351,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
+ /*
+ * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
+ * bit to determine the state of the IF flag.
+ */
+ if (sev_es_guest(svm->vcpu.kvm))
+ return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);
+
if (!gif_set(svm))
return true;
@@ -3458,6 +3543,12 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
svm->vcpu.arch.nmi_injected = true;
break;
case SVM_EXITINTINFO_TYPE_EXEPT:
+ /*
+ * Never re-inject a #VC exception.
+ */
+ if (vector == X86_TRAP_VC)
+ break;
+
/*
* In case of software exceptions, do not reinject the vector,
* but re-execute the instruction instead. Rewind RIP first
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3fdc16cfd6f..b6809a2851d2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4018,7 +4018,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
int idx;
- if (vcpu->preempted)
+ if (vcpu->preempted && !vcpu->arch.guest_state_protected)
vcpu->arch.preempted_in_kernel = !kvm_x86_ops.get_cpl(vcpu);
/*
@@ -8161,7 +8161,9 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
{
struct kvm_run *kvm_run = vcpu->run;
- kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
+ kvm_run->if_flag = (vcpu->arch.guest_state_protected)
+ ? kvm_arch_interrupt_allowed(vcpu)
+ : (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
--
2.28.0
From: Tom Lendacky <[email protected]>
Update the GHCB accessor functions to add functions for retrieving GHCB
fields by name. Update existing code to use the new accessor functions.
Signed-off-by: Tom Lendacky <[email protected]>
---
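Not part of the diff below, but as a usage note: the new _if_valid
variant gates the read on the GHCB valid bitmap, so a caller can
safely consume optional fields, e.g.:

        u64 rax = ghcb_get_rax_if_valid(ghcb); /* 0 if RAX wasn't marked valid */
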
arch/x86/include/asm/svm.h | 10 ++++++++++
arch/x86/kernel/cpu/vmware.c | 12 ++++++------
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 71d630bb5e08..1edf24f51b53 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -379,6 +379,16 @@ struct vmcb {
(unsigned long *)&ghcb->save.valid_bitmap); \
} \
\
+ static inline u64 ghcb_get_##field(struct ghcb *ghcb) \
+ { \
+ return ghcb->save.field; \
+ } \
+ \
+ static inline u64 ghcb_get_##field##_if_valid(struct ghcb *ghcb) \
+ { \
+ return ghcb_##field##_is_valid(ghcb) ? ghcb->save.field : 0; \
+ } \
+ \
static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value) \
{ \
__set_bit(GHCB_BITMAP_IDX(field), \
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..c6ede3b3d302 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -501,12 +501,12 @@ static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
ghcb_rbp_is_valid(ghcb)))
return false;
- regs->bx = ghcb->save.rbx;
- regs->cx = ghcb->save.rcx;
- regs->dx = ghcb->save.rdx;
- regs->si = ghcb->save.rsi;
- regs->di = ghcb->save.rdi;
- regs->bp = ghcb->save.rbp;
+ regs->bx = ghcb_get_rbx(ghcb);
+ regs->cx = ghcb_get_rcx(ghcb);
+ regs->dx = ghcb_get_rdx(ghcb);
+ regs->si = ghcb_get_rsi(ghcb);
+ regs->di = ghcb_get_rdi(ghcb);
+ regs->bp = ghcb_get_rbp(ghcb);
return true;
}
--
2.28.0
From: Tom Lendacky <[email protected]>
Allocate a page during vCPU creation to be used as the encrypted VM save
area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
structure that indicates whether the guest state is protected.
When freeing a VMSA page that has been encrypted, the cache contents must
first be flushed using the MSR_AMD64_VM_PAGE_FLUSH MSR.
[ i386 build warnings ]
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tom Lendacky <[email protected]>
---
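Not part of the diff below, but a note on the flush mechanism (my
reading of the APM description of this MSR): MSR_AMD64_VM_PAGE_FLUSH
takes the page-aligned guest virtual address in the upper bits OR'd
with the ASID in the low bits, which is why the flush loop writes:

        wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, page_va | sev->asid);

where page_va is hypothetical shorthand for each page-aligned address
in the VMSA range.
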
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/kvm/svm/sev.c | 67 +++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 24 +++++++++++-
arch/x86/kvm/svm/svm.h | 5 +++
4 files changed, 97 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..8cf6b0493d49 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,9 @@ struct kvm_vcpu_arch {
*/
bool enforce;
} pv_cpuid;
+
+ /* Protected Guests */
+ bool guest_state_protected;
};
struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9bf5e9dadff5..fb4a411f7550 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
#include <linux/psp-sev.h>
#include <linux/pagemap.h>
#include <linux/swap.h>
+#include <linux/processor.h>
#include "x86.h"
#include "svm.h"
@@ -1190,6 +1191,72 @@ void sev_hardware_teardown(void)
sev_flush_asids();
}
+/*
+ * Pages used by hardware to hold guest encrypted state must be flushed before
+ * returning them to the system.
+ */
+static void sev_flush_guest_memory(struct vcpu_svm *svm, void *va,
+ unsigned long len)
+{
+ /*
+ * If hardware enforced cache coherency for encrypted mappings of the
+ * same physical page is supported, nothing to do.
+ */
+ if (boot_cpu_has(X86_FEATURE_SME_COHERENT))
+ return;
+
+ /*
+ * If the VM Page Flush MSR is supported, use it to flush the page
+ * (using the page virtual address and the guest ASID).
+ */
+ if (boot_cpu_has(X86_FEATURE_VM_PAGE_FLUSH)) {
+ struct kvm_sev_info *sev;
+ unsigned long va_start;
+ u64 start, stop;
+
+ /* Align start and stop to page boundaries. */
+ va_start = (unsigned long)va;
+ start = (u64)va_start & PAGE_MASK;
+ stop = PAGE_ALIGN((u64)va_start + len);
+
+ if (start < stop) {
+ sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+ while (start < stop) {
+ wrmsrl(MSR_AMD64_VM_PAGE_FLUSH,
+ start | sev->asid);
+
+ start += PAGE_SIZE;
+ }
+
+ return;
+ }
+
+ WARN(1, "Address overflow, using WBINVD\n");
+ }
+
+ /*
+ * Hardware should always have one of the above features,
+ * but if not, use WBINVD and issue a warning.
+ */
+ WARN_ONCE(1, "Using WBINVD to flush guest memory\n");
+ wbinvd_on_all_cpus();
+}
+
+void sev_free_vcpu(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm;
+
+ if (!sev_es_guest(vcpu->kvm))
+ return;
+
+ svm = to_svm(vcpu);
+
+ if (vcpu->arch.guest_state_protected)
+ sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+ __free_page(virt_to_page(svm->vmsa));
+}
+
void pre_sev_run(struct vcpu_svm *svm, int cpu)
{
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a1ea30c98629..cd4c9884e5a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1289,6 +1289,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm;
struct page *vmcb_page;
+ struct page *vmsa_page = NULL;
int err;
BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1299,9 +1300,19 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
if (!vmcb_page)
goto out;
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ /*
+ * SEV-ES guests require a separate VMSA page used to contain
+ * the encrypted register state of the guest.
+ */
+ vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!vmsa_page)
+ goto error_free_vmcb_page;
+ }
+
err = avic_init_vcpu(svm);
if (err)
- goto error_free_vmcb_page;
+ goto error_free_vmsa_page;
/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
@@ -1311,12 +1322,16 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
svm->msrpm = svm_vcpu_alloc_msrpm();
if (!svm->msrpm)
- goto error_free_vmcb_page;
+ goto error_free_vmsa_page;
svm_vcpu_init_msrpm(vcpu, svm->msrpm);
svm->vmcb = page_address(vmcb_page);
svm->vmcb_pa = __sme_set(page_to_pfn(vmcb_page) << PAGE_SHIFT);
+
+ if (vmsa_page)
+ svm->vmsa = page_address(vmsa_page);
+
svm->asid_generation = 0;
init_vmcb(svm);
@@ -1325,6 +1340,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
return 0;
+error_free_vmsa_page:
+ if (vmsa_page)
+ __free_page(vmsa_page);
error_free_vmcb_page:
__free_page(vmcb_page);
out:
@@ -1352,6 +1370,8 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
svm_free_nested(svm);
+ sev_free_vcpu(vcpu);
+
__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 56d950df82e5..80a359f3cf20 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -168,6 +168,10 @@ struct vcpu_svm {
DECLARE_BITMAP(read, MAX_DIRECT_ACCESS_MSRS);
DECLARE_BITMAP(write, MAX_DIRECT_ACCESS_MSRS);
} shadow_msr_intercept;
+
+ /* SEV-ES support */
+ struct vmcb_save_area *vmsa;
+ struct ghcb *ghcb;
};
struct svm_cpu_data {
@@ -513,5 +517,6 @@ int svm_unregister_enc_region(struct kvm *kvm,
void pre_sev_run(struct vcpu_svm *svm, int cpu);
void __init sev_hardware_setup(void);
void sev_hardware_teardown(void);
+void sev_free_vcpu(struct kvm_vcpu *vcpu);
#endif
--
2.28.0
On 12/10/20 11:06 AM, Tom Lendacky wrote:
> From: Tom Lendacky <[email protected]>
>
> This patch series provides support for running SEV-ES guests under KVM.
I cut the first send of this series short and resent it with a corrected
email address for Sean (since he is copied on all the patches), so please
look at the subsequent submission.
Sorry about that.
Tom
From: Tom Lendacky <[email protected]>
When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), even though this isn't really an error condition.
Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().
Signed-off-by: Tom Lendacky <[email protected]>
---
arch/x86/kvm/svm/sev.c | 22 +---------------------
1 file changed, 1 insertion(+), 21 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c0b14106258a..a4ba5476bf42 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1127,9 +1127,6 @@ void sev_vm_destroy(struct kvm *kvm)
int __init sev_hardware_setup(void)
{
- struct sev_user_data_status *status;
- int rc;
-
/* Maximum number of encrypted guests supported simultaneously */
max_sev_asid = cpuid_ecx(0x8000001F);
@@ -1148,26 +1145,9 @@ int __init sev_hardware_setup(void)
if (!sev_reclaim_asid_bitmap)
return 1;
- status = kmalloc(sizeof(*status), GFP_KERNEL);
- if (!status)
- return 1;
-
- /*
- * Check SEV platform status.
- *
- * PLATFORM_STATUS can be called in any state, if we failed to query
- * the PLATFORM status then either PSP firmware does not support SEV
- * feature or SEV firmware is dead.
- */
- rc = sev_platform_status(status, NULL);
- if (rc)
- goto err;
-
pr_info("SEV supported\n");
-err:
- kfree(status);
- return rc;
+ return 0;
}
void sev_hardware_teardown(void)
--
2.28.0