2023-08-03 08:51:02

by Yang Weijiang

Subject: [PATCH v5 00/19] Enable CET Virtualization

Control-flow Enforcement Technology (CET) is a CPU feature that helps prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two
sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), to
defend against these control-flow subversion attacks.

Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
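
As a concrete illustration (this example is not part of the series): in a
process running with user SHSTK enabled, a classic return-address overwrite is
caught at RET because the shadow-stack copy of the return address no longer
matches the data-stack copy. A minimal, hypothetical C sketch:

    /* Hypothetical sketch; assumes the process runs with user SHSTK enabled. */
    #include <string.h>

    void victim(const char *input)
    {
            char buf[16];

            /*
             * An overflow here can overwrite the return address on the data
             * stack, but not the copy on the shadow stack. The mismatch is
             * detected by RET and the processor raises #CP.
             */
            strcpy(buf, input);
    }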

Indirect Branch Tracking (IBT):
IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of
indirect branches (CALL, JMP, etc.). If an indirect branch is executed and the
next instruction is _not_ an ENDBRANCH, the processor generates a #CP. The
instruction behaves as a NOP on platforms that don't support CET.
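
Similarly for IBT (illustration only, not part of the series): compilers
building with -fcf-protection=branch emit an ENDBR64 at every valid
indirect-branch target, so a well-behaved indirect call like the one below is
allowed, while a corrupted function pointer landing mid-function triggers #CP:

    /* Hypothetical sketch; assumes the binary is built with
     * -fcf-protection=branch and runs with IBT enabled.
     */
    void legit_target(void)             /* compiler places ENDBR64 at entry */
    {
    }

    void dispatch(void (*handler)(void))
    {
            /*
             * Indirect CALL: if the first instruction at the target is not
             * ENDBR64, the processor generates #CP.
             */
            handler();
    }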


Dependency:
--------------------------------------------------------------------------
The first two patches are taken from the CET native series[1] in the kernel tip
tree. They're prerequisites for this KVM patch series because CET user-mode
xstate and some feature bits are defined in those patches. Apply this KVM series
to the kernel tree to build a host kernel qualified to support guest CET
features, and apply the QEMU CET enabling patches[2] to build a qualified QEMU.
These kernel-dependent patches will be carried in the KVM series until the CET
native series is merged into the mainline tree.


Implementation:
--------------------------------------------------------------------------
This series enables full support for guest CET SHSTK/IBT register states,
i.e., the CET register states in the usage models below are backed by KVM.

             |
 User SHSTK  |  User IBT      (user mode)
--------------------------------------------------
Kernel SHSTK |  Kernel IBT    (kernel mode)
             |

KVM cooperates with the host kernel to back the CET register states in each
model. In this series, KVM manages the guest CET kernel-mode registers (MSRs)
itself and relies on the host kernel to manage the user-mode registers; KVM
therefore checks the capability reported in the host XSS MSR before exposing
CET features to the guest.

Note, guest supervisor (kernel) SHSTK cannot be fully supported by this series,
therefore the guest SSS_CET bit, CPUID.(0x7,1):EDX[bit 18], is cleared. See the
SDM (Vol. 1, Section 17.2.3) for details.


CET states management:
--------------------------------------------------------------------------
CET user-mode states, MSR_IA32_{U_CET,PL3_SSP}, depend on the {XSAVES,XRSTORS}
instructions to swap guest and host states. Guest user-mode states are saved to
the guest fpu area and host user-mode states are loaded from the thread context
before the vCPU returns to userspace, and vice versa before the vCPU re-enters
the guest; see kvm_{load,put}_guest_fpu() for details. CET user-mode state
management therefore depends on the CET user-mode bit (U_CET bit) being set in
the host XSS MSR.
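
A minimal model of this swap is sketched below (illustrative only; the struct
and function names are hypothetical, and the real code uses XSAVES/XRSTORS
inside kvm_{load,put}_guest_fpu()):

    /* Model of the CET user-mode state swap around KVM_RUN. */
    struct cet_user_state {
            unsigned long long u_cet;               /* MSR_IA32_U_CET   */
            unsigned long long pl3_ssp;             /* MSR_IA32_PL3_SSP */
    };

    static struct cet_user_state hw_regs;           /* what the CPU currently holds */
    static struct cet_user_state guest_fpu_area;    /* per-vCPU guest fpu area      */
    static struct cet_user_state host_thread_ctx;   /* host thread context          */

    static void model_put_guest_fpu(void)   /* before the vCPU returns to userspace */
    {
            guest_fpu_area = hw_regs;       /* "XSAVES": save guest user states     */
            hw_regs = host_thread_ctx;      /* "XRSTORS": restore host user states  */
    }

    static void model_load_guest_fpu(void)  /* before the vCPU runs the guest       */
    {
            host_thread_ctx = hw_regs;
            hw_regs = guest_fpu_area;
    }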

CET supervisor-mode states are grouped into two categories: XSAVE-managed and
non-XSAVE-managed. The former includes MSR_IA32_PL{0,1,2}_SSP and is controlled
by the CET supervisor-mode bit (S_CET bit) in XSS; the latter consists of
MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL.

The XSAVE-managed supervisor states could, in theory, be handled by enabling
the S_CET bit in the host XSS. But given that supervisor shadow stack isn't
enabled in the Linux kernel, enabling that control bit the same way as for
user-mode states would have global side effects on all threads/tasks running
on the host, i.e.:
1) It introduces unnecessary XSAVE operations when switching context between
non-vCPU userspace tasks within the current FPU framework.
2) It forces allocation of additional space for CET supervisor states in every
thread context, regardless of whether the thread is a vCPU thread or not.

To avoid these downsides, this series provides a KVM solution to save/reload
vCPU's supervisor SHSTK states.

VMX introduces new VMCS fields, {GUEST|HOST}_{S_CET,SSP,INTR_SSP_TABLE}, to
hold the guest/host non-XSAVES-managed states. When the VMX CET entry/exit load
bits are set, guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP} are loaded from the
corresponding fields at VM-entry/VM-exit. With these new fields, such
supervisor states require no additional KVM save/reload actions.
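
Roughly, with the CET load controls set (e.g. VM_ENTRY_LOAD_CET_STATE and the
corresponding VM-exit control), the hardware behaves like the model below. The
names are illustrative and the actual transitions are performed by the CPU, not
by KVM code; patch 11 relies on the GUEST_* fields still holding the current
guest values after VM-exit when it reads them via vmcs_readl():

    /* Model of the VMCS-based handling of non-XSAVE-managed supervisor states. */
    struct cet_sup_state {
            unsigned long s_cet;            /* MSR_IA32_S_CET       */
            unsigned long ssp;              /* SSP                  */
            unsigned long ssp_tbl;          /* MSR_IA32_INT_SSP_TAB */
    };

    /* VM-entry with the entry load control set: MSRs <- GUEST_* fields. */
    static void model_vm_entry(struct cet_sup_state *msrs,
                               const struct cet_sup_state *guest_fields)
    {
            *msrs = *guest_fields;
    }

    /*
     * VM-exit with the exit load control set: the guest values remain visible
     * through the GUEST_* fields and the MSRs are loaded from the HOST_* fields.
     */
    static void model_vm_exit(struct cet_sup_state *msrs,
                              struct cet_sup_state *guest_fields,
                              const struct cet_sup_state *host_fields)
    {
            *guest_fields = *msrs;
            *msrs = *host_fields;
    }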


Tests:
--------------------------------------------------------------------------
This series passed the basic CET user shadow stack test and kernel IBT test in
both L1 and L2 guests.

The patch series _does_ impact existing vmx test cases in KVM-unit-tests; the
resulting failures have been fixed in this patch[3].

All other parts of KVM-unit-tests and selftests passed with this series. A new
selftest app for CET MSRs is posted here[4].

Note, this series hasn't been tested on AMD platforms yet.

To run the user SHSTK test and kernel IBT test in a guest, a CET-capable
platform is required, e.g., a Sapphire Rapids server. Follow the steps below
to build the host/guest kernels properly:

1. Build the host kernel: apply this series to the kernel tree and build the
kernel.

2. Build the guest kernel: apply the full CET _native_ series to the kernel
tree and enable the CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK
options. Build with a CET-enabled gcc (>= 8.5.0).

3. Use the patched QEMU to launch a guest.

Check kernel selftest test_shadow_stack_64 output:

[INFO] new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO] changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO] ssp is now 7f8c82101000
[OK] Shadow stack pivot
[OK] Shadow stack faults
[INFO] Corrupting shadow stack
[INFO] Generated shadow stack violation successfully
[OK] Shadow stack violation test
[INFO] Gup read -> shstk access success
[INFO] Gup write -> shstk access success
[INFO] Violation from normal write
[INFO] Gup read -> write access success
[INFO] Violation from normal write
[INFO] Gup write -> write access success
[INFO] Cow gup write -> write access success
[OK] Shadow gup test
[INFO] Violation from shstk access
[OK] mprotect() test
[SKIP] Userfaultfd unavailable.
[OK] 32 bit test


Check kernel IBT with dmesg | grep CET:

CET detected: Indirect Branch Tracking enabled

--------------------------------------------------------------------------
Changes in v5:
1. Consolidated CET MSR access code into one patch to make it clearer. [Chao]
2. Refined SSP handling when enter/exit SMM mode. [Chao]
3. Refined CET MSR interception to make it adaptive to enable/disable cases. [Chao]
4. Refined guest PL{0,1,2}_SSP handling to comply with existing code logic. [Chao]
5. Other tweaks per Sean and Chao's feedback.
6. Rebased to: https://github.com/kvm-x86/linux/tree/next tag: kvm-x86-next-2023.07.28


[1]: CET native series: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/shstk
[2]: QEMU patch: https://lore.kernel.org/all/[email protected]/
[3]: KVM-unit-tests fixup: https://lore.kernel.org/all/[email protected]/
[4]: Selftests for CET MSRs: https://lore.kernel.org/all/[email protected]/
[5]: v4 patchset: https://lore.kernel.org/all/[email protected]/


Patches 1-2:   Native dependent CET patches.
Patches 3-5:   Enable XSS support in KVM.
Patch 6:       Preparatory patch for XSAVE-managed MSR access.
Patches 7-9:   Common patches to support CET on x86.
Patches 10-11: Emulate CET MSR access.
Patch 12:      Handle SSP at entry/exit to SMM.
Patches 13-17: Add CET virtualization settings.
Patches 18-19: nVMX patches for CET support in nested VMs.


Rick Edgecombe (2):
x86/cpufeatures: Add CPU feature flags for shadow stacks
x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

Sean Christopherson (2):
KVM:x86: Report XSS as to-be-saved if there are supported features
KVM:x86: Load guest FPU state when access XSAVE-managed MSRs

Yang Weijiang (15):
KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM:x86: Initialize kvm_caps.supported_xss
KVM:x86: Add fault checks for guest CR4.CET setting
KVM:x86: Report KVM supported CET MSRs as to-be-saved
KVM:x86: Make guest supervisor states as non-XSAVE managed
KVM:VMX: Introduce CET VMCS fields and control bits
KVM:VMX: Emulate read and write to CET MSRs
KVM:x86: Save and reload SSP to/from SMRAM
KVM:VMX: Set up interception for CET MSRs
KVM:VMX: Set host constant supervisor states to VMCS fields
KVM:x86: Optimize CET supervisor SSP save/reload
KVM:x86: Enable CET virtualization for VMX and advertise to userspace
KVM:x86: Enable guest CET supervisor xstate bit support
KVM:nVMX: Refine error code injection to nested VM
KVM:nVMX: Enable CET support for nested VM

arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/fpu/types.h | 16 +-
arch/x86/include/asm/fpu/xstate.h | 6 +-
arch/x86/include/asm/kvm_host.h | 6 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 8 +
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/xstate.c | 90 ++++-----
arch/x86/kvm/cpuid.c | 32 ++-
arch/x86/kvm/cpuid.h | 11 ++
arch/x86/kvm/smm.c | 11 ++
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/kvm/vmx/capabilities.h | 10 +
arch/x86/kvm/vmx/nested.c | 49 ++++-
arch/x86/kvm/vmx/nested.h | 7 +
arch/x86/kvm/vmx/vmcs12.c | 6 +
arch/x86/kvm/vmx/vmcs12.h | 14 +-
arch/x86/kvm/vmx/vmx.c | 133 ++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 +-
arch/x86/kvm/x86.c | 242 +++++++++++++++++++++--
arch/x86/kvm/x86.h | 35 ++++
24 files changed, 614 insertions(+), 85 deletions(-)


base-commit: d406b457840171306ada37400e4f3d3c6f0f4960
--
2.27.0



2023-08-03 09:02:41

by Yang Weijiang

Subject: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM

Per the SDM description (Vol. 3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver
a hardware exception with or without an error code, regardless
of vector"

Modify the has_error_code check performed before injecting events into the
nested guest: only enforce the check when the guest is in real mode, the event
is not a hardware exception, or the platform doesn't enumerate bit 56 in
VMX_BASIC; otherwise skip it.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++--------
arch/x86/kvm/vmx/nested.h | 7 +++++++
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 516391cc0d64..9bcd989252f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1205,9 +1205,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
{
const u64 feature_and_reserved =
/* feature (except bit 48; see below) */
- BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+ BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
/* reserved */
- BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+ BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
u64 vmx_basic = vmcs_config.nested.basic;

if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2846,12 +2846,16 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
return -EINVAL;

- /* VM-entry interruption-info field: deliver error code */
- should_have_error_code =
- intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
- x86_exception_has_error_code(vector);
- if (CC(has_error_code != should_have_error_code))
- return -EINVAL;
+ if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
+ !nested_cpu_has_no_hw_errcode(vcpu)) {
+ /* VM-entry interruption-info field: deliver error code */
+ should_have_error_code =
+ intr_type == INTR_TYPE_HARD_EXCEPTION &&
+ prot_mode &&
+ x86_exception_has_error_code(vector);
+ if (CC(has_error_code != should_have_error_code))
+ return -EINVAL;
+ }

/* VM-entry exception error code */
if (CC(has_error_code &&
@@ -6967,6 +6971,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)

if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
+ if (cpu_has_vmx_basic_no_hw_errcode())
+ msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE;
}

static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 96952263b029..1884628294e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
__kvm_is_valid_cr4(vcpu, val);
}

+static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
/* No difference in the restrictions on guest and host CR4 in VMX operation. */
#define nested_guest_cr4_valid nested_cr4_valid
#define nested_host_cr4_valid nested_cr4_valid
--
2.27.0


2023-08-03 09:31:55

by Yang Weijiang

Subject: [PATCH v5 01/19] x86/cpufeatures: Add CPU feature flags for shadow stacks

From: Rick Edgecombe <[email protected]>

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add CPU feature flags for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com
---
arch/x86/include/asm/cpufeatures.h | 2 ++
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 31c862d79fae..8ea5c290259c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
#define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
#define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */
#define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
#define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK (16*32+ 7) /* "" Shadow stack */
#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */
#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK 0
+#else
+#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -120,7 +126,7 @@
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
- DISABLE_CALL_DEPTH_TRACKING)
+ DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
#define DISABLED_MASK12 (DISABLE_LAM)
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
+ { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{}
};

--
2.27.0


2023-08-03 10:00:57

by Yang Weijiang

Subject: [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields

Set constant values in the HOST_{S_CET,SSP,INTR_SSP_TABLE} VMCS
fields explicitly. Kernel IBT is supported and the setting in
MSR_IA32_S_CET is static after boot (except in the BIOS call case,
which a vCPU thread never crosses), i.e. KVM doesn't need to
refresh the HOST_S_CET field before every VM-Enter/VM-Exit
sequence.

Host supervisor shadow stack is not enabled now and SSP is not
accessible in kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled
for CPL3, SSP is reloaded from IA32_PL3_SSP before execution exits
to userspace. See SDM Vol. 2A/B Chapters 3/4 for SYSCALL/SYSRET/
SYSENTER/SYSEXIT/RDSSP/CALL etc.

Prevent KVM module loading if host supervisor shadow stack
(SHSTK_EN) is set in MSR_IA32_S_CET, as KVM cannot co-exist with
it correctly.

Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Chao Gao <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 4 ++++
arch/x86/kvm/vmx/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 14 ++++++++++++++
arch/x86/kvm/x86.h | 1 +
4 files changed, 34 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d0abee35d7ba..b1883f6c08eb 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
}

+static inline bool cpu_has_load_cet_ctrl(void)
+{
+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
static inline bool cpu_has_vmx_mpx(void)
{
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6779b8a63789..99bf63b2a779 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4341,6 +4341,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)

if (cpu_has_load_ia32_efer())
vmcs_write64(HOST_IA32_EFER, host_efer);
+
+ /*
+ * Supervisor shadow stack is not enabled on host side, i.e.,
+ * host IA32_S_CET.SHSTK_EN bit is guaranteed to 0 now, per SDM
+ * description(RDSSP instruction), SSP is not readable in CPL0,
+ * so resetting the two registers to 0s at VM-Exit does no harm
+ * to kernel execution. When execution flow exits to userspace,
+ * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter
+ * 3 and 4 for details.
+ */
+ if (cpu_has_load_cet_ctrl()) {
+ vmcs_writel(HOST_S_CET, host_s_cet);
+ vmcs_writel(HOST_SSP, 0);
+ vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+ }
}

void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56aa5a3d3913..01b4f10fa8ab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -113,6 +113,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#endif

static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
+u64 __read_mostly host_s_cet;
+EXPORT_SYMBOL_GPL(host_s_cet);

#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)

@@ -9615,6 +9617,18 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
return -EIO;
}

+ if (boot_cpu_has(X86_FEATURE_SHSTK)) {
+ rdmsrl(MSR_IA32_S_CET, host_s_cet);
+ /*
+ * Linux doesn't yet support supervisor shadow stacks (SSS), so
+ * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+ * clobber the host values. Yell and refuse to load if SSS is
+ * unexpectedly enabled, e.g. to avoid crashing the host.
+ */
+ if (WARN_ON_ONCE(host_s_cet & CET_SHSTK_EN))
+ return -EIO;
+ }
+
x86_emulator_cache = kvm_alloc_emulator_cache();
if (!x86_emulator_cache) {
pr_err("failed to allocate cache for x86 emulator\n");
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 3b79d6db2f83..e42e5263fcf7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -323,6 +323,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);

extern u64 host_xcr0;
extern u64 host_xss;
+extern u64 host_s_cet;

extern struct kvm_caps kvm_caps;

--
2.27.0


2023-08-03 10:03:05

by Yang Weijiang

Subject: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

Add an emulation interface for CET MSR reads and writes.
The emulation code is split into a common part and a vendor-specific
part; the former resides in x86.c to benefit different x86 CPU
vendors, while the latter, for VMX, is implemented in this patch.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 27 +++++++++++
arch/x86/kvm/x86.c | 104 +++++++++++++++++++++++++++++++++++++----
arch/x86/kvm/x86.h | 18 +++++++
3 files changed, 141 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6aa76124e81e..ccf750e79608 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_INT_SSP_TAB:
+ if (kvm_get_msr_common(vcpu, msr_info))
+ return 1;
+ if (msr_info->index == MSR_KVM_GUEST_SSP)
+ msr_info->data = vmcs_readl(GUEST_SSP);
+ else if (msr_info->index == MSR_IA32_S_CET)
+ msr_info->data = vmcs_readl(GUEST_S_CET);
+ else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
@@ -2404,6 +2416,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_INT_SSP_TAB:
+ if (kvm_set_msr_common(vcpu, msr_info))
+ return 1;
+ if (msr_index == MSR_KVM_GUEST_SSP)
+ vmcs_writel(GUEST_SSP, data);
+ else if (msr_index == MSR_IA32_S_CET)
+ vmcs_writel(GUEST_S_CET, data);
+ else if (msr_index == MSR_IA32_INT_SSP_TAB)
+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+ break;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
@@ -4864,6 +4888,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_write64(GUEST_BNDCFGS, 0);

vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
+ vmcs_writel(GUEST_SSP, 0);
+ vmcs_writel(GUEST_S_CET, 0);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, 0);

kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b63441fd2d2..98f3ff6078e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3627,6 +3627,39 @@ static bool kvm_is_msr_to_save(u32 msr_index)
return false;
}

+static inline bool is_shadow_stack_msr(u32 msr)
+{
+ return msr == MSR_IA32_PL0_SSP ||
+ msr == MSR_IA32_PL1_SSP ||
+ msr == MSR_IA32_PL2_SSP ||
+ msr == MSR_IA32_PL3_SSP ||
+ msr == MSR_IA32_INT_SSP_TAB ||
+ msr == MSR_KVM_GUEST_SSP;
+}
+
+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+ struct msr_data *msr)
+{
+ if (is_shadow_stack_msr(msr->index)) {
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return false;
+
+ if (msr->index == MSR_KVM_GUEST_SSP)
+ return msr->host_initiated;
+
+ return msr->host_initiated ||
+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+ }
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return false;
+
+ return msr->host_initiated ||
+ guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u32 msr = msr_info->index;
@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
+#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
+#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
+#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
+ GENMASK_ULL(63, 10))
+#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (!!(data & CET_CTRL_RESERVED_BITS))
+ return 1;
+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_SHSTK_MASK_BITS))
+ return 1;
+ if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_IBT_MASK_BITS))
+ return 1;
+ if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
+ (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
+ return 1;
+ if (msr == MSR_IA32_U_CET)
+ kvm_set_xsave_msr(msr_info);
+ break;
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (is_noncanonical_address(data, vcpu))
+ return 1;
+ if (!IS_ALIGNED(data, 4))
+ return 1;
+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+ msr == MSR_IA32_PL2_SSP) {
+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+ } else if (msr == MSR_IA32_PL3_SSP) {
+ kvm_set_xsave_msr(msr_info);
+ }
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4051,7 +4123,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)

int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
- switch (msr_info->index) {
+ u32 msr = msr_info->index;
+
+ switch (msr) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
case MSR_IA32_LASTBRANCHFROMIP:
@@ -4086,7 +4160,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_get_msr(vcpu, msr_info);
msr_info->data = 0;
break;
@@ -4137,7 +4211,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_MTRRcap:
case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
case MSR_MTRRdefType:
- return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+ return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
case 0xcd: /* fsb frequency */
msr_info->data = 3;
break;
@@ -4159,7 +4233,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = kvm_get_apic_base(vcpu);
break;
case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+ return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
case MSR_IA32_TSC_DEADLINE:
msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
break;
@@ -4253,7 +4327,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+ return get_msr_mce(vcpu, msr, &msr_info->data,
msr_info->host_initiated);
case MSR_IA32_XSS:
if (!msr_info->host_initiated &&
@@ -4284,7 +4358,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case HV_X64_MSR_TSC_EMULATION_STATUS:
case HV_X64_MSR_TSC_INVARIANT_CONTROL:
return kvm_hv_get_msr_common(vcpu,
- msr_info->index, &msr_info->data,
+ msr, &msr_info->data,
msr_info->host_initiated);
case MSR_IA32_BBL_CR_CTL3:
/* This legacy MSR exists but isn't fully documented in current
@@ -4337,8 +4411,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vcpu->arch.guest_fpu.xfd_err;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+ msr == MSR_IA32_PL2_SSP) {
+ msr_info->data =
+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
+ } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
+ kvm_get_xsave_msr(msr_info);
+ }
+ break;
default:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_get_msr(vcpu, msr_info);

/*
@@ -4346,7 +4434,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* to-be-saved, even if an MSR isn't fully supported.
*/
if (msr_info->host_initiated &&
- kvm_is_msr_to_save(msr_info->index)) {
+ kvm_is_msr_to_save(msr)) {
msr_info->data = 0;
break;
}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c69fc027f5ec..3b79d6db2f83 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
unsigned int port, void *data, unsigned int count,
int in);

+/*
+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
+ * access the MSRs to avoid MSR content corruption.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+ kvm_fpu_get();
+ rdmsrl(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
+{
+ kvm_fpu_get();
+ wrmsrl(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
#endif
--
2.27.0


2023-08-04 07:19:15

by Chao Gao

Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
>Add emulation interface for CET MSR read and write.
>The emulation code is split into common part and vendor specific
>part, the former resides in x86.c to benefic different x86 CPU
>vendors, the latter for VMX is implemented in this patch.
>
>Signed-off-by: Yang Weijiang <[email protected]>
>---
> arch/x86/kvm/vmx/vmx.c | 27 +++++++++++
> arch/x86/kvm/x86.c | 104 +++++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/x86.h | 18 +++++++
> 3 files changed, 141 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 6aa76124e81e..ccf750e79608 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
>+ case MSR_IA32_S_CET:
>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_INT_SSP_TAB:
>+ if (kvm_get_msr_common(vcpu, msr_info))
>+ return 1;
>+ if (msr_info->index == MSR_KVM_GUEST_SSP)
>+ msr_info->data = vmcs_readl(GUEST_SSP);
>+ else if (msr_info->index == MSR_IA32_S_CET)
>+ msr_info->data = vmcs_readl(GUEST_S_CET);
>+ else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);

This if-else-if suggests that they are forcibly grouped together just to
share the call of kvm_get_msr_common(). For readability, I think it is better
to handle them separately.

e.g.,
	case MSR_IA32_S_CET:
		if (kvm_get_msr_common(vcpu, msr_info))
			return 1;
		msr_info->data = vmcs_readl(GUEST_S_CET);
		break;

	case MSR_KVM_GUEST_SSP:
		if (kvm_get_msr_common(vcpu, msr_info))
			return 1;
		msr_info->data = vmcs_readl(GUEST_SSP);
		break;

	...


>+ break;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
>@@ -2404,6 +2416,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
>+ case MSR_IA32_S_CET:
>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_INT_SSP_TAB:
>+ if (kvm_set_msr_common(vcpu, msr_info))
>+ return 1;
>+ if (msr_index == MSR_KVM_GUEST_SSP)
>+ vmcs_writel(GUEST_SSP, data);
>+ else if (msr_index == MSR_IA32_S_CET)
>+ vmcs_writel(GUEST_S_CET, data);
>+ else if (msr_index == MSR_IA32_INT_SSP_TAB)
>+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);

ditto

>+ break;
> case MSR_IA32_PERF_CAPABILITIES:
> if (data && !vcpu_to_pmu(vcpu)->version)
> return 1;
>@@ -4864,6 +4888,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> vmcs_write64(GUEST_BNDCFGS, 0);
>
> vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
>+ vmcs_writel(GUEST_SSP, 0);
>+ vmcs_writel(GUEST_S_CET, 0);
>+ vmcs_writel(GUEST_INTR_SSP_TABLE, 0);

where are MSR_IA32_PL3_SSP and MSR_IA32_U_CET reset?

I thought that guest FPU would be reset in kvm_vcpu_reset(). But it turns out
only MPX states are reset in KVM while other FPU states are unchanged. This
is aligned with "Table 10.1 IA-32 and Intel® 64 Processor States Following
Power-up, Reset, or INIT"

Could you double-confirm the hardware behavior that CET states are reset to 0
on INIT? If CET states are reset, we need to handle MSR_IA32_PL3_SSP and
MSR_IA32_U_CET like MPX.

>
> kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 5b63441fd2d2..98f3ff6078e6 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -3627,6 +3627,39 @@ static bool kvm_is_msr_to_save(u32 msr_index)
> return false;
> }
>
>+static inline bool is_shadow_stack_msr(u32 msr)
>+{
>+ return msr == MSR_IA32_PL0_SSP ||
>+ msr == MSR_IA32_PL1_SSP ||
>+ msr == MSR_IA32_PL2_SSP ||
>+ msr == MSR_IA32_PL3_SSP ||
>+ msr == MSR_IA32_INT_SSP_TAB ||
>+ msr == MSR_KVM_GUEST_SSP;
>+}
>+
>+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>+ struct msr_data *msr)
>+{
>+ if (is_shadow_stack_msr(msr->index)) {
>+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>+ return false;
>+
>+ if (msr->index == MSR_KVM_GUEST_SSP)
>+ return msr->host_initiated;
>+
>+ return msr->host_initiated ||
>+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>+ }
>+
>+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
>+ return false;
>+
>+ return msr->host_initiated ||
>+ guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>+}
>+
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> u32 msr = msr_info->index;
>@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.guest_fpu.xfd_err = data;
> break;
> #endif
>+#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>+#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
>+#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
>+#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
>+ GENMASK_ULL(63, 10))
>+#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>+ case MSR_IA32_U_CET:
>+ case MSR_IA32_S_CET:
>+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+ return 1;
>+ if (!!(data & CET_CTRL_RESERVED_BITS))
>+ return 1;
>+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>+ (data & CET_SHSTK_MASK_BITS))
>+ return 1;
>+ if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
>+ (data & CET_IBT_MASK_BITS))
>+ return 1;
>+ if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>+ (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
>+ return 1;
>+ if (msr == MSR_IA32_U_CET)

can you add a comment before this if() statement, like:
/* MSR_IA32_S_CET is handled by vendor code */

>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+ return 1;
>+ if (is_noncanonical_address(data, vcpu))
>+ return 1;
>+ if (!IS_ALIGNED(data, 4))
>+ return 1;
>+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>+ msr == MSR_IA32_PL2_SSP) {
>+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
>+ } else if (msr == MSR_IA32_PL3_SSP) {
>+ kvm_set_xsave_msr(msr_info);
>+ }

brackets are not needed.

also add a comment for MSR_KVM_GUEST_SSP.

>+ break;
> default:
> if (kvm_pmu_is_valid_msr(vcpu, msr))
> return kvm_pmu_set_msr(vcpu, msr_info);
>@@ -4051,7 +4123,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
>
> int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
>- switch (msr_info->index) {
>+ u32 msr = msr_info->index;
>+
>+ switch (msr) {
> case MSR_IA32_PLATFORM_ID:
> case MSR_IA32_EBL_CR_POWERON:
> case MSR_IA32_LASTBRANCHFROMIP:
>@@ -4086,7 +4160,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
> case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
>- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
>+ if (kvm_pmu_is_valid_msr(vcpu, msr))
> return kvm_pmu_get_msr(vcpu, msr_info);
> msr_info->data = 0;
> break;
>@@ -4137,7 +4211,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_MTRRcap:
> case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
> case MSR_MTRRdefType:
>- return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
>+ return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
> case 0xcd: /* fsb frequency */
> msr_info->data = 3;
> break;
>@@ -4159,7 +4233,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data = kvm_get_apic_base(vcpu);
> break;
> case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
>- return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
>+ return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
> case MSR_IA32_TSC_DEADLINE:
> msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
> break;
>@@ -4253,7 +4327,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_MCG_STATUS:
> case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
> case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
>- return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
>+ return get_msr_mce(vcpu, msr, &msr_info->data,
> msr_info->host_initiated);
> case MSR_IA32_XSS:
> if (!msr_info->host_initiated &&
>@@ -4284,7 +4358,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case HV_X64_MSR_TSC_EMULATION_STATUS:
> case HV_X64_MSR_TSC_INVARIANT_CONTROL:
> return kvm_hv_get_msr_common(vcpu,
>- msr_info->index, &msr_info->data,
>+ msr, &msr_info->data,
> msr_info->host_initiated);
> case MSR_IA32_BBL_CR_CTL3:
> /* This legacy MSR exists but isn't fully documented in current
>@@ -4337,8 +4411,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data = vcpu->arch.guest_fpu.xfd_err;
> break;
> #endif
>+ case MSR_IA32_U_CET:
>+ case MSR_IA32_S_CET:
>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+ return 1;
>+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>+ msr == MSR_IA32_PL2_SSP) {
>+ msr_info->data =
>+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
>+ } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
>+ kvm_get_xsave_msr(msr_info);
>+ }

Again, for readability and clarity, how about:

	case MSR_IA32_U_CET:
	case MSR_IA32_PL3_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		kvm_get_xsave_msr(msr_info);
		break;
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
		break;
	case MSR_IA32_S_CET:
	case MSR_KVM_GUEST_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		/* Further handling in vendor code */
		break;

>+ break;
> default:
>- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
>+ if (kvm_pmu_is_valid_msr(vcpu, msr))
> return kvm_pmu_get_msr(vcpu, msr_info);
>
> /*
>@@ -4346,7 +4434,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * to-be-saved, even if an MSR isn't fully supported.
> */
> if (msr_info->host_initiated &&
>- kvm_is_msr_to_save(msr_info->index)) {
>+ kvm_is_msr_to_save(msr)) {
> msr_info->data = 0;
> break;
> }
>diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>index c69fc027f5ec..3b79d6db2f83 100644
>--- a/arch/x86/kvm/x86.h
>+++ b/arch/x86/kvm/x86.h
>@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> unsigned int port, void *data, unsigned int count,
> int in);
>
>+/*
>+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
>+ * access the MSRs to avoid MSR content corruption.
>+ */

I think it is better to describe what the function does prior to jumping into
details like where guest FPU is loaded.

/*
* Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
* by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
* guest FPU should have been loaded already.
*/
>+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>+{
>+ kvm_fpu_get();
>+ rdmsrl(msr_info->index, msr_info->data);
>+ kvm_fpu_put();
>+}
>+
>+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
>+{
>+ kvm_fpu_get();
>+ wrmsrl(msr_info->index, msr_info->data);
>+ kvm_fpu_put();
>+}

Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
and patch 6? And if there is no user outside x86.c, you can just put these two
functions right after the is_xstate_msr() added in patch 6.

>+
> #endif
>--
>2.27.0
>

2023-08-04 08:48:48

by Chao Gao

Subject: Re: [PATCH v5 14/19] KVM:VMX: Set host constant supervisor states to VMCS fields

On Thu, Aug 03, 2023 at 12:27:27AM -0400, Yang Weijiang wrote:
>Set constant values to HOST_{S_CET,SSP,INTR_SSP_TABLE} VMCS
>fields explicitly. Kernel IBT is supported and the setting in
>MSR_IA32_S_CET is static after post-boot(except is BIOS call
>case but vCPU thread never across it.), i.e. KVM doesn't need
>to refresh HOST_S_CET field before every VM-Enter/VM-Exit
>sequence.
>
>Host supervisor shadow stack is not enabled now and SSP is not
>accessible to kernel mode, thus it's safe to set host IA32_INT_
>SSP_TAB/SSP VMCS fields to 0s. When shadow stack is enabled for
>CPL3, SSP is reloaded from IA32_PL3_SSP before it exits to userspace.
>Check SDM Vol 2A/B Chapter 3/4 for SYSCALL/SYSRET/SYSENTER SYSEXIT/
>RDSSP/CALL etc.
>
>Prevent KVM module loading and if host supervisor shadow stack
>SHSTK_EN is set in MSR_IA32_S_CET as KVM cannot co-exit with it
>correctly.
>
>Suggested-by: Sean Christopherson <[email protected]>
>Suggested-by: Chao Gao <[email protected]>
>Signed-off-by: Yang Weijiang <[email protected]>

Reviewed-by: Chao Gao <[email protected]>

2023-08-04 08:51:07

by Chao Gao

Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

>+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>+ return 1;
>+ if (is_noncanonical_address(data, vcpu))
>+ return 1;
>+ if (!IS_ALIGNED(data, 4))
>+ return 1;

Why should MSR_IA32_INT_SSP_TAB be 4-byte aligned? I don't see
this requirement in SDM.

IA32_INTERRUPT_SSP_TABLE_ADDR:

Linear address of a table of seven shadow
stack pointers that are selected in IA-32e
mode using the IST index (when not 0) from
the interrupt gate descriptor. (R/W)
This MSR is not present on processors that
do not support Intel 64 architecture. This
field cannot represent a non-canonical
address.

2023-08-04 22:11:15

by Paolo Bonzini

Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/3/23 06:27, Yang Weijiang wrote:
> + if (msr_info->index == MSR_KVM_GUEST_SSP)
> + msr_info->data = vmcs_readl(GUEST_SSP);

Accesses to MSR_KVM_(GUEST_)SSP must be rejected unless host-initiated.

Paolo


2023-08-04 22:17:43

by Sean Christopherson

Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On Fri, Aug 04, 2023, Chao Gao wrote:
> On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
> >Add emulation interface for CET MSR read and write.
> >The emulation code is split into common part and vendor specific
> >part, the former resides in x86.c to benefic different x86 CPU
> >vendors, the latter for VMX is implemented in this patch.
> >
> >Signed-off-by: Yang Weijiang <[email protected]>
> >---
> > arch/x86/kvm/vmx/vmx.c | 27 +++++++++++
> > arch/x86/kvm/x86.c | 104 +++++++++++++++++++++++++++++++++++++----
> > arch/x86/kvm/x86.h | 18 +++++++
> > 3 files changed, 141 insertions(+), 8 deletions(-)
> >
> >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >index 6aa76124e81e..ccf750e79608 100644
> >--- a/arch/x86/kvm/vmx/vmx.c
> >+++ b/arch/x86/kvm/vmx/vmx.c
> >@@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > else
> > msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> > break;
> >+ case MSR_IA32_S_CET:
> >+ case MSR_KVM_GUEST_SSP:
> >+ case MSR_IA32_INT_SSP_TAB:
> >+ if (kvm_get_msr_common(vcpu, msr_info))
> >+ return 1;
> >+ if (msr_info->index == MSR_KVM_GUEST_SSP)
> >+ msr_info->data = vmcs_readl(GUEST_SSP);
> >+ else if (msr_info->index == MSR_IA32_S_CET)
> >+ msr_info->data = vmcs_readl(GUEST_S_CET);
> >+ else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
> >+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>
> This if-else-if suggests that they are focibly grouped together to just
> share the call of kvm_get_msr_common(). For readability, I think it is better
> to handle them separately.
>
> e.g.,
> case MSR_IA32_S_CET:
> if (kvm_get_msr_common(vcpu, msr_info))
> return 1;
> msr_info->data = vmcs_readl(GUEST_S_CET);
> break;
>
> case MSR_KVM_GUEST_SSP:
> if (kvm_get_msr_common(vcpu, msr_info))
> return 1;
> msr_info->data = vmcs_readl(GUEST_SSP);
> break;

Actually, we can do even better. We have an existing framework for these types
of prechecks, I just completely forgot about it :-( (my "look at PAT" was a bad
suggestion).

Handle the checks in __kvm_set_msr() and __kvm_get_msr(), i.e. *before* calling
into vendor code. Then vendor code doesn't need to make weird callbacks.

> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > {
> > u32 msr = msr_info->index;
> >@@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > vcpu->arch.guest_fpu.xfd_err = data;
> > break;
> > #endif
> >+#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
> >+#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)

Please use a single namespace for these #defines, e.g. CET_CTRL_* or maybe
CET_US_* for everything.

> >+#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
> >+#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
> >+ GENMASK_ULL(63, 10))
> >+#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)

Bah, stupid SDM. Please spell out "LEGACY", I thought "LEG" was short for "LEGAL"
since this looks a lot like a page shift, i.e. getting a pfn.

> >+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> >+ struct msr_data *msr)
> >+{
> >+ if (is_shadow_stack_msr(msr->index)) {
> >+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> >+ return false;
> >+
> >+ if (msr->index == MSR_KVM_GUEST_SSP)
> >+ return msr->host_initiated;
> >+
> >+ return msr->host_initiated ||
> >+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> >+ }
> >+
> >+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> >+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
> >+ return false;
> >+
> >+ return msr->host_initiated ||
> >+ guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> >+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

Similar to my suggestion for XSS, I think we should drop the waiver for
host_initiated accesses, i.e. require the feature to be enabled and exposed to
the guest, even for the host.

> >diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> >index c69fc027f5ec..3b79d6db2f83 100644
> >--- a/arch/x86/kvm/x86.h
> >+++ b/arch/x86/kvm/x86.h
> >@@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> > unsigned int port, void *data, unsigned int count,
> > int in);
> >
> >+/*
> >+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
> >+ * access the MSRs to avoid MSR content corruption.
> >+ */
>
> I think it is better to describe what the function does prior to jumping into
> details like where guest FPU is loaded.
>
> /*
> * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
> * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
> * guest FPU should have been loaded already.
> */
> >+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
> >+{
> >+ kvm_fpu_get();
> >+ rdmsrl(msr_info->index, msr_info->data);
> >+ kvm_fpu_put();
> >+}
> >+
> >+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
> >+{
> >+ kvm_fpu_get();
> >+ wrmsrl(msr_info->index, msr_info->data);
> >+ kvm_fpu_put();
> >+}
>
> Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
> and patch 6? And if there is no user outside x86.c, you can just put these two
> functions right after the is_xstate_msr() added in patch 6.

+1. These should also assert that (a) guest FPU state is loaded and (b) the MSR
is passed through to the guest. I might be ok dropping (b) if both VMX and SVM
passthrough all MSRs if they're exposed to the guest, i.e. not lazily passed
through.

Sans any changes to kvm_{g,s}et_xsave_msr(), I think this? (completely untested)


---
arch/x86/kvm/vmx/vmx.c | 34 +++-------
arch/x86/kvm/x86.c | 151 +++++++++++++++--------------------------
2 files changed, 64 insertions(+), 121 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 491039aeb61b..1211eb469d06 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2100,16 +2100,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
case MSR_IA32_S_CET:
+ msr_info->data = vmcs_readl(GUEST_S_CET);
+ break;
case MSR_KVM_GUEST_SSP:
+ msr_info->data = vmcs_readl(GUEST_SSP);
+ break;
case MSR_IA32_INT_SSP_TAB:
- if (kvm_get_msr_common(vcpu, msr_info))
- return 1;
- if (msr_info->index == MSR_KVM_GUEST_SSP)
- msr_info->data = vmcs_readl(GUEST_SSP);
- else if (msr_info->index == MSR_IA32_S_CET)
- msr_info->data = vmcs_readl(GUEST_S_CET);
- else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
- msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
@@ -2432,25 +2429,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
- case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
- if (kvm_set_msr_common(vcpu, msr_info))
- return 1;
- if (data) {
- vmx_disable_write_intercept_sss_msr(vcpu);
- wrmsrl(msr_index, data);
- }
- break;
case MSR_IA32_S_CET:
+ vmcs_writel(GUEST_S_CET, data);
+ break;
case MSR_KVM_GUEST_SSP:
+ vmcs_writel(GUEST_SSP, data);
+ break;
case MSR_IA32_INT_SSP_TAB:
- if (kvm_set_msr_common(vcpu, msr_info))
- return 1;
- if (msr_index == MSR_KVM_GUEST_SSP)
- vmcs_writel(GUEST_SSP, data);
- else if (msr_index == MSR_IA32_S_CET)
- vmcs_writel(GUEST_S_CET, data);
- else if (msr_index == MSR_IA32_INT_SSP_TAB)
- vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);
break;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7385fc25a987..75e6de7c9268 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1838,6 +1838,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
}
EXPORT_SYMBOL_GPL(kvm_msr_allowed);

+#define CET_US_RESERVED_BITS GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
+
/*
* Write @data into the MSR specified by @index. Select MSR specific fault
* checks are bypassed if @host_initiated is %true.
@@ -1897,6 +1902,35 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,

data = (u32)data;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_can_use(vcpu, X86_FEATURE_IBT))
+ return 1;
+ if (data & CET_US_RESERVED_BITS)
+ return 1;
+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_US_SHSTK_MASK_BITS))
+ return 1;
+ if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_US_IBT_MASK_BITS))
+ return 1;
+ if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+ return 1;
+
+ /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+ if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+ return 1;
+ break;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ case MSR_KVM_GUEST_SSP:
+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
+ return 1;
+ if (is_noncanonical_address(data, vcpu))
+ return 1;
+ if (!IS_ALIGNED(data, 4))
+ return 1;
+ break;
}

msr.data = data;
@@ -1940,6 +1974,17 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
!guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
return 1;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+ !guest_can_use(vcpu, X86_FEATURE_SHSTK))
+ return 1;
+ break;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ case MSR_KVM_GUEST_SSP:
+ if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
+ return 1;
+ break;
}

msr.index = index;
@@ -3640,47 +3685,6 @@ static bool kvm_is_msr_to_save(u32 msr_index)
return false;
}

-static inline bool is_shadow_stack_msr(u32 msr)
-{
- return msr == MSR_IA32_PL0_SSP ||
- msr == MSR_IA32_PL1_SSP ||
- msr == MSR_IA32_PL2_SSP ||
- msr == MSR_IA32_PL3_SSP ||
- msr == MSR_IA32_INT_SSP_TAB ||
- msr == MSR_KVM_GUEST_SSP;
-}
-
-static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
- struct msr_data *msr)
-{
- if (is_shadow_stack_msr(msr->index)) {
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
- return false;
-
- /*
- * This MSR is synthesized mainly for userspace access during
- * Live Migration, it also can be accessed in SMM mode by VMM.
- * Guest is not allowed to access this MSR.
- */
- if (msr->index == MSR_KVM_GUEST_SSP) {
- if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
- return true;
-
- return msr->host_initiated;
- }
-
- return msr->host_initiated ||
- guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
- }
-
- if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
- !kvm_cpu_cap_has(X86_FEATURE_IBT))
- return false;
-
- return msr->host_initiated ||
- guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
- guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
-}

int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
@@ -4036,46 +4040,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
-#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
-#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
-#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
-#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
- GENMASK_ULL(63, 10))
-#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
- return 1;
- if (!!(data & CET_CTRL_RESERVED_BITS))
- return 1;
- if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
- (data & CET_SHSTK_MASK_BITS))
- return 1;
- if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
- (data & CET_IBT_MASK_BITS))
- return 1;
- if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
- (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
- return 1;
- if (msr == MSR_IA32_U_CET)
- kvm_set_xsave_msr(msr_info);
- break;
- case MSR_KVM_GUEST_SSP:
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
- return 1;
- if (is_noncanonical_address(data, vcpu))
- return 1;
- if (!IS_ALIGNED(data, 4))
- return 1;
- if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
- msr == MSR_IA32_PL2_SSP) {
- vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
- if (!vcpu->arch.cet_sss_active && data)
- vcpu->arch.cet_sss_active = true;
- } else if (msr == MSR_IA32_PL3_SSP) {
- kvm_set_xsave_msr(msr_info);
- }
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_set_xsave_msr(msr_info);
break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
@@ -4436,17 +4403,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
#endif
case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
- case MSR_KVM_GUEST_SSP:
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
- return 1;
- if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
- msr == MSR_IA32_PL2_SSP) {
- msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
- } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
- kvm_get_xsave_msr(msr_info);
- }
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_get_xsave_msr(msr_info);
break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
@@ -7330,9 +7288,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
break;
case MSR_IA32_U_CET:
case MSR_IA32_S_CET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return;
+ break;
case MSR_KVM_GUEST_SSP:
case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
- if (!kvm_is_cet_supported())
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
return;
break;
default:
@@ -9664,13 +9626,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
}
if (boot_cpu_has(X86_FEATURE_XSAVES)) {
- u32 eax, ebx, ecx, edx;
-
- cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
rdmsrl(MSR_IA32_XSS, host_xss);
kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
- if (ecx & XFEATURE_MASK_CET_KERNEL)
- kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
}

rdmsrl_safe(MSR_EFER, &host_efer);

base-commit: efb9177acd7a4df5883b844e1ec9c69ef0899c9c
--


2023-08-04 22:44:58

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM

This is not "refinement", this is full on supporting a new nVMX feature. Please
phrase the shortlog accordingly, e.g. something like this (it's not very good,
but it's a start).

KVM: nVMX: Add support for exposing "No PM H/W error code checks" to L1

Regarding shortlog, please update all of them in this series to put a space after
the colon, i.e. "KVM: VMX:" and "KVM: x86:", not "KVM:x86:".

> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 96952263b029..1884628294e4 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> __kvm_is_valid_cr4(vcpu, val);
> }
>
> +static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> + return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;

The "CC" part of my suggestion is critical to this being sane. As is, this reads
"nested CPU has no hardware error code", which is not even remotely close to the
truth.

static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
{
return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
}

[*] https://lore.kernel.org/all/[email protected]

2023-08-04 23:58:15

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/4/23 23:27, Sean Christopherson wrote:
>>> +
>>> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> + return false;
>>> +
>>> + return msr->host_initiated ||
>>> + guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>>> + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>
> Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> accesses, i.e. require the feature to be enabled and exposed to the guest, even
> for the host.

No, please don't. Allowing host-initiated accesses is what makes it
possible to take the list of MSR indices and pass it blindly to
KVM_GET_MSR and KVM_SET_MSR. This should be documented, will send a patch.
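
For context, the flow userspace (e.g. a live-migration save path) relies on
looks roughly like the sketch below; it is only an illustration of that
pattern, not code from this series, and the two helper functions are made-up
names:

	/* Illustration only: save every MSR that KVM reports as savable. */
	struct kvm_msr_list *list = get_msr_index_list(sys_fd);	/* KVM_GET_MSR_INDEX_LIST, hypothetical helper */
	struct kvm_msrs *msrs = alloc_msrs_buffer(list->nmsrs);	/* hypothetical helper */

	for (i = 0; i < list->nmsrs; i++)
		msrs->entries[i].index = list->indices[i];
	msrs->nmsrs = list->nmsrs;

	/* Host-initiated reads must succeed for every index in the list. */
	ioctl(vcpu_fd, KVM_GET_MSRS, msrs);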

Paolo


2023-08-04 23:58:19

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On Fri, Aug 04, 2023, Paolo Bonzini wrote:
> On 8/4/23 23:27, Sean Christopherson wrote:
> > > > +
> > > > + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> > > > + !kvm_cpu_cap_has(X86_FEATURE_IBT))
> > > > + return false;
> > > > +
> > > > + return msr->host_initiated ||
> > > > + guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> > > > + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> >
> > Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> > accesses, i.e. require the feature to be enabled and exposed to the guest, even
> > for the host.
>
> No, please don't. Allowing host-initiated accesses is what makes it
> possible to take the list of MSR indices and pass it blindly to KVM_GET_MSR
> and KVM_SET_MSR.

I don't see how that can work today. Oooh, the MSRs that don't exempt host_initiated
are added to the list of MSRs to save/restore, i.e. KVM "silently" supports
MSR_AMD64_OSVW_ID_LENGTH and MSR_AMD64_OSVW_STATUS.

And guest_pv_has() returns true unless userspace has opted in to enforcement.

Sad panda.

That means we need to figure out a solution for KVM stuffing GUEST_SSP on RSM,
which is a "host" write but a guest controlled value.

2023-08-06 09:55:13

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/5/2023 5:27 AM, Sean Christopherson wrote:
> On Fri, Aug 04, 2023, Chao Gao wrote:
>> On Thu, Aug 03, 2023 at 12:27:24AM -0400, Yang Weijiang wrote:
>>> Add emulation interface for CET MSR read and write.
>>> The emulation code is split into common part and vendor specific
>>> part, the former resides in x86.c to benefic different x86 CPU
>>> vendors, the latter for VMX is implemented in this patch.
>>>
>>> Signed-off-by: Yang Weijiang <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx/vmx.c | 27 +++++++++++
>>> arch/x86/kvm/x86.c | 104 +++++++++++++++++++++++++++++++++++++----
>>> arch/x86/kvm/x86.h | 18 +++++++
>>> 3 files changed, 141 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 6aa76124e81e..ccf750e79608 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -2095,6 +2095,18 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> else
>>> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>>> break;
>>> + case MSR_IA32_S_CET:
>>> + case MSR_KVM_GUEST_SSP:
>>> + case MSR_IA32_INT_SSP_TAB:
>>> + if (kvm_get_msr_common(vcpu, msr_info))
>>> + return 1;
>>> + if (msr_info->index == MSR_KVM_GUEST_SSP)
>>> + msr_info->data = vmcs_readl(GUEST_SSP);
>>> + else if (msr_info->index == MSR_IA32_S_CET)
>>> + msr_info->data = vmcs_readl(GUEST_S_CET);
>>> + else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>>> + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>> This if-else-if suggests that they are forcibly grouped together to just
>> share the call of kvm_get_msr_common(). For readability, I think it is better
>> to handle them separately.
>>
>> e.g.,
>> case MSR_IA32_S_CET:
>> if (kvm_get_msr_common(vcpu, msr_info))
>> return 1;
>> msr_info->data = vmcs_readl(GUEST_S_CET);
>> break;
>>
>> case MSR_KVM_GUEST_SSP:
>> if (kvm_get_msr_common(vcpu, msr_info))
>> return 1;
>> msr_info->data = vmcs_readl(GUEST_SSP);
>> break;
> Actually, we can do even better. We have an existing framework for these types
> of prechecks; I just completely forgot about it :-( (my "look at PAT" was a bad
> suggestion).
>
> Handle the checks in __kvm_set_msr() and __kvm_get_msr(), i.e. *before* calling
> into vendor code. Then vendor code doesn't need to make weird callbacks.
I see, will change it, thank you!
>>> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> {
>>> u32 msr = msr_info->index;
>>> @@ -3981,6 +4014,45 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>> vcpu->arch.guest_fpu.xfd_err = data;
>>> break;
>>> #endif
>>> +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>>> +#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
> Please use a single namespace for these #defines, e.g. CET_CTRL_* or maybe
> CET_US_* for everything.
OK.
>>> +#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
>>> +#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
>>> + GENMASK_ULL(63, 10))
>>> +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
> Bah, stupid SDM. Please spell out "LEGACY", I thought "LEG" was short for "LEGAL"
> since this looks a lot like a page shift, i.e. getting a pfn.
Sure :-)
>>> +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>>> + struct msr_data *msr)
>>> +{
>>> + if (is_shadow_stack_msr(msr->index)) {
>>> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>>> + return false;
>>> +
>>> + if (msr->index == MSR_KVM_GUEST_SSP)
>>> + return msr->host_initiated;
>>> +
>>> + return msr->host_initiated ||
>>> + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>>> + }
>>> +
>>> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>>> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
>>> + return false;
>>> +
>>> + return msr->host_initiated ||
>>> + guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
>>> + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> Similar to my suggestion for XSS, I think we drop the waiver for host_initiated
> accesses, i.e. require the feature to be enabled and exposed to the guest, even
> for the host.
I saw Paolo shared a different opinion on this, so I'll hold off for a while...
>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>> index c69fc027f5ec..3b79d6db2f83 100644
>>> --- a/arch/x86/kvm/x86.h
>>> +++ b/arch/x86/kvm/x86.h
>>> @@ -552,4 +552,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>>> unsigned int port, void *data, unsigned int count,
>>> int in);
>>>
>>> +/*
>>> + * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
>>> + * access the MSRs to avoid MSR content corruption.
>>> + */
>> I think it is better to describe what the function does prior to jumping into
>> details like where guest FPU is loaded.
OK, will do it, thanks!
>> /*
>> * Lock and/or reload guest FPU and access xstate MSRs. For accesses initiated
>> * by host, guest FPU is loaded in __msr_io(). For accesses initiated by guest,
>> * guest FPU should have been loaded already.
>> */
>>> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
>>> +{
>>> + kvm_fpu_get();
>>> + rdmsrl(msr_info->index, msr_info->data);
>>> + kvm_fpu_put();
>>> +}
>>> +
>>> +static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
>>> +{
>>> + kvm_fpu_get();
>>> + wrmsrl(msr_info->index, msr_info->data);
>>> + kvm_fpu_put();
>>> +}
>> Can you rename functions to kvm_get/set_xstate_msr() to align with the comment
>> and patch 6? And if there is no user outside x86.c, you can just put these two
>> functions right after the is_xstate_msr() added in patch 6.
OK, I think I added the helpers in this patch due to a "function is defined but not used" compilation error.
> +1. These should also assert that (a) guest FPU state is loaded and
Do you mean something like this:
WARN_ON_ONCE(!vcpu->arch.guest_fpu->in_use) or  KVM_BUG_ON()
added in the helpers?
> (b) the MSR
> is passed through to the guest. I might be ok dropping (b) if both VMX and SVM
> passthrough all MSRs if they're exposed to the guest, i.e. not lazily passed
> through.
I'm OK with adding the assert if, in the end, all the CET MSRs are passed through directly.
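For discussion's sake, a minimal sketch of how the renamed helpers could look
with such an assertion (they take the vcpu so the flag can be checked; the
in_use flag below is the hypothetical field from my question above, not an
existing kernel name):

	static inline void kvm_get_xstate_msr(struct kvm_vcpu *vcpu,
					      struct msr_data *msr_info)
	{
		/* Guest xstate must already be loaded, see __msr_io(). */
		WARN_ON_ONCE(!vcpu->arch.guest_fpu.in_use);
		kvm_fpu_get();
		rdmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}

	static inline void kvm_set_xstate_msr(struct kvm_vcpu *vcpu,
					      struct msr_data *msr_info)
	{
		WARN_ON_ONCE(!vcpu->arch.guest_fpu.in_use);
		kvm_fpu_get();
		wrmsrl(msr_info->index, msr_info->data);
		kvm_fpu_put();
	}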
> Sans any changes to kvm_{g,s}et_xsave_msr(), I think this? (completely untested)
>
>
> ---
> arch/x86/kvm/vmx/vmx.c | 34 +++-------
> arch/x86/kvm/x86.c | 151 +++++++++++++++--------------------------
> 2 files changed, 64 insertions(+), 121 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 491039aeb61b..1211eb469d06 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2100,16 +2100,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
> case MSR_IA32_S_CET:
> + msr_info->data = vmcs_readl(GUEST_S_CET);
> + break;
> case MSR_KVM_GUEST_SSP:
> + msr_info->data = vmcs_readl(GUEST_SSP);
> + break;
> case MSR_IA32_INT_SSP_TAB:
> - if (kvm_get_msr_common(vcpu, msr_info))
> - return 1;
> - if (msr_info->index == MSR_KVM_GUEST_SSP)
> - msr_info->data = vmcs_readl(GUEST_SSP);
> - else if (msr_info->index == MSR_IA32_S_CET)
> - msr_info->data = vmcs_readl(GUEST_S_CET);
> - else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
> - msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
> break;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> @@ -2432,25 +2429,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
> - case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
> - if (kvm_set_msr_common(vcpu, msr_info))
> - return 1;
> - if (data) {
> - vmx_disable_write_intercept_sss_msr(vcpu);
> - wrmsrl(msr_index, data);
> - }
> - break;
> case MSR_IA32_S_CET:
> + vmcs_writel(GUEST_S_CET, data);
> + break;
> case MSR_KVM_GUEST_SSP:
> + vmcs_writel(GUEST_SSP, data);
> + break;
> case MSR_IA32_INT_SSP_TAB:
> - if (kvm_set_msr_common(vcpu, msr_info))
> - return 1;
> - if (msr_index == MSR_KVM_GUEST_SSP)
> - vmcs_writel(GUEST_SSP, data);
> - else if (msr_index == MSR_IA32_S_CET)
> - vmcs_writel(GUEST_S_CET, data);
> - else if (msr_index == MSR_IA32_INT_SSP_TAB)
> - vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> + vmcs_writel(GUEST_INTR_SSP_TABLE, data);
> break;
> case MSR_IA32_PERF_CAPABILITIES:
> if (data && !vcpu_to_pmu(vcpu)->version)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7385fc25a987..75e6de7c9268 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1838,6 +1838,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
> }
> EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>
> +#define CET_US_RESERVED_BITS GENMASK(9, 6)
> +#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
> +#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
> +#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
> +
> /*
> * Write @data into the MSR specified by @index. Select MSR specific fault
> * checks are bypassed if @host_initiated is %true.
> @@ -1897,6 +1902,35 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>
> data = (u32)data;
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_S_CET:
> + if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> + !guest_can_use(vcpu, X86_FEATURE_IBT))
> + return 1;
> + if (data & CET_US_RESERVED_BITS)
> + return 1;
> + if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> + (data & CET_US_SHSTK_MASK_BITS))
> + return 1;
> + if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> + (data & CET_US_IBT_MASK_BITS))
> + return 1;
> + if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
> + return 1;
> +
> + /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
> + if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
> + return 1;
> + break;
> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> + case MSR_KVM_GUEST_SSP:
> + if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
> + return 1;
> + if (is_noncanonical_address(data, vcpu))
> + return 1;
> + if (!IS_ALIGNED(data, 4))
> + return 1;
> + break;
> }
>
> msr.data = data;
> @@ -1940,6 +1974,17 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
> !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> return 1;
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_S_CET:
> + if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> + !guest_can_use(vcpu, X86_FEATURE_SHSTK))
> + return 1;
> + break;
> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> + case MSR_KVM_GUEST_SSP:
> + if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
> + return 1;
> + break;
> }
>
> msr.index = index;
> @@ -3640,47 +3685,6 @@ static bool kvm_is_msr_to_save(u32 msr_index)
> return false;
> }
>
> -static inline bool is_shadow_stack_msr(u32 msr)
> -{
> - return msr == MSR_IA32_PL0_SSP ||
> - msr == MSR_IA32_PL1_SSP ||
> - msr == MSR_IA32_PL2_SSP ||
> - msr == MSR_IA32_PL3_SSP ||
> - msr == MSR_IA32_INT_SSP_TAB ||
> - msr == MSR_KVM_GUEST_SSP;
> -}
> -
> -static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> - struct msr_data *msr)
> -{
> - if (is_shadow_stack_msr(msr->index)) {
> - if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> - return false;
> -
> - /*
> - * This MSR is synthesized mainly for userspace access during
> - * Live Migration, it also can be accessed in SMM mode by VMM.
> - * Guest is not allowed to access this MSR.
> - */
> - if (msr->index == MSR_KVM_GUEST_SSP) {
> - if (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu))
> - return true;
> -
> - return msr->host_initiated;
> - }
> -
> - return msr->host_initiated ||
> - guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> - }
> -
> - if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> - !kvm_cpu_cap_has(X86_FEATURE_IBT))
> - return false;
> -
> - return msr->host_initiated ||
> - guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> - guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
> -}
>
> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> @@ -4036,46 +4040,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.guest_fpu.xfd_err = data;
> break;
> #endif
> -#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
> -#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
> -#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
> -#define CET_IBT_MASK_BITS (GENMASK_ULL(5, 2) | \
> - GENMASK_ULL(63, 10))
> -#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
> case MSR_IA32_U_CET:
> - case MSR_IA32_S_CET:
> - if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> - return 1;
> - if (!!(data & CET_CTRL_RESERVED_BITS))
> - return 1;
> - if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> - (data & CET_SHSTK_MASK_BITS))
> - return 1;
> - if (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> - (data & CET_IBT_MASK_BITS))
> - return 1;
> - if (!IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
> - (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
> - return 1;
> - if (msr == MSR_IA32_U_CET)
> - kvm_set_xsave_msr(msr_info);
> - break;
> - case MSR_KVM_GUEST_SSP:
> - case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> - if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> - return 1;
> - if (is_noncanonical_address(data, vcpu))
> - return 1;
> - if (!IS_ALIGNED(data, 4))
> - return 1;
> - if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> - msr == MSR_IA32_PL2_SSP) {
> - vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
> - if (!vcpu->arch.cet_sss_active && data)
> - vcpu->arch.cet_sss_active = true;
> - } else if (msr == MSR_IA32_PL3_SSP) {
> - kvm_set_xsave_msr(msr_info);
> - }
> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> + kvm_set_xsave_msr(msr_info);
> break;
> default:
> if (kvm_pmu_is_valid_msr(vcpu, msr))
> @@ -4436,17 +4403,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> break;
> #endif
> case MSR_IA32_U_CET:
> - case MSR_IA32_S_CET:
> - case MSR_KVM_GUEST_SSP:
> - case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> - if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> - return 1;
> - if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> - msr == MSR_IA32_PL2_SSP) {
> - msr_info->data = vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
> - } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
> - kvm_get_xsave_msr(msr_info);
> - }
> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> + kvm_get_xsave_msr(msr_info);
> break;
> default:
> if (kvm_pmu_is_valid_msr(vcpu, msr))
> @@ -7330,9 +7288,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
> break;
> case MSR_IA32_U_CET:
> case MSR_IA32_S_CET:
> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
> + return;
> + break;
> case MSR_KVM_GUEST_SSP:
> case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> - if (!kvm_is_cet_supported())
> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> return;
> break;
> default:
> @@ -9664,13 +9626,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
> }
> if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> - u32 eax, ebx, ecx, edx;
> -
> - cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
> rdmsrl(MSR_IA32_XSS, host_xss);
> kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
> - if (ecx & XFEATURE_MASK_CET_KERNEL)
> - kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
> }
>
> rdmsrl_safe(MSR_EFER, &host_efer);
>
> base-commit: efb9177acd7a4df5883b844e1ec9c69ef0899c9c
The code looks good to me except for the handling of MSR_KVM_GUEST_SSP:
non-host-initiated reads/writes should be prevented.


2023-08-07 07:18:44

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/5/23 00:21, Sean Christopherson wrote:
> Oooh, the MSRs that don't exempt host_initiated are added to the list

(are *not* added)

> of MSRs to save/restore, i.e. KVM "silently" supports
> MSR_AMD64_OSVW_ID_LENGTH and MSR_AMD64_OSVW_STATUS.
>
> And guest_pv_has() returns true unless userspace has opted in to
> enforcement.

Two different ways of having the same bug. The latter was introduced in
the implementation of KVM_CAP_ENFORCE_PV_FEATURE_CPUID; it would become
a problem if some selftests started using it.

Paolo


2023-08-07 08:13:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/6/23 10:44, Yang, Weijiang wrote:
>> Similar to my suggestion for XSS, I think we drop the waiver for
>> host_initiated
>> accesses, i.e. require the feature to be enabled and exposed to the
>> guest, even
>> for the host.
>
> I saw Paolo shares different opinion on this, so would hold on for a
> while...

It's not *so* different: the host initiated access should be allowed,
but it should only allow writing zero. So, something like:

> +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> + struct msr_data *msr)
> +{

	bool host_msr_reset = msr->host_initiated && msr->data == 0;

and then below you use host_msr_reset instead of msr->host_initiated.

> + if (msr->index == MSR_KVM_GUEST_SSP)
> + return msr->host_initiated;
> +
> + return msr->host_initiated ||
> + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

This can be unified like this:

	return (host_msr_reset || guest_cpuid_has(vcpu, X86_FEATURE_SHSTK)) &&
	       (msr->index != MSR_KVM_GUEST_SSP || msr->host_initiated);

> + }
> +
> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
> + return false;
> +
> + return msr->host_initiated ||
> + guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
> + guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);

while this can simply use host_msr_reset.
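
Putting it together, the whole helper would then look roughly like this
(untested sketch of the above; note host_msr_reset only really applies on
the write path, since it looks at msr->data):

	static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
					      struct msr_data *msr)
	{
		bool host_msr_reset = msr->host_initiated && msr->data == 0;

		if (is_shadow_stack_msr(msr->index)) {
			if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
				return false;

			return (host_msr_reset ||
				guest_cpuid_has(vcpu, X86_FEATURE_SHSTK)) &&
			       (msr->index != MSR_KVM_GUEST_SSP ||
				msr->host_initiated);
		}

		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
			return false;

		return host_msr_reset ||
		       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
	}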

Paolo


2023-08-09 04:08:11

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 18/19] KVM:nVMX: Refine error code injection to nested VM

On 8/5/2023 5:38 AM, Sean Christopherson wrote:
> This is not "refinement", this is full on supporting a new nVMX feature. Please
> phrase the shortlog accordingly, e.g. something like this (it's not very good,
> but it's a start).
>
> KVM: nVMX: Add support for exposing "No PM H/W error code checks" to L1
>
> Regarding shortlog, please update all of them in this series to put a space after
> the colon, i.e. "KVM: VMX:" and "KVM: x86:", not "KVM:x86:".
OK, will update this part.
>> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
>> index 96952263b029..1884628294e4 100644
>> --- a/arch/x86/kvm/vmx/nested.h
>> +++ b/arch/x86/kvm/vmx/nested.h
>> @@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
>> __kvm_is_valid_cr4(vcpu, val);
>> }
>>
>> +static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_vmx *vmx = to_vmx(vcpu);
>> +
>> + return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
> The "CC" part of my suggestion is critical to this being sane. As is, this reads
> "nested CPU has no hardware error code", which is not even remotely close to the
> truth.
Understood, I wasn't aware of the significance of the "CC" suffix.
> static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
> {
> return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> }
>
> [*] https://lore.kernel.org/all/[email protected]


2023-08-09 07:41:11

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/5/2023 5:40 AM, Paolo Bonzini wrote:
> On 8/3/23 06:27, Yang Weijiang wrote:
>> +        if (msr_info->index == MSR_KVM_GUEST_SSP)
>> +            msr_info->data = vmcs_readl(GUEST_SSP);
>
> Accesses to MSR_KVM_(GUEST_)SSP must be rejected unless host-initiated.
Yes, it's kept; in v5 it's folded in:

+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+				      struct msr_data *msr)
+{
+	if (is_shadow_stack_msr(msr->index)) {
+		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+			return false;
+
+		if (msr->index == MSR_KVM_GUEST_SSP)
+			return msr->host_initiated;
+
+		return msr->host_initiated ||
+		       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+	}
+
+	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		return false;
+
+	return msr->host_initiated ||
+	       guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+	       guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}

> Paolo
>


2023-08-09 08:34:24

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 11/19] KVM:VMX: Emulate read and write to CET MSRs

On 8/4/2023 4:28 PM, Chao Gao wrote:
>> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> + if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> + return 1;
>> + if (is_noncanonical_address(data, vcpu))
>> + return 1;
>> + if (!IS_ALIGNED(data, 4))
>> + return 1;
> Why should MSR_IA32_INT_SSP_TAB be 4-byte aligned? I don't see
> this requirement in SDM.
My memory must have failed me; thanks for catching it!
> IA32_INTERRUPT_SSP_TABLE_ADDR:
>
> Linear address of a table of seven shadow
> stack pointers that are selected in IA-32e
> mode using the IST index (when not 0) from
> the interrupt gate descriptor. (R/W)
> This MSR is not present on processors that
> do not support Intel 64 architecture. This
> field cannot represent a non-canonical
> address.
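
So only the canonical check applies to the table address. A possible
adjustment would be something like the untested sketch below, assuming the
check lives in __kvm_set_msr() as in Sean's sketch above, where the MSR
index variable is named "index":

	case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
	case MSR_KVM_GUEST_SSP:
		if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
			return 1;
		if (is_noncanonical_address(data, vcpu))
			return 1;
		/*
		 * SSP values must be 4-byte aligned; the IST table address
		 * only needs to be canonical.
		 */
		if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
			return 1;
		break;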