2023-07-21 06:10:28

by Yang, Weijiang

Subject: [PATCH v4 00/20] Enable CET Virtualization

Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two
sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), to
defend against this style of control-flow subversion.

Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.

Indirect Branch Tracking (IBT):
IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of
indirect branches (CALL, JMP, etc.). If an indirect branch is executed and the
next instruction is _not_ an ENDBRANCH, the processor generates a #CP. These
instructions behave as NOPs on platforms that don't support CET.
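
For illustration only (not part of this series), a minimal user-space sketch of
what IBT enforces on indirect calls; the function names below are made up:

#include <stdio.h>

/*
 * When built with a CET-enabled toolchain (e.g. gcc -fcf-protection=branch),
 * the compiler emits ENDBR64 at the entry of address-taken functions such as
 * this one, making it a legal indirect-branch target.
 */
static void valid_target(void)
{
	printf("landed on an ENDBR-marked entry point\n");
}

int main(void)
{
	void (*fp)(void) = valid_target;

	/*
	 * Indirect call: with IBT enabled the CPU requires the instruction at
	 * *fp to be ENDBR64. If an attacker redirected fp into the middle of
	 * some function (no ENDBR64 there), the processor would raise #CP
	 * instead of executing the gadget.
	 */
	fp();
	return 0;
}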


Dependency:
--------------------------------------------------------------------------
The first two patches are taken from the CET native series[1] in the kernel tip
tree. They are prerequisites for this KVM series because the CET user-mode
xstate and some feature bits are defined there. Add this KVM series to the
kernel tree to build a host kernel that supports guest CET features, and apply
the QEMU CET enabling patches[2] to build a capable QEMU. These kernel-dependent
patches will be carried in the KVM series until the CET native series is merged
into the mainline tree.


Implementation:
--------------------------------------------------------------------------
This series enables full support for guest CET SHSTK/IBT register states,
i.e., the CET register states in the usage models below are backed by KVM.

                 |
  User SHSTK     |     User IBT          (user mode)
  ----------------------------------------------------
  Kernel SHSTK   |     Kernel IBT        (kernel mode)
                 |

KVM cooperates with the host kernel to back the CET register states in each
model. In this series, KVM manages the guest CET kernel-mode registers (MSRs)
itself and relies on the host kernel to manage the user-mode registers, so KVM
checks the CET capability in the host XSS MSR before exposing CET features to
the guest.

Note that guest supervisor (kernel) SHSTK cannot be fully supported by this
series, therefore the guest SSS_CET bit, CPUID.(EAX=7,ECX=1):EDX[bit 18], is
cleared. See the SDM (Vol. 1, Section 17.2.3) for details.


CET states management:
--------------------------------------------------------------------------
The CET user-mode states, MSR_IA32_{U_CET,PL3_SSP}, depend on the
{XSAVES,XRSTORS} instructions to swap guest and host state. Guest user-mode
states are saved to the guest fpu area and host user-mode states are loaded
from the thread context before the vCPU returns to userspace, and vice versa
on entry; see kvm_{load,put}_guest_fpu() for details. CET user-mode state
management therefore depends on the CET user-mode bit (U_CET) being set in the
host XSS MSR.
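
For reference, a simplified sketch (not the actual KVM code; it leans on the
upstream fpu_swap_kvm_fpstate() helper) of how the user-mode CET state rides
along with the rest of the guest FPU state:

/*
 * Sketch: because MSR_IA32_U_CET/MSR_IA32_PL3_SSP live in the XSAVES image
 * once XSS.U_CET is enabled, swapping the guest/host fpstate implicitly
 * swaps the user-mode CET state as well.
 */
static void sketch_load_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* XSAVES host user state, XRSTORS guest state from guest_fpu. */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true);
}

static void sketch_put_guest_fpu(struct kvm_vcpu *vcpu)
{
	/* XSAVES guest user state (incl. U_CET/PL3_SSP), XRSTORS host state. */
	fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false);
}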

CET supervisor-mode states are grouped into two categories: XSAVE-managed and
non-XSAVE-managed. The former includes MSR_IA32_PL{0,1,2}_SSP and is controlled
by the CET supervisor-mode bit (S_CET) in XSS; the latter consists of
MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL.

The XSAVE-managed supervisor states could in principle be handled by setting
the S_CET bit in the host XSS. But given that supervisor shadow stack isn't
enabled in the Linux kernel, enabling that control bit the same way as for the
user-mode states has global side effects on all threads/tasks running on the
host, i.e.:
1) It introduces unnecessary XSAVE operations when switching to non-vCPU
userspace within the current FPU framework.
2) It forces allocation of additional space for CET supervisor states in each
thread context, regardless of whether it's a vCPU thread or not.

To avoid these downsides, this series provides a KVM-specific solution to
save/reload the vCPU's supervisor SHSTK states.

VMX introduces new VMCS fields, {GUEST,HOST}_{S_CET,SSP,INTR_SSP_TABLE}, to
hold the guest/host non-XSAVES-managed states. When the VMX CET entry/exit
load bits are set, the guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP} values are
loaded from the corresponding fields at VM-entry/VM-exit respectively. With
these new fields, such supervisor states require no additional KVM save/reload
actions.
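
As a rough sketch only (assuming the VMCS field names and control bits that the
VMX patches later in this series introduce), handing the non-XSAVES-managed
supervisor state to hardware looks roughly like:

/*
 * Sketch: with VM_ENTRY_LOAD_CET_STATE / VM_EXIT_LOAD_CET_STATE set, the CPU
 * loads guest values at VM-entry and host values at VM-exit, so KVM merely
 * keeps the VMCS fields current.
 */
static void sketch_write_guest_cet_state(u64 s_cet, u64 ssp, u64 ssp_tbl)
{
	vmcs_writel(GUEST_S_CET, s_cet);
	vmcs_writel(GUEST_SSP, ssp);
	vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
}

static void sketch_write_host_cet_state(void)
{
	u64 host_s_cet;

	rdmsrl(MSR_IA32_S_CET, host_s_cet);
	vmcs_writel(HOST_S_CET, host_s_cet);
	/* The host kernel doesn't use supervisor shadow stacks today, so
	 * zeroed SSP/SSP-table values suffice for this illustration. */
	vmcs_writel(HOST_SSP, 0);
	vmcs_writel(HOST_INTR_SSP_TABLE, 0);
}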


Tests:
--------------------------------------------------------------------------
This series passed the basic CET user shadow stack test and the kernel IBT test
in both L1 and L2 guests.

The patch series _does_ impact existing VMX test cases in KVM-unit-tests; the
failures have been fixed in this patch[3].

All other parts of KVM-unit-tests and the selftests passed with this series. A
new selftest app for CET MSRs is posted here[4].

To run the user SHSTK test and the kernel IBT test in a guest, a CET-capable
platform is required, e.g., a Sapphire Rapids server. Follow the steps below to
build the host/guest kernels properly:

1. Build host kernel: Add this series to the kernel tree and build the kernel.

2. Build guest kernel: Add the full CET _native_ series to the kernel tree and
enable the CONFIG_X86_KERNEL_IBT and CONFIG_X86_USER_SHADOW_STACK options.
Build with a CET-enabled gcc (version >= 8.5.0).

3. Use the patched QEMU to launch a guest.

Check kernel selftest test_shadow_stack_64 output:

[INFO] new_ssp = 7f8c82100ff8, *new_ssp = 7f8c82101001
[INFO] changing ssp from 7f8c82900ff0 to 7f8c82100ff8
[INFO] ssp is now 7f8c82101000
[OK] Shadow stack pivot
[OK] Shadow stack faults
[INFO] Corrupting shadow stack
[INFO] Generated shadow stack violation successfully
[OK] Shadow stack violation test
[INFO] Gup read -> shstk access success
[INFO] Gup write -> shstk access success
[INFO] Violation from normal write
[INFO] Gup read -> write access success
[INFO] Violation from normal write
[INFO] Gup write -> write access success
[INFO] Cow gup write -> write access success
[OK] Shadow gup test
[INFO] Violation from shstk access
[OK] mprotect() test
[SKIP] Userfaultfd unavailable.
[OK] 32 bit test


Check kernel IBT with dmesg | grep CET:

CET detected: Indirect Branch Tracking enabled

--------------------------------------------------------------------------

Changes in v4:
1. Overhauled the v3 series[5] per community review feedback. [Sean, Chao, Binbin etc.]
2. Modified the CET dependency checks on the host side before exposing the features.
3. Added a KVM-specific solution for saving/reloading guest CET SHSTK supervisor states.
4. Rebased onto kvm-x86/next [6].


[1]: CET native series: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/shstk
[2]: QEMU patch: https://lore.kernel.org/all/[email protected]/
[3]: KVM-unit-tests fixup: https://lore.kernel.org/all/[email protected]/
[4]: Selftests for CET MSRs: https://lore.kernel.org/all/[email protected]/
[5]: v3 patchset: https://lore.kernel.org/all/[email protected]/
[6]: Rebase branch: https://github.com/kvm-x86/linux/releases/tag/kvm-x86-next-2023.06.22


Patch 1-2:   Native dependent CET patches.
Patch 3-5:   Enable XSS support in KVM.
Patch 6:     Preparatory patch for XSAVE-managed MSR access.
Patch 7-11:  Common patches to support CET on x86.
Patch 12-18: VMX-specific patches to support CET.
Patch 19-20: nVMX patches for CET support in nested VMs.

-----------------------------------------------------------------------------

Rick Edgecombe (2):
x86/cpufeatures: Add CPU feature flags for shadow stacks
x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

Sean Christopherson (2):
KVM:x86: Report XSS as to-be-saved if there are supported features
KVM:x86: Load guest FPU state when access XSAVE-managed MSRs

Yang Weijiang (16):
KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM:x86: Initialize kvm_caps.supported_xss
KVM:x86: Add fault checks for guest CR4.CET setting
KVM:x86: Report KVM supported CET MSRs as to-be-saved
KVM:x86: Add common code of CET MSR access
KVM:x86: Make guest supervisor states as non-XSAVE managed
KVM:x86: Save and reload GUEST_SSP to/from SMRAM
KVM:VMX: Introduce CET VMCS fields and control bits
KVM:VMX: Emulate read and write to CET MSRs
KVM:VMX: Set up interception for CET MSRs
KVM:VMX: Save host MSR_IA32_S_CET to VMCS field
KVM:x86: Optimize CET supervisor SSP save/reload
KVM:x86: Enable CET virtualization for VMX and advertise to userspace
KVM:x86: Enable guest CET supervisor xstate bit support
KVM:nVMX: Refine error code injection to nested VM
KVM:nVMX: Enable CET support for nested VM

arch/arm64/include/asm/kvm_host.h | 1 +
arch/mips/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/riscv/include/asm/kvm_host.h | 1 +
arch/s390/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/fpu/types.h | 16 +-
arch/x86/include/asm/fpu/xstate.h | 6 +-
arch/x86/include/asm/kvm_host.h | 6 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/vmx.h | 8 +
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/xstate.c | 90 +++++----
arch/x86/kvm/cpuid.c | 32 +++-
arch/x86/kvm/cpuid.h | 10 +
arch/x86/kvm/smm.c | 17 ++
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/vmx/capabilities.h | 10 +
arch/x86/kvm/vmx/nested.c | 49 ++++-
arch/x86/kvm/vmx/nested.h | 7 +
arch/x86/kvm/vmx/vmcs12.c | 6 +
arch/x86/kvm/vmx/vmcs12.h | 14 +-
arch/x86/kvm/vmx/vmx.c | 134 ++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 +-
arch/x86/kvm/x86.c | 228 +++++++++++++++++++++--
arch/x86/kvm/x86.h | 31 +++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 1 +
30 files changed, 606 insertions(+), 86 deletions(-)


base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
2.27.0



2023-07-21 06:10:34

by Yang, Weijiang

Subject: [PATCH v4 07/20] KVM:x86: Add fault checks for guest CR4.CET setting

Check for potential faults on CR4.CET setting per the Intel SDM.
CET can be enabled if and only if CR0.WP==1, i.e., setting
CR4.CET=1 faults if CR0.WP==0, and setting CR0.WP=0 fails
if CR4.CET==1.

Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/x86.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04f0245ad0a2..be76ac6bbb21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -993,6 +993,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
(is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
return 1;

+ if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+ return 1;
+
static_call(kvm_x86_set_cr0)(vcpu, cr0);

kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1204,6 +1207,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
}

+ if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+ return 1;
+
static_call(kvm_x86_set_cr4)(vcpu, cr4);

kvm_post_set_cr4(vcpu, old_cr4, cr4);
--
2.27.0


2023-07-21 06:10:38

by Yang, Weijiang

Subject: [PATCH v4 01/20] x86/cpufeatures: Add CPU feature flags for shadow stacks

From: Rick Edgecombe <[email protected]>

The Control-Flow Enforcement Technology contains two related features,
one of which is Shadow Stacks. Future patches will utilize this feature
for shadow stack support in KVM, so add a CPU feature flag for Shadow
Stacks (CPUID.(EAX=7,ECX=0):ECX[bit 7]).

To protect shadow stack state from malicious modification, the registers
are only accessible in supervisor mode. This implementation
context-switches the registers with XSAVES. Make X86_FEATURE_SHSTK depend
on XSAVES.

The shadow stack feature, enumerated by the CPUID bit described above,
encompasses both supervisor and userspace support for shadow stack. In
near future patches, only userspace shadow stack will be enabled. In
expectation of future supervisor shadow stack support, create a software
CPU capability to enumerate kernel utilization of userspace shadow stack
support. This user shadow stack bit should depend on the HW "shstk"
capability and that logic will be implemented in future patches.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-9-rick.p.edgecombe%40intel.com
---
arch/x86/include/asm/cpufeatures.h | 2 ++
arch/x86/include/asm/disabled-features.h | 8 +++++++-
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..d7215c8b7923 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
#define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
#define X86_FEATURE_SMBA (11*32+21) /* "" Slow Memory Bandwidth Allocation */
#define X86_FEATURE_BMEC (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
+#define X86_FEATURE_USER_SHSTK (11*32+23) /* Shadow stack support for user mode applications */

/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */
@@ -380,6 +381,7 @@
#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */
#define X86_FEATURE_WAITPKG (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE Instructions */
#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */
+#define X86_FEATURE_SHSTK (16*32+ 7) /* "" Shadow stack */
#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */
#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..b9c7eae2e70f 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -105,6 +105,12 @@
# define DISABLE_TDX_GUEST (1 << (X86_FEATURE_TDX_GUEST & 31))
#endif

+#ifdef CONFIG_X86_USER_SHADOW_STACK
+#define DISABLE_USER_SHSTK 0
+#else
+#define DISABLE_USER_SHSTK (1 << (X86_FEATURE_USER_SHSTK & 31))
+#endif
+
/*
* Make sure to add features to the correct mask
*/
@@ -120,7 +126,7 @@
#define DISABLED_MASK9 (DISABLE_SGX)
#define DISABLED_MASK10 0
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
- DISABLE_CALL_DEPTH_TRACKING)
+ DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
#define DISABLED_MASK12 (DISABLE_LAM)
#define DISABLED_MASK13 0
#define DISABLED_MASK14 0
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..e462c1d3800a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_XFD, X86_FEATURE_XSAVES },
{ X86_FEATURE_XFD, X86_FEATURE_XGETBV1 },
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
+ { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{}
};

--
2.27.0


2023-07-21 06:12:37

by Yang, Weijiang

Subject: [PATCH v4 15/20] KVM:VMX: Save host MSR_IA32_S_CET to VMCS field

Save the host MSR_IA32_S_CET to the VMCS field as host constant state.
Kernel IBT is supported now and the MSR_IA32_S_CET setting is static
after boot, except in the BIOS call case, but the vCPU currently won't
execute such a BIOS call path, so it's safe to treat the MSR as a host
constant.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/capabilities.h | 4 ++++
arch/x86/kvm/vmx/vmx.c | 8 ++++++++
2 files changed, 12 insertions(+)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d0abee35d7ba..b1883f6c08eb 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
}

+static inline bool cpu_has_load_cet_ctrl(void)
+{
+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
static inline bool cpu_has_vmx_mpx(void)
{
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 85cb7e748a89..cba24acf1a7a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -109,6 +109,8 @@ module_param(enable_apicv, bool, S_IRUGO);
bool __read_mostly enable_ipiv = true;
module_param(enable_ipiv, bool, 0444);

+static u64 __read_mostly host_s_cet;
+
/*
* If nested=1, nested virtualization is supported, i.e., guests may use
* VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -4355,6 +4357,9 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)

if (cpu_has_load_ia32_efer())
vmcs_write64(HOST_IA32_EFER, host_efer);
+
+ if (cpu_has_load_cet_ctrl())
+ vmcs_writel(HOST_S_CET, host_s_cet);
}

void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
@@ -8633,6 +8638,9 @@ static __init int hardware_setup(void)
return r;
}

+ if (cpu_has_load_cet_ctrl())
+ rdmsrl_safe(MSR_IA32_S_CET, &host_s_cet);
+
vmx_set_cpu_caps();

r = alloc_kvm_area();
--
2.27.0


2023-07-21 06:15:36

by Yang, Weijiang

Subject: [PATCH v4 17/20] KVM:x86: Enable CET virtualization for VMX and advertise to userspace

Enable the CET-related feature bits in the KVM capabilities array and make
X86_CR4_CET available to the guest. Remove the feature bits if the host-side
dependencies cannot be met.

Set the feature bits so that CET features are available in guest CPUID.
Add CR4.CET bit support to allow the guest to set the CET master control
bit (CR4.CET).

Disable the KVM CET feature if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET.
Don't expose the CET feature if the dependent CET bit (U_CET) is cleared in
the host XSS or if XSAVES isn't supported.

The CET bits in the VM_ENTRY/VM_EXIT control fields should be set to isolate
guest CET states from the host side. CET is only available on platforms that
enumerate VMX_BASIC[bit 56] as 1.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/cpuid.c | 12 ++++++++++--
arch/x86/kvm/vmx/capabilities.h | 6 ++++++
arch/x86/kvm/vmx/vmx.c | 22 +++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 ++++--
arch/x86/kvm/x86.c | 16 +++++++++++++++-
arch/x86/kvm/x86.h | 3 +++
8 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c50b555234fb..f883696723f4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
- | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+ | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+ | X86_CR4_CET))

#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..7ce0850c6067 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1078,6 +1078,7 @@
#define VMX_BASIC_MEM_TYPE_MASK 0x003c000000000000LLU
#define VMX_BASIC_MEM_TYPE_WB 6LLU
#define VMX_BASIC_INOUT 0x0040000000000000LLU
+#define VMX_BASIC_NO_HW_ERROR_CODE 0x0100000000000000LLU

/* Resctrl MSRs: */
/* - Intel: */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0338316b827c..1a601be7b4fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -624,7 +624,7 @@ void kvm_set_cpu_caps(void)
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
- F(SGX_LC) | F(BUS_LOCK_DETECT)
+ F(SGX_LC) | F(BUS_LOCK_DETECT) | F(SHSTK)
);
/* Set LA57 based on hardware capability. */
if (cpuid_ecx(7) & F(LA57))
@@ -642,7 +642,8 @@ void kvm_set_cpu_caps(void)
F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
- F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
+ F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D) |
+ F(IBT)
);

/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -655,6 +656,13 @@ void kvm_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
+ /*
+ * The feature bit in boot_cpu_data.x86_capability could have been
+ * cleared due to ibt=off cmdline option, then add it back if CPU
+ * supports IBT.
+ */
+ if (cpuid_edx(7) & F(IBT))
+ kvm_cpu_cap_set(X86_FEATURE_IBT);

kvm_cpu_cap_mask(CPUID_7_1_EAX,
F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index b1883f6c08eb..2948a288d0b4 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -79,6 +79,12 @@ static inline bool cpu_has_vmx_basic_inout(void)
return (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
}

+static inline bool cpu_has_vmx_basic_no_hw_errcode(void)
+{
+ return ((u64)vmcs_config.basic_cap << 32) &
+ VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
static inline bool cpu_has_virtual_nmis(void)
{
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3eb4fe9c9ab6..3f2f966e327d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2641,6 +2641,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
{ VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
{ VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
{ VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
+ { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
};

memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -2761,7 +2762,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
rdmsrl(MSR_IA32_VMX_MISC, misc_msr);

vmcs_conf->size = vmx_msr_high & 0x1fff;
- vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
+ vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;

vmcs_conf->revision_id = vmx_msr_low;

@@ -6359,6 +6360,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);

+ if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE) {
+ pr_err("S_CET = 0x%016lx\n", vmcs_readl(GUEST_S_CET));
+ pr_err("SSP = 0x%016lx\n", vmcs_readl(GUEST_SSP));
+ pr_err("INTR SSP TABLE = 0x%016lx\n",
+ vmcs_readl(GUEST_INTR_SSP_TABLE));
+ }
pr_err("*** Host State ***\n");
pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6436,6 +6443,12 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
if (secondary_exec_control & SECONDARY_EXEC_ENABLE_VPID)
pr_err("Virtual processor ID = 0x%04x\n",
vmcs_read16(VIRTUAL_PROCESSOR_ID));
+ if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE) {
+ pr_err("S_CET = 0x%016lx\n", vmcs_readl(HOST_S_CET));
+ pr_err("SSP = 0x%016lx\n", vmcs_readl(HOST_SSP));
+ pr_err("INTR SSP TABLE = 0x%016lx\n",
+ vmcs_readl(HOST_INTR_SSP_TABLE));
+ }
}

/*
@@ -7966,6 +7979,13 @@ static __init void vmx_set_cpu_caps(void)

if (cpu_has_vmx_waitpkg())
kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+ if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+ !cpu_has_vmx_basic_no_hw_errcode()) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
+ }
}

static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 32384ba38499..4e88b5fb45e8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -481,7 +481,8 @@ static inline u8 vmx_get_rvi(void)
VM_ENTRY_LOAD_IA32_EFER | \
VM_ENTRY_LOAD_BNDCFGS | \
VM_ENTRY_PT_CONCEAL_PIP | \
- VM_ENTRY_LOAD_IA32_RTIT_CTL)
+ VM_ENTRY_LOAD_IA32_RTIT_CTL | \
+ VM_ENTRY_LOAD_CET_STATE)

#define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
(VM_EXIT_SAVE_DEBUG_CONTROLS | \
@@ -503,7 +504,8 @@ static inline u8 vmx_get_rvi(void)
VM_EXIT_LOAD_IA32_EFER | \
VM_EXIT_CLEAR_BNDCFGS | \
VM_EXIT_PT_CONCEAL_PIP | \
- VM_EXIT_CLEAR_IA32_RTIT_CTL)
+ VM_EXIT_CLEAR_IA32_RTIT_CTL | \
+ VM_EXIT_LOAD_CET_STATE)

#define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
(PIN_BASED_EXT_INTR_MASK | \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 49049454caf4..665593d75251 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,7 +228,7 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

-#define KVM_SUPPORTED_XSS 0
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER)

u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);
@@ -9648,6 +9648,20 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)

kvm_ops_update(ops);

+ if (!kvm_is_cet_supported()) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ }
+
+ /*
+ * If SHSTK and IBT are not available in KVM, clear CET user bit in
+	 * kvm_caps.supported_xss so that kvm_is_cet_supported() returns
+ * false when called.
+ */
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ kvm_caps.supported_xss &= ~CET_XSTATE_MASK;
+
for_each_online_cpu(cpu) {
smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
if (r < 0)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 09dd35a79ff3..9c88ddfb3e97 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -538,6 +538,9 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
__reserved_bits |= X86_CR4_VMXE; \
if (!__cpu_has(__c, X86_FEATURE_PCID)) \
__reserved_bits |= X86_CR4_PCIDE; \
+ if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
+ !__cpu_has(__c, X86_FEATURE_IBT)) \
+ __reserved_bits |= X86_CR4_CET; \
__reserved_bits; \
})

--
2.27.0


2023-07-21 06:15:45

by Yang, Weijiang

Subject: [PATCH v4 20/20] KVM:nVMX: Enable CET support for nested VM

Set up the CET MSRs, the related VM_ENTRY/VM_EXIT control bits and the
fixed-bit setting for CR4 to enable CET for nested VMs.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 27 +++++++++++++++++++++++++--
arch/x86/kvm/vmx/vmcs12.c | 6 ++++++
arch/x86/kvm/vmx/vmcs12.h | 14 +++++++++++++-
arch/x86/kvm/vmx/vmx.c | 2 ++
4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9bcd989252f7..bd6883033f69 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -660,6 +660,28 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_FLUSH_CMD, MSR_TYPE_W);

+ /* Pass CET MSRs to nested VM if L0 and L1 are set to pass-through. */
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_U_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_S_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+
kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);

vmx->nested.force_msr_bitmap_recalc = false;
@@ -6793,7 +6815,7 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
VM_EXIT_HOST_ADDR_SPACE_SIZE |
#endif
VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
- VM_EXIT_CLEAR_BNDCFGS;
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
msrs->exit_ctls_high |=
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6815,7 +6837,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
#endif
- VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_LOAD_CET_STATE;
msrs->entry_ctls_high |=
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+ FIELD(GUEST_S_CET, guest_s_cet),
+ FIELD(GUEST_SSP, guest_ssp),
+ FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
FIELD(HOST_CR0, host_cr0),
FIELD(HOST_CR3, host_cr3),
FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
FIELD(HOST_RSP, host_rsp),
FIELD(HOST_RIP, host_rip),
+ FIELD(HOST_S_CET, host_s_cet),
+ FIELD(HOST_SSP, host_ssp),
+ FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
};
const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 01936013428b..3884489e7f7e 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
natural_width host_ia32_sysenter_eip;
natural_width host_rsp;
natural_width host_rip;
- natural_width paddingl[8]; /* room for future expansion */
+ natural_width host_s_cet;
+ natural_width host_ssp;
+ natural_width host_ssp_tbl;
+ natural_width guest_s_cet;
+ natural_width guest_ssp;
+ natural_width guest_ssp_tbl;
+ natural_width paddingl[2]; /* room for future expansion */
u32 pin_based_vm_exec_control;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
@@ -292,6 +298,12 @@ static inline void vmx_check_vmcs12_offsets(void)
CHECK_OFFSET(host_ia32_sysenter_eip, 656);
CHECK_OFFSET(host_rsp, 664);
CHECK_OFFSET(host_rip, 672);
+ CHECK_OFFSET(host_s_cet, 680);
+ CHECK_OFFSET(host_ssp, 688);
+ CHECK_OFFSET(host_ssp_tbl, 696);
+ CHECK_OFFSET(guest_s_cet, 704);
+ CHECK_OFFSET(guest_ssp, 712);
+ CHECK_OFFSET(guest_ssp_tbl, 720);
CHECK_OFFSET(pin_based_vm_exec_control, 744);
CHECK_OFFSET(cpu_based_vm_exec_control, 748);
CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3f2f966e327d..dd68dc73f723 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7728,6 +7728,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+ cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
+ cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));

#undef cr4_fixed1_update
}
--
2.27.0


2023-07-21 06:30:02

by Yang, Weijiang

Subject: [PATCH v4 02/20] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states

From: Rick Edgecombe <[email protected]>

Shadow stack register state can be managed with XSAVE. The registers
can logically be separated into two groups:
* Registers controlling user-mode operation
* Registers controlling kernel-mode operation

The architecture has two new XSAVE state components: one for each of
those groups of registers. This lets an OS manage them separately if
it chooses. Future patches for host userspace and KVM guests will only
utilize the user-mode registers, so only configure XSAVE to save
user-mode registers. This state will add 16 bytes to the xsave buffer
size.

Future patches will use the user-mode XSAVE area to save guest user-mode
CET state. However, VMCS includes new fields for guest CET supervisor
states. KVM can use these to save and restore guest supervisor state, so
host supervisor XSAVE support is not required.

Adding this exacerbates the already unwieldy if statement in
check_xstate_against_struct() that handles warning about unimplemented
xfeatures. So refactor these checks by having XCHECK_SZ() set a bool when
it actually checks the xfeature. This ends up exceeding 80 chars, but was
better on balance than other options explored. Pass the bool as a pointer to
make it clear that XCHECK_SZ() can change the variable.

While configuring user-mode XSAVE, clarify that kernel-mode registers are not
managed by XSAVE by defining the xfeature in
XFEATURE_MASK_SUPERVISOR_UNSUPPORTED, as is done for XFEATURE_MASK_PT.
This serves more of a documentation-as-code purpose and, functionally,
only enables a few safety checks.

Both XSAVE state components are supervisor states, even the state
controlling user-mode operation. This is a departure from earlier features
like protection keys where the PKRU state is a normal user
(non-supervisor) state. Having the user state be supervisor-managed
ensures there is no direct, unprivileged access to it, making it harder
for an attacker to subvert CET.

To facilitate this privileged access, define the two user-mode CET MSRs,
and the bits defined in those MSRs relevant to future shadow stack
enablement patches.

Co-developed-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Yu-cheng Yu <[email protected]>
Signed-off-by: Rick Edgecombe <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Tested-by: John Allen <[email protected]>
Tested-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/all/20230613001108.3040476-25-rick.p.edgecombe%40intel.com
---
arch/x86/include/asm/fpu/types.h | 16 +++++-
arch/x86/include/asm/fpu/xstate.h | 6 ++-
arch/x86/kernel/fpu/xstate.c | 90 +++++++++++++++----------------
3 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 7f6d858ff47a..eb810074f1e7 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -115,8 +115,8 @@ enum xfeature {
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
XFEATURE_PASID,
- XFEATURE_RSRVD_COMP_11,
- XFEATURE_RSRVD_COMP_12,
+ XFEATURE_CET_USER,
+ XFEATURE_CET_KERNEL_UNUSED,
XFEATURE_RSRVD_COMP_13,
XFEATURE_RSRVD_COMP_14,
XFEATURE_LBR,
@@ -138,6 +138,8 @@ enum xfeature {
#define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
#define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
#define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)
+#define XFEATURE_MASK_CET_USER (1 << XFEATURE_CET_USER)
+#define XFEATURE_MASK_CET_KERNEL (1 << XFEATURE_CET_KERNEL_UNUSED)
#define XFEATURE_MASK_LBR (1 << XFEATURE_LBR)
#define XFEATURE_MASK_XTILE_CFG (1 << XFEATURE_XTILE_CFG)
#define XFEATURE_MASK_XTILE_DATA (1 << XFEATURE_XTILE_DATA)
@@ -252,6 +254,16 @@ struct pkru_state {
u32 pad;
} __packed;

+/*
+ * State component 11 is Control-flow Enforcement user states
+ */
+struct cet_user_state {
+ /* user control-flow settings */
+ u64 user_cet;
+ /* user shadow stack pointer */
+ u64 user_ssp;
+};
+
/*
* State component 15: Architectural LBR configuration state.
* The size of Arch LBR state depends on the number of LBRs (lbr_depth).
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index cd3dd170e23a..d4427b88ee12 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -50,7 +50,8 @@
#define XFEATURE_MASK_USER_DYNAMIC XFEATURE_MASK_XTILE_DATA

/* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID | \
+ XFEATURE_MASK_CET_USER)

/*
* A supervisor state component may not always contain valuable information,
@@ -77,7 +78,8 @@
* Unsupported supervisor features. When a supervisor feature in this mask is
* supported in the future, move it to the supported supervisor feature mask.
*/
-#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT)
+#define XFEATURE_MASK_SUPERVISOR_UNSUPPORTED (XFEATURE_MASK_PT | \
+ XFEATURE_MASK_CET_KERNEL)

/* All supervisor states including supported and unsupported states. */
#define XFEATURE_MASK_SUPERVISOR_ALL (XFEATURE_MASK_SUPERVISOR_SUPPORTED | \
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..4fa4751912d9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -39,26 +39,26 @@
*/
static const char *xfeature_names[] =
{
- "x87 floating point registers" ,
- "SSE registers" ,
- "AVX registers" ,
- "MPX bounds registers" ,
- "MPX CSR" ,
- "AVX-512 opmask" ,
- "AVX-512 Hi256" ,
- "AVX-512 ZMM_Hi256" ,
- "Processor Trace (unused)" ,
+ "x87 floating point registers",
+ "SSE registers",
+ "AVX registers",
+ "MPX bounds registers",
+ "MPX CSR",
+ "AVX-512 opmask",
+ "AVX-512 Hi256",
+ "AVX-512 ZMM_Hi256",
+ "Processor Trace (unused)",
"Protection Keys User registers",
"PASID state",
- "unknown xstate feature" ,
- "unknown xstate feature" ,
- "unknown xstate feature" ,
- "unknown xstate feature" ,
- "unknown xstate feature" ,
- "unknown xstate feature" ,
- "AMX Tile config" ,
- "AMX Tile data" ,
- "unknown xstate feature" ,
+ "Control-flow User registers",
+ "Control-flow Kernel registers (unused)",
+ "unknown xstate feature",
+ "unknown xstate feature",
+ "unknown xstate feature",
+ "unknown xstate feature",
+ "AMX Tile config",
+ "AMX Tile data",
+ "unknown xstate feature",
};

static unsigned short xsave_cpuid_features[] __initdata = {
@@ -73,6 +73,7 @@ static unsigned short xsave_cpuid_features[] __initdata = {
[XFEATURE_PT_UNIMPLEMENTED_SO_FAR] = X86_FEATURE_INTEL_PT,
[XFEATURE_PKRU] = X86_FEATURE_PKU,
[XFEATURE_PASID] = X86_FEATURE_ENQCMD,
+ [XFEATURE_CET_USER] = X86_FEATURE_SHSTK,
[XFEATURE_XTILE_CFG] = X86_FEATURE_AMX_TILE,
[XFEATURE_XTILE_DATA] = X86_FEATURE_AMX_TILE,
};
@@ -276,6 +277,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
print_xstate_feature(XFEATURE_MASK_PASID);
+ print_xstate_feature(XFEATURE_MASK_CET_USER);
print_xstate_feature(XFEATURE_MASK_XTILE_CFG);
print_xstate_feature(XFEATURE_MASK_XTILE_DATA);
}
@@ -344,6 +346,7 @@ static __init void os_xrstor_booting(struct xregs_state *xstate)
XFEATURE_MASK_BNDREGS | \
XFEATURE_MASK_BNDCSR | \
XFEATURE_MASK_PASID | \
+ XFEATURE_MASK_CET_USER | \
XFEATURE_MASK_XTILE)

/*
@@ -446,14 +449,15 @@ static void __init __xstate_dump_leaves(void)
} \
} while (0)

-#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \
- if ((nr == nr_macro) && \
- WARN_ONCE(sz != sizeof(__struct), \
- "%s: struct is %zu bytes, cpu state %d bytes\n", \
- __stringify(nr_macro), sizeof(__struct), sz)) { \
+#define XCHECK_SZ(sz, nr, __struct) ({ \
+ if (WARN_ONCE(sz != sizeof(__struct), \
+ "[%s]: struct is %zu bytes, cpu state %d bytes\n", \
+ xfeature_names[nr], sizeof(__struct), sz)) { \
__xstate_dump_leaves(); \
} \
-} while (0)
+ true; \
+})
+

/**
* check_xtile_data_against_struct - Check tile data state size.
@@ -527,36 +531,28 @@ static bool __init check_xstate_against_struct(int nr)
* Ask the CPU for the size of the state.
*/
int sz = xfeature_size(nr);
+
/*
* Match each CPU state with the corresponding software
* structure.
*/
- XCHECK_SZ(sz, nr, XFEATURE_YMM, struct ymmh_struct);
- XCHECK_SZ(sz, nr, XFEATURE_BNDREGS, struct mpx_bndreg_state);
- XCHECK_SZ(sz, nr, XFEATURE_BNDCSR, struct mpx_bndcsr_state);
- XCHECK_SZ(sz, nr, XFEATURE_OPMASK, struct avx_512_opmask_state);
- XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
- XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state);
- XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state);
- XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);
- XCHECK_SZ(sz, nr, XFEATURE_XTILE_CFG, struct xtile_cfg);
-
- /* The tile data size varies between implementations. */
- if (nr == XFEATURE_XTILE_DATA)
- check_xtile_data_against_struct(sz);
-
- /*
- * Make *SURE* to add any feature numbers in below if
- * there are "holes" in the xsave state component
- * numbers.
- */
- if ((nr < XFEATURE_YMM) ||
- (nr >= XFEATURE_MAX) ||
- (nr == XFEATURE_PT_UNIMPLEMENTED_SO_FAR) ||
- ((nr >= XFEATURE_RSRVD_COMP_11) && (nr <= XFEATURE_RSRVD_COMP_16))) {
+ switch (nr) {
+ case XFEATURE_YMM: return XCHECK_SZ(sz, nr, struct ymmh_struct);
+ case XFEATURE_BNDREGS: return XCHECK_SZ(sz, nr, struct mpx_bndreg_state);
+ case XFEATURE_BNDCSR: return XCHECK_SZ(sz, nr, struct mpx_bndcsr_state);
+ case XFEATURE_OPMASK: return XCHECK_SZ(sz, nr, struct avx_512_opmask_state);
+ case XFEATURE_ZMM_Hi256: return XCHECK_SZ(sz, nr, struct avx_512_zmm_uppers_state);
+ case XFEATURE_Hi16_ZMM: return XCHECK_SZ(sz, nr, struct avx_512_hi16_state);
+ case XFEATURE_PKRU: return XCHECK_SZ(sz, nr, struct pkru_state);
+ case XFEATURE_PASID: return XCHECK_SZ(sz, nr, struct ia32_pasid_state);
+ case XFEATURE_XTILE_CFG: return XCHECK_SZ(sz, nr, struct xtile_cfg);
+ case XFEATURE_CET_USER: return XCHECK_SZ(sz, nr, struct cet_user_state);
+ case XFEATURE_XTILE_DATA: check_xtile_data_against_struct(sz); return true;
+ default:
XSTATE_WARN_ON(1, "No structure for xstate: %d\n", nr);
return false;
}
+
return true;
}

--
2.27.0


2023-07-21 06:34:04

by Yang, Weijiang

Subject: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs

Add VMX-specific emulation for CET MSR reads and writes.
The IBT feature is only available on Intel platforms for now, and the
virtualization interface to the control fields is vendor-specific, so
split this part from the common code.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 -------
2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c8d9870cfecb..b29817ec6f2e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2093,6 +2093,21 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return kvm_get_msr_common(vcpu, msr_info);
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_INT_SSP_TAB:
+ if (kvm_get_msr_common(vcpu, msr_info))
+ return 1;
+ if (msr_info->index == MSR_KVM_GUEST_SSP)
+ msr_info->data = vmcs_readl(GUEST_SSP);
+ else if (msr_info->index == MSR_IA32_S_CET)
+ msr_info->data = vmcs_readl(GUEST_S_CET);
+ else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
@@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
+#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
+#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return kvm_set_msr_common(vcpu, msr_info);
+ break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_INT_SSP_TAB:
+ if ((msr_index == MSR_IA32_U_CET ||
+ msr_index == MSR_IA32_S_CET) &&
+ ((data & ~VMX_CET_CONTROL_MASK) ||
+ !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
+ (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS))
+ return 1;
+ if (kvm_set_msr_common(vcpu, msr_info))
+ return 1;
+ if (msr_index == MSR_KVM_GUEST_SSP)
+ vmcs_writel(GUEST_SSP, data);
+ else if (msr_index == MSR_IA32_S_CET)
+ vmcs_writel(GUEST_S_CET, data);
+ else if (msr_index == MSR_IA32_INT_SSP_TAB)
+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+ break;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 70d7c80889d6..e200f22cdaad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3642,13 +3642,6 @@ static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu,
static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
struct msr_data *msr)
{
-
- /*
- * This function cannot work without later CET MSR read/write
- * emulation patch.
- */
- WARN_ON_ONCE(1);
-
if (is_shadow_stack_msr(vcpu, msr)) {
if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
return false;
--
2.27.0


2023-07-21 06:34:48

by Yang, Weijiang

Subject: [PATCH v4 14/20] KVM:VMX: Set up interception for CET MSRs

Pass through CET MSRs when the associated feature is enabled.
The Shadow Stack feature requires all the CET MSRs to provide
architectural support in the guest. The IBT feature only depends on
MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and
supervisor IBT.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b29817ec6f2e..85cb7e748a89 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr)
case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
return true;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ return true;
}

r = possible_passthrough_msr_slot(msr) != -ENOENT;
@@ -7758,6 +7762,34 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
}

+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
+{
+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
+ MSR_TYPE_RW, false);
+ return;
+ }
+
+ if (guest_can_use(vcpu, X86_FEATURE_IBT)) {
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
+ MSR_TYPE_RW, false);
+ }
+}
+
static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -7825,6 +7857,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

/* Refresh #PF interception to account for MAXPHYADDR changes. */
vmx_update_exception_bitmap(vcpu);
+
+ if (kvm_is_cet_supported())
+ vmx_update_intercept_for_cet_msr(vcpu);
}

static u64 vmx_get_perf_capabilities(void)
--
2.27.0


2023-07-21 06:37:43

by Yang, Weijiang

Subject: [PATCH v4 16/20] KVM:x86: Optimize CET supervisor SSP save/reload

Make PL{0,1,2}_SSP write-intercepted to detect whether the guest is
using these MSRs. Disable interception of the MSRs once they're
written with non-zero values. KVM saves/reloads the MSRs only if
they're used by the guest.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/vmx.c | 34 +++++++++++++++++++++++++++++----
arch/x86/kvm/x86.c | 15 ++++++++++-----
3 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69cbc9d9b277..c50b555234fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -748,6 +748,7 @@ struct kvm_vcpu_arch {
bool tpr_access_reporting;
bool xsaves_enabled;
bool xfd_no_write_intercept;
+ bool cet_sss_active;
u64 ia32_xss;
u64 microcode_version;
u64 arch_capabilities;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cba24acf1a7a..3eb4fe9c9ab6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2155,6 +2155,18 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}

+static void vmx_disable_write_intercept_sss_msr(struct kvm_vcpu *vcpu)
+{
+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
+ MSR_TYPE_RW, false);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
+ MSR_TYPE_RW, false);
+ }
+}
+
/*
* Writes msr value into the appropriate "register".
* Returns 0 on success, non-0 otherwise.
@@ -2427,7 +2439,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
- return kvm_set_msr_common(vcpu, msr_info);
+ if (kvm_set_msr_common(vcpu, msr_info))
+ return 1;
+ /*
+ * Write to the base SSP MSRs should happen ahead of toggling
+ * of IA32_S_CET.SH_STK_EN bit.
+ */
+ if (msr_index != MSR_IA32_PL3_SSP && data) {
+ vmx_disable_write_intercept_sss_msr(vcpu);
+ wrmsrl(msr_index, data);
+ }
break;
case MSR_IA32_U_CET:
case MSR_IA32_S_CET:
@@ -7774,12 +7795,17 @@ static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
MSR_TYPE_RW, false);
vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
MSR_TYPE_RW, false);
+ /*
+ * Supervisor shadow stack MSRs are intercepted until
+ * they're written by guest, this is designed to
+ * optimize the save/restore overhead.
+ */
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
- MSR_TYPE_RW, false);
+ MSR_TYPE_R, false);
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
- MSR_TYPE_RW, false);
+ MSR_TYPE_R, false);
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
- MSR_TYPE_RW, false);
+ MSR_TYPE_R, false);
vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
MSR_TYPE_RW, false);
vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e200f22cdaad..49049454caf4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4051,6 +4051,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
msr == MSR_IA32_PL2_SSP) {
vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+ if (!vcpu->arch.cet_sss_active && data)
+ vcpu->arch.cet_sss_active = true;
} else if (msr == MSR_IA32_PL3_SSP) {
kvm_set_xsave_msr(msr_info);
}
@@ -11252,7 +11254,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_sigset_activate(vcpu);
kvm_run->flags = 0;
kvm_load_guest_fpu(vcpu);
- kvm_reload_cet_supervisor_ssp(vcpu);
+ if (vcpu->arch.cet_sss_active)
+ kvm_reload_cet_supervisor_ssp(vcpu);

kvm_vcpu_srcu_read_lock(vcpu);
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
@@ -11341,7 +11344,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
r = vcpu_run(vcpu);

out:
- kvm_save_cet_supervisor_ssp(vcpu);
+ if (vcpu->arch.cet_sss_active)
+ kvm_save_cet_supervisor_ssp(vcpu);
kvm_put_guest_fpu(vcpu);
if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
@@ -12430,15 +12434,16 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
pmu->need_cleanup = true;
kvm_make_request(KVM_REQ_PMU, vcpu);
}
-
- kvm_reload_cet_supervisor_ssp(vcpu);
+ if (vcpu->arch.cet_sss_active)
+ kvm_reload_cet_supervisor_ssp(vcpu);

static_call(kvm_x86_sched_in)(vcpu, cpu);
}

void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
{
- kvm_save_cet_supervisor_ssp(vcpu);
+ if (vcpu->arch.cet_sss_active)
+ kvm_save_cet_supervisor_ssp(vcpu);
}

void kvm_arch_free_vm(struct kvm *kvm)
--
2.27.0


2023-07-21 06:38:26

by Yang, Weijiang

Subject: [PATCH v4 12/20] KVM:VMX: Introduce CET VMCS fields and control bits

Control-flow Enforcement Technology (CET) is a CPU feature used to prevent
Return/Call/Jump-Oriented Programming (ROP/COP/JOP) attacks. It provides two
sub-features, Shadow Stack (SHSTK) and Indirect Branch Tracking (IBT), to
defend against this style of control-flow subversion.

Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.

Indirect Branch Tracking (IBT):
IBT introduces a new instruction (ENDBRANCH) to mark valid target addresses of
indirect branches (CALL, JMP, etc.). If an indirect branch is executed
and the next instruction is _not_ an ENDBRANCH, the processor generates a
#CP. These instructions behave as NOPs on platforms that don't support
CET.

Several new CET MSRs are defined to support CET:
MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} mode respectively.

MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.

MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK table; the entry is indexed
by the IST field of the interrupt gate descriptor.

Two XSAVES state bits are introduced for CET:
IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.

Six VMCS fields are introduced for CET:
{HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
{HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread.
{HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.

On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET states are restored from
the following VMCS fields at VM-Exit:
HOST_S_CET
HOST_SSP
HOST_INTR_SSP_TABLE

If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET states are loaded from
the following VMCS fields at VM-Entry:
GUEST_S_CET
GUEST_SSP
GUEST_INTR_SSP_TABLE

Co-developed-by: Zhang Yi Z <[email protected]>
Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/vmx.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0d02c4aafa6f..db7f93307349 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -104,6 +104,7 @@
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_LOAD_CET_STATE 0x10000000

#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff

@@ -117,6 +118,7 @@
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_CET_STATE 0x00100000

#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff

@@ -345,6 +347,9 @@ enum vmcs_field {
GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
GUEST_SYSENTER_ESP = 0x00006824,
GUEST_SYSENTER_EIP = 0x00006826,
+ GUEST_S_CET = 0x00006828,
+ GUEST_SSP = 0x0000682a,
+ GUEST_INTR_SSP_TABLE = 0x0000682c,
HOST_CR0 = 0x00006c00,
HOST_CR3 = 0x00006c02,
HOST_CR4 = 0x00006c04,
@@ -357,6 +362,9 @@ enum vmcs_field {
HOST_IA32_SYSENTER_EIP = 0x00006c12,
HOST_RSP = 0x00006c14,
HOST_RIP = 0x00006c16,
+ HOST_S_CET = 0x00006c18,
+ HOST_SSP = 0x00006c1a,
+ HOST_INTR_SSP_TABLE = 0x00006c1c
};

/*
--
2.27.0


2023-07-21 06:38:51

by Yang, Weijiang

Subject: [PATCH v4 10/20] KVM:x86: Make guest supervisor states as non-XSAVE managed

Save and reload guest CET supervisor states, i.e., PL{0,1,2}_SSP, when the
vCPU context is swapped before and after userspace<->kernel entry, and do
the same when the vCPU is scheduled in or out.

Enable CET supervisor state management in KVM itself because relying on the
current FPU framework would:
- Introduce an unnecessary XSAVE operation when switching to non-vCPU
  userspace.
- Force allocating additional space for CET supervisor states in each thread
  context, regardless of whether it's a vCPU thread or not.

Add a new helper, kvm_arch_sched_out(), for that purpose. Adding the
support in kvm_arch_vcpu_put/load() without the new helper looks possible,
but those functions are also called from vcpu_put()/vcpu_load(), which are
heavily used in KVM, so adding the new helper makes the implementation
clearer.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/mips/include/asm/kvm_host.h | 1 +
arch/powerpc/include/asm/kvm_host.h | 1 +
arch/riscv/include/asm/kvm_host.h | 1 +
arch/s390/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 37 +++++++++++++++++++++++++++++
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 1 +
8 files changed, 44 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7e7e19ef6993..98235cb3d258 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1023,6 +1023,7 @@ void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);

static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}

void kvm_arm_init_debug(void);
void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 957121a495f0..56c5e85ba5a3 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -893,6 +893,7 @@ static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..11587d953bf6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -880,6 +880,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index ee0acccb1d3b..6ff4a04fe0f2 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -244,6 +244,7 @@ struct kvm_vcpu_arch {

static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}

#define KVM_ARCH_WANT_MMU_NOTIFIER

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2bbc3d54959d..d1750a6a86cf 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -1033,6 +1033,7 @@ extern int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc);

static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7a3753c05c09..f7558f0f6fc0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11212,6 +11212,33 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
trace_kvm_fpu(0);
}

+static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+ preempt_disable();
+ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+ rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+ rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+ rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+ /*
+ * Omit reset to host PL{1,2}_SSP because Linux will never use
+ * these MSRs.
+ */
+ wrmsrl(MSR_IA32_PL0_SSP, 0);
+ }
+ preempt_enable();
+}
+
+static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
+{
+ preempt_disable();
+ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
+ wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
+ wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
+ wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
+ }
+ preempt_enable();
+}
+
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
{
struct kvm_queued_exception *ex = &vcpu->arch.exception;
@@ -11222,6 +11249,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
kvm_sigset_activate(vcpu);
kvm_run->flags = 0;
kvm_load_guest_fpu(vcpu);
+ kvm_reload_cet_supervisor_ssp(vcpu);

kvm_vcpu_srcu_read_lock(vcpu);
if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
@@ -11310,6 +11338,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
r = vcpu_run(vcpu);

out:
+ kvm_save_cet_supervisor_ssp(vcpu);
kvm_put_guest_fpu(vcpu);
if (kvm_run->kvm_valid_regs)
store_regs(vcpu);
@@ -12398,9 +12427,17 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
pmu->need_cleanup = true;
kvm_make_request(KVM_REQ_PMU, vcpu);
}
+
+ kvm_reload_cet_supervisor_ssp(vcpu);
+
static_call(kvm_x86_sched_in)(vcpu, cpu);
}

+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
+{
+ kvm_save_cet_supervisor_ssp(vcpu);
+}
+
void kvm_arch_free_vm(struct kvm *kvm)
{
kfree(to_kvm_hv(kvm)->hv_pa_pg);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d90331f16db1..b3032a5f0641 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1423,6 +1423,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);

void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu);

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 66c1447d3c7f..42f28e8905e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5885,6 +5885,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
{
struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+ kvm_arch_sched_out(vcpu, 0);
if (current->on_rq) {
WRITE_ONCE(vcpu->preempted, true);
WRITE_ONCE(vcpu->ready, true);
--
2.27.0


2023-07-21 06:39:05

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 19/20] KVM:nVMX: Refine error code injection to nested VM

Per SDM description (Vol.3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver
a hardware exception with or without an error code, regardless
of vector"

Modify the has_error_code check before injecting events to the nested guest.
Only enforce the check when the guest is in real mode, or the event is not a
hardware exception, or the platform doesn't enumerate bit 56 in VMX_BASIC;
otherwise skip it.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++--------
arch/x86/kvm/vmx/nested.h | 7 +++++++
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 516391cc0d64..9bcd989252f7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1205,9 +1205,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
{
const u64 feature_and_reserved =
/* feature (except bit 48; see below) */
- BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+ BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
/* reserved */
- BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+ BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
u64 vmx_basic = vmcs_config.nested.basic;

if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2846,12 +2846,16 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
return -EINVAL;

- /* VM-entry interruption-info field: deliver error code */
- should_have_error_code =
- intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
- x86_exception_has_error_code(vector);
- if (CC(has_error_code != should_have_error_code))
- return -EINVAL;
+ if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
+ !nested_cpu_has_no_hw_errcode(vcpu)) {
+ /* VM-entry interruption-info field: deliver error code */
+ should_have_error_code =
+ intr_type == INTR_TYPE_HARD_EXCEPTION &&
+ prot_mode &&
+ x86_exception_has_error_code(vector);
+ if (CC(has_error_code != should_have_error_code))
+ return -EINVAL;
+ }

/* VM-entry exception error code */
if (CC(has_error_code &&
@@ -6967,6 +6971,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)

if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
+ if (cpu_has_vmx_basic_no_hw_errcode())
+ msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE;
}

static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 96952263b029..1884628294e4 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -284,6 +284,13 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
__kvm_is_valid_cr4(vcpu, val);
}

+static inline bool nested_cpu_has_no_hw_errcode(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ return vmx->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE;
+}
+
/* No difference in the restrictions on guest and host CR4 in VMX operation. */
#define nested_guest_cr4_valid nested_cr4_valid
#define nested_host_cr4_valid nested_cr4_valid
--
2.27.0


2023-07-21 06:39:04

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 11/20] KVM:x86: Save and reload GUEST_SSP to/from SMRAM

Save GUEST_SSP to SMRAM on SMI and reload it on RSM.
KVM emulates architectural behavior when guest enters/leaves SMM
mode, i.e., save registers to SMRAM at the entry of SMM and reload
them at the exit of SMM. Per SDM, GUEST_SSP is defined as one of
the fields in SMRAM for 64-bit mode, so handle the state accordingly.

Check HF_SMM_MASK to determine whether kvm_cet_is_msr_accessible()
is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/smm.c | 17 +++++++++++++++++
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/x86.c | 12 +++++++++++-
3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index b42111a24cc2..a4e19d72224f 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -309,6 +309,15 @@ void enter_smm(struct kvm_vcpu *vcpu)

kvm_smm_changed(vcpu, true);

+#ifdef CONFIG_X86_64
+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+ u64 data;
+
+ if (!kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &data))
+ smram.smram64.ssp = data;
+ }
+#endif
+
if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
goto error;

@@ -586,6 +595,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
static_call(kvm_x86_set_nmi_mask)(vcpu, false);

+#ifdef CONFIG_X86_64
+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
+ u64 data = smram.smram64.ssp;
+
+ if (is_noncanonical_address(data, vcpu) && IS_ALIGNED(data, 4))
+ kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data);
+ }
+#endif
kvm_smm_changed(vcpu, false);

/*
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index a1cf2ac5bd78..b3efef7cb1dc 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,7 +116,7 @@ struct kvm_smram_state_64 {
u32 smbase;
u32 reserved4[5];

- /* ssp and svm_* fields below are not implemented by KVM */
+ /* svm_* fields below are not implemented by KVM */
u64 ssp;
u64 svm_guest_pat;
u64 svm_host_efer;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f7558f0f6fc0..70d7c80889d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3653,8 +3653,18 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
return false;

- if (msr->index == MSR_KVM_GUEST_SSP)
+ /*
+ * This MSR is synthesized mainly for userspace access during
+ * Live Migration, it also can be accessed in SMM mode by VMM.
+ * Guest is not allowed to access this MSR.
+ */
+ if (msr->index == MSR_KVM_GUEST_SSP) {
+ if (IS_ENABLED(CONFIG_X86_64) &&
+ !!(vcpu->arch.hflags & HF_SMM_MASK))
+ return true;
+
return msr->host_initiated;
+ }

return msr->host_initiated ||
guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
--
2.27.0


2023-07-21 06:39:24

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 04/20] KVM:x86: Refresh CPUID on write to guest MSR_IA32_XSS

Update CPUID(EAX=0DH,ECX=1) when the guest's XSS is modified.
CPUID(EAX=0DH,ECX=1).EBX reports the required storage size of
all enabled xstate features in XCR0 | XSS, so the guest can allocate
a sufficiently sized xsave buffer based on the info.

Note, KVM does not yet support any XSS based features, i.e.
supported_xss is guaranteed to be zero at this time.

Co-developed-by: Zhang Yi Z <[email protected]>
Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 20 ++++++++++++++++++--
arch/x86/kvm/x86.c | 8 +++++---
3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..20bbcd95511f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -804,6 +804,7 @@ struct kvm_vcpu_arch {

u64 xcr0;
u64 guest_supported_xcr0;
+ u64 guest_supported_xss;

struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7f4d13383cf2..0338316b827c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -249,6 +249,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}

+static u64 cpuid_get_supported_xss(struct kvm_cpuid_entry2 *entries, int nent)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = cpuid_entry2_find(entries, nent, 0xd, 1);
+ if (!best)
+ return 0;
+
+ return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
int nent)
{
@@ -276,8 +287,11 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e

best = cpuid_entry2_find(entries, nent, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
- cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
- best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+ cpuid_entry_has(best, X86_FEATURE_XSAVEC))) {
+ u64 xstate = vcpu->arch.xcr0 | vcpu->arch.ia32_xss;
+
+ best->ebx = xstate_required_size(xstate, true);
+ }

best = __kvm_find_kvm_cpuid_features(vcpu, entries, nent);
if (kvm_hlt_in_guest(vcpu->kvm) && best &&
@@ -325,6 +339,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)

vcpu->arch.guest_supported_xcr0 =
cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+ vcpu->arch.guest_supported_xss =
+ cpuid_get_supported_xss(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);

/*
* FP+SSE can always be saved/restored via KVM_{G,S}ET_XSAVE, even if
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4597ab3c811c..26edb8fa857a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3781,10 +3781,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
* XSAVES/XRSTORS to save/restore PT MSRs.
*/
- if (data & ~kvm_caps.supported_xss)
+ if (data & ~vcpu->arch.guest_supported_xss)
return 1;
- vcpu->arch.ia32_xss = data;
- kvm_update_cpuid_runtime(vcpu);
+ if (vcpu->arch.ia32_xss != data) {
+ vcpu->arch.ia32_xss = data;
+ kvm_update_cpuid_runtime(vcpu);
+ }
break;
case MSR_SMI_COUNT:
if (!msr_info->host_initiated)
--
2.27.0


2023-07-21 06:39:47

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 05/20] KVM:x86: Initialize kvm_caps.supported_xss

Set kvm_caps.supported_xss to host_xss & KVM_SUPPORTED_XSS (the KVM XSS mask).
host_xss contains the host-supported xstate feature bits for thread
context switch, KVM_SUPPORTED_XSS includes all KVM-enabled XSS feature
bits, and the ANDed result represents all the feature bits KVM supports.
Since the result is a subset of host_xss, the related XSAVE-managed MSRs
are automatically swapped between guest and host when the vCPU exits to
userspace.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 1 -
arch/x86/kvm/x86.c | 6 +++++-
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..c8d9870cfecb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7849,7 +7849,6 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_UMIP);

/* CPUID 0xD.1 */
- kvm_caps.supported_xss = 0;
if (!cpu_has_vmx_xsaves())
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26edb8fa857a..8bdcbcf13146 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -225,6 +225,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

+#define KVM_SUPPORTED_XSS 0
+
u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);

@@ -9499,8 +9501,10 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)

rdmsrl_safe(MSR_EFER, &host_efer);

- if (boot_cpu_has(X86_FEATURE_XSAVES))
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
rdmsrl(MSR_IA32_XSS, host_xss);
+ kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+ }

kvm_init_pmu_capability(ops->pmu_ops);

--
2.27.0


2023-07-21 06:39:47

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 03/20] KVM:x86: Report XSS as to-be-saved if there are supported features

From: Sean Christopherson <[email protected]>

Add MSR_IA32_XSS to the list of MSRs reported to userspace if
supported_xss is non-zero, i.e. KVM supports at least one XSS
based feature.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/x86.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 439312e04384..4597ab3c811c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1451,6 +1451,7 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_UMWAIT_CONTROL,

MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+ MSR_IA32_XSS,
};

static const u32 msrs_to_save_pmu[] = {
@@ -7173,6 +7174,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
return;
break;
+ case MSR_IA32_XSS:
+ if (!kvm_caps.supported_xss)
+ return;
+ break;
default:
break;
}
--
2.27.0


2023-07-21 06:39:55

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access

Add unified handling for AMD/Intel's SHSTK MSRs, and leave IBT-related
handling in VMX-specific code.

Guest supervisor SSPs are handled specially because they are loaded into
HW registers only when the SSPs are used by the guest.

Currently AMD doesn't support IBT, which is why the IBT handling is kept
in VMX-specific code.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.h | 10 ++++
arch/x86/kvm/x86.c | 103 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.h | 18 ++++++
4 files changed, 124 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 20bbcd95511f..69cbc9d9b277 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,7 @@ struct kvm_vcpu_arch {
u64 xcr0;
u64 guest_supported_xcr0;
u64 guest_supported_xss;
+ u64 cet_s_ssp[3];

struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..7791a19b88ba 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -232,4 +232,14 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
}

+/*
+ * FIXME: When the "KVM-governed" enabling patchset is merged, rebase this
+ * series on top of that and replace this one with the helper merged.
+ */
+static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
+ unsigned int feature)
+{
+ return kvm_cpu_cap_has(feature) && guest_cpuid_has(vcpu, feature);
+}
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59e961a88b75..7a3753c05c09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3628,6 +3628,47 @@ static bool kvm_is_msr_to_save(u32 msr_index)
return false;
}

+static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr)
+{
+ return msr->index == MSR_IA32_PL0_SSP ||
+ msr->index == MSR_IA32_PL1_SSP ||
+ msr->index == MSR_IA32_PL2_SSP ||
+ msr->index == MSR_IA32_PL3_SSP ||
+ msr->index == MSR_IA32_INT_SSP_TAB ||
+ msr->index == MSR_KVM_GUEST_SSP;
+}
+
+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
+ struct msr_data *msr)
+{
+
+ /*
+ * This function cannot work without later CET MSR read/write
+ * emulation patch.
+ */
+ WARN_ON_ONCE(1);
+
+ if (is_shadow_stack_msr(vcpu, msr)) {
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return false;
+
+ if (msr->index == MSR_KVM_GUEST_SSP)
+ return msr->host_initiated;
+
+ return msr->host_initiated ||
+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+ }
+
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return false;
+
+ return msr->host_initiated ||
+ guest_cpuid_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u32 msr = msr_info->index;
@@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
+#define CET_IBT_MASK_BITS GENMASK_ULL(63, 2)
+#define CET_SHSTK_MASK_BITS GENMASK(1, 0)
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_SHSTK_MASK_BITS)) ||
+ (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_IBT_MASK_BITS)))
+ return 1;
+ if (msr == MSR_IA32_U_CET)
+ kvm_set_xsave_msr(msr_info);
+ break;
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (is_noncanonical_address(data, vcpu))
+ return 1;
+ if (!IS_ALIGNED(data, 4))
+ return 1;
+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+ msr == MSR_IA32_PL2_SSP) {
+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
+ } else if (msr == MSR_IA32_PL3_SSP) {
+ kvm_set_xsave_msr(msr_info);
+ }
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4052,7 +4122,9 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)

int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
- switch (msr_info->index) {
+ u32 msr = msr_info->index;
+
+ switch (msr) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
case MSR_IA32_LASTBRANCHFROMIP:
@@ -4087,7 +4159,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_get_msr(vcpu, msr_info);
msr_info->data = 0;
break;
@@ -4138,7 +4210,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_MTRRcap:
case MTRRphysBase_MSR(0) ... MSR_MTRRfix4K_F8000:
case MSR_MTRRdefType:
- return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
+ return kvm_mtrr_get_msr(vcpu, msr, &msr_info->data);
case 0xcd: /* fsb frequency */
msr_info->data = 3;
break;
@@ -4160,7 +4232,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = kvm_get_apic_base(vcpu);
break;
case APIC_BASE_MSR ... APIC_BASE_MSR + 0xff:
- return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
+ return kvm_x2apic_msr_read(vcpu, msr, &msr_info->data);
case MSR_IA32_TSC_DEADLINE:
msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
break;
@@ -4254,7 +4326,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
case MSR_IA32_MC0_CTL2 ... MSR_IA32_MCx_CTL2(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr_info->index, &msr_info->data,
+ return get_msr_mce(vcpu, msr, &msr_info->data,
msr_info->host_initiated);
case MSR_IA32_XSS:
if (!msr_info->host_initiated &&
@@ -4285,7 +4357,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case HV_X64_MSR_TSC_EMULATION_STATUS:
case HV_X64_MSR_TSC_INVARIANT_CONTROL:
return kvm_hv_get_msr_common(vcpu,
- msr_info->index, &msr_info->data,
+ msr, &msr_info->data,
msr_info->host_initiated);
case MSR_IA32_BBL_CR_CTL3:
/* This legacy MSR exists but isn't fully documented in current
@@ -4338,8 +4410,22 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vcpu->arch.guest_fpu.xfd_err;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
+ msr == MSR_IA32_PL2_SSP) {
+ msr_info->data =
+ vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP];
+ } else if (msr == MSR_IA32_U_CET || msr == MSR_IA32_PL3_SSP) {
+ kvm_get_xsave_msr(msr_info);
+ }
+ break;
default:
- if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
+ if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_get_msr(vcpu, msr_info);

/*
@@ -4347,7 +4433,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* to-be-saved, even if an MSR isn't fully supported.
*/
if (msr_info->host_initiated &&
- kvm_is_msr_to_save(msr_info->index)) {
+ kvm_is_msr_to_save(msr)) {
msr_info->data = 0;
break;
}
@@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)

vcpu->arch.cr3 = 0;
kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
+ memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));

/*
* CR0.CD/NW are set on RESET, preserved on INIT. Note, some versions
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6e6292915f8c..09dd35a79ff3 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -549,4 +549,22 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
unsigned int port, void *data, unsigned int count,
int in);

+/*
+ * Guest xstate MSRs have been loaded in __msr_io(), disable preemption before
+ * accessing the MSRs to avoid MSR content corruption.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+ kvm_fpu_get();
+ rdmsrl(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static inline void kvm_set_xsave_msr(struct msr_data *msr_info)
+{
+ kvm_fpu_get();
+ wrmsrl(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
#endif
--
2.27.0


2023-07-21 06:53:28

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 06/20] KVM:x86: Load guest FPU state when access XSAVE-managed MSRs

From: Sean Christopherson <[email protected]>

Load the guest's FPU state if userspace is accessing MSRs whose values are
managed by XSAVES. Two MSR access helpers, i.e., kvm_{get,set}_xsave_msr(),
are introduced by a later patch to facilitate access to this kind of MSR.

If MSRs supported in kvm_caps.supported_xss are passed through to the guest,
the guest MSR values are swapped with the host contents before the vCPU exits
to userspace and after it enters the kernel again.

Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check @vcpu is non-null before attempting to load guest state.
The XSS supporting MSRs cannot be retrieved via the device ioctl() without
loading guest FPU state (which doesn't exist).

Note that guest_cpuid_has() is not queried as host userspace is allowed
to access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.

Signed-off-by: Sean Christopherson <[email protected]>
Co-developed-by: Yang Weijiang <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bdcbcf13146..04f0245ad0a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -132,6 +132,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);

static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
struct kvm_x86_ops kvm_x86_ops __read_mostly;

#define KVM_X86_OP(func) \
@@ -4346,6 +4349,21 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
EXPORT_SYMBOL_GPL(kvm_get_msr_common);

+static const u32 xstate_msrs[] = {
+ MSR_IA32_U_CET, MSR_IA32_PL3_SSP,
+};
+
+static bool is_xstate_msr(u32 index)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(xstate_msrs); i++) {
+ if (index == xstate_msrs[i])
+ return true;
+ }
+ return false;
+}
+
/*
* Read or write a bunch of msrs. All parameters are kernel addresses.
*
@@ -4356,11 +4374,20 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
int (*do_msr)(struct kvm_vcpu *vcpu,
unsigned index, u64 *data))
{
+ bool fpu_loaded = false;
int i;

- for (i = 0; i < msrs->nmsrs; ++i)
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ if (vcpu && !fpu_loaded && kvm_caps.supported_xss &&
+ is_xstate_msr(entries[i].index)) {
+ kvm_load_guest_fpu(vcpu);
+ fpu_loaded = true;
+ }
if (do_msr(vcpu, entries[i].index, &entries[i].data))
break;
+ }
+ if (fpu_loaded)
+ kvm_put_guest_fpu(vcpu);

return i;
}
--
2.27.0


2023-07-21 07:06:00

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 08/20] KVM:x86: Report KVM supported CET MSRs as to-be-saved

Add all CET MSRs, including the synthesized GUEST_SSP, to the report list.
PL{0,1,2}_SSP are made independent of host XSAVE management by later
patches. MSR_IA32_U_CET and MSR_IA32_PL3_SSP are XSAVE-managed on the
host side. MSR_IA32_S_CET/MSR_IA32_INT_SSP_TAB/MSR_KVM_GUEST_SSP
are not XSAVE-managed.

When CET IBT/SHSTK are enumerated to the guest, both user and supervisor
modes should be supported for architectural integrity, i.e., the two
modes are supported either both or neither.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kvm/x86.c | 10 ++++++++++
arch/x86/kvm/x86.h | 10 ++++++++++
3 files changed, 21 insertions(+)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..7af465e4e0bd 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -58,6 +58,7 @@
#define MSR_KVM_ASYNC_PF_INT 0x4b564d06
#define MSR_KVM_ASYNC_PF_ACK 0x4b564d07
#define MSR_KVM_MIGRATION_CONTROL 0x4b564d08
+#define MSR_KVM_GUEST_SSP 0x4b564d09

struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be76ac6bbb21..59e961a88b75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1463,6 +1463,9 @@ static const u32 msrs_to_save_base[] = {

MSR_IA32_XFD, MSR_IA32_XFD_ERR,
MSR_IA32_XSS,
+ MSR_IA32_U_CET, MSR_IA32_S_CET,
+ MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+ MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB, MSR_KVM_GUEST_SSP,
};

static const u32 msrs_to_save_pmu[] = {
@@ -7215,6 +7218,13 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!kvm_caps.supported_xss)
return;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ case MSR_KVM_GUEST_SSP:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!kvm_is_cet_supported())
+ return;
+ break;
default:
break;
}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..6e6292915f8c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,6 +362,16 @@ static inline bool kvm_mpx_supported(void)
== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
}

+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+/*
+ * Shadow Stack and Indirect Branch Tracking feature enabling depends on
+ * whether host side CET user xstate bit is supported or not.
+ */
+static inline bool kvm_is_cet_supported(void)
+{
+ return (kvm_caps.supported_xss & CET_XSTATE_MASK) == CET_XSTATE_MASK;
+}
+
extern unsigned int min_timer_period_us;

extern bool enable_vmware_backdoor;
--
2.27.0


2023-07-21 07:30:09

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v4 18/20] KVM:x86: Enable guest CET supervisor xstate bit support

Add S_CET bit in kvm_caps.supported_xss so that guest can enumerate
the feature in CPUID(0xd,1).ECX.

The guest S_CET xstate bit is specially handled, i.e., it can be exposed
without the related enabling on the host side, because KVM manually
saves/reloads the guest supervisor SHSTK SSPs and the current host/guest
XSS swap logic also supports doing so; thus it's safe to enable the bit
without host support.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/x86.c | 4 +++-
arch/x86/kvm/x86.h | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 665593d75251..f68e36ef34b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -228,7 +228,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)

-#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER)
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_USER | \
+ XFEATURE_MASK_CET_KERNEL)

u64 __read_mostly host_efer;
EXPORT_SYMBOL_GPL(host_efer);
@@ -9638,6 +9639,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (boot_cpu_has(X86_FEATURE_XSAVES)) {
rdmsrl(MSR_IA32_XSS, host_xss);
kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;
+ kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
}

kvm_init_pmu_capability(ops->pmu_ops);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9c88ddfb3e97..66733e01b0ce 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -362,7 +362,7 @@ static inline bool kvm_mpx_supported(void)
== (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
}

-#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER)
+#define CET_XSTATE_MASK (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
/*
* Shadow Stack and Indirect Branch Tracking feature enabling depends on
* whether host side CET user xstate bit is supported or not.
--
2.27.0


2023-07-24 09:36:01

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 10/20] KVM:x86: Make guest supervisor states as non-XSAVE managed

On Thu, Jul 20, 2023 at 11:03:42PM -0400, Yang Weijiang wrote:
>+static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>+{
>+ preempt_disable();

what's the purpose of disabling preemption?

>+ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>+ rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>+ rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>+ rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>+ /*
>+ * Omit reset to host PL{1,2}_SSP because Linux will never use
>+ * these MSRs.
>+ */
>+ wrmsrl(MSR_IA32_PL0_SSP, 0);

You don't need to reset the MSR because current host doesn't enable SSS
and leaving guest value in the MSR won't affect host behavior.

>+ }
>+ preempt_enable();
>+}
>+
>+static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>+{
>+ preempt_disable();
>+ if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>+ wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>+ wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>+ wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>+ }
>+ preempt_enable();
>+}

save/load PLx_SSP in kvm_sched_in/out() and in VCPU_RUN ioctl is sub-optimal.

How about:
1. expose kvm_save/reload_cet_supervisor_ssp()
2. reload guest PLx_SSP in {vmx,svm}_prepare_switch_to_guest()
3. save guest PLx_SSP in vmx_prepare_switch_to_host() and
svm_prepare_host_switch()?

this way existing svm/vmx->guest_state_loaded can help to reduce a lot of
unnecessary PLx_SSP MSR accesses.
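
e.g., an untested sketch of the idea, reusing the helper names from this
patch (hunk placement is illustrative):

	/*
	 * Sketch: in vmx_prepare_switch_to_guest(), on the path where
	 * vmx->guest_state_loaded transitions from false to true:
	 */
	kvm_reload_cet_supervisor_ssp(&vmx->vcpu);

	/*
	 * Sketch: in vmx_prepare_switch_to_host(), on the path where
	 * vmx->guest_state_loaded transitions from true to false:
	 */
	kvm_save_cet_supervisor_ssp(&vmx->vcpu);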

>+
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> {
> struct kvm_queued_exception *ex = &vcpu->arch.exception;
>@@ -11222,6 +11249,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> kvm_sigset_activate(vcpu);
> kvm_run->flags = 0;
> kvm_load_guest_fpu(vcpu);
>+ kvm_reload_cet_supervisor_ssp(vcpu);
>
> kvm_vcpu_srcu_read_lock(vcpu);
> if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>@@ -11310,6 +11338,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> r = vcpu_run(vcpu);
>
> out:
>+ kvm_save_cet_supervisor_ssp(vcpu);
> kvm_put_guest_fpu(vcpu);
> if (kvm_run->kvm_valid_regs)
> store_regs(vcpu);
>@@ -12398,9 +12427,17 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
> pmu->need_cleanup = true;
> kvm_make_request(KVM_REQ_PMU, vcpu);
> }
>+
>+ kvm_reload_cet_supervisor_ssp(vcpu);
>+
> static_call(kvm_x86_sched_in)(vcpu, cpu);
> }
>
>+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
>+{

@cpu its meaning isn't clear and isn't used and ...

>+ kvm_save_cet_supervisor_ssp(vcpu);
>+}
>+
> void kvm_arch_free_vm(struct kvm *kvm)
> {
> kfree(to_kvm_hv(kvm)->hv_pa_pg);
>diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>index d90331f16db1..b3032a5f0641 100644
>--- a/include/linux/kvm_host.h
>+++ b/include/linux/kvm_host.h
>@@ -1423,6 +1423,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
>
> void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
>+void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu);
>
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 66c1447d3c7f..42f28e8905e1 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -5885,6 +5885,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
> {
> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>
>+ kvm_arch_sched_out(vcpu, 0);

passing 0 always looks problematic.

> if (current->on_rq) {
> WRITE_ONCE(vcpu->preempted, true);
> WRITE_ONCE(vcpu->ready, true);
>--
>2.27.0
>

2023-07-24 10:23:22

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 11/20] KVM:x86: Save and reload GUEST_SSP to/from SMRAM

On Thu, Jul 20, 2023 at 11:03:43PM -0400, Yang Weijiang wrote:
>Save GUEST_SSP to SMRAM on SMI and reload it on RSM.
>KVM emulates architectural behavior when guest enters/leaves SMM
>mode, i.e., save registers to SMRAM at the entry of SMM and reload
>them at the exit of SMM. Per SDM, GUEST_SSP is defined as one of

To me, GUEST_SSP is confusing here. From QEMU's perspective, it reads/writes
the SSP register. People may confuse it with the GUEST_SSP in nVMCS field.
I prefer to rename it to MSR_KVM_SSP.

>the fields in SMRAM for 64-bit mode, so handle the state accordingly.
>
>Check HF_SMM_MASK to determine whether kvm_cet_is_msr_accessible()
>is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.
>
>Signed-off-by: Yang Weijiang <[email protected]>
>---
> arch/x86/kvm/smm.c | 17 +++++++++++++++++
> arch/x86/kvm/smm.h | 2 +-
> arch/x86/kvm/x86.c | 12 +++++++++++-
> 3 files changed, 29 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>index b42111a24cc2..a4e19d72224f 100644
>--- a/arch/x86/kvm/smm.c
>+++ b/arch/x86/kvm/smm.c
>@@ -309,6 +309,15 @@ void enter_smm(struct kvm_vcpu *vcpu)
>
> kvm_smm_changed(vcpu, true);
>
>+#ifdef CONFIG_X86_64
>+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>+ u64 data;
>+
>+ if (!kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &data))
>+ smram.smram64.ssp = data;

I don't think it is correct to continue if kvm fails to read the MSR.

how about:
if (kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
goto error;
>+ }
>+#endif
>+
> if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
> goto error;
>
>@@ -586,6 +595,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
> if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
> static_call(kvm_x86_set_nmi_mask)(vcpu, false);
>
>+#ifdef CONFIG_X86_64
>+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>+ u64 data = smram.smram64.ssp;
>+

>+ if (is_noncanonical_address(data, vcpu) && IS_ALIGNED(data, 4))

shouldn't the checks be already done inside kvm_set_msr()?

>+ kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data);

please handle the failure case. Probably just return X86EMUL_UNHANDLEABLE like other
failure paths in this function.
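
e.g., an untested one-liner combining this with the earlier point about
dropping the open-coded canonical/alignment checks:

	if (kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, smram.smram64.ssp))
		return X86EMUL_UNHANDLEABLE;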

>+ }
>+#endif
> kvm_smm_changed(vcpu, false);
>
> /*
>diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
>index a1cf2ac5bd78..b3efef7cb1dc 100644
>--- a/arch/x86/kvm/smm.h
>+++ b/arch/x86/kvm/smm.h
>@@ -116,7 +116,7 @@ struct kvm_smram_state_64 {
> u32 smbase;
> u32 reserved4[5];
>
>- /* ssp and svm_* fields below are not implemented by KVM */
>+ /* svm_* fields below are not implemented by KVM */

move this comment one line downward

> u64 ssp;
> u64 svm_guest_pat;
> u64 svm_host_efer;
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index f7558f0f6fc0..70d7c80889d6 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -3653,8 +3653,18 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
> if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> return false;
>
>- if (msr->index == MSR_KVM_GUEST_SSP)
>+ /*
>+ * This MSR is synthesized mainly for userspace access during
>+ * Live Migration, it also can be accessed in SMM mode by VMM.
>+ * Guest is not allowed to access this MSR.
>+ */
>+ if (msr->index == MSR_KVM_GUEST_SSP) {
>+ if (IS_ENABLED(CONFIG_X86_64) &&
>+ !!(vcpu->arch.hflags & HF_SMM_MASK))

use is_smm() instead.
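
e.g. (sketch only):

	if (msr->index == MSR_KVM_GUEST_SSP)
		return msr->host_initiated ||
		       (IS_ENABLED(CONFIG_X86_64) && is_smm(vcpu));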

>+ return true;
>+
> return msr->host_initiated;
>+ }
>
> return msr->host_initiated ||
> guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>--
>2.27.0
>

2023-07-24 14:28:18

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 10/20] KVM:x86: Make guest supervisor states as non-XSAVE managed


On 7/24/2023 4:26 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:42PM -0400, Yang Weijiang wrote:
>> +static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>> +{
>> + preempt_disable();
> what's the purpose of disabling preemption?

Thanks!

These preempt_disable/enable() become unnecessary due to the PLx_SSP handling
in sched_in/out(). Will remove them.

>
>> + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>> + rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>> + rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>> + rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>> + /*
>> + * Omit reset to host PL{1,2}_SSP because Linux will never use
>> + * these MSRs.
>> + */
>> + wrmsrl(MSR_IA32_PL0_SSP, 0);
> You don't need to reset the MSR because current host doesn't enable SSS
> and leaving guest value in the MSR won't affect host behavior.

Yes, I just want to make the host PLx_SSPs as clean as possible.

>
>> + }
>> + preempt_enable();
>> +}
>> +
>> +static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
>> +{
>> + preempt_disable();
>> + if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
>> + wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
>> + wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
>> + wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
>> + }
>> + preempt_enable();
>> +}
> save/load PLx_SSP in kvm_sched_in/out() and in VCPU_RUN ioctl is sub-optimal.
>
> How about:
> 1. expose kvm_save/reload_cet_supervisor_ssp()
> 2. reload guest PLx_SSP in {vmx,svm}_prepare_switch_to_guest()
> 3. save guest PLx_SSP in vmx_prepare_switch_to_host() and
> svm_prepare_host_switch()?
>
> this way existing svm/vmx->guest_state_loaded can help to reduce a lot of
> unnecessary PLx_SSP MSR accesses.

Nice suggestion! It looks workable. I'll try this, thanks!

>
>> +
>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_queued_exception *ex = &vcpu->arch.exception;
>> @@ -11222,6 +11249,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> kvm_sigset_activate(vcpu);
>> kvm_run->flags = 0;
>> kvm_load_guest_fpu(vcpu);
>> + kvm_reload_cet_supervisor_ssp(vcpu);
>>
>> kvm_vcpu_srcu_read_lock(vcpu);
>> if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
>> @@ -11310,6 +11338,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> r = vcpu_run(vcpu);
>>
>> out:
>> + kvm_save_cet_supervisor_ssp(vcpu);
>> kvm_put_guest_fpu(vcpu);
>> if (kvm_run->kvm_valid_regs)
>> store_regs(vcpu);
>> @@ -12398,9 +12427,17 @@ void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
>> pmu->need_cleanup = true;
>> kvm_make_request(KVM_REQ_PMU, vcpu);
>> }
>> +
>> + kvm_reload_cet_supervisor_ssp(vcpu);
>> +
>> static_call(kvm_x86_sched_in)(vcpu, cpu);
>> }
>>
>> +void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu)
>> +{
> @cpu its meaning isn't clear and isn't used and ...
Yes, I should have removed it.
>
>> + kvm_save_cet_supervisor_ssp(vcpu);
>> +}
>> +
>> void kvm_arch_free_vm(struct kvm *kvm)
>> {
>> kfree(to_kvm_hv(kvm)->hv_pa_pg);
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index d90331f16db1..b3032a5f0641 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1423,6 +1423,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
>>
>> void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
>> +void kvm_arch_sched_out(struct kvm_vcpu *vcpu, int cpu);
>>
>> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 66c1447d3c7f..42f28e8905e1 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -5885,6 +5885,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>> {
>> struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>>
>> + kvm_arch_sched_out(vcpu, 0);
> passing 0 always looks problematic.
Can you elaborate? I have no intent to use @cpu now.
>> if (current->on_rq) {
>> WRITE_ONCE(vcpu->preempted, true);
>> WRITE_ONCE(vcpu->ready, true);
>> --
>> 2.27.0
>>

2023-07-24 14:33:26

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 11/20] KVM:x86: Save and reload GUEST_SSP to/from SMRAM


On 7/24/2023 5:13 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:43PM -0400, Yang Weijiang wrote:
>> Save GUEST_SSP to SMRAM on SMI and reload it on RSM.
>> KVM emulates architectural behavior when guest enters/leaves SMM
>> mode, i.e., save registers to SMRAM at the entry of SMM and reload
>> them at the exit of SMM. Per SDM, GUEST_SSP is defined as one of
> To me, GUEST_SSP is confusing here. From QEMU's perspective, it reads/writes
> the SSP register. People may confuse it with the GUEST_SSP in nVMCS field.
> I prefer to rename it to MSR_KVM_SSP.

Hmm, it does look a bit confusing, I'll change it, thanks!

>
>> the fields in SMRAM for 64-bit mode, so handle the state accordingly.
>>
>> Check HF_SMM_MASK to determine whether kvm_cet_is_msr_accessible()
>> is called in SMM mode so that kvm_{set,get}_msr() works in SMM mode.
>>
>> Signed-off-by: Yang Weijiang <[email protected]>
>> ---
>> arch/x86/kvm/smm.c | 17 +++++++++++++++++
>> arch/x86/kvm/smm.h | 2 +-
>> arch/x86/kvm/x86.c | 12 +++++++++++-
>> 3 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
>> index b42111a24cc2..a4e19d72224f 100644
>> --- a/arch/x86/kvm/smm.c
>> +++ b/arch/x86/kvm/smm.c
>> @@ -309,6 +309,15 @@ void enter_smm(struct kvm_vcpu *vcpu)
>>
>> kvm_smm_changed(vcpu, true);
>>
>> +#ifdef CONFIG_X86_64
>> + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>> + u64 data;
>> +
>> + if (!kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &data))
>> + smram.smram64.ssp = data;
> I don't think it is correct to continue if kvm fails to read the MSR.
>
> how about:
> if (kvm_get_msr(vcpu, MSR_KVM_GUEST_SSP, &smram.smram64.ssp))
> goto error;

Agree, will change it.

>> + }
>> +#endif
>> +
>> if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram)))
>> goto error;
>>
>> @@ -586,6 +595,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>> if ((vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK) == 0)
>> static_call(kvm_x86_set_nmi_mask)(vcpu, false);
>>
>> +#ifdef CONFIG_X86_64
>> + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>> + u64 data = smram.smram64.ssp;
>> +
>> + if (is_noncanonical_address(data, vcpu) && IS_ALIGNED(data, 4))
> shouldn't the checks be already done inside kvm_set_msr()?

Nice catch, will remove them.


>
>> + kvm_set_msr(vcpu, MSR_KVM_GUEST_SSP, data);
> please handle the failure case. Probably just return X86EMUL_UNHANDLEABLE like other
> failure paths in this function.

OK, the VM should be shut down if this field cannot be written successfully.

>
>> + }
>> +#endif
>> kvm_smm_changed(vcpu, false);
>>
>> /*
>> diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
>> index a1cf2ac5bd78..b3efef7cb1dc 100644
>> --- a/arch/x86/kvm/smm.h
>> +++ b/arch/x86/kvm/smm.h
>> @@ -116,7 +116,7 @@ struct kvm_smram_state_64 {
>> u32 smbase;
>> u32 reserved4[5];
>>
>> - /* ssp and svm_* fields below are not implemented by KVM */
>> + /* svm_* fields below are not implemented by KVM */
> move this comment one line downward

OK

>
>> u64 ssp;
>> u64 svm_guest_pat;
>> u64 svm_host_efer;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f7558f0f6fc0..70d7c80889d6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3653,8 +3653,18 @@ static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>> if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
>> return false;
>>
>> - if (msr->index == MSR_KVM_GUEST_SSP)
>> + /*
>> + * This MSR is synthesized mainly for userspace access during
>> + * Live Migration, it also can be accessed in SMM mode by VMM.
>> + * Guest is not allowed to access this MSR.
>> + */
>> + if (msr->index == MSR_KVM_GUEST_SSP) {
>> + if (IS_ENABLED(CONFIG_X86_64) &&
>> + !!(vcpu->arch.hflags & HF_SMM_MASK))
> use is_smm() instead.

OK.

>
>> + return true;
>> +
>> return msr->host_initiated;
>> + }
>>
>> return msr->host_initiated ||
>> guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
>> --
>> 2.27.0
>>

2023-07-24 14:44:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 11/20] KVM:x86: Save and reload GUEST_SSP to/from SMRAM

On Mon, Jul 24, 2023, Weijiang Yang wrote:
>
> On 7/24/2023 5:13 PM, Chao Gao wrote:
> > On Thu, Jul 20, 2023 at 11:03:43PM -0400, Yang Weijiang wrote:
> > > Save GUEST_SSP to SMRAM on SMI and reload it on RSM.
> > > KVM emulates architectural behavior when guest enters/leaves SMM
> > > mode, i.e., save registers to SMRAM at the entry of SMM and reload
> > > them at the exit of SMM. Per SDM, GUEST_SSP is defined as one of
> > To me, GUEST_SSP is confusing here. From QEMU's perspective, it reads/writes
> > the SSP register. People may confuse it with the GUEST_SSP in nVMCS field.
> > I prefer to rename it to MSR_KVM_SSP.
>
> Hmm, looks a bit, I'll change it, thanks!

Please just say "SSP". The SMRAM field has nothing to do with KVM's synthetic MSR.

2023-07-26 08:07:04

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access

On Thu, Jul 20, 2023 at 11:03:41PM -0400, Yang Weijiang wrote:
>+static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu,

remove @vcpu since it isn't used. And I think it is better to accept
an MSR index than struct msr_data because whether a MSR is a shadow
stack MSR is entirely decided by the MSR index; other fields in the
struct msr_data are irrelevant.

>+ struct msr_data *msr)
>+{
>+ return msr->index == MSR_IA32_PL0_SSP ||
>+ msr->index == MSR_IA32_PL1_SSP ||
>+ msr->index == MSR_IA32_PL2_SSP ||
>+ msr->index == MSR_IA32_PL3_SSP ||
>+ msr->index == MSR_IA32_INT_SSP_TAB ||
>+ msr->index == MSR_KVM_GUEST_SSP;
>+}
>+
>+static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>+ struct msr_data *msr)
>+{
>+
>+ /*
>+ * This function cannot work without later CET MSR read/write
>+ * emulation patch.

Probably you should consider merging the "later" patch into this one.
Then you can get rid of this comment and make this patch easier for
review ...

> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> u32 msr = msr_info->index;
>@@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> vcpu->arch.guest_fpu.xfd_err = data;
> break;
> #endif
>+#define CET_IBT_MASK_BITS GENMASK_ULL(63, 2)

bits 9:6 are reserved even if IBT is supported.

>@@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>
> vcpu->arch.cr3 = 0;
> kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>+ memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));

... this begs the question: where other MSRs are reset. I suppose
U_CET/PL3_SSP are handled when resetting guest FPU. But how about S_CET
and INT_SSP_TAB? there is no answer in this patch.

2023-07-26 08:48:15

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access


On 7/26/2023 3:33 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:41PM -0400, Yang Weijiang wrote:
>> +static inline bool is_shadow_stack_msr(struct kvm_vcpu *vcpu,
> remove @vcpu since it isn't used. And I think it is better to accept
> an MSR index than struct msr_data because whether a MSR is a shadow
> stack MSR is entirely decided by the MSR index; other fields in the
> struct msr_data are irrelevant.

Yes, I should have removed it, thanks!

>
>> + struct msr_data *msr)
>> +{
>> + return msr->index == MSR_IA32_PL0_SSP ||
>> + msr->index == MSR_IA32_PL1_SSP ||
>> + msr->index == MSR_IA32_PL2_SSP ||
>> + msr->index == MSR_IA32_PL3_SSP ||
>> + msr->index == MSR_IA32_INT_SSP_TAB ||
>> + msr->index == MSR_KVM_GUEST_SSP;
>> +}
>> +
>> +static bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu,
>> + struct msr_data *msr)
>> +{
>> +
>> + /*
>> + * This function cannot work without later CET MSR read/write
>> + * emulation patch.
> Probably you should consider merging the "later" patch into this one.
> Then you can get rid of this comment and make this patch easier for
> review ...

Which later patch do you mean? If you mean [13/20] KVM:VMX: Emulate read
and write to CET MSRs, then I intentionally separated these two: this one is
for the CET MSR common checks and operations, the latter is specific to VMX.
I added the above comment in case someone is bisecting the patches and
happens to stop at this patch; then it would fault and they could take some
action.

>> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> {
>> u32 msr = msr_info->index;
>> @@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> vcpu->arch.guest_fpu.xfd_err = data;
>> break;
>> #endif
>> +#define CET_IBT_MASK_BITS GENMASK_ULL(63, 2)
>> bits 9:6 are reserved even if IBT is supported.

Yes, as IBT is only available on Intel platforms, I moved the handling of
bits 9:6 to the VMX-related patch.

Here's the common check in case IBT is not available.

>
>> @@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>
>> vcpu->arch.cr3 = 0;
>> kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>> + memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
> ... this begs the question: where other MSRs are reset. I suppose
> U_CET/PL3_SSP are handled when resetting guest FPU. But how about S_CET
> and INT_SSP_TAB? there is no answer in this patch.

I think the related guest VMCS fields (S_CET/INT_SSP_TAB/SSP) should be
reset to 0 in vmx_vcpu_reset(), do you think so?
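
Something like the sketch below (untested, assuming it's only done when CET
is exposed to the guest):

	/* sketch: in vmx_vcpu_reset() */
	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) ||
	    kvm_cpu_cap_has(X86_FEATURE_IBT)) {
		vmcs_writel(GUEST_S_CET, 0);
		vmcs_writel(GUEST_SSP, 0);
		vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
	}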


2023-07-26 08:51:32

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs

On Thu, Jul 20, 2023 at 11:03:45PM -0400, Yang Weijiang wrote:
>Add VMX specific emulation for CET MSR read and write.
>IBT feature is only available on Intel platforms now and the
>virtualization interface to the control fields is vendor
>specific, so split this part from the common code.
>
>Signed-off-by: Yang Weijiang <[email protected]>
>---
> arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 7 -------
> 2 files changed, 40 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index c8d9870cfecb..b29817ec6f2e 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2093,6 +2093,21 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
>+ case MSR_IA32_U_CET:
>+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>+ return kvm_get_msr_common(vcpu, msr_info);

kvm_get_msr_common() is called for the "default" case. so this can be dropped.

>+ case MSR_IA32_S_CET:
>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_INT_SSP_TAB:
>+ if (kvm_get_msr_common(vcpu, msr_info))
>+ return 1;
>+ if (msr_info->index == MSR_KVM_GUEST_SSP)
>+ msr_info->data = vmcs_readl(GUEST_SSP);
>+ else if (msr_info->index == MSR_IA32_S_CET)
>+ msr_info->data = vmcs_readl(GUEST_S_CET);
>+ else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>+ break;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
>@@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
>+#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))

bits 9:6 are reserved for both Intel and AMD. Shouldn't this check be
done in the common code?

>+#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>+#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)

>+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>+ return kvm_set_msr_common(vcpu, msr_info);

this hunk can be dropped as well.

>+ break;
>+ case MSR_IA32_U_CET:
>+ case MSR_IA32_S_CET:
>+ case MSR_KVM_GUEST_SSP:
>+ case MSR_IA32_INT_SSP_TAB:
>+ if ((msr_index == MSR_IA32_U_CET ||
>+ msr_index == MSR_IA32_S_CET) &&
>+ ((data & ~VMX_CET_CONTROL_MASK) ||
>+ !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>+ (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS))
>+ return 1;

how about

case MSR_IA32_U_CET:
case MSR_IA32_S_CET:
if ((data & ~VMX_CET_CONTROL_MASK) || ...
...

case MSR_KVM_GUEST_SSP:
case MSR_IA32_INT_SSP_TAB:
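
For reference, a sketch of that arrangement filled in, mirroring the read
side of the posted patch (the use of fallthrough and whether
kvm_set_msr_common() is invoked for all four MSRs are assumptions):

	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		if ((data & ~VMX_CET_CONTROL_MASK) ||
		    !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
		    (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS)
			return 1;
		fallthrough;
	case MSR_KVM_GUEST_SSP:
	case MSR_IA32_INT_SSP_TAB:
		if (kvm_set_msr_common(vcpu, msr_info))
			return 1;
		if (msr_index == MSR_KVM_GUEST_SSP)
			vmcs_writel(GUEST_SSP, data);
		else if (msr_index == MSR_IA32_S_CET)
			vmcs_writel(GUEST_S_CET, data);
		else if (msr_index == MSR_IA32_INT_SSP_TAB)
			vmcs_writel(GUEST_INTR_SSP_TABLE, data);
		break;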

2023-07-26 09:01:09

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 14/20] KVM:VMX: Set up interception for CET MSRs

On Thu, Jul 20, 2023 at 11:03:46PM -0400, Yang Weijiang wrote:
>Pass through CET MSRs when the associated feature is enabled.
>Shadow Stack feature requires all the CET MSRs to make it
>architectural support in guest. IBT feature only depends on
>MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and
>supervisor IBT.

If a guest supports SHSTK only, KVM has no way to prevent guest from
enabling IBT because the U/S_CET MSRs are pass-thru'd. It is a problem.

I am wondering if it is necessary to pass-thru U/S_CET MSRs. Probably
the answer is yes at least for U_CET MSR because the MSR is per-task.

>
>Signed-off-by: Yang Weijiang <[email protected]>
>---
> arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index b29817ec6f2e..85cb7e748a89 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr)
> case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
> /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
> return true;
>+ case MSR_IA32_U_CET:
>+ case MSR_IA32_S_CET:
>+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>+ return true;
> }
>
> r = possible_passthrough_msr_slot(msr) != -ENOENT;
>@@ -7758,6 +7762,34 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
> vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
> }
>
>+static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>+{
>+ if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
>+ MSR_TYPE_RW, false);
>+ return;
>+ }
>+
>+ if (guest_can_use(vcpu, X86_FEATURE_IBT)) {
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>+ MSR_TYPE_RW, false);
>+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>+ MSR_TYPE_RW, false);
>+ }

This is incorrect. see

https://lore.kernel.org/all/[email protected]/

>+}
>+
> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
>@@ -7825,6 +7857,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>
> /* Refresh #PF interception to account for MAXPHYADDR changes. */
> vmx_update_exception_bitmap(vcpu);
>+
>+ if (kvm_is_cet_supported())

Nit: this check is not necessary. This isn't a hot path, and if
kvm_is_cet_supported() is false, guest_can_use(., X86_FEATURE_SHSTK/IBT)
should be false.

>+ vmx_update_intercept_for_cet_msr(vcpu);
> }
>
> static u64 vmx_get_perf_capabilities(void)
>--
>2.27.0
>

2023-07-26 10:03:52

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 15/20] KVM:VMX: Save host MSR_IA32_S_CET to VMCS field

On Thu, Jul 20, 2023 at 11:03:47PM -0400, Yang Weijiang wrote:
>Save host MSR_IA32_S_CET to VMCS field as host constant state.
>Kernel IBT is supported now and the setting in MSR_IA32_S_CET
>is static after post-boot except in BIOS call case, but vCPU
>won't execute such BIOS call path currently, so it's safe to
>make the MSR as host constant.
>
>Suggested-by: Sean Christopherson <[email protected]>
>Signed-off-by: Yang Weijiang <[email protected]>
>---
> arch/x86/kvm/vmx/capabilities.h | 4 ++++
> arch/x86/kvm/vmx/vmx.c | 8 ++++++++
> 2 files changed, 12 insertions(+)
>
>diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
>index d0abee35d7ba..b1883f6c08eb 100644
>--- a/arch/x86/kvm/vmx/capabilities.h
>+++ b/arch/x86/kvm/vmx/capabilities.h
>@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
> return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> }
>
>+static inline bool cpu_has_load_cet_ctrl(void)
>+{
>+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);

VM_ENTRY_LOAD_CET_STATE is to load guest state. Strictly speaking, you
should check VM_EXIT_LOAD_HOST_CET_STATE though I believe CPUs will
support both or none.

>+}
> static inline bool cpu_has_vmx_mpx(void)
> {
> return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index 85cb7e748a89..cba24acf1a7a 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -109,6 +109,8 @@ module_param(enable_apicv, bool, S_IRUGO);
> bool __read_mostly enable_ipiv = true;
> module_param(enable_ipiv, bool, 0444);
>
>+static u64 __read_mostly host_s_cet;

caching host's value is to save an MSR read on vCPU creation?

Otherwise I don't see why a local variable cannot work.

>+
> /*
> * If nested=1, nested virtualization is supported, i.e., guests may use
> * VMX and be a hypervisor for its own guests. If nested=0, guests may not
>@@ -4355,6 +4357,9 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
>
> if (cpu_has_load_ia32_efer())
> vmcs_write64(HOST_IA32_EFER, host_efer);
>+
>+ if (cpu_has_load_cet_ctrl())
>+ vmcs_writel(HOST_S_CET, host_s_cet);
> }
>
> void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
>@@ -8633,6 +8638,9 @@ static __init int hardware_setup(void)
> return r;
> }
>
>+ if (cpu_has_load_cet_ctrl())
>+ rdmsrl_safe(MSR_IA32_S_CET, &host_s_cet);
>+
> vmx_set_cpu_caps();
>
> r = alloc_kvm_area();
>--
>2.27.0
>

2023-07-26 14:34:44

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access

On Wed, Jul 26, 2023 at 04:26:06PM +0800, Yang, Weijiang wrote:
>> > + /*
>> > + * This function cannot work without later CET MSR read/write
>> > + * emulation patch.
>> Probably you should consider merging the "later" patch into this one.
>> Then you can get rid of this comment and make this patch easier for
>> review ...
>
>Which later patch you mean? If you mean [13/20] KVM:VMX: Emulate read and
>write to CET MSRs,
>
>then I intentionally separate these two, this one is for CET MSR common
>checks and operations,
>
>the latter is specific to VMX, and add the above comments in case someone is

The problem with this organization is that the handling of the S_CET, SSP,
INT_SSP_TABLE MSRs is incomplete in this patch. I think a better organization
is to either merge this patch and patch 13, or move all changes related to
S_CET, SSP, INT_SSP_TABLE into patch 13, e.g.,

case MSR_IA32_U_CET:
- case MSR_IA32_S_CET:
if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
return 1;
if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
(data & CET_SHSTK_MASK_BITS)) ||
(!guest_can_use(vcpu, X86_FEATURE_IBT) &&
(data & CET_IBT_MASK_BITS)))
return 1;
- if (msr == MSR_IA32_U_CET)
- kvm_set_xsave_msr(msr_info);
kvm_set_xsave_msr(msr_info);
break;
- case MSR_KVM_GUEST_SSP:
- case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
return 1;
if (is_noncanonical_address(data, vcpu))
return 1;
if (!IS_ALIGNED(data, 4))
return 1;
if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
msr == MSR_IA32_PL2_SSP) {
vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
} else if (msr == MSR_IA32_PL3_SSP) {
kvm_set_xsave_msr(msr_info);
}
break;



BTW, shouldn't bit2:0 of MSR_KVM_GUEST_SSP be 0? i.e., for MSR_KVM_GUEST_SSP,
the alignment check should be IS_ALIGNED(data, 8).

>bisecting
>
>the patches and happens to split at this patch, then it would faulted and
>take some actions.

I am not sure what kind of issue you are worrying about. In my understanding,
KVM hasn't advertised the support of IBT and SHSTK, so,
kvm_cpu_cap_has(X86_FEATURE_SHSTK/IBT) will always return false. and then
kvm_cet_is_msr_accessible() is guaranteed to return false.

If there is any issue in your mind, you can fix it or reorganize your patches to
avoid the issue. To me, adding a comment and a warning is not a good solution.

>
>> > int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> > {
>> > u32 msr = msr_info->index;
>> > @@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> > vcpu->arch.guest_fpu.xfd_err = data;
>> > break;
>> > #endif
>> > +#define CET_IBT_MASK_BITS GENMASK_ULL(63, 2)
>> bit9:6 are reserved even if IBT is supported.
>
>Yes, as IBT is only available on Intel platforms, I move the handling of bit
>9:6 to VMX related patch.

IIUC, bits 9:6 are not reserved because of IBT. I don't get how IBT
availability affects the handling of bits 9:6.

>
>Here's the common check in case IBT is not available.
>
>>
>> > @@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>> >
>> > vcpu->arch.cr3 = 0;
>> > kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>> > + memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
>> ... this begs the question: where other MSRs are reset. I suppose
>> U_CET/PL3_SSP are handled when resetting guest FPU. But how about S_CET
>> and INT_SSP_TAB? there is no answer in this patch.
>
>I think the related guest VMCS fields(S_CET/INT_SSP_TAB/SSP) should be reset
>to 0 in vmx_vcpu_reset(),
>
>do you think so?

Yes, looks good.

2023-07-26 15:39:25

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 15/20] KVM:VMX: Save host MSR_IA32_S_CET to VMCS field

On Wed, Jul 26, 2023, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:47PM -0400, Yang Weijiang wrote:
> >Save host MSR_IA32_S_CET to VMCS field as host constant state.
> >Kernel IBT is supported now and the setting in MSR_IA32_S_CET
> >is static after post-boot except in BIOS call case, but vCPU
> >won't execute such BIOS call path currently, so it's safe to
> >make the MSR as host constant.
> >
> >Suggested-by: Sean Christopherson <[email protected]>
> >Signed-off-by: Yang Weijiang <[email protected]>
> >---
> > arch/x86/kvm/vmx/capabilities.h | 4 ++++
> > arch/x86/kvm/vmx/vmx.c | 8 ++++++++
> > 2 files changed, 12 insertions(+)
> >
> >diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> >index d0abee35d7ba..b1883f6c08eb 100644
> >--- a/arch/x86/kvm/vmx/capabilities.h
> >+++ b/arch/x86/kvm/vmx/capabilities.h
> >@@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
> > return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> > }
> >
> >+static inline bool cpu_has_load_cet_ctrl(void)
> >+{
> >+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
>
> VM_ENTRY_LOAD_CET_STATE is to load guest state. Strictly speaking, you
> should check VM_EXIT_LOAD_HOST_CET_STATE though I believe CPUs will
> support both or none.

No need, pairs are now handled by setup_vmcs_config(). See commit f5a81d0eb01e
("KVM: VMX: Sanitize VM-Entry/VM-Exit control pairs at kvm_intel load time"), and
then patch 17 does:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 3eb4fe9c9ab6..3f2f966e327d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2641,6 +2641,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
{ VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
{ VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
{ VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
+ { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
};

>
> >+}
> > static inline bool cpu_has_vmx_mpx(void)
> > {
> > return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
> >diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >index 85cb7e748a89..cba24acf1a7a 100644
> >--- a/arch/x86/kvm/vmx/vmx.c
> >+++ b/arch/x86/kvm/vmx/vmx.c
> >@@ -109,6 +109,8 @@ module_param(enable_apicv, bool, S_IRUGO);
> > bool __read_mostly enable_ipiv = true;
> > module_param(enable_ipiv, bool, 0444);
> >
> >+static u64 __read_mostly host_s_cet;
>
> caching host's value is to save an MSR read on vCPU creation?

Yep. And probably more importantly, to document that the host value is static,
i.e. that KVM doesn't need to refresh S_CET before every VM-Enter/VM-Exit sequence.

2023-07-27 03:36:01

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs


On 7/26/2023 4:06 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:45PM -0400, Yang Weijiang wrote:
>> Add VMX specific emulation for CET MSR read and write.
>> IBT feature is only available on Intel platforms now and the
>> virtualization interface to the control fields is vendor
>> specific, so split this part from the common code.
>>
>> Signed-off-by: Yang Weijiang <[email protected]>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++
>> arch/x86/kvm/x86.c | 7 -------
>> 2 files changed, 40 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index c8d9870cfecb..b29817ec6f2e 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2093,6 +2093,21 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> else
>> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
>> break;
>> + case MSR_IA32_U_CET:
>> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>> + return kvm_get_msr_common(vcpu, msr_info);
> kvm_get_msr_common() is called for the "default" case. so this can be dropped.

Yes, I can drop these in vmx_get_msr(), I just wanted to make it clearer.
Thanks!

>
>> + case MSR_IA32_S_CET:
>> + case MSR_KVM_GUEST_SSP:
>> + case MSR_IA32_INT_SSP_TAB:
>> + if (kvm_get_msr_common(vcpu, msr_info))
>> + return 1;
>> + if (msr_info->index == MSR_KVM_GUEST_SSP)
>> + msr_info->data = vmcs_readl(GUEST_SSP);
>> + else if (msr_info->index == MSR_IA32_S_CET)
>> + msr_info->data = vmcs_readl(GUEST_S_CET);
>> + else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>> + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>> + break;
>> case MSR_IA32_DEBUGCTLMSR:
>> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>> break;
>> @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> else
>> vmx->pt_desc.guest.addr_a[index / 2] = data;
>> break;
>> +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
> bits9-6 are reserved for both intel and amd. Shouldn't this check be
> done in the common code?

My thinking is, on AMD platforms, bits 63:2 are anyway reserved since AMD
doesn't support IBT, so the checks in the common code are enough for AMD;
when the execution flow comes here, it should be VMX and needs this
additional check.

>
>> +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>> +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>> + return kvm_set_msr_common(vcpu, msr_info);
> this hunk can be dropped as well.

In patch 16, these lines still need to be added back for PL{0,1,2}_SSP,
so would like keep it

here.

>
>> + break;
>> + case MSR_IA32_U_CET:
>> + case MSR_IA32_S_CET:
>> + case MSR_KVM_GUEST_SSP:
>> + case MSR_IA32_INT_SSP_TAB:
>> + if ((msr_index == MSR_IA32_U_CET ||
>> + msr_index == MSR_IA32_S_CET) &&
>> + ((data & ~VMX_CET_CONTROL_MASK) ||
>> + !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>> + (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS))
>> + return 1;
> how about
>
> case MSR_IA32_U_CET:
> case MSR_IA32_S_CET:
> if ((data & ~VMX_CET_CONTROL_MASK) || ...
> ...
>
> case MSR_KVM_GUEST_SSP:
> case MSR_IA32_INT_SSP_TAB:

Do you mean to use "fallthrough"?

If not, for MSR_IA32_S_CET we need vmcs_write() to handle it; this is
VMX-specific code.


2023-07-27 04:03:48

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 16/20] KVM:x86: Optimize CET supervisor SSP save/reload

On Thu, Jul 20, 2023 at 11:03:48PM -0400, Yang Weijiang wrote:
> /*
> * Writes msr value into the appropriate "register".
> * Returns 0 on success, non-0 otherwise.
>@@ -2427,7 +2439,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> #define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
> #define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
> case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>- return kvm_set_msr_common(vcpu, msr_info);
>+ if (kvm_set_msr_common(vcpu, msr_info))
>+ return 1;
>+ /*
>+ * Write to the base SSP MSRs should happen ahead of toggling
>+ * of IA32_S_CET.SH_STK_EN bit.

Is this a requirement from SDM? And how is this related to the change below?

Note that PLx_SSP MSRs are linear addresses of shadow stacks for different CPLs.
I may think using the page at 0 (assuming 0 is the reset value of PLx SSP) is
allowed in architecture although probably no kernel will do so.

I don't understand why this comment is needed. I suggest dropping it.

>+ */
>+ if (msr_index != MSR_IA32_PL3_SSP && data) {
>+ vmx_disable_write_intercept_sss_msr(vcpu);
>+ wrmsrl(msr_index, data);
>+ }

2023-07-27 05:22:26

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 14/20] KVM:VMX: Set up interception for CET MSRs


On 7/26/2023 4:30 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:46PM -0400, Yang Weijiang wrote:
>> Pass through CET MSRs when the associated feature is enabled.
>> Shadow Stack feature requires all the CET MSRs to make it
>> architectural support in guest. IBT feature only depends on
>> MSR_IA32_U_CET and MSR_IA32_S_CET to enable both user and
>> supervisor IBT.
> If a guest supports SHSTK only, KVM has no way to prevent guest from
> enabling IBT because the U/S_CET are pass-thru'd. it is a problem.

Yes, this is a CET architectural defect when only SHSTK is enabled. But
it shouldn't bring issues, right? I will highlight it in the commit log.

>
> I am wondering if it is necessary to pass-thru U/S_CET MSRs. Probably
> the answer is yes at least for U_CET MSR because the MSR is per-task.

Agree, also XSAVES/XRSTORS in the guest could fail if they're not passed through.

>
>> Signed-off-by: Yang Weijiang <[email protected]>
>> ---
>> arch/x86/kvm/vmx/vmx.c | 35 +++++++++++++++++++++++++++++++++++
>> 1 file changed, 35 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index b29817ec6f2e..85cb7e748a89 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -709,6 +709,10 @@ static bool is_valid_passthrough_msr(u32 msr)
>> case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
>> /* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
>> return true;
>> + case MSR_IA32_U_CET:
>> + case MSR_IA32_S_CET:
>> + case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> + return true;
>> }
>>
>> r = possible_passthrough_msr_slot(msr) != -ENOENT;
>> @@ -7758,6 +7762,34 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
>> vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4));
>> }
>>
>> +static void vmx_update_intercept_for_cet_msr(struct kvm_vcpu *vcpu)
>> +{
>> + if (guest_can_use(vcpu, X86_FEATURE_SHSTK)) {
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_INT_SSP_TAB,
>> + MSR_TYPE_RW, false);
>> + return;
>> + }
>> +
>> + if (guest_can_use(vcpu, X86_FEATURE_IBT)) {
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET,
>> + MSR_TYPE_RW, false);
>> + vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET,
>> + MSR_TYPE_RW, false);
>> + }
> This is incorrect. see
>
> https://lore.kernel.org/all/[email protected]/

Yes, will add the missing counterpart, thanks!

>
>> +}
>> +
>> static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> @@ -7825,6 +7857,9 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>>
>> /* Refresh #PF interception to account for MAXPHYADDR changes. */
>> vmx_update_exception_bitmap(vcpu);
>> +
>> + if (kvm_is_cet_supported())
> Nit: this check is not necessary. here isn't a hot path. and if
> kvm_is_cet_supported() is false, guest_can_use(., X86_FEATURE_SHSTK/IBT)
> should be false.

Yes, I think it can be removed.

>
>> + vmx_update_intercept_for_cet_msr(vcpu);
>> }
>>
>> static u64 vmx_get_perf_capabilities(void)
>> --
>> 2.27.0
>>

2023-07-27 05:33:42

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs

>> > + case MSR_IA32_S_CET:
>> > + case MSR_KVM_GUEST_SSP:
>> > + case MSR_IA32_INT_SSP_TAB:
>> > + if (kvm_get_msr_common(vcpu, msr_info))
>> > + return 1;
>> > + if (msr_info->index == MSR_KVM_GUEST_SSP)
>> > + msr_info->data = vmcs_readl(GUEST_SSP);
>> > + else if (msr_info->index == MSR_IA32_S_CET)
>> > + msr_info->data = vmcs_readl(GUEST_S_CET);
>> > + else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>> > + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>> > + break;
>> > case MSR_IA32_DEBUGCTLMSR:
>> > msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>> > break;
>> > @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> > else
>> > vmx->pt_desc.guest.addr_a[index / 2] = data;
>> > break;
>> > +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
>> bits9-6 are reserved for both intel and amd. Shouldn't this check be
>> done in the common code?
>
>My thinking is, on AMD platform, bit 63:2 is anyway reserved since it doesn't
>support IBT,

You can only say

bits 5:2 and bits 63:10 are reserved since AMD doesn't support IBT.

bits 9:6 are reserved regardless of the support of IBT.

>
>so the checks in common code for AMD is enough, when the execution flow comes
>here,
>
>it should be vmx, and need this additional check.

The checks against reserved bits are common for AMD and Intel:

1. if SHSTK is supported, bit1:0 are not reserved.
2. if IBT is supported, bit5:2 and bit63:10 are not reserved
3. bit9:6 are always reserved.

There is nothing specific to Intel.

>
>>
>> > +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>> > +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>> > + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>> > + return kvm_set_msr_common(vcpu, msr_info);
>> this hunk can be dropped as well.
>
>In patch 16, these lines still need to be added back for PL{0,1,2}_SSP, so
>would like keep it

If that's the case, better to move it to patch 16, where the change
can be justified. And PL3_SSP should be removed anyway. and then
"msr_index != MSR_IA32_PL3_SSP" check in the below code snippet in
patch 16 can go away.

+ /*
+ * Write to the base SSP MSRs should happen ahead of toggling
+ * of IA32_S_CET.SH_STK_EN bit.
+ */
+ if (msr_index != MSR_IA32_PL3_SSP && data) {
+ vmx_disable_write_intercept_sss_msr(vcpu);
+ wrmsrl(msr_index, data);
+ }


>
>here.
>
>>
>> > + break;
>> > + case MSR_IA32_U_CET:
>> > + case MSR_IA32_S_CET:
>> > + case MSR_KVM_GUEST_SSP:
>> > + case MSR_IA32_INT_SSP_TAB:
>> > + if ((msr_index == MSR_IA32_U_CET ||
>> > + msr_index == MSR_IA32_S_CET) &&
>> > + ((data & ~VMX_CET_CONTROL_MASK) ||
>> > + !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>> > + (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS))
>> > + return 1;
>> how about
>>
>> case MSR_IA32_U_CET:
>> case MSR_IA32_S_CET:
>> if ((data & ~VMX_CET_CONTROL_MASK) || ...
>> ...
>>
>> case MSR_KVM_GUEST_SSP:
>> case MSR_IA32_INT_SSP_TAB:
>
>Do you mean to use "fallthrough"?

Yes.

2023-07-27 06:35:40

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 12/20] KVM:VMX: Introduce CET VMCS fields and control bits

On Thu, Jul 20, 2023 at 11:03:44PM -0400, Yang Weijiang wrote:
>Two XSAVES state bits are introduced for CET:
> IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
> IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
>
>Six VMCS fields are introduced for CET:
> {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
> {HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread.
> {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
>
>On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
>control fields:
>If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET states are restored from

Nit: s/VM_EXIT_LOAD_HOST_CET_STATE/VM_EXIT_LOAD_CET_STATE

to align with the name you are actually using.

>the following VMCS fields at VM-Exit:
> HOST_S_CET
> HOST_SSP
> HOST_INTR_SSP_TABLE
>
>If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET states are loaded from

Nit: s/VM_ENTRY_LOAD_GUEST_CET_STATE/VM_ENTRY_LOAD_CET_STATE

>the following VMCS fields at VM-Entry:
> GUEST_S_CET
> GUEST_SSP
> GUEST_INTR_SSP_TABLE
>
>Co-developed-by: Zhang Yi Z <[email protected]>
>Signed-off-by: Zhang Yi Z <[email protected]>
>Signed-off-by: Yang Weijiang <[email protected]>

Reviewed-by: Chao Gao <[email protected]>

>---
> arch/x86/include/asm/vmx.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>index 0d02c4aafa6f..db7f93307349 100644
>--- a/arch/x86/include/asm/vmx.h
>+++ b/arch/x86/include/asm/vmx.h
>@@ -104,6 +104,7 @@
> #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
> #define VM_EXIT_PT_CONCEAL_PIP 0x01000000
> #define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
>+#define VM_EXIT_LOAD_CET_STATE 0x10000000
>
> #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
>
>@@ -117,6 +118,7 @@
> #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
> #define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
> #define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
>+#define VM_ENTRY_LOAD_CET_STATE 0x00100000
>
> #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
>
>@@ -345,6 +347,9 @@ enum vmcs_field {
> GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
> GUEST_SYSENTER_ESP = 0x00006824,
> GUEST_SYSENTER_EIP = 0x00006826,
>+ GUEST_S_CET = 0x00006828,
>+ GUEST_SSP = 0x0000682a,
>+ GUEST_INTR_SSP_TABLE = 0x0000682c,
> HOST_CR0 = 0x00006c00,
> HOST_CR3 = 0x00006c02,
> HOST_CR4 = 0x00006c04,
>@@ -357,6 +362,9 @@ enum vmcs_field {
> HOST_IA32_SYSENTER_EIP = 0x00006c12,
> HOST_RSP = 0x00006c14,
> HOST_RIP = 0x00006c16,
>+ HOST_S_CET = 0x00006c18,
>+ HOST_SSP = 0x00006c1a,
>+ HOST_INTR_SSP_TABLE = 0x00006c1c
> };
>
> /*
>--
>2.27.0
>

2023-07-27 06:36:39

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access


On 7/26/2023 9:46 PM, Chao Gao wrote:
> On Wed, Jul 26, 2023 at 04:26:06PM +0800, Yang, Weijiang wrote:
>>>> + /*
>>>> + * This function cannot work without later CET MSR read/write
>>>> + * emulation patch.
>>> Probably you should consider merging the "later" patch into this one.
>>> Then you can get rid of this comment and make this patch easier for
>>> review ...
>> Which later patch you mean? If you mean [13/20] KVM:VMX: Emulate read and
>> write to CET MSRs,
>>
>> then I intentionally separate these two, this one is for CET MSR common
>> checks and operations,
>>
>> the latter is specific to VMX, and add the above comments in case someone is
> The problem of this organization is the handling of S_CET, SSP, INT_SSP_TABLE
> MSR is incomplete in this patch. I think a better organization is to either
> merge this patch and patch 13, or move all changes related to S_CET, SSP,
> INT_SSP_TABLE into patch 13? e.g.,

Yes, I'm thinking of merging this patch with patch 13 to make it
complete, thanks for the suggestion!

>
> case MSR_IA32_U_CET:
> - case MSR_IA32_S_CET:
> if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> return 1;
> if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
> (data & CET_SHSTK_MASK_BITS)) ||
> (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
> (data & CET_IBT_MASK_BITS)))
> return 1;
> - if (msr == MSR_IA32_U_CET)
> - kvm_set_xsave_msr(msr_info);
> kvm_set_xsave_msr(msr_info);
> break;
> - case MSR_KVM_GUEST_SSP:
> - case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> return 1;
> if (is_noncanonical_address(data, vcpu))
> return 1;
> if (!IS_ALIGNED(data, 4))
> return 1;
> if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> msr == MSR_IA32_PL2_SSP) {
> vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
> } else if (msr == MSR_IA32_PL3_SSP) {
> kvm_set_xsave_msr(msr_info);
> }
> break;
>
>
>
> BTW, shouldn't bit2:0 of MSR_KVM_GUEST_SSP be 0? i.e., for MSR_KVM_GUEST_SSP,
> the alignment check should be IS_ALIGNED(data, 8).

The check for GUEST_SSP should be consistent with that of the PLx_SSPs,
otherwise there would be issues; maybe I need to change the alignment
check to:

#ifdef CONFIG_X86_64
	if (!IS_ALIGNED(data, 8))
		return 1;
#else
	if (!IS_ALIGNED(data, 4))
		return 1;
#endif

>
>> bisecting
>>
>> the patches and happens to split at this patch, then it would faulted and
>> take some actions.
> I am not sure what kind of issue you are worrying about. In my understanding,
> KVM hasn't advertised the support of IBT and SHSTK, so,
> kvm_cpu_cap_has(X86_FEATURE_SHSTK/IBT) will always return false. and then
> kvm_cet_is_msr_accessible() is guaranteed to return false.
>
> If there is any issue in your mind, you can fix it or reorganize your patches to
> avoid the issue. To me, adding a comment and a warning is not a good solution.

I will reorganize the patches and merge the code in this patch to patch 13.

>
>>>> int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>> {
>>>> u32 msr = msr_info->index;
>>>> @@ -3982,6 +4023,35 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>> vcpu->arch.guest_fpu.xfd_err = data;
>>>> break;
>>>> #endif
>>>> +#define CET_IBT_MASK_BITS GENMASK_ULL(63, 2)
>>> bit9:6 are reserved even if IBT is supported.
>> Yes, as IBT is only available on Intel platforms, I move the handling of bit
>> 9:6 to VMX  related patch.
> IIUC, bits 9:6 are not reserved for IBT. I don't get how IBT availability
> affects the handling of bits 9:6.

I handle it in this way: when IBT is not available, all of bits 63:2 are
treated as reserved. When IBT is available, additional checks for bits 9:6
should be enforced.

>
>> Here's the common check in case IBT is not available.
>>
>>>> @@ -12131,6 +12217,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>>
>>>> vcpu->arch.cr3 = 0;
>>>> kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
>>>> + memset(vcpu->arch.cet_s_ssp, 0, sizeof(vcpu->arch.cet_s_ssp));
>>> ... this begs the question: where other MSRs are reset. I suppose
>>> U_CET/PL3_SSP are handled when resetting guest FPU. But how about S_CET
>>> and INT_SSP_TAB? there is no answer in this patch.
>> I think the related guest VMCS fields(S_CET/INT_SSP_TAB/SSP) should be reset
>> to 0 in vmx_vcpu_reset(),
>>
>> do you think so?
> Yes, looks good.

2023-07-27 06:47:21

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 16/20] KVM:x86: Optimize CET supervisor SSP save/reload


On 7/27/2023 11:27 AM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:48PM -0400, Yang Weijiang wrote:
>> /*
>> * Writes msr value into the appropriate "register".
>> * Returns 0 on success, non-0 otherwise.
>> @@ -2427,7 +2439,16 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> #define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>> #define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>> case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>> - return kvm_set_msr_common(vcpu, msr_info);
>> + if (kvm_set_msr_common(vcpu, msr_info))
>> + return 1;
>> + /*
>> + * Write to the base SSP MSRs should happen ahead of toggling
>> + * of IA32_S_CET.SH_STK_EN bit.
> Is this a requirement from SDM? And how is this related to the change below?

No, on second thought, the usage of the supervisor SSPs doesn't necessarily
mean supervisor SHSTK is being enabled, e.g., they could be used as some HW
registers. I'll remove it.

>
> Note that PLx_SSP MSRs are linear addresses of shadow stacks for different CPLs.
> I may think using the page at 0 (assuming 0 is the reset value of PLx SSP) is
> allowed in architecture although probably no kernel will do so.
>
> I don't understand why this comment is needed. I suggest dropping it.

will drop it, thanks!

>
>> + */
>> + if (msr_index != MSR_IA32_PL3_SSP && data) {
>> + vmx_disable_write_intercept_sss_msr(vcpu);
>> + wrmsrl(msr_index, data);
>> + }
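
Taken together with the earlier suggestion to drop PL3_SSP from this case,
the patch-16 hunk would then reduce to something like the following sketch
(vmx_disable_write_intercept_sss_msr() is the helper from the posted patch;
the rest is an assumption):

	case MSR_IA32_PL0_SSP ... MSR_IA32_PL2_SSP:
		if (kvm_set_msr_common(vcpu, msr_info))
			return 1;
		if (data) {
			vmx_disable_write_intercept_sss_msr(vcpu);
			wrmsrl(msr_index, data);
		}
		break;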

2023-07-27 07:02:25

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 17/20] KVM:x86: Enable CET virtualization for VMX and advertise to userspace

On Thu, Jul 20, 2023 at 11:03:49PM -0400, Yang Weijiang wrote:
> vmcs_conf->size = vmx_msr_high & 0x1fff;
>- vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>+ vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;

why bit 13 and 14 are masked here?

2023-07-27 08:14:34

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 17/20] KVM:x86: Enable CET virtualization for VMX and advertise to userspace


On 7/27/2023 2:32 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:49PM -0400, Yang Weijiang wrote:
>> vmcs_conf->size = vmx_msr_high & 0x1fff;
>> - vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>> + vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
> why bit 13 and 14 are masked here?

Oops, this is a typo! Thanks!


2023-07-27 08:15:42

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 15/20] KVM:VMX: Save host MSR_IA32_S_CET to VMCS field


On 7/26/2023 10:05 PM, Sean Christopherson wrote:
> On Wed, Jul 26, 2023, Chao Gao wrote:
>> On Thu, Jul 20, 2023 at 11:03:47PM -0400, Yang Weijiang wrote:
>>> Save host MSR_IA32_S_CET to VMCS field as host constant state.
>>> Kernel IBT is supported now and the setting in MSR_IA32_S_CET
>>> is static after post-boot except in BIOS call case, but vCPU
>>> won't execute such BIOS call path currently, so it's safe to
>>> make the MSR as host constant.
>>>
>>> Suggested-by: Sean Christopherson <[email protected]>
>>> Signed-off-by: Yang Weijiang <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx/capabilities.h | 4 ++++
>>> arch/x86/kvm/vmx/vmx.c | 8 ++++++++
>>> 2 files changed, 12 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
>>> index d0abee35d7ba..b1883f6c08eb 100644
>>> --- a/arch/x86/kvm/vmx/capabilities.h
>>> +++ b/arch/x86/kvm/vmx/capabilities.h
>>> @@ -106,6 +106,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
>>> return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
>>> }
>>>
>>> +static inline bool cpu_has_load_cet_ctrl(void)
>>> +{
>>> + return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
>> VM_ENTRY_LOAD_CET_STATE is to load guest state. Strictly speaking, you
>> should check VM_EXIT_LOAD_HOST_CET_STATE though I believe CPUs will
>> support both or none.
> No need, pairs are now handled by setup_vmcs_config(). See commit f5a81d0eb01e
> ("KVM: VMX: Sanitize VM-Entry/VM-Exit control pairs at kvm_intel load time"), and
> then patch 17 does:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 3eb4fe9c9ab6..3f2f966e327d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2641,6 +2641,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
> { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
> { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
> + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
> };
>
>>> +}
>>> static inline bool cpu_has_vmx_mpx(void)
>>> {
>>> return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 85cb7e748a89..cba24acf1a7a 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -109,6 +109,8 @@ module_param(enable_apicv, bool, S_IRUGO);
>>> bool __read_mostly enable_ipiv = true;
>>> module_param(enable_ipiv, bool, 0444);
>>>
>>> +static u64 __read_mostly host_s_cet;
>> caching host's value is to save an MSR read on vCPU creation?
> Yep. And probably more importantly, to document that the host value is static,
> i.e. that KVM doesn't need to refresh S_CET before every VM-Enter/VM-Exit sequence.

OK, will add it to change log, thanks!


2023-07-27 08:17:33

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs


On 7/27/2023 1:16 PM, Chao Gao wrote:
>>>> + case MSR_IA32_S_CET:
>>>> + case MSR_KVM_GUEST_SSP:
>>>> + case MSR_IA32_INT_SSP_TAB:
>>>> + if (kvm_get_msr_common(vcpu, msr_info))
>>>> + return 1;
>>>> + if (msr_info->index == MSR_KVM_GUEST_SSP)
>>>> + msr_info->data = vmcs_readl(GUEST_SSP);
>>>> + else if (msr_info->index == MSR_IA32_S_CET)
>>>> + msr_info->data = vmcs_readl(GUEST_S_CET);
>>>> + else if (msr_info->index == MSR_IA32_INT_SSP_TAB)
>>>> + msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
>>>> + break;
>>>> case MSR_IA32_DEBUGCTLMSR:
>>>> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>>>> break;
>>>> @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>> else
>>>> vmx->pt_desc.guest.addr_a[index / 2] = data;
>>>> break;
>>>> +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
>>> bits9-6 are reserved for both intel and amd. Shouldn't this check be
>>> done in the common code?
>> My thinking is, on AMD platform, bit 63:2 is anyway reserved since it doesn't
>> support IBT,
> You can only say
>
> bits 5:2 and bits 63:10 are reserved since AMD doens't support IBT.
>
> bits 9:6 are reserved regardless of the support of IBT.
>
>> so the checks in common code for AMD is enough, when the execution flow comes
>> here,
>>
>> it should be vmx, and need this additional check.
> The checks against reserved bits are common for AMD and Intel:
>
> 1. if SHSTK is supported, bit1:0 are not reserved.
> 2. if IBT is supported, bit5:2 and bit63:10 are not reserved
> 3. bit9:6 are always reserved.
>
> There is nothing specific to Intel.

So you want the code to be:

+#define CET_IBT_MASK_BITS	(GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_CTRL_RESERVED_BITS	GENMASK(9, 6)
+#define CET_SHSTK_MASK_BITS	GENMASK(1, 0)

+	if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
+	     (data & CET_SHSTK_MASK_BITS)) ||
+	    (!guest_can_use(vcpu, X86_FEATURE_IBT) &&
+	     (data & CET_IBT_MASK_BITS)) ||
+	    (data & CET_CTRL_RESERVED_BITS))
	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+		return 1;

>
>>>> +#define CET_LEG_BITMAP_BASE(data) ((data) >> 12)
>>>> +#define CET_EXCLUSIVE_BITS (CET_SUPPRESS | CET_WAIT_ENDBR)
>>>> + case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>>>> + return kvm_set_msr_common(vcpu, msr_info);
>>> this hunk can be dropped as well.
>> In patch 16, these lines still need to be added back for PL{0,1,2}_SSP, so
>> would like keep it
> If that's the case, better to move it to patch 16, where the change
> can be justified. And PL3_SSP should be removed anyway. and then
> "msr_index != MSR_IA32_PL3_SSP" check in the below code snippet in
> patch 16 can go away.

Sure, will do it.

>
> + /*
> + * Write to the base SSP MSRs should happen ahead of toggling
> + * of IA32_S_CET.SH_STK_EN bit.
> + */
> + if (msr_index != MSR_IA32_PL3_SSP && data) {
> + vmx_disable_write_intercept_sss_msr(vcpu);
> + wrmsrl(msr_index, data);
> + }
>
>
>> here.
>>
>>>> + break;
>>>> + case MSR_IA32_U_CET:
>>>> + case MSR_IA32_S_CET:
>>>> + case MSR_KVM_GUEST_SSP:
>>>> + case MSR_IA32_INT_SSP_TAB:
>>>> + if ((msr_index == MSR_IA32_U_CET ||
>>>> + msr_index == MSR_IA32_S_CET) &&
>>>> + ((data & ~VMX_CET_CONTROL_MASK) ||
>>>> + !IS_ALIGNED(CET_LEG_BITMAP_BASE(data), 4) ||
>>>> + (data & CET_EXCLUSIVE_BITS) == CET_EXCLUSIVE_BITS))
>>>> + return 1;
>>> how about
>>>
>>> case MSR_IA32_U_CET:
>>> case MSR_IA32_S_CET:
>>> if ((data & ~VMX_CET_CONTROL_MASK) || ...
>>> ...
>>>
>>> case MSR_KVM_GUEST_SSP:
>>> case MSR_IA32_INT_SSP_TAB:
>> Do you mean to use "fallthrough"?
> Yes.

OK, will change it, thanks!



2023-07-27 08:22:34

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access

>> - case MSR_KVM_GUEST_SSP:
>> - case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
>> case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
>> if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
>> return 1;
>> if (is_noncanonical_address(data, vcpu))
>> return 1;
>> if (!IS_ALIGNED(data, 4))
>> return 1;
>> if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
>> msr == MSR_IA32_PL2_SSP) {
>> vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
>> } else if (msr == MSR_IA32_PL3_SSP) {
>> kvm_set_xsave_msr(msr_info);
>> }
>> break;
>>
>>
>>
>> BTW, shouldn't bit2:0 of MSR_KVM_GUEST_SSP be 0? i.e., for MSR_KVM_GUEST_SSP,
>> the alignment check should be IS_ALIGNED(data, 8).
>
>The check for GUEST_SSP should be consistent with that of PLx_SSPs, otherwise
>there would be issues

OK. I had the question because Gil said in a previous email:

IDT event delivery, when changing to rings 0-2 will load SSP from the
MSR corresponding to the new ring. These transitions check that bits
2:0 of the new value are all zero and will generate a nested fault if
any of those bits are set. (Far CALL using a call gate also checks this
if changing CPL.)

it sounds to me, at least for CPL0-2, SSP (or the synthetic
MSR_KVM_GUEST_SSP) should be 8-byte aligned. Otherwise, there will be a
nested fault when trying to load SSP.

I might be overly cautious. No objection to do IS_ALIGNED(data, 4) for SSP.

2023-07-27 09:08:12

by Chao Gao

[permalink] [raw]
Subject: Re: [PATCH v4 18/20] KVM:x86: Enable guest CET supervisor xstate bit support

On Thu, Jul 20, 2023 at 11:03:50PM -0400, Yang Weijiang wrote:
>
> u64 __read_mostly host_efer;
> EXPORT_SYMBOL_GPL(host_efer);
>@@ -9638,6 +9639,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> if (boot_cpu_has(X86_FEATURE_XSAVES)) {
> rdmsrl(MSR_IA32_XSS, host_xss);
> kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;

>+ kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;

Hardware may not support S_CET state management via XSAVES. We need to
consult CPUID.0xD.1[bit12].
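
A sketch of that kind of guard in __kvm_x86_vendor_init() (the cpuid_count()
probe is one way to do it and is an assumption; the point is that CET_KERNEL
is only ORed in when CPUID.(EAX=0xD,ECX=1) enumerates the supervisor CET
state component as XSS-manageable):

	unsigned int eax, ebx, ecx, edx;

	if (boot_cpu_has(X86_FEATURE_XSAVES)) {
		rdmsrl(MSR_IA32_XSS, host_xss);
		kvm_caps.supported_xss = host_xss & KVM_SUPPORTED_XSS;

		/* Claim CET_KERNEL only if XSAVES can actually manage it. */
		cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx);
		if (ecx & XFEATURE_MASK_CET_KERNEL)
			kvm_caps.supported_xss |= XFEATURE_MASK_CET_KERNEL;
	}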

2023-07-27 09:14:39

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 12/20] KVM:VMX: Introduce CET VMCS fields and control bits


On 7/27/2023 1:26 PM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 11:03:44PM -0400, Yang Weijiang wrote:
>> Two XSAVES state bits are introduced for CET:
>> IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
>> IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
>>
>> Six VMCS fields are introduced for CET:
>> {HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
>> {HOST,GUEST}_SSP: Stores shadow stack pointer of current active task/thread.
>> {HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
>>
>> On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
>> control fields:
>> If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET states are restored from
> Nit: s/VM_EXIT_LOAD_HOST_CET_STATE/VM_EXIT_LOAD_CET_STATE
>
> to align with the name you are actually using.
>
>> the following VMCS fields at VM-Exit:
>> HOST_S_CET
>> HOST_SSP
>> HOST_INTR_SSP_TABLE
>>
>> If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET states are loaded from
> Nit: s/VM_ENTRY_LOAD_GUEST_CET_STATE/VM_ENTRY_LOAD_CET_STATE

Sure, will change it, thanks a lot!

>
>> the following VMCS fields at VM-Entry:
>> GUEST_S_CET
>> GUEST_SSP
>> GUEST_INTR_SSP_TABLE
>>
>> Co-developed-by: Zhang Yi Z <[email protected]>
>> Signed-off-by: Zhang Yi Z <[email protected]>
>> Signed-off-by: Yang Weijiang <[email protected]>
> Reviewed-by: Chao Gao <[email protected]>
>
>> [...]

2023-07-27 15:34:05

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs

On Thu, Jul 27, 2023, Weijiang Yang wrote:
>
> On 7/27/2023 1:16 PM, Chao Gao wrote:
> > > > > @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > > > else
> > > > > vmx->pt_desc.guest.addr_a[index / 2] = data;
> > > > > break;
> > > > > +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
> > > > bits9-6 are reserved for both intel and amd. Shouldn't this check be
> > > > done in the common code?
> > > My thinking is, on AMD platform, bit 63:2 is anyway reserved since it doesn't
> > > support IBT,
> > You can only say
> >
> > bits 5:2 and bits 63:10 are reserved since AMD doens't support IBT.
> >
> > bits 9:6 are reserved regardless of the support of IBT.
> >
> > > so the checks in common code for AMD is enough, when the execution flow comes
> > > here,
> > >
> > > it should be vmx, and need this additional check.
> > The checks against reserved bits are common for AMD and Intel:
> >
> > 1. if SHSTK is supported, bit1:0 are not reserved.
> > 2. if IBT is supported, bit5:2 and bit63:10 are not reserved
> > 3. bit9:6 are always reserved.
> >
> > There is nothing specific to Intel.

+1

> So you want the code to be:
>
> +#define CET_IBT_MASK_BITS          (GENMASK_ULL(5, 2) | GENMASK_ULL(63,
> 10))
>
> +#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
>
> +#define CET_SHSTK_MASK_BITSGENMASK(1, 0)
>
> +if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>
> +(data & CET_SHSTK_MASK_BITS)) ||
>
> +(!guest_can_use(vcpu, X86_FEATURE_IBT) &&
>
> +(data & CET_IBT_MASK_BITS)) ||
>
>                             (data & CET_CTRL_RESERVED_BITS) )
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, though I vote to separate each check, e.g.

if (data & CET_CTRL_RESERVED_BITS)
return 1;

if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) && (data & CET_SHSTK_MASK_BITS))
return 1;

if (!guest_can_use(vcpu, X86_FEATURE_IBT) && (data & CET_IBT_MASK_BITS))
return 1;

I would expect the code generation to be similar, if not outright identical, and
IMO it's easier to quickly understand the flow if each check is a separate if-statement.

2023-07-27 17:34:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v4 09/20] KVM:x86: Add common code of CET MSR access

On Thu, Jul 27, 2023, Chao Gao wrote:
> >> - case MSR_KVM_GUEST_SSP:
> >> - case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
> >> case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
> >> if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> >> return 1;
> >> if (is_noncanonical_address(data, vcpu))
> >> return 1;
> >> if (!IS_ALIGNED(data, 4))
> >> return 1;
> >> if (msr == MSR_IA32_PL0_SSP || msr == MSR_IA32_PL1_SSP ||
> >> msr == MSR_IA32_PL2_SSP) {
> >> vcpu->arch.cet_s_ssp[msr - MSR_IA32_PL0_SSP] = data;
> >> } else if (msr == MSR_IA32_PL3_SSP) {
> >> kvm_set_xsave_msr(msr_info);
> >> }
> >> break;
> >>
> >>
> >>
> >> BTW, shouldn't bit2:0 of MSR_KVM_GUEST_SSP be 0? i.e., for MSR_KVM_GUEST_SSP,
> >> the alignment check should be IS_ALIGNED(data, 8).
> >
> >The check for GUEST_SSP should be consistent with that of PLx_SSPs, otherwise
> >there would be issues
>
> OK. I had the question because Gil said in a previous email:
>
> IDT event delivery, when changing to rings 0-2 will load SSP from the
> MSR corresponding to the new ring. These transitions check that bits
> 2:0 of the new value are all zero and will generate a nested fault if
> any of those bits are set. (Far CALL using a call gate also checks this
> if changing CPL.)
>
> it sounds to me, at least for CPL0-2, SSP (or the synethic
> MSR_KVM_GUEST_SSP) should be 8-byte aligned. Otherwise, there will be a
> nested fault when trying to load SSP.

Yes, but that's the guest's problem. KVM's responsibility is purely to faithfully
emulate hardware, which in this case means requiring that bits 1:0 be cleared on
the WRMSR. *Architecturally*, software is allowed to set bit 2, and only if/when
the vCPU consumes the "bad" value by transitioning to the relevant CPL will the
CPU generate a fault.
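
In code terms, the WRMSR-side check then stays at 4-byte alignment for all
of the SSP MSRs, and anything stricter is left for the guest to get right (a
sketch reusing the checks already in the patch; where the value is stored
afterwards is unchanged and omitted here):

	case MSR_KVM_GUEST_SSP:
	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
		if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
			return 1;
		if (is_noncanonical_address(data, vcpu))
			return 1;
		/* Hardware only rejects bits 1:0; bit 2 may legally be set. */
		if (!IS_ALIGNED(data, 4))
			return 1;
		/* store to cet_s_ssp[], the XSAVE area, or the VMCS as before */
		break;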

2023-07-28 01:19:12

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v4 13/20] KVM:VMX: Emulate read and write to CET MSRs


On 7/27/2023 11:20 PM, Sean Christopherson wrote:
> On Thu, Jul 27, 2023, Weijiang Yang wrote:
>> On 7/27/2023 1:16 PM, Chao Gao wrote:
>>>>>> @@ -2402,6 +2417,31 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>>>>> else
>>>>>> vmx->pt_desc.guest.addr_a[index / 2] = data;
>>>>>> break;
>>>>>> +#define VMX_CET_CONTROL_MASK (~GENMASK_ULL(9, 6))
>>>>> bits9-6 are reserved for both intel and amd. Shouldn't this check be
>>>>> done in the common code?
>>>> My thinking is, on AMD platform, bit 63:2 is anyway reserved since it doesn't
>>>> support IBT,
>>> You can only say
>>>
>>> bits 5:2 and bits 63:10 are reserved since AMD doens't support IBT.
>>>
>>> bits 9:6 are reserved regardless of the support of IBT.
>>>
>>>> so the checks in common code for AMD is enough, when the execution flow comes
>>>> here,
>>>>
>>>> it should be vmx, and need this additional check.
>>> The checks against reserved bits are common for AMD and Intel:
>>>
>>> 1. if SHSTK is supported, bit1:0 are not reserved.
>>> 2. if IBT is supported, bit5:2 and bit63:10 are not reserved
>>> 3. bit9:6 are always reserved.
>>>
>>> There is nothing specific to Intel.
> +1
>
>> So you want the code to be:
>>
>> +#define CET_IBT_MASK_BITS          (GENMASK_ULL(5, 2) | GENMASK_ULL(63,
>> 10))
>>
>> +#define CET_CTRL_RESERVED_BITS GENMASK(9, 6)
>>
>> +#define CET_SHSTK_MASK_BITSGENMASK(1, 0)
>>
>> +if ((!guest_can_use(vcpu, X86_FEATURE_SHSTK) &&
>>
>> +(data & CET_SHSTK_MASK_BITS)) ||
>>
>> +(!guest_can_use(vcpu, X86_FEATURE_IBT) &&
>>
>> +(data & CET_IBT_MASK_BITS)) ||
>>
>>                             (data & CET_CTRL_RESERVED_BITS) )
>>
>> ^^^^^^^^^^^^^^^^^^^^^^^^^
> Yes, though I vote to separate each check, e.g.
>
> if (data & CET_CTRL_RESERVED_BITS)
> return 1;
>
> if (!guest_can_use(vcpu, X86_FEATURE_SHSTK) && (data & CET_SHSTK_MASK_BITS))
> return 1;
>
> if (!guest_can_use(vcpu, X86_FEATURE_IBT) && (data & CET_IBT_MASK_BITS))
> return 1;
>
> I would expect the code generation to be similar, if not outright identical, and
> IMO it's easier to quickly understand the flow if each check is a separate if-statement.

It looks good to me! Thank you!