2019-02-26 06:33:25

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 0/8] This patch-set is to enable Guest CET support

Control-flow Enforcement Technology (CET) provides protection against
return/jump-oriented programming (ROP) attacks. To make kvm Guest OS own
the capability, this patch-set is required. It enables CET related CPUID
report, xsaves/xrstors, vmx entry configuration etc. for Guest OS.

PATCH 1 : Define CET VMCS fields and bits.
PATCH 2/3 : Report CET feature support in CPUID.
PATCH 4 : Fix xsaves size calculation issue.
PATCH 5 : Pass through CET MSRs to Guest.
PATCH 6 : Set Guest CET state auto loading bit.
PATCH 7 : Enable CET xsaves bits support in XSS.
PATCH 8 : Add CET MSR user space access interface.

Changelog:
v3:
- Modified patches to make Guest CET independent to Host enabling.
- Added patch 8 to add user space access for Guest CET MSR access.
- Modified code comments and patch description to reflect changes.

v2:
- Re-ordered patch sequence, combined one patch.
- Added more description for CET related VMCS fields.
- Added Host CET capability check while enabling Guest CET loading bit.
- Added Host CET capability check while reporting Guest CPUID(EAX=7,
EXC=0).
- Modified code in reporting Guest CPUID(EAX=D,ECX>=1), make it clearer.
- Added Host and Guest XSS mask check while setting bits for Guest XSS.


Yang Weijiang (8):
KVM:VMX: Define CET VMCS fields and bits
KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.
KVM:CPUID: Add CPUID support for Guest CET
KVM:CPUID: Fix xsaves area size calculation for CPUID.(EAX=0xD,ECX=1).
KVM:VMX: Pass through host CET related MSRs to Guest.
KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest
KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.
KVM:X86: Add user-space read/write interface for CET MSRs.

arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/asm/vmx.h | 8 ++++
arch/x86/kvm/cpuid.c | 67 ++++++++++++++++++++++++---------
arch/x86/kvm/vmx.c | 53 ++++++++++++++++++++++++--
arch/x86/kvm/x86.c | 46 ++++++++++++++++++++--
arch/x86/kvm/x86.h | 4 ++
6 files changed, 157 insertions(+), 24 deletions(-)

--
2.17.1



2019-02-26 06:33:33

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
corresponds to IBT feature.
CR4.CET[bit 23] is CET master enable bit, it controls CET feature
availability in guest OS.

Note: Although SHSTK or IBT can be enabled independently,
either of the features is controlled by CR4.CET.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/cpuid.c | 4 ++--
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55e51ff7e421..df002936088f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -90,7 +90,8 @@
| X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
- | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+ | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+ | X86_CR4_CET))

#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7bcfa61375c0..cb1aece25b17 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -406,12 +406,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
- F(CLDEMOTE);
+ F(CLDEMOTE) | F(SHSTK);

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
- F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
+ F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(IBT);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
--
2.17.1


2019-02-26 06:33:49

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

"Load Guest CET state" bit controls whether guest CET states
will be loaded at Guest entry. Before doing that, KVM needs
to check if CPU CET feature is available.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 89ee086e1729..d32cee9ee079 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -55,6 +55,7 @@
#include <asm/mmu_context.h>
#include <asm/spec-ctrl.h>
#include <asm/mshyperv.h>
+#include <asm/cet.h>

#include "trace.h"
#include "pmu.h"
@@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
return !(val & ~valid_bits);
}

+static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
+{
+ u32 eax, ebx, ecx, edx;
+
+ /*
+ * Guest CET can work as long as HW supports the feature, independent
+ * to Host SW enabling status.
+ */
+ cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
+
+ return ((ecx & bit(X86_FEATURE_SHSTK)) |
+ (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
+}
+
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
switch (msr->index) {
@@ -5409,6 +5424,23 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
}

+ /*
+ * To enable Guest CET, check whether CPU CET feature is
+ * available, if it's there, set Guest CET state loading bit
+ * per CR4.CET status, otherwise, return a fault to Guest.
+ */
+ if (vmx_guest_cet_cap(vcpu)) {
+ if (cr4 & X86_CR4_CET) {
+ vmcs_set_bits(VM_ENTRY_CONTROLS,
+ VM_ENTRY_LOAD_GUEST_CET_STATE);
+ } else {
+ vmcs_clear_bits(VM_ENTRY_CONTROLS,
+ VM_ENTRY_LOAD_GUEST_CET_STATE);
+ }
+ } else if (cr4 & X86_CR4_CET) {
+ return 1;
+ }
+
if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
return 1;

--
2.17.1


2019-02-26 06:33:53

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

For Guest XSS, right now, only bit 11(user states) and bit 12
(supervisor states) are supported, if other bits are being set,
need to modify KVM_SUPPORTED_XSS macro to have support.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d32cee9ee079..68908ed7b151 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -47,6 +47,7 @@
#include <asm/virtext.h>
#include <asm/mce.h>
#include <asm/fpu/internal.h>
+#include <asm/fpu/types.h>
#include <asm/perf_event.h>
#include <asm/debugreg.h>
#include <asm/kexec.h>
@@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_XSS:
if (!vmx_xsaves_supported())
return 1;
+
/*
- * The only supported bit as of Skylake is bit 8, but
- * it is not supported on KVM.
+ * Check bits being set are supported in KVM.
*/
- if (data != 0)
+ if (data & ~kvm_supported_xss())
return 1;
+
vcpu->arch.ia32_xss = data;
if (vcpu->arch.ia32_xss != host_xss)
add_atomic_switch_msr(vmx, MSR_IA32_XSS,
--
2.17.1


2019-02-26 06:34:02

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 8/8] KVM:X86: Add user-space read/write interface for CET MSRs.

The Guest MSRs are stored in fpu storage area, they are
operated by XSAVES/XRSTORS, so use kvm_load_guest_fpu
to restore them is a convenient way to let KVM access
them. After finish operation, need to restore Host MSR
contents by kvm_put_guest_fpu.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a0f8b71b2132..a4bdbef3a712 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -75,6 +75,8 @@

#define MAX_IO_MSRS 256
#define KVM_MAX_MCE_BANKS 32
+#define MAX_GUEST_CET_MSRS 5
+
u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);

@@ -214,6 +216,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
u64 __read_mostly host_xcr0;

static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);

static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
{
@@ -2889,21 +2893,57 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
EXPORT_SYMBOL_GPL(kvm_get_msr_common);

+static int do_cet_msrs(struct kvm_vcpu *vcpu, int entry_num,
+ struct kvm_msr_entry *entries, bool read)
+{
+ int i = entry_num;
+ int j = MAX_GUEST_CET_MSRS;
+ bool has_cet;
+
+ has_cet = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |
+ guest_cpuid_has(vcpu, X86_FEATURE_IBT);
+ /*
+ * Guest CET MSRs are saved by XSAVES, so need to restore
+ * them first, then read out or update the contents and
+ * restore Host ones.
+ */
+ if (has_cet) {
+ kvm_load_guest_fpu(vcpu);
+
+ if (read) {
+ for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
+ rdmsrl(entries[i].index, entries[i].data);
+ } else {
+ for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
+ wrmsrl(entries[i].index, entries[i].data);
+ }
+
+ kvm_put_guest_fpu(vcpu);
+ }
+ return j;
+}
/*
* Read or write a bunch of msrs. All parameters are kernel addresses.
*
* @return number of msrs set successfully.
*/
static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
- struct kvm_msr_entry *entries,
+ struct kvm_msr_entry *entries, bool read,
int (*do_msr)(struct kvm_vcpu *vcpu,
unsigned index, u64 *data))
{
int i;

- for (i = 0; i < msrs->nmsrs; ++i)
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ /* If it comes to CET related MSRs, read them together. */
+ if (entries[i].index == MSR_IA32_U_CET) {
+ i += do_cet_msrs(vcpu, i, entries, read) - 1;
+ continue;
+ }
+
if (do_msr(vcpu, entries[i].index, &entries[i].data))
break;
+ }

return i;
}
@@ -2938,7 +2978,7 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
goto out;
}

- r = n = __msr_io(vcpu, &msrs, entries, do_msr);
+ r = n = __msr_io(vcpu, &msrs, entries, !!writeback, do_msr);
if (r < 0)
goto out_free;

--
2.17.1


2019-02-26 06:34:13

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 5/8] KVM:VMX: Pass through host CET related MSRs to Guest.

The CET runtime settings, i.e., CET state control bits(IA32_U_CET/
IA32_S_CET), CET SSP(IA32_PL3_SSP/IA32_PL0_SSP) and SSP table address
(IA32_INTERRUPT_SSP_TABLE_ADDR) are task/thread specific, therefore,
OS needs to save/restore the states properly during context switch,
e.g., task/thread switching, interrupt/exception handling, it uses
xsaves/xrstors to achieve that.

The difference between VMCS CET area fields and xsave CET area, is that
the former is for state retention during Guest/Host context
switch while the latter is for state retention during OS execution.

Linux currently doesn't support CPL1 and CPL2, so SSPs for these level
are skipped here.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/vmx.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7bbb8b26e901..89ee086e1729 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11769,6 +11769,7 @@ static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ unsigned long *msr_bitmap;

if (cpu_has_secondary_exec_ctrls()) {
vmx_compute_secondary_exec_control(vmx);
@@ -11786,6 +11787,18 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
nested_vmx_cr_fixed1_bits_update(vcpu);
nested_vmx_entry_exit_ctls_update(vcpu);
}
+
+ msr_bitmap = vmx->vmcs01.msr_bitmap;
+
+ if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |
+ guest_cpuid_has(vcpu, X86_FEATURE_IBT)) {
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+ vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+ }
+
}

static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
2.17.1


2019-02-26 06:34:24

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 4/8] KVM:CPUID: Fix xsaves area size calculation for CPUID.(EAX=0xD,ECX=1).

According to latest Software Development Manual vol.2/3.2,
for CPUID.(EAX=0xD,ECX=1), it should report xsaves area size
containing all states enabled by XCR0|IA32_MSR_XSS.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/cpuid.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5e05756cc6db..f71c3d8d6ec3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -131,7 +131,8 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)

best = kvm_find_cpuid_entry(vcpu, 0xD, 1);
if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
- best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+ best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+ kvm_supported_xss(), true);

/*
* The existing code assumes virtual address is 48-bit or 57-bit in the
--
2.17.1


2019-02-26 06:34:31

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

Guest CET SHSTK and IBT capability are reported via
CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
Guest user mode and supervisor mode xsaves component size
is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
respectively.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
arch/x86/kvm/x86.h | 4 +++
2 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index cb1aece25b17..5e05756cc6db 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
return xcr0;
}

+u64 kvm_supported_xss(void)
+{
+ u64 xss;
+
+ rdmsrl(MSR_IA32_XSS, xss);
+ xss &= KVM_SUPPORTED_XSS;
+ return xss;
+}
+EXPORT_SYMBOL(kvm_supported_xss);
+
#define F(x) bit(X86_FEATURE_##x)

/* For scattered features from cpufeatures.h; we currently expose none */
@@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
u32 index, int *nent, int maxnent)
{
int r;
+ u32 eax, ebx, ecx, edx;
unsigned f_nx = is_efer_nx() ? F(NX) : 0;
#ifdef CONFIG_X86_64
unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
@@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
* if the host doesn't support it.
*/
entry->edx |= F(ARCH_CAPABILITIES);
+
+ /*
+ * Guest OS CET enabling is designed independent to
+ * host enabling, it only has dependency on Host HW
+ * capability, if it has, report CET support to
+ * Guest.
+ */
+ cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
+ if (ecx & F(SHSTK))
+ entry->ecx |= F(SHSTK);
+
+ if (edx & F(IBT))
+ entry->edx |= F(IBT);
+
} else {
entry->ebx = 0;
entry->ecx = 0;
@@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
}
case 0xd: {
int idx, i;
- u64 supported = kvm_supported_xcr0();
+ u64 u_supported = kvm_supported_xcr0();
+ u64 s_supported = kvm_supported_xss();
+ u64 supported;
+ int compacted;

- entry->eax &= supported;
- entry->ebx = xstate_required_size(supported, false);
+ entry->eax &= u_supported;
+ entry->ebx = xstate_required_size(u_supported, false);
entry->ecx = entry->ebx;
- entry->edx &= supported >> 32;
+ entry->edx &= u_supported >> 32;
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
- if (!supported)
+ if (!u_supported && !s_supported)
break;

for (idx = 1, i = 1; idx < 64; ++idx) {
@@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (idx == 1) {
entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
- entry[i].ebx = 0;
- if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
- entry[i].ebx =
- xstate_required_size(supported,
- true);
+ supported = u_supported | s_supported;
+ compacted = entry[i].eax &
+ (F(XSAVES) | F(XSAVEC));
+ entry[i].ebx = xstate_required_size(supported,
+ compacted);
+ entry[i].ecx &= s_supported;
+ entry[i].edx = 0;
} else {
+ supported = (entry[i].ecx & 1) ? s_supported :
+ u_supported;
if (entry[i].eax == 0 || !(supported & mask))
continue;
- if (WARN_ON_ONCE(entry[i].ecx & 1))
- continue;
+ entry[i].ecx &= 1;
+ entry[i].edx = 0;
+ if (entry[i].ecx)
+ entry[i].ebx = 0;
}
- entry[i].ecx = 0;
- entry[i].edx = 0;
entry[i].flags |=
KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
++*nent;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 224cd0a47568..c61da41c3c5c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU)
+
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
+ | XFEATURE_MASK_SHSTK_KERNEL)
+
extern u64 host_xcr0;

extern u64 kvm_supported_xcr0(void);
--
2.17.1


2019-02-26 06:34:40

by Yang, Weijiang

[permalink] [raw]
Subject: [PATCH v3 1/8] KVM:VMX: Define CET VMCS fields and bits

CET - Control-flow Enforcement Technology, it's used to
protect against return/jump oriented programming (ROP)
attacks. It provides the following capabilities to defend
against ROP/JOP style control-flow subversion attacks:
- Shadow Stack (SHSTK):
A second stack for the program that is
used exclusively for control transfer
operations.
- Indirect Branch Tracking (IBT):
Free branch protection to defend against
Jump/Call Oriented Programming.

On processors that support CET, VMX saves/restores
the states of IA32_S_CET, SSP and IA32_INTR_SSP_TABL_ADDR MSR
to the VMCS area for Guest/Host unconditionally.

If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET MSRs are
restored from VMCS host-state area at VM exit as follows:

- HOST_IA32_S_CET: Host supervisor mode IA32_S_CET MSR is loaded
from this field.

- HOST_SSP : Host SSP is loaded from this field.

- HOST_INTR_SSP_TABL_ADDR : Host IA32_INTR_SSP_TABL_ADDR
MSR is loaded from this field.

If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET MSRs are loaded
from VMCS guest-state area at VM entry as follows:

- GUEST_IA32_S_CET : Guest supervisor mode IA32_S_CET MSR is loaded
from this field.

- GUEST_SSP : Guest SSP is loaded from this field.

- GUEST_INTR_SSP_TABL_ADDR : Guest IA32_INTR_SSP_TABL_ADDR
MSR is loaded from this field.

Additionally, to context switch guest and host CET states, the VMM
uses xsaves/xrstors instructions to save/restore the guest CET states
at VM exit/entry. The CET xsave area is within thread_struct.fpu area.
If OS execution flow changes during task switch/interrupt/exception etc.,
the OS also relies on xsaves/xrstors to switch CET states accordingly.

Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/vmx.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ade0f153947d..395c1f7e5938 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -98,6 +98,7 @@
#define VM_EXIT_LOAD_IA32_EFER 0x00200000
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
+#define VM_EXIT_LOAD_HOST_CET_STATE 0x10000000

#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff

@@ -109,6 +110,7 @@
#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
#define VM_ENTRY_LOAD_IA32_EFER 0x00008000
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
+#define VM_ENTRY_LOAD_GUEST_CET_STATE 0x00100000

#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff

@@ -325,6 +327,9 @@ enum vmcs_field {
GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
GUEST_SYSENTER_ESP = 0x00006824,
GUEST_SYSENTER_EIP = 0x00006826,
+ GUEST_IA32_S_CET = 0x00006828,
+ GUEST_SSP = 0x0000682a,
+ GUEST_INTR_SSP_TABL_ADDR = 0x0000682c,
HOST_CR0 = 0x00006c00,
HOST_CR3 = 0x00006c02,
HOST_CR4 = 0x00006c04,
@@ -337,6 +342,9 @@ enum vmcs_field {
HOST_IA32_SYSENTER_EIP = 0x00006c12,
HOST_RSP = 0x00006c14,
HOST_RIP = 0x00006c16,
+ HOST_IA32_S_CET = 0x00006c18,
+ HOST_SSP = 0x00006c1a,
+ HOST_INTR_SSP_TABL_ADDR = 0x00006c1c
};

/*
--
2.17.1


2019-02-26 19:33:50

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] KVM:VMX: Define CET VMCS fields and bits

On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
>
> CET - Control-flow Enforcement Technology, it's used to
> protect against return/jump oriented programming (ROP)
> attacks. It provides the following capabilities to defend
> against ROP/JOP style control-flow subversion attacks:
> - Shadow Stack (SHSTK):
> A second stack for the program that is
> used exclusively for control transfer
> operations.
> - Indirect Branch Tracking (IBT):
> Free branch protection to defend against
> Jump/Call Oriented Programming.
>
> On processors that support CET, VMX saves/restores
> the states of IA32_S_CET, SSP and IA32_INTR_SSP_TABL_ADDR MSR
> to the VMCS area for Guest/Host unconditionally.
>
> If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET MSRs are
> restored from VMCS host-state area at VM exit as follows:
>
> - HOST_IA32_S_CET: Host supervisor mode IA32_S_CET MSR is loaded
> from this field.
>
> - HOST_SSP : Host SSP is loaded from this field.
>
> - HOST_INTR_SSP_TABL_ADDR : Host IA32_INTR_SSP_TABL_ADDR
> MSR is loaded from this field.
>
> If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET MSRs are loaded
> from VMCS guest-state area at VM entry as follows:
>
> - GUEST_IA32_S_CET : Guest supervisor mode IA32_S_CET MSR is loaded
> from this field.
>
> - GUEST_SSP : Guest SSP is loaded from this field.
>
> - GUEST_INTR_SSP_TABL_ADDR : Guest IA32_INTR_SSP_TABL_ADDR
> MSR is loaded from this field.
>
> Additionally, to context switch guest and host CET states, the VMM
> uses xsaves/xrstors instructions to save/restore the guest CET states
> at VM exit/entry. The CET xsave area is within thread_struct.fpu area.
> If OS execution flow changes during task switch/interrupt/exception etc.,
> the OS also relies on xsaves/xrstors to switch CET states accordingly.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/include/asm/vmx.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index ade0f153947d..395c1f7e5938 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -98,6 +98,7 @@
> #define VM_EXIT_LOAD_IA32_EFER 0x00200000
> #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
> #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
> +#define VM_EXIT_LOAD_HOST_CET_STATE 0x10000000
>
> #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
>
> @@ -109,6 +110,7 @@
> #define VM_ENTRY_LOAD_IA32_PAT 0x00004000
> #define VM_ENTRY_LOAD_IA32_EFER 0x00008000
> #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
> +#define VM_ENTRY_LOAD_GUEST_CET_STATE 0x00100000
>
> #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
>
> @@ -325,6 +327,9 @@ enum vmcs_field {
> GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
> GUEST_SYSENTER_ESP = 0x00006824,
> GUEST_SYSENTER_EIP = 0x00006826,
> + GUEST_IA32_S_CET = 0x00006828,
> + GUEST_SSP = 0x0000682a,
> + GUEST_INTR_SSP_TABL_ADDR = 0x0000682c,

Nit: TABL is an unusual abbreviation. Perhaps TBL here and below? And
why did you drop the 'IA32' here, but not in GUEST_IA32_S_CET above?
(It is true that there seems to be no rhyme or reason to the mnemonics
chosen here. For example, EFER keeps its IA32, but SYSENTER_EIP
doesn't. Sigh.)

> HOST_CR0 = 0x00006c00,
> HOST_CR3 = 0x00006c02,
> HOST_CR4 = 0x00006c04,
> @@ -337,6 +342,9 @@ enum vmcs_field {
> HOST_IA32_SYSENTER_EIP = 0x00006c12,
> HOST_RSP = 0x00006c14,
> HOST_RIP = 0x00006c16,
> + HOST_IA32_S_CET = 0x00006c18,
> + HOST_SSP = 0x00006c1a,
> + HOST_INTR_SSP_TABL_ADDR = 0x00006c1c
> };
>
> /*
> --
> 2.17.1

Reviewed-by: Jim Mattson <[email protected]>

2019-02-26 19:50:21

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
>
> Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
> in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
> corresponds to IBT feature.
> CR4.CET[bit 23] is CET master enable bit, it controls CET feature
> availability in guest OS.
>
> Note: Although SHSTK or IBT can be enabled independently,
> either of the features is controlled by CR4.CET.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>

Am I missing something? X86_CR4_CET and X86_FEATURE_SHSTK and
X86_FEATURE_IBT don't appear to be defined in Linus' tree.

2019-02-27 00:58:57

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] KVM:VMX: Define CET VMCS fields and bits

On Tue, Feb 26, 2019 at 11:31:11AM -0800, Jim Mattson wrote:
> On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
> >
> > CET - Control-flow Enforcement Technology, it's used to
> > protect against return/jump oriented programming (ROP)
> > attacks. It provides the following capabilities to defend
> > against ROP/JOP style control-flow subversion attacks:
> > - Shadow Stack (SHSTK):
> > A second stack for the program that is
> > used exclusively for control transfer
> > operations.
> > - Indirect Branch Tracking (IBT):
> > Free branch protection to defend against
> > Jump/Call Oriented Programming.
> >
> > On processors that support CET, VMX saves/restores
> > the states of IA32_S_CET, SSP and IA32_INTR_SSP_TABL_ADDR MSR
> > to the VMCS area for Guest/Host unconditionally.
> >
> > If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET MSRs are
> > restored from VMCS host-state area at VM exit as follows:
> >
> > - HOST_IA32_S_CET: Host supervisor mode IA32_S_CET MSR is loaded
> > from this field.
> >
> > - HOST_SSP : Host SSP is loaded from this field.
> >
> > - HOST_INTR_SSP_TABL_ADDR : Host IA32_INTR_SSP_TABL_ADDR
> > MSR is loaded from this field.
> >
> > If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET MSRs are loaded
> > from VMCS guest-state area at VM entry as follows:
> >
> > - GUEST_IA32_S_CET : Guest supervisor mode IA32_S_CET MSR is loaded
> > from this field.
> >
> > - GUEST_SSP : Guest SSP is loaded from this field.
> >
> > - GUEST_INTR_SSP_TABL_ADDR : Guest IA32_INTR_SSP_TABL_ADDR
> > MSR is loaded from this field.
> >
> > Additionally, to context switch guest and host CET states, the VMM
> > uses xsaves/xrstors instructions to save/restore the guest CET states
> > at VM exit/entry. The CET xsave area is within thread_struct.fpu area.
> > If OS execution flow changes during task switch/interrupt/exception etc.,
> > the OS also relies on xsaves/xrstors to switch CET states accordingly.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/include/asm/vmx.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> > index ade0f153947d..395c1f7e5938 100644
> > --- a/arch/x86/include/asm/vmx.h
> > +++ b/arch/x86/include/asm/vmx.h
> > @@ -98,6 +98,7 @@
> > #define VM_EXIT_LOAD_IA32_EFER 0x00200000
> > #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
> > #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
> > +#define VM_EXIT_LOAD_HOST_CET_STATE 0x10000000
> >
> > #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
> >
> > @@ -109,6 +110,7 @@
> > #define VM_ENTRY_LOAD_IA32_PAT 0x00004000
> > #define VM_ENTRY_LOAD_IA32_EFER 0x00008000
> > #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
> > +#define VM_ENTRY_LOAD_GUEST_CET_STATE 0x00100000
> >
> > #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
> >
> > @@ -325,6 +327,9 @@ enum vmcs_field {
> > GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
> > GUEST_SYSENTER_ESP = 0x00006824,
> > GUEST_SYSENTER_EIP = 0x00006826,
> > + GUEST_IA32_S_CET = 0x00006828,
> > + GUEST_SSP = 0x0000682a,
> > + GUEST_INTR_SSP_TABL_ADDR = 0x0000682c,
>
> Nit: TABL is an unusual abbreviation. Perhaps TBL here and below? And
> why did you drop the 'IA32' here, but not in GUEST_IA32_S_CET above?
> (It is true that there seems to be no rhyme or reason to the mnemonics
> chosen here. For example, EFER keeps its IA32, but SYSENTER_EIP
> doesn't. Sigh.)
>
Hi, Jim,
Thanks for the comments. You're right, I'll change these definitions
in next version.

> > HOST_CR0 = 0x00006c00,
> > HOST_CR3 = 0x00006c02,
> > HOST_CR4 = 0x00006c04,
> > @@ -337,6 +342,9 @@ enum vmcs_field {
> > HOST_IA32_SYSENTER_EIP = 0x00006c12,
> > HOST_RSP = 0x00006c14,
> > HOST_RIP = 0x00006c16,
> > + HOST_IA32_S_CET = 0x00006c18,
> > + HOST_SSP = 0x00006c1a,
> > + HOST_INTR_SSP_TABL_ADDR = 0x00006c1c
> > };
> >
> > /*
> > --
> > 2.17.1
>
> Reviewed-by: Jim Mattson <[email protected]>

2019-02-27 01:04:54

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

On Tue, Feb 26, 2019 at 11:48:59AM -0800, Jim Mattson wrote:
> On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
> >
> > Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
> > in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
> > corresponds to IBT feature.
> > CR4.CET[bit 23] is CET master enable bit, it controls CET feature
> > availability in guest OS.
> >
> > Note: Although SHSTK or IBT can be enabled independently,
> > either of the features is controlled by CR4.CET.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
>
> Am I missing something? X86_CR4_CET and X86_FEATURE_SHSTK and
> X86_FEATURE_IBT don't appear to be defined in Linus' tree.
The patch-set has dependency on this CET Kernel patch-set:
https://lkml.org/lkml/2018/11/20/203
I reused some definitions from the kernel patches.


2019-02-28 15:56:35

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] KVM:VMX: Define CET VMCS fields and bits

On Tue, Feb 26, 2019 at 11:31:11AM -0800, Jim Mattson wrote:
> On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
> >
> > CET - Control-flow Enforcement Technology, it's used to
> > protect against return/jump oriented programming (ROP)
> > attacks. It provides the following capabilities to defend
> > against ROP/JOP style control-flow subversion attacks:
> > - Shadow Stack (SHSTK):
> > A second stack for the program that is
> > used exclusively for control transfer
> > operations.
> > - Indirect Branch Tracking (IBT):
> > Free branch protection to defend against
> > Jump/Call Oriented Programming.
> >
> > On processors that support CET, VMX saves/restores
> > the states of IA32_S_CET, SSP and IA32_INTR_SSP_TABL_ADDR MSR
> > to the VMCS area for Guest/Host unconditionally.
> >
> > If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET MSRs are
> > restored from VMCS host-state area at VM exit as follows:
> >
> > - HOST_IA32_S_CET: Host supervisor mode IA32_S_CET MSR is loaded
> > from this field.
> >
> > - HOST_SSP : Host SSP is loaded from this field.
> >
> > - HOST_INTR_SSP_TABL_ADDR : Host IA32_INTR_SSP_TABL_ADDR
> > MSR is loaded from this field.
> >
> > If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET MSRs are loaded
> > from VMCS guest-state area at VM entry as follows:
> >
> > - GUEST_IA32_S_CET : Guest supervisor mode IA32_S_CET MSR is loaded
> > from this field.
> >
> > - GUEST_SSP : Guest SSP is loaded from this field.
> >
> > - GUEST_INTR_SSP_TABL_ADDR : Guest IA32_INTR_SSP_TABL_ADDR
> > MSR is loaded from this field.
> >
> > Additionally, to context switch guest and host CET states, the VMM
> > uses xsaves/xrstors instructions to save/restore the guest CET states
> > at VM exit/entry. The CET xsave area is within thread_struct.fpu area.
> > If OS execution flow changes during task switch/interrupt/exception etc.,
> > the OS also relies on xsaves/xrstors to switch CET states accordingly.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/include/asm/vmx.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> > index ade0f153947d..395c1f7e5938 100644
> > --- a/arch/x86/include/asm/vmx.h
> > +++ b/arch/x86/include/asm/vmx.h
> > @@ -98,6 +98,7 @@
> > #define VM_EXIT_LOAD_IA32_EFER 0x00200000
> > #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
> > #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
> > +#define VM_EXIT_LOAD_HOST_CET_STATE 0x10000000
> >
> > #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
> >
> > @@ -109,6 +110,7 @@
> > #define VM_ENTRY_LOAD_IA32_PAT 0x00004000
> > #define VM_ENTRY_LOAD_IA32_EFER 0x00008000
> > #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
> > +#define VM_ENTRY_LOAD_GUEST_CET_STATE 0x00100000
> >
> > #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
> >
> > @@ -325,6 +327,9 @@ enum vmcs_field {
> > GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
> > GUEST_SYSENTER_ESP = 0x00006824,
> > GUEST_SYSENTER_EIP = 0x00006826,
> > + GUEST_IA32_S_CET = 0x00006828,
> > + GUEST_SSP = 0x0000682a,
> > + GUEST_INTR_SSP_TABL_ADDR = 0x0000682c,
>
> Nit: TABL is an unusual abbreviation. Perhaps TBL here and below?

I don't see any reason to abbreviate TABLE, we don't need the two
characters for anything.

> And why did you drop the 'IA32' here, but not in GUEST_IA32_S_CET above?
> (It is true that there seems to be no rhyme or reason to the mnemonics
> chosen here. For example, EFER keeps its IA32, but SYSENTER_EIP
> doesn't. Sigh.)

My vote is to always drop IA32, IMO it's redundant.

2019-02-28 16:00:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> Guest CET SHSTK and IBT capability are reported via
> CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> Guest user mode and supervisor mode xsaves component size
> is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> respectively.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> arch/x86/kvm/x86.h | 4 +++
> 2 files changed, 50 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index cb1aece25b17..5e05756cc6db 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> return xcr0;
> }
>
> +u64 kvm_supported_xss(void)
> +{
> + u64 xss;
> +
> + rdmsrl(MSR_IA32_XSS, xss);
> + xss &= KVM_SUPPORTED_XSS;
> + return xss;
> +}
> +EXPORT_SYMBOL(kvm_supported_xss);
> +
> #define F(x) bit(X86_FEATURE_##x)
>
> /* For scattered features from cpufeatures.h; we currently expose none */
> @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> u32 index, int *nent, int maxnent)
> {
> int r;
> + u32 eax, ebx, ecx, edx;
> unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> #ifdef CONFIG_X86_64
> unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> * if the host doesn't support it.
> */
> entry->edx |= F(ARCH_CAPABILITIES);
> +
> + /*
> + * Guest OS CET enabling is designed independent to
> + * host enabling, it only has dependency on Host HW
> + * capability, if it has, report CET support to
> + * Guest.
> + */
> + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> + if (ecx & F(SHSTK))
> + entry->ecx |= F(SHSTK);
> +
> + if (edx & F(IBT))
> + entry->edx |= F(IBT);

There's no need to manually add these flags. They will be automatically
kept if supported in hardware because your previous patch, 02/08, added
them to the mask of features that can be exposed to the guest,
i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.

> +
> } else {
> entry->ebx = 0;
> entry->ecx = 0;
> @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> }
> case 0xd: {
> int idx, i;
> - u64 supported = kvm_supported_xcr0();
> + u64 u_supported = kvm_supported_xcr0();
> + u64 s_supported = kvm_supported_xss();
> + u64 supported;
> + int compacted;
>
> - entry->eax &= supported;
> - entry->ebx = xstate_required_size(supported, false);
> + entry->eax &= u_supported;
> + entry->ebx = xstate_required_size(u_supported, false);
> entry->ecx = entry->ebx;
> - entry->edx &= supported >> 32;
> + entry->edx &= u_supported >> 32;
> entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> - if (!supported)
> + if (!u_supported && !s_supported)
> break;
>
> for (idx = 1, i = 1; idx < 64; ++idx) {
> @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> if (idx == 1) {
> entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> - entry[i].ebx = 0;
> - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> - entry[i].ebx =
> - xstate_required_size(supported,
> - true);
> + supported = u_supported | s_supported;
> + compacted = entry[i].eax &
> + (F(XSAVES) | F(XSAVEC));
> + entry[i].ebx = xstate_required_size(supported,
> + compacted);
> + entry[i].ecx &= s_supported;
> + entry[i].edx = 0;
> } else {
> + supported = (entry[i].ecx & 1) ? s_supported :
> + u_supported;
> if (entry[i].eax == 0 || !(supported & mask))
> continue;
> - if (WARN_ON_ONCE(entry[i].ecx & 1))
> - continue;
> + entry[i].ecx &= 1;
> + entry[i].edx = 0;
> + if (entry[i].ecx)
> + entry[i].ebx = 0;
> }
> - entry[i].ecx = 0;
> - entry[i].edx = 0;
> entry[i].flags |=
> KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> ++*nent;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 224cd0a47568..c61da41c3c5c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU)
> +
> +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> + | XFEATURE_MASK_SHSTK_KERNEL)
> +
> extern u64 host_xcr0;
>
> extern u64 kvm_supported_xcr0(void);
> --
> 2.17.1
>

2019-02-28 16:06:57

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

On Mon, Feb 25, 2019 at 09:27:10PM +0800, Yang Weijiang wrote:
> Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
> in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
> corresponds to IBT feature.
> CR4.CET[bit 23] is CET master enable bit, it controls CET feature
> availability in guest OS.
>
> Note: Although SHSTK or IBT can be enabled independently,
> either of the features is controlled by CR4.CET.

This patch effectively allows the guest to set CR4, it should be the
last patch in the series to actually "enable" CET in the guest, i.e.
once KVM actually virtualizes CET.

>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 3 ++-
> arch/x86/kvm/cpuid.c | 4 ++--
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55e51ff7e421..df002936088f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -90,7 +90,8 @@
> | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
> + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> + | X86_CR4_CET))

This macro doesn't define CR4 bits, it defines which bits should be
treated as reserved when the guest writes CR4.

>
> #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7bcfa61375c0..cb1aece25b17 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -406,12 +406,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> - F(CLDEMOTE);
> + F(CLDEMOTE) | F(SHSTK);
>
> /* cpuid 7.0.edx*/
> const u32 kvm_cpuid_7_0_edx_x86_features =
> F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
> + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(IBT);

Again, these masks don't "define" the feature bits, rather they specify
which feature bits can be exposed to the guest.

>
> /* all calls to cpuid_count() should be made on the same cpu */
> get_cpu();
> --
> 2.17.1
>

2019-02-28 18:19:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On Mon, Feb 25, 2019 at 09:27:15PM +0800, Yang Weijiang wrote:
> For Guest XSS, right now, only bit 11(user states) and bit 12
> (supervisor states) are supported, if other bits are being set,
> need to modify KVM_SUPPORTED_XSS macro to have support.

The changelog should describe what the change does directly. Referencing
specific bits implies that the code is explicitly checking for said bits,
which it does not. And there's no need to advise readers on how to add
more bits in the future, e.g. the KVM_SUPPORTED_XSS macro may not exist
the next time new bits are added. Just cover what the patch does and why.

Something like:

KVM: x86: Allow the guest to set supported bits in XSS

...now that KVM supports setting CET related bits. Previously, KVM did
not support setting any bits in XSS and so hardcoded its check to inject
a #GP if the guest attempted to write a non-zero value to IA32_XSS.

>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d32cee9ee079..68908ed7b151 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -47,6 +47,7 @@
> #include <asm/virtext.h>
> #include <asm/mce.h>
> #include <asm/fpu/internal.h>
> +#include <asm/fpu/types.h>
> #include <asm/perf_event.h>
> #include <asm/debugreg.h>
> #include <asm/kexec.h>
> @@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_XSS:
> if (!vmx_xsaves_supported())
> return 1;
> +
> /*
> - * The only supported bit as of Skylake is bit 8, but
> - * it is not supported on KVM.
> + * Check bits being set are supported in KVM.

I'd drop the comment altogether, it's pretty obvious from the code that
were checking which bits are supported.

> */
> - if (data != 0)
> + if (data & ~kvm_supported_xss())
> return 1;
> +
> vcpu->arch.ia32_xss = data;
> if (vcpu->arch.ia32_xss != host_xss)
> add_atomic_switch_msr(vmx, MSR_IA32_XSS,
> --
> 2.17.1
>

2019-02-28 19:34:13

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> "Load Guest CET state" bit controls whether guest CET states
> will be loaded at Guest entry. Before doing that, KVM needs
> to check if CPU CET feature is available.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 89ee086e1729..d32cee9ee079 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -55,6 +55,7 @@
> #include <asm/mmu_context.h>
> #include <asm/spec-ctrl.h>
> #include <asm/mshyperv.h>
> +#include <asm/cet.h>
>
> #include "trace.h"
> #include "pmu.h"
> @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> return !(val & ~valid_bits);
> }
>
> +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> +{
> + u32 eax, ebx, ecx, edx;
> +
> + /*
> + * Guest CET can work as long as HW supports the feature, independent
> + * to Host SW enabling status.
> + */
> + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> +
> + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;

Given the holes in the (current) architecture/spec, I think KVM has to
require both features to be supported in the guest to allow CR4.CET to
be enabled.

Technically SHSTK and IBT can be enabled independently, but unless I'm
missing something, supporting that in KVM (or any VMM) would be nasty
and would likely degrade guest performance significantly.

MSRs IA32_U_CET and IA32_S_CET have enable bits for each CET feature.
Presumably the bits for each feature are reserved if the feature is not
supported, e.g. SH_STK_EN is reserved to zero if SHSTK isn't supported.
This wouldn't be a problem except that IA32_U_CET and the shadow stack
MSRs, e.g. IA32_PL*_SSP, can be saved/restored via XSAVES/XRSTORS. The
behavior is restricted by IA32_XSS, but again it's all or nothing, e.g.
if any CET feature is supported then XSS_CET_{S,U} can be set.

For example, if a guest supported IBT and !SHSTK, and the guest enabled
XSS_CET_{S,I}, KVM would need to trap XSAVES/XRSTORS to enforce that the
SHSTK bits in XSS_CET_U aren't set. And that doesn't even address the
fact that the architecture defines the effects on the size of the XSAVE
state area as being a bundled deal, e.g. IA32_XSS.CET_U=1 increases the
size by 16 bytes (for IA32_U_CET and IA32_PL3_SSP) regardless of whether
or not SHSTK is supported. One would assume that IA32_PL3_SSP doesn't
exist if shadow stacks are not supported by the CPU.

TL;DR: the architecture enumerates SHSTK and IBT independently, but
the architecture effectively assumes they are bundled together.

> +}
> +
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> switch (msr->index) {
> @@ -5409,6 +5424,23 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> return 1;
> }
>
> + /*
> + * To enable Guest CET, check whether CPU CET feature is
> + * available, if it's there, set Guest CET state loading bit
> + * per CR4.CET status, otherwise, return a fault to Guest.
> + */
> + if (vmx_guest_cet_cap(vcpu)) {

This is wrong, it's checking the host capabilities. Use guest_cpuid_has()
to query the guest capabilities. E.g. CET can be supported in the host
but not exposed to guest, in which case the CPUID bits will not be "set"
for the guest.

> + if (cr4 & X86_CR4_CET) {

No need for curly braces here, both the 'if' and 'else' contain a single
statement.

> + vmcs_set_bits(VM_ENTRY_CONTROLS,
> + VM_ENTRY_LOAD_GUEST_CET_STATE);
> + } else {
> + vmcs_clear_bits(VM_ENTRY_CONTROLS,
> + VM_ENTRY_LOAD_GUEST_CET_STATE);
> + }
> + } else if (cr4 & X86_CR4_CET) {
> + return 1;
> + }
> +
> if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> return 1;
>
> --
> 2.17.1
>

2019-02-28 19:41:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] KVM:X86: Add user-space read/write interface for CET MSRs.

On Mon, Feb 25, 2019 at 09:27:16PM +0800, Yang Weijiang wrote:
> The Guest MSRs are stored in fpu storage area, they are
> operated by XSAVES/XRSTORS, so use kvm_load_guest_fpu
> to restore them is a convenient way to let KVM access
> them. After finish operation, need to restore Host MSR
> contents by kvm_put_guest_fpu.
>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a0f8b71b2132..a4bdbef3a712 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -75,6 +75,8 @@
>
> #define MAX_IO_MSRS 256
> #define KVM_MAX_MCE_BANKS 32
> +#define MAX_GUEST_CET_MSRS 5
> +
> u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
> EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
>
> @@ -214,6 +216,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> u64 __read_mostly host_xcr0;
>
> static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>
> static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> {
> @@ -2889,21 +2893,57 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> }
> EXPORT_SYMBOL_GPL(kvm_get_msr_common);
>
> +static int do_cet_msrs(struct kvm_vcpu *vcpu, int entry_num,
> + struct kvm_msr_entry *entries, bool read)
> +{
> + int i = entry_num;
> + int j = MAX_GUEST_CET_MSRS;
> + bool has_cet;
> +
> + has_cet = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |
> + guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> + /*
> + * Guest CET MSRs are saved by XSAVES, so need to restore
> + * them first, then read out or update the contents and
> + * restore Host ones.
> + */
> + if (has_cet) {
> + kvm_load_guest_fpu(vcpu);
> +
> + if (read) {
> + for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
> + rdmsrl(entries[i].index, entries[i].data);
> + } else {
> + for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
> + wrmsrl(entries[i].index, entries[i].data);
> + }
> +
> + kvm_put_guest_fpu(vcpu);
> + }
> + return j;
> +}
> /*
> * Read or write a bunch of msrs. All parameters are kernel addresses.
> *
> * @return number of msrs set successfully.
> */
> static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> - struct kvm_msr_entry *entries,
> + struct kvm_msr_entry *entries, bool read,
> int (*do_msr)(struct kvm_vcpu *vcpu,
> unsigned index, u64 *data))
> {
> int i;
>
> - for (i = 0; i < msrs->nmsrs; ++i)
> + for (i = 0; i < msrs->nmsrs; ++i) {
> + /* If it comes to CET related MSRs, read them together. */
> + if (entries[i].index == MSR_IA32_U_CET) {
> + i += do_cet_msrs(vcpu, i, entries, read) - 1;

This wrong, not to mention horribly buggy. The correct way to handle
MSRs that are switched via the VMCS is to read/write the VMCS in
vmx_{get,set}_msr() as needed, e.g. vmcs_writel(GUEST_GS_BASE, data).

> + continue;
> + }
> +
> if (do_msr(vcpu, entries[i].index, &entries[i].data))
> break;
> + }
>
> return i;
> }
> @@ -2938,7 +2978,7 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
> goto out;
> }
>
> - r = n = __msr_io(vcpu, &msrs, entries, do_msr);
> + r = n = __msr_io(vcpu, &msrs, entries, !!writeback, do_msr);
> if (r < 0)
> goto out_free;
>
> --
> 2.17.1
>

2019-03-01 02:03:22

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

On Thu, Feb 28, 2019 at 08:04:22AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:10PM +0800, Yang Weijiang wrote:
> > Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
> > in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
> > corresponds to IBT feature.
> > CR4.CET[bit 23] is CET master enable bit, it controls CET feature
> > availability in guest OS.
> >
> > Note: Although SHSTK or IBT can be enabled independently,
> > either of the features is controlled by CR4.CET.
>
> This patch effectively allows the guest to set CR4, it should be the
> last patch in the series to actually "enable" CET in the guest, i.e.
> once KVM actually virtualizes CET.
>
Thanks Sean. I'll change the patch sequence in next version.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 3 ++-
> > arch/x86/kvm/cpuid.c | 4 ++--
> > 2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 55e51ff7e421..df002936088f 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -90,7 +90,8 @@
> > | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
> > | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> > | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> > - | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
> > + | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> > + | X86_CR4_CET))
>
> This macro doesn't define CR4 bits, it defines which bits should be
> treated as reserved when the guest writes CR4.
>
> >
> > #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 7bcfa61375c0..cb1aece25b17 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -406,12 +406,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ |
> > F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> > F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > - F(CLDEMOTE);
> > + F(CLDEMOTE) | F(SHSTK);
> >
> > /* cpuid 7.0.edx*/
> > const u32 kvm_cpuid_7_0_edx_x86_features =
> > F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> > - F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES);
> > + F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(IBT);
>
> Again, these masks don't "define" the feature bits, rather they specify
> which feature bits can be exposed to the guest.
>
> >
> > /* all calls to cpuid_count() should be made on the same cpu */
> > get_cpu();
> > --
> > 2.17.1
> >

2019-03-01 02:09:13

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > "Load Guest CET state" bit controls whether guest CET states
> > will be loaded at Guest entry. Before doing that, KVM needs
> > to check if CPU CET feature is available.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > 1 file changed, 32 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 89ee086e1729..d32cee9ee079 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -55,6 +55,7 @@
> > #include <asm/mmu_context.h>
> > #include <asm/spec-ctrl.h>
> > #include <asm/mshyperv.h>
> > +#include <asm/cet.h>
> >
> > #include "trace.h"
> > #include "pmu.h"
> > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > return !(val & ~valid_bits);
> > }
> >
> > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > +{
> > + u32 eax, ebx, ecx, edx;
> > +
> > + /*
> > + * Guest CET can work as long as HW supports the feature, independent
> > + * to Host SW enabling status.
> > + */
> > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > +
> > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
>
> Given the holes in the (current) architecture/spec, I think KVM has to
> require both features to be supported in the guest to allow CR4.CET to
> be enabled.
The reason why I use a "OR" here is to keep CET enabling control the
same as that on host, right now on host, users can select to enable SHSTK or IBT
feature by disabling the unexpected one. It's free to select SHSTK & IBT
or SHSTK | IBT.
>
> Technically SHSTK and IBT can be enabled independently, but unless I'm
> missing something, supporting that in KVM (or any VMM) would be nasty
> and would likely degrade guest performance significantly.
>
> MSRs IA32_U_CET and IA32_S_CET have enable bits for each CET feature.
> Presumably the bits for each feature are reserved if the feature is not
> supported, e.g. SH_STK_EN is reserved to zero if SHSTK isn't supported.
> This wouldn't be a problem except that IA32_U_CET and the shadow stack
> MSRs, e.g. IA32_PL*_SSP, can be saved/restored via XSAVES/XRSTORS. The
> behavior is restricted by IA32_XSS, but again it's all or nothing, e.g.
> if any CET feature is supported then XSS_CET_{S,U} can be set.
>
> For example, if a guest supported IBT and !SHSTK, and the guest enabled
> XSS_CET_{S,I}, KVM would need to trap XSAVES/XRSTORS to enforce that the
> SHSTK bits in XSS_CET_U aren't set. And that doesn't even address the
> fact that the architecture defines the effects on the size of the XSAVE
> state area as being a bundled deal, e.g. IA32_XSS.CET_U=1 increases the
> size by 16 bytes (for IA32_U_CET and IA32_PL3_SSP) regardless of whether
> or not SHSTK is supported. One would assume that IA32_PL3_SSP doesn't
> exist if shadow stacks are not supported by the CPU.
>
> TL;DR: the architecture enumerates SHSTK and IBT independently, but
> the architecture effectively assumes they are bundled together.
>
> > +}
> > +
> > static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> > {
> > switch (msr->index) {
> > @@ -5409,6 +5424,23 @@ static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > return 1;
> > }
> >
> > + /*
> > + * To enable Guest CET, check whether CPU CET feature is
> > + * available, if it's there, set Guest CET state loading bit
> > + * per CR4.CET status, otherwise, return a fault to Guest.
> > + */
> > + if (vmx_guest_cet_cap(vcpu)) {
>
> This is wrong, it's checking the host capabilities. Use guest_cpuid_has()
> to query the guest capabilities. E.g. CET can be supported in the host
> but not exposed to guest, in which case the CPUID bits will not be "set"
> for the guest.
>
you're right, guest_cpuid_has() is enough for CET checking here since
now guest CET enabling is independent to host CET state.

> > + if (cr4 & X86_CR4_CET) {
>
> No need for curly braces here, both the 'if' and 'else' contain a single
> statement.

will remove the braces.
>
> > + vmcs_set_bits(VM_ENTRY_CONTROLS,
> > + VM_ENTRY_LOAD_GUEST_CET_STATE);
> > + } else {
> > + vmcs_clear_bits(VM_ENTRY_CONTROLS,
> > + VM_ENTRY_LOAD_GUEST_CET_STATE);
> > + }
> > + } else if (cr4 & X86_CR4_CET) {
> > + return 1;
> > + }
> > +
> > if (to_vmx(vcpu)->nested.vmxon && !nested_cr4_valid(vcpu, cr4))
> > return 1;
> >
> > --
> > 2.17.1
> >

2019-03-01 02:09:13

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Thu, Feb 28, 2019 at 07:59:40AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > Guest CET SHSTK and IBT capability are reported via
> > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > Guest user mode and supervisor mode xsaves component size
> > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > respectively.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > arch/x86/kvm/x86.h | 4 +++
> > 2 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index cb1aece25b17..5e05756cc6db 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > return xcr0;
> > }
> >
> > +u64 kvm_supported_xss(void)
> > +{
> > + u64 xss;
> > +
> > + rdmsrl(MSR_IA32_XSS, xss);
> > + xss &= KVM_SUPPORTED_XSS;
> > + return xss;
> > +}
> > +EXPORT_SYMBOL(kvm_supported_xss);
> > +
> > #define F(x) bit(X86_FEATURE_##x)
> >
> > /* For scattered features from cpufeatures.h; we currently expose none */
> > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > u32 index, int *nent, int maxnent)
> > {
> > int r;
> > + u32 eax, ebx, ecx, edx;
> > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > #ifdef CONFIG_X86_64
> > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > * if the host doesn't support it.
> > */
> > entry->edx |= F(ARCH_CAPABILITIES);
> > +
> > + /*
> > + * Guest OS CET enabling is designed independent to
> > + * host enabling, it only has dependency on Host HW
> > + * capability, if it has, report CET support to
> > + * Guest.
> > + */
> > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > + if (ecx & F(SHSTK))
> > + entry->ecx |= F(SHSTK);
> > +
> > + if (edx & F(IBT))
> > + entry->edx |= F(IBT);
>
> There's no need to manually add these flags. They will be automatically
> kept if supported in hardware because your previous patch, 02/08, added
> them to the mask of features that can be exposed to the guest,
> i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.
>
I shared the same thought as you before, but after I took a closer look at the
kernel code, actually, when host CET feature is disabled by user via
cmdline options(no_cet_shstk and no_cet_ibt), it'll mask out CET feature bits in
boot_cpu_data.x86_capbility[] array, and cpuid_mask() will make the bits
in previous definition lost, so these lines actually add them back when
host CET is disabled.

please check CET kernel patch here:
https://lkml.org/lkml/2018/11/20/204

> > +
> > } else {
> > entry->ebx = 0;
> > entry->ecx = 0;
> > @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > }
> > case 0xd: {
> > int idx, i;
> > - u64 supported = kvm_supported_xcr0();
> > + u64 u_supported = kvm_supported_xcr0();
> > + u64 s_supported = kvm_supported_xss();
> > + u64 supported;
> > + int compacted;
> >
> > - entry->eax &= supported;
> > - entry->ebx = xstate_required_size(supported, false);
> > + entry->eax &= u_supported;
> > + entry->ebx = xstate_required_size(u_supported, false);
> > entry->ecx = entry->ebx;
> > - entry->edx &= supported >> 32;
> > + entry->edx &= u_supported >> 32;
> > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > - if (!supported)
> > + if (!u_supported && !s_supported)
> > break;
> >
> > for (idx = 1, i = 1; idx < 64; ++idx) {
> > @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > if (idx == 1) {
> > entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> > cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> > - entry[i].ebx = 0;
> > - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> > - entry[i].ebx =
> > - xstate_required_size(supported,
> > - true);
> > + supported = u_supported | s_supported;
> > + compacted = entry[i].eax &
> > + (F(XSAVES) | F(XSAVEC));
> > + entry[i].ebx = xstate_required_size(supported,
> > + compacted);
> > + entry[i].ecx &= s_supported;
> > + entry[i].edx = 0;
> > } else {
> > + supported = (entry[i].ecx & 1) ? s_supported :
> > + u_supported;
> > if (entry[i].eax == 0 || !(supported & mask))
> > continue;
> > - if (WARN_ON_ONCE(entry[i].ecx & 1))
> > - continue;
> > + entry[i].ecx &= 1;
> > + entry[i].edx = 0;
> > + if (entry[i].ecx)
> > + entry[i].ebx = 0;
> > }
> > - entry[i].ecx = 0;
> > - entry[i].edx = 0;
> > entry[i].flags |=
> > KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > ++*nent;
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index 224cd0a47568..c61da41c3c5c 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> > | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> > | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> > | XFEATURE_MASK_PKRU)
> > +
> > +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> > + | XFEATURE_MASK_SHSTK_KERNEL)
> > +
> > extern u64 host_xcr0;
> >
> > extern u64 kvm_supported_xcr0(void);
> > --
> > 2.17.1
> >

2019-03-01 02:11:17

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On Thu, Feb 28, 2019 at 08:25:12AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:15PM +0800, Yang Weijiang wrote:
> > For Guest XSS, right now, only bit 11(user states) and bit 12
> > (supervisor states) are supported, if other bits are being set,
> > need to modify KVM_SUPPORTED_XSS macro to have support.
>
> The changelog should describe what the change does directly. Referencing
> specific bits implies that the code is explicitly checking for said bits,
> which it does not. And there's no need to advise readers on how to add
> more bits in the future, e.g. the KVM_SUPPORTED_XSS macro may not exist
> the next time new bits are added. Just cover what the patch does and why.
>
> Something like:
>
> KVM: x86: Allow the guest to set supported bits in XSS
>
> ...now that KVM supports setting CET related bits. Previously, KVM did
> not support setting any bits in XSS and so hardcoded its check to inject
> a #GP if the guest attempted to write a non-zero value to IA32_XSS.
>
good point! Will change these description.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index d32cee9ee079..68908ed7b151 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -47,6 +47,7 @@
> > #include <asm/virtext.h>
> > #include <asm/mce.h>
> > #include <asm/fpu/internal.h>
> > +#include <asm/fpu/types.h>
> > #include <asm/perf_event.h>
> > #include <asm/debugreg.h>
> > #include <asm/kexec.h>
> > @@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > case MSR_IA32_XSS:
> > if (!vmx_xsaves_supported())
> > return 1;
> > +
> > /*
> > - * The only supported bit as of Skylake is bit 8, but
> > - * it is not supported on KVM.
> > + * Check bits being set are supported in KVM.
>
> I'd drop the comment altogether, it's pretty obvious from the code that
> were checking which bits are supported.
you won't see these redundancies in next version ;)
> > */
> > - if (data != 0)
> > + if (data & ~kvm_supported_xss())
> > return 1;
> > +
> > vcpu->arch.ia32_xss = data;
> > if (vcpu->arch.ia32_xss != host_xss)
> > add_atomic_switch_msr(vmx, MSR_IA32_XSS,
> > --
> > 2.17.1
> >

2019-03-01 02:16:31

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM:CPUID: Define CET CPUID bits and CR4.CET master enable bit.

On Tue, 2019-02-26 at 15:57 +0800, Yang Weijiang wrote:
> On Tue, Feb 26, 2019 at 11:48:59AM -0800, Jim Mattson wrote:
> > On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]>
> > wrote:
> > >
> > > Guest queries CET SHSTK and IBT support by CPUID.(EAX=0x7,ECX=0),
> > > in return, ECX[bit 7] corresponds to SHSTK feature, and EDX[bit 20]
> > > corresponds to IBT feature.
> > > CR4.CET[bit 23] is CET master enable bit, it controls CET feature
> > > availability in guest OS.
> > >
> > > Note: Although SHSTK or IBT can be enabled independently,
> > > either of the features is controlled by CR4.CET.
> > >
> > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > Signed-off-by: Yang Weijiang <[email protected]>
> >
> > Am I missing something? X86_CR4_CET and X86_FEATURE_SHSTK and
> > X86_FEATURE_IBT don't appear to be defined in Linus' tree.
>
> The patch-set has dependency on this CET Kernel patch-set:
> https://lkml.org/lkml/2018/11/20/203
> I reused some definitions from the kernel patches.

Right, the definitions should be defined by the kernel patches.
However, I have checked the latest kernel patches and found it didn't define
the related CPUID feature bits.



2019-03-01 03:06:51

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] KVM:X86: Add user-space read/write interface for CET MSRs.

On Thu, Feb 28, 2019 at 08:32:49AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:16PM +0800, Yang Weijiang wrote:
> > The Guest MSRs are stored in fpu storage area, they are
> > operated by XSAVES/XRSTORS, so use kvm_load_guest_fpu
> > to restore them is a convenient way to let KVM access
> > them. After finish operation, need to restore Host MSR
> > contents by kvm_put_guest_fpu.
> >
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 43 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a0f8b71b2132..a4bdbef3a712 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -75,6 +75,8 @@
> >
> > #define MAX_IO_MSRS 256
> > #define KVM_MAX_MCE_BANKS 32
> > +#define MAX_GUEST_CET_MSRS 5
> > +
> > u64 __read_mostly kvm_mce_cap_supported = MCG_CTL_P | MCG_SER_P;
> > EXPORT_SYMBOL_GPL(kvm_mce_cap_supported);
> >
> > @@ -214,6 +216,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> > u64 __read_mostly host_xcr0;
> >
> > static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
> > +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> > +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> >
> > static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> > {
> > @@ -2889,21 +2893,57 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > }
> > EXPORT_SYMBOL_GPL(kvm_get_msr_common);
> >
> > +static int do_cet_msrs(struct kvm_vcpu *vcpu, int entry_num,
> > + struct kvm_msr_entry *entries, bool read)
> > +{
> > + int i = entry_num;
> > + int j = MAX_GUEST_CET_MSRS;
> > + bool has_cet;
> > +
> > + has_cet = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |
> > + guest_cpuid_has(vcpu, X86_FEATURE_IBT);
> > + /*
> > + * Guest CET MSRs are saved by XSAVES, so need to restore
> > + * them first, then read out or update the contents and
> > + * restore Host ones.
> > + */
> > + if (has_cet) {
> > + kvm_load_guest_fpu(vcpu);
> > +
> > + if (read) {
> > + for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
> > + rdmsrl(entries[i].index, entries[i].data);
> > + } else {
> > + for (j = 0; j < MAX_GUEST_CET_MSRS; j++, i++)
> > + wrmsrl(entries[i].index, entries[i].data);
> > + }
> > +
> > + kvm_put_guest_fpu(vcpu);
> > + }
> > + return j;
> > +}
> > /*
> > * Read or write a bunch of msrs. All parameters are kernel addresses.
> > *
> > * @return number of msrs set successfully.
> > */
> > static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> > - struct kvm_msr_entry *entries,
> > + struct kvm_msr_entry *entries, bool read,
> > int (*do_msr)(struct kvm_vcpu *vcpu,
> > unsigned index, u64 *data))
> > {
> > int i;
> >
> > - for (i = 0; i < msrs->nmsrs; ++i)
> > + for (i = 0; i < msrs->nmsrs; ++i) {
> > + /* If it comes to CET related MSRs, read them together. */
> > + if (entries[i].index == MSR_IA32_U_CET) {
> > + i += do_cet_msrs(vcpu, i, entries, read) - 1;
>
> This wrong, not to mention horribly buggy. The correct way to handle
> MSRs that are switched via the VMCS is to read/write the VMCS in
> vmx_{get,set}_msr() as needed, e.g. vmcs_writel(GUEST_GS_BASE, data).
>
The code is barely for migration purpose since they're passed through
to guest, I have no intent to expose them just like normal MSRs.
I double checked the spec.:
MSR_IA32_U_CET 0x6a0
MSR_IA32_PL0_SSP 0x6a4
MSR_IA32_PL3_SSP 0x6a7
should come from xsave area.

MSR_IA32_S_CET 0x6a2
MSR_IA32_INTR_SSP_TABL 0x6a8
should come from VMCS guest fields.

for the former, they're stored in guest fpu area, need
to restore them with xrstors temporarily before read, while the later is stored in
VMCS guest fields, I can read them out via vmcs_read() directly.

I'd like to read them out as a whole(all or nothing) due to cost induced
by xsaves/xrstors, in Qemu I put these MSRs as sequential fields.

what do you think of it?


> > + continue;
> > + }
> > +
> > if (do_msr(vcpu, entries[i].index, &entries[i].data))
> > break;
> > + }
> >
> > return i;
> > }
> > @@ -2938,7 +2978,7 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
> > goto out;
> > }
> >
> > - r = n = __msr_io(vcpu, &msrs, entries, do_msr);
> > + r = n = __msr_io(vcpu, &msrs, entries, !!writeback, do_msr);
> > if (r < 0)
> > goto out_free;
> >
> > --
> > 2.17.1
> >

2019-03-01 03:31:47

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] KVM:VMX: Define CET VMCS fields and bits

On Thu, Feb 28, 2019 at 07:53:45AM -0800, Sean Christopherson wrote:
> On Tue, Feb 26, 2019 at 11:31:11AM -0800, Jim Mattson wrote:
> > On Mon, Feb 25, 2019 at 10:32 PM Yang Weijiang <[email protected]> wrote:
> > >
> > > CET - Control-flow Enforcement Technology, it's used to
> > > protect against return/jump oriented programming (ROP)
> > > attacks. It provides the following capabilities to defend
> > > against ROP/JOP style control-flow subversion attacks:
> > > - Shadow Stack (SHSTK):
> > > A second stack for the program that is
> > > used exclusively for control transfer
> > > operations.
> > > - Indirect Branch Tracking (IBT):
> > > Free branch protection to defend against
> > > Jump/Call Oriented Programming.
> > >
> > > On processors that support CET, VMX saves/restores
> > > the states of IA32_S_CET, SSP and IA32_INTR_SSP_TABL_ADDR MSR
> > > to the VMCS area for Guest/Host unconditionally.
> > >
> > > If VM_EXIT_LOAD_HOST_CET_STATE = 1, the host CET MSRs are
> > > restored from VMCS host-state area at VM exit as follows:
> > >
> > > - HOST_IA32_S_CET: Host supervisor mode IA32_S_CET MSR is loaded
> > > from this field.
> > >
> > > - HOST_SSP : Host SSP is loaded from this field.
> > >
> > > - HOST_INTR_SSP_TABL_ADDR : Host IA32_INTR_SSP_TABL_ADDR
> > > MSR is loaded from this field.
> > >
> > > If VM_ENTRY_LOAD_GUEST_CET_STATE = 1, the guest CET MSRs are loaded
> > > from VMCS guest-state area at VM entry as follows:
> > >
> > > - GUEST_IA32_S_CET : Guest supervisor mode IA32_S_CET MSR is loaded
> > > from this field.
> > >
> > > - GUEST_SSP : Guest SSP is loaded from this field.
> > >
> > > - GUEST_INTR_SSP_TABL_ADDR : Guest IA32_INTR_SSP_TABL_ADDR
> > > MSR is loaded from this field.
> > >
> > > Additionally, to context switch guest and host CET states, the VMM
> > > uses xsaves/xrstors instructions to save/restore the guest CET states
> > > at VM exit/entry. The CET xsave area is within thread_struct.fpu area.
> > > If OS execution flow changes during task switch/interrupt/exception etc.,
> > > the OS also relies on xsaves/xrstors to switch CET states accordingly.
> > >
> > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > Signed-off-by: Yang Weijiang <[email protected]>
> > > ---
> > > arch/x86/include/asm/vmx.h | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> > > index ade0f153947d..395c1f7e5938 100644
> > > --- a/arch/x86/include/asm/vmx.h
> > > +++ b/arch/x86/include/asm/vmx.h
> > > @@ -98,6 +98,7 @@
> > > #define VM_EXIT_LOAD_IA32_EFER 0x00200000
> > > #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER 0x00400000
> > > #define VM_EXIT_CLEAR_BNDCFGS 0x00800000
> > > +#define VM_EXIT_LOAD_HOST_CET_STATE 0x10000000
> > >
> > > #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
> > >
> > > @@ -109,6 +110,7 @@
> > > #define VM_ENTRY_LOAD_IA32_PAT 0x00004000
> > > #define VM_ENTRY_LOAD_IA32_EFER 0x00008000
> > > #define VM_ENTRY_LOAD_BNDCFGS 0x00010000
> > > +#define VM_ENTRY_LOAD_GUEST_CET_STATE 0x00100000
> > >
> > > #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
> > >
> > > @@ -325,6 +327,9 @@ enum vmcs_field {
> > > GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
> > > GUEST_SYSENTER_ESP = 0x00006824,
> > > GUEST_SYSENTER_EIP = 0x00006826,
> > > + GUEST_IA32_S_CET = 0x00006828,
> > > + GUEST_SSP = 0x0000682a,
> > > + GUEST_INTR_SSP_TABL_ADDR = 0x0000682c,
> >
> > Nit: TABL is an unusual abbreviation. Perhaps TBL here and below?
>
> I don't see any reason to abbreviate TABLE, we don't need the two
> characters for anything.
>
> > And why did you drop the 'IA32' here, but not in GUEST_IA32_S_CET above?
> > (It is true that there seems to be no rhyme or reason to the mnemonics
> > chosen here. For example, EFER keeps its IA32, but SYSENTER_EIP
> > doesn't. Sigh.)
>
> My vote is to always drop IA32, IMO it's redundant.

How about GUEST_S_CET, GUEST_SSP and GUEST_INTR_SSP_TABLE?


2019-03-01 14:56:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Thu, Feb 28, 2019 at 04:28:32PM +0800, Yang Weijiang wrote:
> On Thu, Feb 28, 2019 at 07:59:40AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > > Guest CET SHSTK and IBT capability are reported via
> > > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > > Guest user mode and supervisor mode xsaves component size
> > > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > > respectively.
> > >
> > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > Signed-off-by: Yang Weijiang <[email protected]>
> > > ---
> > > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > > arch/x86/kvm/x86.h | 4 +++
> > > 2 files changed, 50 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index cb1aece25b17..5e05756cc6db 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > return xcr0;
> > > }
> > >
> > > +u64 kvm_supported_xss(void)
> > > +{
> > > + u64 xss;
> > > +
> > > + rdmsrl(MSR_IA32_XSS, xss);
> > > + xss &= KVM_SUPPORTED_XSS;
> > > + return xss;
> > > +}
> > > +EXPORT_SYMBOL(kvm_supported_xss);
> > > +
> > > #define F(x) bit(X86_FEATURE_##x)
> > >
> > > /* For scattered features from cpufeatures.h; we currently expose none */
> > > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > u32 index, int *nent, int maxnent)
> > > {
> > > int r;
> > > + u32 eax, ebx, ecx, edx;
> > > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > > #ifdef CONFIG_X86_64
> > > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > * if the host doesn't support it.
> > > */
> > > entry->edx |= F(ARCH_CAPABILITIES);
> > > +
> > > + /*
> > > + * Guest OS CET enabling is designed independent to
> > > + * host enabling, it only has dependency on Host HW
> > > + * capability, if it has, report CET support to
> > > + * Guest.
> > > + */
> > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > + if (ecx & F(SHSTK))
> > > + entry->ecx |= F(SHSTK);
> > > +
> > > + if (edx & F(IBT))
> > > + entry->edx |= F(IBT);
> >
> > There's no need to manually add these flags. They will be automatically
> > kept if supported in hardware because your previous patch, 02/08, added
> > them to the mask of features that can be exposed to the guest,
> > i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.
> >
> I shared the same thought as you before, but after I took a closer look at the
> kernel code, actually, when host CET feature is disabled by user via
> cmdline options(no_cet_shstk and no_cet_ibt), it'll mask out CET feature bits in
> boot_cpu_data.x86_capbility[] array, and cpuid_mask() will make the bits
> in previous definition lost, so these lines actually add them back when
> host CET is disabled.

'entry' is filled by do_cpuid_1_ent(), which does cpuid_count(), same as
your code, i.e. it's not affected by whether or not the host kernel is
using each feature.

> please check CET kernel patch here:
> https://lkml.org/lkml/2018/11/20/204
>
> > > +
> > > } else {
> > > entry->ebx = 0;
> > > entry->ecx = 0;
> > > @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > }
> > > case 0xd: {
> > > int idx, i;
> > > - u64 supported = kvm_supported_xcr0();
> > > + u64 u_supported = kvm_supported_xcr0();
> > > + u64 s_supported = kvm_supported_xss();
> > > + u64 supported;
> > > + int compacted;
> > >
> > > - entry->eax &= supported;
> > > - entry->ebx = xstate_required_size(supported, false);
> > > + entry->eax &= u_supported;
> > > + entry->ebx = xstate_required_size(u_supported, false);
> > > entry->ecx = entry->ebx;
> > > - entry->edx &= supported >> 32;
> > > + entry->edx &= u_supported >> 32;
> > > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > > - if (!supported)
> > > + if (!u_supported && !s_supported)
> > > break;
> > >
> > > for (idx = 1, i = 1; idx < 64; ++idx) {
> > > @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > if (idx == 1) {
> > > entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> > > cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> > > - entry[i].ebx = 0;
> > > - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> > > - entry[i].ebx =
> > > - xstate_required_size(supported,
> > > - true);
> > > + supported = u_supported | s_supported;
> > > + compacted = entry[i].eax &
> > > + (F(XSAVES) | F(XSAVEC));
> > > + entry[i].ebx = xstate_required_size(supported,
> > > + compacted);
> > > + entry[i].ecx &= s_supported;
> > > + entry[i].edx = 0;
> > > } else {
> > > + supported = (entry[i].ecx & 1) ? s_supported :
> > > + u_supported;
> > > if (entry[i].eax == 0 || !(supported & mask))
> > > continue;
> > > - if (WARN_ON_ONCE(entry[i].ecx & 1))
> > > - continue;
> > > + entry[i].ecx &= 1;
> > > + entry[i].edx = 0;
> > > + if (entry[i].ecx)
> > > + entry[i].ebx = 0;
> > > }
> > > - entry[i].ecx = 0;
> > > - entry[i].edx = 0;
> > > entry[i].flags |=
> > > KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > > ++*nent;
> > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > > index 224cd0a47568..c61da41c3c5c 100644
> > > --- a/arch/x86/kvm/x86.h
> > > +++ b/arch/x86/kvm/x86.h
> > > @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> > > | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> > > | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> > > | XFEATURE_MASK_PKRU)
> > > +
> > > +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> > > + | XFEATURE_MASK_SHSTK_KERNEL)
> > > +
> > > extern u64 host_xcr0;
> > >
> > > extern u64 kvm_supported_xcr0(void);
> > > --
> > > 2.17.1
> > >

2019-03-01 15:01:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > "Load Guest CET state" bit controls whether guest CET states
> > > will be loaded at Guest entry. Before doing that, KVM needs
> > > to check if CPU CET feature is available.
> > >
> > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > Signed-off-by: Yang Weijiang <[email protected]>
> > > ---
> > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > 1 file changed, 32 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 89ee086e1729..d32cee9ee079 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -55,6 +55,7 @@
> > > #include <asm/mmu_context.h>
> > > #include <asm/spec-ctrl.h>
> > > #include <asm/mshyperv.h>
> > > +#include <asm/cet.h>
> > >
> > > #include "trace.h"
> > > #include "pmu.h"
> > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > return !(val & ~valid_bits);
> > > }
> > >
> > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > +{
> > > + u32 eax, ebx, ecx, edx;
> > > +
> > > + /*
> > > + * Guest CET can work as long as HW supports the feature, independent
> > > + * to Host SW enabling status.
> > > + */
> > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > +
> > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> >
> > Given the holes in the (current) architecture/spec, I think KVM has to
> > require both features to be supported in the guest to allow CR4.CET to
> > be enabled.
> The reason why I use a "OR" here is to keep CET enabling control the
> same as that on host, right now on host, users can select to enable SHSTK or IBT
> feature by disabling the unexpected one. It's free to select SHSTK & IBT
> or SHSTK | IBT.

Which is not the same as SHSTK != IBT in *hardware*, which is effectively
what this is allowing for the guest. The problem is that the architecture
doesn't cleanly separate the two features, i.e. we'd have a virtualization
hole where the guest could touch state for a disabled feature.

Regardless, the guest would still be able to selectively enable each CET
feature, it would just never see a model where SHSTK != IBT.

2019-03-04 02:42:57

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Fri, Mar 01, 2019 at 06:53:23AM -0800, Sean Christopherson wrote:
> On Thu, Feb 28, 2019 at 04:28:32PM +0800, Yang Weijiang wrote:
> > On Thu, Feb 28, 2019 at 07:59:40AM -0800, Sean Christopherson wrote:
> > > On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > > > Guest CET SHSTK and IBT capability are reported via
> > > > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > > > Guest user mode and supervisor mode xsaves component size
> > > > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > > > respectively.
> > > >
> > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > ---
> > > > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > > > arch/x86/kvm/x86.h | 4 +++
> > > > 2 files changed, 50 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > index cb1aece25b17..5e05756cc6db 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > > return xcr0;
> > > > }
> > > >
> > > > +u64 kvm_supported_xss(void)
> > > > +{
> > > > + u64 xss;
> > > > +
> > > > + rdmsrl(MSR_IA32_XSS, xss);
> > > > + xss &= KVM_SUPPORTED_XSS;
> > > > + return xss;
> > > > +}
> > > > +EXPORT_SYMBOL(kvm_supported_xss);
> > > > +
> > > > #define F(x) bit(X86_FEATURE_##x)
> > > >
> > > > /* For scattered features from cpufeatures.h; we currently expose none */
> > > > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > u32 index, int *nent, int maxnent)
> > > > {
> > > > int r;
> > > > + u32 eax, ebx, ecx, edx;
> > > > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > > > #ifdef CONFIG_X86_64
> > > > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > > > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > * if the host doesn't support it.
> > > > */
> > > > entry->edx |= F(ARCH_CAPABILITIES);
> > > > +
> > > > + /*
> > > > + * Guest OS CET enabling is designed independent to
> > > > + * host enabling, it only has dependency on Host HW
> > > > + * capability, if it has, report CET support to
> > > > + * Guest.
> > > > + */
> > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > + if (ecx & F(SHSTK))
> > > > + entry->ecx |= F(SHSTK);
> > > > +
> > > > + if (edx & F(IBT))
> > > > + entry->edx |= F(IBT);
> > >
> > > There's no need to manually add these flags. They will be automatically
> > > kept if supported in hardware because your previous patch, 02/08, added
> > > them to the mask of features that can be exposed to the guest,
> > > i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.
> > >
> > I shared the same thought as you before, but after I took a closer look at the
> > kernel code, actually, when host CET feature is disabled by user via
> > cmdline options(no_cet_shstk and no_cet_ibt), it'll mask out CET feature bits in
> > boot_cpu_data.x86_capbility[] array, and cpuid_mask() will make the bits
> > in previous definition lost, so these lines actually add them back when
> > host CET is disabled.
>
> 'entry' is filled by do_cpuid_1_ent(), which does cpuid_count(), same as
> your code, i.e. it's not affected by whether or not the host kernel is
> using each feature.
>
I checked CET kernel patch:
#ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
static __init int setup_disable_shstk(char *s) {
/* require an exact match without trailing characters */
if (s[0] != '\0')
return 0;

if (!boot_cpu_has(X86_FEATURE_SHSTK))
return 1;

setup_clear_cpu_cap(X86_FEATURE_SHSTK);
pr_info("x86: 'no_cet_shstk' specified, disabling Shadow Stack\n");
return 1;
}
__setup("no_cet_shstk", setup_disable_shstk); #endif

setup_disable_shstk()->setup_clear_cpu_cap(X86_FEATURE_SHSTK)->do_clear_cpu_cap(NULL,
feature)->clear_feature(c, feature)->clear_cpu_cap(&boot_cpu_data, feature);

this path will clear boot_cpu_data.x86_capability[X86_FEATURE_SHSTK] if "no_cet_shstk" is set.
but in cpuid_mask(), it will "AND" the bit with SHSTK bit set
in kvm_cpuid_7_0_ecx_x86_features, so the bit in ecx is cleared,
need to add the bit back according to host cpuid_count().
the CET kernel patch can be seen in below patch link.

> > please check CET kernel patch here:
> > https://lkml.org/lkml/2018/11/20/204
> >
> > > > +
> > > > } else {
> > > > entry->ebx = 0;
> > > > entry->ecx = 0;
> > > > @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > }
> > > > case 0xd: {
> > > > int idx, i;
> > > > - u64 supported = kvm_supported_xcr0();
> > > > + u64 u_supported = kvm_supported_xcr0();
> > > > + u64 s_supported = kvm_supported_xss();
> > > > + u64 supported;
> > > > + int compacted;
> > > >
> > > > - entry->eax &= supported;
> > > > - entry->ebx = xstate_required_size(supported, false);
> > > > + entry->eax &= u_supported;
> > > > + entry->ebx = xstate_required_size(u_supported, false);
> > > > entry->ecx = entry->ebx;
> > > > - entry->edx &= supported >> 32;
> > > > + entry->edx &= u_supported >> 32;
> > > > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > > > - if (!supported)
> > > > + if (!u_supported && !s_supported)
> > > > break;
> > > >
> > > > for (idx = 1, i = 1; idx < 64; ++idx) {
> > > > @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > if (idx == 1) {
> > > > entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> > > > cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> > > > - entry[i].ebx = 0;
> > > > - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> > > > - entry[i].ebx =
> > > > - xstate_required_size(supported,
> > > > - true);
> > > > + supported = u_supported | s_supported;
> > > > + compacted = entry[i].eax &
> > > > + (F(XSAVES) | F(XSAVEC));
> > > > + entry[i].ebx = xstate_required_size(supported,
> > > > + compacted);
> > > > + entry[i].ecx &= s_supported;
> > > > + entry[i].edx = 0;
> > > > } else {
> > > > + supported = (entry[i].ecx & 1) ? s_supported :
> > > > + u_supported;
> > > > if (entry[i].eax == 0 || !(supported & mask))
> > > > continue;
> > > > - if (WARN_ON_ONCE(entry[i].ecx & 1))
> > > > - continue;
> > > > + entry[i].ecx &= 1;
> > > > + entry[i].edx = 0;
> > > > + if (entry[i].ecx)
> > > > + entry[i].ebx = 0;
> > > > }
> > > > - entry[i].ecx = 0;
> > > > - entry[i].edx = 0;
> > > > entry[i].flags |=
> > > > KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > > > ++*nent;
> > > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > > > index 224cd0a47568..c61da41c3c5c 100644
> > > > --- a/arch/x86/kvm/x86.h
> > > > +++ b/arch/x86/kvm/x86.h
> > > > @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> > > > | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> > > > | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> > > > | XFEATURE_MASK_PKRU)
> > > > +
> > > > +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> > > > + | XFEATURE_MASK_SHSTK_KERNEL)
> > > > +
> > > > extern u64 host_xcr0;
> > > >
> > > > extern u64 kvm_supported_xcr0(void);
> > > > --
> > > > 2.17.1
> > > >

2019-03-04 05:33:33

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > "Load Guest CET state" bit controls whether guest CET states
> > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > to check if CPU CET feature is available.
> > > >
> > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > ---
> > > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > 1 file changed, 32 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > index 89ee086e1729..d32cee9ee079 100644
> > > > --- a/arch/x86/kvm/vmx.c
> > > > +++ b/arch/x86/kvm/vmx.c
> > > > @@ -55,6 +55,7 @@
> > > > #include <asm/mmu_context.h>
> > > > #include <asm/spec-ctrl.h>
> > > > #include <asm/mshyperv.h>
> > > > +#include <asm/cet.h>
> > > >
> > > > #include "trace.h"
> > > > #include "pmu.h"
> > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > return !(val & ~valid_bits);
> > > > }
> > > >
> > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + u32 eax, ebx, ecx, edx;
> > > > +
> > > > + /*
> > > > + * Guest CET can work as long as HW supports the feature, independent
> > > > + * to Host SW enabling status.
> > > > + */
> > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > +
> > > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > >
> > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > require both features to be supported in the guest to allow CR4.CET to
> > > be enabled.
> > The reason why I use a "OR" here is to keep CET enabling control the
> > same as that on host, right now on host, users can select to enable SHSTK or IBT
> > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > or SHSTK | IBT.
>
> Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> what this is allowing for the guest. The problem is that the architecture
> doesn't cleanly separate the two features, i.e. we'd have a virtualization
> hole where the guest could touch state for a disabled feature.
>
> Regardless, the guest would still be able to selectively enable each CET
> feature, it would just never see a model where SHSTK != IBT.
Hi, Sean,
I'd like to understand your concerns. From my point of view, e.g.,
when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
configured. Could you detail your concerns?


2019-03-04 18:47:52

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Sun, Mar 03, 2019 at 05:36:45PM +0800, Yang Weijiang wrote:
> On Fri, Mar 01, 2019 at 06:53:23AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 28, 2019 at 04:28:32PM +0800, Yang Weijiang wrote:
> > > On Thu, Feb 28, 2019 at 07:59:40AM -0800, Sean Christopherson wrote:
> > > > On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > > > > Guest CET SHSTK and IBT capability are reported via
> > > > > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > > > > Guest user mode and supervisor mode xsaves component size
> > > > > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > > > > respectively.
> > > > >
> > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > ---
> > > > > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > > > > arch/x86/kvm/x86.h | 4 +++
> > > > > 2 files changed, 50 insertions(+), 14 deletions(-)
> > > > >
> > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > index cb1aece25b17..5e05756cc6db 100644
> > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > > > return xcr0;
> > > > > }
> > > > >
> > > > > +u64 kvm_supported_xss(void)
> > > > > +{
> > > > > + u64 xss;
> > > > > +
> > > > > + rdmsrl(MSR_IA32_XSS, xss);
> > > > > + xss &= KVM_SUPPORTED_XSS;
> > > > > + return xss;
> > > > > +}
> > > > > +EXPORT_SYMBOL(kvm_supported_xss);
> > > > > +
> > > > > #define F(x) bit(X86_FEATURE_##x)
> > > > >
> > > > > /* For scattered features from cpufeatures.h; we currently expose none */
> > > > > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > > u32 index, int *nent, int maxnent)
> > > > > {
> > > > > int r;
> > > > > + u32 eax, ebx, ecx, edx;
> > > > > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > > > > #ifdef CONFIG_X86_64
> > > > > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > > > > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > > * if the host doesn't support it.
> > > > > */
> > > > > entry->edx |= F(ARCH_CAPABILITIES);
> > > > > +
> > > > > + /*
> > > > > + * Guest OS CET enabling is designed independent to
> > > > > + * host enabling, it only has dependency on Host HW
> > > > > + * capability, if it has, report CET support to
> > > > > + * Guest.
> > > > > + */
> > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > + if (ecx & F(SHSTK))
> > > > > + entry->ecx |= F(SHSTK);
> > > > > +
> > > > > + if (edx & F(IBT))
> > > > > + entry->edx |= F(IBT);
> > > >
> > > > There's no need to manually add these flags. They will be automatically
> > > > kept if supported in hardware because your previous patch, 02/08, added
> > > > them to the mask of features that can be exposed to the guest,
> > > > i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.
> > > >
> > > I shared the same thought as you before, but after I took a closer look at the
> > > kernel code, actually, when host CET feature is disabled by user via
> > > cmdline options(no_cet_shstk and no_cet_ibt), it'll mask out CET feature bits in
> > > boot_cpu_data.x86_capbility[] array, and cpuid_mask() will make the bits
> > > in previous definition lost, so these lines actually add them back when
> > > host CET is disabled.
> >
> > 'entry' is filled by do_cpuid_1_ent(), which does cpuid_count(), same as
> > your code, i.e. it's not affected by whether or not the host kernel is
> > using each feature.
> >
> I checked CET kernel patch:
> #ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
> static __init int setup_disable_shstk(char *s) {
> /* require an exact match without trailing characters */
> if (s[0] != '\0')
> return 0;
>
> if (!boot_cpu_has(X86_FEATURE_SHSTK))
> return 1;
>
> setup_clear_cpu_cap(X86_FEATURE_SHSTK);
> pr_info("x86: 'no_cet_shstk' specified, disabling Shadow Stack\n");
> return 1;
> }
> __setup("no_cet_shstk", setup_disable_shstk); #endif
>
> setup_disable_shstk()->setup_clear_cpu_cap(X86_FEATURE_SHSTK)->do_clear_cpu_cap(NULL,
> feature)->clear_feature(c, feature)->clear_cpu_cap(&boot_cpu_data, feature);
>
> this path will clear boot_cpu_data.x86_capability[X86_FEATURE_SHSTK] if "no_cet_shstk" is set.
> but in cpuid_mask(), it will "AND" the bit with SHSTK bit set
> in kvm_cpuid_7_0_ecx_x86_features, so the bit in ecx is cleared,
> need to add the bit back according to host cpuid_count().
> the CET kernel patch can be seen in below patch link.

Ah, I see. In this case we need to honor boot_cpu_data. The idea is
that a feature should not be exposed to the guest, i.e. actually used,
if it has been explicitly disabled by the user, e.g. to workaround a
hardware or firmware issue. The cases where a feature is exposed to
the guest even when disabled in host is when said feature is emulated
by KVM in software.

>
> > > please check CET kernel patch here:
> > > https://lkml.org/lkml/2018/11/20/204

2019-03-04 18:50:02

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > to check if CPU CET feature is available.
> > > > >
> > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > ---
> > > > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 32 insertions(+)
> > > > >
> > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > --- a/arch/x86/kvm/vmx.c
> > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > @@ -55,6 +55,7 @@
> > > > > #include <asm/mmu_context.h>
> > > > > #include <asm/spec-ctrl.h>
> > > > > #include <asm/mshyperv.h>
> > > > > +#include <asm/cet.h>
> > > > >
> > > > > #include "trace.h"
> > > > > #include "pmu.h"
> > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > return !(val & ~valid_bits);
> > > > > }
> > > > >
> > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > + u32 eax, ebx, ecx, edx;
> > > > > +
> > > > > + /*
> > > > > + * Guest CET can work as long as HW supports the feature, independent
> > > > > + * to Host SW enabling status.
> > > > > + */
> > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > +
> > > > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > >
> > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > require both features to be supported in the guest to allow CR4.CET to
> > > > be enabled.
> > > The reason why I use a "OR" here is to keep CET enabling control the
> > > same as that on host, right now on host, users can select to enable SHSTK or IBT
> > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > or SHSTK | IBT.
> >
> > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > what this is allowing for the guest. The problem is that the architecture
> > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > hole where the guest could touch state for a disabled feature.
> >
> > Regardless, the guest would still be able to selectively enable each CET
> > feature, it would just never see a model where SHSTK != IBT.
> Hi, Sean,
> I'd like to understand your concerns. From my point of view, e.g.,
> when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> configured. Could you detail your concerns?

In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
to the guest. If only SHSTK is exposed, a devious guest can still use IBT
because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
Preventing the guest from using IBT in this scenario is infeasible as it
would require trapping and emulating the XSAVE as well as the relevent CET
MSRs.

2019-03-04 18:52:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> Guest CET SHSTK and IBT capability are reported via
> CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> Guest user mode and supervisor mode xsaves component size
> is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> respectively.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> arch/x86/kvm/x86.h | 4 +++
> 2 files changed, 50 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index cb1aece25b17..5e05756cc6db 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> return xcr0;
> }
>
> +u64 kvm_supported_xss(void)
> +{
> + u64 xss;
> +
> + rdmsrl(MSR_IA32_XSS, xss);

Honest question as I haven't thought through the flows: do we actually
need to restrict XSS based on what's enabled in the host? This
conflicts with your other statements that CET features can be enabled
independent of host support.

And if we do incorporate the host status, the value should be read once
and cached unless we're expecting the host value to change dynamically,
e.g. see host_efer.

> + xss &= KVM_SUPPORTED_XSS;
> + return xss;
> +}
> +EXPORT_SYMBOL(kvm_supported_xss);
> +
> #define F(x) bit(X86_FEATURE_##x)
>
> /* For scattered features from cpufeatures.h; we currently expose none */
> @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> u32 index, int *nent, int maxnent)
> {
> int r;
> + u32 eax, ebx, ecx, edx;
> unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> #ifdef CONFIG_X86_64
> unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> * if the host doesn't support it.
> */
> entry->edx |= F(ARCH_CAPABILITIES);
> +
> + /*
> + * Guest OS CET enabling is designed independent to
> + * host enabling, it only has dependency on Host HW
> + * capability, if it has, report CET support to
> + * Guest.
> + */
> + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> + if (ecx & F(SHSTK))
> + entry->ecx |= F(SHSTK);
> +
> + if (edx & F(IBT))
> + entry->edx |= F(IBT);
> +
> } else {
> entry->ebx = 0;
> entry->ecx = 0;
> @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> }
> case 0xd: {
> int idx, i;
> - u64 supported = kvm_supported_xcr0();
> + u64 u_supported = kvm_supported_xcr0();
> + u64 s_supported = kvm_supported_xss();
> + u64 supported;
> + int compacted;
>
> - entry->eax &= supported;
> - entry->ebx = xstate_required_size(supported, false);
> + entry->eax &= u_supported;
> + entry->ebx = xstate_required_size(u_supported, false);
> entry->ecx = entry->ebx;
> - entry->edx &= supported >> 32;
> + entry->edx &= u_supported >> 32;
> entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> - if (!supported)
> + if (!u_supported && !s_supported)
> break;
>
> for (idx = 1, i = 1; idx < 64; ++idx) {
> @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> if (idx == 1) {
> entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> - entry[i].ebx = 0;
> - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> - entry[i].ebx =
> - xstate_required_size(supported,
> - true);
> + supported = u_supported | s_supported;
> + compacted = entry[i].eax &
> + (F(XSAVES) | F(XSAVEC));
> + entry[i].ebx = xstate_required_size(supported,
> + compacted);
> + entry[i].ecx &= s_supported;
> + entry[i].edx = 0;
> } else {
> + supported = (entry[i].ecx & 1) ? s_supported :
> + u_supported;
> if (entry[i].eax == 0 || !(supported & mask))
> continue;
> - if (WARN_ON_ONCE(entry[i].ecx & 1))
> - continue;
> + entry[i].ecx &= 1;
> + entry[i].edx = 0;
> + if (entry[i].ecx)
> + entry[i].ebx = 0;
> }
> - entry[i].ecx = 0;
> - entry[i].edx = 0;
> entry[i].flags |=
> KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> ++*nent;
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 224cd0a47568..c61da41c3c5c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU)
> +
> +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> + | XFEATURE_MASK_SHSTK_KERNEL)

I don't think these definitions are correct, the spec I have lists them
as CET_U and CET_S, i.e. they aren't specific to SHSTK. Did these names
get inherited from the kernel enabling patches?

> +
> extern u64 host_xcr0;
>
> extern u64 kvm_supported_xcr0(void);
> --
> 2.17.1
>

2019-03-04 18:54:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM:VMX: Pass through host CET related MSRs to Guest.

On Mon, Feb 25, 2019 at 09:27:13PM +0800, Yang Weijiang wrote:
> The CET runtime settings, i.e., CET state control bits(IA32_U_CET/
> IA32_S_CET), CET SSP(IA32_PL3_SSP/IA32_PL0_SSP) and SSP table address
> (IA32_INTERRUPT_SSP_TABLE_ADDR) are task/thread specific, therefore,
> OS needs to save/restore the states properly during context switch,
> e.g., task/thread switching, interrupt/exception handling, it uses
> xsaves/xrstors to achieve that.
>
> The difference between VMCS CET area fields and xsave CET area, is that
> the former is for state retention during Guest/Host context
> switch while the latter is for state retention during OS execution.
>
> Linux currently doesn't support CPL1 and CPL2, so SSPs for these level
> are skipped here.

But don't we want to allow a guest to access the MSRs regardless of
the host kernel's behavior?

> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 7bbb8b26e901..89ee086e1729 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11769,6 +11769,7 @@ static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
> static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + unsigned long *msr_bitmap;
>
> if (cpu_has_secondary_exec_ctrls()) {
> vmx_compute_secondary_exec_control(vmx);
> @@ -11786,6 +11787,18 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> nested_vmx_cr_fixed1_bits_update(vcpu);
> nested_vmx_entry_exit_ctls_update(vcpu);
> }
> +
> + msr_bitmap = vmx->vmcs01.msr_bitmap;
> +
> + if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |

This should be a logical OR, not a bitwise OR.

> + guest_cpuid_has(vcpu, X86_FEATURE_IBT)) {
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> + }
> +
> }
>
> static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> --
> 2.17.1
>

2019-03-04 18:56:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Mar 04, 2019 at 10:47:53AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > Guest CET SHSTK and IBT capability are reported via
> > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > Guest user mode and supervisor mode xsaves component size
> > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > respectively.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > arch/x86/kvm/x86.h | 4 +++
> > 2 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index cb1aece25b17..5e05756cc6db 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > return xcr0;
> > }
> >
> > +u64 kvm_supported_xss(void)
> > +{
> > + u64 xss;
> > +
> > + rdmsrl(MSR_IA32_XSS, xss);
>
> Honest question as I haven't thought through the flows: do we actually
> need to restrict XSS based on what's enabled in the host? This
> conflicts with your other statements that CET features can be enabled
> independent of host support.

Never mind, just saw Paolo's comment in a previous version about XSAVE
being dependent on host XSS. The below comment about caching still
applies though.

> And if we do incorporate the host status, the value should be read once
> and cached unless we're expecting the host value to change dynamically,
> e.g. see host_efer.
>
> > + xss &= KVM_SUPPORTED_XSS;
> > + return xss;
> > +}
> > +EXPORT_SYMBOL(kvm_supported_xss);
> > +
> > #define F(x) bit(X86_FEATURE_##x)
> >
> > /* For scattered features from cpufeatures.h; we currently expose none */

2019-03-05 03:04:24

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > to check if CPU CET feature is available.
> > > > > >
> > > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > > ---
> > > > > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > 1 file changed, 32 insertions(+)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > @@ -55,6 +55,7 @@
> > > > > > #include <asm/mmu_context.h>
> > > > > > #include <asm/spec-ctrl.h>
> > > > > > #include <asm/mshyperv.h>
> > > > > > +#include <asm/cet.h>
> > > > > >
> > > > > > #include "trace.h"
> > > > > > #include "pmu.h"
> > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > > return !(val & ~valid_bits);
> > > > > > }
> > > > > >
> > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > + u32 eax, ebx, ecx, edx;
> > > > > > +
> > > > > > + /*
> > > > > > + * Guest CET can work as long as HW supports the feature, independent
> > > > > > + * to Host SW enabling status.
> > > > > > + */
> > > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > +
> > > > > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > >
> > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > be enabled.
> > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > same as that on host, right now on host, users can select to enable SHSTK or IBT
> > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > or SHSTK | IBT.
> > >
> > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > what this is allowing for the guest. The problem is that the architecture
> > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > hole where the guest could touch state for a disabled feature.
> > >
> > > Regardless, the guest would still be able to selectively enable each CET
> > > feature, it would just never see a model where SHSTK != IBT.
> > Hi, Sean,
> > I'd like to understand your concerns. From my point of view, e.g.,
> > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > configured. Could you detail your concerns?
>
> In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> to the guest. If only SHSTK is exposed, a devious guest can still use IBT
> because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> Preventing the guest from using IBT in this scenario is infeasible as it
> would require trapping and emulating the XSAVE as well as the relevent CET
> MSRs.
Cannot agree with you more!
This is some design limitation, but from my point of view, once vmm
exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
which features to be enabled, we don't need to add extra constraints on
the usage.


2019-03-05 03:08:03

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Mar 04, 2019 at 10:47:53AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > Guest CET SHSTK and IBT capability are reported via
> > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > Guest user mode and supervisor mode xsaves component size
> > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > respectively.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > arch/x86/kvm/x86.h | 4 +++
> > 2 files changed, 50 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index cb1aece25b17..5e05756cc6db 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > return xcr0;
> > }
> >
> > +u64 kvm_supported_xss(void)
> > +{
> > + u64 xss;
> > +
> > + rdmsrl(MSR_IA32_XSS, xss);
>
> Honest question as I haven't thought through the flows: do we actually
> need to restrict XSS based on what's enabled in the host? This
> conflicts with your other statements that CET features can be enabled
> independent of host support.
>
> And if we do incorporate the host status, the value should be read once
> and cached unless we're expecting the host value to change dynamically,
> e.g. see host_efer.
>
> > + xss &= KVM_SUPPORTED_XSS;
> > + return xss;
> > +}
> > +EXPORT_SYMBOL(kvm_supported_xss);
> > +
> > #define F(x) bit(X86_FEATURE_##x)
> >
> > /* For scattered features from cpufeatures.h; we currently expose none */
> > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > u32 index, int *nent, int maxnent)
> > {
> > int r;
> > + u32 eax, ebx, ecx, edx;
> > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > #ifdef CONFIG_X86_64
> > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > * if the host doesn't support it.
> > */
> > entry->edx |= F(ARCH_CAPABILITIES);
> > +
> > + /*
> > + * Guest OS CET enabling is designed independent to
> > + * host enabling, it only has dependency on Host HW
> > + * capability, if it has, report CET support to
> > + * Guest.
> > + */
> > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > + if (ecx & F(SHSTK))
> > + entry->ecx |= F(SHSTK);
> > +
> > + if (edx & F(IBT))
> > + entry->edx |= F(IBT);
> > +
> > } else {
> > entry->ebx = 0;
> > entry->ecx = 0;
> > @@ -564,14 +589,17 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > }
> > case 0xd: {
> > int idx, i;
> > - u64 supported = kvm_supported_xcr0();
> > + u64 u_supported = kvm_supported_xcr0();
> > + u64 s_supported = kvm_supported_xss();
> > + u64 supported;
> > + int compacted;
> >
> > - entry->eax &= supported;
> > - entry->ebx = xstate_required_size(supported, false);
> > + entry->eax &= u_supported;
> > + entry->ebx = xstate_required_size(u_supported, false);
> > entry->ecx = entry->ebx;
> > - entry->edx &= supported >> 32;
> > + entry->edx &= u_supported >> 32;
> > entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > - if (!supported)
> > + if (!u_supported && !s_supported)
> > break;
> >
> > for (idx = 1, i = 1; idx < 64; ++idx) {
> > @@ -583,19 +611,23 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > if (idx == 1) {
> > entry[i].eax &= kvm_cpuid_D_1_eax_x86_features;
> > cpuid_mask(&entry[i].eax, CPUID_D_1_EAX);
> > - entry[i].ebx = 0;
> > - if (entry[i].eax & (F(XSAVES)|F(XSAVEC)))
> > - entry[i].ebx =
> > - xstate_required_size(supported,
> > - true);
> > + supported = u_supported | s_supported;
> > + compacted = entry[i].eax &
> > + (F(XSAVES) | F(XSAVEC));
> > + entry[i].ebx = xstate_required_size(supported,
> > + compacted);
> > + entry[i].ecx &= s_supported;
> > + entry[i].edx = 0;
> > } else {
> > + supported = (entry[i].ecx & 1) ? s_supported :
> > + u_supported;
> > if (entry[i].eax == 0 || !(supported & mask))
> > continue;
> > - if (WARN_ON_ONCE(entry[i].ecx & 1))
> > - continue;
> > + entry[i].ecx &= 1;
> > + entry[i].edx = 0;
> > + if (entry[i].ecx)
> > + entry[i].ebx = 0;
> > }
> > - entry[i].ecx = 0;
> > - entry[i].edx = 0;
> > entry[i].flags |=
> > KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> > ++*nent;
> > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > index 224cd0a47568..c61da41c3c5c 100644
> > --- a/arch/x86/kvm/x86.h
> > +++ b/arch/x86/kvm/x86.h
> > @@ -283,6 +283,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
> > | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
> > | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> > | XFEATURE_MASK_PKRU)
> > +
> > +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_SHSTK_USER \
> > + | XFEATURE_MASK_SHSTK_KERNEL)
>
> I don't think these definitions are correct, the spec I have lists them
> as CET_U and CET_S, i.e. they aren't specific to SHSTK. Did these names
> get inherited from the kernel enabling patches?
>
Yes, exactly, I'll notify kernel developer of the issue.
> > +
> > extern u64 host_xcr0;
> >
> > extern u64 kvm_supported_xcr0(void);
> > --
> > 2.17.1
> >

2019-03-05 03:12:39

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> > On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > > to check if CPU CET feature is available.
> > > > > > >
> > > > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > > > ---
> > > > > > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > > 1 file changed, 32 insertions(+)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > > @@ -55,6 +55,7 @@
> > > > > > > #include <asm/mmu_context.h>
> > > > > > > #include <asm/spec-ctrl.h>
> > > > > > > #include <asm/mshyperv.h>
> > > > > > > +#include <asm/cet.h>
> > > > > > >
> > > > > > > #include "trace.h"
> > > > > > > #include "pmu.h"
> > > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > > > return !(val & ~valid_bits);
> > > > > > > }
> > > > > > >
> > > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + u32 eax, ebx, ecx, edx;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * Guest CET can work as long as HW supports the feature, independent
> > > > > > > + * to Host SW enabling status.
> > > > > > > + */
> > > > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > > +
> > > > > > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > > >
> > > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > > be enabled.
> > > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > > same as that on host, right now on host, users can select to enable SHSTK or IBT
> > > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > > or SHSTK | IBT.
> > > >
> > > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > > what this is allowing for the guest. The problem is that the architecture
> > > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > > hole where the guest could touch state for a disabled feature.
> > > >
> > > > Regardless, the guest would still be able to selectively enable each CET
> > > > feature, it would just never see a model where SHSTK != IBT.
> > > Hi, Sean,
> > > I'd like to understand your concerns. From my point of view, e.g.,
> > > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > > configured. Could you detail your concerns?
> >
> > In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> > to the guest. If only SHSTK is exposed, a devious guest can still use IBT
> > because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> > Preventing the guest from using IBT in this scenario is infeasible as it
> > would require trapping and emulating the XSAVE as well as the relevent CET
> > MSRs.
> Cannot agree with you more!
> This is some design limitation, but from my point of view, once vmm
> exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> which features to be enabled, we don't need to add extra constraints on
> the usage.

But if KVM allows SHSTK and IBT to be toggled independently then the VMM
has only exposed SHSTK or IBT, not CET as whole.

Even if SHSTK and IBT are bundled together the guest still has to opt-in
to enabling each feature. I don't see what we gain by pretending that
SHSTK/IBT can be individually exposed to the guest, and on the flip side
doing so creates a virtualization hole.

2019-03-05 03:13:26

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM:VMX: Pass through host CET related MSRs to Guest.

On Mon, Mar 04, 2019 at 10:53:27AM -0800, Sean Christopherson wrote:
> On Mon, Feb 25, 2019 at 09:27:13PM +0800, Yang Weijiang wrote:
> > The CET runtime settings, i.e., CET state control bits(IA32_U_CET/
> > IA32_S_CET), CET SSP(IA32_PL3_SSP/IA32_PL0_SSP) and SSP table address
> > (IA32_INTERRUPT_SSP_TABLE_ADDR) are task/thread specific, therefore,
> > OS needs to save/restore the states properly during context switch,
> > e.g., task/thread switching, interrupt/exception handling, it uses
> > xsaves/xrstors to achieve that.
> >
> > The difference between VMCS CET area fields and xsave CET area, is that
> > the former is for state retention during Guest/Host context
> > switch while the latter is for state retention during OS execution.
> >
> > Linux currently doesn't support CPL1 and CPL2, so SSPs for these level
> > are skipped here.
>
> But don't we want to allow a guest to access the MSRs regardless of
> the host kernel's behavior?
>
Do you see any necessity of exposing the access to guest?

> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 7bbb8b26e901..89ee086e1729 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -11769,6 +11769,7 @@ static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
> > static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > {
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > + unsigned long *msr_bitmap;
> >
> > if (cpu_has_secondary_exec_ctrls()) {
> > vmx_compute_secondary_exec_control(vmx);
> > @@ -11786,6 +11787,18 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> > nested_vmx_cr_fixed1_bits_update(vcpu);
> > nested_vmx_entry_exit_ctls_update(vcpu);
> > }
> > +
> > + msr_bitmap = vmx->vmcs01.msr_bitmap;
> > +
> > + if (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) |
>
> This should be a logical OR, not a bitwise OR.
>
Good capture, thanks!

> > + guest_cpuid_has(vcpu, X86_FEATURE_IBT)) {
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_U_CET, MSR_TYPE_RW);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_S_CET, MSR_TYPE_RW);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_INT_SSP_TAB, MSR_TYPE_RW);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL0_SSP, MSR_TYPE_RW);
> > + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_PL3_SSP, MSR_TYPE_RW);
> > + }
> > +
> > }
> >
> > static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
> > --
> > 2.17.1
> >

2019-03-05 03:17:45

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Mar 04, 2019 at 10:54:29AM -0800, Sean Christopherson wrote:
> On Mon, Mar 04, 2019 at 10:47:53AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > > Guest CET SHSTK and IBT capability are reported via
> > > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > > Guest user mode and supervisor mode xsaves component size
> > > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > > respectively.
> > >
> > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > Signed-off-by: Yang Weijiang <[email protected]>
> > > ---
> > > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > > arch/x86/kvm/x86.h | 4 +++
> > > 2 files changed, 50 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index cb1aece25b17..5e05756cc6db 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > return xcr0;
> > > }
> > >
> > > +u64 kvm_supported_xss(void)
> > > +{
> > > + u64 xss;
> > > +
> > > + rdmsrl(MSR_IA32_XSS, xss);
> >
> > Honest question as I haven't thought through the flows: do we actually
> > need to restrict XSS based on what's enabled in the host? This
> > conflicts with your other statements that CET features can be enabled
> > independent of host support.
>
> Never mind, just saw Paolo's comment in a previous version about XSAVE
> being dependent on host XSS. The below comment about caching still
> applies though.
>
In my WIP work, I use existing host_xss in vmx.c instead of reading MSR here to
incoperate host status.

> > And if we do incorporate the host status, the value should be read once
> > and cached unless we're expecting the host value to change dynamically,
> > e.g. see host_efer.
> >
> > > + xss &= KVM_SUPPORTED_XSS;
> > > + return xss;
> > > +}
> > > +EXPORT_SYMBOL(kvm_supported_xss);
> > > +
> > > #define F(x) bit(X86_FEATURE_##x)
> > >
> > > /* For scattered features from cpufeatures.h; we currently expose none */

2019-03-05 03:22:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM:VMX: Pass through host CET related MSRs to Guest.

On Mon, Mar 04, 2019 at 06:07:14PM +0800, Yang Weijiang wrote:
> On Mon, Mar 04, 2019 at 10:53:27AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:13PM +0800, Yang Weijiang wrote:
> > > The CET runtime settings, i.e., CET state control bits(IA32_U_CET/
> > > IA32_S_CET), CET SSP(IA32_PL3_SSP/IA32_PL0_SSP) and SSP table address
> > > (IA32_INTERRUPT_SSP_TABLE_ADDR) are task/thread specific, therefore,
> > > OS needs to save/restore the states properly during context switch,
> > > e.g., task/thread switching, interrupt/exception handling, it uses
> > > xsaves/xrstors to achieve that.
> > >
> > > The difference between VMCS CET area fields and xsave CET area, is that
> > > the former is for state retention during Guest/Host context
> > > switch while the latter is for state retention during OS execution.
> > >
> > > Linux currently doesn't support CPL1 and CPL2, so SSPs for these level
> > > are skipped here.
> >
> > But don't we want to allow a guest to access the MSRs regardless of
> > the host kernel's behavior?
> >
> Do you see any necessity of exposing the access to guest?

No, but isn't exposing them to the guest effectively free since XSAVES
and XRSTORS will always save/restore them along with SSP0 and SSP3?

2019-03-05 05:23:11

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Mon, Mar 04, 2019 at 10:28:36AM -0800, Sean Christopherson wrote:
> On Sun, Mar 03, 2019 at 05:36:45PM +0800, Yang Weijiang wrote:
> > On Fri, Mar 01, 2019 at 06:53:23AM -0800, Sean Christopherson wrote:
> > > On Thu, Feb 28, 2019 at 04:28:32PM +0800, Yang Weijiang wrote:
> > > > On Thu, Feb 28, 2019 at 07:59:40AM -0800, Sean Christopherson wrote:
> > > > > On Mon, Feb 25, 2019 at 09:27:11PM +0800, Yang Weijiang wrote:
> > > > > > Guest CET SHSTK and IBT capability are reported via
> > > > > > CPUID.(EAX=7, ECX=0):ECX[bit 7] and EDX[bit 20] respectively.
> > > > > > Guest user mode and supervisor mode xsaves component size
> > > > > > is reported via CPUID.(EAX=0xD, ECX=1):ECX[bit 11] and ECX[bit 12]
> > > > > > respectively.
> > > > > >
> > > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > > ---
> > > > > > arch/x86/kvm/cpuid.c | 60 +++++++++++++++++++++++++++++++++-----------
> > > > > > arch/x86/kvm/x86.h | 4 +++
> > > > > > 2 files changed, 50 insertions(+), 14 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > > index cb1aece25b17..5e05756cc6db 100644
> > > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > > @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> > > > > > return xcr0;
> > > > > > }
> > > > > >
> > > > > > +u64 kvm_supported_xss(void)
> > > > > > +{
> > > > > > + u64 xss;
> > > > > > +
> > > > > > + rdmsrl(MSR_IA32_XSS, xss);
> > > > > > + xss &= KVM_SUPPORTED_XSS;
> > > > > > + return xss;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(kvm_supported_xss);
> > > > > > +
> > > > > > #define F(x) bit(X86_FEATURE_##x)
> > > > > >
> > > > > > /* For scattered features from cpufeatures.h; we currently expose none */
> > > > > > @@ -323,6 +333,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > > > u32 index, int *nent, int maxnent)
> > > > > > {
> > > > > > int r;
> > > > > > + u32 eax, ebx, ecx, edx;
> > > > > > unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > > > > > #ifdef CONFIG_X86_64
> > > > > > unsigned f_gbpages = (kvm_x86_ops->get_lpage_level() == PT_PDPE_LEVEL)
> > > > > > @@ -503,6 +514,20 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> > > > > > * if the host doesn't support it.
> > > > > > */
> > > > > > entry->edx |= F(ARCH_CAPABILITIES);
> > > > > > +
> > > > > > + /*
> > > > > > + * Guest OS CET enabling is designed independent to
> > > > > > + * host enabling, it only has dependency on Host HW
> > > > > > + * capability, if it has, report CET support to
> > > > > > + * Guest.
> > > > > > + */
> > > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > + if (ecx & F(SHSTK))
> > > > > > + entry->ecx |= F(SHSTK);
> > > > > > +
> > > > > > + if (edx & F(IBT))
> > > > > > + entry->edx |= F(IBT);
> > > > >
> > > > > There's no need to manually add these flags. They will be automatically
> > > > > kept if supported in hardware because your previous patch, 02/08, added
> > > > > them to the mask of features that can be exposed to the guest,
> > > > > i.e. set them in kvm_cpuid_7_0_e{c,d}x_x86_features.
> > > > >
> > > > I shared the same thought as you before, but after I took a closer look at the
> > > > kernel code, actually, when host CET feature is disabled by user via
> > > > cmdline options(no_cet_shstk and no_cet_ibt), it'll mask out CET feature bits in
> > > > boot_cpu_data.x86_capbility[] array, and cpuid_mask() will make the bits
> > > > in previous definition lost, so these lines actually add them back when
> > > > host CET is disabled.
> > >
> > > 'entry' is filled by do_cpuid_1_ent(), which does cpuid_count(), same as
> > > your code, i.e. it's not affected by whether or not the host kernel is
> > > using each feature.
> > >
> > I checked CET kernel patch:
> > #ifdef CONFIG_X86_INTEL_SHADOW_STACK_USER
> > static __init int setup_disable_shstk(char *s) {
> > /* require an exact match without trailing characters */
> > if (s[0] != '\0')
> > return 0;
> >
> > if (!boot_cpu_has(X86_FEATURE_SHSTK))
> > return 1;
> >
> > setup_clear_cpu_cap(X86_FEATURE_SHSTK);
> > pr_info("x86: 'no_cet_shstk' specified, disabling Shadow Stack\n");
> > return 1;
> > }
> > __setup("no_cet_shstk", setup_disable_shstk); #endif
> >
> > setup_disable_shstk()->setup_clear_cpu_cap(X86_FEATURE_SHSTK)->do_clear_cpu_cap(NULL,
> > feature)->clear_feature(c, feature)->clear_cpu_cap(&boot_cpu_data, feature);
> >
> > this path will clear boot_cpu_data.x86_capability[X86_FEATURE_SHSTK] if "no_cet_shstk" is set.
> > but in cpuid_mask(), it will "AND" the bit with SHSTK bit set
> > in kvm_cpuid_7_0_ecx_x86_features, so the bit in ecx is cleared,
> > need to add the bit back according to host cpuid_count().
> > the CET kernel patch can be seen in below patch link.
>
> Ah, I see. In this case we need to honor boot_cpu_data. The idea is
> that a feature should not be exposed to the guest, i.e. actually used,
> if it has been explicitly disabled by the user, e.g. to workaround a
> hardware or firmware issue. The cases where a feature is exposed to
> the guest even when disabled in host is when said feature is emulated
> by KVM in software.
>
Make sense, I'll change related checks in the patch set.
Thanks!

> >
> > > > please check CET kernel patch here:
> > > > https://lkml.org/lkml/2018/11/20/204

2019-03-05 05:43:32

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
> On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> > On Mon, Mar 04, 2019 at 10:43:07AM -0800, Sean Christopherson wrote:
> > > On Sun, Mar 03, 2019 at 08:26:08PM +0800, Yang Weijiang wrote:
> > > > On Fri, Mar 01, 2019 at 06:58:19AM -0800, Sean Christopherson wrote:
> > > > > On Thu, Feb 28, 2019 at 04:38:44PM +0800, Yang Weijiang wrote:
> > > > > > On Thu, Feb 28, 2019 at 08:17:15AM -0800, Sean Christopherson wrote:
> > > > > > > On Mon, Feb 25, 2019 at 09:27:14PM +0800, Yang Weijiang wrote:
> > > > > > > > "Load Guest CET state" bit controls whether guest CET states
> > > > > > > > will be loaded at Guest entry. Before doing that, KVM needs
> > > > > > > > to check if CPU CET feature is available.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhang Yi Z <[email protected]>
> > > > > > > > Signed-off-by: Yang Weijiang <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/x86/kvm/vmx.c | 32 ++++++++++++++++++++++++++++++++
> > > > > > > > 1 file changed, 32 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > > > > > > index 89ee086e1729..d32cee9ee079 100644
> > > > > > > > --- a/arch/x86/kvm/vmx.c
> > > > > > > > +++ b/arch/x86/kvm/vmx.c
> > > > > > > > @@ -55,6 +55,7 @@
> > > > > > > > #include <asm/mmu_context.h>
> > > > > > > > #include <asm/spec-ctrl.h>
> > > > > > > > #include <asm/mshyperv.h>
> > > > > > > > +#include <asm/cet.h>
> > > > > > > >
> > > > > > > > #include "trace.h"
> > > > > > > > #include "pmu.h"
> > > > > > > > @@ -4065,6 +4066,20 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> > > > > > > > return !(val & ~valid_bits);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int vmx_guest_cet_cap(struct kvm_vcpu *vcpu)
> > > > > > > > +{
> > > > > > > > + u32 eax, ebx, ecx, edx;
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * Guest CET can work as long as HW supports the feature, independent
> > > > > > > > + * to Host SW enabling status.
> > > > > > > > + */
> > > > > > > > + cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
> > > > > > > > +
> > > > > > > > + return ((ecx & bit(X86_FEATURE_SHSTK)) |
> > > > > > > > + (edx & bit(X86_FEATURE_IBT))) ? 1 : 0;
> > > > > > >
> > > > > > > Given the holes in the (current) architecture/spec, I think KVM has to
> > > > > > > require both features to be supported in the guest to allow CR4.CET to
> > > > > > > be enabled.
> > > > > > The reason why I use a "OR" here is to keep CET enabling control the
> > > > > > same as that on host, right now on host, users can select to enable SHSTK or IBT
> > > > > > feature by disabling the unexpected one. It's free to select SHSTK & IBT
> > > > > > or SHSTK | IBT.
> > > > >
> > > > > Which is not the same as SHSTK != IBT in *hardware*, which is effectively
> > > > > what this is allowing for the guest. The problem is that the architecture
> > > > > doesn't cleanly separate the two features, i.e. we'd have a virtualization
> > > > > hole where the guest could touch state for a disabled feature.
> > > > >
> > > > > Regardless, the guest would still be able to selectively enable each CET
> > > > > feature, it would just never see a model where SHSTK != IBT.
> > > > Hi, Sean,
> > > > I'd like to understand your concerns. From my point of view, e.g.,
> > > > when only IBT is enabled, PL3_SSP MSR would be unnecessrily exposed,
> > > > this is the design "limitation", but PL3_SSP keeps 0 if SHSTK is not
> > > > configured. Could you detail your concerns?
> > >
> > > In your approach, IA32_{S,U}_CET can be written if SHSTK or IBT is exposed
> > > to the guest. If only SHSTK is exposed, a devious guest can still use IBT
> > > because it can set CR4.CET as well as the enable bits in IA32_{S,U}_CET.
> > > Preventing the guest from using IBT in this scenario is infeasible as it
> > > would require trapping and emulating the XSAVE as well as the relevent CET
> > > MSRs.
> > Cannot agree with you more!
> > This is some design limitation, but from my point of view, once vmm
> > exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> > which features to be enabled, we don't need to add extra constraints on
> > the usage.
>
> But if KVM allows SHSTK and IBT to be toggled independently then the VMM
> has only exposed SHSTK or IBT, not CET as whole.
>
> Even if SHSTK and IBT are bundled together the guest still has to opt-in
> to enabling each feature. I don't see what we gain by pretending that
> SHSTK/IBT can be individually exposed to the guest, and on the flip side
> doing so creates a virtualization hole.
you almost convinced me ;-), maybe I'll make the feature as a bundle in
next release after check with kernel team. BTW, what do you mean by
saying "create a virtualization hole"? Is it what you stated in above
reply?

2019-03-08 10:51:04

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On 25/02/19 14:27, Yang Weijiang wrote:
> For Guest XSS, right now, only bit 11(user states) and bit 12
> (supervisor states) are supported, if other bits are being set,
> need to modify KVM_SUPPORTED_XSS macro to have support.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index d32cee9ee079..68908ed7b151 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -47,6 +47,7 @@
> #include <asm/virtext.h>
> #include <asm/mce.h>
> #include <asm/fpu/internal.h>
> +#include <asm/fpu/types.h>
> #include <asm/perf_event.h>
> #include <asm/debugreg.h>
> #include <asm/kexec.h>
> @@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_XSS:
> if (!vmx_xsaves_supported())
> return 1;
> +
> /*
> - * The only supported bit as of Skylake is bit 8, but
> - * it is not supported on KVM.
> + * Check bits being set are supported in KVM.
> */
> - if (data != 0)
> + if (data & ~kvm_supported_xss())
> return 1;
> +
> vcpu->arch.ia32_xss = data;
> if (vcpu->arch.ia32_xss != host_xss)
> add_atomic_switch_msr(vmx, MSR_IA32_XSS,
>

Luwei, can you work with Weijiang to add XSAVES support for Processor
Tracing?

Paolo

2019-03-08 11:32:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On 28/02/19 09:44, Yang Weijiang wrote:
>>> if (!vmx_xsaves_supported())
>>> return 1;
>>> +
>>> /*
>>> - * The only supported bit as of Skylake is bit 8, but
>>> - * it is not supported on KVM.
>>> + * Check bits being set are supported in KVM.
>> I'd drop the comment altogether, it's pretty obvious from the code that
>> were checking which bits are supported.
> you won't see these redundancies in next version ;)
>>> */
>>> - if (data != 0)
>>> + if (data & ~kvm_supported_xss())
>>> return 1;

You should instead check this against CPUID[0xD, 1].EDX:ECX. If CET is
disabled in CPUID, the guest should not be able to set it in MSR_IA32_CSS.

Paolo

2019-03-08 11:35:18

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On 25/02/19 14:27, Yang Weijiang wrote:
> + compacted = entry[i].eax &
> + (F(XSAVES) | F(XSAVEC));
> + entry[i].ebx = xstate_required_size(supported,
> + compacted);

If XSAVES and XSAVEC are both false, just set ebx to 0 as it was before
this patch.

Paolo

2019-03-08 16:31:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On Mon, Mar 04, 2019 at 08:36:55PM +0800, Yang Weijiang wrote:
> On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
> > On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
> > > Cannot agree with you more!
> > > This is some design limitation, but from my point of view, once vmm
> > > exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
> > > which features to be enabled, we don't need to add extra constraints on
> > > the usage.
> >
> > But if KVM allows SHSTK and IBT to be toggled independently then the VMM
> > has only exposed SHSTK or IBT, not CET as whole.
> >
> > Even if SHSTK and IBT are bundled together the guest still has to opt-in
> > to enabling each feature. I don't see what we gain by pretending that
> > SHSTK/IBT can be individually exposed to the guest, and on the flip side
> > doing so creates a virtualization hole.
> you almost convinced me ;-), maybe I'll make the feature as a bundle in
> next release after check with kernel team. BTW, what do you mean by
> saying "create a virtualization hole"? Is it what you stated in above
> reply?

By "virtualization hole" I mean the guest would be able to use a feature
that the virtual CPU model says isn't supported.

After rereading the XSS architecture, there's a marginally less crappy
option for handling XRSTOR as we could use the XSS_EXIT_BITMAP to
intercept XRSTOR if SHSTK != IBT and the guest is restoring CET state,
e.g. to ensure the guest isn't setting IA32_PL*_SSP if !SHSTK and isn't
setting bits that are effectively reserved in IA32_U_CET.

But practically speaking that'd be the same as intercepting XRSTORS
unconditionally when the guest is using CET, i.e. it's still going to
tank the performance of a guest that uses CET+XSAVES/XRSTORS.

2019-03-08 16:38:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM:VMX: Load Guest CET via VMCS when CET is enabled in Guest

On 08/03/19 17:28, Sean Christopherson wrote:
> On Mon, Mar 04, 2019 at 08:36:55PM +0800, Yang Weijiang wrote:
>> On Mon, Mar 04, 2019 at 07:12:02PM -0800, Sean Christopherson wrote:
>>> On Mon, Mar 04, 2019 at 05:56:40PM +0800, Yang Weijiang wrote:
>>>> Cannot agree with you more!
>>>> This is some design limitation, but from my point of view, once vmm
>>>> exposes CET capability to guest via CPUID, it grants the guest kernel freedom to choose
>>>> which features to be enabled, we don't need to add extra constraints on
>>>> the usage.
>>>
>>> But if KVM allows SHSTK and IBT to be toggled independently then the VMM
>>> has only exposed SHSTK or IBT, not CET as whole.
>>>
>>> Even if SHSTK and IBT are bundled together the guest still has to opt-in
>>> to enabling each feature. I don't see what we gain by pretending that
>>> SHSTK/IBT can be individually exposed to the guest, and on the flip side
>>> doing so creates a virtualization hole.
>> you almost convinced me ;-), maybe I'll make the feature as a bundle in
>> next release after check with kernel team. BTW, what do you mean by
>> saying "create a virtualization hole"? Is it what you stated in above
>> reply?
>
> By "virtualization hole" I mean the guest would be able to use a feature
> that the virtual CPU model says isn't supported.

I think it's okay to leave the hole and leave it to userspace to forbid
enabling only one of the bits.

Paolo

> After rereading the XSS architecture, there's a marginally less crappy
> option for handling XRSTOR as we could use the XSS_EXIT_BITMAP to
> intercept XRSTOR if SHSTK != IBT and the guest is restoring CET state,
> e.g. to ensure the guest isn't setting IA32_PL*_SSP if !SHSTK and isn't
> setting bits that are effectively reserved in IA32_U_CET.
>
> But practically speaking that'd be the same as intercepting XRSTORS
> unconditionally when the guest is using CET, i.e. it's still going to
> tank the performance of a guest that uses CET+XSAVES/XRSTORS.
>


2019-03-08 17:31:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] KVM:X86: Add user-space read/write interface for CET MSRs.

On Thu, Feb 28, 2019 at 05:41:52PM +0800, Yang Weijiang wrote:
> On Thu, Feb 28, 2019 at 08:32:49AM -0800, Sean Christopherson wrote:
> > On Mon, Feb 25, 2019 at 09:27:16PM +0800, Yang Weijiang wrote:

...

> > > static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
> > > - struct kvm_msr_entry *entries,
> > > + struct kvm_msr_entry *entries, bool read,
> > > int (*do_msr)(struct kvm_vcpu *vcpu,
> > > unsigned index, u64 *data))
> > > {
> > > int i;
> > >
> > > - for (i = 0; i < msrs->nmsrs; ++i)
> > > + for (i = 0; i < msrs->nmsrs; ++i) {
> > > + /* If it comes to CET related MSRs, read them together. */
> > > + if (entries[i].index == MSR_IA32_U_CET) {
> > > + i += do_cet_msrs(vcpu, i, entries, read) - 1;
> >
> > This wrong, not to mention horribly buggy. The correct way to handle
> > MSRs that are switched via the VMCS is to read/write the VMCS in
> > vmx_{get,set}_msr() as needed, e.g. vmcs_writel(GUEST_GS_BASE, data).
> >
> The code is barely for migration purpose since they're passed through
> to guest, I have no intent to expose them just like normal MSRs.
> I double checked the spec.:
> MSR_IA32_U_CET 0x6a0
> MSR_IA32_PL0_SSP 0x6a4
> MSR_IA32_PL3_SSP 0x6a7
> should come from xsave area.
>
> MSR_IA32_S_CET 0x6a2
> MSR_IA32_INTR_SSP_TABL 0x6a8
> should come from VMCS guest fields.
>
> for the former, they're stored in guest fpu area, need
> to restore them with xrstors temporarily before read, while the later is stored in
> VMCS guest fields, I can read them out via vmcs_read() directly.
>
> I'd like to read them out as a whole(all or nothing) due to cost induced
> by xsaves/xrstors, in Qemu I put these MSRs as sequential fields.
>
> what do you think of it?

It doesn't need to be all or nothing, it should be simple enough to set
a flag when we load guest state and use it to skip reloading guest state
and to load host state before returning. I think the attached patch
does the trick.


Attachments:
(No filename) (2.05 kB)
0001-KVM-x86-load-guest-fpu-state-when-accessing-MSRs-man.patch (3.77 kB)
Download all attachments

2019-03-11 02:32:36

by Luwei Kang

[permalink] [raw]
Subject: RE: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

> > For Guest XSS, right now, only bit 11(user states) and bit 12
> > (supervisor states) are supported, if other bits are being set, need
> > to modify KVM_SUPPORTED_XSS macro to have support.
> >
> > Signed-off-by: Zhang Yi Z <[email protected]>
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> > d32cee9ee079..68908ed7b151 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -47,6 +47,7 @@
> > #include <asm/virtext.h>
> > #include <asm/mce.h>
> > #include <asm/fpu/internal.h>
> > +#include <asm/fpu/types.h>
> > #include <asm/perf_event.h>
> > #include <asm/debugreg.h>
> > #include <asm/kexec.h>
> > @@ -4336,12 +4337,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > case MSR_IA32_XSS:
> > if (!vmx_xsaves_supported())
> > return 1;
> > +
> > /*
> > - * The only supported bit as of Skylake is bit 8, but
> > - * it is not supported on KVM.
> > + * Check bits being set are supported in KVM.
> > */
> > - if (data != 0)
> > + if (data & ~kvm_supported_xss())
> > return 1;
> > +
> > vcpu->arch.ia32_xss = data;
> > if (vcpu->arch.ia32_xss != host_xss)
> > add_atomic_switch_msr(vmx, MSR_IA32_XSS,
> >
>
> Luwei, can you work with Weijiang to add XSAVES support for Processor Tracing?

Hi Paolo,
As we talk long time before. I will do that.

Thanks,
Luwei Kang

>
> Paolo

2019-03-11 05:24:38

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM:CPUID: Add CPUID support for Guest CET

On Fri, Mar 08, 2019 at 12:32:12PM +0100, Paolo Bonzini wrote:
> On 25/02/19 14:27, Yang Weijiang wrote:
> > + compacted = entry[i].eax &
> > + (F(XSAVES) | F(XSAVEC));
> > + entry[i].ebx = xstate_required_size(supported,
> > + compacted);
>
> If XSAVES and XSAVEC are both false, just set ebx to 0 as it was before
> this patch.
>
> Paolo
Thanks Paulo, it makes sense to do so, will change it in next version.

2019-03-11 05:27:06

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On Fri, Mar 08, 2019 at 12:32:04PM +0100, Paolo Bonzini wrote:
> On 28/02/19 09:44, Yang Weijiang wrote:
> >>> if (!vmx_xsaves_supported())
> >>> return 1;
> >>> +
> >>> /*
> >>> - * The only supported bit as of Skylake is bit 8, but
> >>> - * it is not supported on KVM.
> >>> + * Check bits being set are supported in KVM.
> >> I'd drop the comment altogether, it's pretty obvious from the code that
> >> were checking which bits are supported.
> > you won't see these redundancies in next version ;)
> >>> */
> >>> - if (data != 0)
> >>> + if (data & ~kvm_supported_xss())
> >>> return 1;
>
> You should instead check this against CPUID[0xD, 1].EDX:ECX. If CET is
> disabled in CPUID, the guest should not be able to set it in MSR_IA32_CSS.
>
> Paolo
Thanks, OK, will change it.

2019-03-11 06:41:47

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On Sun, Mar 10, 2019 at 08:20:30PM +0800, Yang Weijiang wrote:
> On Fri, Mar 08, 2019 at 12:32:04PM +0100, Paolo Bonzini wrote:
> > On 28/02/19 09:44, Yang Weijiang wrote:
> > >>> if (!vmx_xsaves_supported())
> > >>> return 1;
> > >>> +
> > >>> /*
> > >>> - * The only supported bit as of Skylake is bit 8, but
> > >>> - * it is not supported on KVM.
> > >>> + * Check bits being set are supported in KVM.
> > >> I'd drop the comment altogether, it's pretty obvious from the code that
> > >> were checking which bits are supported.
> > > you won't see these redundancies in next version ;)
> > >>> */
> > >>> - if (data != 0)
> > >>> + if (data & ~kvm_supported_xss())
> > >>> return 1;
> >
> > You should instead check this against CPUID[0xD, 1].EDX:ECX. If CET is
> > disabled in CPUID, the guest should not be able to set it in MSR_IA32_CSS.
> >
> > Paolo
> Thanks, OK, will change it.
Hi, Paolo,
How about change kvm_supported_xss() as below so that CPUID[0xd,1] check
is implied in host_xss contents, vmx_supported_xss() now just returns host_xss
in vmx.c. The purpose of this change is to make XSS data check
common for other XSS based features.

+u64 kvm_supported_xss(void)
+{
+ return KVM_SUPPORTED_XSS & kvm_x86_ops->vmx_supported_xss();
+}
+


2019-03-11 15:34:46

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On 10/03/19 14:35, Yang Weijiang wrote:
>>>> - if (data != 0)
>>>> + if (data & ~kvm_supported_xss())
>>>> return 1;
>>> You should instead check this against CPUID[0xD, 1].EDX:ECX. If CET is
>>> disabled in CPUID, the guest should not be able to set it in MSR_IA32_CSS.
>>>
>>> Paolo
>> Thanks, OK, will change it.
> Hi, Paolo,
> How about change kvm_supported_xss() as below so that CPUID[0xd,1] check
> is implied in host_xss contents, vmx_supported_xss() now just returns host_xss
> in vmx.c. The purpose of this change is to make XSS data check
> common for other XSS based features.
>
> +u64 kvm_supported_xss(void)
> +{
> + return KVM_SUPPORTED_XSS & kvm_x86_ops->vmx_supported_xss();
> +}
> +
>

This is correct; however, you should also check against the *guest*'s
CPUID[0xD, 1].EDX:ECX.

One possibility is to add a field kvm->guest_supported_xss and update it
in kvm_update_cpuid, then here you do

if (data & ~kvm->guest_supported_xss)
return 1;

Thanks,

Paolo

2019-03-13 07:37:10

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM:X86: Add XSS bit 11 and 12 support for CET xsaves/xrstors.

On Mon, Mar 11, 2019 at 04:32:32PM +0100, Paolo Bonzini wrote:
> On 10/03/19 14:35, Yang Weijiang wrote:
> >>>> - if (data != 0)
> >>>> + if (data & ~kvm_supported_xss())
> >>>> return 1;
> >>> You should instead check this against CPUID[0xD, 1].EDX:ECX. If CET is
> >>> disabled in CPUID, the guest should not be able to set it in MSR_IA32_CSS.
> >>>
> >>> Paolo
> >> Thanks, OK, will change it.
> > Hi, Paolo,
> > How about change kvm_supported_xss() as below so that CPUID[0xd,1] check
> > is implied in host_xss contents, vmx_supported_xss() now just returns host_xss
> > in vmx.c. The purpose of this change is to make XSS data check
> > common for other XSS based features.
> >
> > +u64 kvm_supported_xss(void)
> > +{
> > + return KVM_SUPPORTED_XSS & kvm_x86_ops->vmx_supported_xss();
> > +}
> > +
> >
>
> This is correct; however, you should also check against the *guest*'s
> CPUID[0xD, 1].EDX:ECX.
>
> One possibility is to add a field kvm->guest_supported_xss and update it
> in kvm_update_cpuid, then here you do
>
> if (data & ~kvm->guest_supported_xss)
> return 1;
>
> Thanks,
>
> Paolo
Thanks Paolo, I'll add the change in next version.