2023-10-10 20:02:59

by John Allen

Subject: [PATCH 0/9] SVM guest shadow stack support

AMD Zen3 and newer processors support shadow stack, a feature designed to
protect against ROP (return-oriented programming) attacks in which an attacker
manipulates return addresses on the call stack in order to execute arbitrary
code. To prevent this, shadow stacks can be allocated that are only used by
control transfer and return instructions. When a CALL instruction is issued, it
writes the return address to both the program stack and the shadow stack. When
the subsequent RET instruction is issued, it pops the return address from both
stacks and compares them. If the addresses don't match, a control-protection
exception is raised.

Shadow stack and a related feature, Indirect Branch Tracking (IBT), are
collectively referred to as Control-flow Enforcement Technology (CET). However,
current AMD processors only support shadow stack and not IBT.

This series adds support for shadow stack in SVM guests and builds upon
the support added in the CET guest support patch series [1]. Additional
patches are required to support shadow stack enabled guests in qemu [2].

[1]: CET guest support patches (v6)
https://lore.kernel.org/all/[email protected]/

[2]: CET qemu patches
https://lore.kernel.org/all/[email protected]/

RFC v3 series:
https://lore.kernel.org/all/[email protected]/

---

RFC v2:
- Rebased on v3 of the Intel CET virtualization series, dropping the
patch that moved cet_is_msr_accessible to common code as that has
been pulled into the Intel series.
- Minor change removing curly brackets around if statement introduced
in patch 6/6.
RFC v3:
- Rebased on v5 of the Intel CET virtualization series.
- Add patch changing the name of vmplX_ssp SEV-ES save area fields to
plX_ssp.
- Merge this series intended for KVM with the separate guest kernel
patch (now patch 7/8).
- Update MSR passthrough code to conditionally pass through shadow
stack MSRS based on both host and guest support.
- Don't save PL0_SSP, PL1_SSP, and PL2_SSP MSRs on SEV-ES VMRUN as
these are currently unused.
v1:
- Remove RFC tag from series
- Rebase on v6 of the Intel CET virtualization series
- Use KVM-governed feature to track SHSTK for SVM

John Allen (9):
KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs
KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions
KVM: x86: SVM: Pass through shadow stack MSRs
KVM: SVM: Rename vmplX_ssp -> plX_ssp
KVM: SVM: Save shadow stack host state on VMRUN
KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel
x86/sev-es: Include XSS value in GHCB CPUID request
KVM: SVM: Use KVM-governed features to track SHSTK
KVM: SVM: Add CET features to supported_xss

arch/x86/include/asm/svm.h | 9 +++---
arch/x86/kernel/sev-shared.c | 15 ++++++++++
arch/x86/kvm/svm/sev.c | 21 ++++++++++++--
arch/x86/kvm/svm/svm.c | 54 ++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 +-
5 files changed, 95 insertions(+), 7 deletions(-)

--
2.40.1


2023-10-10 20:03:24

by John Allen

Subject: [PATCH 2/9] KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions

Add shadow stack VMCB save area fields to dump_vmcb. Only include S_CET,
SSP, and ISST_ADDR. Since there is currently no support for decrypting and
dumping the SEV-ES save area, exclude PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP,
and U_CET, which are only included in the SEV-ES save area.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/svm.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6a0d225311bc..e435e4fbadda 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3416,6 +3416,10 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"rip:", save->rip, "rflags:", save->rflags);
pr_err("%-15s %016llx %-13s %016llx\n",
"rsp:", save->rsp, "rax:", save->rax);
+ pr_err("%-15s %016llx %-13s %016llx\n",
+ "s_cet:", save->s_cet, "ssp:", save->ssp);
+ pr_err("%-15s %016llx\n",
+ "isst_addr:", save->isst_addr);
pr_err("%-15s %016llx %-13s %016llx\n",
"star:", save01->star, "lstar:", save01->lstar);
pr_err("%-15s %016llx %-13s %016llx\n",
--
2.40.1

2023-10-10 20:03:29

by John Allen

Subject: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

If KVM supports shadow stack, pass through shadow stack MSRs to improve
guest performance.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 2 +-
2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e435e4fbadda..984e89d7a734 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -139,6 +139,13 @@ static const struct svm_direct_access_msrs {
{ .index = X2APIC_MSR(APIC_TMICT), .always = false },
{ .index = X2APIC_MSR(APIC_TMCCT), .always = false },
{ .index = X2APIC_MSR(APIC_TDCR), .always = false },
+ { .index = MSR_IA32_U_CET, .always = false },
+ { .index = MSR_IA32_S_CET, .always = false },
+ { .index = MSR_IA32_INT_SSP_TAB, .always = false },
+ { .index = MSR_IA32_PL0_SSP, .always = false },
+ { .index = MSR_IA32_PL1_SSP, .always = false },
+ { .index = MSR_IA32_PL2_SSP, .always = false },
+ { .index = MSR_IA32_PL3_SSP, .always = false },
{ .index = MSR_INVALID, .always = false },
};

@@ -1225,6 +1232,25 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
}
+
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ bool shstk_enabled = guest_cpuid_has(vcpu, X86_FEATURE_SHSTK);
+
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_U_CET,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_S_CET,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_INT_SSP_TAB,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PL0_SSP,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PL1_SSP,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PL2_SSP,
+ shstk_enabled, shstk_enabled);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_PL3_SSP,
+ shstk_enabled, shstk_enabled);
+ }
}

static void init_vmcb(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f41253958357..bdc39003b955 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2

-#define MAX_DIRECT_ACCESS_MSRS 46
+#define MAX_DIRECT_ACCESS_MSRS 53
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
--
2.40.1

2023-10-10 20:03:35

by John Allen

Subject: [PATCH 1/9] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs

Set up interception of shadow stack MSRs. In the event that shadow stack
is unsupported on the host or the MSRs are otherwise inaccessible, the
interception code will return an error. In certain circumstances such as
host initiated MSR reads or writes, the interception code will get or
set the requested MSR value.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/svm.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f283eb47f6ac..6a0d225311bc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2859,6 +2859,15 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (guest_cpuid_is_intel(vcpu))
msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
break;
+ case MSR_IA32_S_CET:
+ msr_info->data = svm->vmcb->save.s_cet;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ msr_info->data = svm->vmcb->save.isst_addr;
+ break;
+ case MSR_KVM_SSP:
+ msr_info->data = svm->vmcb->save.ssp;
+ break;
case MSR_TSC_AUX:
msr_info->data = svm->tsc_aux;
break;
@@ -3085,6 +3094,15 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
break;
+ case MSR_IA32_S_CET:
+ svm->vmcb->save.s_cet = data;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ svm->vmcb->save.isst_addr = data;
+ break;
+ case MSR_KVM_SSP:
+ svm->vmcb->save.ssp = data;
+ break;
case MSR_TSC_AUX:
/*
* TSC_AUX is usually changed only during boot and never read
--
2.40.1

2023-10-10 20:03:45

by John Allen

Subject: [PATCH 4/9] KVM: SVM: Rename vmplX_ssp -> plX_ssp

Rename SEV-ES save area SSP fields to be consistent with the APM.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/include/asm/svm.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 19bf955b67e0..568d97084e44 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -361,10 +361,10 @@ struct sev_es_save_area {
struct vmcb_seg ldtr;
struct vmcb_seg idtr;
struct vmcb_seg tr;
- u64 vmpl0_ssp;
- u64 vmpl1_ssp;
- u64 vmpl2_ssp;
- u64 vmpl3_ssp;
+ u64 pl0_ssp;
+ u64 pl1_ssp;
+ u64 pl2_ssp;
+ u64 pl3_ssp;
u64 u_cet;
u8 reserved_0xc8[2];
u8 vmpl;
--
2.40.1

2023-10-10 20:04:08

by John Allen

Subject: [PATCH 5/9] KVM: SVM: Save shadow stack host state on VMRUN

When running as an SEV-ES guest, the PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP,
and U_CET fields in the VMCB save area are type B, meaning the host
state is automatically loaded on a VMEXIT, but is not saved on a VMRUN.
The other shadow stack MSRs, S_CET, SSP, and ISST_ADDR, are type A,
meaning they are both loaded on VMEXIT and saved on VMRUN. Since PL0_SSP,
PL1_SSP, and PL2_SSP are currently unused, manually save only the
remaining type B host MSR values (U_CET and PL3_SSP) before VMRUN.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/sev.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b9a0a939d59f..bb4b18baa6f7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3098,6 +3098,15 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
}
+
+ if (boot_cpu_has(X86_FEATURE_SHSTK)) {
+ /*
+ * MSR_IA32_U_CET and MSR_IA32_PL3_SSP are restored on VMEXIT,
+ * save the current host values.
+ */
+ rdmsrl(MSR_IA32_U_CET, hostsa->u_cet);
+ rdmsrl(MSR_IA32_PL3_SSP, hostsa->pl3_ssp);
+ }
}

void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
--
2.40.1

2023-10-10 20:04:38

by John Allen

Subject: [PATCH 9/9] KVM: SVM: Add CET features to supported_xss

If the CPU supports CET, add CET XSAVES feature bits to the
supported_xss mask.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/svm.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 00a8cef3cbb8..f63b2bbac542 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5151,6 +5151,10 @@ static __init void svm_set_cpu_caps(void)
boot_cpu_has(X86_FEATURE_AMD_SSBD))
kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);

+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ kvm_caps.supported_xss |= XFEATURE_MASK_CET_USER |
+ XFEATURE_MASK_CET_KERNEL;
+
if (enable_pmu) {
/*
* Enumerate support for PERFCTR_CORE if and only if KVM has
--
2.40.1

2023-10-10 20:04:40

by John Allen

Subject: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

When a guest issues a cpuid instruction for Fn0000000D_x0B
(CetUserOffset), KVM will intercept and need to access the guest
MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
included in the GHCB to be visible to the hypervisor.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 12 ++++++++++--
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 3 ++-
4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 568d97084e44..5afc9e03379d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -678,5 +678,6 @@ DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
DEFINE_GHCB_ACCESSORS(sw_scratch)
DEFINE_GHCB_ACCESSORS(xcr0)
+DEFINE_GHCB_ACCESSORS(xss)

#endif
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index bb4b18baa6f7..94ab7203525f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2445,8 +2445,13 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)

svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);

- if (kvm_ghcb_xcr0_is_valid(svm)) {
- vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+ if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
+ if (kvm_ghcb_xcr0_is_valid(svm))
+ vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+
+ if (kvm_ghcb_xss_is_valid(svm))
+ vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
+
kvm_update_cpuid_runtime(vcpu);
}

@@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
svm_clr_intercept(svm, INTERCEPT_RDTSCP);
}
+
+ if (kvm_caps.supported_xss)
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
}

void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 984e89d7a734..ee7c7d0a09ab 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -146,6 +146,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_PL1_SSP, .always = false },
{ .index = MSR_IA32_PL2_SSP, .always = false },
{ .index = MSR_IA32_PL3_SSP, .always = false },
+ { .index = MSR_IA32_XSS, .always = false },
{ .index = MSR_INVALID, .always = false },
};

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index bdc39003b955..2011456d2e9f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2

-#define MAX_DIRECT_ACCESS_MSRS 53
+#define MAX_DIRECT_ACCESS_MSRS 54
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
@@ -720,5 +720,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)

#endif
--
2.40.1

2023-10-10 20:04:56

by John Allen

Subject: [PATCH 8/9] KVM: SVM: Use KVM-governed features to track SHSTK

Use the KVM-governed features framework to track whether SHSTK can be used
by both userspace and the guest for SVM.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kvm/svm/svm.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ee7c7d0a09ab..00a8cef3cbb8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4366,6 +4366,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
+ kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_SHSTK);

svm_recalc_instruction_intercepts(vcpu, svm);

--
2.40.1

2023-10-10 20:42:47

by John Allen

Subject: [PATCH 7/9] x86/sev-es: Include XSS value in GHCB CPUID request

When a guest issues a cpuid instruction for Fn0000000D_x0B (CetUserOffset), the
hypervisor may intercept and access the guest XSS value. For SEV-ES, this is
encrypted and needs to be included in the GHCB to be visible to the hypervisor.
The rdmsr instruction must be invoked directly because this code may run during
early boot, where the rdmsr wrappers cannot be used: they are incompatible with
the decompression boot phase.

Signed-off-by: John Allen <[email protected]>
---
arch/x86/kernel/sev-shared.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 2eabccde94fb..e38a1d049bc1 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -890,6 +890,21 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
/* xgetbv will cause #GP - use reset value for xcr0 */
ghcb_set_xcr0(ghcb, 1);

+ if (has_cpuflag(X86_FEATURE_SHSTK) && regs->ax == 0xd && regs->cx <= 1) {
+ unsigned long lo, hi;
+ u64 xss;
+
+ /*
+ * Since vc_handle_cpuid may be used during early boot, the
+ * rdmsr wrappers are incompatible and should not be used.
+ * Invoke the instruction directly.
+ */
+ asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
+ : "c" (MSR_IA32_XSS));
+ xss = (hi << 32) | lo;
+ ghcb_set_xss(ghcb, xss);
+ }
+
ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
if (ret != ES_OK)
return ret;
--
2.40.1

2023-10-12 09:02:09

by Nikunj A. Dadhania

Subject: Re: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

On 10/11/2023 1:32 AM, John Allen wrote:
> If kvm supports shadow stack, pass through shadow stack MSRs to improve
> guest performance.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
> arch/x86/kvm/svm/svm.h | 2 +-
> 2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e435e4fbadda..984e89d7a734 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -139,6 +139,13 @@ static const struct svm_direct_access_msrs {
> { .index = X2APIC_MSR(APIC_TMICT), .always = false },
> { .index = X2APIC_MSR(APIC_TMCCT), .always = false },
> { .index = X2APIC_MSR(APIC_TDCR), .always = false },
> + { .index = MSR_IA32_U_CET, .always = false },
> + { .index = MSR_IA32_S_CET, .always = false },
> + { .index = MSR_IA32_INT_SSP_TAB, .always = false },
> + { .index = MSR_IA32_PL0_SSP, .always = false },
> + { .index = MSR_IA32_PL1_SSP, .always = false },
> + { .index = MSR_IA32_PL2_SSP, .always = false },
> + { .index = MSR_IA32_PL3_SSP, .always = false },

First three MSRs are emulated in the patch 1, any specific reason for skipping MSR_IA32_PL[0-3]_SSP ?

Regards,
Nikunj

2023-10-12 12:59:58

by Borislav Petkov

Subject: Re: [PATCH 7/9] x86/sev-es: Include XSS value in GHCB CPUID request

On Tue, Oct 10, 2023 at 08:02:18PM +0000, John Allen wrote:
> When a guest issues a cpuid instruction for Fn0000000D_x0B (CetUserOffset), the
> hypervisor may intercept and access the guest XSS value. For SEV-ES, this is
> encrypted and needs to be included in the GHCB to be visible to the hypervisor.
> The rdmsr instruction needs to be called directly as the code may be used in
> early boot in which case the rdmsr wrappers should be avoided as they are
> incompatible with the decompression boot phase.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kernel/sev-shared.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 2eabccde94fb..e38a1d049bc1 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -890,6 +890,21 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
> /* xgetbv will cause #GP - use reset value for xcr0 */
> ghcb_set_xcr0(ghcb, 1);
>
> + if (has_cpuflag(X86_FEATURE_SHSTK) && regs->ax == 0xd && regs->cx <= 1) {
> + unsigned long lo, hi;
> + u64 xss;
> +
> + /*
> + * Since vc_handle_cpuid may be used during early boot, the
> + * rdmsr wrappers are incompatible and should not be used.
> + * Invoke the instruction directly.
> + */
> + asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> + : "c" (MSR_IA32_XSS));

Does __rdmsr() work too?

I know it has exception handling but a SEV-ES guest should not fault
when accessing MSR_IA32_XSS anyway, especially if it has shadow stack
enabled. And if it does fault, your version would explode too but
__rdmsr() would be at least less code. :)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-10-14 00:32:03

by Sean Christopherson

Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Tue, Oct 10, 2023, John Allen wrote:
> When a guest issues a cpuid instruction for Fn0000000D_x0B
> (CetUserOffset), KVM will intercept and need to access the guest
> MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> included in the GHCB to be visible to the hypervisor.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> }
> +
> + if (kvm_caps.supported_xss)
> + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

This creates a giant gaping virtualization hole for the guest to walk through.
Want to hide shadow stacks from the guest because hardware is broken? Too bad,
the guest can still set any and all XFeature bits it wants! I realize "KVM"
already creates such a hole by disabling interception of XSETBV, but that doesn't
make it right. In quotes because it's not like KVM has a choice for SEV-ES guests.

This is another example of SEV-SNP and beyond clearly being designed to work with
a paravisor, SVM_VMGEXIT_AP_CREATE being the other big one that comes to mind.

KVM should at least block CET by killing the VM if the guest illegally sets
CR4.CET. Ugh, and is that even possible with SVM_VMGEXIT_AP_CREATE? The guest
can shove in whatever CR4 it wants, so long as the value is supported by
hardware.

At what point do we bite the bullet and require a paravisor? Because the more I
see of SNP, the more I'm convinced that it's not entirely safe to run untrusted
guest code at VMPL0.

2023-10-17 18:12:52

by John Allen

Subject: Re: [PATCH 7/9] x86/sev-es: Include XSS value in GHCB CPUID request

On Thu, Oct 12, 2023 at 02:59:24PM +0200, Borislav Petkov wrote:
> On Tue, Oct 10, 2023 at 08:02:18PM +0000, John Allen wrote:
> > When a guest issues a cpuid instruction for Fn0000000D_x0B (CetUserOffset), the
> > hypervisor may intercept and access the guest XSS value. For SEV-ES, this is
> > encrypted and needs to be included in the GHCB to be visible to the hypervisor.
> > The rdmsr instruction needs to be called directly as the code may be used in
> > early boot in which case the rdmsr wrappers should be avoided as they are
> > incompatible with the decompression boot phase.
> >
> > Signed-off-by: John Allen <[email protected]>
> > ---
> > arch/x86/kernel/sev-shared.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 2eabccde94fb..e38a1d049bc1 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -890,6 +890,21 @@ static enum es_result vc_handle_cpuid(struct ghcb *ghcb,
> > /* xgetbv will cause #GP - use reset value for xcr0 */
> > ghcb_set_xcr0(ghcb, 1);
> >
> > + if (has_cpuflag(X86_FEATURE_SHSTK) && regs->ax == 0xd && regs->cx <= 1) {
> > + unsigned long lo, hi;
> > + u64 xss;
> > +
> > + /*
> > + * Since vc_handle_cpuid may be used during early boot, the
> > + * rdmsr wrappers are incompatible and should not be used.
> > + * Invoke the instruction directly.
> > + */
> > + asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
> > + : "c" (MSR_IA32_XSS));
>
> Does __rdmsr() work too?
>
> I know it has exception handling but a SEV-ES guest should not fault
> when accessing MSR_IA32_XSS anyway, especially if it has shadow stack
> enabled. And if it does fault, your version would explode too but
> __rdmsr() would be at least less code. :)

I looked into using __rdmsr in an earlier revision of the patch, but
found that it causes a build warning:

ld: warning: orphan section `__ex_table' from `arch/x86/boot/compressed/sev.o' being placed in section `__ex_table'

This is due to the __ex_table section not being used during
decompression boot. Do you know of a way around this?

Thanks,
John

2023-10-17 18:17:50

by John Allen

Subject: Re: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

On Thu, Oct 12, 2023 at 02:31:19PM +0530, Nikunj A. Dadhania wrote:
> On 10/11/2023 1:32 AM, John Allen wrote:
> > If kvm supports shadow stack, pass through shadow stack MSRs to improve
> > guest performance.
> >
> > Signed-off-by: John Allen <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
> > arch/x86/kvm/svm/svm.h | 2 +-
> > 2 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index e435e4fbadda..984e89d7a734 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -139,6 +139,13 @@ static const struct svm_direct_access_msrs {
> > { .index = X2APIC_MSR(APIC_TMICT), .always = false },
> > { .index = X2APIC_MSR(APIC_TMCCT), .always = false },
> > { .index = X2APIC_MSR(APIC_TDCR), .always = false },
> > + { .index = MSR_IA32_U_CET, .always = false },
> > + { .index = MSR_IA32_S_CET, .always = false },
> > + { .index = MSR_IA32_INT_SSP_TAB, .always = false },
> > + { .index = MSR_IA32_PL0_SSP, .always = false },
> > + { .index = MSR_IA32_PL1_SSP, .always = false },
> > + { .index = MSR_IA32_PL2_SSP, .always = false },
> > + { .index = MSR_IA32_PL3_SSP, .always = false },
>
> First three MSRs are emulated in the patch 1, any specific reason for skipping MSR_IA32_PL[0-3]_SSP ?

I'm not sure what you mean. The PLx_SSP MSRS should be getting passed
through here unless I'm misunderstanding something.

Thanks,
John

2023-10-17 18:51:01

by Borislav Petkov

Subject: Re: [PATCH 7/9] x86/sev-es: Include XSS value in GHCB CPUID request

On Tue, Oct 17, 2023 at 01:12:30PM -0500, John Allen wrote:
> I looked into using __rdmsr in an earlier revision of the patch, but
> found that it causes a build warning:
>
> ld: warning: orphan section `__ex_table' from `arch/x86/boot/compressed/sev.o' being placed in section `__ex_table'
>
> This is due to the __ex_table section not being used during
> decompression boot. Do you know of a way around this?

Yeah, arch/x86/boot/msr.h.

We did those a while ago. I guess they could be moved to
asm/shared/msr.h and renamed to something that is not a
"boot_" prefix.

Something like

rdmsr_without_any_exception_handling_and_other_gunk_don_t_you_even_think_of_adding()

...

But srsly:

raw_rdmsr()
raw_wrmsr()

should be good, as tglx suggests offlist.

You can do that in one patch and prepend your set with it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-10-18 11:27:43

by Nikunj A. Dadhania

Subject: Re: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

On 10/17/2023 11:47 PM, John Allen wrote:
> On Thu, Oct 12, 2023 at 02:31:19PM +0530, Nikunj A. Dadhania wrote:
>> On 10/11/2023 1:32 AM, John Allen wrote:
>>> If kvm supports shadow stack, pass through shadow stack MSRs to improve
>>> guest performance.
>>>
>>> Signed-off-by: John Allen <[email protected]>
>>> ---
>>> arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
>>> arch/x86/kvm/svm/svm.h | 2 +-
>>> 2 files changed, 27 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index e435e4fbadda..984e89d7a734 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -139,6 +139,13 @@ static const struct svm_direct_access_msrs {
>>> { .index = X2APIC_MSR(APIC_TMICT), .always = false },
>>> { .index = X2APIC_MSR(APIC_TMCCT), .always = false },
>>> { .index = X2APIC_MSR(APIC_TDCR), .always = false },
>>> + { .index = MSR_IA32_U_CET, .always = false },
>>> + { .index = MSR_IA32_S_CET, .always = false },
>>> + { .index = MSR_IA32_INT_SSP_TAB, .always = false },
>>> + { .index = MSR_IA32_PL0_SSP, .always = false },
>>> + { .index = MSR_IA32_PL1_SSP, .always = false },
>>> + { .index = MSR_IA32_PL2_SSP, .always = false },
>>> + { .index = MSR_IA32_PL3_SSP, .always = false },
>>
>> First three MSRs are emulated in the patch 1, any specific reason for skipping MSR_IA32_PL[0-3]_SSP ?
>
> I'm not sure what you mean.

MSR_IA32_U_CET, MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB are selectively emulated and there is no good explanation why MSR_IA32_PL[0-3]_SSP do not need emulation. Moreover, MSR interception is initially set (i.e. always=false) for MSR_IA32_PL[0-3]_SSP.

> The PLx_SSP MSRS should be getting passed
> through here unless I'm misunderstanding something.

In that case, intercept should be cleared from the very beginning.

+ { .index = MSR_IA32_PL0_SSP, .always = true },
+ { .index = MSR_IA32_PL1_SSP, .always = true },
+ { .index = MSR_IA32_PL2_SSP, .always = true },
+ { .index = MSR_IA32_PL3_SSP, .always = true },

Regards
Nikunj


2023-11-02 18:10:51

by Maxim Levitsky

Subject: Re: [PATCH 4/9] KVM: SVM: Rename vmplX_ssp -> plX_ssp

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> Rename SEV-ES save area SSP fields to be consistent with the APM.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/include/asm/svm.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 19bf955b67e0..568d97084e44 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -361,10 +361,10 @@ struct sev_es_save_area {
> struct vmcb_seg ldtr;
> struct vmcb_seg idtr;
> struct vmcb_seg tr;
> - u64 vmpl0_ssp;
> - u64 vmpl1_ssp;
> - u64 vmpl2_ssp;
> - u64 vmpl3_ssp;
> + u64 pl0_ssp;
> + u64 pl1_ssp;
> + u64 pl2_ssp;
> + u64 pl3_ssp;
> u64 u_cet;
> u8 reserved_0xc8[2];
> u8 vmpl;

Matches the APM.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-11-02 18:11:12

by Maxim Levitsky

Subject: Re: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

On Wed, 2023-10-18 at 16:57 +0530, Nikunj A. Dadhania wrote:
> On 10/17/2023 11:47 PM, John Allen wrote:
> > On Thu, Oct 12, 2023 at 02:31:19PM +0530, Nikunj A. Dadhania wrote:
> > > On 10/11/2023 1:32 AM, John Allen wrote:
> > > > If kvm supports shadow stack, pass through shadow stack MSRs to improve
> > > > guest performance.
> > > >
> > > > Signed-off-by: John Allen <[email protected]>
> > > > ---
> > > > arch/x86/kvm/svm/svm.c | 26 ++++++++++++++++++++++++++
> > > > arch/x86/kvm/svm/svm.h | 2 +-
> > > > 2 files changed, 27 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index e435e4fbadda..984e89d7a734 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -139,6 +139,13 @@ static const struct svm_direct_access_msrs {
> > > > { .index = X2APIC_MSR(APIC_TMICT), .always = false },
> > > > { .index = X2APIC_MSR(APIC_TMCCT), .always = false },
> > > > { .index = X2APIC_MSR(APIC_TDCR), .always = false },
> > > > + { .index = MSR_IA32_U_CET, .always = false },
> > > > + { .index = MSR_IA32_S_CET, .always = false },
> > > > + { .index = MSR_IA32_INT_SSP_TAB, .always = false },
> > > > + { .index = MSR_IA32_PL0_SSP, .always = false },
> > > > + { .index = MSR_IA32_PL1_SSP, .always = false },
> > > > + { .index = MSR_IA32_PL2_SSP, .always = false },
> > > > + { .index = MSR_IA32_PL3_SSP, .always = false },
> > >
> > > First three MSRs are emulated in the patch 1, any specific reason for skipping MSR_IA32_PL[0-3]_SSP ?
> >
> > I'm not sure what you mean.
>
> MSR_IA32_U_CET, MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB are selectively emulated and there is no good explanation why MSR_IA32_PL[0-3]_SSP do not need emulation. Moreover, MSR interception is initially set (i.e. always=false) for MSR_IA32_PL[0-3]_SSP.
>

Let me explain:

Passing through an MSR and having code for reading/writing it in svm_set_msr/svm_get_msr are two different things:

Passing through an MSR means that the guest can read/write the MSR freely (assuming this can't cause harm to the host),
but either KVM or the hardware usually swaps the guest MSR value with the host MSR value on VM entry/exit.

Therefore the function of svm_set_msr/svm_get_msr is to obtain the saved guest MSR value while the guest is not running, for various use cases (for example, migration).

In the case of MSR_IA32_U_CET, MSR_IA32_S_CET and MSR_IA32_INT_SSP_TAB, the hardware itself loads the guest values from the VMCB on VM entry and saves the modified guest values
to the VMCB on VM exit, thus the correct way of reading/writing the guest value is to read/write the corresponding VMCB field.


Now, why are the other CET MSRs not handled by svm_set_msr/svm_get_msr?
The answer is that those MSRs are not saved/restored by the SVM microcode; instead, their guest values remain
loaded in hardware across the VM exit.

The common CET code, which I finally finished reviewing last night, context switches them together with the rest of the guest FPU state using the XSAVES/XRSTORS instructions,
and if KVM wants to read these MSRs, the common code will first load the guest FPU state and then read/write the MSRs using regular RDMSR/WRMSR.
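To make the distinction concrete, here is a rough user-space sketch of the two paths described above. All structure and function names are simplified stand-ins for illustration, not the real KVM code; only the MSR numbers are the architectural CET MSR indices.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_S_CET       0x6a2  /* vmcb-backed (swapped by hardware) */
#define MSR_IA32_INT_SSP_TAB 0x6a8  /* vmcb-backed (swapped by hardware) */
#define MSR_IA32_PL3_SSP     0x6a7  /* XSAVES-managed guest FPU state    */

struct vmcb_save_area { uint64_t s_cet, isst_addr; };
struct guest_fpu      { uint64_t pl3_ssp; };  /* part of the XSAVES image */

struct vcpu {
    struct vmcb_save_area vmcb;
    struct guest_fpu fpu;
    int guest_fpu_loaded;
};

/*
 * Reading a guest CET MSR while the guest is not running (e.g. for
 * migration): vmcb-backed MSRs come straight from the save area, while
 * XSAVES-managed MSRs require loading the guest FPU state first and
 * then a plain RDMSR (modeled here by reading the saved FPU image).
 */
static uint64_t get_guest_msr(struct vcpu *v, uint32_t msr)
{
    switch (msr) {
    case MSR_IA32_S_CET:       return v->vmcb.s_cet;
    case MSR_IA32_INT_SSP_TAB: return v->vmcb.isst_addr;
    case MSR_IA32_PL3_SSP:
        v->guest_fpu_loaded = 1;  /* kvm_load_guest_fpu() equivalent */
        return v->fpu.pl3_ssp;    /* then rdmsr(MSR_IA32_PL3_SSP)    */
    default:
        return 0;
    }
}
```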


> > The PLx_SSP MSRS should be getting passed
> > through here unless I'm misunderstanding something.
>
> In that case, intercept should be cleared from the very beginning.
>
> + { .index = MSR_IA32_PL0_SSP, .always = true },
> + { .index = MSR_IA32_PL1_SSP, .always = true },
> + { .index = MSR_IA32_PL2_SSP, .always = true },
> + { .index = MSR_IA32_PL3_SSP, .always = true },

.always is only true when an MSR is *always* passed through. CET MSRs are only passed through when CET is supported.

Therefore I don't expect that we will ever add another MSR to this list which has .always = true.

In fact, .always = true for x86_64 arch MSRs like MSR_GS_BASE/MSR_FS_BASE is not 100% correct either -
when we start a VM which doesn't have the CPUID bit X86_FEATURE_LM, these MSRs should not exist, and I think we have a
kvm-unit-test that fails because of this on 32 bit, but I haven't bothered to fix it yet.

.always probably needs to be dropped completely.
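As background on the interception machinery being discussed: the SVM MSR permission map dedicates two consecutive bits per MSR (read intercept, then write intercept) across three 2 KiB regions covering the architectural MSR ranges. Here is a rough stand-alone model of the lookup, based on the APM's description of the layout rather than KVM's actual msrpm_offsets implementation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the SVM MSR permission map (MSRPM) lookup: each
 * MSR has two consecutive bits (read intercept, write intercept), so
 * 4 MSRs fit in a byte, and three 2 KiB regions cover the architectural
 * MSR ranges. Illustrative only; KVM uses precomputed offsets instead.
 */
static int msrpm_bit(uint32_t msr, unsigned *byte_offset, unsigned *bit)
{
    uint32_t base, region;

    if (msr <= 0x1fff) {
        base = 0; region = 0;
    } else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
        base = 0xc0000000; region = 1;
    } else if (msr >= 0xc0010000 && msr <= 0xc0011fff) {
        base = 0xc0010000; region = 2;
    } else {
        return -1;  /* not covered: access is always intercepted */
    }

    uint32_t idx = msr - base;
    *byte_offset = region * 0x800 + idx / 4;  /* 4 MSRs per byte        */
    *bit = (idx % 4) * 2;                     /* read bit; write = bit+1 */
    return 0;
}
```

Clearing an MSR's pair of bits in this bitmap is what "passing through" means at the hardware level, independent of the .always flag, which only controls whether the bits are cleared unconditionally at vCPU setup.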


So IMHO this patch is correct (I might have missed something though):

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

>
> Regards
> Nikunj
>
>


2023-11-02 18:11:25

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 8/9] KVM: SVM: Use KVM-governed features to track SHSTK

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> Use the KVM-governed features framework to track whether SHSTK can be used
> by both userspace and guest for SVM.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ee7c7d0a09ab..00a8cef3cbb8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4366,6 +4366,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
> kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
> + kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_SHSTK);
>
> svm_recalc_instruction_intercepts(vcpu, svm);
>

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-11-02 18:11:48

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 9/9] KVM: SVM: Add CET features to supported_xss

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> If the CPU supports CET, add CET XSAVES feature bits to the
> supported_xss mask.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 00a8cef3cbb8..f63b2bbac542 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5151,6 +5151,10 @@ static __init void svm_set_cpu_caps(void)
> boot_cpu_has(X86_FEATURE_AMD_SSBD))
> kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
>
> + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> + kvm_caps.supported_xss |= XFEATURE_MASK_CET_USER |
> + XFEATURE_MASK_CET_KERNEL;
> +
> if (enable_pmu) {
> /*
> * Enumerate support for PERFCTR_CORE if and only if KVM has

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-11-02 18:13:17

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 5/9] KVM: SVM: Save shadow stack host state on VMRUN

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> When running as an SEV-ES guest, the PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP,
> and U_CET fields in the VMCB save area are type B, meaning the host
> state is automatically loaded on a VMEXIT, but is not saved on a VMRUN.
> The other shadow stack MSRs, S_CET, SSP, and ISST_ADDR are type A,
> meaning they are loaded on VMEXIT and saved on VMRUN. PL0_SSP, PL1_SSP,
> and PL2_SSP are currently unused. Manually save the other type B host
> MSR values before VMRUN.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b9a0a939d59f..bb4b18baa6f7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3098,6 +3098,15 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> }
> +
> + if (boot_cpu_has(X86_FEATURE_SHSTK)) {
> + /*
> + * MSR_IA32_U_CET and MSR_IA32_PL3_SSP are restored on VMEXIT,
> + * save the current host values.
> + */
> + rdmsrl(MSR_IA32_U_CET, hostsa->u_cet);
> + rdmsrl(MSR_IA32_PL3_SSP, hostsa->pl3_ssp);
> + }
> }


Do we actually need this patch?

Host FPU state is not encrypted, and as far as I understood from the common CET patch series,
these MSRs will be restored on return to userspace.

Best regards,
Maxim Levitsky


PS: AMD's APM is silent on how S_CET, SSP, and ISST_ADDR are saved/restored for non-encrypted guests.
Are they also type A?

Can the VMSA table be trusted in general to provide the same swap type as for non-encrypted guests?


>
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)


2023-11-02 18:13:28

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> When a guest issues a cpuid instruction for Fn0000000D_x0B
> (CetUserOffset), KVM will intercept and need to access the guest
> MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> included in the GHCB to be visible to the hypervisor.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/include/asm/svm.h | 1 +
> arch/x86/kvm/svm/sev.c | 12 ++++++++++--
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/svm/svm.h | 3 ++-
> 4 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 568d97084e44..5afc9e03379d 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -678,5 +678,6 @@ DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
> DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
> DEFINE_GHCB_ACCESSORS(sw_scratch)
> DEFINE_GHCB_ACCESSORS(xcr0)
> +DEFINE_GHCB_ACCESSORS(xss)

I don't see anywhere in the patch adding xss to ghcb_save_area.
What kernel version/commit these patches are based on?

>
> #endif
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index bb4b18baa6f7..94ab7203525f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2445,8 +2445,13 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>
> svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
>
> - if (kvm_ghcb_xcr0_is_valid(svm)) {
> - vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> + if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
> + if (kvm_ghcb_xcr0_is_valid(svm))
> + vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> +
> + if (kvm_ghcb_xss_is_valid(svm))
> + vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
> +
> kvm_update_cpuid_runtime(vcpu);
> }
>
> @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> }
> +
> + if (kvm_caps.supported_xss)
> + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
to whatever value it wants, and thus it might allow XSAVES to access some host MSRs
that the guest must not be able to access.

AMD might not yet have such msrs, but on Intel side I do see various components
like 'HDC State', 'HWP state' and such.

I understand that this is needed so that the #VC handler can read this MSR, and trying
to read it directly will cause another #VC, which is probably not allowed (I don't know this detail of SEV-ES).

I guess the #VC handler should use a kernel-cached value of this MSR instead, or at least
KVM should only allow reads, not writes, to it.

In addition to that, if we decide to open read access to IA32_XSS to the guest,
this IMHO should be done in a separate patch.

Best regards,
Maxim Levitsky


> }
>
> void sev_init_vmcb(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 984e89d7a734..ee7c7d0a09ab 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -146,6 +146,7 @@ static const struct svm_direct_access_msrs {
> { .index = MSR_IA32_PL1_SSP, .always = false },
> { .index = MSR_IA32_PL2_SSP, .always = false },
> { .index = MSR_IA32_PL3_SSP, .always = false },
> + { .index = MSR_IA32_XSS, .always = false },
> { .index = MSR_INVALID, .always = false },
> };
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index bdc39003b955..2011456d2e9f 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -30,7 +30,7 @@
> #define IOPM_SIZE PAGE_SIZE * 3
> #define MSRPM_SIZE PAGE_SIZE * 2
>
> -#define MAX_DIRECT_ACCESS_MSRS 53
> +#define MAX_DIRECT_ACCESS_MSRS 54
> #define MSRPM_OFFSETS 32
> extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> extern bool npt_enabled;
> @@ -720,5 +720,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
> DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
> DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
> DEFINE_KVM_GHCB_ACCESSORS(xcr0)
> +DEFINE_KVM_GHCB_ACCESSORS(xss)
>
> #endif




2023-11-02 18:15:56

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 1/9] KVM: x86: SVM: Emulate reads and writes to shadow stack MSRs

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> Set up interception of shadow stack MSRs. In the event that shadow stack
> is unsupported on the host or the MSRs are otherwise inaccessible, the
> interception code will return an error. In certain circumstances such as
> host initiated MSR reads or writes, the interception code will get or
> set the requested MSR value.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f283eb47f6ac..6a0d225311bc 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2859,6 +2859,15 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (guest_cpuid_is_intel(vcpu))
> msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
> break;
> + case MSR_IA32_S_CET:
> + msr_info->data = svm->vmcb->save.s_cet;
> + break;
> + case MSR_IA32_INT_SSP_TAB:
> + msr_info->data = svm->vmcb->save.isst_addr;
> + break;
> + case MSR_KVM_SSP:
> + msr_info->data = svm->vmcb->save.ssp;
> + break;
> case MSR_TSC_AUX:
> msr_info->data = svm->tsc_aux;
> break;
> @@ -3085,6 +3094,15 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
> svm->sysenter_esp_hi = guest_cpuid_is_intel(vcpu) ? (data >> 32) : 0;
> break;
> + case MSR_IA32_S_CET:
> + svm->vmcb->save.s_cet = data;
> + break;
> + case MSR_IA32_INT_SSP_TAB:
> + svm->vmcb->save.isst_addr = data;
> + break;
> + case MSR_KVM_SSP:
> + svm->vmcb->save.ssp = data;
> + break;
> case MSR_TSC_AUX:
> /*
> * TSC_AUX is usually changed only during boot and never read

Looks good, except that if my complaint about turning the fake 'MSR_KVM_SSP' into
a first-class register with an ioctl to load/save it is accepted, then
there will be a new vendor callback to read/write it instead of doing it in svm_get_msr/svm_set_msr.


Besides this,

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-11-02 18:16:05

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2/9] KVM: x86: SVM: Update dump_vmcb with shadow stack save area additions

On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> Add shadow stack VMCB save area fields to dump_vmcb. Only include S_CET,
> SSP, and ISST_ADDR. Since there currently isn't support to decrypt and
> dump the SEV-ES save area, exclude PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP, and
> U_CET, which are only included in the SEV-ES save area.
>
> Signed-off-by: John Allen <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6a0d225311bc..e435e4fbadda 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3416,6 +3416,10 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
> "rip:", save->rip, "rflags:", save->rflags);
> pr_err("%-15s %016llx %-13s %016llx\n",
> "rsp:", save->rsp, "rax:", save->rax);
> + pr_err("%-15s %016llx %-13s %016llx\n",
> + "s_cet:", save->s_cet, "ssp:", save->ssp);
> + pr_err("%-15s %016llx\n",
> + "isst_addr:", save->isst_addr);
> pr_err("%-15s %016llx %-13s %016llx\n",
> "star:", save01->star, "lstar:", save01->lstar);
> pr_err("%-15s %016llx %-13s %016llx\n",

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-11-02 18:17:13

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 7/9] x86/sev-es: Include XSS value in GHCB CPUID request

On Tue, 2023-10-17 at 20:49 +0200, Borislav Petkov wrote:
> On Tue, Oct 17, 2023 at 01:12:30PM -0500, John Allen wrote:
> > I looked into using __rdmsr in an earlier revision of the patch, but
> > found that it causes a build warning:
> >
> > ld: warning: orphan section `__ex_table' from `arch/x86/boot/compressed/sev.o' being placed in section `__ex_table'
> >
> > This is due to the __ex_table section not being used during
> > decompression boot. Do you know of a way around this?
>
> Yeah, arch/x86/boot/msr.h.
>
> We did those a while ago. I guess they could be moved to
> asm/shared/msr.h and renamed to something that is not a
> "boot_" prefix.
>
> Something like
>
> rdmsr_without_any_exception_handling_and_other_gunk_don_t_you_even_think_of_adding()
>
> ...
>
> But srsly:
>
> raw_rdmsr()
> raw_wrmsr()
>
> should be good, as tglx suggests offlist.
>
> You can do that in one patch and prepend your set with it.
>
> Thx.
>


Assuming that we actually allow the guest to read IA32_XSS, this is correct,
but otherwise we will need to return a cached value of IA32_XSS -
the same value the guest wrote last time.

Or another option: KVM can cache the last value that the guest wrote to IA32_XSS (I assume that
the guest can write MSRs by sharing the address and value via the GHCB), and then use it
when it constructs the CPUID.
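The caching option suggested above could be sketched roughly as follows. These are hypothetical names and a conceptual sketch of the idea only, not actual KVM code:

```c
#include <assert.h>
#include <stdint.h>

/* XSS bit for CET user state (XSAVES component 11). */
#define XFEATURE_MASK_CET_USER (1ull << 11)

/*
 * Hypothetical sketch: remember the last IA32_XSS value the guest
 * reported (e.g. via a GHCB MSR-write exit) and consult that cached
 * value when building the CPUID 0xD sub-leaf responses, instead of
 * trusting whatever the guest happens to put in the GHCB later.
 */
struct vcpu_xss_cache { uint64_t ia32_xss; };

static void cache_guest_xss_write(struct vcpu_xss_cache *c, uint64_t val)
{
    c->ia32_xss = val;  /* record the guest's last write */
}

/*
 * When intercepting CPUID Fn0000000D_x0B (CetUserOffset), report the
 * sub-leaf as enabled only if CET_USER is set in the cached XSS.
 */
static int cet_user_enabled(const struct vcpu_xss_cache *c)
{
    return (c->ia32_xss & XFEATURE_MASK_CET_USER) != 0;
}
```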

Best regards,
Maxim Levitsky



2023-11-02 23:30:04

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > }
> > +
> > + if (kvm_caps.supported_xss)
> > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
>
> This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> that guest must not be able to access.
>
> AMD might not yet have such msrs, but on Intel side I do see various components
> like 'HDC State', 'HWP state' and such.

The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
that the guest can access. So, in theory, if/when AMD adds more XCR0/XSS-based
features, that state will also be context switched.

Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
just horrific.

> I understand that this is needed so that #VC handler could read this msr, and
> trying to read it will cause another #VC which is probably not allowed (I
> don't know this detail of SEV-ES)
>
> I guess #VC handler should instead use a kernel cached value of this msr
> instead, or at least KVM should only allow reads and not writes to it.

Nope, doesn't work. In addition to automatically context switching state, SEV-ES
also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
for the guest, because KVM *can't* load the guest's desired value into hardware.

The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
or XCR0, and there's not a damn thing KVM can do to service the request.

2023-11-06 16:46:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 3/9] KVM: x86: SVM: Pass through shadow stack MSRs

On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> On Wed, 2023-10-18 at 16:57 +0530, Nikunj A. Dadhania wrote:
> > On 10/17/2023 11:47 PM, John Allen wrote:
> > In that case, intercept should be cleared from the very beginning.
> >
> > + { .index = MSR_IA32_PL0_SSP, .always = true },
> > + { .index = MSR_IA32_PL1_SSP, .always = true },
> > + { .index = MSR_IA32_PL2_SSP, .always = true },
> > + { .index = MSR_IA32_PL3_SSP, .always = true },
>
> .always is only true when a MSR is *always* passed through. CET msrs are only
> passed through when CET is supported.
>
> Therefore I don't expect that we ever add another msr to this list which has
> .always = true.
>
> In fact the .always = True for X86_64 arch msrs like MSR_GS_BASE/MSR_FS_BASE
> and such is not 100% correct too - when we start a VM which doesn't have
> cpuid bit X86_FEATURE_LM, these msrs should not exist and I think that we
> have a kvm unit test that fails because of this on 32 bit but I didn't bother
> yet to fix it.
>
> .always probably needs to be dropped completely.

FWIW, I have a half-baked series to clean up SVM's MSR interception code and
converge the SVM and VMX APIs. E.g. set_msr_interception_bitmap()'s inverted
polarity confuses me every time I look at its usage.

I can hunt down the branch if someone plans on tackling this code.

2024-02-15 17:39:53

by John Allen

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > > if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > > svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > > }
> > > > +
> > > > + if (kvm_caps.supported_xss)
> > > > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > >
> > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > that guest must not be able to access.
> > >
> > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > like 'HDC State', 'HWP state' and such.
> >
> > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > that the guest can access. So, in theory, if/when AMD adds more XCR0/XSS-based
> > features, that state will also be context switched.
> >
> > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > just horrific.
> >
> > > I understand that this is needed so that #VC handler could read this msr, and
> > > trying to read it will cause another #VC which is probably not allowed (I
> > > don't know this detail of SEV-ES)
> > >
> > > I guess #VC handler should instead use a kernel cached value of this msr
> > > instead, or at least KVM should only allow reads and not writes to it.
> >
> > Nope, doesn't work. In addition to automatically context switching state, SEV-ES
> > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > for the guest, because KVM *can't* load the guest's desired value into hardware.
> >
> > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > or XCR0, and there's not a damn thing KVM can do to service the request.
> >
>
> Ah, I understand now. Everything makes sense, and yes, this is really ugly.

Hi Maxim and Sean,

It looks as though there are some broad changes that will need to happen
over the long term WRT SEV-ES/SEV-SNP. In the short term, how would
you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
changes for now with an additional patch that disallows guest shstk when
SEV-ES is enabled? Subsequently, when we have a proper solution for the
concerns discussed here, we could submit another series for SEV-ES
support.

Thanks,
John

2024-02-20 16:23:41

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Thu, Feb 15, 2024, John Allen wrote:
> On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> > On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > > > if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > > > svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > > > }
> > > > > +
> > > > > + if (kvm_caps.supported_xss)
> > > > > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > > >
> > > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > > that guest must not be able to access.
> > > >
> > > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > > like 'HDC State', 'HWP state' and such.
> > >
> > > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > > that the guest can access. So, in theory, if/when AMD adds more XCR0/XSS-based
> > > features, that state will also be context switched.
> > >
> > > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > > just horrific.
> > >
> > > > I understand that this is needed so that #VC handler could read this msr, and
> > > > trying to read it will cause another #VC which is probably not allowed (I
> > > > don't know this detail of SEV-ES)
> > > >
> > > > I guess #VC handler should instead use a kernel cached value of this msr
> > > > instead, or at least KVM should only allow reads and not writes to it.
> > >
> > > Nope, doesn't work. In addition to automatically context switching state, SEV-ES
> > > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > > for the guest, because KVM *can't* load the guest's desired value into hardware.
> > >
> > > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > > or XCR0, and there's not a damn thing KVM can do to service the request.
> > >
> >
> > Ah, I understand now. Everything makes sense, and yes, this is really ugly.
>
> Hi Maxim and Sean,
>
> It looks as though there are some broad changes that will need to happen
> over the long term WRT to SEV-ES/SEV-SNP. In the short term, how would
> you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
> changes for now with an additional patch that disallows guest shstk when
> SEV-ES is enabled? Subsequently, when we have a proper solution for the
> concerns discussed here, we could submit another series for SEV-ES
> support.

The SEV-ES mess was already addressed by commit a26b7cd22546 ("KVM: SEV: Do not
intercept accesses to MSR_IA32_XSS for SEV-ES guests"). Or is there more that's
needed for shadow stacks?

2024-02-20 16:41:45

by John Allen

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Tue, Feb 20, 2024 at 08:20:36AM -0800, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, John Allen wrote:
> > On Tue, Nov 07, 2023 at 08:20:52PM +0200, Maxim Levitsky wrote:
> > > On Thu, 2023-11-02 at 16:22 -0700, Sean Christopherson wrote:
> > > > On Thu, Nov 02, 2023, Maxim Levitsky wrote:
> > > > > On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > > > > > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > > > > > if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > > > > > svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > > > > > }
> > > > > > +
> > > > > > + if (kvm_caps.supported_xss)
> > > > > > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
> > > > >
> > > > > This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> > > > > to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> > > > > that guest must not be able to access.
> > > > >
> > > > > AMD might not yet have such msrs, but on Intel side I do see various components
> > > > > like 'HDC State', 'HWP state' and such.
> > > >
> > > > The approach AMD has taken with SEV-ES+ is to have ucode context switch everything
> > > > that the guest can access. So, in theory, if/when AMD adds more XCR0/XSS-based
> > > > features, that state will also be context switched.
> > > >
> > > > Don't get me wrong, I hate this with a passion, but it's not *quite* fatally unsafe,
> > > > just horrific.
> > > >
> > > > > I understand that this is needed so that #VC handler could read this msr, and
> > > > > trying to read it will cause another #VC which is probably not allowed (I
> > > > > don't know this detail of SEV-ES)
> > > > >
> > > > > I guess #VC handler should instead use a kernel cached value of this msr
> > > > > instead, or at least KVM should only allow reads and not writes to it.
> > > >
> > > > Nope, doesn't work. In addition to automatically context switching state, SEV-ES
> > > > also encrypts the guest state, i.e. KVM *can't* correctly virtualize XSS (or XCR0)
> > > > for the guest, because KVM *can't* load the guest's desired value into hardware.
> > > >
> > > > The guest can do #VMGEXIT (a.k.a. VMMCALL) all it wants to request a certain XSS
> > > > or XCR0, and there's not a damn thing KVM can do to service the request.
> > > >
> > >
> > > Ah, I understand now. Everything makes sense, and yes, this is really ugly.
> >
> > Hi Maxim and Sean,
> >
> > It looks as though there are some broad changes that will need to happen
> > over the long term WRT to SEV-ES/SEV-SNP. In the short term, how would
> > you suggest I proceed with the SVM shstk series? Can we omit the SEV-ES
> > changes for now with an additional patch that disallows guest shstk when
> > SEV-ES is enabled? Subsequently, when we have a proper solution for the
> > concerns discussed here, we could submit another series for SEV-ES
> > support.
>
> The SEV-ES mess was already addressed by commit a26b7cd22546 ("KVM: SEV: Do not
> intercept accesses to MSR_IA32_XSS for SEV-ES guests"). Or is there more that's
> needed for shadow stacks?

Ah, yes, you are right. That patch should address the controversial
change discussed above at least. Patches 5/9 and 7/9 of this series also
address different SEV-ES issues and will still need to be included.

Thanks,
John

2024-02-21 16:38:38

by John Allen

[permalink] [raw]
Subject: Re: [PATCH 6/9] KVM: SVM: Add MSR_IA32_XSS to the GHCB for hypervisor kernel

On Thu, Nov 02, 2023 at 08:10:58PM +0200, Maxim Levitsky wrote:
> On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > When a guest issues a cpuid instruction for Fn0000000D_x0B
> > (CetUserOffset), KVM will intercept and need to access the guest
> > MSR_IA32_XSS value. For SEV-ES, this is encrypted and needs to be
> > included in the GHCB to be visible to the hypervisor.
> >
> > Signed-off-by: John Allen <[email protected]>
> > ---
> > arch/x86/include/asm/svm.h | 1 +
> > arch/x86/kvm/svm/sev.c | 12 ++++++++++--
> > arch/x86/kvm/svm/svm.c | 1 +
> > arch/x86/kvm/svm/svm.h | 3 ++-
> > 4 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> > index 568d97084e44..5afc9e03379d 100644
> > --- a/arch/x86/include/asm/svm.h
> > +++ b/arch/x86/include/asm/svm.h
> > @@ -678,5 +678,6 @@ DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
> > DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
> > DEFINE_GHCB_ACCESSORS(sw_scratch)
> > DEFINE_GHCB_ACCESSORS(xcr0)
> > +DEFINE_GHCB_ACCESSORS(xss)
>
> I don't see anywhere in the patch adding xss to ghcb_save_area.
> What kernel version/commit these patches are based on?

Looks like it first got added to the vmcb save area here:
861377730aa9db4cbaa0f3bd3f4d295c152732c4

It looks like it was included in the GHCB save area when that was
created:
a4690359eaec985a1351786da887df1ba92440a0

Unless I'm misunderstanding the ask. Is there somewhere else this needs
to be added?

Thanks,
John

>
> >
> > #endif
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index bb4b18baa6f7..94ab7203525f 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -2445,8 +2445,13 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >
> > svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
> >
> > - if (kvm_ghcb_xcr0_is_valid(svm)) {
> > - vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> > + if (kvm_ghcb_xcr0_is_valid(svm) || kvm_ghcb_xss_is_valid(svm)) {
> > + if (kvm_ghcb_xcr0_is_valid(svm))
> > + vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> > +
> > + if (kvm_ghcb_xss_is_valid(svm))
> > + vcpu->arch.ia32_xss = ghcb_get_xss(ghcb);
> > +
> > kvm_update_cpuid_runtime(vcpu);
> > }
> >
> > @@ -3032,6 +3037,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> > if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
> > svm_clr_intercept(svm, INTERCEPT_RDTSCP);
> > }
> > +
> > + if (kvm_caps.supported_xss)
> > + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
>
> This is not just a virtualization hole. This allows the guest to set MSR_IA32_XSS
> to whatever value it wants, and thus it might allow XSAVES to access some host msrs
> that guest must not be able to access.
>
> AMD might not yet have such msrs, but on Intel side I do see various components
> like 'HDC State', 'HWP state' and such.
>
> I understand that this is needed so that #VC handler could read this msr, and trying
> to read it will cause another #VC which is probably not allowed (I don't know this detail of SEV-ES)
>
> I guess #VC handler should instead use a kernel cached value of this msr instead, or at least
> KVM should only allow reads and not writes to it.
>
> In addition to that, if we decide to open the read access to the IA32_XSS from the guest,
> this IMHO should be done in a separate patch.
>
> Best regards,
> Maxim Levitsky
>
>
> > }
> >
> > void sev_init_vmcb(struct vcpu_svm *svm)
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 984e89d7a734..ee7c7d0a09ab 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -146,6 +146,7 @@ static const struct svm_direct_access_msrs {
> > { .index = MSR_IA32_PL1_SSP, .always = false },
> > { .index = MSR_IA32_PL2_SSP, .always = false },
> > { .index = MSR_IA32_PL3_SSP, .always = false },
> > + { .index = MSR_IA32_XSS, .always = false },
> > { .index = MSR_INVALID, .always = false },
> > };
> >
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index bdc39003b955..2011456d2e9f 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -30,7 +30,7 @@
> > #define IOPM_SIZE PAGE_SIZE * 3
> > #define MSRPM_SIZE PAGE_SIZE * 2
> >
> > -#define MAX_DIRECT_ACCESS_MSRS 53
> > +#define MAX_DIRECT_ACCESS_MSRS 54
> > #define MSRPM_OFFSETS 32
> > extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> > extern bool npt_enabled;
> > @@ -720,5 +720,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
> > DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
> > DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
> > DEFINE_KVM_GHCB_ACCESSORS(xcr0)
> > +DEFINE_KVM_GHCB_ACCESSORS(xss)
> >
> > #endif
>
>
>
>

2024-02-26 17:40:56

by John Allen

[permalink] [raw]
Subject: Re: [PATCH 5/9] KVM: SVM: Save shadow stack host state on VMRUN

On Thu, Nov 02, 2023 at 08:07:20PM +0200, Maxim Levitsky wrote:
> On Tue, 2023-10-10 at 20:02 +0000, John Allen wrote:
> > When running as an SEV-ES guest, the PL0_SSP, PL1_SSP, PL2_SSP, PL3_SSP,
> > and U_CET fields in the VMCB save area are type B, meaning the host
> > state is automatically loaded on a VMEXIT, but is not saved on a VMRUN.
> > The other shadow stack MSRs, S_CET, SSP, and ISST_ADDR are type A,
> > meaning they are loaded on VMEXIT and saved on VMRUN. PL0_SSP, PL1_SSP,
> > and PL2_SSP are currently unused. Manually save the other type B host
> > MSR values before VMRUN.
> >
> > Signed-off-by: John Allen <[email protected]>
> > ---
> > arch/x86/kvm/svm/sev.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index b9a0a939d59f..bb4b18baa6f7 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3098,6 +3098,15 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> > hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> > }
> > +
> > + if (boot_cpu_has(X86_FEATURE_SHSTK)) {
> > + /*
> > + * MSR_IA32_U_CET and MSR_IA32_PL3_SSP are restored on VMEXIT,
> > + * save the current host values.
> > + */
> > + rdmsrl(MSR_IA32_U_CET, hostsa->u_cet);
> > + rdmsrl(MSR_IA32_PL3_SSP, hostsa->pl3_ssp);
> > + }
> > }
>
>
> Do we actually need this patch?
>
> Host FPU state is not encrypted, and as far as I understood from the common CET patch
> series, these MSRs will be restored on return to userspace.

Hi Maxim,

I think you're right on this. My next version omits the patch and
testing seems to confirm that it's not needed.

>
> Best regards,
> Maxim Levitsky
>
>
> PS: AMD's APM is silent on how S_CET, SSP, and ISST_ADDR are saved/restored for
> non-encrypted guests. Are they also type A?
>
> Can the VMSA table be trusted in general to provide the same swap type as for
> non-encrypted guests?

From what I gather, for a non-SEV-ES guest using the save area that is part of the
VMCB (as opposed to the separate VMSA used for SEV-ES and SEV-SNP), anything listed
in that save area is effectively swap type A. Does that answer your question?

Thanks,
John

>
>
> >
> > void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>
>