2021-10-20 12:45:50

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 0/6] KVM: SVM: Add initial GHCB protocol version 2 support

From: Joerg Roedel <[email protected]>

Hi,

here is a small set of patches which I took from the pending SEV-SNP
patch-sets to enable basic support for GHCB protocol version 2.

When SEV-SNP is not supported, only two new MSR protocol VMGEXIT calls
are required:

- MSR-based AP-reset-hold
- MSR-based HV-feature-request

These calls are implemented here and then the protocol is lifted to
version 2.

This is submitted separately because the MSR-based AP-reset-hold call
is required to support kexec/kdump in SEV-ES guests.

The previous version can be found here:

https://lore.kernel.org/lkml/[email protected]/

Regards,

Joerg

Changes v4->v5:

- Removed stable SoB from patch 1
- Moved kvm_emulate_ap_reset_hold() and all related code in
later patches to SVM specific code, as suggested by Sean.

Brijesh Singh (2):
KVM: SVM: Add support for Hypervisor Feature support MSR protocol
KVM: SVM: Increase supported GHCB protocol version

Joerg Roedel (2):
KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions
KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code

Sean Christopherson (1):
KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro

Tom Lendacky (1):
KVM: SVM: Add support to handle AP reset MSR protocol

arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/include/asm/sev-common.h | 14 +--
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 141 +++++++++++++++++++++---------
arch/x86/kvm/svm/svm.h | 13 +--
arch/x86/kvm/x86.c | 11 +--
6 files changed, 120 insertions(+), 63 deletions(-)


base-commit: 73f122c4f06f650ddf7f7410d8510db1a92a16a0
--
2.33.1


2021-10-20 12:45:57

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 2/6] KVM: SVM: Add helper to generate GHCB MSR verson info, and drop macro

From: Sean Christopherson <[email protected]>

Convert the GHCB_MSR_SEV_INFO macro into a helper function, and have the
helper hardcode the min/max versions instead of relying on the caller to
do the same. Under no circumstance should different pieces of KVM define
different min/max versions.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/sev-common.h | 5 -----
arch/x86/kvm/svm/sev.c | 24 ++++++++++++++++++------
arch/x86/kvm/svm/svm.h | 5 -----
3 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 8540972cad04..886c36f0cb16 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -24,11 +24,6 @@
#define GHCB_MSR_VER_MIN_MASK 0xffff
#define GHCB_MSR_CBIT_POS 24
#define GHCB_MSR_CBIT_MASK 0xff
-#define GHCB_MSR_SEV_INFO(_max, _min, _cbit) \
- ((((_max) & GHCB_MSR_VER_MAX_MASK) << GHCB_MSR_VER_MAX_POS) | \
- (((_min) & GHCB_MSR_VER_MIN_MASK) << GHCB_MSR_VER_MIN_POS) | \
- (((_cbit) & GHCB_MSR_CBIT_MASK) << GHCB_MSR_CBIT_POS) | \
- GHCB_MSR_SEV_INFO_RESP)
#define GHCB_MSR_INFO(v) ((v) & 0xfffUL)
#define GHCB_MSR_PROTO_MAX(v) (((v) >> GHCB_MSR_VER_MAX_POS) & GHCB_MSR_VER_MAX_MASK)
#define GHCB_MSR_PROTO_MIN(v) (((v) >> GHCB_MSR_VER_MIN_POS) & GHCB_MSR_VER_MIN_MASK)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2e3ecab6f1c8..9fb4d8fad1f4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2389,6 +2389,22 @@ static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
return msr;
}

+/* The min/max GHCB version supported by KVM. */
+#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MIN 1ULL
+
+static u64 ghcb_msr_version_info(void)
+{
+ u64 msr;
+
+ msr = GHCB_MSR_SEV_INFO_RESP;
+ msr |= GHCB_VERSION_MAX << GHCB_MSR_VER_MAX_POS;
+ msr |= GHCB_VERSION_MIN << GHCB_MSR_VER_MIN_POS;
+ msr |= sev_enc_bit << GHCB_MSR_CBIT_POS;
+
+ return msr;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2403,9 +2419,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)

switch (ghcb_info) {
case GHCB_MSR_SEV_INFO_REQ:
- svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
- GHCB_VERSION_MIN,
- sev_enc_bit);
+ control->ghcb_gpa = ghcb_msr_version_info();
break;
case GHCB_MSR_CPUID_REQ: {
u64 cpuid_fn, cpuid_reg, cpuid_value;
@@ -2621,9 +2635,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
* Set the GHCB MSR value as per the GHCB specification when emulating
* vCPU RESET for an SEV-ES guest.
*/
- svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
- GHCB_VERSION_MIN,
- sev_enc_bit);
+ svm->vmcb->control.ghcb_gpa = ghcb_msr_version_info();
}

void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0d7bbe548ac3..68e5f16a0554 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -544,11 +544,6 @@ void svm_vcpu_blocking(struct kvm_vcpu *vcpu);
void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);

/* sev.c */
-
-#define GHCB_VERSION_MAX 1ULL
-#define GHCB_VERSION_MIN 1ULL
-
-
extern unsigned int max_sev_asid;

void sev_vm_destroy(struct kvm *kvm);
--
2.33.1

2021-10-20 12:46:01

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 1/6] KVM: SVM: Get rid of set_ghcb_msr() and *ghcb_msr_bits() functions

From: Joerg Roedel <[email protected]>

Replace the get_ghcb_msr_bits() function with macros and open code the
GHCB MSR setters with hypercall specific helper macros and functions.
This will avoid preserving any previous bits in the GHCB-MSR and
improves code readability.

Also get rid of the set_ghcb_msr() function and open-code it at its
call-sites for better code readability.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/sev-common.h | 9 +++++
arch/x86/kvm/svm/sev.c | 56 +++++++++++--------------------
2 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
index 2cef6c5a52c2..8540972cad04 100644
--- a/arch/x86/include/asm/sev-common.h
+++ b/arch/x86/include/asm/sev-common.h
@@ -50,6 +50,10 @@
(GHCB_MSR_CPUID_REQ | \
(((unsigned long)reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS) | \
(((unsigned long)fn) << GHCB_MSR_CPUID_FUNC_POS))
+#define GHCB_MSR_CPUID_FN(msr) \
+ (((msr) >> GHCB_MSR_CPUID_FUNC_POS) & GHCB_MSR_CPUID_FUNC_MASK)
+#define GHCB_MSR_CPUID_REG(msr) \
+ (((msr) >> GHCB_MSR_CPUID_REG_POS) & GHCB_MSR_CPUID_REG_MASK)

/* AP Reset Hold */
#define GHCB_MSR_AP_RESET_HOLD_REQ 0x006
@@ -67,6 +71,11 @@
#define GHCB_SEV_TERM_REASON(reason_set, reason_val) \
(((((u64)reason_set) & GHCB_MSR_TERM_REASON_SET_MASK) << GHCB_MSR_TERM_REASON_SET_POS) | \
((((u64)reason_val) & GHCB_MSR_TERM_REASON_MASK) << GHCB_MSR_TERM_REASON_POS))
+#define GHCB_MSR_TERM_REASON_SET(msr) \
+ (((msr) >> GHCB_MSR_TERM_REASON_SET_POS) & GHCB_MSR_TERM_REASON_SET_MASK)
+#define GHCB_MSR_TERM_REASON(msr) \
+ (((msr) >> GHCB_MSR_TERM_REASON_POS) & GHCB_MSR_TERM_REASON_MASK)
+

#define GHCB_SEV_ES_REASON_GENERAL_REQUEST 0
#define GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED 1
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3e2769855e51..2e3ecab6f1c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,21 +2378,15 @@ static bool setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
return true;
}

-static void set_ghcb_msr_bits(struct vcpu_svm *svm, u64 value, u64 mask,
- unsigned int pos)
+static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
{
- svm->vmcb->control.ghcb_gpa &= ~(mask << pos);
- svm->vmcb->control.ghcb_gpa |= (value & mask) << pos;
-}
+ u64 msr;

-static u64 get_ghcb_msr_bits(struct vcpu_svm *svm, u64 mask, unsigned int pos)
-{
- return (svm->vmcb->control.ghcb_gpa >> pos) & mask;
-}
+ msr = GHCB_MSR_CPUID_RESP;
+ msr |= (reg & GHCB_MSR_CPUID_REG_MASK) << GHCB_MSR_CPUID_REG_POS;
+ msr |= (value & GHCB_MSR_CPUID_VALUE_MASK) << GHCB_MSR_CPUID_VALUE_POS;

-static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
-{
- svm->vmcb->control.ghcb_gpa = value;
+ return msr;
}

static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
@@ -2409,16 +2403,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)

switch (ghcb_info) {
case GHCB_MSR_SEV_INFO_REQ:
- set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
- GHCB_VERSION_MIN,
- sev_enc_bit));
+ svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ GHCB_VERSION_MIN,
+ sev_enc_bit);
break;
case GHCB_MSR_CPUID_REQ: {
u64 cpuid_fn, cpuid_reg, cpuid_value;

- cpuid_fn = get_ghcb_msr_bits(svm,
- GHCB_MSR_CPUID_FUNC_MASK,
- GHCB_MSR_CPUID_FUNC_POS);
+ cpuid_fn = GHCB_MSR_CPUID_FN(control->ghcb_gpa);

/* Initialize the registers needed by the CPUID intercept */
vcpu->arch.regs[VCPU_REGS_RAX] = cpuid_fn;
@@ -2430,9 +2422,8 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
break;
}

- cpuid_reg = get_ghcb_msr_bits(svm,
- GHCB_MSR_CPUID_REG_MASK,
- GHCB_MSR_CPUID_REG_POS);
+ cpuid_reg = GHCB_MSR_CPUID_REG(control->ghcb_gpa);
+
if (cpuid_reg == 0)
cpuid_value = vcpu->arch.regs[VCPU_REGS_RAX];
else if (cpuid_reg == 1)
@@ -2442,26 +2433,19 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
else
cpuid_value = vcpu->arch.regs[VCPU_REGS_RDX];

- set_ghcb_msr_bits(svm, cpuid_value,
- GHCB_MSR_CPUID_VALUE_MASK,
- GHCB_MSR_CPUID_VALUE_POS);
+ svm->vmcb->control.ghcb_gpa = ghcb_msr_cpuid_resp(cpuid_reg, cpuid_value);

- set_ghcb_msr_bits(svm, GHCB_MSR_CPUID_RESP,
- GHCB_MSR_INFO_MASK,
- GHCB_MSR_INFO_POS);
break;
}
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

- reason_set = get_ghcb_msr_bits(svm,
- GHCB_MSR_TERM_REASON_SET_MASK,
- GHCB_MSR_TERM_REASON_SET_POS);
- reason_code = get_ghcb_msr_bits(svm,
- GHCB_MSR_TERM_REASON_MASK,
- GHCB_MSR_TERM_REASON_POS);
+ reason_set = GHCB_MSR_TERM_REASON_SET(control->ghcb_gpa);
+ reason_code = GHCB_MSR_TERM_REASON(control->ghcb_gpa);
+
pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
reason_set, reason_code);
+
fallthrough;
}
default:
@@ -2637,9 +2621,9 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
* Set the GHCB MSR value as per the GHCB specification when emulating
* vCPU RESET for an SEV-ES guest.
*/
- set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
- GHCB_VERSION_MIN,
- sev_enc_bit));
+ svm->vmcb->control.ghcb_gpa = GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+ GHCB_VERSION_MIN,
+ sev_enc_bit);
}

void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu)
--
2.33.1

2021-10-20 12:46:53

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 5/6] KVM: SVM: Add support for Hypervisor Feature support MSR protocol

From: Brijesh Singh <[email protected]>

Version 2 of the GHCB specification introduced advertisement of
supported Hypervisor SEV features. This request is required to support
a the GHCB version 2 protocol.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/uapi/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index efa969325ede..67cf153fe580 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
#define SVM_VMGEXIT_AP_JUMP_TABLE 0x80000005
#define SVM_VMGEXIT_SET_AP_JUMP_TABLE 0
#define SVM_VMGEXIT_GET_AP_JUMP_TABLE 1
+#define SVM_VMGEXIT_HYPERVISOR_FEATURES 0x8000fffd
#define SVM_VMGEXIT_UNSUPPORTED_EVENT 0x8000ffff

/* Exit code reserved for hypervisor/software use */
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 10af4ac83971..0c673291e905 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2216,6 +2216,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+ case SVM_VMGEXIT_HYPERVISOR_FEATURES:
break;
default:
goto vmgexit_err;
@@ -2423,6 +2424,20 @@ static u64 ghcb_msr_ap_rst_resp(u64 value)
return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
}

+/* Hypervisor GHCB Features supported by KVM */
+#define KVM_SUPPORTED_GHCB_HV_FEATURES 0UL
+
+static u64 ghcb_msr_hv_feat_resp(void)
+{
+ u64 msr;
+
+ msr = GHCB_MSR_HV_FT_RESP;
+ msr |= (KVM_SUPPORTED_GHCB_HV_FEATURES << GHCB_DATA_LOW);
+
+ return msr;
+}
+
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2478,6 +2493,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
*/
svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);

+ break;
+ case GHCB_MSR_HV_FT_REQ:
+ svm->vmcb->control.ghcb_gpa = ghcb_msr_hv_feat_resp();
break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
@@ -2591,6 +2609,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = 1;
break;
}
+ case SVM_VMGEXIT_HYPERVISOR_FEATURES:
+ ghcb_set_sw_exit_info_2(ghcb, KVM_SUPPORTED_GHCB_HV_FEATURES);
+
+ ret = 1;
+ break;
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
--
2.33.1

2021-10-20 12:47:26

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code

From: Joerg Roedel <[email protected]>

The function is only used by the kvm-amd module. Move it to the AMD
specific part of the code and name it sev_emulate_ap_reset_hold().

Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/sev.c | 10 +++++++++-
arch/x86/kvm/x86.c | 11 ++---------
3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 80f4b8a9233c..b67f550616cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1682,8 +1682,8 @@ int kvm_emulate_monitor(struct kvm_vcpu *vcpu);
int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in);
int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
+int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason);
int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);

void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9fb4d8fad1f4..9afa71cb36e6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2405,6 +2405,14 @@ static u64 ghcb_msr_version_info(void)
return msr;
}

+static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
+{
+ int ret = kvm_skip_emulated_instruction(&svm->vcpu);
+
+ return __kvm_vcpu_halt(&svm->vcpu,
+ KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2536,7 +2544,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
- ret = kvm_emulate_ap_reset_hold(vcpu);
+ ret = sev_emulate_ap_reset_hold(svm);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c59b63c56af9..cc3a65d6821d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8651,7 +8651,7 @@ void kvm_arch_exit(void)
#endif
}

-static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
+int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
{
++vcpu->stat.halt_exits;
if (lapic_in_kernel(vcpu)) {
@@ -8662,6 +8662,7 @@ static int __kvm_vcpu_halt(struct kvm_vcpu *vcpu, int state, int reason)
return 0;
}
}
+EXPORT_SYMBOL_GPL(__kvm_vcpu_halt);

int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
{
@@ -8680,14 +8681,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_emulate_halt);

-int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
-{
- int ret = kvm_skip_emulated_instruction(vcpu);
-
- return __kvm_vcpu_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
-}
-EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
-
#ifdef CONFIG_X86_64
static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
unsigned long clock_type)
--
2.33.1

2021-10-20 12:47:42

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 6/6] KVM: SVM: Increase supported GHCB protocol version

From: Brijesh Singh <[email protected]>

Now that KVM has basic support for version 2 of the GHCB specification,
bump the maximum supported protocol version. The SNP specific functions
are still missing, but those are only required when the Hypervisor
supports running SNP guests.

Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kvm/svm/sev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0c673291e905..3ad69f4f0ed6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2394,7 +2394,7 @@ static u64 ghcb_msr_cpuid_resp(u64 reg, u64 value)
}

/* The min/max GHCB version supported by KVM. */
-#define GHCB_VERSION_MAX 1ULL
+#define GHCB_VERSION_MAX 2ULL
#define GHCB_VERSION_MIN 1ULL

static u64 ghcb_msr_version_info(void)
--
2.33.1

2021-10-20 12:48:03

by Joerg Roedel

[permalink] [raw]
Subject: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol

From: Tom Lendacky <[email protected]>

Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
available in version 2 of the GHCB specification.

Signed-off-by: Tom Lendacky <[email protected]>
Signed-off-by: Brijesh Singh <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/svm/sev.c | 52 ++++++++++++++++++++++++++-------
arch/x86/kvm/svm/svm.h | 8 +++++
3 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b67f550616cf..5c6b1469cc3b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -237,7 +237,6 @@ enum x86_intercept_stage;
KVM_GUESTDBG_INJECT_DB | \
KVM_GUESTDBG_BLOCKIRQ)

-
#define PFERR_PRESENT_BIT 0
#define PFERR_WRITE_BIT 1
#define PFERR_USER_BIT 2
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9afa71cb36e6..10af4ac83971 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)

void sev_es_unmap_ghcb(struct vcpu_svm *svm)
{
+ /* Clear any indication that the vCPU is in a type of AP Reset Hold */
+ svm->reset_hold_type = AP_RESET_HOLD_NONE;
+
if (!svm->ghcb)
return;

@@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
return msr;
}

-static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
+static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
{
int ret = kvm_skip_emulated_instruction(&svm->vcpu);

+ svm->reset_hold_type = type;
+
return __kvm_vcpu_halt(&svm->vcpu,
KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
}

+static u64 ghcb_msr_ap_rst_resp(u64 value)
+{
+ return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
+}
+
static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)

break;
}
+ case GHCB_MSR_AP_RESET_HOLD_REQ:
+ ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
+
+ /*
+ * Preset the result to a non-SIPI return and then only set
+ * the result to non-zero when delivering a SIPI.
+ */
+ svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
+
+ break;
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;

@@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;
case SVM_VMGEXIT_AP_HLT_LOOP:
- ret = sev_emulate_ap_reset_hold(svm);
+ ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
@@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
return;
}

- /*
- * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
- * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
- * non-zero value.
- */
- if (!svm->ghcb)
- return;
-
- ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+ /* Subsequent SIPI */
+ switch (svm->reset_hold_type) {
+ case AP_RESET_HOLD_NAE_EVENT:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
+ */
+ ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+ break;
+ case AP_RESET_HOLD_MSR_PROTO:
+ /*
+ * Return from an AP Reset Hold VMGEXIT, where the guest will
+ * set the CS and RIP. Set GHCB data field to a non-zero value.
+ */
+ svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
+ break;
+ default:
+ break;
+ }
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 68e5f16a0554..bf9379f1cfb8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -69,6 +69,12 @@ enum {
/* TPR and CR2 are always written before VMRUN */
#define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))

+enum ap_reset_hold_type {
+ AP_RESET_HOLD_NONE,
+ AP_RESET_HOLD_NAE_EVENT,
+ AP_RESET_HOLD_MSR_PROTO,
+};
+
struct kvm_sev_info {
bool active; /* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
@@ -199,6 +205,8 @@ struct vcpu_svm {
bool ghcb_sa_free;

bool guest_state_loaded;
+
+ enum ap_reset_hold_type reset_hold_type;
};

struct svm_cpu_data {
--
2.33.1

2021-10-20 14:31:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 3/6] KVM: SVM: Move kvm_emulate_ap_reset_hold() to AMD specific code

On Wed, Oct 20, 2021, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> The function is only used by the kvm-amd module. Move it to the AMD
> specific part of the code and name it sev_emulate_ap_reset_hold().
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---

Reviewed-by: Sean Christopherson <[email protected]>

2021-10-20 17:42:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol

On Wed, Oct 20, 2021, Joerg Roedel wrote:
> From: Tom Lendacky <[email protected]>
>
> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
> available in version 2 of the GHCB specification.

The changelog needs to explain how this is actually supposed to work. Doesn't
need gory details, just a basic explanation of the sequence of events to wake a
vCPU that requested a reset hold.

I apologize in advance for the following rant(s). There's some actionable feedback,
but a lot of it is just me complaining about the reset hold nonsense.

For the actual feedback, attached are two patches: patch 1 eliminates the
"first_sipi_received" hack, patch 2 is a (hopefully) fixed version of this patch
(but doesn't have an updated changelog). Both are compile tested only. There
will be a benign conflict with patch 05 of this series.

> Signed-off-by: Tom Lendacky <[email protected]>
> Signed-off-by: Brijesh Singh <[email protected]>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/svm/sev.c | 52 ++++++++++++++++++++++++++-------
> arch/x86/kvm/svm/svm.h | 8 +++++
> 3 files changed, 49 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b67f550616cf..5c6b1469cc3b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -237,7 +237,6 @@ enum x86_intercept_stage;
> KVM_GUESTDBG_INJECT_DB | \
> KVM_GUESTDBG_BLOCKIRQ)
>
> -

Spurious whitespace change.

> #define PFERR_PRESENT_BIT 0
> #define PFERR_WRITE_BIT 1
> #define PFERR_USER_BIT 2
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 9afa71cb36e6..10af4ac83971 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>
> void sev_es_unmap_ghcb(struct vcpu_svm *svm)
> {
> + /* Clear any indication that the vCPU is in a type of AP Reset Hold */
> + svm->reset_hold_type = AP_RESET_HOLD_NONE;
> +
> if (!svm->ghcb)
> return;
>
> @@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
> return msr;
> }
>
> -static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
> +static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
> {
> int ret = kvm_skip_emulated_instruction(&svm->vcpu);
>
> + svm->reset_hold_type = type;
> +
> return __kvm_vcpu_halt(&svm->vcpu,
> KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
> }
>
> +static u64 ghcb_msr_ap_rst_resp(u64 value)
> +{
> + return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
> +}
> +
> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
> {
> struct vmcb_control_area *control = &svm->vmcb->control;
> @@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>
> break;
> }
> + case GHCB_MSR_AP_RESET_HOLD_REQ:
> + ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
> +
> + /*
> + * Preset the result to a non-SIPI return and then only set
> + * the result to non-zero when delivering a SIPI.
> + */
> + svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);

This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.

vCPU0 vCPU1
#VMGEXIT(RESET_HOLD)
__kvm_vcpu_halt()
INIT
SIPI
sev_vcpu_deliver_sipi_vector()
ghcb_msr_ap_rst_resp(1);
ghcb_msr_ap_rst_resp(0);

Note, the "INIT" above is mostly a guess. I'm pretty sure it's necessary because
I don't see how KVM can possibly be correct otherwise. The SIPI handler (below)
quite clearly expects the vCPU to have been in an AP reset hold, but the invocation
of sev_vcpu_deliver_sipi_vector is gated by the vCPU being in
KVM_MP_STATE_INIT_RECEIVED, not KVM_MP_STATE_AP_RESET_HOLD. That implies the BSP
must INIT the AP.

if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}

But the GHCB 2.0 spec doesn't actually say that; it instead implies that SIPI is
supposed to be able to directly wake the AP.

* When a guest AP reaches its HLT loop (or similar method for parking the AP),
it instead can either:
1. Issue an AP Reset Hold NAE event.
2. Issue the AP Reset Hold Request MSR Protocol
* The hypervisor treats treats either request like the guest issued a HLT
^^^^^^^^^^^^^ ^^^
spec typo ???
instruction and marks the vCPU as halted.
* When the hypervisor receives a SIPI request for the vCPU, it will not update
^^^^^^^^^^^^
any register values and, instead, it will either complete the AP Reset Hold
NAE event or complete the AP Reset Hold MSR protocol
* Mark the vCPU as active, allowing the VMGEXIT to complete.
* Upon return from the VMGEXIT, the AP must transition from its current execution
mode into real mode and begin executing at the reset vector supplied in the SIPI
request.

Piecing things together, my understanding is that the "hold request" really is
intended to be a HLT, but with extra semantics where the host sets sw_exit_info_2
to indicate that the vCPU was woken by INIT-SIPI.

It's probably too late to change the spec, but I'm going to complain anyways.
This is all ridiculously convoluted and completely unnecessary. As pointed out
in the SNP series regarding AP "creation", this can be fully handled in the guest
via a simple mailbox between firmware and kernel. What's really fubar is that
the guest firmware and kernel already have a mailbox! But it's defined by the
GHCB spec instead of e.g. ACPI, and so instead of handling this fully within the
guest, the hypervisor (and PSP to some extent on SNP because of the secrets page!!!)
gets involved.

The complications to support this in the guest firmware are hilarious, e.g. the
guest hasto manually switch from 64-bit mode to Real Mode just so that the kernel
can continue to use a horribly antiquated method for gaining control of APs.

> +
> + break;
> case GHCB_MSR_TERM_REQ: {
> u64 reason_set, reason_code;
>
> @@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
> ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
> break;
> case SVM_VMGEXIT_AP_HLT_LOOP:
> - ret = sev_emulate_ap_reset_hold(svm);
> + ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
> break;
> case SVM_VMGEXIT_AP_JUMP_TABLE: {
> struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
> @@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)

Tying into above, handling this in SIPI is flawed. For example, if the guest
does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
on the SIPI. Because this mess requires an INIT, KVM has lost track of whether
the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
arrived after a reset hold. Looking at KVM, IIUC, this bug is why the hack
"received_first_sipi" exists.

Of course this all begs the question of why there's a "reset hold" concept in
the first place. It's literally HLT with a flag to say "you got INIT-SIPI".
But the guest has to supply and fill a jump table! Just put the flag in the
jump table!!!!

> return;
> }
>
> - /*
> - * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
> - * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
> - * non-zero value.
> - */
> - if (!svm->ghcb)
> - return;
> -
> - ghcb_set_sw_exit_info_2(svm->ghcb, 1);
> + /* Subsequent SIPI */
> + switch (svm->reset_hold_type) {
> + case AP_RESET_HOLD_NAE_EVENT:
> + /*
> + * Return from an AP Reset Hold VMGEXIT, where the guest will
> + * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
> + */
> + ghcb_set_sw_exit_info_2(svm->ghcb, 1);

Doesn't this need to check for a null svm->ghcb?

I also suspect a boolean might make it easier to understand the implications
and also make the whole thing less error prone, e.g.

if (svm->reset_hold_msr_protocol) {

} else if (svm->ghcb) {

}

> + break;
> + case AP_RESET_HOLD_MSR_PROTO:
> + /*
> + * Return from an AP Reset Hold VMGEXIT, where the guest will
> + * set the CS and RIP. Set GHCB data field to a non-zero value.
> + */
> + svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
> + break;
> + default:
> + break;
> + }
> }
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 68e5f16a0554..bf9379f1cfb8 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -69,6 +69,12 @@ enum {
> /* TPR and CR2 are always written before VMRUN */
> #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>
> +enum ap_reset_hold_type {
> + AP_RESET_HOLD_NONE,
> + AP_RESET_HOLD_NAE_EVENT,
> + AP_RESET_HOLD_MSR_PROTO,
> +};
> +
> struct kvm_sev_info {
> bool active; /* SEV enabled guest */
> bool es_active; /* SEV-ES enabled guest */
> @@ -199,6 +205,8 @@ struct vcpu_svm {
> bool ghcb_sa_free;
>
> bool guest_state_loaded;
> +
> + enum ap_reset_hold_type reset_hold_type;
> };
>
> struct svm_cpu_data {
> --
> 2.33.1
>


Attachments:
(No filename) (9.86 kB)
0001-KVM-SVM-Set-released-on-INIT-SIPI-iff-SEV-ES-vCPU-wa.patch (4.84 kB)
0002-KVM-SVM-Add-support-to-handle-AP-reset-MSR-protocol.patch (4.22 kB)
Download all attachments

2021-10-20 18:17:23

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol

On 10/20/21 12:40 PM, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Joerg Roedel wrote:
>> From: Tom Lendacky <[email protected]>
>>
>> Add support for AP Reset Hold being invoked using the GHCB MSR protocol,
>> available in version 2 of the GHCB specification.
>
> The changelog needs to explain how this is actually supposed to work. Doesn't
> need gory details, just a basic explanation of the sequence of events to wake a
> vCPU that requested a reset hold.
>
> I apologize in advance for the following rant(s). There's some actionable feedback,
> but a lot of it is just me complaining about the reset hold nonsense.
>
> For the actual feedback, attached are two patches: patch 1 eliminates the
> "first_sipi_received" hack, patch 2 is a (hopefully) fixed version of this patch
> (but doesn't have an updated changelog). Both are compile tested only. There
> will be a benign conflict with patch 05 of this series.
>
>> Signed-off-by: Tom Lendacky <[email protected]>
>> Signed-off-by: Brijesh Singh <[email protected]>
>> Signed-off-by: Joerg Roedel <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 -
>> arch/x86/kvm/svm/sev.c | 52 ++++++++++++++++++++++++++-------
>> arch/x86/kvm/svm/svm.h | 8 +++++
>> 3 files changed, 49 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index b67f550616cf..5c6b1469cc3b 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -237,7 +237,6 @@ enum x86_intercept_stage;
>> KVM_GUESTDBG_INJECT_DB | \
>> KVM_GUESTDBG_BLOCKIRQ)
>>
>> -
>
> Spurious whitespace change.
>
>> #define PFERR_PRESENT_BIT 0
>> #define PFERR_WRITE_BIT 1
>> #define PFERR_USER_BIT 2
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 9afa71cb36e6..10af4ac83971 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2246,6 +2246,9 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>>
>> void sev_es_unmap_ghcb(struct vcpu_svm *svm)
>> {
>> + /* Clear any indication that the vCPU is in a type of AP Reset Hold */
>> + svm->reset_hold_type = AP_RESET_HOLD_NONE;
>> +
>> if (!svm->ghcb)
>> return;
>>
>> @@ -2405,14 +2408,21 @@ static u64 ghcb_msr_version_info(void)
>> return msr;
>> }
>>
>> -static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm)
>> +static int sev_emulate_ap_reset_hold(struct vcpu_svm *svm, enum ap_reset_hold_type type)
>> {
>> int ret = kvm_skip_emulated_instruction(&svm->vcpu);
>>
>> + svm->reset_hold_type = type;
>> +
>> return __kvm_vcpu_halt(&svm->vcpu,
>> KVM_MP_STATE_AP_RESET_HOLD, KVM_EXIT_AP_RESET_HOLD) && ret;
>> }
>>
>> +static u64 ghcb_msr_ap_rst_resp(u64 value)
>> +{
>> + return (u64)GHCB_MSR_AP_RESET_HOLD_RESP | (value << GHCB_DATA_LOW);
>> +}
>> +
>> static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>> {
>> struct vmcb_control_area *control = &svm->vmcb->control;
>> @@ -2459,6 +2469,16 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>
>> break;
>> }
>> + case GHCB_MSR_AP_RESET_HOLD_REQ:
>> + ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_MSR_PROTO);
>> +
>> + /*
>> + * Preset the result to a non-SIPI return and then only set
>> + * the result to non-zero when delivering a SIPI.
>> + */
>> + svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(0);
>
> This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.
>
> vCPU0 vCPU1
> #VMGEXIT(RESET_HOLD)
> __kvm_vcpu_halt()
> INIT
> SIPI
> sev_vcpu_deliver_sipi_vector()
> ghcb_msr_ap_rst_resp(1);

This isn't possible. vCPU0 doesn't set vCPU1's GHCB value. vCPU1's GHCB
value is set when vCPU1 handles events in vcpu_enter_guest().

Thanks,
Tom

> ghcb_msr_ap_rst_resp(0);
>
> Note, the "INIT" above is mostly a guess. I'm pretty sure it's necessary because
> I don't see how KVM can possibly be correct otherwise. The SIPI handler (below)
> quite clearly expects the vCPU to have been in an AP reset hold, but the invocation
> of sev_vcpu_deliver_sipi_vector is gated by the vCPU being in
> KVM_MP_STATE_INIT_RECEIVED, not KVM_MP_STATE_AP_RESET_HOLD. That implies the BSP
> must INIT the AP.
>
> if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> /* evaluate pending_events before reading the vector */
> smp_rmb();
> sipi_vector = apic->sipi_vector;
> kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> }
>
> But the GHCB 2.0 spec doesn't actually say that; it instead implies that SIPI is
> supposed to be able to directly wake the AP.
>
> * When a guest AP reaches its HLT loop (or similar method for parking the AP),
> it instead can either:
> 1. Issue an AP Reset Hold NAE event.
> 2. Issue the AP Reset Hold Request MSR Protocol
> * The hypervisor treats treats either request like the guest issued a HLT
> ^^^^^^^^^^^^^ ^^^
> spec typo ???
> instruction and marks the vCPU as halted.
> * When the hypervisor receives a SIPI request for the vCPU, it will not update
> ^^^^^^^^^^^^
> any register values and, instead, it will either complete the AP Reset Hold
> NAE event or complete the AP Reset Hold MSR protocol
> * Mark the vCPU as active, allowing the VMGEXIT to complete.
> * Upon return from the VMGEXIT, the AP must transition from its current execution
> mode into real mode and begin executing at the reset vector supplied in the SIPI
> request.
>
> Piecing things together, my understanding is that the "hold request" really is
> intended to be a HLT, but with extra semantics where the host sets sw_exit_info_2
> to indicate that the vCPU was woken by INIT-SIPI.
>
> It's probably too late to change the spec, but I'm going to complain anyways.
> This is all ridiculously convoluted and completely unnecessary. As pointed out
> in the SNP series regarding AP "creation", this can be fully handled in the guest
> via a simple mailbox between firmware and kernel. What's really fubar is that
> the guest firmware and kernel already have a mailbox! But it's defined by the
> GHCB spec instead of e.g. ACPI, and so instead of handling this fully within the
> guest, the hypervisor (and PSP to some extent on SNP because of the secrets page!!!)
> gets involved.
>
> The complications to support this in the guest firmware are hilarious, e.g. the
> guest hasto manually switch from 64-bit mode to Real Mode just so that the kernel
> can continue to use a horribly antiquated method for gaining control of APs.
>
>> +
>> + break;
>> case GHCB_MSR_TERM_REQ: {
>> u64 reason_set, reason_code;
>>
>> @@ -2544,7 +2564,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
>> ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>> break;
>> case SVM_VMGEXIT_AP_HLT_LOOP:
>> - ret = sev_emulate_ap_reset_hold(svm);
>> + ret = sev_emulate_ap_reset_hold(svm, AP_RESET_HOLD_NAE_EVENT);
>> break;
>> case SVM_VMGEXIT_AP_JUMP_TABLE: {
>> struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
>> @@ -2679,13 +2699,23 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>
> Tying into above, handling this in SIPI is flawed. For example, if the guest
> does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
> on the SIPI. Because this mess requires an INIT, KVM has lost track of whether
> the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
> arrived after a reset hold. Looking at KVM, IIUC, this bug is why the hack
> "received_first_sipi" exists.
>
> Of course this all begs the question of why there's a "reset hold" concept in
> the first place. It's literally HLT with a flag to say "you got INIT-SIPI".
> But the guest has to supply and fill a jump table! Just put the flag in the
> jump table!!!!
>
>> return;
>> }
>>
>> - /*
>> - * Subsequent SIPI: Return from an AP Reset Hold VMGEXIT, where
>> - * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>> - * non-zero value.
>> - */
>> - if (!svm->ghcb)
>> - return;
>> -
>> - ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>> + /* Subsequent SIPI */
>> + switch (svm->reset_hold_type) {
>> + case AP_RESET_HOLD_NAE_EVENT:
>> + /*
>> + * Return from an AP Reset Hold VMGEXIT, where the guest will
>> + * set the CS and RIP. Set SW_EXIT_INFO_2 to a non-zero value.
>> + */
>> + ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>
> Doesn't this need to check for a null svm->ghcb?
>
> I also suspect a boolean might make it easier to understand the implications
> and also make the whole thing less error prone, e.g.
>
> if (svm->reset_hold_msr_protocol) {
>
> } else if (svm->ghcb) {
>
> }
>
>> + break;
>> + case AP_RESET_HOLD_MSR_PROTO:
>> + /*
>> + * Return from an AP Reset Hold VMGEXIT, where the guest will
>> + * set the CS and RIP. Set GHCB data field to a non-zero value.
>> + */
>> + svm->vmcb->control.ghcb_gpa = ghcb_msr_ap_rst_resp(1);
>> + break;
>> + default:
>> + break;
>> + }
>> }
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 68e5f16a0554..bf9379f1cfb8 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -69,6 +69,12 @@ enum {
>> /* TPR and CR2 are always written before VMRUN */
>> #define VMCB_ALWAYS_DIRTY_MASK ((1U << VMCB_INTR) | (1U << VMCB_CR2))
>>
>> +enum ap_reset_hold_type {
>> + AP_RESET_HOLD_NONE,
>> + AP_RESET_HOLD_NAE_EVENT,
>> + AP_RESET_HOLD_MSR_PROTO,
>> +};
>> +
>> struct kvm_sev_info {
>> bool active; /* SEV enabled guest */
>> bool es_active; /* SEV-ES enabled guest */
>> @@ -199,6 +205,8 @@ struct vcpu_svm {
>> bool ghcb_sa_free;
>>
>> bool guest_state_loaded;
>> +
>> + enum ap_reset_hold_type reset_hold_type;
>> };
>>
>> struct svm_cpu_data {
>> --
>> 2.33.1
>>

2021-10-20 18:37:35

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol

On 10/20/21 12:40 PM, Sean Christopherson wrote:
> On Wed, Oct 20, 2021, Joerg Roedel wrote:
>
> Tying into above, handling this in SIPI is flawed. For example, if the guest
> does INIT-SIPI-SIPI without a reset hold, KVM would incorrect set sw_exit_info_2
> on the SIPI. Because this mess requires an INIT, KVM has lost track of whether
> the guest was in KVM_MP_STATE_AP_RESET_HOLD and thus can't know if the SIPI
> arrived after a reset hold. Looking at KVM, IIUC, this bug is why the hack
> "received_first_sipi" exists.

The received_first_sipi is because the APs have to be started the very
first time in the traditional way. The AP can't issue an AP reset hold if
it hasn't started to begin with in which case there would be no GHCB mapped.

After the check to see if the GHCB is mapped was added to
sev_vcpu_deliver_sipi_vector(), the "received_first_sipi" could probably
have been deleted at that point.

Thanks,
Tom

2021-10-20 18:40:24

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] KVM: SVM: Add support to handle AP reset MSR protocol

On Wed, Oct 20, 2021, Tom Lendacky wrote:
> On 10/20/21 12:40 PM, Sean Christopherson wrote:
> > On Wed, Oct 20, 2021, Joerg Roedel wrote:
> > This can race with the SIPI and effectively corrupt svm->vmcb->control.ghcb_gpa.
> >
> > vCPU0 vCPU1
> > #VMGEXIT(RESET_HOLD)
> > __kvm_vcpu_halt()
> > INIT
> > SIPI
> > sev_vcpu_deliver_sipi_vector()
> > ghcb_msr_ap_rst_resp(1);
>
> This isn't possible. vCPU0 doesn't set vCPU1's GHCB value. vCPU1's GHCB
> value is set when vCPU1 handles events in vcpu_enter_guest().

Argh, I was thinking of injecting regular IPIs across vCPUs. In hindsight it
makes sense that INIT and SIPI are handled on the current vCPU, stuffing all that
state from a different vCPU would be needlessly complex.