2023-04-11 12:59:38

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap

This is to use another AMD SEV-ES hardware assisted register swap,
more detail in 5/6. In the process it's been suggested to fix other
things, here is the attempt, with the great help of amders.

The previous conversation is here:
https://lore.kernel.org/r/[email protected]

This is based on sha1
f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".

Please comment. Thanks.



Alexey Kardashevskiy (6):
KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
KVM: SEV-ES: explicitly disable debug
KVM: SVM/SEV/SEV-ES: Rework intercepts
KVM: SEV: Enable data breakpoints in SEV-ES
x86/sev: Do not handle #VC for DR7 read/write

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/svm.h | 42 ---------------
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/kernel/sev.c | 6 +++
arch/x86/kvm/svm/sev.c | 54 +++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 48 +++++++++++++++--
7 files changed, 105 insertions(+), 49 deletions(-)

--
2.39.1


2023-04-11 12:59:47

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 1/6] KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header

Static functions set_dr_intercepts() and clr_dr_intercepts() are only
called from SVM so move them to .c.

No functional change intended.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Carlos Bilbao <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Reviewed-by: Santosh Shukla <[email protected]>
---
Changes:
v5:
* new in the series
---
arch/x86/kvm/svm/svm.h | 42 --------------------
arch/x86/kvm/svm/svm.c | 42 ++++++++++++++++++++
2 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 839809972da1..4deec59be71b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -403,48 +403,6 @@ static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u3
return test_bit(bit, (unsigned long *)&control->intercepts);
}

-static inline void set_dr_intercepts(struct vcpu_svm *svm)
-{
- struct vmcb *vmcb = svm->vmcb01.ptr;
-
- if (!sev_es_guest(svm->vcpu.kvm)) {
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
- }
-
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
-
- recalc_intercepts(svm);
-}
-
-static inline void clr_dr_intercepts(struct vcpu_svm *svm)
-{
- struct vmcb *vmcb = svm->vmcb01.ptr;
-
- vmcb->control.intercepts[INTERCEPT_DR] = 0;
-
- /* DR7 access must remain intercepted for an SEV-ES guest */
- if (sev_es_guest(svm->vcpu.kvm)) {
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
- }
-
- recalc_intercepts(svm);
-}
-
static inline void set_exception_intercept(struct vcpu_svm *svm, u32 bit)
{
struct vmcb *vmcb = svm->vmcb01.ptr;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a1b08359769b..1e1c1eb13392 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -687,6 +687,48 @@ static int svm_cpu_init(int cpu)

}

+static void set_dr_intercepts(struct vcpu_svm *svm)
+{
+ struct vmcb *vmcb = svm->vmcb01.ptr;
+
+ if (!sev_es_guest(svm->vcpu.kvm)) {
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
+ }
+
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+
+ recalc_intercepts(svm);
+}
+
+static void clr_dr_intercepts(struct vcpu_svm *svm)
+{
+ struct vmcb *vmcb = svm->vmcb01.ptr;
+
+ vmcb->control.intercepts[INTERCEPT_DR] = 0;
+
+ /* DR7 access must remain intercepted for an SEV-ES guest */
+ if (sev_es_guest(svm->vcpu.kvm)) {
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+ }
+
+ recalc_intercepts(svm);
+}
+
static int direct_access_msr_slot(u32 msr)
{
u32 i;
--
2.39.1

2023-04-11 12:59:58

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 2/6] KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb() and tries
keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is #GP intercept which init_vmcb() skips setting for SEV guests and
then sev_es_init_vmcb() needlessly clears it.

Remove the SEV check from init_vmcb(). Clear the #GP intercept in
sev_init_vmcb(). SEV-ES will use the SEV setting.

No functional change intended.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Carlos Bilbao <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Reviewed-by: Santosh Shukla <[email protected]>
---
Changes:
v5:
* new in the series
---
arch/x86/kvm/svm/sev.c | 9 ++++++---
arch/x86/kvm/svm/svm.c | 5 ++---
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c25aeb550cd9..0f4761a57d86 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2968,9 +2968,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
svm_set_intercept(svm, TRAP_CR4_WRITE);
svm_set_intercept(svm, TRAP_CR8_WRITE);

- /* No support for enable_vmware_backdoor */
- clr_exception_intercept(svm, GP_VECTOR);
-
/* Can't intercept XSETBV, HV can't modify XCR0 directly */
svm_clr_intercept(svm, INTERCEPT_XSETBV);

@@ -2996,6 +2993,12 @@ void sev_init_vmcb(struct vcpu_svm *svm)
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
clr_exception_intercept(svm, UD_VECTOR);

+ /*
+ * Don't intercept #GP for SEV guests, e.g. for the VMware backdoor, as
+ * KVM can't decrypt guest memory to decode the faulting instruction.
+ */
+ clr_exception_intercept(svm, GP_VECTOR);
+
if (sev_es_guest(svm->vcpu.kvm))
sev_es_init_vmcb(svm);
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1e1c1eb13392..dc12de325cca 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1253,10 +1253,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
* Guest access to VMware backdoor ports could legitimately
* trigger #GP because of TSS I/O permission bitmap.
* We intercept those #GP and allow access to them anyway
- * as VMware does. Don't intercept #GP for SEV guests as KVM can't
- * decrypt guest memory to decode the faulting instruction.
+ * as VMware does.
*/
- if (enable_vmware_backdoor && !sev_guest(vcpu->kvm))
+ if (enable_vmware_backdoor)
set_exception_intercept(svm, GP_VECTOR);

svm_set_intercept(svm, INTERCEPT_INTR);
--
2.39.1

2023-04-11 13:00:02

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug

SVM/SEV enable debug registers intercepts to skip swapping DRs
on entering/exiting the guest. When the guest is in control of
debug registers (vcpu->guest_debug == 0), there is an optimisation to
reduce the number of context switches: intercepts are cleared and
the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
on guest enter/exit.

The same code also executes for SEV-ES, however it has no effect as
- it always takes (vcpu->guest_debug == 0) branch;
- KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
- vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
with the values from _encrypted_ VMSA.

Be explicit about SEV-ES not supporting debug:
- return right away from dr_interception() and skip unnecessary processing;
- clear vcpu->guest_debug at SEV-ES' LAUNCH_UPDATE_VMSA if debugging
was already enabled; after that point the generic x86's
KVM_SET_GUEST_DEBUG ioctl disallows enabling debug.

Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
guest exit) to signify that SEV-ES won't hit that path.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
Changes:
v5:
* new in the series
---
arch/x86/kvm/svm/sev.c | 6 ++++++
arch/x86/kvm/svm/svm.c | 10 +++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0f4761a57d86..b4365622222b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -639,6 +639,12 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
return ret;

vcpu->arch.guest_state_protected = true;
+
+ if (vcpu->guest_debug)
+ pr_warn_ratelimited("guest_debug (%lx) not supported for SEV-ES",
+ vcpu->guest_debug);
+ vcpu->guest_debug = 0;
+
return 0;
}

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dc12de325cca..179952a31d3b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1980,7 +1980,7 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

- if (vcpu->arch.guest_state_protected)
+ if (WARN_ON_ONCE(sev_es_guest(vcpu->kvm)))
return;

get_debugreg(vcpu->arch.db[0], 0);
@@ -2698,6 +2698,14 @@ static int dr_interception(struct kvm_vcpu *vcpu)
unsigned long val;
int err = 0;

+ /*
+ * SEV-ES intercepts DR7 only to disable guest debugging
+ * and the guest issues a VMGEXIT for DR7 write only. KVM cannot
+ * change DR7 (always swapped as type 'A') so return early.
+ */
+ if (sev_es_guest(vcpu->kvm))
+ return 1;
+
if (vcpu->guest_debug == 0) {
/*
* No more DR vmexits; force a reload of the debug registers
--
2.39.1

2023-04-11 13:00:07

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts

Currently SVM setup is done sequentially in
init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.

Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.

No functional change intended.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Santosh Shukla <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
Changes:
v5:
* updated the comments
* removed sev_es_guest() checks from set_dr_intercepts()/clr_dr_intercepts()
* removed remaining intercepts from clr_dr_intercepts()
---
arch/x86/kvm/svm/sev.c | 11 ++++++
arch/x86/kvm/svm/svm.c | 37 ++++++++------------
2 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b4365622222b..f0885250252d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2946,6 +2946,7 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)

static void sev_es_init_vmcb(struct vcpu_svm *svm)
{
+ struct vmcb *vmcb = svm->vmcb01.ptr;
struct kvm_vcpu *vcpu = &svm->vcpu;

svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
@@ -2974,6 +2975,16 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
svm_set_intercept(svm, TRAP_CR4_WRITE);
svm_set_intercept(svm, TRAP_CR8_WRITE);

+ /*
+ * DR7 access must remain intercepted for an SEV-ES guest to disallow
+ * the guest kernel enable debugging as otherwise a VM writing to DR7
+ * from the #DB handler may trigger infinite loop of #DB's.
+ */
+ vmcb->control.intercepts[INTERCEPT_DR] = 0;
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+ recalc_intercepts(svm);
+
/* Can't intercept XSETBV, HV can't modify XCR0 directly */
svm_clr_intercept(svm, INTERCEPT_XSETBV);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 179952a31d3b..0271360e8fde 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -691,23 +691,20 @@ static void set_dr_intercepts(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb01.ptr;

- if (!sev_es_guest(svm->vcpu.kvm)) {
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
- }
-
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);

@@ -720,12 +717,6 @@ static void clr_dr_intercepts(struct vcpu_svm *svm)

vmcb->control.intercepts[INTERCEPT_DR] = 0;

- /* DR7 access must remain intercepted for an SEV-ES guest */
- if (sev_es_guest(svm->vcpu.kvm)) {
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
- }
-
recalc_intercepts(svm);
}

--
2.39.1

2023-04-11 13:03:21

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write

With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
receive a #VC for reads or writes of DR7.

Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
an SNP guest doesn't gracefully terminate during SNP feature negotiation
if MSR_AMD64_SEV_DEBUG_SWAP is enabled.

Since a guest is not expected to receive a #VC on DR7 accesses when
MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
handler in this situation.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Carlos Bilbao <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Reviewed-by: Pankaj Gupta <[email protected]>
---
Changes:
v4:
* rebased on top of SNP feature negotiation

v2:
* use new bit definition
---
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/kernel/sev.c | 6 ++++++
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 014b89c89088..f6123808be42 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -313,7 +313,7 @@ static void enforce_vmpl0(void)
* by the guest kernel. As and when a new feature is implemented in the
* guest kernel, a corresponding bit should be added to the mask.
*/
-#define SNP_FEATURES_PRESENT (0)
+#define SNP_FEATURES_PRESENT MSR_AMD64_SNP_DEBUG_SWAP

void snp_check_features(void)
{
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index b031244d6d2d..a515eb880970 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1620,6 +1620,9 @@ static enum es_result vc_handle_dr7_write(struct ghcb *ghcb,
long val, *reg = vc_insn_get_rm(ctxt);
enum es_result ret;

+ if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+ return ES_VMM_ERROR;
+
if (!reg)
return ES_DECODE_FAILED;

@@ -1657,6 +1660,9 @@ static enum es_result vc_handle_dr7_read(struct ghcb *ghcb,
struct sev_es_runtime_data *data = this_cpu_read(runtime_data);
long *reg = vc_insn_get_rm(ctxt);

+ if (sev_status & MSR_AMD64_SNP_DEBUG_SWAP)
+ return ES_VMM_ERROR;
+
if (!reg)
return ES_DECODE_FAILED;

--
2.39.1

2023-04-11 13:05:27

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

Prior to SEV-ES, KVM saved/restored host debug registers upon switching
to/from a VM. Changing those registers inside a running SEV VM
triggered #VMEXIT to KVM.

SEV-ES added encrypted state (ES) which uses an encrypted page
for the VM state (VMSA). The hardware saves/restores certain registers
on VMRUN/VMEXIT according to a swap type (A, B, C), see
"Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
volume 2.

The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
guests, but a new feature is available, identified via
CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
support for swapping additional debug registers. DR[0-3] and
DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
is set.

Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
("NoNestedDataBp", "Processor ignores nested data breakpoints") is
supported by the SOC as otherwise a malicious SEV-ES guest can set up
data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

Eliminate DR7 and #DB intercepts as:
- they are not needed when DebugSwap is supported;
- #VC for these intercepts is most likely not supported anyway and
kills the VM.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
Changes:
v8:
* added CPUID's DebugSwap feature
* commit log, comments updated
* redid the whole thing

v4:
* removed sev_es_is_debug_swap_enabled() helper
* made sev_es_debug_swap_enabled (module param) static
* set sev_feature early in sev_es_init_vmcb() and made intercepts
dependend on it vs. module param
* move set_/clr_dr_intercepts to .c

v3:
* rewrote the commit log again
* rebased on tip/master to use recently defined X86_FEATURE_NO_NESTED_DATA_BP
* s/boot_cpu_has/cpu_feature_enabled/

v2:
* debug_swap moved from vcpu to module_param
* rewrote commit log

---
Tested with:
===
int x;
int main(int argc, char *argv[])
{
x = 1;
return 0;
}
===
gcc -g a.c
rsync a.out ruby-954vm:~/
ssh -t ruby-954vm 'gdb -ex "file a.out" -ex "watch x" -ex r'

where ruby-954vm is a VM.

With "/sys/module/kvm_amd/parameters/debug_swap = 0", gdb does not stop
on the watchpoint, with "= 1" - gdb does.
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/kvm/svm/sev.c | 36 ++++++++++++++++++--
3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d9c190cdefa9..3a5eeb178778 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -435,6 +435,7 @@
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
#define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
+#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */

/* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
#define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 770dcf75eaa9..3a422d213010 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -280,6 +280,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
#define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
#define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL

+#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)

struct vmcb_seg {
u16 selector;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f0885250252d..ba12e7962e94 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
#include <asm/pkru.h>
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
+#include <asm/debugreg.h>

#include "mmu.h"
#include "x86.h"
@@ -53,9 +54,14 @@ module_param_named(sev, sev_enabled, bool, 0444);
/* enable/disable SEV-ES support */
static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
+
+/* enable/disable SEV-ES DebugSwap support */
+static bool sev_es_debug_swap_enabled = true;
+module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#else
#define sev_enabled false
#define sev_es_enabled false
+#define sev_es_debug_swap_enabled false
#endif /* CONFIG_KVM_AMD_SEV */

static u8 sev_enc_bit;
@@ -605,6 +611,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;

+ if (sev_es_debug_swap_enabled)
+ save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);

@@ -2256,6 +2265,9 @@ void __init sev_hardware_setup(void)
out:
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+ if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
+ !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
+ sev_es_debug_swap_enabled = false;
#endif
}

@@ -2976,14 +2988,20 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
svm_set_intercept(svm, TRAP_CR8_WRITE);

/*
+ * Unless DebugSwap (depends on X86_FEATURE_NO_NESTED_DATA_BP) is enabled,
* DR7 access must remain intercepted for an SEV-ES guest to disallow
* the guest kernel enable debugging as otherwise a VM writing to DR7
* from the #DB handler may trigger infinite loop of #DB's.
*/
vmcb->control.intercepts[INTERCEPT_DR] = 0;
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
- vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
- recalc_intercepts(svm);
+ if (sev_es_debug_swap_enabled) {
+ clr_exception_intercept(svm, DB_VECTOR);
+ /* clr_exception_intercept() called recalc_intercepts() */
+ } else {
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
+ vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
+ recalc_intercepts(svm);
+ }

/* Can't intercept XSETBV, HV can't modify XCR0 directly */
svm_clr_intercept(svm, INTERCEPT_XSETBV);
@@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)

/* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
hostsa->xss = host_xss;
+
+ /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
+ if (sev_es_debug_swap_enabled) {
+ hostsa->dr0 = native_get_debugreg(0);
+ hostsa->dr1 = native_get_debugreg(1);
+ hostsa->dr2 = native_get_debugreg(2);
+ hostsa->dr3 = native_get_debugreg(3);
+ hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
+ hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
+ hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
+ hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
+ }
}

void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
--
2.39.1

2023-04-20 01:50:51

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap

On 11/4/23 22:57, Alexey Kardashevskiy wrote:
> This is to use another AMD SEV-ES hardware assisted register swap,
> more detail in 5/6. In the process it's been suggested to fix other
> things, here is the attempt, with the great help of amders.
>
> The previous conversation is here:
> https://lore.kernel.org/r/[email protected]
>
> This is based on sha1
> f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
>
> Please comment. Thanks.

Ping?
Or should I relax until the end of the nearest merge window (May
6th-ish)? :) Thanks,


>
>
> Alexey Kardashevskiy (6):
> KVM: SEV: move set_dr_intercepts/clr_dr_intercepts from the header
> KVM: SEV: Move SEV's GP_VECTOR intercept setup to SEV
> KVM: SEV-ES: explicitly disable debug
> KVM: SVM/SEV/SEV-ES: Rework intercepts
> KVM: SEV: Enable data breakpoints in SEV-ES
> x86/sev: Do not handle #VC for DR7 read/write
>
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/svm.h | 1 +
> arch/x86/kvm/svm/svm.h | 42 ---------------
> arch/x86/boot/compressed/sev.c | 2 +-
> arch/x86/kernel/sev.c | 6 +++
> arch/x86/kvm/svm/sev.c | 54 +++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 48 +++++++++++++++--
> 7 files changed, 105 insertions(+), 49 deletions(-)
>

--
Alexey

2023-04-20 14:36:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap

On Thu, Apr 20, 2023, Alexey Kardashevskiy wrote:
> On 11/4/23 22:57, Alexey Kardashevskiy wrote:
> > This is to use another AMD SEV-ES hardware assisted register swap,
> > more detail in 5/6. In the process it's been suggested to fix other
> > things, here is the attempt, with the great help of amders.
> >
> > The previous conversation is here:
> > https://lore.kernel.org/r/[email protected]
> >
> > This is based on sha1
> > f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
> >
> > Please comment. Thanks.
>
> Ping?
> Or should I relax until the end of the nearest merge window (May 6th-ish)?
> :) Thanks,

Sorry, the answer is "relax". I'm likely going to be offline for a few days in
early May, so it might be more like May 15th until you hear from me, but this is
on my todo list.

2023-05-09 11:18:07

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VMEXIT to KVM.
>
> SEV-ES added encrypted state (ES) which uses an encrypted page
> for the VM state (VMSA). The hardware saves/restores certain registers
> on VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
>
> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> guests, but a new feature is available, identified via
> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> support for swapping additional debug registers. DR[0-3] and
> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> is set.
>
> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.

You mean #DB => #BP here?

> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
>
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;
> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>
> ---
> Changes:
> v8:
> * added CPUID's DebugSwap feature
> * commit log, comments updated
> * redid the whole thing
>
> v4:
> * removed sev_es_is_debug_swap_enabled() helper
> * made sev_es_debug_swap_enabled (module param) static
> * set sev_feature early in sev_es_init_vmcb() and made intercepts
> dependend on it vs. module param
> * move set_/clr_dr_intercepts to .c
>
> v3:
> * rewrote the commit log again
> * rebased on tip/master to use recently defined X86_FEATURE_NO_NESTED_DATA_BP
> * s/boot_cpu_has/cpu_feature_enabled/
>
> v2:
> * debug_swap moved from vcpu to module_param
> * rewrote commit log
>
> ---
> Tested with:
> ===
> int x;
> int main(int argc, char *argv[])
> {
> x = 1;
> return 0;
> }
> ===
> gcc -g a.c
> rsync a.out ruby-954vm:~/
> ssh -t ruby-954vm 'gdb -ex "file a.out" -ex "watch x" -ex r'
>
> where ruby-954vm is a VM.
>
> With "/sys/module/kvm_amd/parameters/debug_swap = 0", gdb does not stop
> on the watchpoint, with "= 1" - gdb does.
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/svm.h | 1 +
> arch/x86/kvm/svm/sev.c | 36 ++++++++++++++++++--
> 3 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d9c190cdefa9..3a5eeb178778 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -435,6 +435,7 @@
> #define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
> #define X86_FEATURE_V_TSC_AUX (19*32+ 9) /* "" Virtual TSC_AUX */
> #define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
> +#define X86_FEATURE_DEBUG_SWAP (19*32+14) /* AMD SEV-ES full debug state swap support */
>
> /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
> #define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 770dcf75eaa9..3a422d213010 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -280,6 +280,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
> #define AVIC_HPA_MASK ~((0xFFFULL << 52) | 0xFFF)
> #define VMCB_AVIC_APIC_BAR_MASK 0xFFFFFFFFFF000ULL
>
> +#define SVM_SEV_FEAT_DEBUG_SWAP BIT(5)
>
> struct vmcb_seg {
> u16 selector;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f0885250252d..ba12e7962e94 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -22,6 +22,7 @@
> #include <asm/pkru.h>
> #include <asm/trapnr.h>
> #include <asm/fpu/xcr.h>
> +#include <asm/debugreg.h>
>
> #include "mmu.h"
> #include "x86.h"
> @@ -53,9 +54,14 @@ module_param_named(sev, sev_enabled, bool, 0444);
> /* enable/disable SEV-ES support */
> static bool sev_es_enabled = true;
> module_param_named(sev_es, sev_es_enabled, bool, 0444);
> +
> +/* enable/disable SEV-ES DebugSwap support */
> +static bool sev_es_debug_swap_enabled = true;
> +module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
> #else
> #define sev_enabled false
> #define sev_es_enabled false
> +#define sev_es_debug_swap_enabled false
> #endif /* CONFIG_KVM_AMD_SEV */
>
> static u8 sev_enc_bit;
> @@ -605,6 +611,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
> save->xss = svm->vcpu.arch.ia32_xss;
> save->dr6 = svm->vcpu.arch.dr6;
>
> + if (sev_es_debug_swap_enabled)
> + save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
> +
> pr_debug("Virtual Machine Save Area (VMSA):\n");
> print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>
> @@ -2256,6 +2265,9 @@ void __init sev_hardware_setup(void)
> out:
> sev_enabled = sev_supported;
> sev_es_enabled = sev_es_supported;
> + if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_DEBUG_SWAP) ||
> + !cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
> + sev_es_debug_swap_enabled = false;
> #endif
> }
>
> @@ -2976,14 +2988,20 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> svm_set_intercept(svm, TRAP_CR8_WRITE);
>
> /*
> + * Unless DebugSwap (depends on X86_FEATURE_NO_NESTED_DATA_BP) is enabled,
> * DR7 access must remain intercepted for an SEV-ES guest to disallow
> * the guest kernel enable debugging as otherwise a VM writing to DR7
> * from the #DB handler may trigger infinite loop of #DB's.
> */
> vmcb->control.intercepts[INTERCEPT_DR] = 0;
> - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> - vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> - recalc_intercepts(svm);
> + if (sev_es_debug_swap_enabled) {
> + clr_exception_intercept(svm, DB_VECTOR);
> + /* clr_exception_intercept() called recalc_intercepts() */
> + } else {
> + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> + recalc_intercepts(svm);
> + }
>
> /* Can't intercept XSETBV, HV can't modify XCR0 directly */
> svm_clr_intercept(svm, INTERCEPT_XSETBV);
> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>
> /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> hostsa->xss = host_xss;
> +
> + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> + if (sev_es_debug_swap_enabled) {
> + hostsa->dr0 = native_get_debugreg(0);
> + hostsa->dr1 = native_get_debugreg(1);
> + hostsa->dr2 = native_get_debugreg(2);
> + hostsa->dr3 = native_get_debugreg(3);
> + hostsa->dr0_addr_mask = amd_get_dr_addr_mask(0);
> + hostsa->dr1_addr_mask = amd_get_dr_addr_mask(1);
> + hostsa->dr2_addr_mask = amd_get_dr_addr_mask(2);
> + hostsa->dr3_addr_mask = amd_get_dr_addr_mask(3);
> + }
> }
>
> void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)

2023-05-10 09:44:11

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES


>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
>
> You mean #DB => #BP here
Indeed its #DB. Was thinking something else.

Reviewed-by: Pankaj Gupta <[email protected]>

2023-05-19 00:44:01

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap

Hi Sean,

is that still on the list? Just checking :) Thanks,


On 21/4/23 00:32, Sean Christopherson wrote:
> On Thu, Apr 20, 2023, Alexey Kardashevskiy wrote:
>> On 11/4/23 22:57, Alexey Kardashevskiy wrote:
>>> This is to use another AMD SEV-ES hardware assisted register swap,
>>> more detail in 5/6. In the process it's been suggested to fix other
>>> things, here is the attempt, with the great help of amders.
>>>
>>> The previous conversation is here:
>>> https://lore.kernel.org/r/[email protected]
>>>
>>> This is based on sha1
>>> f91f9332d782 Ingo Molnar "Merge branch into tip/master: 'x86/tdx'".
>>>
>>> Please comment. Thanks.
>>
>> Ping?
>> Or should I relax until the end of the nearest merge window (May 6th-ish)?
>> :) Thanks,
>
> Sorry, the answer is "relax". I'm likely going to be offline for a few days in
> early May, so it might be more like May 15th until you hear from me, but this is
> on my todo list.

--
Alexey

2023-05-19 15:57:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 0/6] KVM: SEV: Enable AMD SEV-ES DebugSwap

On Fri, May 19, 2023, Alexey Kardashevskiy wrote:
> Hi Sean,
>
> is that still on the list? Just checking :) Thanks,

Yes, sorry for the long delay, I'm getting a late start on reviews this cycle for
a variety of reasons.

2023-05-22 23:03:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 4/6] KVM: SVM/SEV/SEV-ES: Rework intercepts

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> Currently SVM setup is done sequentially in
> init_vmcb() -> sev_init_vmcb() -> sev_es_init_vmcb()
> and tries keeping SVM/SEV/SEV-ES bits separated. One of the exceptions
> is DR intercepts which is for SEV-ES before sev_es_init_vmcb() runs.
>
> Move the SEV-ES intercept setup to sev_es_init_vmcb(). From now on
> set_dr_intercepts()/clr_dr_intercepts() handle SVM/SEV only.
>
> No functional change intended.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: Santosh Shukla <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>
> ---
> Changes:
> v5:
> * updated the comments
> * removed sev_es_guest() checks from set_dr_intercepts()/clr_dr_intercepts()
> * removed remaining intercepts from clr_dr_intercepts()
> ---
> arch/x86/kvm/svm/sev.c | 11 ++++++
> arch/x86/kvm/svm/svm.c | 37 ++++++++------------
> 2 files changed, 25 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index b4365622222b..f0885250252d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2946,6 +2946,7 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in)
>
> static void sev_es_init_vmcb(struct vcpu_svm *svm)
> {
> + struct vmcb *vmcb = svm->vmcb01.ptr;
> struct kvm_vcpu *vcpu = &svm->vcpu;
>
> svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE;
> @@ -2974,6 +2975,16 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> svm_set_intercept(svm, TRAP_CR4_WRITE);
> svm_set_intercept(svm, TRAP_CR8_WRITE);
>
> + /*
> + * DR7 access must remain intercepted for an SEV-ES guest to disallow
> + * the guest kernel enable debugging as otherwise a VM writing to DR7
> + * from the #DB handler may trigger infinite loop of #DB's.

This is wrong. The attack isn't writing DR7 in the #DB handler, it's setting up
a #DB on memory that's needed to vector a #DB, e.g. the stack, so that the _CPU_
itself gets stuck in an infinite #DB loop[*]. The guest software handler putting
itself into an infinite loop is a non-issue because it can be interrupted.

[*] https://bugzilla.redhat.com/show_bug.cgi?id=1278496

> + */
> + vmcb->control.intercepts[INTERCEPT_DR] = 0;
> + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
> + vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_WRITE);
> + recalc_intercepts(svm);

2023-05-22 23:05:15

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 3/6] KVM: SEV-ES: explicitly disable debug

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> SVM/SEV enable debug registers intercepts to skip swapping DRs
> on entering/exiting the guest. When the guest is in control of
> debug registers (vcpu->guest_debug == 0), there is an optimisation to
> reduce the number of context switches: intercepts are cleared and
> the KVM_DEBUGREG_WONT_EXIT flag is set to tell KVM to do swapping
> on guest enter/exit.
>
> The same code also executes for SEV-ES, however it has no effect as
> - it always takes (vcpu->guest_debug == 0) branch;
> - KVM_DEBUGREG_WONT_EXIT is set but DR7 intercept is not cleared;
> - vcpu_enter_guest() writes DRs but VMRUN for SEV-ES swaps them
> with the values from _encrypted_ VMSA.
>
> Be explicit about SEV-ES not supporting debug:
> - return right away from dr_interception() and skip unnecessary processing;
> - clear vcpu->guest_debug at SEV-ES' LAUNCH_UPDATE_VMSA if debugging
> was already enabled; after that point the generic x86's
> KVM_SET_GUEST_DEBUG ioctl disallows enabling debug.
>
> Add WARN_ON to kvm_x86::sync_dirty_debug_regs() (saves guest DRs on
> guest exit) to signify that SEV-ES won't hit that path.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>
> ---
> Changes:
> v5:
> * new in the series
> ---
> arch/x86/kvm/svm/sev.c | 6 ++++++
> arch/x86/kvm/svm/svm.c | 10 +++++++++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0f4761a57d86..b4365622222b 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -639,6 +639,12 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
> return ret;
>
> vcpu->arch.guest_state_protected = true;
> +
> + if (vcpu->guest_debug)
> + pr_warn_ratelimited("guest_debug (%lx) not supported for SEV-ES",

Note, this needs a newline in the printk, otherwise it'll get buffered until
the next non-cont printk comes along (guess how many times I've been burned by
this).

> + vcpu->guest_debug);
> + vcpu->guest_debug = 0;

Argh, KVM's APIs can be quite frustrating. IIUC, guest_debug can never actually
be consumed because, per Tom[*], "A guest can't run before the LAUNCH_UPDATE process
is complete". But because the fact that the VM is an SEV-ES is communicated to
KVM after KVM_CREATE_VM, userspace can do KVM_SET_GUEST_DEBUG before KVM_SEV_ES_INIT
and before KVM_SEV_LAUNCH_UPDATE_VMSA, and thus get KVM into a state where
guest_debug is non-zero for an SEV-ES guest. Blech.

Instead of a ratelimited warn, can KVM get away with simply rejecting
KVM_SEV_LAUNCH_UPDATE_VMSA if guest_debug is non-zero? That combo can't work,
so it's seems unlikely userspace is relying on being able to do KVM_SET_GUEST_DEBUG.

If we do "have" to keep this approach, I'm generally opposed to any kind of printk
in KVM, but this one does seem to be justified since the most likely scenario is
that there's a human interactively debugging the guest (or at least, trying to
debug the guest). But I would say explicitly call out the ioctl(), "guest_debug"
probably won't mean anything to a random user. And I vote to not print the value,
that implies that the specific value is unsupported, not that debug in general is
disallowed.

Something like this (if we have to)?

pr_warn_ratelimited("Suppressing KVM_SET_GUEST_DEBUG for SEV-ES guest\n"

[*] https://lore.kernel.org/all/[email protected]

> +
> return 0;
> }
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dc12de325cca..179952a31d3b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1980,7 +1980,7 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> - if (vcpu->arch.guest_state_protected)
> + if (WARN_ON_ONCE(sev_es_guest(vcpu->kvm)))
> return;
>
> get_debugreg(vcpu->arch.db[0], 0);
> @@ -2698,6 +2698,14 @@ static int dr_interception(struct kvm_vcpu *vcpu)
> unsigned long val;
> int err = 0;
>
> + /*
> + * SEV-ES intercepts DR7 only to disable guest debugging
> + * and the guest issues a VMGEXIT for DR7 write only. KVM cannot

Wrapping is a bit aggressive (wrap at 80, not earlier).

> + * change DR7 (always swapped as type 'A') so return early.
> + */
> + if (sev_es_guest(vcpu->kvm))
> + return 1;
> +
> if (vcpu->guest_debug == 0) {
> /*
> * No more DR vmexits; force a reload of the debug registers
> --
> 2.39.1
>

2023-05-23 01:01:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
> receive a #VC for reads or writes of DR7.
>
> Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
> an SNP guest doesn't gracefully terminate during SNP feature negotiation
> if MSR_AMD64_SEV_DEBUG_SWAP is enabled.
>
> Since a guest is not expected to receive a #VC on DR7 accesses when
> MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
> handler in this situation.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: Carlos Bilbao <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>
> Reviewed-by: Pankaj Gupta <[email protected]>
> ---
> Changes:
> v4:
> * rebased on top of SNP feature negotiation
>
> v2:
> * use new bit definition
> ---
> arch/x86/boot/compressed/sev.c | 2 +-
> arch/x86/kernel/sev.c | 6 ++++++
> 2 files changed, 7 insertions(+), 1 deletion(-)

Can you post this separately (or bribe Boris to grab it)? IIUC, this has no
dependency on the KVM enabling, i.e. can/should go through the tip tree without
waiting for the KVM patches to be applied.

2023-05-23 01:02:08

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> to/from a VM. Changing those registers inside a running SEV VM
> triggered #VMEXIT to KVM.

Please, please don't make it sound like some behavior is *the* one and only behavior.
*KVM* *chooses* to intercept debug register accesses. Though I would omit this
paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
a basic understanding of how KVM deals with DRs and #DBs.

> SEV-ES added encrypted state (ES) which uses an encrypted page
> for the VM state (VMSA). The hardware saves/restores certain registers
> on VMRUN/VMEXIT according to a swap type (A, B, C), see
> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> volume 2.
>
> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES

Please rewrite this to just state what the behavior is instead of "Type A" vs.
"Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
enough to justify obfuscating super simple concepts.

Actually, this feature really has nothing to do with Type A vs. Type B, since
that's purely about host state. I assume the switch from Type A to Type B is a
side effect, or an opportunistic optimization?

Regardless, something like this is a lot easier for a human to read. It's easy
enough to find DebugSwap in the APM (literally took me longer to find my copy of
the PDF).

Add support for "DebugSwap for SEV-ES guests", which provides support
for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
allows KVM to expose debug capabilities to SEV-ES guests. Without
DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
and KVM cannot manually context switch guest DRs due the VMSA being
encrypted.

Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
vectoring a #DB. Without NoNestedDataBp, a malicious guest can DoS
the host by putting the CPU into an infinite loop of vectoring #DBs
(see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Set the features bit in sev_es_sync_vmsa() which is the last point
when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
init happens) is called not only when VCPU is initialized but also on
intrahost migration when VMSA is encrypted.

> guests, but a new feature is available, identified via
> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> support for swapping additional debug registers. DR[0-3] and
> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> is set.
>
> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> supported by the SOC as otherwise a malicious SEV-ES guest can set up
> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
>
> Eliminate DR7 and #DB intercepts as:
> - they are not needed when DebugSwap is supported;

"not needed" isn't sufficient justification. KVM doesn't strictly need to do a
lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
when NoNestedDataBp is support. Presumably the answer is "because it's simpler
than toggling #DB interception for guest_debug.

Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
separate patch? KVM can still inject #DBs for SEV-ES guests, no?

As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.

> - #VC for these intercepts is most likely not supported anyway and
> kills the VM.

I don't see how this is relevant or helpful. What the guest may or may not do in
response to a #VC on a DR7 write has no bearing on KVM's behavior.

> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>
> ---

...

> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>
> /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> hostsa->xss = host_xss;
> +
> + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */

Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
Can you fold the below somewhere before this patch, and then tweak this comment
to say something like:

/*
* If DebugSwap is enabled, debug registers are loaded but NOT saved by
* the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
* saves and loads debug registers (Type-A).
*/

[*] https://lore.kernel.org/all/[email protected]/

--
From: Sean Christopherson <[email protected]>
Date: Mon, 22 May 2023 16:29:52 -0700
Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
about swap types

Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
needs to save a seemingly random subset of host state, and to provide a
decoder for the APM's Type-A/B/C terminology.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 69ae5e1b3120..1e0e9b08e491 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
{
/*
- * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
- * of which one step is to perform a VMLOAD. KVM performs the
- * corresponding VMSAVE in svm_prepare_guest_switch for both
- * traditional and SEV-ES guests.
+ * All host state for SEV-ES guests is categorized into three swap types
+ * based on how it is handled by hardware during a world switch:
+ *
+ * A: VMRUN: Host state saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * B: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state loaded from host save area
+ *
+ * C: VMRUN: Host state _NOT_ saved in host save area
+ * VMEXIT: Host state initialized to default(reset) values
+ *
+ * Manually save type-B state, i.e. state that is loaded by VMEXIT but
+ * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
+ * by common SVM code).
*/
-
- /* XCR0 is restored on VMEXIT, save the current host value */
hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
-
- /* PKRU is restored on VMEXIT, save the current host value */
hostsa->pkru = read_pkru();
-
- /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
hostsa->xss = host_xss;
}


base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
--

2023-05-23 11:59:05

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES



On 23/5/23 09:39, Sean Christopherson wrote:
> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
>> to/from a VM. Changing those registers inside a running SEV VM
>> triggered #VMEXIT to KVM.
>
> Please, please don't make it sound like some behavior is *the* one and only behavior.
> *KVM* *chooses* to intercept debug register accesses. Though I would omit this
> paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
> a basic understanding of how KVM deals with DRs and #DBs.

Out of curiosity - is ARM similar in this regard, interceptions and stuff?

>> SEV-ES added encrypted state (ES) which uses an encrypted page
>> for the VM state (VMSA). The hardware saves/restores certain registers
>> on VMRUN/VMEXIT according to a swap type (A, B, C), see
>> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
>> volume 2.
>>
>> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
>
> Please rewrite this to just state what the behavior is instead of "Type A" vs.
> "Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
> enough to justify obfuscating super simple concepts.
>
> Actually, this feature really has nothing to do with Type A vs. Type B, since
> that's purely about host state.

Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in
HOSTSA, then the host looses debug state because of the working feature.

> I assume the switch from Type A to Type B is a
> side effect, or an opportunistic optimization?

There is no switch. DR[67] were and are one type, and the other DRs were
not swapped and now are, but with a different Swap Type.

And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves
some explaining why is that.

> Regardless, something like this is a lot easier for a human to read. It's easy
> enough to find DebugSwap in the APM (literally took me longer to find my copy of
> the PDF).

It is also easy to find "Swap Types" in the APM...

> Add support for "DebugSwap for SEV-ES guests", which provides support
> for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
> allows KVM to expose debug capabilities to SEV-ES guests. Without
> DebugSwap support, the CPU doesn't save/load _guest_ debug registers,

But it does save/load DR6 and DR7.

> and KVM cannot manually context switch guest DRs due the VMSA being
> encrypted.
>
> Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
> which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
> vectoring a #DB.

(english question) What does "vectoring" mean here precisely? Handling?
Processing?

> Without NoNestedDataBp, a malicious guest can DoS
> the host by putting the CPU into an infinite loop of vectoring #DBs
> (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)

Good one, thanks for the link.

>
> Set the features bit in sev_es_sync_vmsa() which is the last point
> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> init happens) is called not only when VCPU is initialized but also on
> intrahost migration when VMSA is encrypted.
>
>> guests, but a new feature is available, identified via
>> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
>> support for swapping additional debug registers. DR[0-3] and
>> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
>> is set.
>>
>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
>> Set the features bit in sev_es_sync_vmsa() which is the last point
>> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>> init happens) is called not only when VCPU is initialized but also on
>> intrahost migration when VMSA is encrypted.
>>
>> Eliminate DR7 and #DB intercepts as:
>> - they are not needed when DebugSwap is supported;
>
> "not needed" isn't sufficient justification. KVM doesn't strictly need to do a
> lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
> is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
> this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
> when NoNestedDataBp is support. Presumably the answer is "because it's simpler
> than toggling #DB interception for guest_debug.

TBH I did not have a good answer but from the above I'd say we want to
disable #DB intercepts in a separate patch, just as you say below.

> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> separate patch? KVM can still inject #DBs for SEV-ES guests, no?

Sorry for my ignorance but what is the point of injecting #DB if there
is no way of changing the guest's DR7?


> As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
> and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
>
>> - #VC for these intercepts is most likely not supported anyway and
>> kills the VM.
>
> I don't see how this is relevant or helpful. What the guest may or may not do in
> response to a #VC on a DR7 write has no bearing on KVM's behavior.

Readers of the patch may wonder of the chances of breaks guests.
Definitely helps me to realize there is likely to be some sort of
panic() in the guest if such intercept happens.


>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> Reviewed-by: Tom Lendacky <[email protected]>
>> ---
>
> ...
>
>> @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
>>
>> /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
>> hostsa->xss = host_xss;
>> +
>> + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
>
> Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
> Can you fold the below somewhere before this patch, and then tweak this comment
> to say something like:
>
> /*
> * If DebugSwap is enabled, debug registers are loaded but NOT saved by
> * the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
> * saves and loads debug registers (Type-A).
> */

Sure but...

>
> [*] https://lore.kernel.org/all/[email protected]/
>
>
> From: Sean Christopherson <[email protected]>
> Date: Mon, 22 May 2023 16:29:52 -0700
> Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
> about swap types


... I am missing the point here. You already wrote the patch below which
1) you are happy about 2) you can push out right away to the upstream.
Are you suggesting that I repost it?


>
> Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
> swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
> needs to save a seemingly random subset of host state, and to provide a
> decoder for the APM's Type-A/B/C terminology.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 69ae5e1b3120..1e0e9b08e491 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
> void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> {
> /*
> - * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
> - * of which one step is to perform a VMLOAD. KVM performs the
> - * corresponding VMSAVE in svm_prepare_guest_switch for both
> - * traditional and SEV-ES guests.


When I see "traditional", I assume there was one single x86 KVM before
say 2010 which was slow, emulated a lot but worked exactly the same on
Intel and AMD. Which does not seem to ever be the case. May be say "SVM"
here?


> + * All host state for SEV-ES guests is categorized into three swap types
> + * based on how it is handled by hardware during a world switch:
> + *
> + * A: VMRUN: Host state saved in host save area
> + * VMEXIT: Host state loaded from host save area
> + *
> + * B: VMRUN: Host state _NOT_ saved in host save area
> + * VMEXIT: Host state loaded from host save area
> + *
> + * C: VMRUN: Host state _NOT_ saved in host save area
> + * VMEXIT: Host state initialized to default(reset) values
> + *
> + * Manually save type-B state, i.e. state that is loaded by VMEXIT but
> + * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
> + * by common SVM code).

Like here - "common SVM"?

Thanks for the comments!


> */
> -
> - /* XCR0 is restored on VMEXIT, save the current host value */
> hostsa->xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> -
> - /* PKRU is restored on VMEXIT, save the current host value */
> hostsa->pkru = read_pkru();
> -
> - /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> hostsa->xss = host_xss;
> }
>
>
> base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f

--
Alexey

2023-05-23 16:17:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>
>
> On 23/5/23 09:39, Sean Christopherson wrote:
> > On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
> > > Prior to SEV-ES, KVM saved/restored host debug registers upon switching
> > > to/from a VM. Changing those registers inside a running SEV VM
> > > triggered #VMEXIT to KVM.
> >
> > Please, please don't make it sound like some behavior is *the* one and only behavior.
> > *KVM* *chooses* to intercept debug register accesses. Though I would omit this
> > paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
> > a basic understanding of how KVM deals with DRs and #DBs.
>
> Out of curiosity - is ARM similar in this regard, interceptions and stuff?

Yes. The granularity of interceptions varies based on the architectural revision,
and presumably there are things that always trap. But the core concept of letting
software decide what to intercept is the same.

> > > SEV-ES added encrypted state (ES) which uses an encrypted page
> > > for the VM state (VMSA). The hardware saves/restores certain registers
> > > on VMRUN/VMEXIT according to a swap type (A, B, C), see
> > > "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
> > > volume 2.
> > >
> > > The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
> >
> > Please rewrite this to just state what the behavior is instead of "Type A" vs.
> > "Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
> > enough to justify obfuscating super simple concepts.
> >
> > Actually, this feature really has nothing to do with Type A vs. Type B, since
> > that's purely about host state.
>
> Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
> then the host looses debug state because of the working feature.
>
> > I assume the switch from Type A to Type B is a
> > side effect, or an opportunistic optimization?
>
> There is no switch. DR[67] were and are one type, and the other DRs were not
> swapped and now are, but with a different Swap Type.

Ah, this is what I missed.

> And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
> explaining why is that.
>
> > Regardless, something like this is a lot easier for a human to read. It's easy
> > enough to find DebugSwap in the APM (literally took me longer to find my copy of
> > the PDF).
>
> It is also easy to find "Swap Types" in the APM...

Redirecting readers to specs for gory details is fine. Redirecting for basic,
simple functionality is not. Maybe the swap types will someday be common knowledge
in KVM, and an explanation will no longer be necessary, but we are nowhere near
that point.

> > Add support for "DebugSwap for SEV-ES guests", which provides support
> > for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
> > allows KVM to expose debug capabilities to SEV-ES guests. Without
> > DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
>
> But it does save/load DR6 and DR7.
>
> > and KVM cannot manually context switch guest DRs due the VMSA being
> > encrypted.
> >
> > Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
> > which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
> > vectoring a #DB.
>
> (english question) What does "vectoring" mean here precisely? Handling?
> Processing?

It's not really an English thing. "Vectoring" is, or at least was, Intel terminology
for describing the flow from the initial detection of an exception to the exception's
delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
information on the stack, fetching the software exception handler, etc. Importantly,
it describes the period where there are no instructions retired and thus ucode doesn't
open event windows, i.e. where triggering another, non-contributory exception will
effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).

> > the host by putting the CPU into an infinite loop of vectoring #DBs
> > (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
>
> Good one, thanks for the link.
>
> >
> > Set the features bit in sev_es_sync_vmsa() which is the last point
> > when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> > init happens) is called not only when VCPU is initialized but also on
> > intrahost migration when VMSA is encrypted.
> >
> > > guests, but a new feature is available, identified via
> > > CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
> > > support for swapping additional debug registers. DR[0-3] and
> > > DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
> > > is set.
> > >
> > > Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
> > > ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
> > > supported by the SOC as otherwise a malicious SEV-ES guest can set up
> > > data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
> > > Set the features bit in sev_es_sync_vmsa() which is the last point
> > > when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
> > > init happens) is called not only when VCPU is initialized but also on
> > > intrahost migration when VMSA is encrypted.
> > >
> > > Eliminate DR7 and #DB intercepts as:
> > > - they are not needed when DebugSwap is supported;
> >
> > "not needed" isn't sufficient justification. KVM doesn't strictly need to do a
> > lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
> > is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
> > this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
> > when NoNestedDataBp is support. Presumably the answer is "because it's simpler
> > than toggling #DB interception for guest_debug.
>
> TBH I did not have a good answer but from the above I'd say we want to
> disable #DB intercepts in a separate patch, just as you say below.
>
> > Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> > separate patch? KVM can still inject #DBs for SEV-ES guests, no?
>
> Sorry for my ignorance but what is the point of injecting #DB if there is no
> way of changing the guest's DR7?

Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
"What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
with DebugSwap, there is no point, which is why I agree that KVM should disable
interception in that case. What I'm calling out is that disabling #Db interception
isn't _necessary_ for correctness (unless I'm missing something), which means that
it can and should go in a separate patch.

> > As for DR7, the real justification is that, as above, KVM can't modify guest DR7,
> > and intercepting DR7 would completely defeat the purpose of enabling DebugSwap.
> >
> > > - #VC for these intercepts is most likely not supported anyway and
> > > kills the VM.
> >
> > I don't see how this is relevant or helpful. What the guest may or may not do in
> > response to a #VC on a DR7 write has no bearing on KVM's behavior.
>
> Readers of the patch may wonder of the chances of breaks guests. Definitely
> helps me to realize there is likely to be some sort of panic() in the guest
> if such intercept happens.

But that's irrelevant. Intercepting DR7 writes will break the guest regardless
of how the guest deals with the #VC. If the guest eats the #VC and continues on,
the debug capabilities expected by the guest will still be missing, i.e. KVM has
broken the functionality of the guest. I am total ok if the changelog describes
the _possible_ scenarios (within reason), e.g. that the guest will either panic
on an unexpected #VC or lose debug capabilities that were promised. What I'm
objecting to is speculating on what the guest is _likely_ to do, and especially
using that speculation as justification for doing something in KVM.

> > > Signed-off-by: Alexey Kardashevskiy <[email protected]>
> > > Reviewed-by: Tom Lendacky <[email protected]>
> > > ---
> >
> > ...
> >
> > > @@ -3048,6 +3066,18 @@ void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > > /* MSR_IA32_XSS is restored on VMEXIT, save the currnet host value */
> > > hostsa->xss = host_xss;
> > > +
> > > + /* The DebugSwap SEV feature does Type B swaps of DR[0-3] */
> >
> > Since dangling a carrot didn't work[*], I'm going to resort to extortion :-)
> > Can you fold the below somewhere before this patch, and then tweak this comment
> > to say something like:
> >
> > /*
> > * If DebugSwap is enabled, debug registers are loaded but NOT saved by
> > * the CPU (Type-B). If DebugSwap is disabled/unsupported, the CPU both
> > * saves and loads debug registers (Type-A).
> > */
>
> Sure but...
>
> >
> > [*] https://lore.kernel.org/all/[email protected]/
> >
> >
> > From: Sean Christopherson <[email protected]>
> > Date: Mon, 22 May 2023 16:29:52 -0700
> > Subject: [PATCH] KVM: SVM: Rewrite sev_es_prepare_switch_to_guest()'s comment
> > about swap types
>
>
> ... I am missing the point here. You already wrote the patch below which 1)
> you are happy about 2) you can push out right away to the upstream. Are you
> suggesting that I repost it?

I am asking you to include it in your series because the comment I suggested above
(for DebugSwap) loosely depends on the revamped comment for sev_es_prepare_switch_to_guest()
as a whole. I would like to settle on the exact wording for all of the comments
in sev_es_prepare_switch_to_guest() in a single series instead of having disjoint
threads.

> > Rewrite the comment(s) in sev_es_prepare_switch_to_guest() to explain the
> > swap types employed by the CPU for SEV-ES guests, i.e. to explain why KVM
> > needs to save a seemingly random subset of host state, and to provide a
> > decoder for the APM's Type-A/B/C terminology.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/svm/sev.c | 25 +++++++++++++++----------
> > 1 file changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 69ae5e1b3120..1e0e9b08e491 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3017,19 +3017,24 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm)
> > void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa)
> > {
> > /*
> > - * As an SEV-ES guest, hardware will restore the host state on VMEXIT,
> > - * of which one step is to perform a VMLOAD. KVM performs the
> > - * corresponding VMSAVE in svm_prepare_guest_switch for both
> > - * traditional and SEV-ES guests.
>
>
> When I see "traditional", I assume there was one single x86 KVM before say
> 2010 which was slow, emulated a lot but worked exactly the same on Intel and
> AMD. Which does not seem to ever be the case. May be say "SVM" here?

This is the code being removed. I agree that "traditional" is ambiguous, which
is why I want to delete it :-)

> > + * All host state for SEV-ES guests is categorized into three swap types
> > + * based on how it is handled by hardware during a world switch:
> > + *
> > + * A: VMRUN: Host state saved in host save area
> > + * VMEXIT: Host state loaded from host save area
> > + *
> > + * B: VMRUN: Host state _NOT_ saved in host save area
> > + * VMEXIT: Host state loaded from host save area
> > + *
> > + * C: VMRUN: Host state _NOT_ saved in host save area
> > + * VMEXIT: Host state initialized to default(reset) values
> > + *
> > + * Manually save type-B state, i.e. state that is loaded by VMEXIT but
> > + * isn't saved by VMRUN, that isn't already saved by VMSAVE (performed
> > + * by common SVM code).

2023-05-24 06:39:22

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 6/6] x86/sev: Do not handle #VC for DR7 read/write



On 23/5/23 09:44, Sean Christopherson wrote:
> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>> With MSR_AMD64_SEV_DEBUG_SWAP enabled, the guest is not expected to
>> receive a #VC for reads or writes of DR7.
>>
>> Update the SNP_FEATURES_PRESENT mask with MSR_AMD64_SNP_DEBUG_SWAP so
>> an SNP guest doesn't gracefully terminate during SNP feature negotiation
>> if MSR_AMD64_SEV_DEBUG_SWAP is enabled.
>>
>> Since a guest is not expected to receive a #VC on DR7 accesses when
>> MSR_AMD64_SEV_DEBUG_SWAP is enabled, return an error from the #VC
>> handler in this situation.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> Reviewed-by: Carlos Bilbao <[email protected]>
>> Reviewed-by: Tom Lendacky <[email protected]>
>> Reviewed-by: Pankaj Gupta <[email protected]>
>> ---
>> Changes:
>> v4:
>> * rebased on top of SNP feature negotiation
>>
>> v2:
>> * use new bit definition
>> ---
>> arch/x86/boot/compressed/sev.c | 2 +-
>> arch/x86/kernel/sev.c | 6 ++++++
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> Can you post this separately (or bribe Boris to grab it)? IIUC, this has no
> dependency on the KVM enabling, i.e. can/should go through the tip tree without
> waiting for the KVM patches to be applied.

I definitely can, do you mind adding yours "rb"/"ab"? Thanks!


--
Alexey

2023-05-26 03:23:07

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES



On 24/5/23 01:44, Sean Christopherson wrote:
> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>
>>
>> On 23/5/23 09:39, Sean Christopherson wrote:
>>> On Tue, Apr 11, 2023, Alexey Kardashevskiy wrote:
>>>> Prior to SEV-ES, KVM saved/restored host debug registers upon switching
>>>> to/from a VM. Changing those registers inside a running SEV VM
>>>> triggered #VMEXIT to KVM.
>>>
>>> Please, please don't make it sound like some behavior is *the* one and only behavior.
>>> *KVM* *chooses* to intercept debug register accesses. Though I would omit this
>>> paragraph (and largely the next) entirely, IMO it's safe to assume the reader has
>>> a basic understanding of how KVM deals with DRs and #DBs.
>>
>> Out of curiosity - is ARM similar in this regard, interceptions and stuff?
>
> Yes. The granularity of interceptions varies based on the architectural revision,
> and presumably there are things that always trap. But the core concept of letting
> software decide what to intercept is the same.
>
>>>> SEV-ES added encrypted state (ES) which uses an encrypted page
>>>> for the VM state (VMSA). The hardware saves/restores certain registers
>>>> on VMRUN/VMEXIT according to a swap type (A, B, C), see
>>>> "Table B-3. Swap Types" in the AMD Architecture Programmer’s Manual
>>>> volume 2.
>>>>
>>>> The DR6 and DR7 registers have always been swapped as Type A for SEV-ES
>>>
>>> Please rewrite this to just state what the behavior is instead of "Type A" vs.
>>> "Type B". Outside of AMD, the "type a/b/c" stuff isn't anywhere near ubiquitous
>>> enough to justify obfuscating super simple concepts.
>>>
>>> Actually, this feature really has nothing to do with Type A vs. Type B, since
>>> that's purely about host state.
>>
>> Mmm. Nothing? If the feature is enabled and DR[0-3] are not saved in HOSTSA,
>> then the host looses debug state because of the working feature.
>>
>>> I assume the switch from Type A to Type B is a
>>> side effect, or an opportunistic optimization?
>>
>> There is no switch. DR[67] were and are one type, and the other DRs were not
>> swapped and now are, but with a different Swap Type.
>
> Ah, this is what I missed.
>
>> And the patch saves DR[0-3] in HOSTSA but not DR[67] and this deserves some
>> explaining why is that.
>>
>>> Regardless, something like this is a lot easier for a human to read. It's easy
>>> enough to find DebugSwap in the APM (literally took me longer to find my copy of
>>> the PDF).
>>
>> It is also easy to find "Swap Types" in the APM...
>
> Redirecting readers to specs for gory details is fine. Redirecting for basic,
> simple functionality is not. Maybe the swap types will someday be common knowledge
> in KVM, and an explanation will no longer be necessary, but we are nowhere near
> that point.
>
>>> Add support for "DebugSwap for SEV-ES guests", which provides support
>>> for swapping DR[0-3] and DR[0-3]_ADDR_MASK on VMRUN and VMEXIT, i.e.
>>> allows KVM to expose debug capabilities to SEV-ES guests. Without
>>> DebugSwap support, the CPU doesn't save/load _guest_ debug registers,
>>
>> But it does save/load DR6 and DR7.
>>
>>> and KVM cannot manually context switch guest DRs due the VMSA being
>>> encrypted.
>>>
>>> Enable DebugSwap if and only if the CPU also supports NoNestedDataBp,
>>> which causes the CPU to ignore nested #DBs, i.e. #DBs that occur when
>>> vectoring a #DB.
>>
>> (english question) What does "vectoring" mean here precisely? Handling?
>> Processing?
>
> It's not really an English thing. "Vectoring" is, or at least was, Intel terminology
> for describing the flow from the initial detection of an exception to the exception's
> delivery to software, e.g. covers the IDT lookup, any GDT/LDT lookups, pushing
> information on the stack, fetching the software exception handler, etc. Importantly,
> it describes the period where there are no instructions retired and thus ucode doesn't
> open event windows, i.e. where triggering another, non-contributory exception will
> effectively "hang" the CPU (at least on CPUs without Intel's "notify" VM-Exit support).
>
>>> the host by putting the CPU into an infinite loop of vectoring #DBs
>>> (see https://bugzilla.redhat.com/show_bug.cgi?id=1278496)
>>
>> Good one, thanks for the link.
>>
>>>
>>> Set the features bit in sev_es_sync_vmsa() which is the last point
>>> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>>> init happens) is called not only when VCPU is initialized but also on
>>> intrahost migration when VMSA is encrypted.
>>>
>>>> guests, but a new feature is available, identified via
>>>> CPUID Fn8000001F_EAX[14] "DebugSwap for SEV-ES guests", that provides
>>>> support for swapping additional debug registers. DR[0-3] and
>>>> DR[0-3]_ADDR_MASK are swapped as Type B when SEV_FEATURES[5] (DebugSwap)
>>>> is set.
>>>>
>>>> Enable DebugSwap for a VMSA but only do so if CPUID Fn80000021_EAX[0]
>>>> ("NoNestedDataBp", "Processor ignores nested data breakpoints") is
>>>> supported by the SOC as otherwise a malicious SEV-ES guest can set up
>>>> data breakpoints on the #DB IDT entry/stack and cause an infinite loop.
>>>> Set the features bit in sev_es_sync_vmsa() which is the last point
>>>> when VMSA is not encrypted yet as sev_(es_)init_vmcb() (where the most
>>>> init happens) is called not only when VCPU is initialized but also on
>>>> intrahost migration when VMSA is encrypted.
>>>>
>>>> Eliminate DR7 and #DB intercepts as:
>>>> - they are not needed when DebugSwap is supported;
>>>
>>> "not needed" isn't sufficient justification. KVM doesn't strictly need to do a
>>> lot of things, but does them anyways for a variety of reasons. E.g. #DB intercept
>>> is also not needed when NoNestedDataBp is supported and vcpu->guest_debug==0, i.e.
>>> this should clarify why KVM doesn't simply disable #DB intercepts for _all_ VMs
>>> when NoNestedDataBp is support. Presumably the answer is "because it's simpler
>>> than toggling #DB interception for guest_debug.
>>
>> TBH I did not have a good answer but from the above I'd say we want to
>> disable #DB intercepts in a separate patch, just as you say below.
>>
>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
>>> separate patch? KVM can still inject #DBs for SEV-ES guests, no?
>>
>> Sorry for my ignorance but what is the point of injecting #DB if there is no
>> way of changing the guest's DR7?
>
> Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
> "What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
> with DebugSwap, there is no point, which is why I agree that KVM should disable
> interception in that case. What I'm calling out is that disabling #Db interception
> isn't _necessary_ for correctness (unless I'm missing something), which means that
> it can and should go in a separate patch.


About this. Instead of sev_es_init_vmcb(), I can toggle the #DB
intercept when toggling guest_debug, see below. This
kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
guest_state_protected = true). Is there any downside?


diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index da28ed69bf4a..a7df5eb4ac00 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1951,9 +1951,15 @@ static void svm_update_exception_bitmap(struct
kvm_vcpu *vcpu)

clr_exception_intercept(svm, BP_VECTOR);

+ if (cpu_feature_enabled(X86_FEATURE_NO_NESTED_DATA_BP))
+ clr_exception_intercept(svm, DB_VECTOR);
+
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
set_exception_intercept(svm, BP_VECTOR);
+
+ if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
+ set_exception_intercept(svm, DB_VECTOR);
}
}




--
Alexey

2023-05-26 15:24:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
>
> On 24/5/23 01:44, Sean Christopherson wrote:
> > On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
> > > > Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
> > > > separate patch? KVM can still inject #DBs for SEV-ES guests, no?
> > >
> > > Sorry for my ignorance but what is the point of injecting #DB if there is no
> > > way of changing the guest's DR7?
> >
> > Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
> > "What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
> > with DebugSwap, there is no point, which is why I agree that KVM should disable
> > interception in that case. What I'm calling out is that disabling #Db interception
> > isn't _necessary_ for correctness (unless I'm missing something), which means that
> > it can and should go in a separate patch.
>
>
> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB intercept
> when toggling guest_debug, see below. This
> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
> guest_state_protected = true).

KVM also intercepts #DB when single-stepping over IRET to find an NMI window, so
you'd also have to factor in nmi_singlestep, and update svm_enable_nmi_window()
and disable_nmi_singlestep() to call svm_update_exception_bitmap().

> Is there any downside?

Complexity is the main one. The complexity is quite low, but the benefit to the
guest is likely even lower. A #DB in the guest isn't likely to be performance
sensitive. And on the flip side, opening an NMI window would be a tiny bit more
expensive, though I doubt that would be meaningful either.

All in all, I think it makes sense to just keep intercepting #DB for non-SEV-ES
guests.

Side topic, isn't there an existing bug regarding SEV-ES NMI windows? KVM can't
actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways. Blech,
and suppressing EFER.SVME in efer_trap() is a bit gross, but I suppose since the
GHCB doesn't allow for CLGI or STGI it's "fine".

E.g. shouldn't KVM do this?

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ca32389f3c36..4e4a49031efe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
return; /* IRET will cause a vm exit */

+ /*
+ * KVM can't single-step SEV-ES guests and instead assumes that IRET
+ * in the guest will always succeed, i.e. clears NMI masking on the
+ * next VM-Exit. Note, GIF is guaranteed to be '1' for SEV-ES guests
+ * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
+ * EFER.SVME for good measure, see efer_trap()).
+ */
+ if (sev_es_guest(vcpu->kvm))
+ return;
+
if (!gif_set(svm)) {
if (vgif)
svm_set_intercept(svm, INTERCEPT_STGI);

2023-05-30 09:08:53

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES



On 27/5/23 00:39, Sean Christopherson wrote:
> On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
>>
>> On 24/5/23 01:44, Sean Christopherson wrote:
>>> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES guests be a
>>>>> separate patch? KVM can still inject #DBs for SEV-ES guests, no?
>>>>
>>>> Sorry for my ignorance but what is the point of injecting #DB if there is no
>>>> way of changing the guest's DR7?
>>>
>>> Well, _injecting_ the #DB is necessary for correctness from the guest's perspective.
>>> "What's the point of _intercepting_ #DB" is the real question. And for SEV-ES guests
>>> with DebugSwap, there is no point, which is why I agree that KVM should disable
>>> interception in that case. What I'm calling out is that disabling #Db interception
>>> isn't _necessary_ for correctness (unless I'm missing something), which means that
>>> it can and should go in a separate patch.
>>
>>
>> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB intercept
>> when toggling guest_debug, see below. This
>> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
>> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
>> guest_state_protected = true).
>
> KVM also intercepts #DB when single-stepping over IRET to find an NMI window, so
> you'd also have to factor in nmi_singlestep, and update svm_enable_nmi_window()
> and disable_nmi_singlestep() to call svm_update_exception_bitmap().

Uff. New can of worms for me :-/


>> Is there any downside?
>
> Complexity is the main one. The complexity is quite low, but the benefit to the
> guest is likely even lower. A #DB in the guest isn't likely to be performance
> sensitive. And on the flip side, opening an NMI window would be a tiny bit more
> expensive, though I doubt that would be meaningful either.
>
> All in all, I think it makes sense to just keep intercepting #DB for non-SEV-ES
> guests.
>
> Side topic, isn't there an existing bug regarding SEV-ES NMI windows? KVM can't
> actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways.

Why is it a "bug" and what does the patch fix? Sound to me as it is
pointless and the guest won't do single stepping and instead will run
till it exits somehow, what do I miss?

> Blech,
> and suppressing EFER.SVME in efer_trap() is a bit gross,

Why suppressed? svm_set_efer() sets it eventually anyway.

> but I suppose since the
> GHCB doesn't allow for CLGI or STGI it's "fine".

GHCB does not mention this, instead these are always intercepted in
init_vmcb().

> E.g. shouldn't KVM do this?

It sure can and I am happy to include this into the series, the commit
log is what I am struggling with :)

>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ca32389f3c36..4e4a49031efe 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
> return; /* IRET will cause a vm exit */
>
> + /*
> + * KVM can't single-step SEV-ES guests and instead assumes that IRET
> + * in the guest will always succeed,

It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET):

case SVM_VMGEXIT_NMI_COMPLETE:
ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
break;


> i.e. clears NMI masking on the
> + * next VM-Exit. Note, GIF is guaranteed to be '1' for SEV-ES guests
> + * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
> + * EFER.SVME for good measure, see efer_trap()).

SVM KVM seems to not enforce EFER.SVME, the guest does what it wants and
KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM.
I must be missing the point here...


> + */
> + if (sev_es_guest(vcpu->kvm))
> + return;
> +
> if (!gif_set(svm)) {
> if (vgif)
> svm_set_intercept(svm, INTERCEPT_STGI);

--
Alexey

2023-06-01 23:40:56

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

Sean, ping?

I wonder if this sev-es-not-singlestepping is a showstopper or it is
alright to repost this patchset without it? Thanks,


On 30/5/23 18:57, Alexey Kardashevskiy wrote:
>
>
> On 27/5/23 00:39, Sean Christopherson wrote:
>> On Fri, May 26, 2023, Alexey Kardashevskiy wrote:
>>>
>>> On 24/5/23 01:44, Sean Christopherson wrote:
>>>> On Tue, May 23, 2023, Alexey Kardashevskiy wrote:
>>>>>> Actually, can't disabling #DB interception for DebugSwap SEV-ES
>>>>>> guests be a
>>>>>> separate patch?  KVM can still inject #DBs for SEV-ES guests, no?
>>>>>
>>>>> Sorry for my ignorance but what is the point of injecting #DB if
>>>>> there is no
>>>>> way of changing the guest's DR7?
>>>>
>>>> Well, _injecting_ the #DB is necessary for correctness from the
>>>> guest's perspective.
>>>> "What's the point of _intercepting_ #DB" is the real question.  And
>>>> for SEV-ES guests
>>>> with DebugSwap, there is no point, which is why I agree that KVM
>>>> should disable
>>>> interception in that case.  What I'm calling out is that disabling
>>>> #Db interception
>>>> isn't _necessary_ for correctness (unless I'm missing something),
>>>> which means that
>>>> it can and should go in a separate patch.
>>>
>>>
>>> About this. Instead of sev_es_init_vmcb(), I can toggle the #DB
>>> intercept
>>> when toggling guest_debug, see below. This
>>> kvm_x86_ops::update_exception_bitmap hook is called on vcpu reset and
>>> kvm_arch_vcpu_ioctl_set_guest_debug (which skips this call if
>>> guest_state_protected = true).
>>
>> KVM also intercepts #DB when single-stepping over IRET to find an NMI
>> window, so
>> you'd also have to factor in nmi_singlestep, and update
>> svm_enable_nmi_window()
>> and disable_nmi_singlestep() to call svm_update_exception_bitmap().
>
> Uff. New can of worms for me :-/
>
>
>>> Is there any downside?
>>
>> Complexity is the main one.  The complexity is quite low, but the
>> benefit to the
>> guest is likely even lower.  A #DB in the guest isn't likely to be
>> performance
>> sensitive.  And on the flip side, opening an NMI window would be a
>> tiny bit more
>> expensive, though I doubt that would be meaningful either.
>>
>> All in all, I think it makes sense to just keep intercepting #DB for
>> non-SEV-ES
>> guests.
>>
>> Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
>> KVM can't
>> actually single-step an SEV-ES guest, but tries to set RFLAGS.TF anyways.
>
> Why is it a "bug" and what does the patch fix? Sound to me as it is
> pointless and the guest won't do single stepping and instead will run
> till it exits somehow, what do I miss?
>
>> Blech,
>> and suppressing EFER.SVME in efer_trap() is a bit gross,
>
> Why suppressed? svm_set_efer() sets it eventually anyway.
>
>> but I suppose since the
>> GHCB doesn't allow for CLGI or STGI it's "fine".
>
> GHCB does not mention this, instead these are always intercepted in
> init_vmcb().
>
>> E.g. shouldn't KVM do this?
>
> It sure can and I am happy to include this into the series, the commit
> log is what I am struggling with :)
>
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index ca32389f3c36..4e4a49031efe 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
>> kvm_vcpu *vcpu)
>>          if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>>                  return; /* IRET will cause a vm exit */
>> +       /*
>> +        * KVM can't single-step SEV-ES guests and instead assumes
>> that IRET
>> +        * in the guest will always succeed,
>
> It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET):
>
>         case SVM_VMGEXIT_NMI_COMPLETE:
>                 ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>                 break;
>
>
>> i.e. clears NMI masking on the
>> +        * next VM-Exit.  Note, GIF is guaranteed to be '1' for SEV-ES
>> guests
>> +        * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
>> +        * EFER.SVME for good measure, see efer_trap()).
>
> SVM KVM seems to not enforce EFER.SVME, the guest does what it wants and
> KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
> SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM.
> I must be missing the point here...
>
>
>> +        */
>> +       if (sev_es_guest(vcpu->kvm))
>> +               return;
>> +
>>          if (!gif_set(svm)) {
>>                  if (vgif)
>>                          svm_set_intercept(svm, INTERCEPT_STGI);
>

--
Alexey

2023-06-13 23:40:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
> Sean, ping?
>=20
> I wonder if this sev-es-not-singlestepping is a showstopper or it is alri=
ght
> to repost this patchset without it? Thanks,

Ah, shoot, I completely lost this in my inbox. Sorry :-/

> > > Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
> > > KVM can't actually single-step an SEV-ES guest, but tries to set
> > > RFLAGS.TF anyways.
> >=20
> > Why is it a "bug" and what does the patch fix? Sound to me as it is
> > pointless and the guest won't do single stepping and instead will run
> > till it exits somehow, what do I miss?

The bug is benign in the end, but it's still a bug. I'm not worried about =
fixing
any behavior, but I dislike having dead, misleading code, especially for so=
mething
like this where both NMI virtualization and SEV-ES are already crazy comple=
x and
subtle. I think it's safe to say that I've spent more time digging through=
SEV-ES
and NMI virtualization than most KVM developers, and as evidenced by the nu=
mber of
things I got wrong below, I'm still struggling to keep track of the bigger =
picture.
Developers that are new to all of this need as much help as they can get.

> > > Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
> >=20
> > Why suppressed? svm_set_efer() sets it eventually anyway.

svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
hat's
stored in vcpu->arch.efer doesn't have SVME set. E.g. from the guest's per=
spective,
EFER.SVME will have "Reserved Read As Zero" semantics.

> > > but I suppose since the GHCB doesn't allow for CLGI or STGI it's "fin=
e".
> >=20
> > GHCB does not mention this, instead these are always intercepted in
> > init_vmcb().

Right, I'm calling out that the absense of protocol support for requesting =
CLGI
or STGI emulation means dropping the guest's EFER.SVME is ok (though gross =
:-) ).

> > > E.g. shouldn't KVM do this?
> >=20
> > It sure can and I am happy to include this into the series, the commit
> > log is what I am struggling with :)
> >=20
> > >=20
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index ca32389f3c36..4e4a49031efe 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
> > > kvm_vcpu *vcpu)
> > > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
> > > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=
return; /* IRET will cause a vm exit */
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD /*
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * KV=
M can't single-step SEV-ES guests and instead assumes
> > > that IRET
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * in=
the guest will always succeed,
> >=20
> > It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET=
):
> >=20
> > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD case S=
VM_VMGEXIT_NMI_COMPLETE:
> > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D =
svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
> > =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD break;

Ah, right, better to say that the guest is responsible for signaling that i=
t's
ready to accept NMIs, which KVM handles by "emulating" IRET.

> > > i.e. clears NMI masking on the
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * ne=
xt VM-Exit.=EF=BF=BD Note, GIF is guaranteed to be '1' for
> > > SEV-ES guests
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * as=
the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
> > > +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * EF=
ER.SVME for good measure, see efer_trap()).
> >=20
> > SVM KVM seems to not enforce EFER.SVME, the guest does what it wants an=
d
> > KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
> > SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM=
.

Ahhh, that blurb in the APM is what I'm missing.

Actually, there's a real bug here. KVM doesn't immediately unmask NMIs in =
response
to NMI_COMPLETE, and instead goes through the whole awaiting_iret_completio=
n =3D>
svm_complete_interrupts(), which means that KVM doesn't unmask NMIs until t=
he
*next* VM-Exit. Theoretically, that could be never, e.g. if the host is ti=
ckless
and the guest is configured to busy wait idle CPUs.

Attached patches are compile tested only.


Attachments:
(No filename) (4.88 kB)
3D"0001-KVM-SVM-Don-t-defer-NMI-unblocking-until-next-exit-f.patc= (3.02 kB)
3D"0002-KVM-SVM-Don-t-try-to-pointlessly-single-step-SEV-ES-.patc= (1.83 kB)
Download all attachments

2023-06-14 04:12:19

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On 14/6/23 09:19, Sean Christopherson wrote:
> On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
>> Sean, ping?
>> =20
>> I wonder if this sev-es-not-singlestepping is a showstopper or it is alri=
> ght
>> to repost this patchset without it? Thanks,
>
> Ah, shoot, I completely lost this in my inbox. Sorry :-/

I saw the "OOO" message the other day and relaxed :)


>>>> Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
>>>> KVM can't actually single-step an SEV-ES guest, but tries to set
>>>> RFLAGS.TF anyways.
>>> =20
>>> Why is it a "bug" and what does the patch fix? Sound to me as it is
>>> pointless and the guest won't do single stepping and instead will run
>>> till it exits somehow, what do I miss?
>
> The bug is benign in the end, but it's still a bug. I'm not worried about =


(unrelated) Your response's encoding broke somehow and I wonder if this
is something I did or you did. Lore got it too:

https://lore.kernel.org/all/[email protected]/


> fixing
> any behavior, but I dislike having dead, misleading code, especially for so=
> mething
> like this where both NMI virtualization and SEV-ES are already crazy comple=
> x and
> subtle. I think it's safe to say that I've spent more time digging through=
> SEV-ES
> and NMI virtualization than most KVM developers, and as evidenced by the nu=
> mber of
> things I got wrong below, I'm still struggling to keep track of the bigger =
> picture.
> Developers that are new to all of this need as much help as they can get.
>
>>>> Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
>>> =20
>>> Why suppressed? svm_set_efer() sets it eventually anyway.
>
> svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
> hat's
> stored in vcpu->arch.efer doesn't have SVME set. E.g. from the guest's per=
> spective,
> EFER.SVME will have "Reserved Read As Zero" semantics.

It is not zero, why? From inside the guest, rdmsrl(MSR_EFER, efer) reads
0x1d01 from that msr where 0x1000==(1<<_EFER_SVME), _EFER_SVME==12.


>
>>>> but I suppose since the GHCB doesn't allow for CLGI or STGI it's "fin=
> e".
>>> =20
>>> GHCB does not mention this, instead these are always intercepted in
>>> init_vmcb().
>
> Right, I'm calling out that the absense of protocol support for requesting =
> CLGI
> or STGI emulation means dropping the guest's EFER.SVME is ok (though gross =
> :-) ).
>
>>>> E.g. shouldn't KVM do this?
>>> =20
>>> It sure can and I am happy to include this into the series, the commit
>>> log is what I am struggling with :)
>>> =20
>>>> =20
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index ca32389f3c36..4e4a49031efe 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
>>>> kvm_vcpu *vcpu)
>>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
>>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=
> return; /* IRET will cause a vm exit */
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD /*
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * KV=
> M can't single-step SEV-ES guests and instead assumes
>>>> that IRET
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * in=
> the guest will always succeed,
>>> =20
>>> It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET=
> ):
>>> =20
>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD case S=
> VM_VMGEXIT_NMI_COMPLETE:
>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D =
> svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
>>> =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=
> =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD break;
>
> Ah, right, better to say that the guest is responsible for signaling that i=
> t's
> ready to accept NMIs, which KVM handles by "emulating" IRET.
>
>>>> i.e. clears NMI masking on the
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * ne=
> xt VM-Exit.=EF=BF=BD Note, GIF is guaranteed to be '1' for
>>>> SEV-ES guests
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * as=
> the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
>>>> +=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD * EF=
> ER.SVME for good measure, see efer_trap()).
>>> =20
>>> SVM KVM seems to not enforce EFER.SVME, the guest does what it wants an=
> d
>>> KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
>>> SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM=
> .
>
> Ahhh, that blurb in the APM is what I'm missing.
>
> Actually, there's a real bug here. KVM doesn't immediately unmask NMIs in =
> response
> to NMI_COMPLETE, and instead goes through the whole awaiting_iret_completio=
> n =3D>
> svm_complete_interrupts(), which means that KVM doesn't unmask NMIs until t=
> he
> *next* VM-Exit. Theoretically, that could be never, e.g. if the host is ti=
> ckless
> and the guest is configured to busy wait idle CPUs.
>
> Attached patches are compile tested only.

Well, NMIs still get injected from QEMU so I guess it is a pass? Thanks,

--
Alexey


2023-06-14 22:58:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH kernel v5 5/6] KVM: SEV: Enable data breakpoints in SEV-ES

On Wed, Jun 14, 2023, Alexey Kardashevskiy wrote:
> On 14/6/23 09:19, Sean Christopherson wrote:
> > On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
> > > > > Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
> > > > > KVM can't actually single-step an SEV-ES guest, but tries to set
> > > > > RFLAGS.TF anyways.
> > > > =20
> > > > Why is it a "bug" and what does the patch fix? Sound to me as it is
> > > > pointless and the guest won't do single stepping and instead will run
> > > > till it exits somehow, what do I miss?
> >
> > The bug is benign in the end, but it's still a bug. I'm not worried about =
>
>
> (unrelated) Your response's encoding broke somehow and I wonder if this is
> something I did or you did. Lore got it too:
>
> https://lore.kernel.org/all/[email protected]/

Huh. Guessing something I did, but I've no idea what caused it.

> > fixing
> > any behavior, but I dislike having dead, misleading code, especially for so=
> > mething
> > like this where both NMI virtualization and SEV-ES are already crazy comple=
> > x and
> > subtle. I think it's safe to say that I've spent more time digging through=
> > SEV-ES
> > and NMI virtualization than most KVM developers, and as evidenced by the nu=
> > mber of
> > things I got wrong below, I'm still struggling to keep track of the bigger =
> > picture.
> > Developers that are new to all of this need as much help as they can get.
> >
> > > > > Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
> > > > =20
> > > > Why suppressed? svm_set_efer() sets it eventually anyway.
> >
> > svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value t=
> > hat's
> > stored in vcpu->arch.efer doesn't have SVME set. E.g. from the guest's per=
> > spective,
> > EFER.SVME will have "Reserved Read As Zero" semantics.
>
> It is not zero, why? From inside the guest, rdmsrl(MSR_EFER, efer) reads
> 0x1d01 from that msr where 0x1000==(1<<_EFER_SVME), _EFER_SVME==12.

Oh, lame. So the guest gets to see the raw value in the VMSA. So it really comes
down to the GHCB not providing support for STGI/CLGI.