2021-06-28 10:46:29

by Vitaly Kuznetsov

Subject: [PATCH 0/6] KVM: nSVM: Fix issues when SMM is entered from L2

This is a continuation of "[PATCH RFC] KVM: nSVM: Fix L1 state corruption
upon return from SMM".

VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the
nested L2 guest") broke return from SMM when we entered there from guest
(L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
appears to be that VMCB01 gets irreversibly destroyed during SMM execution.
Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state
was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from
VMCB02.

While writing a selftest for the issue, I've noticed that 'svm->nested.ctl'
doesn't get restored after a KVM_GET_NESTED_STATE/KVM_SET_NESTED_STATE cycle
when the guest happens to be in SMM triggered from L2. "KVM: nSVM: Restore
nested control upon leaving SMM" aims to fix that.
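
For reference, the save/restore cycle in question is just the standard one a
VMM performs upon migration. A minimal userspace sketch (the ioctl names are
real; the buffer size and error handling are simplified, and a real VMM would
query KVM_CAP_NESTED_STATE for the required size first):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative size only; query KVM_CAP_NESTED_STATE in real code. */
#define NESTED_STATE_BUF_SZ	(16 * 1024)

static int save_restore_nested_state(int vcpu_fd)
{
	char buf[NESTED_STATE_BUF_SZ];
	struct kvm_nested_state *state = (struct kvm_nested_state *)buf;

	memset(buf, 0, sizeof(buf));
	state->size = sizeof(buf);

	/* Saved while the vCPU is in SMM entered from L2... */
	if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0)
		return -1;

	/*
	 * ...and restored on the destination. Before this series,
	 * svm->nested.ctl did not get rebuilt on this path when the vCPU
	 * was in SMM entered from L2, so the next
	 * nested_vmcb_check_controls() was doomed to fail.
	 */
	return ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state);
}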

The first two patches of the series add missing sanity checks for
MSR_VM_HSAVE_PA, which has to be page-aligned and not zero.
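
For context, the value being checked is the one an L1 hypervisor is supposed
to program before its first VMRUN. A minimal guest-side sketch (the MSR
number is the architectural one, everything else is purely illustrative):

#include <stdint.h>

#define MSR_VM_HSAVE_PA	0xc0010117	/* architectural host-save-area MSR */

static inline void wrmsr(uint32_t msr, uint64_t val)
{
	asm volatile("wrmsr" ::
		     "c"(msr), "a"((uint32_t)val), "d"((uint32_t)(val >> 32)));
}

/*
 * An L1 hypervisor hands KVM a page-aligned, non-zero GPA before VMRUN.
 * With patch 1 a misaligned write raises #GP(0); with patch 2 a VMRUN
 * attempted while the MSR is still 0 raises #GP(0) as well.
 */
static void l1_setup_hsave(uint64_t hsave_page_gpa)
{
	wrmsr(MSR_VM_HSAVE_PA, hsave_page_gpa);
}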

Vitaly Kuznetsov (6):
KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA
KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN
KVM: nSVM: Introduce svm_copy_nonvmloadsave_state()
KVM: nSVM: Fix L1 state corruption upon return from SMM
KVM: nSVM: Restore nested control upon leaving SMM
KVM: selftests: smm_test: Test SMM enter from L2

arch/x86/kvm/svm/nested.c | 45 +++++++-----
arch/x86/kvm/svm/svm.c | 51 +++++++++++++-
arch/x86/kvm/svm/svm.h | 4 ++
tools/testing/selftests/kvm/x86_64/smm_test.c | 70 +++++++++++++++++--
4 files changed, 144 insertions(+), 26 deletions(-)

--
2.31.1


2021-06-28 10:46:51

by Vitaly Kuznetsov

Subject: [PATCH 1/6] KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA

The APM states that #GP is raised upon a write to MSR_VM_HSAVE_PA when
the supplied address is not page-aligned or is outside of the "maximum
supported physical address for this implementation".
The page_address_valid() check seems suitable. Also, forcefully page-align
the address when it's written by the VMM.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/svm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8834822c00cd..b6f85fd19f96 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2941,7 +2941,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm_disable_lbrv(vcpu);
break;
case MSR_VM_HSAVE_PA:
- svm->nested.hsave_msr = data;
+ if (!msr->host_initiated && !page_address_valid(vcpu, data))
+ return 1;
+
+ svm->nested.hsave_msr = data & PAGE_MASK;
break;
case MSR_VM_CR:
return svm_set_vm_cr(vcpu, data);
--
2.31.1

2021-06-28 10:47:02

by Vitaly Kuznetsov

Subject: [PATCH 2/6] KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN

APM states that "The address written to the VM_HSAVE_PA MSR, which holds
the address of the page used to save the host state on a VMRUN, must point
to a hypervisor-owned page. If this check fails, the WRMSR will fail with
a #GP(0) exception. Note that a value of 0 is not considered valid for the
VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
fail with a #GP(0) exception."

svm_set_msr() already checks that the supplied address is valid, so only the
check for '0' is missing. Add it to nested_svm_vmrun().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/nested.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 21d03e3a5dfd..1c6b0698b52e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -618,6 +618,11 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
struct kvm_host_map map;
u64 vmcb12_gpa;

+ if (!svm->nested.hsave_msr) {
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
if (is_smm(vcpu)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
--
2.31.1

2021-06-28 10:47:36

by Vitaly Kuznetsov

Subject: [PATCH 3/6] KVM: nSVM: Introduce svm_copy_nonvmloadsave_state()

Separate the code setting non-VMLOAD-VMSAVE state from
svm_set_nested_state() into its own function. This is going to be
re-used from svm_enter_smm()/svm_leave_smm().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/nested.c | 36 +++++++++++++++++++++---------------
arch/x86/kvm/svm/svm.h | 2 ++
2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1c6b0698b52e..a1dec2c40181 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -697,6 +697,26 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return ret;
}

+void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
+ struct vmcb_save_area *to_save)
+{
+ to_save->es = from_save->es;
+ to_save->cs = from_save->cs;
+ to_save->ss = from_save->ss;
+ to_save->ds = from_save->ds;
+ to_save->gdtr = from_save->gdtr;
+ to_save->idtr = from_save->idtr;
+ to_save->rflags = from_save->rflags | X86_EFLAGS_FIXED;
+ to_save->efer = from_save->efer;
+ to_save->cr0 = from_save->cr0;
+ to_save->cr3 = from_save->cr3;
+ to_save->cr4 = from_save->cr4;
+ to_save->rax = from_save->rax;
+ to_save->rsp = from_save->rsp;
+ to_save->rip = from_save->rip;
+ to_save->cpl = 0;
+}
+
void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
{
to_vmcb->save.fs = from_vmcb->save.fs;
@@ -1360,21 +1380,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,

svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;

- svm->vmcb01.ptr->save.es = save->es;
- svm->vmcb01.ptr->save.cs = save->cs;
- svm->vmcb01.ptr->save.ss = save->ss;
- svm->vmcb01.ptr->save.ds = save->ds;
- svm->vmcb01.ptr->save.gdtr = save->gdtr;
- svm->vmcb01.ptr->save.idtr = save->idtr;
- svm->vmcb01.ptr->save.rflags = save->rflags | X86_EFLAGS_FIXED;
- svm->vmcb01.ptr->save.efer = save->efer;
- svm->vmcb01.ptr->save.cr0 = save->cr0;
- svm->vmcb01.ptr->save.cr3 = save->cr3;
- svm->vmcb01.ptr->save.cr4 = save->cr4;
- svm->vmcb01.ptr->save.rax = save->rax;
- svm->vmcb01.ptr->save.rsp = save->rsp;
- svm->vmcb01.ptr->save.rip = save->rip;
- svm->vmcb01.ptr->save.cpl = 0;
+ svm_copy_nonvmloadsave_state(save, &svm->vmcb01.ptr->save);

nested_load_control_from_vmcb12(svm, ctl);

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f89b623bb591..ff2dac2b23b6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -463,6 +463,8 @@ void svm_leave_nested(struct vcpu_svm *svm);
void svm_free_nested(struct vcpu_svm *svm);
int svm_allocate_nested(struct vcpu_svm *svm);
int nested_svm_vmrun(struct kvm_vcpu *vcpu);
+void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
+ struct vmcb_save_area *to_save);
void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
int nested_svm_vmexit(struct vcpu_svm *svm);

--
2.31.1

2021-06-28 10:47:51

by Vitaly Kuznetsov

Subject: [PATCH 5/6] KVM: nSVM: Restore nested control upon leaving SMM

In case nested state was saved/restored while in SMM,
nested_load_control_from_vmcb12(), which sets svm->nested.ctl, was never
called and the first nested_vmcb_check_controls() (either from
nested_svm_vmrun() or from svm_set_nested_state() if the save/restore
cycle is repeated) is doomed to fail.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/nested.c | 4 ++--
arch/x86/kvm/svm/svm.c | 7 ++++++-
arch/x86/kvm/svm/svm.h | 2 ++
3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a1dec2c40181..6549e40155fa 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -304,8 +304,8 @@ static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
return true;
}

-static void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
- struct vmcb_control_area *control)
+void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
+ struct vmcb_control_area *control)
{
copy_vmcb_control_area(&svm->nested.ctl, control);

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fbf1b352a9bb..525b07873927 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4344,6 +4344,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
u64 saved_efer = GET_SMSTATE(u64, smstate, 0x7ed0);
u64 guest = GET_SMSTATE(u64, smstate, 0x7ed8);
u64 vmcb12_gpa = GET_SMSTATE(u64, smstate, 0x7ee0);
+ struct vmcb *vmcb12;

if (guest) {
if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
@@ -4359,7 +4360,11 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
if (svm_allocate_nested(svm))
return 1;

- ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, map.hva);
+ vmcb12 = map.hva;
+
+ nested_load_control_from_vmcb12(svm, &vmcb12->control);
+
+ ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
kvm_vcpu_unmap(vcpu, &map, true);

/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ff2dac2b23b6..13f2d465ca36 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -481,6 +481,8 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
bool has_error_code, u32 error_code);
int nested_svm_exit_special(struct vcpu_svm *svm);
+void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
+ struct vmcb_control_area *control);
void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
--
2.31.1

2021-06-28 10:48:12

by Vitaly Kuznetsov

Subject: [PATCH 6/6] KVM: selftests: smm_test: Test SMM enter from L2

Two additional tests are added:
- SMM triggered from L2 does not corrupt L1 host state.
- Save/restore during SMM triggered from L2 does not corrupt guest/host
state.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/x86_64/smm_test.c | 70 +++++++++++++++++--
1 file changed, 64 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index c1f831803ad2..d0fe2fdce58c 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -53,15 +53,28 @@ static inline void sync_with_host(uint64_t phase)
: "+a" (phase));
}

-void self_smi(void)
+static void self_smi(void)
{
x2apic_write_reg(APIC_ICR,
APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
}

-void guest_code(void *arg)
+static void l2_guest_code(void)
{
+ sync_with_host(8);
+
+ sync_with_host(10);
+
+ vmcall();
+}
+
+static void guest_code(void *arg)
+{
+ #define L2_GUEST_STACK_SIZE 64
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
+ struct svm_test_data *svm = arg;
+ struct vmx_pages *vmx_pages = arg;

sync_with_host(1);

@@ -74,21 +87,50 @@ void guest_code(void *arg)
sync_with_host(4);

if (arg) {
- if (cpu_has_svm())
- generic_svm_setup(arg, NULL, NULL);
- else
- GUEST_ASSERT(prepare_for_vmx_operation(arg));
+ if (cpu_has_svm()) {
+ generic_svm_setup(svm, l2_guest_code,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ } else {
+ GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
+ GUEST_ASSERT(load_vmcs(vmx_pages));
+ prepare_vmcs(vmx_pages, l2_guest_code,
+ &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+ }

sync_with_host(5);

self_smi();

sync_with_host(7);
+
+ if (cpu_has_svm()) {
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ svm->vmcb->save.rip += 3;
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ } else {
+ vmlaunch();
+ vmresume();
+ }
+
+ /* Stages 8-11 are eaten by SMM (SMRAM_STAGE reported instead) */
+ sync_with_host(12);
}

sync_with_host(DONE);
}

+void inject_smi(struct kvm_vm *vm)
+{
+ struct kvm_vcpu_events events;
+
+ vcpu_events_get(vm, VCPU_ID, &events);
+
+ events.smi.pending = 1;
+ events.flags |= KVM_VCPUEVENT_VALID_SMM;
+
+ vcpu_events_set(vm, VCPU_ID, &events);
+}
+
int main(int argc, char *argv[])
{
vm_vaddr_t nested_gva = 0;
@@ -147,6 +189,22 @@ int main(int argc, char *argv[])
"Unexpected stage: #%x, got %x",
stage, stage_reported);

+ /*
+ * Enter SMM during L2 execution and check that we correctly
+ * return from it. Do not perform save/restore while in SMM yet.
+ */
+ if (stage == 8) {
+ inject_smi(vm);
+ continue;
+ }
+
+ /*
+ * Perform save/restore while the guest is in SMM triggered
+ * during L2 execution.
+ */
+ if (stage == 10)
+ inject_smi(vm);
+
state = vcpu_save_state(vm, VCPU_ID);
kvm_vm_release(vm);
kvm_vm_restart(vm, O_RDWR);
--
2.31.1

2021-06-28 10:49:39

by Vitaly Kuznetsov

Subject: [PATCH 4/6] KVM: nSVM: Fix L1 state corruption upon return from SMM

VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the
nested L2 guest") broke return from SMM when we entered there from guest
(L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
manifests itself like this:

kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0
kvm_emulate_insn: 0:7ffbb280: 0f aa
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000
kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
    nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
    npt: on
kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
    intercepts: fd44bfeb 0000217f 00000000
kvm_entry: vcpu 0, rip 0xffffffffffbbe119
kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info
    200000006 1ab000
kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1
    0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
    error_code 0x00000000
kvm_page_fault: address 1ab000 error_code 6
kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
    int_info 0 int_info_err 0
kvm_entry: vcpu 0, rip 0x7ffbb280
kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
kvm_emulate_insn: 0:7ffbb280: 0f aa
kvm_inj_exception: #GP (0x0)

Note: the return to L2 succeeded, but upon the first exit to L1 its RIP points
to the 'RSM' instruction even though we're not in SMM.

The problem appears to be that VMCB01 gets irreversibly destroyed during
SMM execution. Previously, we used to have 'hsave' VMCB where regular
(pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
switch to VMCB01 from VMCB02.

Pre-split (working) flow looked like:
- SMM is triggered during L2's execution
- L2's state is pushed to SMRAM
- nested_svm_vmexit() restores L1's state from 'hsave'
- SMM -> RSM
- enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
pre-SMM (and pre L2 VMRUN) L1's state there
- L2's state is restored from SMRAM
- upon first exit L1's state is restored from 'hsave'.

This was always broken with regard to svm_get_nested_state()/
svm_set_nested_state(): 'hsave' was never a part of what's being
saved and restored, so migration happening during SMM triggered from L2 would
never restore L1's state correctly.

Post-split flow (broken) looks like:
- SMM is triggered during L2's execution
- L2's state is pushed to SMRAM
- nested_svm_vmexit() switches to VMCB01 from VMCB02
- SMM -> RSM
- enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
is already lost.
- L2's state is restored from SMRAM
- upon first exit L1's state is restored from VMCB01 but it is corrupted
(reflects the state during 'RSM' execution).

VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
and host state so when we switch back to VMCS02 L1's state is intact there.

To resolve the issue we need to save L1's state somewhere. We could've
created a third VMCB for SMM but that would require us to modify the saved
state format. L1's architectural HSAVE area (pointed to by MSR_VM_HSAVE_PA)
seems appropriate: L0 is free to save any (or none) of L1's state there.
Currently, KVM does 'none'.

Note, for nested state migration to succeed, both source and destination
hypervisors must have the fix. We, however, don't need to create a new
flag indicating the fact that the HSAVE area is now populated, as migration
during SMM triggered from L2 was always broken.

Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
- RFC: I'm not 100% sure my 'smart' idea to use the currently-unused HSAVE
area is that smart. Also, we don't even seem to check that L1 set it up upon
nested VMRUN, so hypervisors which don't do that may remain broken. A very
much needed selftest is also missing.
---
arch/x86/kvm/svm/svm.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b6f85fd19f96..fbf1b352a9bb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4291,6 +4291,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ struct kvm_host_map map_save;
int ret;

if (is_guest_mode(vcpu)) {
@@ -4306,6 +4307,29 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
ret = nested_svm_vmexit(svm);
if (ret)
return ret;
+
+ /*
+ * KVM uses VMCB01 to cache L1 host state while L2 runs but
+ * VMCB01 is going to be used during SMM and thus the state will
+ * be lost. Temporary save non-VMLOAD/VMSAVE state to host save
+ * area pointed to by MSR_VM_HSAVE_PA. APM guarantees that the
+ * format of the area is identical to guest save area offsetted
+ * by 0x400 (matches the offset of 'struct vmcb_save_area'
+ * within 'struct vmcb'). Note: HSAVE area may also be used by
+ * L1 hypervisor to save additional host context (e.g. KVM does
+ * that, see svm_prepare_guest_switch()) which must be
+ * preserved.
+ */
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
+ &map_save) == -EINVAL)
+ return 1;
+
+ BUILD_BUG_ON(offsetof(struct vmcb, save) != 0x400);
+
+ svm_copy_nonvmloadsave_state(&svm->vmcb01.ptr->save,
+ map_save.hva + 0x400);
+
+ kvm_vcpu_unmap(vcpu, &map_save, true);
}
return 0;
}
@@ -4313,7 +4337,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct kvm_host_map map;
+ struct kvm_host_map map, map_save;
int ret = 0;

if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
@@ -4337,6 +4361,19 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)

ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, map.hva);
kvm_vcpu_unmap(vcpu, &map, true);
+
+ /*
+ * Restore L1 host state from L1 HSAVE area as VMCB01 was
+ * used during SMM (see svm_enter_smm())
+ */
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
+ &map_save) == -EINVAL)
+ return 1;
+
+ svm_copy_nonvmloadsave_state(map_save.hva + 0x400,
+ &svm->vmcb01.ptr->save);
+
+ kvm_vcpu_unmap(vcpu, &map_save, true);
}
}

--
2.31.1

2021-07-05 12:11:09

by Paolo Bonzini

Subject: Re: [PATCH 3/6] KVM: nSVM: Introduce svm_copy_nonvmloadsave_state()

On 28/06/21 12:44, Vitaly Kuznetsov wrote:
> Separate the code setting non-VMLOAD-VMSAVE state from
> svm_set_nested_state() into its own function. This is going to be
> re-used from svm_enter_smm()/svm_leave_smm().
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 36 +++++++++++++++++++++---------------
> arch/x86/kvm/svm/svm.h | 2 ++
> 2 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 1c6b0698b52e..a1dec2c40181 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -697,6 +697,26 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> return ret;
> }
>
> +void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
> + struct vmcb_save_area *to_save)

Probably best to name this svm_copy_vmrun_state and perhaps (as a
cleanup) change nested_svm_vmloadsave to svm_copy_vmloadsave_state.

Paolo

> +{
> + to_save->es = from_save->es;
> + to_save->cs = from_save->cs;
> + to_save->ss = from_save->ss;
> + to_save->ds = from_save->ds;
> + to_save->gdtr = from_save->gdtr;
> + to_save->idtr = from_save->idtr;
> + to_save->rflags = from_save->rflags | X86_EFLAGS_FIXED;
> + to_save->efer = from_save->efer;
> + to_save->cr0 = from_save->cr0;
> + to_save->cr3 = from_save->cr3;
> + to_save->cr4 = from_save->cr4;
> + to_save->rax = from_save->rax;
> + to_save->rsp = from_save->rsp;
> + to_save->rip = from_save->rip;
> + to_save->cpl = 0;
> +}
> +
> void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
> {
> to_vmcb->save.fs = from_vmcb->save.fs;
> @@ -1360,21 +1380,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>
> svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;
>
> - svm->vmcb01.ptr->save.es = save->es;
> - svm->vmcb01.ptr->save.cs = save->cs;
> - svm->vmcb01.ptr->save.ss = save->ss;
> - svm->vmcb01.ptr->save.ds = save->ds;
> - svm->vmcb01.ptr->save.gdtr = save->gdtr;
> - svm->vmcb01.ptr->save.idtr = save->idtr;
> - svm->vmcb01.ptr->save.rflags = save->rflags | X86_EFLAGS_FIXED;
> - svm->vmcb01.ptr->save.efer = save->efer;
> - svm->vmcb01.ptr->save.cr0 = save->cr0;
> - svm->vmcb01.ptr->save.cr3 = save->cr3;
> - svm->vmcb01.ptr->save.cr4 = save->cr4;
> - svm->vmcb01.ptr->save.rax = save->rax;
> - svm->vmcb01.ptr->save.rsp = save->rsp;
> - svm->vmcb01.ptr->save.rip = save->rip;
> - svm->vmcb01.ptr->save.cpl = 0;
> + svm_copy_nonvmloadsave_state(save, &svm->vmcb01.ptr->save);
>
> nested_load_control_from_vmcb12(svm, ctl);
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index f89b623bb591..ff2dac2b23b6 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -463,6 +463,8 @@ void svm_leave_nested(struct vcpu_svm *svm);
> void svm_free_nested(struct vcpu_svm *svm);
> int svm_allocate_nested(struct vcpu_svm *svm);
> int nested_svm_vmrun(struct kvm_vcpu *vcpu);
> +void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
> + struct vmcb_save_area *to_save);
> void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
> int nested_svm_vmexit(struct vcpu_svm *svm);
>
>

2021-07-07 10:30:57

by Maxim Levitsky

Subject: Re: [PATCH 2/6] KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> APM states that "The address written to the VM_HSAVE_PA MSR, which holds
> the address of the page used to save the host state on a VMRUN, must point
> to a hypervisor-owned page. If this check fails, the WRMSR will fail with
> a #GP(0) exception. Note that a value of 0 is not considered valid for the
> VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
> fail with a #GP(0) exception."
>
> svm_set_msr() already checks that the supplied address is valid, so only
> check for '0' is missing. Add it to nested_svm_vmrun().
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 21d03e3a5dfd..1c6b0698b52e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -618,6 +618,11 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> struct kvm_host_map map;
> u64 vmcb12_gpa;
>
> + if (!svm->nested.hsave_msr) {
> + kvm_inject_gp(vcpu, 0);
> + return 1;
> + }
> +
> if (is_smm(vcpu)) {
> kvm_queue_exception(vcpu, UD_VECTOR);
> return 1;

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-07-07 10:30:59

by Maxim Levitsky

Subject: Re: [PATCH 3/6] KVM: nSVM: Introduce svm_copy_nonvmloadsave_state()

On Mon, 2021-07-05 at 14:08 +0200, Paolo Bonzini wrote:
> On 28/06/21 12:44, Vitaly Kuznetsov wrote:
> > Separate the code setting non-VMLOAD-VMSAVE state from
> > svm_set_nested_state() into its own function. This is going to be
> > re-used from svm_enter_smm()/svm_leave_smm().
> >
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > ---
> > arch/x86/kvm/svm/nested.c | 36 +++++++++++++++++++++---------------
> > arch/x86/kvm/svm/svm.h | 2 ++
> > 2 files changed, 23 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 1c6b0698b52e..a1dec2c40181 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -697,6 +697,26 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
> > return ret;
> > }
> >
> > +void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
> > + struct vmcb_save_area *to_save)
>
> Probably best to name this svm_copy_vmrun_state and perhaps (as a
> cleanup) change nested_svm_vmloadsave to svm_copy_vmloadsave_state.

I agree with that. I would also add a comment to both
svm_copy_vmloadsave_state and svm_copy_vmrun_state stating what they
are doing, like "this function copies state save area fields which
are used by vmrun"
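
Something like the following, perhaps (exact wording is of course up to
Vitaly; the prototypes below just use the names Paolo suggested):

/* Copy state save area fields which are handled by VMRUN */
void svm_copy_vmrun_state(struct vmcb_save_area *from_save,
			  struct vmcb_save_area *to_save);

/* Copy state save area fields which are handled by VMLOAD/VMSAVE */
void svm_copy_vmloadsave_state(struct vmcb *from_vmcb, struct vmcb *to_vmcb);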

Best regards,
Maxim Levitsky


>
> Paolo
>
> > +{
> > + to_save->es = from_save->es;
> > + to_save->cs = from_save->cs;
> > + to_save->ss = from_save->ss;
> > + to_save->ds = from_save->ds;
> > + to_save->gdtr = from_save->gdtr;
> > + to_save->idtr = from_save->idtr;
> > + to_save->rflags = from_save->rflags | X86_EFLAGS_FIXED;
> > + to_save->efer = from_save->efer;
> > + to_save->cr0 = from_save->cr0;
> > + to_save->cr3 = from_save->cr3;
> > + to_save->cr4 = from_save->cr4;
> > + to_save->rax = from_save->rax;
> > + to_save->rsp = from_save->rsp;
> > + to_save->rip = from_save->rip;
> > + to_save->cpl = 0;
> > +}
> > +
> > void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb)
> > {
> > to_vmcb->save.fs = from_vmcb->save.fs;
> > @@ -1360,21 +1380,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> >
> > svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;
> >
> > - svm->vmcb01.ptr->save.es = save->es;
> > - svm->vmcb01.ptr->save.cs = save->cs;
> > - svm->vmcb01.ptr->save.ss = save->ss;
> > - svm->vmcb01.ptr->save.ds = save->ds;
> > - svm->vmcb01.ptr->save.gdtr = save->gdtr;
> > - svm->vmcb01.ptr->save.idtr = save->idtr;
> > - svm->vmcb01.ptr->save.rflags = save->rflags | X86_EFLAGS_FIXED;
> > - svm->vmcb01.ptr->save.efer = save->efer;
> > - svm->vmcb01.ptr->save.cr0 = save->cr0;
> > - svm->vmcb01.ptr->save.cr3 = save->cr3;
> > - svm->vmcb01.ptr->save.cr4 = save->cr4;
> > - svm->vmcb01.ptr->save.rax = save->rax;
> > - svm->vmcb01.ptr->save.rsp = save->rsp;
> > - svm->vmcb01.ptr->save.rip = save->rip;
> > - svm->vmcb01.ptr->save.cpl = 0;
> > + svm_copy_nonvmloadsave_state(save, &svm->vmcb01.ptr->save);
> >
> > nested_load_control_from_vmcb12(svm, ctl);
> >
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f89b623bb591..ff2dac2b23b6 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -463,6 +463,8 @@ void svm_leave_nested(struct vcpu_svm *svm);
> > void svm_free_nested(struct vcpu_svm *svm);
> > int svm_allocate_nested(struct vcpu_svm *svm);
> > int nested_svm_vmrun(struct kvm_vcpu *vcpu);
> > +void svm_copy_nonvmloadsave_state(struct vmcb_save_area *from_save,
> > + struct vmcb_save_area *to_save);
> > void nested_svm_vmloadsave(struct vmcb *from_vmcb, struct vmcb *to_vmcb);
> > int nested_svm_vmexit(struct vcpu_svm *svm);
> >
> >


2021-07-07 10:31:41

by Maxim Levitsky

Subject: Re: [PATCH 1/6] KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when
> the supplied address is not page-aligned or is outside of "maximum
> supported physical address for this implementation".
> page_address_valid() check seems suitable. Also, forcefully page-align
> the address when it's written from VMM.

Minor nitpick: I would have checked the host-provided value as well,
just in case, since there is no reason why it wouldn't pass the same check,
and fail if the value is not aligned.

Other than that:
Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8834822c00cd..b6f85fd19f96 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2941,7 +2941,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> svm_disable_lbrv(vcpu);
> break;
> case MSR_VM_HSAVE_PA:
> - svm->nested.hsave_msr = data;
> + if (!msr->host_initiated && !page_address_valid(vcpu, data))
> + return 1;
> +
> + svm->nested.hsave_msr = data & PAGE_MASK;
> break;
> case MSR_VM_CR:
> return svm_set_vm_cr(vcpu, data);


2021-07-07 10:34:13

by Maxim Levitsky

Subject: Re: [PATCH 4/6] KVM: nSVM: Fix L1 state corruption upon return from SMM

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the
> nested L2 guest") broke return from SMM when we entered there from guest
> (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
> manifests itself like this:
>
> kvm_exit: reason EXIT_RSM rip 0x7ffbb280 info 0 0
> kvm_emulate_insn: 0:7ffbb280: 0f aa
> kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x7ffb3000
> kvm_nested_vmrun: rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
> nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
> npt: on
> kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
> intercepts: fd44bfeb 0000217f 00000000
> kvm_entry: vcpu 0, rip 0xffffffffffbbe119
> kvm_exit: reason EXIT_NPF rip 0xffffffffffbbe119 info
> 200000006 1ab000
> kvm_nested_vmexit: vcpu 0 reason npf rip 0xffffffffffbbe119 info1
> 0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
> error_code 0x00000000
> kvm_page_fault: address 1ab000 error_code 6
> kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
> int_info 0 int_info_err 0
> kvm_entry: vcpu 0, rip 0x7ffbb280
> kvm_exit: reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
> kvm_emulate_insn: 0:7ffbb280: 0f aa
> kvm_inj_exception: #GP (0x0)
>
> Note: return to L2 succeeded but upon first exit to L1 its RIP points to
> 'RSM' instruction but we're not in SMM.
>
> The problem appears to be that VMCB01 gets irreversibly destroyed during
> SMM execution. Previously, we used to have 'hsave' VMCB where regular
> (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
> switch to VMCB01 from VMCB02.
>
> Pre-split (working) flow looked like:
> - SMM is triggered during L2's execution
> - L2's state is pushed to SMRAM
> - nested_svm_vmexit() restores L1's state from 'hsave'
> - SMM -> RSM
> - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
> pre-SMM (and pre L2 VMRUN) L1's state there
> - L2's state is restored from SMRAM
> - upon first exit L1's state is restored from L1.
>
> This was always broken with regards to svm_get_nested_state()/
> svm_set_nested_state(): 'hsave' was never a part of what's being
> save and restored so migration happening during SMM triggered from L2 would
> never restore L1's state correctly.
>
> Post-split flow (broken) looks like:
> - SMM is triggered during L2's execution
> - L2's state is pushed to SMRAM
> - nested_svm_vmexit() switches to VMCB01 from VMCB02
> - SMM -> RSM
> - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
> is already lost.
> - L2's state is restored from SMRAM
> - upon first exit L1's state is restored from VMCB01 but it is corrupted
> (reflects the state during 'RSM' execution).
>
> VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
> and host state so when we switch back to VMCS02 L1's state is intact there.
>
> To resolve the issue we need to save L1's state somewhere. We could've
> created a third VMCB for SMM but that would require us to modify saved
> state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA)
> seems appropriate: L0 is free to save any (or none) of L1's state there.
> Currently, KVM does 'none'.
>
> Note, for nested state migration to succeed, both source and destination
> hypervisors must have the fix. We, however, don't need to create a new
> flag indicating the fact that HSAVE area is now populated as migration
> during SMM triggered from L2 was always broken.
>
> Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> - RFC: I'm not 100% sure my 'smart' idea to use currently-unused HSAVE area
> is that smart. Also, we don't even seem to check that L1 set it up upon
> nested VMRUN so hypervisors which don't do that may remain broken. A very
> much needed selftest is also missing.
> ---
> arch/x86/kvm/svm/svm.c | 39 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b6f85fd19f96..fbf1b352a9bb 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4291,6 +4291,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> + struct kvm_host_map map_save;
> int ret;
>
> if (is_guest_mode(vcpu)) {
> @@ -4306,6 +4307,29 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> ret = nested_svm_vmexit(svm);
> if (ret)
> return ret;
> +
> + /*
> + * KVM uses VMCB01 to cache L1 host state while L2 runs but
> + * VMCB01 is going to be used during SMM and thus the state will
> + * be lost. Temporary save non-VMLOAD/VMSAVE state to host save
> + * area pointed to by MSR_VM_HSAVE_PA. APM guarantees that the
> + * format of the area is identical to guest save area offsetted
> + * by 0x400 (matches the offset of 'struct vmcb_save_area'
> + * within 'struct vmcb'). Note: HSAVE area may also be used by
> + * L1 hypervisor to save additional host context (e.g. KVM does
> + * that, see svm_prepare_guest_switch()) which must be
> + * preserved.
> + */

Minor nitpick: I wouldn't say that KVM "caches" L1 host state, as this implies
that there is a master copy somewhere.
I would write something like this instead:

"KVM stores L1 host state in VMCB01 because the CPU naturally stores it there."

Other than that the comment looks very good.



> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
> + &map_save) == -EINVAL)
> + return 1;

Looks like the return value of the vendor-specific 'enter_smm' isn't checked
in the common 'enter_smm' handler, which itself returns 'void' and doesn't
check anything it does either.

enter_smm is called from inject_pending_event, which is allowed to return
a negative value; that value is assumed to be -EBUSY and doesn't lead
to an exit to userspace with a negative value.
I vote to fix this: only hide -EBUSY from userspace,
and let all other errors from this function end up in userspace
so that userspace can kill the guest.

This isn't this patch's fault though, and I think I'll send a patch
to do the above.
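
Roughly the direction I have in mind, just as a sketch (the x86.c details
below are from memory, so treat the names and the exact call site as
assumptions, not as the actual patch):

/* arch/x86/kvm/x86.c -- hypothetical sketch only */
static int enter_smm(struct kvm_vcpu *vcpu)
{
	char buf[512];

	memset(buf, 0, sizeof(buf));
	/* ... architectural SMM state is saved into buf as today ... */

	/* today the vendor callback's return value is silently dropped */
	if (static_call(kvm_x86_enter_smm)(vcpu, buf))
		return -EINVAL;

	/* ... the rest of SMM entry, kvm_smm_changed() etc. ... */
	return 0;
}

with inject_pending_event() then passing anything other than -EBUSY up to its
caller so the error eventually reaches userspace.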


> +
> + BUILD_BUG_ON(offsetof(struct vmcb, save) != 0x400);
I doubt that this will ever change, but let it be.
> +
> + svm_copy_nonvmloadsave_state(&svm->vmcb01.ptr->save,
> + map_save.hva + 0x400);
> +
> + kvm_vcpu_unmap(vcpu, &map_save, true);
> }
> return 0;
> }
> @@ -4313,7 +4337,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - struct kvm_host_map map;
> + struct kvm_host_map map, map_save;
> int ret = 0;
>
> if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
> @@ -4337,6 +4361,19 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>
> ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, map.hva);
> kvm_vcpu_unmap(vcpu, &map, true);
> +
> + /*
> + * Restore L1 host state from L1 HSAVE area as VMCB01 was
> + * used during SMM (see svm_enter_smm())
> + */
Looks fine as well.

> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
> + &map_save) == -EINVAL)
> + return 1;
> +
> + svm_copy_nonvmloadsave_state(map_save.hva + 0x400,
> + &svm->vmcb01.ptr->save);
> +
> + kvm_vcpu_unmap(vcpu, &map_save, true);
> }
> }
>

So besides the nitpick of the comment,
Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-07-07 10:37:37

by Maxim Levitsky

Subject: Re: [PATCH 6/6] KVM: selftests: smm_test: Test SMM enter from L2

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> Two additional tests are added:
> - SMM triggered from L2 does not currupt L1 host state.
> - Save/restore during SMM triggered from L2 does not corrupt guest/host
> state.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> tools/testing/selftests/kvm/x86_64/smm_test.c | 70 +++++++++++++++++--
> 1 file changed, 64 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
> index c1f831803ad2..d0fe2fdce58c 100644
> --- a/tools/testing/selftests/kvm/x86_64/smm_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
> @@ -53,15 +53,28 @@ static inline void sync_with_host(uint64_t phase)
> : "+a" (phase));
> }
>
> -void self_smi(void)
> +static void self_smi(void)
> {
> x2apic_write_reg(APIC_ICR,
> APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
> }
>
> -void guest_code(void *arg)
> +static void l2_guest_code(void)
> {
> + sync_with_host(8);
> +
> + sync_with_host(10);
> +
> + vmcall();
> +}
> +
> +static void guest_code(void *arg)
> +{
> + #define L2_GUEST_STACK_SIZE 64
> + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
> uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
> + struct svm_test_data *svm = arg;
> + struct vmx_pages *vmx_pages = arg;
>
> sync_with_host(1);
>
> @@ -74,21 +87,50 @@ void guest_code(void *arg)
> sync_with_host(4);
>
> if (arg) {
> - if (cpu_has_svm())
> - generic_svm_setup(arg, NULL, NULL);
> - else
> - GUEST_ASSERT(prepare_for_vmx_operation(arg));
> + if (cpu_has_svm()) {
> + generic_svm_setup(svm, l2_guest_code,
> + &l2_guest_stack[L2_GUEST_STACK_SIZE]);
> + } else {
> + GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
> + GUEST_ASSERT(load_vmcs(vmx_pages));
> + prepare_vmcs(vmx_pages, l2_guest_code,
> + &l2_guest_stack[L2_GUEST_STACK_SIZE]);
> + }
>
> sync_with_host(5);
>
> self_smi();
>
> sync_with_host(7);
> +
> + if (cpu_has_svm()) {
> + run_guest(svm->vmcb, svm->vmcb_gpa);
> + svm->vmcb->save.rip += 3;
> + run_guest(svm->vmcb, svm->vmcb_gpa);
> + } else {
> + vmlaunch();
> + vmresume();
> + }
> +
> + /* Stages 8-11 are eaten by SMM (SMRAM_STAGE reported instead) */
> + sync_with_host(12);
> }
>
> sync_with_host(DONE);
> }
>
> +void inject_smi(struct kvm_vm *vm)
> +{
> + struct kvm_vcpu_events events;
> +
> + vcpu_events_get(vm, VCPU_ID, &events);
> +
> + events.smi.pending = 1;
> + events.flags |= KVM_VCPUEVENT_VALID_SMM;
> +
> + vcpu_events_set(vm, VCPU_ID, &events);
> +}
> +
> int main(int argc, char *argv[])
> {
> vm_vaddr_t nested_gva = 0;
> @@ -147,6 +189,22 @@ int main(int argc, char *argv[])
> "Unexpected stage: #%x, got %x",
> stage, stage_reported);
>
> + /*
> + * Enter SMM during L2 execution and check that we correctly
> + * return from it. Do not perform save/restore while in SMM yet.
> + */
> + if (stage == 8) {
> + inject_smi(vm);
> + continue;
> + }
> +
> + /*
> + * Perform save/restore while the guest is in SMM triggered
> + * during L2 execution.
> + */
> + if (stage == 10)
> + inject_smi(vm);
> +
> state = vcpu_save_state(vm, VCPU_ID);
> kvm_vm_release(vm);
> kvm_vm_restart(vm, O_RDWR);

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2021-07-07 10:38:10

by Maxim Levitsky

Subject: Re: [PATCH 5/6] KVM: nSVM: Restore nested control upon leaving SMM

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> In case nested state was saved/resored while in SMM,
> nested_load_control_from_vmcb12() which sets svm->nested.ctl was never
> called and the first nested_vmcb_check_controls() (either from
> nested_svm_vmrun() or from svm_set_nested_state() if save/restore
> cycle is repeated) is doomed to fail.

I don't like the commit description.

I propose something like this:

If the VM was migrated while in SMM, no nested state was saved/restored,
and therefore svm_leave_smm has to load both the save and control areas
of the vmcb12. The save area is already loaded from the HSAVE area,
so now load the control area as well from the vmcb12.

However, if you like, feel free to leave the commit message as is.
My point is that while in SMM, SVM is fully disabled, so not only is
svm->nested.ctl not set, but no nested state is loaded/stored at all.

Also, this makes svm_leave_smm even more dangerous with respect to errors,
as I said on the previous patch. Since its return value is ignored,
and here we are loading the guest vmcb01, which can change under our feet,
lots of fun can happen (the enter_svm_guest_mode result isn't really checked).

Best regards,
Maxim Levitsky

>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 4 ++--
> arch/x86/kvm/svm/svm.c | 7 ++++++-
> arch/x86/kvm/svm/svm.h | 2 ++
> 3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index a1dec2c40181..6549e40155fa 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -304,8 +304,8 @@ static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
> return true;
> }
>
> -static void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
> - struct vmcb_control_area *control)
> +void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
> + struct vmcb_control_area *control)
> {
> copy_vmcb_control_area(&svm->nested.ctl, control);
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fbf1b352a9bb..525b07873927 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4344,6 +4344,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
> u64 saved_efer = GET_SMSTATE(u64, smstate, 0x7ed0);
> u64 guest = GET_SMSTATE(u64, smstate, 0x7ed8);
> u64 vmcb12_gpa = GET_SMSTATE(u64, smstate, 0x7ee0);
> + struct vmcb *vmcb12;
>
> if (guest) {
> if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
> @@ -4359,7 +4360,11 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
> if (svm_allocate_nested(svm))
> return 1;
>
> - ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, map.hva);
> + vmcb12 = map.hva;
> +
> + nested_load_control_from_vmcb12(svm, &vmcb12->control);
> +
> + ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12);
> kvm_vcpu_unmap(vcpu, &map, true);
>
> /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index ff2dac2b23b6..13f2d465ca36 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -481,6 +481,8 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
> int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
> bool has_error_code, u32 error_code);
> int nested_svm_exit_special(struct vcpu_svm *svm);
> +void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
> + struct vmcb_control_area *control);
> void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
> void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
> void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);


2021-07-08 17:29:13

by Paolo Bonzini

Subject: Re: [PATCH 1/6] KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA

On 07/07/21 12:28, Maxim Levitsky wrote:
> Minor nitpick: I would have checked the host provided value as well,
> just in case since there is no reason why it won't pass the same check,
> and fail if the value is not aligned.

The reason not to do so is that it would allow a guest that was running
on an old kernel to defeat live migration.

Paolo

> Other than that:
> Reviewed-by: Maxim Levitsky<[email protected]>

2021-07-08 17:42:08

by Paolo Bonzini

Subject: Re: [PATCH 0/6] KVM: nSVM: Fix issues when SMM is entered from L2

On 28/06/21 12:44, Vitaly Kuznetsov wrote:
> This is a continuation of "[PATCH RFC] KVM: nSVM: Fix L1 state corruption
> upon return from SMM".
>
> VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the
> nested L2 guest") broke return from SMM when we entered there from guest
> (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
> appears to be that VMCB01 gets irreversibly destroyed during SMM execution.
> Previously, we used to have 'hsave' VMCB where regular (pre-SMM) L1's state
> was saved upon nested_svm_vmexit() but now we just switch to VMCB01 from
> VMCB02.
>
> While writing a selftest for the issue, I've noticed that 'svm->nested.ctl'
> doesn't get restored after KVM_GET_NESTED_STATE/KVM_SET_NESTED_STATE cycle
> when guest happens to be in SMM triggered from L2. "KVM: nSVM: Restore
> nested control upon leaving SMM" is aimed to fix that.
>
> First two patches of the series add missing sanity checks for
> MSR_VM_HSAVE_PA which has to be page aligned and not zero.
>
> Vitaly Kuznetsov (6):
> KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA
> KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN
> KVM: nSVM: Introduce svm_copy_nonvmloadsave_state()
> KVM: nSVM: Fix L1 state corruption upon return from SMM
> KVM: nSVM: Restore nested control upon leaving SMM
> KVM: selftests: smm_test: Test SMM enter from L2
>
> arch/x86/kvm/svm/nested.c | 45 +++++++-----
> arch/x86/kvm/svm/svm.c | 51 +++++++++++++-
> arch/x86/kvm/svm/svm.h | 4 ++
> tools/testing/selftests/kvm/x86_64/smm_test.c | 70 +++++++++++++++++--
> 4 files changed, 144 insertions(+), 26 deletions(-)
>

Queued, thanks.

Paolo

2021-07-09 06:11:52

by Maxim Levitsky

Subject: Re: [PATCH 1/6] KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA

On Thu, 2021-07-08 at 19:27 +0200, Paolo Bonzini wrote:
> On 07/07/21 12:28, Maxim Levitsky wrote:
> > Minor nitpick: I would have checked the host provided value as well,
> > just in case since there is no reason why it won't pass the same check,
> > and fail if the value is not aligned.
>
> The reason not to do so is that it would allow a guest running an old
> kernel to defeat live migration.
I understand now, and I will keep this in mind next time.

Best regards,
Maxim Levitsky

>
> Paolo
>
> > Other than that:
> > Reviewed-by: Maxim Levitsky<[email protected]>