2020-09-23 18:47:49

by Sean Christopherson

Subject: [PATCH v2 0/7] KVM: nVMX: Bug fixes and cleanup

Fix for a brutal segment caching bug that manifested as random nested
VM-Enter failures when running with unrestricted guest disabled. A few
more bug fixes and cleanups for stuff found by inspection when hunting
down the caching issue.

v2:
- Rebased to kvm/queue, commit e1ba1a15af73 ("KVM: SVM: Enable INVPCID
feature on AMD").

Sean Christopherson (7):
KVM: nVMX: Reset the segment cache when stuffing guest segs
KVM: nVMX: Reload vmcs01 if getting vmcs12's pages fails
KVM: nVMX: Explicitly check for valid guest state for !unrestricted
guest
KVM: nVMX: Move free_nested() below vmx_switch_vmcs()
KVM: nVMX: Ensure vmcs01 is the loaded VMCS when freeing nested state
KVM: nVMX: Drop redundant VMCS switch and free_nested() call
KVM: nVMX: WARN on attempt to switch the currently loaded VMCS

arch/x86/kvm/vmx/nested.c | 103 ++++++++++++++++++++------------------
arch/x86/kvm/vmx/vmx.c | 8 +--
arch/x86/kvm/vmx/vmx.h | 9 ++++
3 files changed, 65 insertions(+), 55 deletions(-)

--
2.28.0


2020-09-23 18:47:56

by Sean Christopherson

Subject: [PATCH v2 1/7] KVM: nVMX: Reset the segment cache when stuffing guest segs

Explicitly reset the segment cache after stuffing guest segment regs in
prepare_vmcs02_rare(). Although the cache is reset when switching to
vmcs02, there is nothing that prevents KVM from re-populating the cache
prior to writing vmcs02 with vmcs12's values. E.g. if the vCPU is
preempted after switching to vmcs02 but before prepare_vmcs02_rare(),
kvm_arch_vcpu_put() will dereference GUEST_SS_AR_BYTES via .get_cpl()
and cache the stale vmcs02 value. While the current code base only
caches stale data in the preemption case, it's theoretically possible
that future code could read a segment register during the nested flow
itself, i.e. the behavior in kvm_arch_vcpu_put() isn't technically
illegal, although it did introduce the bug.

This manifests as an unexpected nested VM-Enter failure when running
with unrestricted guest disabled if the above preemption case coincides
with L1 switching L2's CPL, e.g. when switching from an L2 vCPU at CPL3
to an L2 vCPU at CPL0. stack_segment_valid() will see the new SS_SEL
but the old SS_AR_BYTES and incorrectly mark the guest state as invalid
due to SS.dpl != SS.rpl.

Don't bother updating the cache even though prepare_vmcs02_rare() writes
every segment. With unrestricted guest, guest segments are almost never
read, let alone L2 guest segments. On the other hand, populating the
cache requires a large number of memory writes, i.e. it's unlikely to be
a net win. Updating the cache would be a win when unrestricted guest is
not supported, as guest_state_valid() will immediately cache all segment
registers. But, nested virtualization without unrestricted guest is
dirt slow, saving some VMREADs won't change that, and every CPU
manufactured in the last decade supports unrestricted guest. In other
words, the extra (minor) complexity isn't worth the trouble.

Note, kvm_arch_vcpu_put() may see stale data when querying guest CPL
depending on when preemption occurs. This is "ok" in that the usage is
imperfect by nature, i.e. it's used heuristically to improve performance
but doesn't affect functionality. kvm_arch_vcpu_put() could be "fixed"
by also disabling preemption while loading segments, but that's
pointless and misleading as reading state from kvm_sched_{in,out}() is
guaranteed to see stale data in one form or another. E.g. even if all
the usage of regs_avail is fixed to call kvm_register_mark_available()
after the associated state is set, the individual state might still be
stale with respect to the overall vCPU state. I.e. making functional
decisions in an asynchronous hook is doomed from the get-go. Thankfully
KVM doesn't do that.

Fixes: de63ad4cf4973 ("KVM: X86: implement the logic for spinlock optimization")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b6ce9ce91029..04441663a631 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2408,6 +2408,8 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_writel(GUEST_TR_BASE, vmcs12->guest_tr_base);
vmcs_writel(GUEST_GDTR_BASE, vmcs12->guest_gdtr_base);
vmcs_writel(GUEST_IDTR_BASE, vmcs12->guest_idtr_base);
+
+ vmx->segment_cache.bitmask = 0;
}

if (!hv_evmcs || !(hv_evmcs->hv_clean_fields &
--
2.28.0

2020-09-23 18:48:26

by Sean Christopherson

Subject: [PATCH v2 6/7] KVM: nVMX: Drop redundant VMCS switch and free_nested() call

Remove the explicit switch to vmcs01 and the call to free_nested() in
nested_vmx_free_vcpu(). free_nested(), which is called unconditionally
by vmx_leave_nested(), ensures vmcs01 is loaded prior to freeing vmcs02
and friends.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3e6cc0d7090e..63550dcf6b9f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -326,8 +326,6 @@ void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu)
{
vcpu_load(vcpu);
vmx_leave_nested(vcpu);
- vmx_switch_vmcs(vcpu, &to_vmx(vcpu)->vmcs01);
- free_nested(vcpu);
vcpu_put(vcpu);
}

--
2.28.0

2020-09-25 21:39:31

by Paolo Bonzini

Subject: Re: [PATCH v2 0/7] KVM: nVMX: Bug fixes and cleanup

On 23/09/20 20:44, Sean Christopherson wrote:
> Fix for a brutal segment caching bug that manifested as random nested
> VM-Enter failures when running with unrestricted guest disabled. A few
> more bug fixes and cleanups for stuff found by inspection when hunting
> down the caching issue.
>
> v2:
> - Rebased to kvm/queue, commit e1ba1a15af73 ("KVM: SVM: Enable INVPCID
> feature on AMD").
>
> Sean Christopherson (7):
> KVM: nVMX: Reset the segment cache when stuffing guest segs
> KVM: nVMX: Reload vmcs01 if getting vmcs12's pages fails
> KVM: nVMX: Explicitly check for valid guest state for !unrestricted
> guest
> KVM: nVMX: Move free_nested() below vmx_switch_vmcs()
> KVM: nVMX: Ensure vmcs01 is the loaded VMCS when freeing nested state
> KVM: nVMX: Drop redundant VMCS switch and free_nested() call
> KVM: nVMX: WARN on attempt to switch the currently loaded VMCS
>
> arch/x86/kvm/vmx/nested.c | 103 ++++++++++++++++++++------------------
> arch/x86/kvm/vmx/vmx.c | 8 +--
> arch/x86/kvm/vmx/vmx.h | 9 ++++
> 3 files changed, 65 insertions(+), 55 deletions(-)
>

Queued, thanks.

Paolo