2020-03-20 21:29:52

by Sean Christopherson

Subject: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

Add a helper to determine whether or not a full TLB flush needs to be
performed on nested VM-Enter/VM-Exit, as the logic is identical for both
flows and needs a fairly beefy comment to boot. This also provides a
common point to make future adjustments to the logic.

Handle vpid12 changes in the new helper as well, even though vpid12 is specific
to VM-Enter. The vpid12 logic is an extension of the flushing logic,
and it's worth the extra bool parameter to provide a single location for
the flushing logic.

Cc: Liran Alon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 88 +++++++++++++++++++--------------------
1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 77819d890088..580d5c98352f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1154,6 +1154,48 @@ static bool nested_has_guest_tlb_tag(struct kvm_vcpu *vcpu)
(nested_cpu_has_vpid(vmcs12) && to_vmx(vcpu)->nested.vpid02);
}

+static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
+ struct vmcs12 *vmcs12,
+ bool is_vmenter)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * If VPID is disabled, linear and combined mappings are flushed on
+ * VM-Enter/VM-Exit, and guest-physical mappings are valid only for
+ * their associated EPTP.
+ */
+ if (!enable_vpid)
+ return;
+
+ /*
+ * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
+ * for *all* contexts to be flushed on VM-Enter/VM-Exit.
+ *
+ * If VPID is enabled and used by vmcs12, but L2 does not have a unique
+ * TLB tag (ASID), i.e. EPT is disabled and KVM was unable to allocate
+ * a VPID for L2, flush the TLB as the effective ASID is common to both
+ * L1 and L2.
+ *
+ * Defer the flush so that it runs after vmcs02.EPTP has been set by
+ * KVM_REQ_LOAD_MMU_PGD (if nested EPT is enabled) and to avoid
+ * redundant flushes further down the nested pipeline.
+ *
+ * If a TLB flush isn't required due to any of the above, and vpid12 is
+ * changing then the new "virtual" VPID (vpid12) will reuse the same
+ * "real" VPID (vpid02), and so needs to be sync'd. There is no direct
+ * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
+ * all nested vCPUs.
+ */
+ if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
+ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+ } else if (is_vmenter &&
+ vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
+ vmx->nested.last_vpid = vmcs12->virtual_processor_id;
+ vpid_sync_context(nested_get_vpid02(vcpu));
+ }
+}
+
static bool is_bitwise_subset(u64 superset, u64 subset, u64 mask)
{
superset &= mask;
@@ -2462,32 +2504,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (kvm_has_tsc_control)
decache_tsc_multiplier(vmx);

- if (enable_vpid) {
- /*
- * There is no direct mapping between vpid02 and vpid12, the
- * vpid02 is per-vCPU for L0 and reused while the value of
- * vpid12 is changed w/ one invvpid during nested vmentry.
- * The vpid12 is allocated by L1 for L2, so it will not
- * influence global bitmap(for vpid01 and vpid02 allocation)
- * even if spawn a lot of nested vCPUs.
- */
- if (nested_cpu_has_vpid(vmcs12) && nested_has_guest_tlb_tag(vcpu)) {
- if (vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
- vmx->nested.last_vpid = vmcs12->virtual_processor_id;
- vpid_sync_context(nested_get_vpid02(vcpu));
- }
- } else {
- /*
- * If L1 use EPT, then L0 needs to execute INVEPT on
- * EPTP02 instead of EPTP01. Therefore, delay TLB
- * flush until vmcs02->eptp is fully updated by
- * KVM_REQ_LOAD_MMU_PGD. Note that this assumes
- * KVM_REQ_TLB_FLUSH is evaluated after
- * KVM_REQ_LOAD_MMU_PGD in vcpu_enter_guest().
- */
- kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
- }
- }
+ nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);

if (nested_cpu_has_ept(vmcs12))
nested_ept_init_mmu_context(vcpu);
@@ -4054,24 +4071,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
if (!enable_ept)
vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;

- /*
- * If vmcs01 doesn't use VPID, CPU flushes TLB on every
- * VMEntry/VMExit. Thus, no need to flush TLB.
- *
- * If vmcs12 doesn't use VPID, L1 expects TLB to be
- * flushed on every VMEntry/VMExit.
- *
- * Otherwise, we can preserve TLB entries as long as we are
- * able to tag L1 TLB entries differently than L2 TLB entries.
- *
- * If vmcs12 uses EPT, we need to execute this flush on EPTP01
- * and therefore we request the TLB flush to happen only after VMCS EPTP
- * has been set by KVM_REQ_LOAD_MMU_PGD.
- */
- if (enable_vpid &&
- (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu))) {
- kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
- }
+ nested_vmx_transition_tlb_flush(vcpu, vmcs12, false);

vmcs_write32(GUEST_SYSENTER_CS, vmcs12->host_ia32_sysenter_cs);
vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->host_ia32_sysenter_esp);
--
2.24.1


2021-10-28 13:13:20

by Lai Jiangshan

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
<[email protected]> wrote:

> + if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> + } else if (is_vmenter &&
> + vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> + vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> + vpid_sync_context(nested_get_vpid02(vcpu));
> + }
> +}


(I'm sorry to pick this old email to reply to, but the problem has
nothing to do with this patch nor with 5c614b3583e7, and it has existed
since nested vmx was introduced.)

I think kvm_mmu_free_guest_mode_roots() should be called
if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
just because prev_roots doesn't cache the vpid12.
(prev_roots caches PCID, which is distinctive)

The problem hardly exists if L1's hypervisor is also kvm, but it can
show up if L1's hypervisor is different, or is kvm with changes to how
it manages VPID. (Actually, I planned to change how kvm manages VPID
to be svm-like.)

nvcpu0 and nvcpu1 are in the same nested VM and are running the same
application process.

vcpu1: runs nvcpu1 with the same cr3 as nvcpu0
vcpu0: runs nvcpu0, modifies the pagetable; L1 syncs the root and flushes
VPID12, but L0 doesn't sync, it just removes the root from vcpu0's prev_roots.
vcpu1: L1 migrates nvcpu0 to here and allocates a *fresh* VPID12 to nvcpu0,
like the way svm allocates a fresh ASID.
vcpu1: runs nvcpu0 without any flush. (vcpu1's prev_roots already has the
root, and L0 hasn't synced it.)

If my understanding is correct, I hope this serves as a bug report and somebody fixes it.
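The scenario can be modeled in standalone userspace C. This is only a toy sketch, not KVM's actual code: struct toy_vcpu, nested_vmenter(), l0_handle_vpid12_flush() and friends are invented names, and the root cache is keyed only on cr3 to mirror the fact that prev_roots tracks PCID but not vpid12.

```c
/*
 * Toy userspace model of the race described above, NOT KVM code: all
 * names are invented. The root cache is keyed only on cr3, mirroring
 * how prev_roots tracks PCID but ignores vpid12.
 */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PREV_ROOTS 3

struct cached_root {
	unsigned long cr3;
	bool synced;	/* false => root has stale SPTEs */
	bool valid;
};

struct toy_vcpu {
	struct cached_root prev_roots[NR_PREV_ROOTS];
	unsigned int last_vpid12;
};

/*
 * L1 flushes vpid12 on one vCPU; L0 merely drops that vCPU's cached
 * root, it does not sync copies cached on other vCPUs.
 */
static void l0_handle_vpid12_flush(struct toy_vcpu *v, unsigned long cr3)
{
	int i;

	for (i = 0; i < NR_PREV_ROOTS; i++)
		if (v->prev_roots[i].valid && v->prev_roots[i].cr3 == cr3)
			v->prev_roots[i].valid = false;
}

/*
 * Nested VM-Enter: reuse a cached root whose cr3 matches, regardless
 * of vpid12. Returns true iff a stale root was used (the bug).
 */
static bool nested_vmenter(struct toy_vcpu *v, unsigned long cr3,
			   unsigned int vpid12, bool fix_applied)
{
	int i;

	/* Proposed fix: free guest-mode roots when vpid12 changes. */
	if (fix_applied && vpid12 != v->last_vpid12)
		for (i = 0; i < NR_PREV_ROOTS; i++)
			v->prev_roots[i].valid = false;
	v->last_vpid12 = vpid12;

	for (i = 0; i < NR_PREV_ROOTS; i++)
		if (v->prev_roots[i].valid && v->prev_roots[i].cr3 == cr3)
			return !v->prev_roots[i].synced;

	/* Cache miss: build a fresh, synced root. */
	v->prev_roots[0] = (struct cached_root){ cr3, true, true };
	return false;
}
```

Without the fix, the second VM-Enter on vcpu1 with a fresh vpid12 hits the stale cached root; freeing guest-mode roots on a vpid12 change avoids it.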

2021-10-28 15:27:23

by Sean Christopherson

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

-me :-)

On Thu, Oct 28, 2021, Lai Jiangshan wrote:
> On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
> <[email protected]> wrote:
>
> > + if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> > + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > + } else if (is_vmenter &&
> > + vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > + vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > + vpid_sync_context(nested_get_vpid02(vcpu));
> > + }
> > +}
>
> (I'm sorry to pick this old email to reply to, but the problem has
> nothing to do with this patch nor 5c614b3583e7 and it exists since
> nested vmx is introduced.)
>
> I think kvm_mmu_free_guest_mode_roots() should be called
> if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
> just because prev_roots doesn't cache the vpid12.
> (prev_roots caches PCID, which is distinctive)
>
> The problem hardly exists if L1's hypervisor is also kvm, but if L1's
> hypervisor is different or is also kvm with some changes in the way how it
> manages VPID.

Indeed. A more straightforward error case would be if L1 and L2 share CR3, and
vmcs02.VPID is toggled (or used for the first time) on the L1 => L2 VM-Enter.

The fix should simply be:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index eedcebf58004..574823370e7a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
*
* If a TLB flush isn't required due to any of the above, and vpid12 is
* changing then the new "virtual" VPID (vpid12) will reuse the same
- * "real" VPID (vpid02), and so needs to be flushed. There's no direct
- * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
- * all nested vCPUs. Remember, a flush on VM-Enter does not invalidate
- * guest-physical mappings, so there is no need to sync the nEPT MMU.
+ * "real" VPID (vpid02), and so needs to be flushed. Like the !vpid02
+ * case above, this is a full TLB flush from the guest's perspective.
*/
if (!nested_has_guest_tlb_tag(vcpu)) {
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
} else if (is_vmenter &&
vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
vmx->nested.last_vpid = vmcs12->virtual_processor_id;
- vpid_sync_context(nested_get_vpid02(vcpu));
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
}
}

2021-10-29 00:45:58

by Lai Jiangshan

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <[email protected]> wrote:
>
> -me :-)
>
> On Thu, Oct 28, 2021, Lai Jiangshan wrote:
> > On Sat, Mar 21, 2020 at 5:29 AM Sean Christopherson
> > <[email protected]> wrote:
> >
> > > + if (!nested_cpu_has_vpid(vmcs12) || !nested_has_guest_tlb_tag(vcpu)) {
> > > + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> > > + } else if (is_vmenter &&
> > > + vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > > + vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > > + vpid_sync_context(nested_get_vpid02(vcpu));
> > > + }
> > > +}
> >
> > (I'm sorry to pick this old email to reply to, but the problem has
> > nothing to do with this patch nor 5c614b3583e7 and it exists since
> > nested vmx is introduced.)
> >
> > I think kvm_mmu_free_guest_mode_roots() should be called
> > if (!enable_ept && vmcs12->virtual_processor_id != vmx->nested.last_vpid)
> > just because prev_roots doesn't cache the vpid12.
> > (prev_roots caches PCID, which is distinctive)
> >
> > The problem hardly exists if L1's hypervisor is also kvm, but if L1's
> > hypervisor is different or is also kvm with some changes in the way how it
> > manages VPID.
>
> Indeed. A more straightforward error case would be if L1 and L2 share CR3, and
> vmcs02.VPID is toggled (or used for the first time) on the L1 => L2 VM-Enter.
>
> The fix should simply be:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index eedcebf58004..574823370e7a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> *
> * If a TLB flush isn't required due to any of the above, and vpid12 is
> * changing then the new "virtual" VPID (vpid12) will reuse the same
> - * "real" VPID (vpid02), and so needs to be flushed. There's no direct
> - * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> - * all nested vCPUs. Remember, a flush on VM-Enter does not invalidate
> - * guest-physical mappings, so there is no need to sync the nEPT MMU.
> + * "real" VPID (vpid02), and so needs to be flushed. Like the !vpid02
> + * case above, this is a full TLB flush from the guest's perspective.
> */
> if (!nested_has_guest_tlb_tag(vcpu)) {
> kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> } else if (is_vmenter &&
> vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> - vpid_sync_context(nested_get_vpid02(vcpu));
> + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);

This change is neat.

But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
vpid02. vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.

And if so, this nested_vmx_transition_tlb_flush() can be simplified further
since KVM_REQ_TLB_FLUSH_CURRENT(!enable_ept) can be replaced with
KVM_REQ_TLB_FLUSH_GUEST.

> }
> }

2021-10-29 17:12:34

by Sean Christopherson

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

TL;DR: I'll work on a proper series next week, there are multiple things that need
to be fixed.

On Fri, Oct 29, 2021, Lai Jiangshan wrote:
> On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <[email protected]> wrote:
> > The fix should simply be:
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index eedcebf58004..574823370e7a 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> > *
> > * If a TLB flush isn't required due to any of the above, and vpid12 is
> > * changing then the new "virtual" VPID (vpid12) will reuse the same
> > - * "real" VPID (vpid02), and so needs to be flushed. There's no direct
> > - * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> > - * all nested vCPUs. Remember, a flush on VM-Enter does not invalidate
> > - * guest-physical mappings, so there is no need to sync the nEPT MMU.
> > + * "real" VPID (vpid02), and so needs to be flushed. Like the !vpid02
> > + * case above, this is a full TLB flush from the guest's perspective.
> > */
> > if (!nested_has_guest_tlb_tag(vcpu)) {
> > kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > } else if (is_vmenter &&
> > vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > - vpid_sync_context(nested_get_vpid02(vcpu));
> > + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>
> This change is neat.

Heh, yeah, but too neat to be right :-)

> But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
> vpid02. vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.

Hmm. I think vmx_flush_tlb_guest() is straight up broken. E.g. if EPT is enabled
but L1 doesn't use EPT for L2 and doesn't intercept INVPCID, then KVM will handle
INVPCID from L2. That means the recent addition to kvm_invalidate_pcid() (see
below) will flush the wrong VPID. And it's incorrect (well, more than is required
by the SDM) to flush both VPIDs because flushes from INVPCID (and flushes from the
guest's perspective in general) are scoped to the current VPID, e.g. a "full" TLB
flush in the "host" by toggling CR4.PGE flushes only the current VPID:

Operations that architecturally invalidate entries in the TLBs or paging-structure
caches independent of VMX operation (e.g., the INVLPG and INVPCID instructions)
invalidate linear mappings and combined mappings. They are required to do so only
for the current VPID (but, for combined mappings, all EP4TAs).

static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
unsigned long roots_to_free = 0;
int i;

/*
* MOV CR3 and INVPCID are usually not intercepted when using TDP, but
* this is reachable when running EPT=1 and unrestricted_guest=0, and
* also via the emulator. KVM's TDP page tables are not in the scope of
* the invalidation, but the guest's TLB entries need to be flushed as
* the CPU may have cached entries in its TLB for the target PCID.
*/
if (unlikely(tdp_enabled)) {
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
return;
}

...
}

To fix that, the "guest" flushes should always operate on the current VPID. But
this alone is insufficient (more below).

static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu))
return nested_get_vpid02(vcpu);
return to_vmx(vcpu)->vpid;
}

static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
u64 root_hpa = mmu->root_hpa;

/* No flush required if the current context is invalid. */
if (!VALID_PAGE(root_hpa))
return;

if (enable_ept)
ept_sync_context(construct_eptp(vcpu, root_hpa,
mmu->shadow_root_level));
else
vpid_sync_context(vmx_get_current_vpid(vcpu));
}

static void vmx_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t addr)
{
/*
* vpid_sync_vcpu_addr() is a nop if vpid==0, see the comment in
* vmx_flush_tlb_guest() for an explanation of why this is ok.
*/
vpid_sync_vcpu_addr(vmx_get_current_vpid(vcpu), addr);
}

static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
{
/*
* vpid_sync_context() is a nop if vpid==0, e.g. if enable_vpid==0 or a
* vpid couldn't be allocated for this vCPU. VM-Enter and VM-Exit are
* required to flush GVA->{G,H}PA mappings from the TLB if vpid is
* disabled (VM-Enter with vpid enabled and vpid==0 is disallowed),
* i.e. no explicit INVVPID is necessary.
*/
vpid_sync_context(vmx_get_current_vpid(vcpu));
}
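The SDM scoping rule quoted above, that guest-initiated invalidations hit only the current VPID, can be exercised with a toy VPID-tagged TLB. Everything below is an illustrative userspace sketch; the names are invented, not KVM or hardware interfaces.

```c
/*
 * Toy VPID-tagged TLB, NOT KVM code or a hardware interface: all
 * names here are invented for the sketch.
 */
#include <assert.h>
#include <stdbool.h>

#define TOY_TLB_SIZE 8

struct toy_tlb_entry {
	unsigned int vpid;
	unsigned long va;
	bool valid;
};

static struct toy_tlb_entry toy_tlb[TOY_TLB_SIZE];

static void toy_tlb_fill(int slot, unsigned int vpid, unsigned long va)
{
	toy_tlb[slot] = (struct toy_tlb_entry){ vpid, va, true };
}

/*
 * Guest-initiated invalidation (INVLPG, INVPCID, CR4.PGE toggle, ...):
 * per the SDM, only entries tagged with the *current* VPID are
 * invalidated.
 */
static void toy_flush_guest(unsigned int current_vpid)
{
	int i;

	for (i = 0; i < TOY_TLB_SIZE; i++)
		if (toy_tlb[i].valid && toy_tlb[i].vpid == current_vpid)
			toy_tlb[i].valid = false;
}

static bool toy_tlb_hit(unsigned int vpid, unsigned long va)
{
	int i;

	for (i = 0; i < TOY_TLB_SIZE; i++)
		if (toy_tlb[i].valid && toy_tlb[i].vpid == vpid &&
		    toy_tlb[i].va == va)
			return true;
	return false;
}
```

The model shows why flushing the wrong VPID is a bug in both directions: a flush targeted at vpid01 leaves the L2 entry intact, and flushing both VPIDs is broader than the architecture requires.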



> And if so, this nested_vmx_transition_tlb_flush() can be simplified further
> since KVM_REQ_TLB_FLUSH_CURRENT(!enable_ept) can be replaced with
> KVM_REQ_TLB_FLUSH_GUEST.

And as above KVM_REQ_TLB_FLUSH_GUEST is conceptually wrong, too. E.g. in my
dummy case of L1 and L2 using the same CR3, if L1 assigns L2 a VPID then L1's
ASID is not flushed flushed on VM-Exit, so pending PTE updates for that single
CR3 would not be flushed (sync'd in KVM) for L1 even though they were flushed
for L2.

kvm_mmu_page_role doesn't track VPID, but it does track is_guest_mode, so the
bizarre case of L1 but not L2 having stale entries for a single CR3 is "supported".

Another wrinkle that is being mishandled is if L1 doesn't intercept INVPCID and
KVM synthesizes a nested VM-Exit from L2=>L1 before servicing KVM_REQ_MMU_SYNC,
KVM will sync the wrong MMU because the nested transitions only service pending
"current" flushes. The GUEST variant also has the same bug (which I alluded to
above).

To fix that, the nVMX code should handle all pending flushes and syncs that are
specific to the current vCPU, e.g. by replacing the open coded TLB_FLUSH_CURRENT
check with a call to a common helper as below. Ideally enter_guest_mode() and
leave_guest_mode() would handle these calls so that SVM doesn't need to be updated
if/when SVM stops flushing on all nested transitions, but VMX switches to vmcs02
and has already modified state before getting to enter_guest_mode(), which makes
me more than a bit nervous.

@@ -3361,8 +3358,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
};
u32 failed_index;

- if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
- kvm_vcpu_flush_tlb_current(vcpu);
+ kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

evaluate_pending_interrupts = exec_controls_get(vmx) &
(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
@@ -4516,9 +4512,8 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
(void)nested_get_evmcs_page(vcpu);
}

- /* Service the TLB flush request for L2 before switching to L1. */
- if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
- kvm_vcpu_flush_tlb_current(vcpu);
+ /* Service pending TLB flush requests for L2 before switching to L1. */
+ kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

/*
* VCPU_EXREG_PDPTR will be clobbered in arch/x86/kvm/vmx/vmx.h between


And for nested_vmx_transition_tlb_flush(), assuming all the other things are fixed,
the "vpid12 is changing" case does indeed become KVM_REQ_TLB_FLUSH_GUEST. It also
needs to be prioritized above nested_has_guest_tlb_tag() because a GUEST flush is
"stronger" than a CURRENT flush.

static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12,
bool is_vmenter)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);

/*
* If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
* for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
* full TLB flush from the guest's perspective. This is required even
* if VPID is disabled in the host as KVM may need to synchronize the
* MMU in response to the guest TLB flush.
*
* Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
* EPT is a special snowflake, as guest-physical mappings aren't
* flushed on VPID invalidations, including VM-Enter or VM-Exit with
* VPID disabled. As a result, KVM _never_ needs to sync nEPT
* entries on VM-Enter because L1 can't rely on VM-Enter to flush
* those mappings.
*/
if (!nested_cpu_has_vpid(vmcs12)) {
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
return;
}

/* L2 should never have a VPID if VPID is disabled. */
WARN_ON(!enable_vpid);

/*
* VPID is enabled and in use by vmcs12. If vpid12 is changing, then
* emulate a guest TLB flush as KVM does not track vpid12 history nor
* is the VPID incorporated into the MMU context. I.e. KVM must assume
* that the new vpid12 has never been used and thus represents a new
* guest ASID that cannot have entries in the TLB.
*/
if (is_vmenter && vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
vmx->nested.last_vpid = vmcs12->virtual_processor_id;
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
return;
}

/*
* If VPID is enabled and used by vmcs12, and vpid12 is not changing,
* but L2 does not have a unique TLB tag (ASID), i.e. EPT is disabled and
* KVM was unable to allocate a VPID for L2, flush the current context
* as the effective ASID is common to both L1 and L2.
*/
if (!nested_has_guest_tlb_tag(vcpu))
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}

2021-10-30 01:39:15

by Lai Jiangshan

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit


On Sat, Oct 30, 2021 at 1:10 AM Sean Christopherson <[email protected]> wrote:
>
> TL;DR: I'll work on a proper series next week, there are multiple things that need
> to be fixed.
>
> On Fri, Oct 29, 2021, Lai Jiangshan wrote:
> > On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <[email protected]> wrote:
> > > The fix should simply be:
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index eedcebf58004..574823370e7a 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> > > *
> > > * If a TLB flush isn't required due to any of the above, and vpid12 is
> > > * changing then the new "virtual" VPID (vpid12) will reuse the same
> > > - * "real" VPID (vpid02), and so needs to be flushed. There's no direct
> > > - * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> > > - * all nested vCPUs. Remember, a flush on VM-Enter does not invalidate
> > > - * guest-physical mappings, so there is no need to sync the nEPT MMU.
> > > + * "real" VPID (vpid02), and so needs to be flushed. Like the !vpid02
> > > + * case above, this is a full TLB flush from the guest's perspective.
> > > */
> > > if (!nested_has_guest_tlb_tag(vcpu)) {
> > > kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > > } else if (is_vmenter &&
> > > vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > > vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > > - vpid_sync_context(nested_get_vpid02(vcpu));
> > > + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> >
> > This change is neat.
>
> Heh, yeah, but too neat to be right :-)
>
> > But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
> > vpid02. vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.
>
> Hmm. I think vmx_flush_tlb_guest() is straight up broken. E.g. if EPT is enabled
> but L1 doesn't use EPT for L2 and doesn't intercept INVPCID, then KVM will handle
> INVPCID from L2. That means the recent addition to kvm_invalidate_pcid() (see
> below) will flush the wrong VPID. And it's incorrect (well, more than is required
> by the SDM) to flush both VPIDs because flushes from INVPCID (and flushes from the
> guest's perspective in general) are scoped to the current VPID, e.g. a "full" TLB
> flush in the "host" by toggling CR4.PGE flushes only the current VPID:

I think KVM_REQ_TLB_FLUSH_GUEST/kvm_vcpu_flush_tlb_guest/vmx_flush_tlb_guest
was deliberately designed for the L1 guest only. It can be seen from the code,
from the history, and from the caller's side. For example,
nested_vmx_transition_tlb_flush() knows KVM_REQ_TLB_FLUSH_GUEST flushes the
L1 guest:

/*
* If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
* for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
* full TLB flush from the guest's perspective. This is required even
* if VPID is disabled in the host as KVM may need to synchronize the
* MMU in response to the guest TLB flush.
*
* Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
* EPT is a special snowflake, as guest-physical mappings aren't
* flushed on VPID invalidations, including VM-Enter or VM-Exit with
* VPID disabled. As a result, KVM _never_ needs to sync nEPT
* entries on VM-Enter because L1 can't rely on VM-Enter to flush
* those mappings.
*/
if (!nested_cpu_has_vpid(vmcs12)) {
kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
return;
}

Meanwhile, handle_invvpid() doesn't use KVM_REQ_TLB_FLUSH_GUEST.

So, I don't think KVM_REQ_TLB_FLUSH_GUEST, kvm_vcpu_flush_tlb_guest
or vmx_flush_tlb_guest is broken, since they are for L1 guests.
What we have to consider is whether it is worth extending them to
nested guests for the convenience of the nested code.

I second extending them.

A small comment on your proposal: I found that KVM_REQ_TLB_FLUSH_CURRENT
and KVM_REQ_TLB_FLUSH_GUEST flush the "current" vpid only, so some special
work needs to be added when switching the mmu from L1 to L2 and vice versa:
handle the requests before switching.

2021-11-04 17:50:48

by Sean Christopherson

Subject: Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

On Sat, Oct 30, 2021, Lai Jiangshan wrote:
> A small comment in your proposal: I found that KVM_REQ_TLB_FLUSH_CURRENT
> and KVM_REQ_TLB_FLUSH_GUEST is to flush "current" vpid only, some special
> work needs to be added when switching mmu from L1 to L2 and vice versa:
> handle the requests before switching.

Oh, yeah, that's this snippet of my pseudo patch, but I didn't provide the
kvm_service_pending_tlb_flush_on_nested_transition() implementation so it's not
exactly obvious what I intended. The current code handles CURRENT, but not GUEST,
the idea is to shove those into a helper that can be shared between nVMX and nSVM.

And I believe the "flush" also needs to service KVM_REQ_MMU_SYNC. For L1=>L2 it
should be irrelevant/impossible, since L1 can only be unsync if L1 and L2 share
an MMU, but the L2=>L1 path could result in a lost sync if something, e.g. an IRQ,
prompted a nested VM-Exit before re-entering L2.

Let me know if I misunderstood your comment. Thanks!

@@ -3361,8 +3358,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
};
u32 failed_index;

- if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
- kvm_vcpu_flush_tlb_current(vcpu);
+ kvm_service_pending_tlb_flush_on_nested_transition(vcpu);

evaluate_pending_interrupts = exec_controls_get(vmx) &
(CPU_BASED_INTR_WINDOW_EXITING | CPU_BASED_NMI_WINDOW_EXITING);
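The body of kvm_service_pending_tlb_flush_on_nested_transition() is not spelled out in the pseudo patch above. A standalone toy model of the intended behavior, draining MMU_SYNC plus the CURRENT and GUEST flush requests in one place before the MMU context switches, might look like the sketch below. The REQ_* flags and function names are invented stand-ins for KVM_REQ_MMU_SYNC, KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST, not KVM's real definitions.

```c
/*
 * Toy model of the proposed helper, NOT KVM code: the REQ_* flags and
 * all names are invented stand-ins for the real KVM_REQ_* requests.
 */
#include <assert.h>
#include <stdbool.h>

enum {
	REQ_MMU_SYNC          = 1u << 0,
	REQ_TLB_FLUSH_CURRENT = 1u << 1,
	REQ_TLB_FLUSH_GUEST   = 1u << 2,
};

struct toy_vcpu {
	unsigned int requests;
	unsigned int serviced;	/* records what ran, for checking */
};

/* Test-and-clear a pending request, like kvm_check_request(). */
static bool toy_check_request(struct toy_vcpu *v, unsigned int req)
{
	if (!(v->requests & req))
		return false;
	v->requests &= ~req;
	return true;
}

/*
 * Drain every vCPU-scoped sync/flush before the MMU context changes,
 * so a request queued while L2 (or L1) was current can't later be
 * applied to the wrong context.
 */
static void toy_service_pending_tlb_flush_on_nested_transition(struct toy_vcpu *v)
{
	if (toy_check_request(v, REQ_MMU_SYNC))
		v->serviced |= REQ_MMU_SYNC;
	if (toy_check_request(v, REQ_TLB_FLUSH_CURRENT))
		v->serviced |= REQ_TLB_FLUSH_CURRENT;
	if (toy_check_request(v, REQ_TLB_FLUSH_GUEST))
		v->serviced |= REQ_TLB_FLUSH_GUEST;
}
```

The point is ordering: everything pending and vCPU-scoped is serviced before the transition, leaving no request behind that the post-switch context would misinterpret.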