2020-10-21 10:28:28

by Sean Christopherson

Subject: [PATCH v2 00/10] KVM: VMX: Clean up Hyper-V PV TLB flush

Clean up KVM's PV TLB flushing when running with EPT on Hyper-V, i.e. as
a nested VMM. No real goal in mind other than the sole patch in v1, which
is a minor change to avoid a future mixup when TDX also wants to define
.remote_flush_tlb. Everything else is opportunistic clean up.

Ran Hyper-V KVM unit tests (if those are even relevant?), but haven't
actually tested on top of Hyper-V.

v2: Rewrite everything.

Sean Christopherson (10):
KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush
KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB
flush
KVM: VMX: Fold Hyper-V EPTP checking into its only caller
KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed
KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch
KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches
KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd
KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is
enabled
KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails
KVM: VMX: Track PGD instead of EPTP for paravirt Hyper-V TLB flush

arch/x86/kvm/vmx/vmx.c | 102 ++++++++++++++++++++---------------------
arch/x86/kvm/vmx/vmx.h | 16 +++----
2 files changed, 57 insertions(+), 61 deletions(-)

--
2.28.0


2020-10-21 10:29:55

by Sean Christopherson

Subject: [PATCH v2 02/10] KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB flush

Capture kvm_vmx in a local variable instead of polluting
hv_remote_flush_tlb_with_range() with to_kvm_vmx(kvm).

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6d53bcc4a1a9..6d41c99c70c4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -516,26 +516,27 @@ static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_tlb_range *range)
{
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
struct kvm_vcpu *vcpu;
int ret = 0, i;

- spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
+ spin_lock(&kvm_vmx->ept_pointer_lock);

- if (to_kvm_vmx(kvm)->ept_pointers_match == EPT_POINTERS_CHECK)
+ if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK)
check_ept_pointer_match(kvm);

- if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
+ if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm) {
/* If ept_pointer is invalid pointer, bypass flush request. */
if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
ret |= hv_remote_flush_eptp(to_vmx(vcpu)->ept_pointer,
range);
}
- } else if (VALID_PAGE(to_kvm_vmx(kvm)->hv_tlb_eptp)) {
- ret = hv_remote_flush_eptp(to_kvm_vmx(kvm)->hv_tlb_eptp, range);
+ } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
}

- spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
+ spin_unlock(&kvm_vmx->ept_pointer_lock);
return ret;
}
static int hv_remote_flush_tlb(struct kvm *kvm)
--
2.28.0

2020-10-21 10:30:17

by Sean Christopherson

Subject: [PATCH v2 05/10] KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch

Drop the dedicated 'ept_pointers_match' field in favor of stuffing
'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote
that there is at least one EPTP mismatch. Use a local variable to
track whether or not a mismatch is detected so that hv_tlb_eptp can be
used to skip redundant flushes.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
arch/x86/kvm/vmx/vmx.h | 7 -------
2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 52cb9eec1db3..4dfde8b64750 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -498,13 +498,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
struct kvm_vcpu *vcpu;
int ret = 0, i;
+ bool mismatch;
u64 tmp_eptp;

spin_lock(&kvm_vmx->ept_pointer_lock);

- if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
- kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
- kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
+ if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ mismatch = false;

kvm_for_each_vcpu(i, vcpu, kvm) {
tmp_eptp = to_vmx(vcpu)->ept_pointer;
@@ -515,12 +515,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
kvm_vmx->hv_tlb_eptp = tmp_eptp;
else
- kvm_vmx->ept_pointers_match
- = EPT_POINTERS_MISMATCH;
+ mismatch = true;

ret |= hv_remote_flush_eptp(tmp_eptp, range);
}
- } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ if (mismatch)
+ kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
+ } else {
ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
}

@@ -3042,8 +3043,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
if (kvm_x86_ops.tlb_remote_flush) {
spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
to_vmx(vcpu)->ept_pointer = eptp;
- to_kvm_vmx(kvm)->ept_pointers_match
- = EPT_POINTERS_CHECK;
+ to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
}

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 3d557a065c01..e8d7d07b2020 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -288,12 +288,6 @@ struct vcpu_vmx {
} shadow_msr_intercept;
};

-enum ept_pointers_status {
- EPT_POINTERS_CHECK = 0,
- EPT_POINTERS_MATCH = 1,
- EPT_POINTERS_MISMATCH = 2
-};
-
struct kvm_vmx {
struct kvm kvm;

@@ -302,7 +296,6 @@ struct kvm_vmx {
gpa_t ept_identity_map_addr;

hpa_t hv_tlb_eptp;
- enum ept_pointers_status ept_pointers_match;
spinlock_t ept_pointer_lock;
};

--
2.28.0

2020-10-21 10:33:42

by Sean Christopherson

Subject: [PATCH v2 06/10] KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches

Don't invalidate the common EPTP, and thus trigger rechecking of EPTPs
across all vCPUs, if the new EPTP matches the old/common EPTP. In all
likelihood this is a meaningless optimization, but there are (uncommon)
scenarios where KVM can reload the same EPTP.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4dfde8b64750..e6569bafacdc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3043,7 +3043,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
if (kvm_x86_ops.tlb_remote_flush) {
spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
to_vmx(vcpu)->ept_pointer = eptp;
- to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
+ if (eptp != to_kvm_vmx(kvm)->hv_tlb_eptp)
+ to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
}

--
2.28.0

2020-10-21 10:34:57

by Sean Christopherson

Subject: [PATCH v2 03/10] KVM: VMX: Fold Hyper-V EPTP checking into its only caller

Fold check_ept_pointer_match() into hv_remote_flush_tlb_with_range() in
preparation for combining the kvm_for_each_vcpu loops of the ==CHECK and
!=MATCH statements.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 42 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6d41c99c70c4..bba6d91f1fe1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -469,27 +469,6 @@ static const u32 vmx_uret_msrs_list[] = {
static bool __read_mostly enlightened_vmcs = true;
module_param(enlightened_vmcs, bool, 0444);

-/* check_ept_pointer() should be under protection of ept_pointer_lock. */
-static void check_ept_pointer_match(struct kvm *kvm)
-{
- struct kvm_vcpu *vcpu;
- u64 tmp_eptp = INVALID_PAGE;
- int i;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!VALID_PAGE(tmp_eptp)) {
- tmp_eptp = to_vmx(vcpu)->ept_pointer;
- } else if (tmp_eptp != to_vmx(vcpu)->ept_pointer) {
- to_kvm_vmx(kvm)->ept_pointers_match
- = EPT_POINTERS_MISMATCH;
- return;
- }
- }
-
- to_kvm_vmx(kvm)->hv_tlb_eptp = tmp_eptp;
- to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
-}
-
static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush,
void *data)
{
@@ -519,11 +498,28 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
struct kvm_vcpu *vcpu;
int ret = 0, i;
+ u64 tmp_eptp;

spin_lock(&kvm_vmx->ept_pointer_lock);

- if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK)
- check_ept_pointer_match(kvm);
+ if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK) {
+ kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
+ kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ tmp_eptp = to_vmx(vcpu)->ept_pointer;
+ if (!VALID_PAGE(tmp_eptp))
+ continue;
+
+ if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ kvm_vmx->hv_tlb_eptp = tmp_eptp;
+ } else if (kvm_vmx->hv_tlb_eptp != tmp_eptp) {
+ kvm_vmx->ept_pointers_match
+ = EPT_POINTERS_MISMATCH;
+ break;
+ }
+ }
+ }

if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm) {
--
2.28.0

2020-10-21 13:17:07

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 00/10] KVM: VMX: Clean up Hyper-V PV TLB flush

Sean Christopherson <[email protected]> writes:

> Clean up KVM's PV TLB flushing when running with EPT on Hyper-V, i.e. as
> a nested VMM.

The terminology we use is a bit confusing and I'd like to use the
opportunity to enlighten myself on how to call "PV TLB flushing"
properly :-)

Hyper-V supports two types of 'PV TLB flushing':

HvFlushVirtualAddressSpace/HvFlushVirtualAddressList[,Ex] which is
described in TLFS as ".. hypercall invalidates ... virtual TLB entries
that belong to a specified address space."

HvFlushGuestPhysicalAddressSpace/HvFlushGuestPhysicalAddressList which
in TLFS is referred to as "... hypercall invalidates cached L2 GPA to
GPA mappings within a second level address space... hypercall is like
the execution of an INVEPT instruction with type “single-context” on all
processors" and INVEPT is defined in SDM as "Invalidates mappings in the
translation lookaside buffers (TLBs) and paging-structure caches that
were derived from extended page tables (EPT)." (and that's what this
series is about)
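
For reference, the two families also land in different kernel entry
points. A rough, annotated sketch (the two prototypes are the in-tree
ones from asm/mshyperv.h; the comments are just my annotations):

	/*
	 * HvFlushVirtualAddressSpace/List[,Ex]: flushes virtual TLB
	 * entries on behalf of a guest; as an L1 guest the kernel reaches
	 * these via the paravirt flush_tlb_others hook
	 * (arch/x86/hyperv/mmu.c).
	 *
	 * HvFlushGuestPhysicalAddressSpace/List: the INVEPT-like flush of
	 * L2 GPA->GPA mappings that this series deals with, reached via:
	 */
	int hyperv_flush_guest_mapping(u64 as);
	int hyperv_flush_guest_mapping_range(u64 as,
					     hyperv_fill_flush_list_func fill_func,
					     void *data);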

and every time I see e.g. 'hv_remote_flush_tlb.*' it takes me some time
to recall which flavor of flushing it relates to. Do you by any chance
have any suggestions on how the naming could be improved?

> No real goal in mind other than the sole patch in v1, which
> is a minor change to avoid a future mixup when TDX also wants to define
> .remote_flush_tlb. Everything else is opportunistic clean up.
>

Looks like a nice cleanup, thanks!

> Ran Hyper-V KVM unit tests (if those are even relevant?)

No, they aren't. KVM doesn't currently implement
HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST so we can't test this feature
outside of a real Hyper-V environment. We also don't yet test KVM-on-KVM
with Enlightened VMCS ...

> but haven't actually tested on top of Hyper-V.

Just in case you are interested in doing so and there's no Hyper-V
server around, you can either search for a Win10 desktop around or just
spin an Azure VM where modern instance types (e.g. Dv3/v4, Ev3/v4
families, Intel only - so no Ea/Da/...) have VMX and PV Hyper-V features
exposed.

I'm going to give this a try today and I will also try to review
individual patches, thanks again!

>
> v2: Rewrite everything.
>
> Sean Christopherson (10):
> KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush
> KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB
> flush
> KVM: VMX: Fold Hyper-V EPTP checking into its only caller
> KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed
> KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch
> KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches
> KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd
> KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is
> enabled
> KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails
> KVM: VMX: Track PGD instead of EPTP for paravirt Hyper-V TLB flush
>
> arch/x86/kvm/vmx/vmx.c | 102 ++++++++++++++++++++---------------------
> arch/x86/kvm/vmx/vmx.h | 16 +++----
> 2 files changed, 57 insertions(+), 61 deletions(-)

--
Vitaly

2020-10-21 13:54:38

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 02/10] KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB flush

Sean Christopherson <[email protected]> writes:

> Capture kvm_vmx in a local variable instead of polluting
> hv_remote_flush_tlb_with_range() with to_kvm_vmx(kvm).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6d53bcc4a1a9..6d41c99c70c4 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -516,26 +516,27 @@ static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
> static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> struct kvm_tlb_range *range)
> {
> + struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
> struct kvm_vcpu *vcpu;
> int ret = 0, i;
>
> - spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> + spin_lock(&kvm_vmx->ept_pointer_lock);
>
> - if (to_kvm_vmx(kvm)->ept_pointers_match == EPT_POINTERS_CHECK)
> + if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK)
> check_ept_pointer_match(kvm);
>
> - if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
> + if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
> kvm_for_each_vcpu(i, vcpu, kvm) {
> /* If ept_pointer is invalid pointer, bypass flush request. */
> if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
> ret |= hv_remote_flush_eptp(to_vmx(vcpu)->ept_pointer,
> range);
> }
> - } else if (VALID_PAGE(to_kvm_vmx(kvm)->hv_tlb_eptp)) {
> - ret = hv_remote_flush_eptp(to_kvm_vmx(kvm)->hv_tlb_eptp, range);
> + } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> + ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
> }
>
> - spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> + spin_unlock(&kvm_vmx->ept_pointer_lock);
> return ret;
> }
> static int hv_remote_flush_tlb(struct kvm *kvm)

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-10-21 13:55:09

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 03/10] KVM: VMX: Fold Hyper-V EPTP checking into its only caller

Sean Christopherson <[email protected]> writes:

> Fold check_ept_pointer_match() into hv_remote_flush_tlb_with_range() in
> preparation for combining the kvm_for_each_vcpu loops of the ==CHECK and
> !=MATCH statements.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 42 +++++++++++++++++++-----------------------
> 1 file changed, 19 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6d41c99c70c4..bba6d91f1fe1 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -469,27 +469,6 @@ static const u32 vmx_uret_msrs_list[] = {
> static bool __read_mostly enlightened_vmcs = true;
> module_param(enlightened_vmcs, bool, 0444);
>
> -/* check_ept_pointer() should be under protection of ept_pointer_lock. */
> -static void check_ept_pointer_match(struct kvm *kvm)
> -{
> - struct kvm_vcpu *vcpu;
> - u64 tmp_eptp = INVALID_PAGE;
> - int i;
> -
> - kvm_for_each_vcpu(i, vcpu, kvm) {
> - if (!VALID_PAGE(tmp_eptp)) {
> - tmp_eptp = to_vmx(vcpu)->ept_pointer;
> - } else if (tmp_eptp != to_vmx(vcpu)->ept_pointer) {
> - to_kvm_vmx(kvm)->ept_pointers_match
> - = EPT_POINTERS_MISMATCH;
> - return;
> - }
> - }
> -
> - to_kvm_vmx(kvm)->hv_tlb_eptp = tmp_eptp;
> - to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
> -}
> -
> static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush,
> void *data)
> {
> @@ -519,11 +498,28 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
> struct kvm_vcpu *vcpu;
> int ret = 0, i;
> + u64 tmp_eptp;
>
> spin_lock(&kvm_vmx->ept_pointer_lock);
>
> - if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK)
> - check_ept_pointer_match(kvm);
> + if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK) {
> + kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
> + kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + tmp_eptp = to_vmx(vcpu)->ept_pointer;
> + if (!VALID_PAGE(tmp_eptp))
> + continue;
> +
> + if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> + kvm_vmx->hv_tlb_eptp = tmp_eptp;
> + } else if (kvm_vmx->hv_tlb_eptp != tmp_eptp) {
> + kvm_vmx->ept_pointers_match
> + = EPT_POINTERS_MISMATCH;
> + break;

Actually no (scratch my comment on PATCH1): in case the pointers differ,
kvm_vmx->hv_tlb_eptp remains set to the last matched EPTP. This likely
doesn't matter as we're not going to use it, but maybe it's worth
sacrificing a couple of instructions and resetting it here to
INVALID_PAGE (or actually setting it only in the EPT_POINTERS_MATCH
case after the loop)?

> + }
> + }
> + }
>
> if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
> kvm_for_each_vcpu(i, vcpu, kvm) {

--
Vitaly

2020-10-21 14:01:41

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 05/10] KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch

Sean Christopherson <[email protected]> writes:

> Drop the dedicated 'ept_pointers_match' field in favor of stuffing
> 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote
> that there is at least one EPTP mismatch. Use a local variable to
> track whether or not a mismatch is detected so that hv_tlb_eptp can be
> used to skip redundant flushes.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
> arch/x86/kvm/vmx/vmx.h | 7 -------
> 2 files changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 52cb9eec1db3..4dfde8b64750 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -498,13 +498,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
> struct kvm_vcpu *vcpu;
> int ret = 0, i;
> + bool mismatch;
> u64 tmp_eptp;
>
> spin_lock(&kvm_vmx->ept_pointer_lock);
>
> - if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
> - kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
> - kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> + if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> + mismatch = false;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> tmp_eptp = to_vmx(vcpu)->ept_pointer;
> @@ -515,12 +515,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
> kvm_vmx->hv_tlb_eptp = tmp_eptp;
> else
> - kvm_vmx->ept_pointers_match
> - = EPT_POINTERS_MISMATCH;
> + mismatch = true;
>
> ret |= hv_remote_flush_eptp(tmp_eptp, range);
> }
> - } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> + if (mismatch)
> + kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> + } else {
> ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
> }

Personally, I find double negations like 'mismatch = false' hard to read
:-). What if we write this all like

if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
	kvm_vmx->hv_tlb_eptp = to_vmx(vcpu0)->ept_pointer;
	kvm_for_each_vcpu() {
		tmp_eptp = to_vmx(vcpu)->ept_pointer;
		if (!VALID_PAGE(tmp_eptp) || tmp_eptp != kvm_vmx->hv_tlb_eptp)
			kvm_vmx->hv_tlb_eptp = INVALID_PAGE;

		if (VALID_PAGE(tmp_eptp))
			ret |= hv_remote_flush_eptp(tmp_eptp, range);
	}
} else {
	ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
}

(not tested and I've probably missed something)

>
> @@ -3042,8 +3043,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> if (kvm_x86_ops.tlb_remote_flush) {
> spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> to_vmx(vcpu)->ept_pointer = eptp;
> - to_kvm_vmx(kvm)->ept_pointers_match
> - = EPT_POINTERS_CHECK;
> + to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
> spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 3d557a065c01..e8d7d07b2020 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -288,12 +288,6 @@ struct vcpu_vmx {
> } shadow_msr_intercept;
> };
>
> -enum ept_pointers_status {
> - EPT_POINTERS_CHECK = 0,
> - EPT_POINTERS_MATCH = 1,
> - EPT_POINTERS_MISMATCH = 2
> -};
> -
> struct kvm_vmx {
> struct kvm kvm;
>
> @@ -302,7 +296,6 @@ struct kvm_vmx {
> gpa_t ept_identity_map_addr;
>
> hpa_t hv_tlb_eptp;
> - enum ept_pointers_status ept_pointers_match;
> spinlock_t ept_pointer_lock;
> };

--
Vitaly

2020-10-21 18:25:26

by Sean Christopherson

Subject: [PATCH v2 01/10] KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush

Explicitly track the EPTP that is common to all vCPUs instead of
grabbing vCPU0's EPTP when invoking Hyper-V's paravirt TLB flush.
Tracking the EPTP will allow optimizing the checks when loading a new
EPTP and will also allow dropping ept_pointer_match, e.g. by marking
the common EPTP as invalid.

This also technically fixes a bug where KVM could theoretically flush an
invalid GPA if all vCPUs have an invalid root. In practice, it's likely
impossible to trigger a remote TLB flush in such a scenario. In any
case, the superfluous flush is completely benign.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 19 ++++++++-----------
arch/x86/kvm/vmx/vmx.h | 1 +
2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bcc097bb8321..6d53bcc4a1a9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -486,6 +486,7 @@ static void check_ept_pointer_match(struct kvm *kvm)
}
}

+ to_kvm_vmx(kvm)->hv_tlb_eptp = tmp_eptp;
to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
}

@@ -498,21 +499,18 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
range->pages);
}

-static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
- struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
+static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
{
- u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
-
/*
* FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
* of the base of EPT PML4 table, strip off EPT configuration
* information.
*/
if (range)
- return hyperv_flush_guest_mapping_range(ept_pointer & PAGE_MASK,
+ return hyperv_flush_guest_mapping_range(eptp & PAGE_MASK,
kvm_fill_hv_flush_list_func, (void *)range);
else
- return hyperv_flush_guest_mapping(ept_pointer & PAGE_MASK);
+ return hyperv_flush_guest_mapping(eptp & PAGE_MASK);
}

static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
@@ -530,12 +528,11 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
kvm_for_each_vcpu(i, vcpu, kvm) {
/* If ept_pointer is invalid pointer, bypass flush request. */
if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
- ret |= __hv_remote_flush_tlb_with_range(
- kvm, vcpu, range);
+ ret |= hv_remote_flush_eptp(to_vmx(vcpu)->ept_pointer,
+ range);
}
- } else {
- ret = __hv_remote_flush_tlb_with_range(kvm,
- kvm_get_vcpu(kvm, 0), range);
+ } else if (VALID_PAGE(to_kvm_vmx(kvm)->hv_tlb_eptp)) {
+ ret = hv_remote_flush_eptp(to_kvm_vmx(kvm)->hv_tlb_eptp, range);
}

spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 5961cb897125..3d557a065c01 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -301,6 +301,7 @@ struct kvm_vmx {
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

+ hpa_t hv_tlb_eptp;
enum ept_pointers_status ept_pointers_match;
spinlock_t ept_pointer_lock;
};
--
2.28.0

2020-10-21 18:25:32

by Sean Christopherson

Subject: [PATCH v2 08/10] KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is enabled

Ifdef away the Hyper-V specific fields in structs kvm_vmx and vcpu_vmx
as each field has only a single reference outside of the struct itself
that isn't already wrapped in ifdeffery (and both are initialization).

vcpu_vmx.ept_pointer in particular should be wrapped as it is valid if
and only if Hyper-V is active, i.e. non-Hyper-V code cannot rely on it
to actually track the current EPTP (without additional code changes).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 5 ++++-
arch/x86/kvm/vmx/vmx.h | 4 ++++
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 55d6b699d8e3..a45a90d44d24 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6933,8 +6933,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
vmx->pi_desc.nv = POSTED_INTR_VECTOR;
vmx->pi_desc.sn = 1;

+#if IS_ENABLED(CONFIG_HYPERV)
vmx->ept_pointer = INVALID_PAGE;
-
+#endif
return 0;

free_vmcs:
@@ -6951,7 +6952,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)

static int vmx_vm_init(struct kvm *kvm)
{
+#if IS_ENABLED(CONFIG_HYPERV)
spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock);
+#endif

if (!ple_gap)
kvm->arch.pause_in_guest = true;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index e8d7d07b2020..1b8c08e483cd 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -276,7 +276,9 @@ struct vcpu_vmx {
*/
u64 msr_ia32_feature_control;
u64 msr_ia32_feature_control_valid_bits;
+#if IS_ENABLED(CONFIG_HYPERV)
u64 ept_pointer;
+#endif

struct pt_desc pt_desc;

@@ -295,8 +297,10 @@ struct kvm_vmx {
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

+#if IS_ENABLED(CONFIG_HYPERV)
hpa_t hv_tlb_eptp;
spinlock_t ept_pointer_lock;
+#endif
};

bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
--
2.28.0

2020-10-21 18:26:42

by Sean Christopherson

Subject: [PATCH v2 10/10] KVM: VMX: Track PGD instead of EPTP for paravirt Hyper-V TLB flush

Track the address of the top-level EPT struct, a.k.a. the PGD, instead
of the EPTP itself for Hyper-V's paravirt TLB flush. The paravirt API
takes the PGD, not the EPTP, and in theory tracking the EPTP could lead
to false negatives, e.g. if the PGD matched but the attributes in the
EPTP do not. In practice, such a mismatch is extremely unlikely, if not
flat out impossible, given how KVM generates the EPTP.

Opportunistically rename the related fields to use the 'pgd'
nomenclature, and to prefix them with 'hv_tlb' to connect them to
Hyper-V's paravirt TLB flushing.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 53 +++++++++++++++++++-----------------------
arch/x86/kvm/vmx/vmx.h | 6 ++---
2 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e0fea09a6e42..89019e6476b3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -478,18 +478,13 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
range->pages);
}

-static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
+static inline int hv_remote_flush_pgd(u64 pgd, struct kvm_tlb_range *range)
{
- /*
- * FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
- * of the base of EPT PML4 table, strip off EPT configuration
- * information.
- */
if (range)
- return hyperv_flush_guest_mapping_range(eptp & PAGE_MASK,
+ return hyperv_flush_guest_mapping_range(pgd,
kvm_fill_hv_flush_list_func, (void *)range);
else
- return hyperv_flush_guest_mapping(eptp & PAGE_MASK);
+ return hyperv_flush_guest_mapping(pgd);
}

static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
@@ -499,37 +494,37 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_vcpu *vcpu;
int ret = 0, i;
bool mismatch;
- u64 tmp_eptp;
+ u64 tmp_pgd;

- spin_lock(&kvm_vmx->ept_pointer_lock);
+ spin_lock(&kvm_vmx->hv_tlb_pgd_lock);

- if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd)) {
mismatch = false;

kvm_for_each_vcpu(i, vcpu, kvm) {
- tmp_eptp = to_vmx(vcpu)->ept_pointer;
- if (!VALID_PAGE(tmp_eptp) ||
- tmp_eptp == kvm_vmx->hv_tlb_eptp)
+ tmp_pgd = to_vmx(vcpu)->hv_tlb_pgd;
+ if (!VALID_PAGE(tmp_pgd) ||
+ tmp_pgd == kvm_vmx->hv_tlb_pgd)
continue;

- if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
- kvm_vmx->hv_tlb_eptp = tmp_eptp;
+ if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd))
+ kvm_vmx->hv_tlb_pgd = tmp_pgd;
else
mismatch = true;

if (!ret)
- ret = hv_remote_flush_eptp(tmp_eptp, range);
+ ret = hv_remote_flush_pgd(tmp_pgd, range);

if (ret && mismatch)
break;
}
if (mismatch)
- kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
+ kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
} else {
- ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
+ ret = hv_remote_flush_pgd(kvm_vmx->hv_tlb_pgd, range);
}

- spin_unlock(&kvm_vmx->ept_pointer_lock);
+ spin_unlock(&kvm_vmx->hv_tlb_pgd_lock);
return ret;
}
static int hv_remote_flush_tlb(struct kvm *kvm)
@@ -564,17 +559,17 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)

#endif /* IS_ENABLED(CONFIG_HYPERV) */

-static void hv_load_mmu_eptp(struct kvm_vcpu *vcpu, u64 eptp)
+static void hv_load_mmu_pgd(struct kvm_vcpu *vcpu, u64 pgd)
{
#if IS_ENABLED(CONFIG_HYPERV)
struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);

if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
- spin_lock(&kvm_vmx->ept_pointer_lock);
- to_vmx(vcpu)->ept_pointer = eptp;
- if (eptp != kvm_vmx->hv_tlb_eptp)
- kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
- spin_unlock(&kvm_vmx->ept_pointer_lock);
+ spin_lock(&kvm_vmx->hv_tlb_pgd_lock);
+ to_vmx(vcpu)->hv_tlb_pgd = pgd;
+ if (pgd != kvm_vmx->hv_tlb_pgd)
+ kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
+ spin_unlock(&kvm_vmx->hv_tlb_pgd_lock);
}
#endif
}
@@ -3059,7 +3054,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
eptp = construct_eptp(vcpu, pgd, pgd_level);
vmcs_write64(EPT_POINTER, eptp);

- hv_load_mmu_eptp(vcpu, eptp);
+ hv_load_mmu_pgd(vcpu, pgd);

if (!enable_unrestricted_guest && !is_paging(vcpu))
guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
@@ -6938,7 +6933,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
vmx->pi_desc.sn = 1;

#if IS_ENABLED(CONFIG_HYPERV)
- vmx->ept_pointer = INVALID_PAGE;
+ vmx->hv_tlb_pgd = INVALID_PAGE;
#endif
return 0;

@@ -6957,7 +6952,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
static int vmx_vm_init(struct kvm *kvm)
{
#if IS_ENABLED(CONFIG_HYPERV)
- spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock);
+ spin_lock_init(&to_kvm_vmx(kvm)->hv_tlb_pgd_lock);
#endif

if (!ple_gap)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 1b8c08e483cd..4d61fb8e8d9d 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -277,7 +277,7 @@ struct vcpu_vmx {
u64 msr_ia32_feature_control;
u64 msr_ia32_feature_control_valid_bits;
#if IS_ENABLED(CONFIG_HYPERV)
- u64 ept_pointer;
+ u64 hv_tlb_pgd;
#endif

struct pt_desc pt_desc;
@@ -298,8 +298,8 @@ struct kvm_vmx {
gpa_t ept_identity_map_addr;

#if IS_ENABLED(CONFIG_HYPERV)
- hpa_t hv_tlb_eptp;
- spinlock_t ept_pointer_lock;
+ hpa_t hv_tlb_pgd;
+ spinlock_t hv_tlb_pgd_lock;
#endif
};

--
2.28.0
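
For readability, here is roughly what hv_remote_flush_tlb_with_range()
looks like with the whole series applied. This is reconstructed from
the diffs above rather than copied from a tree, so treat it as a
sketch:

	static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
			struct kvm_tlb_range *range)
	{
		struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
		struct kvm_vcpu *vcpu;
		int ret = 0, i;
		bool mismatch;
		u64 tmp_pgd;

		spin_lock(&kvm_vmx->hv_tlb_pgd_lock);

		if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd)) {
			/* Recheck all vCPUs, flushing each unique, valid PGD. */
			mismatch = false;

			kvm_for_each_vcpu(i, vcpu, kvm) {
				tmp_pgd = to_vmx(vcpu)->hv_tlb_pgd;
				if (!VALID_PAGE(tmp_pgd) ||
				    tmp_pgd == kvm_vmx->hv_tlb_pgd)
					continue;

				if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd))
					kvm_vmx->hv_tlb_pgd = tmp_pgd;
				else
					mismatch = true;

				/* A failed flush triggers a full flush, skip the rest. */
				if (!ret)
					ret = hv_remote_flush_pgd(tmp_pgd, range);

				if (ret && mismatch)
					break;
			}
			if (mismatch)
				kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
		} else {
			/* All vCPUs share hv_tlb_pgd, one flush suffices. */
			ret = hv_remote_flush_pgd(kvm_vmx->hv_tlb_pgd, range);
		}

		spin_unlock(&kvm_vmx->hv_tlb_pgd_lock);
		return ret;
	}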

2020-10-21 18:26:51

by Sean Christopherson

Subject: [PATCH v2 07/10] KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd

Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's
implementation for PV flushing instead of assuming that a non-NULL
implementation means running on Hyper-V. Wrap the related logic in
ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n.

Short term, the explicit check makes it more obvious why a non-NULL
tlb_remote_flush() triggers EPTP shenanigans. Long term, this will
allow TDX to define its own implementation of tlb_remote_flush() without
running afoul of Hyper-V.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6569bafacdc..55d6b699d8e3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -560,6 +560,21 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)

#endif /* IS_ENABLED(CONFIG_HYPERV) */

+static void hv_load_mmu_eptp(struct kvm_vcpu *vcpu, u64 eptp)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+ struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
+
+ if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
+ spin_lock(&kvm_vmx->ept_pointer_lock);
+ to_vmx(vcpu)->ept_pointer = eptp;
+ if (eptp != kvm_vmx->hv_tlb_eptp)
+ kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
+ spin_unlock(&kvm_vmx->ept_pointer_lock);
+ }
+#endif
+}
+
/*
* Comment's format: document - errata name - stepping - processor name.
* Refer from
@@ -3040,13 +3055,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
eptp = construct_eptp(vcpu, pgd, pgd_level);
vmcs_write64(EPT_POINTER, eptp);

- if (kvm_x86_ops.tlb_remote_flush) {
- spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
- to_vmx(vcpu)->ept_pointer = eptp;
- if (eptp != to_kvm_vmx(kvm)->hv_tlb_eptp)
- to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
- spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
- }
+ hv_load_mmu_eptp(vcpu, eptp);

if (!enable_unrestricted_guest && !is_paging(vcpu))
guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
--
2.28.0

2020-10-21 19:47:45

by Sean Christopherson

Subject: [PATCH v2 09/10] KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails

Skip additional EPTP flushes if one fails when processing EPTPs for
Hyper-V's paravirt TLB flushing. If _any_ flush fails, KVM falls back
to a full global flush, i.e. additional flushes are unnecessary (and
will likely fail anyways).

Continue processing the loop unless a mismatch was already detected,
e.g. to handle the case where the first flush fails and there is a
yet-to-be-detected mismatch.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a45a90d44d24..e0fea09a6e42 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -517,7 +517,11 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
else
mismatch = true;

- ret |= hv_remote_flush_eptp(tmp_eptp, range);
+ if (!ret)
+ ret = hv_remote_flush_eptp(tmp_eptp, range);
+
+ if (ret && mismatch)
+ break;
}
if (mismatch)
kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
--
2.28.0

2020-10-21 19:48:21

by Sean Christopherson

Subject: [PATCH v2 04/10] KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed

Combine the for-loops for Hyper-V TLB EPTP checking and flushing, and in
doing so skip flushes for vCPUs whose EPTP matches the target EPTP.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bba6d91f1fe1..52cb9eec1db3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -502,31 +502,23 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,

spin_lock(&kvm_vmx->ept_pointer_lock);

- if (kvm_vmx->ept_pointers_match == EPT_POINTERS_CHECK) {
+ if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
kvm_vmx->hv_tlb_eptp = INVALID_PAGE;

kvm_for_each_vcpu(i, vcpu, kvm) {
tmp_eptp = to_vmx(vcpu)->ept_pointer;
- if (!VALID_PAGE(tmp_eptp))
+ if (!VALID_PAGE(tmp_eptp) ||
+ tmp_eptp == kvm_vmx->hv_tlb_eptp)
continue;

- if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
+ if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
kvm_vmx->hv_tlb_eptp = tmp_eptp;
- } else if (kvm_vmx->hv_tlb_eptp != tmp_eptp) {
+ else
kvm_vmx->ept_pointers_match
= EPT_POINTERS_MISMATCH;
- break;
- }
- }
- }

- if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
- kvm_for_each_vcpu(i, vcpu, kvm) {
- /* If ept_pointer is invalid pointer, bypass flush request. */
- if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
- ret |= hv_remote_flush_eptp(to_vmx(vcpu)->ept_pointer,
- range);
+ ret |= hv_remote_flush_eptp(tmp_eptp, range);
}
} else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
--
2.28.0
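
Reconstructed from the diffs so far (patches 01-04), the locked section
now reads roughly as follows -- a single combined loop that both checks
and flushes, skipping vCPUs whose EPTP already matches the common one:

	spin_lock(&kvm_vmx->ept_pointer_lock);

	if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
		kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
		kvm_vmx->hv_tlb_eptp = INVALID_PAGE;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			tmp_eptp = to_vmx(vcpu)->ept_pointer;
			if (!VALID_PAGE(tmp_eptp) ||
			    tmp_eptp == kvm_vmx->hv_tlb_eptp)
				continue;

			if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
				kvm_vmx->hv_tlb_eptp = tmp_eptp;
			else
				kvm_vmx->ept_pointers_match
					= EPT_POINTERS_MISMATCH;

			ret |= hv_remote_flush_eptp(tmp_eptp, range);
		}
	} else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
		ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
	}

	spin_unlock(&kvm_vmx->ept_pointer_lock);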

2020-10-22 01:34:39

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 06/10] KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches

Sean Christopherson <[email protected]> writes:

> Don't invalidate the common EPTP, and thus trigger rechecking of EPTPs
> across all vCPUs, if the new EPTP matches the old/common EPTP. In all
> likelihood this is a meaningless optimization, but there are (uncommon)
> scenarios where KVM can reload the same EPTP.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4dfde8b64750..e6569bafacdc 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3043,7 +3043,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> if (kvm_x86_ops.tlb_remote_flush) {
> spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> to_vmx(vcpu)->ept_pointer = eptp;
> - to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
> + if (eptp != to_kvm_vmx(kvm)->hv_tlb_eptp)
> + to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
> spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> }

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-10-22 02:36:41

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 08/10] KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is enabled

Sean Christopherson <[email protected]> writes:

> Ifdef away the Hyper-V specific fields in structs kvm_vmx and vcpu_vmx
> as each field has only a single reference outside of the struct itself
> that isn't already wrapped in ifdeffery (and both are initialization).
>
> vcpu_vmx.ept_pointer in particular should be wrapped as it is valid if
> and only if Hyper-V is active, i.e. non-Hyper-V code cannot rely on it
> to actually track the current EPTP (without additional code changes).
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 5 ++++-
> arch/x86/kvm/vmx/vmx.h | 4 ++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 55d6b699d8e3..a45a90d44d24 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6933,8 +6933,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
> vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> vmx->pi_desc.sn = 1;
>
> +#if IS_ENABLED(CONFIG_HYPERV)
> vmx->ept_pointer = INVALID_PAGE;
> -
> +#endif
> return 0;
>
> free_vmcs:
> @@ -6951,7 +6952,9 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
>
> static int vmx_vm_init(struct kvm *kvm)
> {
> +#if IS_ENABLED(CONFIG_HYPERV)
> spin_lock_init(&to_kvm_vmx(kvm)->ept_pointer_lock);
> +#endif
>
> if (!ple_gap)
> kvm->arch.pause_in_guest = true;
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index e8d7d07b2020..1b8c08e483cd 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -276,7 +276,9 @@ struct vcpu_vmx {
> */
> u64 msr_ia32_feature_control;
> u64 msr_ia32_feature_control_valid_bits;
> +#if IS_ENABLED(CONFIG_HYPERV)
> u64 ept_pointer;
> +#endif
>
> struct pt_desc pt_desc;
>
> @@ -295,8 +297,10 @@ struct kvm_vmx {
> bool ept_identity_pagetable_done;
> gpa_t ept_identity_map_addr;
>
> +#if IS_ENABLED(CONFIG_HYPERV)
> hpa_t hv_tlb_eptp;
> spinlock_t ept_pointer_lock;
> +#endif
> };
>
> bool nested_vmx_allowed(struct kvm_vcpu *vcpu);

Assuming this compiles without CONFIG_HYPERV,

Reviewed-by: Vitaly Kuznetsov <[email protected]>

--
Vitaly

2020-10-22 05:52:24

by Sean Christopherson

Subject: Re: [PATCH v2 05/10] KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch

On Wed, Oct 21, 2020 at 02:39:20PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > Drop the dedicated 'ept_pointers_match' field in favor of stuffing
> > 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote
> > that there is at least one EPTP mismatch. Use a local variable to
> > track whether or not a mismatch is detected so that hv_tlb_eptp can be
> > used to skip redundant flushes.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
> > arch/x86/kvm/vmx/vmx.h | 7 -------
> > 2 files changed, 8 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 52cb9eec1db3..4dfde8b64750 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -498,13 +498,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> > struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
> > struct kvm_vcpu *vcpu;
> > int ret = 0, i;
> > + bool mismatch;
> > u64 tmp_eptp;
> >
> > spin_lock(&kvm_vmx->ept_pointer_lock);
> >
> > - if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
> > - kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
> > - kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> > + if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> > + mismatch = false;
> >
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > tmp_eptp = to_vmx(vcpu)->ept_pointer;
> > @@ -515,12 +515,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
> > if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
> > kvm_vmx->hv_tlb_eptp = tmp_eptp;
> > else
> > - kvm_vmx->ept_pointers_match
> > - = EPT_POINTERS_MISMATCH;
> > + mismatch = true;
> >
> > ret |= hv_remote_flush_eptp(tmp_eptp, range);
> > }
> > - } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> > + if (mismatch)
> > + kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> > + } else {
> > ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
> > }
>
> Personally, I find double negations like 'mismatch = false' hard to read
> :-).

Paolo also dislikes double negatives (I just wasted a minute of my life trying
to work a double negative into that sentence).

> What if we write this all like
>
> if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
> 	kvm_vmx->hv_tlb_eptp = to_vmx(vcpu0)->ept_pointer;
> 	kvm_for_each_vcpu() {
> 		tmp_eptp = to_vmx(vcpu)->ept_pointer;
> 		if (!VALID_PAGE(tmp_eptp) || tmp_eptp != kvm_vmx->hv_tlb_eptp)
> 			kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> 		if (VALID_PAGE(tmp_eptp))
> 			ret |= hv_remote_flush_eptp(tmp_eptp, range);
> 	}
> } else {
> 	ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
> }
>
> (not tested and I've probably missed something)

It works, but doesn't optimize the case where one or more vCPUs has an invalid
EPTP. E.g. if vcpuN->ept_pointer is INVALID_PAGE, vcpuN+1..vcpuZ will flush,
even if they all match. Now, whether or not it's worth optimizing that case...

This is also why I named it "mismatch", i.e. it tracks whether or not there was
a mismatch between valid EPTPs, not that all EPTPs matched.

What about replacing "mismatch" with a counter that tracks the number of unique,
valid PGDs that are encountered?

if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd)) {
	unique_valid_pgd_cnt = 0;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		tmp_pgd = to_vmx(vcpu)->hv_tlb_pgd;
		if (!VALID_PAGE(tmp_pgd) ||
		    tmp_pgd == kvm_vmx->hv_tlb_pgd)
			continue;

		unique_valid_pgd_cnt++;

		if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd))
			kvm_vmx->hv_tlb_pgd = tmp_pgd;

		if (!ret)
			ret = hv_remote_flush_pgd(tmp_pgd, range);

		if (ret && unique_valid_pgd_cnt > 1)
			break;
	}
	if (unique_valid_pgd_cnt > 1)
		kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
} else {
	ret = hv_remote_flush_pgd(kvm_vmx->hv_tlb_pgd, range);
}


Alternatively, the pgd_cnt adjustment could be used to update hv_tlb_pgd, e.g.

if (++unique_valid_pgd_cnt == 1)
	kvm_vmx->hv_tlb_pgd = tmp_pgd;

I think I like this last one the most. It self-documents what we're tracking
as well as the relationship between the number of valid PGDs and hv_tlb_pgd.

I'll also add a few comments to explain how kvm_vmx->hv_tlb_pgd is used.

Thoughts?

> > @@ -3042,8 +3043,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> > if (kvm_x86_ops.tlb_remote_flush) {
> > spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> > to_vmx(vcpu)->ept_pointer = eptp;
> > - to_kvm_vmx(kvm)->ept_pointers_match
> > - = EPT_POINTERS_CHECK;
> > + to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
> > spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> > }
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index 3d557a065c01..e8d7d07b2020 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -288,12 +288,6 @@ struct vcpu_vmx {
> > } shadow_msr_intercept;
> > };
> >
> > -enum ept_pointers_status {
> > - EPT_POINTERS_CHECK = 0,
> > - EPT_POINTERS_MATCH = 1,
> > - EPT_POINTERS_MISMATCH = 2
> > -};
> > -
> > struct kvm_vmx {
> > struct kvm kvm;
> >
> > @@ -302,7 +296,6 @@ struct kvm_vmx {
> > gpa_t ept_identity_map_addr;
> >
> > hpa_t hv_tlb_eptp;
> > - enum ept_pointers_status ept_pointers_match;
> > spinlock_t ept_pointer_lock;
> > };
>
> --
> Vitaly
>

2020-10-22 06:31:21

by Sean Christopherson

Subject: Re: [PATCH v2 10/10] KVM: VMX: Track PGD instead of EPTP for paravirt Hyper-V TLB flush

On Wed, Oct 21, 2020 at 04:39:28PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e0fea09a6e42..89019e6476b3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -478,18 +478,13 @@ static int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush
> > range->pages);
> > }
> >
> > -static inline int hv_remote_flush_eptp(u64 eptp, struct kvm_tlb_range *range)
> > +static inline int hv_remote_flush_pgd(u64 pgd, struct kvm_tlb_range *range)
> > {
> > - /*
> > - * FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
> > - * of the base of EPT PML4 table, strip off EPT configuration
> > - * information.
> > - */
> > if (range)
> > - return hyperv_flush_guest_mapping_range(eptp & PAGE_MASK,
> > + return hyperv_flush_guest_mapping_range(pgd,
> > kvm_fill_hv_flush_list_func, (void *)range);
> > else
> > - return hyperv_flush_guest_mapping(eptp & PAGE_MASK);
> > + return hyperv_flush_guest_mapping(pgd);
> > }
>
> (I'm probably missing something, please bear with me -- this is the last
> patch of the series after all :-) but the PGD which comes from
> kvm_mmu_load_pgd() has the PCID bits encoded, and you're dropping
> '&PAGE_MASK' here ...

...

> > @@ -564,17 +559,17 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> >
> > #endif /* IS_ENABLED(CONFIG_HYPERV) */
> >
> > -static void hv_load_mmu_eptp(struct kvm_vcpu *vcpu, u64 eptp)
> > +static void hv_load_mmu_pgd(struct kvm_vcpu *vcpu, u64 pgd)
> > {
> > #if IS_ENABLED(CONFIG_HYPERV)
> > struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
> >
> > if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
> > - spin_lock(&kvm_vmx->ept_pointer_lock);
> > - to_vmx(vcpu)->ept_pointer = eptp;
> > - if (eptp != kvm_vmx->hv_tlb_eptp)
> > - kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> > - spin_unlock(&kvm_vmx->ept_pointer_lock);
> > + spin_lock(&kvm_vmx->hv_tlb_pgd_lock);
> > + to_vmx(vcpu)->hv_tlb_pgd = pgd;
> > + if (pgd != kvm_vmx->hv_tlb_pgd)
> > + kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
> > + spin_unlock(&kvm_vmx->hv_tlb_pgd_lock);
> > }
> > #endif
> > }
> > @@ -3059,7 +3054,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> > eptp = construct_eptp(vcpu, pgd, pgd_level);
> > vmcs_write64(EPT_POINTER, eptp);
> >
> > - hv_load_mmu_eptp(vcpu, eptp);
> > + hv_load_mmu_pgd(vcpu, pgd);
>
> ... and not adding it here. (construct_eptp() seems to drop PCID bits
> but add its own stuff). Is this on purpose?

No, I completely forgot KVM crams the PCID bits into pgd. I think I'll add
a patch to rework .load_mmu_pgd() to move the PCID bits to a separate param,
and change construct_eptp() to do WARN_ON_ONCE(pgd & ~PAGE_MASK).
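
Roughly (untested sketch; the body matches today's construct_eptp(),
with the WARN being the only new bit):

	u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long pgd, int pgd_level)
	{
		u64 eptp = VMX_EPTP_MT_WB;

		/* PCID bits are now the caller's problem, expect a bare root HPA. */
		WARN_ON_ONCE(pgd & ~PAGE_MASK);

		eptp |= (pgd_level == 5) ? VMX_EPTP_PWL_5 : VMX_EPTP_PWL_4;

		if (enable_ept_ad_bits &&
		    (!is_guest_mode(vcpu) || nested_ept_ad_enabled(vcpu)))
			eptp |= VMX_EPTP_AD_ENABLE_BIT;
		eptp |= pgd;

		return eptp;
	}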

Actually, I think it makes more sense to have VMX and SVM, grab the PCID via
kvm_get_active_pcid(vcpu) when necessary. For EPTP, getting the PCID bits may
unnecessarily read CR3 from the VMCS.

Ugh, which brings up another issue. I'm pretty sure the "vmcs01.GUEST_CR3 is
already up-to-date" is dead code:

	if (!enable_unrestricted_guest && !is_paging(vcpu))
		guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
	else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
		guest_cr3 = vcpu->arch.cr3;
	else /* vmcs01.GUEST_CR3 is already up-to-date. */
		update_guest_cr3 = false;
	vmx_ept_load_pdptrs(vcpu);

The sole caller of .load_mmu_pgd() always invokes kvm_get_active_pcid(), which
in turn always does kvm_read_cr3(), i.e. CR3 will always be available.
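
For reference, kvm_get_active_pcid() in arch/x86/kvm/mmu.h is simply
(quoting from memory, so double-check):

	static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
	{
		/* kvm_read_cr3() reads CR3 from the VMCS if it isn't cached,
		 * marking VCPU_EXREG_CR3 available in the process. */
		return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
	}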

So yeah, I think moving kvm_get_active_pcid() in VMX/SVM is the right approach.
I'll rename "pgd" to "root_hpa" and "pgd_level" to "root_level" so that we
don't end up with inconsistencies, e.g. where pgd may or may not contain PCID
bits.

Nice catch!

2020-10-22 09:33:10

by Vitaly Kuznetsov

Subject: Re: [PATCH v2 05/10] KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch

Sean Christopherson <[email protected]> writes:

> On Wed, Oct 21, 2020 at 02:39:20PM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <[email protected]> writes:
>>
>> > Drop the dedicated 'ept_pointers_match' field in favor of stuffing
>> > 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote
>> > that there is at least one EPTP mismatch. Use a local variable to
>> > track whether or not a mismatch is detected so that hv_tlb_eptp can be
>> > used to skip redundant flushes.
>> >
>> > No functional change intended.
>> >
>> > Signed-off-by: Sean Christopherson <[email protected]>
>> > ---
>> > arch/x86/kvm/vmx/vmx.c | 16 ++++++++--------
>> > arch/x86/kvm/vmx/vmx.h | 7 -------
>> > 2 files changed, 8 insertions(+), 15 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> > index 52cb9eec1db3..4dfde8b64750 100644
>> > --- a/arch/x86/kvm/vmx/vmx.c
>> > +++ b/arch/x86/kvm/vmx/vmx.c
>> > @@ -498,13 +498,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
>> > struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
>> > struct kvm_vcpu *vcpu;
>> > int ret = 0, i;
>> > + bool mismatch;
>> > u64 tmp_eptp;
>> >
>> > spin_lock(&kvm_vmx->ept_pointer_lock);
>> >
>> > - if (kvm_vmx->ept_pointers_match != EPT_POINTERS_MATCH) {
>> > - kvm_vmx->ept_pointers_match = EPT_POINTERS_MATCH;
>> > - kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
>> > + if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
>> > + mismatch = false;
>> >
>> > kvm_for_each_vcpu(i, vcpu, kvm) {
>> > tmp_eptp = to_vmx(vcpu)->ept_pointer;
>> > @@ -515,12 +515,13 @@ static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
>> > if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp))
>> > kvm_vmx->hv_tlb_eptp = tmp_eptp;
>> > else
>> > - kvm_vmx->ept_pointers_match
>> > - = EPT_POINTERS_MISMATCH;
>> > + mismatch = true;
>> >
>> > ret |= hv_remote_flush_eptp(tmp_eptp, range);
>> > }
>> > - } else if (VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
>> > + if (mismatch)
>> > + kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
>> > + } else {
>> > ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
>> > }
>>
>> Personally, I find double negations like 'mismatch = false' hard to read
>> :-).
>
> Paolo also dislikes double negatives (I just wasted a minute of my life trying
> to work a double negative into that sentence).
>
>> What if we write this all like
>>
>> if (!VALID_PAGE(kvm_vmx->hv_tlb_eptp)) {
>> 	kvm_vmx->hv_tlb_eptp = to_vmx(vcpu0)->ept_pointer;
>> 	kvm_for_each_vcpu() {
>> 		tmp_eptp = to_vmx(vcpu)->ept_pointer;
>> 		if (!VALID_PAGE(tmp_eptp) || tmp_eptp != kvm_vmx->hv_tlb_eptp)
>> 			kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
>> 		if (VALID_PAGE(tmp_eptp))
>> 			ret |= hv_remote_flush_eptp(tmp_eptp, range);
>> 	}
>> } else {
>> 	ret = hv_remote_flush_eptp(kvm_vmx->hv_tlb_eptp, range);
>> }
>>
>> (not tested and I've probably missed something)
>
> It works, but doesn't optimize the case where one or more vCPUs has an invalid
> EPTP. E.g. if vcpuN->ept_pointer is INVALID_PAGE, vcpuN+1..vcpuZ will flush,
> even if they all match. Now, whether or not it's worth optimizing
> that case...

Yea. As KVM is already running on Hyper-V, nesting on top of it is
likely out of the question, so IMO it's not even worth optimizing...

>
> This is also why I named it "mismatch", i.e. it tracks whether or not there was
> a mismatch between valid EPTPs, not that all EPTPs matched.
>
> What about replacing "mismatch" with a counter that tracks the number of unique,
> valid PGDs that are encountered?
>
> if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd)) {
> 	unique_valid_pgd_cnt = 0;
>
> 	kvm_for_each_vcpu(i, vcpu, kvm) {
> 		tmp_pgd = to_vmx(vcpu)->hv_tlb_pgd;
> 		if (!VALID_PAGE(tmp_pgd) ||
> 		    tmp_pgd == kvm_vmx->hv_tlb_pgd)
> 			continue;
>
> 		unique_valid_pgd_cnt++;
>
> 		if (!VALID_PAGE(kvm_vmx->hv_tlb_pgd))
> 			kvm_vmx->hv_tlb_pgd = tmp_pgd;
>
> 		if (!ret)
> 			ret = hv_remote_flush_pgd(tmp_pgd, range);
>
> 		if (ret && unique_valid_pgd_cnt > 1)
> 			break;
> 	}
> 	if (unique_valid_pgd_cnt > 1)
> 		kvm_vmx->hv_tlb_pgd = INVALID_PAGE;
> } else {
> 	ret = hv_remote_flush_pgd(kvm_vmx->hv_tlb_pgd, range);
> }
>
>
> Alternatively, the pgd_cnt adjustment could be used to update hv_tlb_pgd, e.g.
>
> if (++unique_valid_pgd_cnt == 1)
> 	kvm_vmx->hv_tlb_pgd = tmp_pgd;
>
> I think I like this last one the most. It self-documents what we're tracking
> as well as the relationship between the number of valid PGDs and
> hv_tlb_pgd.

Both approaches look good to me, thanks!

>
> I'll also add a few comments to explain how kvm_vmx->hv_tlb_pgd is used.
>
> Thoughts?
>
>> > @@ -3042,8 +3043,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
>> > if (kvm_x86_ops.tlb_remote_flush) {
>> > spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
>> > to_vmx(vcpu)->ept_pointer = eptp;
>> > - to_kvm_vmx(kvm)->ept_pointers_match
>> > - = EPT_POINTERS_CHECK;
>> > + to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
>> > spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
>> > }
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
>> > index 3d557a065c01..e8d7d07b2020 100644
>> > --- a/arch/x86/kvm/vmx/vmx.h
>> > +++ b/arch/x86/kvm/vmx/vmx.h
>> > @@ -288,12 +288,6 @@ struct vcpu_vmx {
>> > } shadow_msr_intercept;
>> > };
>> >
>> > -enum ept_pointers_status {
>> > - EPT_POINTERS_CHECK = 0,
>> > - EPT_POINTERS_MATCH = 1,
>> > - EPT_POINTERS_MISMATCH = 2
>> > -};
>> > -
>> > struct kvm_vmx {
>> > struct kvm kvm;
>> >
>> > @@ -302,7 +296,6 @@ struct kvm_vmx {
>> > gpa_t ept_identity_map_addr;
>> >
>> > hpa_t hv_tlb_eptp;
>> > - enum ept_pointers_status ept_pointers_match;
>> > spinlock_t ept_pointer_lock;
>> > };
>>
>> --
>> Vitaly
>>
>

--
Vitaly

2020-10-22 11:46:26

by Sean Christopherson

Subject: Re: [PATCH v2 07/10] KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd

On Wed, Oct 21, 2020 at 04:18:04PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's
> > implementation for PV flushing instead of assuming that a non-NULL
> > implementation means running on Hyper-V. Wrap the related logic in
> > ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n.
> >
> > Short term, the explicit check makes it more obvious why a non-NULL
> > tlb_remote_flush() triggers EPTP shenanigans. Long term, this will
> > allow TDX to define its own implementation of tlb_remote_flush() without
> > running afoul of Hyper-V.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 23 ++++++++++++++++-------
> > 1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e6569bafacdc..55d6b699d8e3 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -560,6 +560,21 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> >
> > #endif /* IS_ENABLED(CONFIG_HYPERV) */
> >
> > +static void hv_load_mmu_eptp(struct kvm_vcpu *vcpu, u64 eptp)
> > +{
> > +#if IS_ENABLED(CONFIG_HYPERV)
> > + struct kvm_vmx *kvm_vmx = to_kvm_vmx(vcpu->kvm);
> > +
> > + if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
> > + spin_lock(&kvm_vmx->ept_pointer_lock);
> > + to_vmx(vcpu)->ept_pointer = eptp;
> > + if (eptp != kvm_vmx->hv_tlb_eptp)
> > + kvm_vmx->hv_tlb_eptp = INVALID_PAGE;
> > + spin_unlock(&kvm_vmx->ept_pointer_lock);
> > + }
> > +#endif
> > +}
> > +
> > /*
> > * Comment's format: document - errata name - stepping - processor name.
> > * Refer from
> > @@ -3040,13 +3055,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, unsigned long pgd,
> > eptp = construct_eptp(vcpu, pgd, pgd_level);
> > vmcs_write64(EPT_POINTER, eptp);
> >
> > - if (kvm_x86_ops.tlb_remote_flush) {
> > - spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> > - to_vmx(vcpu)->ept_pointer = eptp;
> > - if (eptp != to_kvm_vmx(kvm)->hv_tlb_eptp)
> > - to_kvm_vmx(kvm)->hv_tlb_eptp = INVALID_PAGE;
> > - spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> > - }
> > + hv_load_mmu_eptp(vcpu, eptp);
>
> So when TDX comes around, will we need to add something to
> hv_load_mmu_eptp() and rename it or there's nothing to do for TDX when
> PGD changes? I'm just wondering if it would make sense to rename
> hv_load_mmu_eptp() to something else right away.

Short answer, it's a non-issue.

There are things to do for TDX guests when PGD changes, but it's a completely
different flow than VMX. For TDX, the Secure/Private EPTP is set in stone,
i.e. it's per-VM and cannot be changed. The Shared/Public EPTP can be changed,
and this is what's handled in TDX's implementaiton of .load_mmu_pgd().

As for why .tlb_remote_flush() is relevant...

For TDX, because the VMM is untrusted, the actual INVEPTP on the Secure EPTP
must be done by the TDX-Module; there is state tracking in TDX-Module that
enforces this, e.g. operations on the S-EPT tables that rely on a previous
flush will fail if the flush wasn't performed.

That's why KVM hooks .tlb_remote_flush for TDX; KVM needs to do INVEPT on the
shared/public/untrusted EPTP, and then do a SEAMCALL to invoke TDX-Module's
tracking and flushing.

The collision on the VMX side occurs because VMX and TDX code share the same
kvm_x86_ops (in our proposed implementation), i.e. VMX would get a false
positive for "am I running on Hyper-V" if it only checked for a non-null
callback.

For "real" TDX, KVM will be running on bare metal, i.e. KVM won't be an L1
running on Hyper-V. It's certainly possible emulate the functional bits of TDX
in L0, i.e. to run/load KVM+TDX in L1, but the odds of that colliding with
Hyper-V's L1 GPA PV TLB flushing in upstream KVM are *extremely* tiny. The
main use case of TDX is to take the platform owner, e.g. CSP, out of the TCB,
i.e. running KVM+TDX in L1 in production would wipe out the value provided by
TDX as doing so would mean trusting L0 to do the right thing.

There is value in running KVM+TDX in L1 from a development/debug perspective,
but (a) I'd be quite surprised if Microsoft publicly released a version of
Hyper-V that emulated SEAM+TDX, and (b) even if a publicly available VMM
emulates SEAM+TDX, it would not want to enlighten L1 KVM as the goal behind
running nested would be for development/debug, i.e. it'd want to provide an
environment as close to the real thing as possible.