2022-09-21 15:27:53

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 00/39] KVM: x86: hyper-v: Fine-grained TLB flush + L2 TLB flush features

Changes since v9:
- Rebase to the latest sean/for_paolo/6.1 (5df50a4a9b60)
- Patch "x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition" was
dropped from this series as it is already queued.
- Add Drew's R-b tag to PATCH27.

Original description:

Currently, KVM handles HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests
by flushing the whole VPID and this is sub-optimal. This series introduces
the required mechanism to make handling of these requests more
fine-grained by flushing individual GVAs only (when requested). On this
foundation, "Direct Virtual Flush" Hyper-V feature is implemented. The
feature allows L0 to handle Hyper-V TLB flush hypercalls directly at
L0 without the need to reflect the exit to L1. This has at least two
benefits: reflecting vmexit and the consequent vmenter are avoided + L0
has precise information whether the target vCPU is actually running (and
thus requires a kick).

Sean Christopherson (1):
KVM: x86: hyper-v: Add helper to read hypercall data for array

Vitaly Kuznetsov (38):
KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'
KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
KVM: x86: hyper-v: Introduce TLB flush fifo
KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls
gently
KVM: x86: hyper-v: Expose support for extended gva ranges for flush
hypercalls
KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
x86/hyperv: Introduce
HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
KVM: x86: hyper-v: Use
HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw
'64'
KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in
kvm_hv_send_ipi()
KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush
KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv'
instead of on-stack 'sparse_banks'
KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush()
nested hook
KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
KVM: x86: hyper-v: L2 TLB flush
KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush()
check
KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'
KVM: nVMX: hyper-v: Enable L2 TLB flush
KVM: nSVM: hyper-v: Enable L2 TLB flush
KVM: x86: Expose Hyper-V L2 TLB flush feature
KVM: selftests: Better XMM read/write helpers
KVM: selftests: Move HYPERV_LINUX_OS_ID definition to a common header
KVM: selftests: Move the function doing Hyper-V hypercall to a common
header
KVM: selftests: Hyper-V PV IPI selftest
KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too
KVM: selftests: Export vm_vaddr_unused_gap() to make it possible to
request unmapped ranges
KVM: selftests: Export _vm_get_page_table_entry()
KVM: selftests: Hyper-V PV TLB flush selftest
KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with
hyperv-tlfs.h
KVM: selftests: Sync 'struct hv_vp_assist_page' definition with
hyperv-tlfs.h
KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
KVM: selftests: Split off load_evmcs() from load_vmcs()
KVM: selftests: Create a vendor independent helper to allocate Hyper-V
specific test pages
KVM: selftests: Allocate Hyper-V partition assist page
KVM: selftests: evmcs_test: Introduce L2 TLB flush test
KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
KVM: selftests: Rename 'evmcs_test' to 'hyperv_evmcs'

arch/x86/include/asm/hyperv-tlfs.h | 2 +
arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 42 +-
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/hyperv.c | 328 +++++++--
arch/x86/kvm/hyperv.h | 53 +-
arch/x86/kvm/svm/hyperv.c | 18 +
arch/x86/kvm/svm/hyperv.h | 48 ++
arch/x86/kvm/svm/nested.c | 39 +-
arch/x86/kvm/svm/svm_onhyperv.c | 2 +-
arch/x86/kvm/svm/svm_onhyperv.h | 6 +-
arch/x86/kvm/trace.h | 21 +-
arch/x86/kvm/vmx/evmcs.c | 42 +-
arch/x86/kvm/vmx/evmcs.h | 13 +-
arch/x86/kvm/vmx/nested.c | 44 +-
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/x86.c | 18 +-
arch/x86/kvm/x86.h | 1 +
include/asm-generic/hyperv-tlfs.h | 5 +
include/asm-generic/mshyperv.h | 11 +-
tools/testing/selftests/kvm/.gitignore | 4 +-
tools/testing/selftests/kvm/Makefile | 5 +-
.../selftests/kvm/include/kvm_util_base.h | 1 +
.../selftests/kvm/include/x86_64/evmcs.h | 48 +-
.../selftests/kvm/include/x86_64/hyperv.h | 100 +++
.../selftests/kvm/include/x86_64/processor.h | 72 +-
.../selftests/kvm/include/x86_64/vmx.h | 8 -
tools/testing/selftests/kvm/lib/kvm_util.c | 9 +-
.../testing/selftests/kvm/lib/x86_64/hyperv.c | 46 ++
.../selftests/kvm/lib/x86_64/processor.c | 5 +-
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 45 +-
.../x86_64/{evmcs_test.c => hyperv_evmcs.c} | 69 +-
.../selftests/kvm/x86_64/hyperv_features.c | 24 +-
.../testing/selftests/kvm/x86_64/hyperv_ipi.c | 330 +++++++++
.../selftests/kvm/x86_64/hyperv_svm_test.c | 64 +-
.../selftests/kvm/x86_64/hyperv_tlb_flush.c | 644 ++++++++++++++++++
36 files changed, 1924 insertions(+), 254 deletions(-)
create mode 100644 arch/x86/kvm/svm/hyperv.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c
rename tools/testing/selftests/kvm/x86_64/{evmcs_test.c => hyperv_evmcs.c} (73%)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c

--
2.37.3


2022-09-21 15:28:14

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 11/39] KVM: x86: hyper-v: Create a separate fifo for L2 TLB flush

To handle L2 TLB flush requests, KVM needs to use a separate fifo from
regular (L1) Hyper-V TLB flush requests: e.g. when a request to flush
something in L2 is made, the target vCPU can transition from L2 to L1,
receive a request to flush a GVA for L1 and then try to enter L2 back.
The first request needs to be processed at this point. Similarly,
requests to flush GVAs in L1 must wait until L2 exits to L1.

No functional change as KVM doesn't handle L2 TLB flush requests from
L2 yet.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 +++++++-
arch/x86/kvm/hyperv.c | 11 +++++++----
arch/x86/kvm/hyperv.h | 18 +++++++++++++++---
3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c97161436a9d..add0718798c1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -612,6 +612,12 @@ struct kvm_vcpu_hv_synic {
*/
#define KVM_HV_TLB_FLUSHALL_ENTRY ((u64)-1)

+enum hv_tlb_flush_fifos {
+ HV_L1_TLB_FLUSH_FIFO,
+ HV_L2_TLB_FLUSH_FIFO,
+ HV_NR_TLB_FLUSH_FIFOS,
+};
+
struct kvm_vcpu_hv_tlb_flush_fifo {
spinlock_t write_lock;
DECLARE_KFIFO(entries, u64, KVM_HV_TLB_FLUSH_FIFO_SIZE);
@@ -639,7 +645,7 @@ struct kvm_vcpu_hv {
u32 nested_ebx; /* HYPERV_CPUID_NESTED_FEATURES.EBX */
} cpuid_cache;

- struct kvm_vcpu_hv_tlb_flush_fifo tlb_flush_fifo;
+ struct kvm_vcpu_hv_tlb_flush_fifo tlb_flush_fifo[HV_NR_TLB_FLUSH_FIFOS];
};

/* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 9764ebb7fd5f..23eb139b2936 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -956,8 +956,10 @@ int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)

hv_vcpu->vp_index = vcpu->vcpu_idx;

- INIT_KFIFO(hv_vcpu->tlb_flush_fifo.entries);
- spin_lock_init(&hv_vcpu->tlb_flush_fifo.write_lock);
+ for (i = 0; i < HV_NR_TLB_FLUSH_FIFOS; i++) {
+ INIT_KFIFO(hv_vcpu->tlb_flush_fifo[i].entries);
+ spin_lock_init(&hv_vcpu->tlb_flush_fifo[i].write_lock);
+ }

return 0;
}
@@ -1837,7 +1839,8 @@ static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
if (!hv_vcpu)
return;

- tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;
+ /* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
+ tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo[HV_L1_TLB_FLUSH_FIFO];

spin_lock_irqsave(&tlb_flush_fifo->write_lock, flags);

@@ -1874,7 +1877,7 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
return;
}

- tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;
+ tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));

count = kfifo_out(&tlb_flush_fifo->entries, entries, KVM_HV_TLB_FLUSH_FIFO_SIZE);

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index ac30091ab346..ca7f1d2c134e 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -22,6 +22,7 @@
#define __ARCH_X86_KVM_HYPERV_H__

#include <linux/kvm_host.h>
+#include "x86.h"

/* "Hv#1" signature */
#define HYPERV_CPUID_SIGNATURE_EAX 0x31237648
@@ -151,16 +152,27 @@ int kvm_vm_ioctl_hv_eventfd(struct kvm *kvm, struct kvm_hyperv_eventfd *args);
int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
struct kvm_cpuid_entry2 __user *entries);

+static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struct kvm_vcpu *vcpu,
+ bool is_guest_mode)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ int i = is_guest_mode ? HV_L2_TLB_FLUSH_FIFO :
+ HV_L1_TLB_FLUSH_FIFO;
+
+ /* KVM does not handle L2 TLB flush requests yet */
+ WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_FIFO);
+
+ return &hv_vcpu->tlb_flush_fifo[i];
+}

static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
- struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

- if (!hv_vcpu || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+ if (!to_hv_vcpu(vcpu) || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
return;

- tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;
+ tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));

kfifo_reset_out(&tlb_flush_fifo->entries);
}
--
2.37.3

2022-09-21 15:28:22

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 05/39] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently

Currently, HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls are handled
the exact same way as HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE{,EX}: by
flushing the whole VPID and this is sub-optimal. Switch to handling
these requests with 'flush_tlb_gva()' hooks instead. Use the newly
introduced TLB flush fifo to queue the requests.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 101 ++++++++++++++++++++++++++++++++++++------
1 file changed, 88 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fb0f7342fccf..d5a329cebcc6 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1800,33 +1800,82 @@ static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
sparse_banks, consumed_xmm_halves, offset);
}

-static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu)
+static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc, u64 entries[],
+ int consumed_xmm_halves, gpa_t offset)
+{
+ return kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, hc->rep_cnt,
+ entries, consumed_xmm_halves, offset);
+}
+
+static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
{
struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
u64 flush_all_entry = KVM_HV_TLB_FLUSHALL_ENTRY;
+ unsigned long flags;

if (!hv_vcpu)
return;

tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;

- kfifo_in_spinlocked(&tlb_flush_fifo->entries, &flush_all_entry,
- 1, &tlb_flush_fifo->write_lock);
+ spin_lock_irqsave(&tlb_flush_fifo->write_lock, flags);
+
+ /*
+ * All entries should fit on the fifo leaving one free for 'flush all'
+ * entry in case another request comes in. In case there's not enough
+ * space, just put 'flush all' entry there.
+ */
+ if (count && entries && count < kfifo_avail(&tlb_flush_fifo->entries)) {
+ WARN_ON(kfifo_in(&tlb_flush_fifo->entries, entries, count) != count);
+ goto out_unlock;
+ }
+
+ /*
+ * Note: full fifo always contains 'flush all' entry, no need to check the
+ * return value.
+ */
+ kfifo_in(&tlb_flush_fifo->entries, &flush_all_entry, 1);
+
+out_unlock:
+ spin_unlock_irqrestore(&tlb_flush_fifo->write_lock, flags);
}

void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ u64 entries[KVM_HV_TLB_FLUSH_FIFO_SIZE];
+ int i, j, count;
+ gva_t gva;

- kvm_vcpu_flush_tlb_guest(vcpu);
-
- if (!hv_vcpu)
+ if (!tdp_enabled || !hv_vcpu) {
+ kvm_vcpu_flush_tlb_guest(vcpu);
return;
+ }

tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;

+ count = kfifo_out(&tlb_flush_fifo->entries, entries, KVM_HV_TLB_FLUSH_FIFO_SIZE);
+
+ for (i = 0; i < count; i++) {
+ if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
+ goto out_flush_all;
+
+ /*
+ * Lower 12 bits of 'address' encode the number of additional
+ * pages to flush.
+ */
+ gva = entries[i] & PAGE_MASK;
+ for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
+ static_call(kvm_x86_flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
+
+ ++vcpu->stat.tlb_flush;
+ }
+ return;
+
+out_flush_all:
+ kvm_vcpu_flush_tlb_guest(vcpu);
kfifo_reset_out(&tlb_flush_fifo->entries);
}

@@ -1836,11 +1885,21 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
struct hv_tlb_flush_ex flush_ex;
struct hv_tlb_flush flush;
DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
+ /*
+ * Normally, there can be no more than 'KVM_HV_TLB_FLUSH_FIFO_SIZE'
+ * entries on the TLB flush fifo. The last entry, however, needs to be
+ * always left free for 'flush all' entry which gets placed when
+ * there is not enough space to put all the requested entries.
+ */
+ u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_FIFO_SIZE - 1];
+ u64 *tlb_flush_entries;
u64 valid_bank_mask;
u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
struct kvm_vcpu *v;
unsigned long i;
bool all_cpus;
+ int consumed_xmm_halves = 0;
+ gpa_t data_offset;

/*
* The Hyper-V TLFS doesn't allow more than 64 sparse banks, e.g. the
@@ -1856,10 +1915,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
flush.address_space = hc->ingpa;
flush.flags = hc->outgpa;
flush.processor_mask = sse128_lo(hc->xmm[0]);
+ consumed_xmm_halves = 1;
} else {
if (unlikely(kvm_read_guest(kvm, hc->ingpa,
&flush, sizeof(flush))))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ data_offset = sizeof(flush);
}

trace_kvm_hv_flush_tlb(flush.processor_mask,
@@ -1883,10 +1944,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
flush_ex.flags = hc->outgpa;
memcpy(&flush_ex.hv_vp_set,
&hc->xmm[0], sizeof(hc->xmm[0]));
+ consumed_xmm_halves = 2;
} else {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &flush_ex,
sizeof(flush_ex))))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ data_offset = sizeof(flush_ex);
}

trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
@@ -1902,25 +1965,37 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
return HV_STATUS_INVALID_HYPERCALL_INPUT;

if (all_cpus)
- goto do_flush;
+ goto read_flush_entries;

if (!hc->var_cnt)
goto ret_success;

- if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
- offsetof(struct hv_tlb_flush_ex,
- hv_vp_set.bank_contents)))
+ if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, consumed_xmm_halves,
+ data_offset))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ data_offset += hc->var_cnt * sizeof(sparse_banks[0]);
+ consumed_xmm_halves += hc->var_cnt;
+ }
+
+read_flush_entries:
+ if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
+ hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
+ hc->rep_cnt > ARRAY_SIZE(__tlb_flush_entries)) {
+ tlb_flush_entries = NULL;
+ } else {
+ if (kvm_hv_get_tlb_flush_entries(kvm, hc, __tlb_flush_entries,
+ consumed_xmm_halves, data_offset))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ tlb_flush_entries = __tlb_flush_entries;
}

-do_flush:
/*
* vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
* analyze it here, flush TLB regardless of the specified address space.
*/
if (all_cpus) {
kvm_for_each_vcpu(i, v, kvm)
- hv_tlb_flush_enqueue(v);
+ hv_tlb_flush_enqueue(v, tlb_flush_entries, hc->rep_cnt);

kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
} else {
@@ -1930,7 +2005,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
v = kvm_get_vcpu(kvm, i);
if (!v)
continue;
- hv_tlb_flush_enqueue(v);
+ hv_tlb_flush_enqueue(v, tlb_flush_entries, hc->rep_cnt);
}

kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
--
2.37.3

2022-09-21 15:30:05

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 22/39] KVM: x86: Expose Hyper-V L2 TLB flush feature

With both nSVM and nVMX implementations in place, KVM can now expose
Hyper-V L2 TLB flush feature to userspace.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8ae32ec87efb..1eb2e3c8b392 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2764,6 +2764,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,

case HYPERV_CPUID_NESTED_FEATURES:
ent->eax = evmcs_ver;
+ ent->eax |= HV_X64_NESTED_DIRECT_FLUSH;
ent->eax |= HV_X64_NESTED_MSR_BITMAP;
ent->ebx |= HV_X64_NESTED_EVMCS1_PERF_GLOBAL_CTRL;
break;
--
2.37.3

2022-09-21 15:30:13

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 23/39] KVM: selftests: Better XMM read/write helpers

set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from 128-bit registers. Moreover, these helpers are not used. Borrow
_kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM limiting them to
XMM0-XMM8 for now.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 70 ++++++++++---------
1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0cbc71b7af50..1c7805de8c27 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -436,71 +436,73 @@ static inline bool this_cpu_has(struct kvm_x86_cpu_feature feature)
return gprs[feature.reg] & BIT(feature.bit);
}

-#define SET_XMM(__var, __xmm) \
- asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
+typedef u32 __attribute__((vector_size(16))) sse128_t;
+#define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
+#define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; })
+#define sse128_hi(x) ({ __sse128_u t; t.vec = x; t.as_u64[1]; })

-static inline void set_xmm(int n, unsigned long val)
+static inline void read_sse_reg(int reg, sse128_t *data)
{
- switch (n) {
+ switch (reg) {
case 0:
- SET_XMM(val, xmm0);
+ asm("movdqa %%xmm0, %0" : "=m"(*data));
break;
case 1:
- SET_XMM(val, xmm1);
+ asm("movdqa %%xmm1, %0" : "=m"(*data));
break;
case 2:
- SET_XMM(val, xmm2);
+ asm("movdqa %%xmm2, %0" : "=m"(*data));
break;
case 3:
- SET_XMM(val, xmm3);
+ asm("movdqa %%xmm3, %0" : "=m"(*data));
break;
case 4:
- SET_XMM(val, xmm4);
+ asm("movdqa %%xmm4, %0" : "=m"(*data));
break;
case 5:
- SET_XMM(val, xmm5);
+ asm("movdqa %%xmm5, %0" : "=m"(*data));
break;
case 6:
- SET_XMM(val, xmm6);
+ asm("movdqa %%xmm6, %0" : "=m"(*data));
break;
case 7:
- SET_XMM(val, xmm7);
+ asm("movdqa %%xmm7, %0" : "=m"(*data));
break;
+ default:
+ BUG();
}
}

-#define GET_XMM(__xmm) \
-({ \
- unsigned long __val; \
- asm volatile("movq %%"#__xmm", %0" : "=r"(__val)); \
- __val; \
-})
-
-static inline unsigned long get_xmm(int n)
+static inline void write_sse_reg(int reg, const sse128_t *data)
{
- assert(n >= 0 && n <= 7);
-
- switch (n) {
+ switch (reg) {
case 0:
- return GET_XMM(xmm0);
+ asm("movdqa %0, %%xmm0" : : "m"(*data));
+ break;
case 1:
- return GET_XMM(xmm1);
+ asm("movdqa %0, %%xmm1" : : "m"(*data));
+ break;
case 2:
- return GET_XMM(xmm2);
+ asm("movdqa %0, %%xmm2" : : "m"(*data));
+ break;
case 3:
- return GET_XMM(xmm3);
+ asm("movdqa %0, %%xmm3" : : "m"(*data));
+ break;
case 4:
- return GET_XMM(xmm4);
+ asm("movdqa %0, %%xmm4" : : "m"(*data));
+ break;
case 5:
- return GET_XMM(xmm5);
+ asm("movdqa %0, %%xmm5" : : "m"(*data));
+ break;
case 6:
- return GET_XMM(xmm6);
+ asm("movdqa %0, %%xmm6" : : "m"(*data));
+ break;
case 7:
- return GET_XMM(xmm7);
+ asm("movdqa %0, %%xmm7" : : "m"(*data));
+ break;
+ default:
+ BUG();
}
-
- /* never reached */
- return 0;
}

static inline void cpu_relax(void)
--
2.37.3

2022-09-21 15:30:48

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 34/39] KVM: selftests: Split off load_evmcs() from load_vmcs()

In preparation to putting Hyper-V specific test pages to a dedicated
struct, move eVMCS load logic from load_vmcs(). Tests call load_vmcs()
directly and the only one which needs 'enlightened' version is
evmcs_test so there's not much gain in having this merged.

Temporary pass both GPA and HVA to load_evmcs().

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/evmcs.h | 10 ++++++
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 33 ++++++++-----------
.../testing/selftests/kvm/x86_64/evmcs_test.c | 4 +--
3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 2530b5aeb4ba..59b60d45b8f6 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -256,6 +256,16 @@ static inline int evmcs_vmptrld(uint64_t vmcs_pa, void *vmcs)
return 0;
}

+static inline bool load_evmcs(uint64_t enlightened_vmcs_gpa, void *enlightened_vmcs)
+{
+ if (evmcs_vmptrld(enlightened_vmcs_gpa, enlightened_vmcs))
+ return false;
+
+ current_evmcs->revision_id = EVMCS_VERSION;
+
+ return true;
+}
+
static inline int evmcs_vmptrst(uint64_t *value)
{
*value = current_vp_assist->current_nested_vmcs &
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 80a568c439b8..f8acbc7c8d7d 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -169,26 +169,19 @@ bool prepare_for_vmx_operation(struct vmx_pages *vmx)

bool load_vmcs(struct vmx_pages *vmx)
{
- if (!enable_evmcs) {
- /* Load a VMCS. */
- *(uint32_t *)(vmx->vmcs) = vmcs_revision();
- if (vmclear(vmx->vmcs_gpa))
- return false;
-
- if (vmptrld(vmx->vmcs_gpa))
- return false;
-
- /* Setup shadow VMCS, do not load it yet. */
- *(uint32_t *)(vmx->shadow_vmcs) =
- vmcs_revision() | 0x80000000ul;
- if (vmclear(vmx->shadow_vmcs_gpa))
- return false;
- } else {
- if (evmcs_vmptrld(vmx->enlightened_vmcs_gpa,
- vmx->enlightened_vmcs))
- return false;
- current_evmcs->revision_id = EVMCS_VERSION;
- }
+ /* Load a VMCS. */
+ *(uint32_t *)(vmx->vmcs) = vmcs_revision();
+ if (vmclear(vmx->vmcs_gpa))
+ return false;
+
+ if (vmptrld(vmx->vmcs_gpa))
+ return false;
+
+ /* Setup shadow VMCS, do not load it yet. */
+ *(uint32_t *)(vmx->shadow_vmcs) =
+ vmcs_revision() | 0x80000000ul;
+ if (vmclear(vmx->shadow_vmcs_gpa))
+ return false;

return true;
}
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 9007fb04343b..5a4c8b1873aa 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -81,10 +81,10 @@ void guest_code(struct vmx_pages *vmx_pages)
enable_vp_assist(vmx_pages->vp_assist_gpa, vmx_pages->vp_assist);
evmcs_enable();

- GUEST_ASSERT(vmx_pages->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_SYNC(3);
- GUEST_ASSERT(load_vmcs(vmx_pages));
+ GUEST_ASSERT(load_evmcs(vmx_pages->enlightened_vmcs_gpa,
+ vmx_pages->enlightened_vmcs));
GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);

GUEST_SYNC(4);
--
2.37.3

2022-09-21 15:31:07

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 37/39] KVM: selftests: evmcs_test: Introduce L2 TLB flush test

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the
Partition assist page.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/evmcs.h | 2 +
.../testing/selftests/kvm/x86_64/evmcs_test.c | 50 ++++++++++++++++++-
2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 94d6059e9a12..901caf0e0939 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -237,6 +237,8 @@ struct hv_enlightened_vmcs {
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL BIT(15)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL 0xFFFF

+#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
+
extern struct hv_enlightened_vmcs *current_evmcs;

int vcpu_enable_evmcs(struct kvm_vcpu *vcpu);
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 74f076ba574b..691dbe0a0881 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -16,6 +16,7 @@

#include "kvm_util.h"

+#include "hyperv.h"
#include "vmx.h"

static int ud_count;
@@ -48,6 +49,8 @@ static inline void rdmsr_gs_base(void)

void l2_guest_code(void)
{
+ u64 unused;
+
GUEST_SYNC(7);

GUEST_SYNC(8);
@@ -64,15 +67,33 @@ void l2_guest_code(void)
vmcall();
rdmsr_gs_base(); /* intercepted */

+ /* L2 TLB flush tests */
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS);
+ rdmsr_fs_base();
+ /*
+ * Note: hypercall status (RAX) is not preserved correctly by L1 after
+ * synthetic vmexit, use unchecked version.
+ */
+ __hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE | HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS,
+ &unused);
+ /* Make sure we're no issuing Hyper-V TLB flush call again */
+ __asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
+
/* Done, exit to L1 and never come back. */
vmcall();
}

-void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages)
+void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages,
+ vm_vaddr_t hv_hcall_page_gpa)
{
#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];

+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+ wrmsr(HV_X64_MSR_HYPERCALL, hv_hcall_page_gpa);
+
x2apic_enable();

GUEST_SYNC(1);
@@ -102,6 +123,14 @@ void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages)
vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmreadz(PIN_BASED_VM_EXEC_CONTROL) |
PIN_BASED_NMI_EXITING);

+ /* L2 TLB flush setup */
+ current_evmcs->partition_assist_page = hv_pages->partition_assist_gpa;
+ current_evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
+ current_evmcs->hv_vm_id = 1;
+ current_evmcs->hv_vp_id = 1;
+ current_vp_assist->nested_control.features.directhypercall = 1;
+ *(u32 *)(hv_pages->partition_assist) = 0;
+
GUEST_ASSERT(!vmlaunch());
GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);

@@ -146,6 +175,18 @@ void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages)
GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
current_evmcs->guest_rip += 2; /* rdmsr */

+ /*
+ * L2 TLB flush test. First VMCALL should be handled directly by L0,
+ * no VMCALL exit expected.
+ */
+ GUEST_ASSERT(!vmresume());
+ GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_MSR_READ);
+ current_evmcs->guest_rip += 2; /* rdmsr */
+ /* Enable synthetic vmexit */
+ *(u32 *)(hv_pages->partition_assist) = 1;
+ GUEST_ASSERT(!vmresume());
+ GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH);
+
GUEST_ASSERT(!vmresume());
GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
GUEST_SYNC(11);
@@ -199,6 +240,7 @@ static struct kvm_vcpu *save_restore_vm(struct kvm_vm *vm,
int main(int argc, char *argv[])
{
vm_vaddr_t vmx_pages_gva = 0, hv_pages_gva = 0;
+ vm_vaddr_t hcall_page;

struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
@@ -212,12 +254,16 @@ int main(int argc, char *argv[])
TEST_REQUIRE(kvm_has_cap(KVM_CAP_NESTED_STATE));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_ENLIGHTENED_VMCS));

+ hcall_page = vm_vaddr_alloc_pages(vm, 1);
+ memset(addr_gva2hva(vm, hcall_page), 0x0, getpagesize());
+
vcpu_set_hv_cpuid(vcpu);
vcpu_enable_evmcs(vcpu);

vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_alloc_hyperv_test_pages(vm, &hv_pages_gva);
- vcpu_args_set(vcpu, 2, vmx_pages_gva, hv_pages_gva);
+ vcpu_args_set(vcpu, 3, vmx_pages_gva, hv_pages_gva, addr_gva2gpa(vm, hcall_page));
+ vcpu_set_msr(vcpu, HV_X64_MSR_VP_INDEX, vcpu->id);

vm_init_descriptor_tables(vm);
vcpu_init_descriptor_tables(vcpu);
--
2.37.3

2022-09-21 15:31:38

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 25/39] KVM: selftests: Move the function doing Hyper-V hypercall to a common header

All Hyper-V specific tests issuing hypercalls need this.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/hyperv.h | 16 ++++++++++++++++
.../selftests/kvm/x86_64/hyperv_features.c | 18 +-----------------
2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index f0a8a93694b2..285e9ff73573 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -185,6 +185,22 @@
/* hypercall options */
#define HV_HYPERCALL_FAST_BIT BIT(16)

+static inline uint8_t hyperv_hypercall(u64 control, vm_vaddr_t input_address,
+ vm_vaddr_t output_address,
+ uint64_t *hv_status)
+{
+ uint8_t vector;
+ /* Note both the hypercall and the "asm safe" clobber r9-r11. */
+ asm volatile("mov %[output_address], %%r8\n\t"
+ KVM_ASM_SAFE("vmcall")
+ : "=a" (*hv_status),
+ "+c" (control), "+d" (input_address),
+ KVM_ASM_SAFE_OUTPUTS(vector)
+ : [output_address] "r"(output_address)
+ : "cc", "memory", "r8", KVM_ASM_SAFE_CLOBBERS);
+ return vector;
+}
+
/* Proper HV_X64_MSR_GUEST_OS_ID value */
#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 1144bd1ea626..c464d324cde0 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -13,22 +13,6 @@
#include "processor.h"
#include "hyperv.h"

-static inline uint8_t hypercall(u64 control, vm_vaddr_t input_address,
- vm_vaddr_t output_address, uint64_t *hv_status)
-{
- uint8_t vector;
-
- /* Note both the hypercall and the "asm safe" clobber r9-r11. */
- asm volatile("mov %[output_address], %%r8\n\t"
- KVM_ASM_SAFE("vmcall")
- : "=a" (*hv_status),
- "+c" (control), "+d" (input_address),
- KVM_ASM_SAFE_OUTPUTS(vector)
- : [output_address] "r"(output_address)
- : "cc", "memory", "r8", KVM_ASM_SAFE_CLOBBERS);
- return vector;
-}
-
struct msr_data {
uint32_t idx;
bool available;
@@ -78,7 +62,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
input = output = 0;
}

- vector = hypercall(hcall->control, input, output, &res);
+ vector = hyperv_hypercall(hcall->control, input, output, &res);
if (hcall->ud_expected)
GUEST_ASSERT_2(vector == UD_VECTOR, hcall->control, vector);
else
--
2.37.3

2022-09-21 15:47:15

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 07/39] KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs

To handle L2 TLB flush requests, KVM needs to translate the specified
L2 GPA to L1 GPA to read hypercall arguments from there.

No functional change as KVM doesn't handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index aced2b3fe56b..ad2560914a5f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,6 +23,7 @@
#include "ioapic.h"
#include "cpuid.h"
#include "hyperv.h"
+#include "mmu.h"
#include "xen.h"

#include <linux/cpu.h>
@@ -1909,6 +1910,12 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
*/
BUILD_BUG_ON(KVM_HV_MAX_SPARSE_VCPU_SET_BITS > 64);

+ if (!hc->fast && is_guest_mode(vcpu)) {
+ hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
+ if (unlikely(hc->ingpa == INVALID_GPA))
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ }
+
if (hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
hc->code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE) {
if (hc->fast) {
--
2.37.3

2022-09-21 15:47:30

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 04/39] KVM: x86: hyper-v: Add helper to read hypercall data for array

From: Sean Christopherson <[email protected]>

Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.

Opportunisticaly swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 53 +++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index b127b6bb84dd..fb0f7342fccf 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1753,38 +1753,51 @@ struct kvm_hv_hcall {
sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
};

-static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
- int consumed_xmm_halves,
- u64 *sparse_banks, gpa_t offset)
-{
- u16 var_cnt;
- int i;

- if (hc->var_cnt > 64)
- return -EINVAL;
-
- /* Ignore banks that cannot possibly contain a legal VP index. */
- var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
+static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
+ u16 orig_cnt, u16 cnt_cap, u64 *data,
+ int consumed_xmm_halves, gpa_t offset)
+{
+ /*
+ * Preserve the original count when ignoring entries via a "cap", KVM
+ * still needs to validate the guest input (though the non-XMM path
+ * punts on the checks).
+ */
+ u16 cnt = min(orig_cnt, cnt_cap);
+ int i, j;

if (hc->fast) {
/*
* Each XMM holds two sparse banks, but do not count halves that
* have already been consumed for hypercall parameters.
*/
- if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
+ if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
return HV_STATUS_INVALID_HYPERCALL_INPUT;
- for (i = 0; i < var_cnt; i++) {
- int j = i + consumed_xmm_halves;
+
+ for (i = 0; i < cnt; i++) {
+ j = i + consumed_xmm_halves;
if (j % 2)
- sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
+ data[i] = sse128_hi(hc->xmm[j / 2]);
else
- sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
+ data[i] = sse128_lo(hc->xmm[j / 2]);
}
return 0;
}

- return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
- var_cnt * sizeof(*sparse_banks));
+ return kvm_read_guest(kvm, hc->ingpa + offset, data,
+ cnt * sizeof(*data));
+}
+
+static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
+ u64 *sparse_banks, int consumed_xmm_halves,
+ gpa_t offset)
+{
+ if (hc->var_cnt > 64)
+ return -EINVAL;
+
+ /* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
+ return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
+ sparse_banks, consumed_xmm_halves, offset);
}

static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu)
@@ -1894,7 +1907,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if (!hc->var_cnt)
goto ret_success;

- if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
+ if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
offsetof(struct hv_tlb_flush_ex,
hv_vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
@@ -2005,7 +2018,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if (!hc->var_cnt)
goto ret_success;

- if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
+ if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
offsetof(struct hv_send_ipi_ex,
vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
--
2.37.3

2022-09-21 15:47:40

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 19/39] KVM: nVMX: hyper-v: Cache VP assist page in 'struct kvm_vcpu_hv'

In preparation to enabling L2 TLB flush, cache VP assist page in
'struct kvm_vcpu_hv'. While on it, rename nested_enlightened_vmentry()
to nested_get_evmptr() and make it return eVMCS GPA directly.

No functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/hyperv.c | 10 ++++++----
arch/x86/kvm/hyperv.h | 3 +--
arch/x86/kvm/vmx/evmcs.c | 21 +++++++--------------
arch/x86/kvm/vmx/evmcs.h | 2 +-
arch/x86/kvm/vmx/nested.c | 6 +++---
6 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b40df51b58d9..bc7d8527578d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -650,6 +650,8 @@ struct kvm_vcpu_hv {
/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
u64 sparse_banks[HV_MAX_SPARSE_VCPU_BANKS];

+ struct hv_vp_assist_page vp_assist_page;
+
struct {
u64 pa_page_gpa;
u64 vm_id;
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 28174a9edf35..8ae32ec87efb 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -900,13 +900,15 @@ bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled);

-bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu,
- struct hv_vp_assist_page *assist_page)
+bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu)
{
- if (!kvm_hv_assist_page_enabled(vcpu))
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ if (!hv_vcpu || !kvm_hv_assist_page_enabled(vcpu))
return false;
+
return !kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data,
- assist_page, sizeof(*assist_page));
+ &hv_vcpu->vp_assist_page, sizeof(struct hv_vp_assist_page));
}
EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page);

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 3fff3a6f4bb9..990b4fc2e649 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -108,8 +108,7 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages);
void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu);

bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu);
-bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu,
- struct hv_vp_assist_page *assist_page);
+bool kvm_hv_get_assist_page(struct kvm_vcpu *vcpu);

static inline struct kvm_vcpu_hv_stimer *to_hv_stimer(struct kvm_vcpu *vcpu,
int timer_index)
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 26bb40933e6d..635a0c81ff1d 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -322,24 +322,17 @@ const struct evmcs_field vmcs_field_to_evmcs_1[] = {
};
const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);

-bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa)
+u64 nested_get_evmptr(struct kvm_vcpu *vcpu)
{
- struct hv_vp_assist_page assist_page;
-
- *evmcs_gpa = -1ull;
-
- if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
- return false;
-
- if (unlikely(!assist_page.enlighten_vmentry))
- return false;
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

- if (unlikely(!evmptr_is_valid(assist_page.current_nested_vmcs)))
- return false;
+ if (unlikely(!kvm_hv_get_assist_page(vcpu)))
+ return EVMPTR_INVALID;

- *evmcs_gpa = assist_page.current_nested_vmcs;
+ if (unlikely(!hv_vcpu->vp_assist_page.enlighten_vmentry))
+ return EVMPTR_INVALID;

- return true;
+ return hv_vcpu->vp_assist_page.current_nested_vmcs;
}

uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 492d1c58c734..7ad56fbc4b4d 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -235,7 +235,7 @@ enum nested_evmptrld_status {
EVMPTRLD_ERROR,
};

-bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmcs_gpa);
+u64 nested_get_evmptr(struct kvm_vcpu *vcpu);
uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
int nested_enable_evmcs(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 640680228973..0634518a6719 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1992,7 +1992,8 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
if (likely(!guest_cpuid_has_evmcs(vcpu)))
return EVMPTRLD_DISABLED;

- if (!nested_enlightened_vmentry(vcpu, &evmcs_gpa)) {
+ evmcs_gpa = nested_get_evmptr(vcpu);
+ if (!evmptr_is_valid(evmcs_gpa)) {
nested_release_evmcs(vcpu);
return EVMPTRLD_DISABLED;
}
@@ -5220,7 +5221,6 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 zero = 0;
gpa_t vmptr;
- u64 evmcs_gpa;
int r;

if (!nested_vmx_check_permission(vcpu))
@@ -5246,7 +5246,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
* vmx->nested.hv_evmcs but this shouldn't be a problem.
*/
if (likely(!guest_cpuid_has_evmcs(vcpu) ||
- !nested_enlightened_vmentry(vcpu, &evmcs_gpa))) {
+ !evmptr_is_valid(nested_get_evmptr(vcpu)))) {
if (vmptr == vmx->nested.current_vmptr)
nested_release_vmcs12(vcpu);

--
2.37.3

2022-09-21 15:47:47

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 39/39] KVM: selftests: Rename 'evmcs_test' to 'hyperv_evmcs'

Conform to the rest of Hyper-V emulation selftests which have 'hyperv'
prefix. Get rid of '_test' suffix as well as the purpose of this code
is fairly obvious.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 2 +-
tools/testing/selftests/kvm/Makefile | 2 +-
.../selftests/kvm/x86_64/{evmcs_test.c => hyperv_evmcs.c} | 0
3 files changed, 2 insertions(+), 2 deletions(-)
rename tools/testing/selftests/kvm/x86_64/{evmcs_test.c => hyperv_evmcs.c} (100%)

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 8e9d208488a8..0ba43d0244c2 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -15,7 +15,6 @@
/x86_64/cpuid_test
/x86_64/cr4_cpuid_sync_test
/x86_64/debug_regs
-/x86_64/evmcs_test
/x86_64/emulator_error_test
/x86_64/fix_hypercall_test
/x86_64/get_msr_index_features
@@ -23,6 +22,7 @@
/x86_64/kvm_pv_test
/x86_64/hyperv_clock
/x86_64/hyperv_cpuid
+/x86_64/hyperv_evmcs
/x86_64/hyperv_features
/x86_64/hyperv_ipi
/x86_64/hyperv_svm_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 36692fe34e10..781efe90518f 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -79,11 +79,11 @@ TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
-TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
TEST_GEN_PROGS_x86_64 += x86_64/emulator_error_test
TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_evmcs
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/hyperv_evmcs.c
similarity index 100%
rename from tools/testing/selftests/kvm/x86_64/evmcs_test.c
rename to tools/testing/selftests/kvm/x86_64/hyperv_evmcs.c
--
2.37.3

2022-09-21 15:47:53

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 16/39] KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()

The newly introduced helper checks whether vCPU is performing a
Hyper-V TLB flush hypercall. This is required to filter out L2 TLB
flush hypercalls for processing.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index ca7f1d2c134e..d16f62e4f43a 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -176,6 +176,24 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)

kfifo_reset_out(&tlb_flush_fifo->entries);
}
+
+static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ u16 code;
+
+ if (!hv_vcpu)
+ return false;
+
+ code = is_64_bit_hypercall(vcpu) ? kvm_rcx_read(vcpu) :
+ kvm_rax_read(vcpu);
+
+ return (code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE ||
+ code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST ||
+ code == HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX ||
+ code == HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX);
+}
+
void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);


--
2.37.3

2022-09-21 15:48:05

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 17/39] KVM: x86: hyper-v: L2 TLB flush

Handle L2 TLB flush requests by going through all vCPUs and checking
whether there are vCPUs running the same VM_ID with a VP_ID specified
in the requests. Perform synthetic exit to L2 upon finish.

Note, while checking VM_ID/VP_ID of running vCPUs seem to be a bit
racy, we count on the fact that KVM flushes the whole L2 VPID upon
transition. Also, KVM_REQ_HV_TLB_FLUSH request needs to be done upon
transition between L1 and L2 to make sure all pending requests are
always processed.

For the reference, Hyper-V TLFS refers to the feature as "Direct
Virtual Flush".

Note, nVMX/nSVM code does not handle VMCALL/VMMCALL from L2 yet.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 82 ++++++++++++++++++++++++++++++++++++-------
arch/x86/kvm/hyperv.h | 3 --
arch/x86/kvm/trace.h | 21 ++++++-----
3 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0f3d04223d60..28174a9edf35 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -34,6 +34,7 @@
#include <linux/eventfd.h>

#include <asm/apicdef.h>
+#include <asm/mshyperv.h>
#include <trace/events/kvm.h>

#include "trace.h"
@@ -1829,9 +1830,10 @@ static int kvm_hv_get_tlb_flush_entries(struct kvm *kvm, struct kvm_hv_hcall *hc
entries, consumed_xmm_halves, offset);
}

-static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
+static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu,
+ struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo,
+ u64 *entries, int count)
{
- struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
u64 flush_all_entry = KVM_HV_TLB_FLUSHALL_ENTRY;
unsigned long flags;
@@ -1839,9 +1841,6 @@ static void hv_tlb_flush_enqueue(struct kvm_vcpu *vcpu, u64 *entries, int count)
if (!hv_vcpu)
return;

- /* kvm_hv_flush_tlb() is not ready to handle requests for L2s yet */
- tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo[HV_L1_TLB_FLUSH_FIFO];
-
spin_lock_irqsave(&tlb_flush_fifo->write_lock, flags);

/*
@@ -1910,6 +1909,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
struct hv_tlb_flush_ex flush_ex;
struct hv_tlb_flush flush;
DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
+ struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
/*
* Normally, there can be no more than 'KVM_HV_TLB_FLUSH_FIFO_SIZE'
* entries on the TLB flush fifo. The last entry, however, needs to be
@@ -1953,7 +1953,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
}

trace_kvm_hv_flush_tlb(flush.processor_mask,
- flush.address_space, flush.flags);
+ flush.address_space, flush.flags,
+ is_guest_mode(vcpu));

valid_bank_mask = BIT_ULL(0);
sparse_banks[0] = flush.processor_mask;
@@ -1984,7 +1985,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
trace_kvm_hv_flush_tlb_ex(flush_ex.hv_vp_set.valid_bank_mask,
flush_ex.hv_vp_set.format,
flush_ex.address_space,
- flush_ex.flags);
+ flush_ex.flags, is_guest_mode(vcpu));

valid_bank_mask = flush_ex.hv_vp_set.valid_bank_mask;
all_cpus = flush_ex.hv_vp_set.format !=
@@ -2022,19 +2023,57 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
* vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
* analyze it here, flush TLB regardless of the specified address space.
*/
- if (all_cpus) {
- kvm_for_each_vcpu(i, v, kvm)
- hv_tlb_flush_enqueue(v, tlb_flush_entries, hc->rep_cnt);
+ if (all_cpus && !is_guest_mode(vcpu)) {
+ kvm_for_each_vcpu(i, v, kvm) {
+ tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(v, false);
+ hv_tlb_flush_enqueue(v, tlb_flush_fifo,
+ tlb_flush_entries, hc->rep_cnt);
+ }

kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
- } else {
+ } else if (!is_guest_mode(vcpu)) {
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);

for_each_set_bit(i, vcpu_mask, KVM_MAX_VCPUS) {
v = kvm_get_vcpu(kvm, i);
if (!v)
continue;
- hv_tlb_flush_enqueue(v, tlb_flush_entries, hc->rep_cnt);
+ tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(v, false);
+ hv_tlb_flush_enqueue(v, tlb_flush_fifo,
+ tlb_flush_entries, hc->rep_cnt);
+ }
+
+ kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
+ } else {
+ struct kvm_vcpu_hv *hv_v;
+
+ bitmap_zero(vcpu_mask, KVM_MAX_VCPUS);
+
+ kvm_for_each_vcpu(i, v, kvm) {
+ hv_v = to_hv_vcpu(v);
+
+ /*
+ * The following check races with nested vCPUs entering/exiting
+ * and/or migrating between L1's vCPUs, however the only case when
+ * KVM *must* flush the TLB is when the target L2 vCPU keeps
+ * running on the same L1 vCPU from the moment of the request until
+ * kvm_hv_flush_tlb() returns. TLB is fully flushed in all other
+ * cases, e.g. when the target L2 vCPU migrates to a different L1
+ * vCPU or when the corresponding L1 vCPU temporary switches to a
+ * different L2 vCPU while the request is being processed.
+ */
+ if (!hv_v || hv_v->nested.vm_id != hv_vcpu->nested.vm_id)
+ continue;
+
+ if (!all_cpus &&
+ !hv_is_vp_in_sparse_set(hv_v->nested.vp_id, valid_bank_mask,
+ sparse_banks))
+ continue;
+
+ __set_bit(i, vcpu_mask);
+ tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(v, true);
+ hv_tlb_flush_enqueue(v, tlb_flush_fifo,
+ tlb_flush_entries, hc->rep_cnt);
}

kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
@@ -2222,10 +2261,27 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)

static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
{
+ int ret;
+
trace_kvm_hv_hypercall_done(result);
kvm_hv_hypercall_set_result(vcpu, result);
++vcpu->stat.hypercalls;
- return kvm_skip_emulated_instruction(vcpu);
+ ret = kvm_skip_emulated_instruction(vcpu);
+
+ if (unlikely(hv_result_success(result) && is_guest_mode(vcpu)
+ && kvm_hv_is_tlb_flush_hcall(vcpu))) {
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ u32 tlb_lock_count;
+
+ if (unlikely(kvm_read_guest(vcpu->kvm, hv_vcpu->nested.pa_page_gpa,
+ &tlb_lock_count, sizeof(tlb_lock_count))))
+ kvm_inject_gp(vcpu, 0);
+
+ if (tlb_lock_count)
+ kvm_x86_ops.nested_ops->hv_inject_synthetic_vmexit_post_tlb_flush(vcpu);
+ }
+
+ return ret;
}

static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index d16f62e4f43a..1b53dd4cff4d 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -159,9 +159,6 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
int i = is_guest_mode ? HV_L2_TLB_FLUSH_FIFO :
HV_L1_TLB_FLUSH_FIFO;

- /* KVM does not handle L2 TLB flush requests yet */
- WARN_ON_ONCE(i != HV_L1_TLB_FLUSH_FIFO);
-
return &hv_vcpu->tlb_flush_fifo[i];
}

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index bc25589ad588..09f3392dd830 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1547,38 +1547,41 @@ TRACE_EVENT(kvm_hv_timer_state,
* Tracepoint for kvm_hv_flush_tlb.
*/
TRACE_EVENT(kvm_hv_flush_tlb,
- TP_PROTO(u64 processor_mask, u64 address_space, u64 flags),
- TP_ARGS(processor_mask, address_space, flags),
+ TP_PROTO(u64 processor_mask, u64 address_space, u64 flags, bool guest_mode),
+ TP_ARGS(processor_mask, address_space, flags, guest_mode),

TP_STRUCT__entry(
__field(u64, processor_mask)
__field(u64, address_space)
__field(u64, flags)
+ __field(bool, guest_mode)
),

TP_fast_assign(
__entry->processor_mask = processor_mask;
__entry->address_space = address_space;
__entry->flags = flags;
+ __entry->guest_mode = guest_mode;
),

- TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx",
+ TP_printk("processor_mask 0x%llx address_space 0x%llx flags 0x%llx %s",
__entry->processor_mask, __entry->address_space,
- __entry->flags)
+ __entry->flags, __entry->guest_mode ? "(L2)" : "")
);

/*
* Tracepoint for kvm_hv_flush_tlb_ex.
*/
TRACE_EVENT(kvm_hv_flush_tlb_ex,
- TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags),
- TP_ARGS(valid_bank_mask, format, address_space, flags),
+ TP_PROTO(u64 valid_bank_mask, u64 format, u64 address_space, u64 flags, bool guest_mode),
+ TP_ARGS(valid_bank_mask, format, address_space, flags, guest_mode),

TP_STRUCT__entry(
__field(u64, valid_bank_mask)
__field(u64, format)
__field(u64, address_space)
__field(u64, flags)
+ __field(bool, guest_mode)
),

TP_fast_assign(
@@ -1586,12 +1589,14 @@ TRACE_EVENT(kvm_hv_flush_tlb_ex,
__entry->format = format;
__entry->address_space = address_space;
__entry->flags = flags;
+ __entry->guest_mode = guest_mode;
),

TP_printk("valid_bank_mask 0x%llx format 0x%llx "
- "address_space 0x%llx flags 0x%llx",
+ "address_space 0x%llx flags 0x%llx %s",
__entry->valid_bank_mask, __entry->format,
- __entry->address_space, __entry->flags)
+ __entry->address_space, __entry->flags,
+ __entry->guest_mode ? "(L2)" : "")
);

/*
--
2.37.3

2022-09-21 15:48:15

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

In preparation to implementing fine-grained Hyper-V TLB flush and
L2 TLB flush, resurrect dedicated KVM_REQ_HV_TLB_FLUSH request bit. As
KVM_REQ_TLB_FLUSH_GUEST/KVM_REQ_TLB_FLUSH_CURRENT are stronger operations,
clear KVM_REQ_HV_TLB_FLUSH request in kvm_service_local_tlb_flush_requests()
when any of these were also requested.

No (real) functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/hyperv.c | 4 ++--
arch/x86/kvm/x86.c | 10 ++++++++--
3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 504daf473092..45c390c804f0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -108,6 +108,8 @@
KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_MMU_FREE_OBSOLETE_ROOTS \
KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_HV_TLB_FLUSH \
+ KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 0adf4a437e85..3c0f639f6a05 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1870,11 +1870,11 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
* analyze it here, flush TLB regardless of the specified address space.
*/
if (all_cpus) {
- kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
+ kvm_make_all_cpus_request(kvm, KVM_REQ_HV_TLB_FLUSH);
} else {
sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);

- kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST, vcpu_mask);
+ kvm_make_vcpus_request_mask(kvm, KVM_REQ_HV_TLB_FLUSH, vcpu_mask);
}

ret_success:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f62d5799fcd7..86504a8bfd9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
*/
void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
{
- if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
+ if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
kvm_vcpu_flush_tlb_current(vcpu);
+ kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+ }

- if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
+ if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
+ kvm_vcpu_flush_tlb_guest(vcpu);
+ kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+ } else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
kvm_vcpu_flush_tlb_guest(vcpu);
+ }
}
EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);

--
2.37.3

2022-09-21 15:48:18

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 21/39] KVM: nSVM: hyper-v: Enable L2 TLB flush

Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
both in extended 'nested controls' in VMCB and VP assist page.
According to Hyper-V TLFS, synthetic vmexit to L1 is performed with
- HV_SVM_EXITCODE_ENL exit_code.
- HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.

Note: VP assist page is cached in 'struct kvm_vcpu_hv' so
recalc_intercepts() doesn't need to read from guest's memory. KVM
needs to update the case upon each VMRUN and after svm_set_nested_state
(svm_get_nested_state_pages()) to handle the case when the guest got
migrated while L2 was running.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/hyperv.c | 7 +++++++
arch/x86/kvm/svm/hyperv.h | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/svm/nested.c | 36 ++++++++++++++++++++++++++++++++++--
3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
index 911f51021af1..088f6429b24c 100644
--- a/arch/x86/kvm/svm/hyperv.c
+++ b/arch/x86/kvm/svm/hyperv.c
@@ -8,4 +8,11 @@

void svm_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
+ svm->vmcb->control.exit_code_hi = 0;
+ svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
+ svm->vmcb->control.exit_info_2 = 0;
+ nested_svm_vmexit(svm);
}
diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index dd2e393f84a0..7b01722838bf 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -33,6 +33,9 @@ struct hv_enlightenments {
*/
#define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW

+#define HV_SVM_EXITCODE_ENL 0xF0000000
+#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)
+
static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -48,6 +51,33 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
hv_vcpu->nested.vp_id = hve->hv_vp_id;
}

+static inline bool nested_svm_hv_update_vp_assist(struct kvm_vcpu *vcpu)
+{
+ if (!to_hv_vcpu(vcpu))
+ return true;
+
+ if (!kvm_hv_assist_page_enabled(vcpu))
+ return true;
+
+ return kvm_hv_get_assist_page(vcpu);
+}
+
+static inline bool nested_svm_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct hv_enlightenments *hve =
+ (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ if (!hv_vcpu)
+ return false;
+
+ if (!hve->hv_enlightenments_control.nested_flush_hypercall)
+ return false;
+
+ return hv_vcpu->vp_assist_page.nested_control.features.directhypercall;
+}
+
void svm_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu);

#endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b8df2f4f880e..e0484028dc70 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -149,8 +149,12 @@ void recalc_intercepts(struct vcpu_svm *svm)
vmcb_clr_intercept(c, INTERCEPT_VINTR);
}

- /* We don't want to see VMMCALLs from a nested guest */
- vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
+ /*
+ * We want to see VMMCALLs from a nested guest only when Hyper-V L2 TLB
+ * flush feature is enabled.
+ */
+ if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
+ vmcb_clr_intercept(c, INTERCEPT_VMMCALL);

for (i = 0; i < MAX_INTERCEPT; i++)
c->intercepts[i] |= g->intercepts[i];
@@ -473,6 +477,17 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,

static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
{
+ /*
+ * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
+ * L2's VP_ID upon request from the guest. Make sure we check for
+ * pending entries for the case when the request got misplaced (e.g.
+ * a transition from L2->L1 happened while processing L2 TLB flush
+ * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
+ * anything if there are no requests in the corresponding buffer.
+ */
+ if (to_hv_vcpu(vcpu))
+ kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+
/*
* TODO: optimize unconditional TLB flush/MMU sync. A partial list of
* things to fix before this can be conditional:
@@ -824,6 +839,12 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return 1;
}

+ /* This fails when VP assist page is enabled but the supplied GPA is bogus */
+ if (!nested_svm_hv_update_vp_assist(vcpu)) {
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
vmcb12_gpa = svm->vmcb->save.rax;
ret = kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map);
if (ret == -EINVAL) {
@@ -1413,6 +1434,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
int nested_svm_exit_special(struct vcpu_svm *svm)
{
u32 exit_code = svm->vmcb->control.exit_code;
+ struct kvm_vcpu *vcpu = &svm->vcpu;

switch (exit_code) {
case SVM_EXIT_INTR:
@@ -1431,6 +1453,13 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
return NESTED_EXIT_HOST;
break;
}
+ case SVM_EXIT_VMMCALL:
+ /* Hyper-V L2 TLB flush hypercall is handled by L0 */
+ if (guest_hv_cpuid_has_l2_tlb_flush(vcpu) &&
+ nested_svm_l2_tlb_flush_enabled(vcpu) &&
+ kvm_hv_is_tlb_flush_hcall(vcpu))
+ return NESTED_EXIT_HOST;
+ break;
default:
break;
}
@@ -1711,6 +1740,9 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
return false;
}

+ if (!nested_svm_hv_update_vp_assist(vcpu))
+ return false;
+
return true;
}

--
2.37.3

2022-09-21 15:48:51

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 28/39] KVM: selftests: Export vm_vaddr_unused_gap() to make it possible to request unmapped ranges

Currently, tests can only request a new vaddr range by using
vm_vaddr_alloc()/vm_vaddr_alloc_page()/vm_vaddr_alloc_pages() but
these functions allocate and map physical pages too. Make it possible
to request unmapped range too.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/kvm_util_base.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 24fde97f6121..fe0ab920b3e7 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -379,6 +379,7 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags);
void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
void vm_mem_region_delete(struct kvm_vm *vm, uint32_t slot);
struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id);
+vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min);
vm_vaddr_t vm_vaddr_alloc_pages(struct kvm_vm *vm, int nr_pages);
vm_vaddr_t vm_vaddr_alloc_page(struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index ad9e15d4c6a9..9f214d2a14a1 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1109,8 +1109,8 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
* TEST_ASSERT failure occurs for invalid input or no area of at least
* sz unallocated bytes >= vaddr_min is available.
*/
-static vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz,
- vm_vaddr_t vaddr_min)
+vm_vaddr_t vm_vaddr_unused_gap(struct kvm_vm *vm, size_t sz,
+ vm_vaddr_t vaddr_min)
{
uint64_t pages = (sz + vm->page_size - 1) >> vm->page_shift;

--
2.37.3

2022-09-21 15:48:51

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 24/39] KVM: selftests: Move HYPERV_LINUX_OS_ID definition to a common header

HYPERV_LINUX_OS_ID needs to be written to HV_X64_MSR_GUEST_OS_ID by
each Hyper-V specific selftest.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/hyperv.h | 3 +++
tools/testing/selftests/kvm/x86_64/hyperv_features.c | 6 ++----
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index b66910702c0a..f0a8a93694b2 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -185,4 +185,7 @@
/* hypercall options */
#define HV_HYPERCALL_FAST_BIT BIT(16)

+/* Proper HV_X64_MSR_GUEST_OS_ID value */
+#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
+
#endif /* !SELFTEST_KVM_HYPERV_H */
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 79ab0152d281..1144bd1ea626 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -13,8 +13,6 @@
#include "processor.h"
#include "hyperv.h"

-#define LINUX_OS_ID ((u64)0x8100 << 48)
-
static inline uint8_t hypercall(u64 control, vm_vaddr_t input_address,
vm_vaddr_t output_address, uint64_t *hv_status)
{
@@ -70,7 +68,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)

GUEST_ASSERT(hcall->control);

- wrmsr(HV_X64_MSR_GUEST_OS_ID, LINUX_OS_ID);
+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);

if (!(hcall->control & HV_HYPERCALL_FAST_BIT)) {
@@ -168,7 +166,7 @@ static void guest_test_msrs_access(void)
*/
msr->idx = HV_X64_MSR_GUEST_OS_ID;
msr->write = 1;
- msr->write_val = LINUX_OS_ID;
+ msr->write_val = HYPERV_LINUX_OS_ID;
msr->available = 1;
break;
case 3:
--
2.37.3

2022-09-21 15:48:52

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 26/39] KVM: selftests: Hyper-V PV IPI selftest

Introduce a selftest for Hyper-V PV IPI hypercalls
(HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).

The test creates one 'sender' vCPU and two 'receiver' vCPU and then
issues various combinations of send IPI hypercalls in both 'normal'
and 'fast' (with XMM input where necessary) mode. Later, the test
checks whether IPIs were delivered to the expected destination vCPU[s].

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/hyperv.h | 35 +-
.../selftests/kvm/x86_64/hyperv_features.c | 2 +-
.../testing/selftests/kvm/x86_64/hyperv_ipi.c | 330 ++++++++++++++++++
5 files changed, 365 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 45d9aee1c0d8..70a853711f9f 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -24,6 +24,7 @@
/x86_64/hyperv_clock
/x86_64/hyperv_cpuid
/x86_64/hyperv_features
+/x86_64/hyperv_ipi
/x86_64/hyperv_svm_test
/x86_64/max_vcpuid_cap_test
/x86_64/mmio_warning_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 8b1b32628ac8..e13dbf35947b 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -84,6 +84,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index 285e9ff73573..605059f6b8d7 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -9,6 +9,8 @@
#ifndef SELFTEST_KVM_HYPERV_H
#define SELFTEST_KVM_HYPERV_H

+#include "processor.h"
+
#define HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS 0x40000000
#define HYPERV_CPUID_INTERFACE 0x40000001
#define HYPERV_CPUID_VERSION 0x40000002
@@ -184,10 +186,15 @@

/* hypercall options */
#define HV_HYPERCALL_FAST_BIT BIT(16)
+#define HV_HYPERCALL_VARHEAD_OFFSET 17

-static inline uint8_t hyperv_hypercall(u64 control, vm_vaddr_t input_address,
- vm_vaddr_t output_address,
- uint64_t *hv_status)
+/*
+ * Issue a Hyper-V hypercall. Returns exception vector raised or 0, 'hv_status'
+ * is set to the hypercall status (if no exception occurred).
+ */
+static inline uint8_t __hyperv_hypercall(u64 control, vm_vaddr_t input_address,
+ vm_vaddr_t output_address,
+ uint64_t *hv_status)
{
uint8_t vector;
/* Note both the hypercall and the "asm safe" clobber r9-r11. */
@@ -201,6 +208,28 @@ static inline uint8_t hyperv_hypercall(u64 control, vm_vaddr_t input_address,
return vector;
}

+/* Issue a Hyper-V hypercall and assert that it succeeded. */
+static inline void hyperv_hypercall(u64 control, vm_vaddr_t input_address,
+ vm_vaddr_t output_address)
+{
+ uint64_t hv_status;
+ uint8_t vector;
+
+ vector = __hyperv_hypercall(control, input_address, output_address, &hv_status);
+
+ GUEST_ASSERT(!vector);
+ GUEST_ASSERT((hv_status & 0xffff) == 0);
+}
+
+/* Write 'Fast' hypercall input 'data' to the first 'n_sse_regs' SSE regs */
+static inline void hyperv_write_xmm_input(void *data, int n_sse_regs)
+{
+ int i;
+
+ for (i = 0; i < n_sse_regs; i++)
+ write_sse_reg(i, (sse128_t *)(data + sizeof(sse128_t) * i));
+}
+
/* Proper HV_X64_MSR_GUEST_OS_ID value */
#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index c464d324cde0..03a1460d843c 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -62,7 +62,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
input = output = 0;
}

- vector = hyperv_hypercall(hcall->control, input, output, &res);
+ vector = __hyperv_hypercall(hcall->control, input, output, &res);
if (hcall->ud_expected)
GUEST_ASSERT_2(vector == UD_VECTOR, hcall->control, vector);
else
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
new file mode 100644
index 000000000000..1d99741e339d
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
@@ -0,0 +1,330 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V HvCallSendSyntheticClusterIpi{,Ex} tests
+ *
+ * Copyright (C) 2022, Red Hat, Inc.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+
+#include "kvm_util.h"
+#include "hyperv.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#define RECEIVER_VCPU_ID_1 2
+#define RECEIVER_VCPU_ID_2 65
+
+#define IPI_VECTOR 0xfe
+
+static volatile uint64_t ipis_rcvd[RECEIVER_VCPU_ID_2 + 1];
+
+struct hv_vpset {
+ u64 format;
+ u64 valid_bank_mask;
+ u64 bank_contents[2];
+};
+
+enum HV_GENERIC_SET_FORMAT {
+ HV_GENERIC_SET_SPARSE_4K,
+ HV_GENERIC_SET_ALL,
+};
+
+/* HvCallSendSyntheticClusterIpi hypercall */
+struct hv_send_ipi {
+ u32 vector;
+ u32 reserved;
+ u64 cpu_mask;
+};
+
+/* HvCallSendSyntheticClusterIpiEx hypercall */
+struct hv_send_ipi_ex {
+ u32 vector;
+ u32 reserved;
+ struct hv_vpset vp_set;
+};
+
+static inline void hv_init(vm_vaddr_t pgs_gpa)
+{
+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+ wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+}
+
+static void receiver_code(void *hcall_page, vm_vaddr_t pgs_gpa)
+{
+ u32 vcpu_id;
+
+ x2apic_enable();
+ hv_init(pgs_gpa);
+
+ vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+
+ /* Signal sender vCPU we're ready */
+ ipis_rcvd[vcpu_id] = (u64)-1;
+
+ for (;;)
+ asm volatile("sti; hlt; cli");
+}
+
+static void guest_ipi_handler(struct ex_regs *regs)
+{
+ u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+
+ ipis_rcvd[vcpu_id]++;
+ wrmsr(HV_X64_MSR_EOI, 1);
+}
+
+static inline void nop_loop(void)
+{
+ int i;
+
+ for (i = 0; i < 100000000; i++)
+ asm volatile("nop");
+}
+
+static void sender_guest_code(void *hcall_page, vm_vaddr_t pgs_gpa)
+{
+ struct hv_send_ipi *ipi = (struct hv_send_ipi *)hcall_page;
+ struct hv_send_ipi_ex *ipi_ex = (struct hv_send_ipi_ex *)hcall_page;
+ int stage = 1, ipis_expected[2] = {0};
+
+ hv_init(pgs_gpa);
+ GUEST_SYNC(stage++);
+
+ /* Wait for receiver vCPUs to come up */
+ while (!ipis_rcvd[RECEIVER_VCPU_ID_1] || !ipis_rcvd[RECEIVER_VCPU_ID_2])
+ nop_loop();
+ ipis_rcvd[RECEIVER_VCPU_ID_1] = ipis_rcvd[RECEIVER_VCPU_ID_2] = 0;
+
+ /* 'Slow' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
+ ipi->vector = IPI_VECTOR;
+ ipi->cpu_mask = 1 << RECEIVER_VCPU_ID_1;
+ hyperv_hypercall(HVCALL_SEND_IPI, pgs_gpa, pgs_gpa + 4096);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+ GUEST_SYNC(stage++);
+ /* 'Fast' HvCallSendSyntheticClusterIpi to RECEIVER_VCPU_ID_1 */
+ hyperv_hypercall(HVCALL_SEND_IPI | HV_HYPERCALL_FAST_BIT,
+ IPI_VECTOR, 1 << RECEIVER_VCPU_ID_1);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+ GUEST_SYNC(stage++);
+
+ /* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
+ memset(hcall_page, 0, 4096);
+ ipi_ex->vector = IPI_VECTOR;
+ ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ ipi_ex->vp_set.valid_bank_mask = 1 << 0;
+ ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ pgs_gpa, pgs_gpa + 4096);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+ GUEST_SYNC(stage++);
+ /* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_1 */
+ hyperv_write_xmm_input(&ipi_ex->vp_set.valid_bank_mask, 1);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ipis_expected[1]);
+ GUEST_SYNC(stage++);
+
+ /* 'Slow' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
+ memset(hcall_page, 0, 4096);
+ ipi_ex->vector = IPI_VECTOR;
+ ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ ipi_ex->vp_set.valid_bank_mask = 1 << 1;
+ ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_2 - 64);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ pgs_gpa, pgs_gpa + 4096);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+ /* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to RECEIVER_VCPU_ID_2 */
+ hyperv_write_xmm_input(&ipi_ex->vp_set.valid_bank_mask, 1);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+
+ /* 'Slow' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1,2} */
+ memset(hcall_page, 0, 4096);
+ ipi_ex->vector = IPI_VECTOR;
+ ipi_ex->vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ ipi_ex->vp_set.valid_bank_mask = 1 << 1 | 1;
+ ipi_ex->vp_set.bank_contents[0] = BIT(RECEIVER_VCPU_ID_1);
+ ipi_ex->vp_set.bank_contents[1] = BIT(RECEIVER_VCPU_ID_2 - 64);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | (2 << HV_HYPERCALL_VARHEAD_OFFSET),
+ pgs_gpa, pgs_gpa + 4096);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+ /* 'XMM Fast' HvCallSendSyntheticClusterIpiEx to both RECEIVER_VCPU_ID_{1, 2} */
+ hyperv_write_xmm_input(&ipi_ex->vp_set.valid_bank_mask, 2);
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT |
+ (2 << HV_HYPERCALL_VARHEAD_OFFSET),
+ IPI_VECTOR, HV_GENERIC_SET_SPARSE_4K);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+
+ /* 'Slow' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL */
+ memset(hcall_page, 0, 4096);
+ ipi_ex->vector = IPI_VECTOR;
+ ipi_ex->vp_set.format = HV_GENERIC_SET_ALL;
+ hyperv_hypercall(HVCALL_SEND_IPI_EX, pgs_gpa, pgs_gpa + 4096);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+ /*
+ * 'XMM Fast' HvCallSendSyntheticClusterIpiEx to HV_GENERIC_SET_ALL.
+ * Nothing to write anything to XMM regs.
+ */
+ hyperv_hypercall(HVCALL_SEND_IPI_EX | HV_HYPERCALL_FAST_BIT,
+ IPI_VECTOR, HV_GENERIC_SET_ALL);
+ nop_loop();
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_1] == ++ipis_expected[0]);
+ GUEST_ASSERT(ipis_rcvd[RECEIVER_VCPU_ID_2] == ++ipis_expected[1]);
+ GUEST_SYNC(stage++);
+
+ GUEST_DONE();
+}
+
+static void *vcpu_thread(void *arg)
+{
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu *)arg;
+ struct ucall uc;
+ int old;
+ int r;
+ unsigned int exit_reason;
+
+ r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+ TEST_ASSERT(r == 0,
+ "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+ vcpu->id, r);
+
+ vcpu_run(vcpu);
+ exit_reason = vcpu->run->exit_reason;
+
+ TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+ "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+ vcpu->id, exit_reason, exit_reason_str(exit_reason));
+
+ if (get_ucall(vcpu, &uc) == UCALL_ABORT) {
+ TEST_ASSERT(false,
+ "vCPU %u exited with error: %s.\n",
+ vcpu->id, (const char *)uc.args[0]);
+ }
+
+ return NULL;
+}
+
+static void cancel_join_vcpu_thread(pthread_t thread, struct kvm_vcpu *vcpu)
+{
+ void *retval;
+ int r;
+
+ r = pthread_cancel(thread);
+ TEST_ASSERT(r == 0,
+ "pthread_cancel on vcpu_id=%d failed with errno=%d",
+ vcpu->id, r);
+
+ r = pthread_join(thread, &retval);
+ TEST_ASSERT(r == 0,
+ "pthread_join on vcpu_id=%d failed with errno=%d",
+ vcpu->id, r);
+ TEST_ASSERT(retval == PTHREAD_CANCELED,
+ "expected retval=%p, got %p", PTHREAD_CANCELED,
+ retval);
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu[3];
+ unsigned int exit_reason;
+ vm_vaddr_t hcall_page;
+ pthread_t threads[2];
+ int stage = 1, r;
+ struct ucall uc;
+
+ vm = vm_create_with_one_vcpu(&vcpu[0], sender_guest_code);
+
+ /* Hypercall input/output */
+ hcall_page = vm_vaddr_alloc_pages(vm, 2);
+ memset(addr_gva2hva(vm, hcall_page), 0x0, 2 * getpagesize());
+
+ vm_init_descriptor_tables(vm);
+
+ vcpu[1] = vm_vcpu_add(vm, RECEIVER_VCPU_ID_1, receiver_code);
+ vcpu_init_descriptor_tables(vcpu[1]);
+ vcpu_args_set(vcpu[1], 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+ vcpu_set_msr(vcpu[1], HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_1);
+ vcpu_set_hv_cpuid(vcpu[1]);
+
+ vcpu[2] = vm_vcpu_add(vm, RECEIVER_VCPU_ID_2, receiver_code);
+ vcpu_init_descriptor_tables(vcpu[2]);
+ vcpu_args_set(vcpu[2], 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+ vcpu_set_msr(vcpu[2], HV_X64_MSR_VP_INDEX, RECEIVER_VCPU_ID_2);
+ vcpu_set_hv_cpuid(vcpu[2]);
+
+ vm_install_exception_handler(vm, IPI_VECTOR, guest_ipi_handler);
+
+ vcpu_args_set(vcpu[0], 2, hcall_page, addr_gva2gpa(vm, hcall_page));
+ vcpu_set_hv_cpuid(vcpu[0]);
+
+ r = pthread_create(&threads[0], NULL, vcpu_thread, vcpu[1]);
+ TEST_ASSERT(r == 0,
+ "pthread_create halter failed errno=%d", errno);
+
+ r = pthread_create(&threads[1], NULL, vcpu_thread, vcpu[2]);
+ TEST_ASSERT(r == 0,
+ "pthread_create halter failed errno=%d", errno);
+
+ while (true) {
+ r = _vcpu_run(vcpu[0]);
+ exit_reason = vcpu[0]->run->exit_reason;
+
+ TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
+ TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+ "unexpected exit reason: %u (%s)",
+ exit_reason, exit_reason_str(exit_reason));
+
+ switch (get_ucall(vcpu[0], &uc)) {
+ case UCALL_SYNC:
+ TEST_ASSERT(uc.args[1] == stage,
+ "Unexpected stage: %ld (%d expected)\n",
+ uc.args[1], stage);
+ break;
+ case UCALL_ABORT:
+ TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+ __FILE__, uc.args[1]);
+ return 1;
+ case UCALL_DONE:
+ return 0;
+ }
+
+ stage++;
+ }
+
+ cancel_join_vcpu_thread(threads[0], vcpu[1]);
+ cancel_join_vcpu_thread(threads[1], vcpu[2]);
+ kvm_vm_free(vm);
+
+ return 0;
+}
--
2.37.3

2022-09-21 15:48:54

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 10/39] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()

Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
for a smaller number of vCPUs in the request. When Hyper-V TLB flush
is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
send IPIs to a large number of vCPUs (and are rarely used in general).

Introduce hv_is_vp_in_sparse_set() to directly check if the specified
VP_ID is present in sparse vCPU set.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 69891c48c12a..9764ebb7fd5f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1741,6 +1741,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
}
}

+static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
+{
+ int bank, sbank = 0;
+
+ if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
+ (unsigned long *)&valid_bank_mask))
+ return false;
+
+ for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
+ KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
+ if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
+ break;
+ sbank++;
+ }
+
+ return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
+ (unsigned long *)&sparse_banks[sbank]);
+}
+
struct kvm_hv_hcall {
u64 param;
u64 ingpa;
@@ -2023,8 +2042,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
}

-static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
- unsigned long *vcpu_bitmap)
+static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
+ u64 *sparse_banks, u64 valid_bank_mask)
{
struct kvm_lapic_irq irq = {
.delivery_mode = APIC_DM_FIXED,
@@ -2034,7 +2053,10 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
unsigned long i;

kvm_for_each_vcpu(i, vcpu, kvm) {
- if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
+ if (sparse_banks &&
+ !hv_is_vp_in_sparse_set(kvm_hv_get_vpindex(vcpu),
+ valid_bank_mask,
+ sparse_banks))
continue;

/* We fail only when APIC is disabled */
@@ -2047,7 +2069,6 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
- DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
u64 valid_bank_mask;
u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
u32 vector;
@@ -2109,13 +2130,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return HV_STATUS_INVALID_HYPERCALL_INPUT;

- if (all_cpus) {
- kvm_send_ipi_to_many(kvm, vector, NULL);
- } else {
- sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
-
- kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
- }
+ kvm_hv_send_ipi_to_many(kvm, vector, all_cpus ? NULL : sparse_banks, valid_bank_mask);

ret_success:
return HV_STATUS_SUCCESS;
--
2.37.3

2022-09-21 15:48:55

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 18/39] KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush() check

Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
from L2 in L0 to process L2 TLB flush requests.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 1b53dd4cff4d..3fff3a6f4bb9 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -174,6 +174,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
kfifo_reset_out(&tlb_flush_fifo->entries);
}

+static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return hv_vcpu && (hv_vcpu->cpuid_cache.nested_eax & HV_X64_NESTED_DIRECT_FLUSH);
+}
+
static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
--
2.37.3

2022-09-21 15:48:56

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 29/39] KVM: selftests: Export _vm_get_page_table_entry()

Make it possible for tests to mangle guest's page table entries in
addition to just getting them (available with vm_get_page_table_entry()).

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 2 ++
tools/testing/selftests/kvm/lib/x86_64/processor.c | 5 ++---
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 1c7805de8c27..500d711eb989 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -827,6 +827,8 @@ static inline uint8_t wrmsr_safe(uint32_t msr, uint64_t val)
return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
}

+uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+ uint64_t vaddr);
uint64_t vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
uint64_t vaddr);
void vm_set_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 2e6e61bbe81b..5c135f896ada 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -214,9 +214,8 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
__virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
}

-static uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm,
- struct kvm_vcpu *vcpu,
- uint64_t vaddr)
+uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+ uint64_t vaddr)
{
uint16_t index[4];
uint64_t *pml4e, *pdpe, *pde;
--
2.37.3

2022-09-21 15:49:18

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 35/39] KVM: selftests: Create a vendor independent helper to allocate Hyper-V specific test pages

There's no need to pollute VMX and SVM code with Hyper-V specific
stuff and allocate Hyper-V specific test pages for all test as only
few really need them. Create a dedicated struct and an allocation
helper.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/evmcs.h | 4 ++--
.../selftests/kvm/include/x86_64/hyperv.h | 15 +++++++++++++
.../selftests/kvm/include/x86_64/vmx.h | 8 -------
.../testing/selftests/kvm/lib/x86_64/hyperv.c | 20 +++++++++++++++++
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 12 ----------
.../testing/selftests/kvm/x86_64/evmcs_test.c | 22 +++++++++----------
6 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 59b60d45b8f6..94d6059e9a12 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -256,9 +256,9 @@ static inline int evmcs_vmptrld(uint64_t vmcs_pa, void *vmcs)
return 0;
}

-static inline bool load_evmcs(uint64_t enlightened_vmcs_gpa, void *enlightened_vmcs)
+static inline bool load_evmcs(struct hyperv_test_pages *hv)
{
- if (evmcs_vmptrld(enlightened_vmcs_gpa, enlightened_vmcs))
+ if (evmcs_vmptrld(hv->enlightened_vmcs_gpa, hv->enlightened_vmcs))
return false;

current_evmcs->revision_id = EVMCS_VERSION;
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index 42213f5de17f..e00ce9e122f4 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -265,4 +265,19 @@ extern struct hv_vp_assist_page *current_vp_assist;

int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist);

+struct hyperv_test_pages {
+ /* VP assist page */
+ void *vp_assist_hva;
+ uint64_t vp_assist_gpa;
+ void *vp_assist;
+
+ /* Enlightened VMCS */
+ void *enlightened_vmcs_hva;
+ uint64_t enlightened_vmcs_gpa;
+ void *enlightened_vmcs;
+};
+
+struct hyperv_test_pages *
+vcpu_alloc_hyperv_test_pages(struct kvm_vm *vm, vm_vaddr_t *p_hv_pages_gva);
+
#endif /* !SELFTEST_KVM_HYPERV_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index d07f13c9fced..6d024e1c5f99 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -517,14 +517,6 @@ struct vmx_pages {
uint64_t vmwrite_gpa;
void *vmwrite;

- void *vp_assist_hva;
- uint64_t vp_assist_gpa;
- void *vp_assist;
-
- void *enlightened_vmcs_hva;
- uint64_t enlightened_vmcs_gpa;
- void *enlightened_vmcs;
-
void *eptp_hva;
uint64_t eptp_gpa;
void *eptp;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
index 32dc0afd9e5b..e44bb5cc8566 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
@@ -8,6 +8,26 @@
#include "processor.h"
#include "hyperv.h"

+struct hyperv_test_pages *
+vcpu_alloc_hyperv_test_pages(struct kvm_vm *vm, vm_vaddr_t *p_hv_pages_gva)
+{
+ vm_vaddr_t hv_pages_gva = vm_vaddr_alloc_page(vm);
+ struct hyperv_test_pages *hv = addr_gva2hva(vm, hv_pages_gva);
+
+ /* Setup of a region of guest memory for the VP Assist page. */
+ hv->vp_assist = (void *)vm_vaddr_alloc_page(vm);
+ hv->vp_assist_hva = addr_gva2hva(vm, (uintptr_t)hv->vp_assist);
+ hv->vp_assist_gpa = addr_gva2gpa(vm, (uintptr_t)hv->vp_assist);
+
+ /* Setup of a region of guest memory for the enlightened VMCS. */
+ hv->enlightened_vmcs = (void *)vm_vaddr_alloc_page(vm);
+ hv->enlightened_vmcs_hva = addr_gva2hva(vm, (uintptr_t)hv->enlightened_vmcs);
+ hv->enlightened_vmcs_gpa = addr_gva2gpa(vm, (uintptr_t)hv->enlightened_vmcs);
+
+ *p_hv_pages_gva = hv_pages_gva;
+ return hv;
+}
+
int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
{
uint64_t val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index f8acbc7c8d7d..11e7f1f26624 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -107,18 +107,6 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());

- /* Setup of a region of guest memory for the VP Assist page. */
- vmx->vp_assist = (void *)vm_vaddr_alloc_page(vm);
- vmx->vp_assist_hva = addr_gva2hva(vm, (uintptr_t)vmx->vp_assist);
- vmx->vp_assist_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vp_assist);
-
- /* Setup of a region of guest memory for the enlightened VMCS. */
- vmx->enlightened_vmcs = (void *)vm_vaddr_alloc_page(vm);
- vmx->enlightened_vmcs_hva =
- addr_gva2hva(vm, (uintptr_t)vmx->enlightened_vmcs);
- vmx->enlightened_vmcs_gpa =
- addr_gva2gpa(vm, (uintptr_t)vmx->enlightened_vmcs);
-
*p_vmx_gva = vmx_gva;
return vmx;
}
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 5a4c8b1873aa..74f076ba574b 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -68,7 +68,7 @@ void l2_guest_code(void)
vmcall();
}

-void guest_code(struct vmx_pages *vmx_pages)
+void guest_code(struct vmx_pages *vmx_pages, struct hyperv_test_pages *hv_pages)
{
#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
@@ -78,23 +78,22 @@ void guest_code(struct vmx_pages *vmx_pages)
GUEST_SYNC(1);
GUEST_SYNC(2);

- enable_vp_assist(vmx_pages->vp_assist_gpa, vmx_pages->vp_assist);
+ enable_vp_assist(hv_pages->vp_assist_gpa, hv_pages->vp_assist);
evmcs_enable();

GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
GUEST_SYNC(3);
- GUEST_ASSERT(load_evmcs(vmx_pages->enlightened_vmcs_gpa,
- vmx_pages->enlightened_vmcs));
- GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
+ GUEST_ASSERT(load_evmcs(hv_pages));
+ GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);

GUEST_SYNC(4);
- GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
+ GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);

prepare_vmcs(vmx_pages, l2_guest_code,
&l2_guest_stack[L2_GUEST_STACK_SIZE]);

GUEST_SYNC(5);
- GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
+ GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);
current_evmcs->revision_id = -1u;
GUEST_ASSERT(vmlaunch());
current_evmcs->revision_id = EVMCS_VERSION;
@@ -104,7 +103,7 @@ void guest_code(struct vmx_pages *vmx_pages)
PIN_BASED_NMI_EXITING);

GUEST_ASSERT(!vmlaunch());
- GUEST_ASSERT(vmptrstz() == vmx_pages->enlightened_vmcs_gpa);
+ GUEST_ASSERT(vmptrstz() == hv_pages->enlightened_vmcs_gpa);

/*
* NMI forces L2->L1 exit, resuming L2 and hope that EVMCS is
@@ -152,7 +151,7 @@ void guest_code(struct vmx_pages *vmx_pages)
GUEST_SYNC(11);

/* Try enlightened vmptrld with an incorrect GPA */
- evmcs_vmptrld(0xdeadbeef, vmx_pages->enlightened_vmcs);
+ evmcs_vmptrld(0xdeadbeef, hv_pages->enlightened_vmcs);
GUEST_ASSERT(vmlaunch());
GUEST_ASSERT(ud_count == 1);
GUEST_DONE();
@@ -199,7 +198,7 @@ static struct kvm_vcpu *save_restore_vm(struct kvm_vm *vm,

int main(int argc, char *argv[])
{
- vm_vaddr_t vmx_pages_gva = 0;
+ vm_vaddr_t vmx_pages_gva = 0, hv_pages_gva = 0;

struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
@@ -217,7 +216,8 @@ int main(int argc, char *argv[])
vcpu_enable_evmcs(vcpu);

vcpu_alloc_vmx(vm, &vmx_pages_gva);
- vcpu_args_set(vcpu, 1, vmx_pages_gva);
+ vcpu_alloc_hyperv_test_pages(vm, &hv_pages_gva);
+ vcpu_args_set(vcpu, 2, vmx_pages_gva, hv_pages_gva);

vm_init_descriptor_tables(vm);
vcpu_init_descriptor_tables(vcpu);
--
2.37.3

2022-09-21 15:49:20

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 27/39] KVM: selftests: Fill in vm->vpages_mapped bitmap in virt_map() too

Similar to vm_vaddr_alloc(), virt_map() needs to reflect the mapping
in vm->vpages_mapped.

While on it, remove unneeded code wraping in vm_vaddr_alloc().

Reviewed-by: Andrew Jones <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 9889fe0d8919..ad9e15d4c6a9 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1214,8 +1214,7 @@ vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min)

virt_pg_map(vm, vaddr, paddr);

- sparsebit_set(vm->vpages_mapped,
- vaddr >> vm->page_shift);
+ sparsebit_set(vm->vpages_mapped, vaddr >> vm->page_shift);
}

return vaddr_start;
@@ -1288,6 +1287,8 @@ void virt_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
virt_pg_map(vm, vaddr, paddr);
vaddr += page_size;
paddr += page_size;
+
+ sparsebit_set(vm->vpages_mapped, vaddr >> vm->page_shift);
}
}

--
2.37.3

2022-09-21 15:49:43

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 36/39] KVM: selftests: Allocate Hyper-V partition assist page

In preparation to testing Hyper-V L2 TLB flush hypercalls, allocate
so-called Partition assist page.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/hyperv.h | 5 +++++
tools/testing/selftests/kvm/lib/x86_64/hyperv.c | 5 +++++
2 files changed, 10 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index e00ce9e122f4..e0c0dc4b3d5c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -271,6 +271,11 @@ struct hyperv_test_pages {
uint64_t vp_assist_gpa;
void *vp_assist;

+ /* Partition assist page */
+ void *partition_assist_hva;
+ uint64_t partition_assist_gpa;
+ void *partition_assist;
+
/* Enlightened VMCS */
void *enlightened_vmcs_hva;
uint64_t enlightened_vmcs_gpa;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
index e44bb5cc8566..e222db65a188 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
@@ -19,6 +19,11 @@ vcpu_alloc_hyperv_test_pages(struct kvm_vm *vm, vm_vaddr_t *p_hv_pages_gva)
hv->vp_assist_hva = addr_gva2hva(vm, (uintptr_t)hv->vp_assist);
hv->vp_assist_gpa = addr_gva2gpa(vm, (uintptr_t)hv->vp_assist);

+ /* Setup of a region of guest memory for the partition assist page. */
+ hv->partition_assist = (void *)vm_vaddr_alloc_page(vm);
+ hv->partition_assist_hva = addr_gva2hva(vm, (uintptr_t)hv->partition_assist);
+ hv->partition_assist_gpa = addr_gva2gpa(vm, (uintptr_t)hv->partition_assist);
+
/* Setup of a region of guest memory for the enlightened VMCS. */
hv->enlightened_vmcs = (void *)vm_vaddr_alloc_page(vm);
hv->enlightened_vmcs_hva = addr_gva2hva(vm, (uintptr_t)hv->enlightened_vmcs);
--
2.37.3

2022-09-21 15:50:24

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 08/39] x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants

It may not come clear from where the magical '64' value used in
__cpumask_to_vpset() come from. Moreover, '64' means both the maximum
sparse bank number as well as the number of vCPUs per bank. Add defines
to make things clear. These defines are also going to be used by KVM.

No functional change.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
include/asm-generic/hyperv-tlfs.h | 5 +++++
include/asm-generic/mshyperv.h | 11 ++++++-----
2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..020ca9bdbb79 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -399,6 +399,11 @@ struct hv_vpset {
u64 bank_contents[];
} __packed;

+/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
+#define HV_MAX_SPARSE_VCPU_BANKS (64)
+/* The number of vCPUs in one sparse bank */
+#define HV_VCPUS_PER_SPARSE_BANK (64)
+
/* HvCallSendSyntheticClusterIpi hypercall */
struct hv_send_ipi {
u32 vector;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c05d2ce9b6cd..89a529093042 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
{
int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
int this_cpu = smp_processor_id();
+ int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;

- /* valid_bank_mask can represent up to 64 banks */
- if (hv_max_vp_index / 64 >= 64)
+ /* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
+ if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
return 0;

/*
@@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
* structs are not cleared between calls, we risk flushing unneeded
* vCPUs otherwise.
*/
- for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
+ for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
vpset->bank_contents[vcpu_bank] = 0;

/*
@@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
vcpu = hv_cpu_number_to_vp_number(cpu);
if (vcpu == VP_INVAL)
return -1;
- vcpu_bank = vcpu / 64;
- vcpu_offset = vcpu % 64;
+ vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
+ vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
__set_bit(vcpu_offset, (unsigned long *)
&vpset->bank_contents[vcpu_bank]);
if (vcpu_bank >= nr_bank)
--
2.37.3

2022-09-21 16:20:48

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 13/39] KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use

To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is
also needed to handle post-flush exit to L1 upon request.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/vmx/nested.c | 15 +++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 025c0d6cda69..b2413f58b919 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -649,6 +649,12 @@ struct kvm_vcpu_hv {

/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
u64 sparse_banks[HV_MAX_SPARSE_VCPU_BANKS];
+
+ struct {
+ u64 pa_page_gpa;
+ u64 vm_id;
+ u32 vp_id;
+ } nested;
};

/* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4da0558943ce..0b79cab4b5fc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -225,6 +225,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)

static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);

if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
@@ -233,6 +234,12 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
}

vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
+
+ if (hv_vcpu) {
+ hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
+ hv_vcpu->nested.vm_id = 0;
+ hv_vcpu->nested.vp_id = 0;
+ }
}

static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
@@ -1557,11 +1564,19 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
{
struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu);

/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
vmcs12->tpr_threshold = evmcs->tpr_threshold;
vmcs12->guest_rip = evmcs->guest_rip;

+ if (unlikely(!(hv_clean_fields &
+ HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL))) {
+ hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
+ hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
+ hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
+ }
+
if (unlikely(!(hv_clean_fields &
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC))) {
vmcs12->guest_rsp = evmcs->guest_rsp;
--
2.37.3

2022-09-21 16:22:42

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 33/39] KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h

Hyper-V VP assist page is not eVMCS specific, it is also used for
enlightened nSVM. Move the code to vendor neutral place.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/evmcs.h | 40 +------------------
.../selftests/kvm/include/x86_64/hyperv.h | 31 ++++++++++++++
.../testing/selftests/kvm/lib/x86_64/hyperv.c | 21 ++++++++++
.../testing/selftests/kvm/x86_64/evmcs_test.c | 1 +
5 files changed, 56 insertions(+), 38 deletions(-)
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 149543b7fcd1..36692fe34e10 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -50,6 +50,7 @@ LIBKVM += lib/test_util.c

LIBKVM_x86_64 += lib/x86_64/apic.c
LIBKVM_x86_64 += lib/x86_64/handlers.S
+LIBKVM_x86_64 += lib/x86_64/hyperv.c
LIBKVM_x86_64 += lib/x86_64/perf_test_util.c
LIBKVM_x86_64 += lib/x86_64/processor.c
LIBKVM_x86_64 += lib/x86_64/svm.c
diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index efdc62704f27..2530b5aeb4ba 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -10,6 +10,7 @@
#define SELFTEST_KVM_EVMCS_H

#include <stdint.h>
+#include "hyperv.h"
#include "vmx.h"

#define u16 uint16_t
@@ -20,27 +21,6 @@

extern bool enable_evmcs;

-struct hv_nested_enlightenments_control {
- struct {
- __u32 directhypercall:1;
- __u32 reserved:31;
- } features;
- struct {
- __u32 reserved;
- } hypercallControls;
-} __packed;
-
-/* Define virtual processor assist page structure. */
-struct hv_vp_assist_page {
- __u32 apic_assist;
- __u32 reserved1;
- __u64 vtl_control[3];
- struct hv_nested_enlightenments_control nested_control;
- __u8 enlighten_vmentry;
- __u8 reserved2[7];
- __u64 current_nested_vmcs;
-} __packed;
-
struct hv_enlightened_vmcs {
u32 revision_id;
u32 abort;
@@ -257,29 +237,13 @@ struct hv_enlightened_vmcs {
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL BIT(15)
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL 0xFFFF

-#define HV_X64_MSR_VP_ASSIST_PAGE 0x40000073
-#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
-#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
-#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK \
- (~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
-
extern struct hv_enlightened_vmcs *current_evmcs;
-extern struct hv_vp_assist_page *current_vp_assist;

int vcpu_enable_evmcs(struct kvm_vcpu *vcpu);

-static inline int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
+static inline void evmcs_enable(void)
{
- u64 val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
- HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
-
- wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
-
- current_vp_assist = vp_assist;
-
enable_evmcs = true;
-
- return 0;
}

static inline int evmcs_vmptrld(uint64_t vmcs_pa, void *vmcs)
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index 8a7d607e3a38..42213f5de17f 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -234,4 +234,35 @@ static inline void hyperv_write_xmm_input(void *data, int n_sse_regs)
/* Proper HV_X64_MSR_GUEST_OS_ID value */
#define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)

+#define HV_X64_MSR_VP_ASSIST_PAGE 0x40000073
+#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
+#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK \
+ (~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1))
+
+struct hv_nested_enlightenments_control {
+ struct {
+ __u32 directhypercall:1;
+ __u32 reserved:31;
+ } features;
+ struct {
+ __u32 reserved;
+ } hypercallControls;
+} __packed;
+
+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+ __u32 apic_assist;
+ __u32 reserved1;
+ __u64 vtl_control[3];
+ struct hv_nested_enlightenments_control nested_control;
+ __u8 enlighten_vmentry;
+ __u8 reserved2[7];
+ __u64 current_nested_vmcs;
+} __packed;
+
+extern struct hv_vp_assist_page *current_vp_assist;
+
+int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist);
+
#endif /* !SELFTEST_KVM_HYPERV_H */
diff --git a/tools/testing/selftests/kvm/lib/x86_64/hyperv.c b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
new file mode 100644
index 000000000000..32dc0afd9e5b
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/x86_64/hyperv.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hyper-V specific functions.
+ *
+ * Copyright (C) 2021, Red Hat Inc.
+ */
+#include <stdint.h>
+#include "processor.h"
+#include "hyperv.h"
+
+int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist)
+{
+ uint64_t val = (vp_assist_pa & HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK) |
+ HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+
+ wrmsr(HV_X64_MSR_VP_ASSIST_PAGE, val);
+
+ current_vp_assist = vp_assist;
+
+ return 0;
+}
diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
index 99bc202243d2..9007fb04343b 100644
--- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c
@@ -79,6 +79,7 @@ void guest_code(struct vmx_pages *vmx_pages)
GUEST_SYNC(2);

enable_vp_assist(vmx_pages->vp_assist_gpa, vmx_pages->vp_assist);
+ evmcs_enable();

GUEST_ASSERT(vmx_pages->vmcs_gpa);
GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages));
--
2.37.3

2022-09-21 16:24:26

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 14/39] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id

Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
assist page address to handle L2 TLB flush requests.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
arch/x86/kvm/svm/nested.c | 2 ++
2 files changed, 18 insertions(+)

diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index 7d6d97968fb9..8cf702fed7e5 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -9,6 +9,7 @@
#include <asm/mshyperv.h>

#include "../hyperv.h"
+#include "svm.h"

/*
* Hyper-V uses the software reserved 32 bytes in VMCB
@@ -32,4 +33,19 @@ struct hv_enlightenments {
*/
#define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW

+static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct hv_enlightenments *hve =
+ (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ if (!hv_vcpu)
+ return;
+
+ hv_vcpu->nested.pa_page_gpa = hve->partition_assist_page;
+ hv_vcpu->nested.vm_id = hve->hv_vm_id;
+ hv_vcpu->nested.vp_id = hve->hv_vp_id;
+}
+
#endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 4c620999d230..9fd75d45b31b 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -800,6 +800,8 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
if (kvm_vcpu_apicv_active(vcpu))
kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);

+ nested_svm_hv_update_vm_vp_ids(vcpu);
+
return 0;
}

--
2.37.3

2022-09-21 16:31:56

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 20/39] KVM: nVMX: hyper-v: Enable L2 TLB flush

Enable L2 TLB flush feature on nVMX when:
- Enlightened VMCS is in use.
- The feature flag is enabled in eVMCS.
- The feature flag is enabled in partition assist page.

Perform synthetic vmexit to L1 after processing TLB flush call upon
request (HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH).

Note: nested_evmcs_l2_tlb_flush_enabled() uses cached VP assist page copy
which gets updated from nested_vmx_handle_enlightened_vmptrld(). This is
also guaranteed to happen post migration with eVMCS backed L2 running.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx/evmcs.c | 17 +++++++++++++++++
arch/x86/kvm/vmx/evmcs.h | 10 ++++++++++
arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++++
3 files changed, 49 insertions(+)

diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index 635a0c81ff1d..f7fedc3ed247 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -6,6 +6,7 @@
#include "../hyperv.h"
#include "../cpuid.h"
#include "evmcs.h"
+#include "nested.h"
#include "vmcs.h"
#include "vmx.h"
#include "trace.h"
@@ -501,6 +502,22 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
return 0;
}

+bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
+
+ if (!hv_vcpu || !evmcs)
+ return false;
+
+ if (!evmcs->hv_enlightenments_control.nested_flush_hypercall)
+ return false;
+
+ return hv_vcpu->vp_assist_page.nested_control.features.directhypercall;
+}
+
void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu)
{
+ nested_vmx_vmexit(vcpu, HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH, 0, 0);
}
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 7ad56fbc4b4d..dd1589336e79 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -63,6 +63,15 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
#define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (0)
#define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)

+/*
+ * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
+ * pairing it with architecturally impossible exit reasons. Bit 28 is set only
+ * on SMI exits to a SMI transfer monitor (STM) and if and only if a MTF VM-Exit
+ * is pending. I.e. it will never be set by hardware for non-SMI exits (there
+ * are only three), nor will it ever be set unless the VMM is an STM.
+ */
+#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031
+
struct evmcs_field {
u16 offset;
u16 clean_field;
@@ -241,6 +250,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version);
void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
+bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu);
void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu);

#endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0634518a6719..1451a7a2c488 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1132,6 +1132,17 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
{
struct vcpu_vmx *vmx = to_vmx(vcpu);

+ /*
+ * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
+ * L2's VP_ID upon request from the guest. Make sure we check for
+ * pending entries for the case when the request got misplaced (e.g.
+ * a transition from L2->L1 happened while processing L2 TLB flush
+ * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
+ * anything if there are no requests in the corresponding buffer.
+ */
+ if (to_hv_vcpu(vcpu))
+ kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+
/*
* If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
* for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
@@ -3267,6 +3278,12 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)

static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu)
{
+ /*
+ * Note: nested_get_evmcs_page() also updates 'vp_assist_page' copy
+ * in 'struct kvm_vcpu_hv' in case eVMCS is in use, this is mandatory
+ * to make nested_evmcs_l2_tlb_flush_enabled() work correctly post
+ * migration.
+ */
if (!nested_get_evmcs_page(vcpu)) {
pr_debug_ratelimited("%s: enlightened vmptrld failed\n",
__func__);
@@ -6143,6 +6160,11 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
* Handle L2's bus locks in L0 directly.
*/
return true;
+ case EXIT_REASON_VMCALL:
+ /* Hyper-V L2 TLB flush hypercall is handled by L0 */
+ return guest_hv_cpuid_has_l2_tlb_flush(vcpu) &&
+ nested_evmcs_l2_tlb_flush_enabled(vcpu) &&
+ kvm_hv_is_tlb_flush_hcall(vcpu);
default:
break;
}
--
2.37.3

2022-09-21 16:37:37

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 15/39] KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook

Hyper-V supports injecting synthetic L2->L1 exit after performing
L2 TLB flush operation but the procedure is vendor specific. Introduce
.hv_inject_synthetic_vmexit_post_tlb_flush nested hook for it.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/Makefile | 3 ++-
arch/x86/kvm/svm/hyperv.c | 11 +++++++++++
arch/x86/kvm/svm/hyperv.h | 2 ++
arch/x86/kvm/svm/nested.c | 1 +
arch/x86/kvm/vmx/evmcs.c | 4 ++++
arch/x86/kvm/vmx/evmcs.h | 1 +
arch/x86/kvm/vmx/nested.c | 1 +
8 files changed, 23 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kvm/svm/hyperv.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2413f58b919..b40df51b58d9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1694,6 +1694,7 @@ struct kvm_x86_nested_ops {
int (*enable_evmcs)(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version);
uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
+ void (*hv_inject_synthetic_vmexit_post_tlb_flush)(struct kvm_vcpu *vcpu);
};

struct kvm_x86_init_ops {
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 30f244b64523..b6d53b045692 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -25,7 +25,8 @@ kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o

-kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o svm/sev.o
+kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
+ svm/sev.o svm/hyperv.o

ifdef CONFIG_HYPERV
kvm-amd-y += svm/svm_onhyperv.o
diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
new file mode 100644
index 000000000000..911f51021af1
--- /dev/null
+++ b/arch/x86/kvm/svm/hyperv.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD SVM specific code for Hyper-V on KVM.
+ *
+ * Copyright 2022 Red Hat, Inc. and/or its affiliates.
+ */
+#include "hyperv.h"
+
+void svm_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index 8cf702fed7e5..dd2e393f84a0 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -48,4 +48,6 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
hv_vcpu->nested.vp_id = hve->hv_vp_id;
}

+void svm_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu);
+
#endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9fd75d45b31b..b8df2f4f880e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1722,4 +1722,5 @@ struct kvm_x86_nested_ops svm_nested_ops = {
.get_nested_state_pages = svm_get_nested_state_pages,
.get_state = svm_get_nested_state,
.set_state = svm_set_nested_state,
+ .hv_inject_synthetic_vmexit_post_tlb_flush = svm_hv_inject_synthetic_vmexit_post_tlb_flush,
};
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index d8b23c96d627..26bb40933e6d 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -507,3 +507,7 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,

return 0;
}
+
+void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu)
+{
+}
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 6f746ef3c038..492d1c58c734 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -241,5 +241,6 @@ int nested_enable_evmcs(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version);
void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
int nested_evmcs_check_controls(struct vmcs12 *vmcs12);
+void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu);

#endif /* __KVM_X86_VMX_EVMCS_H */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0b79cab4b5fc..640680228973 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6994,4 +6994,5 @@ struct kvm_x86_nested_ops vmx_nested_ops = {
.write_log_dirty = nested_vmx_write_pml_buffer,
.enable_evmcs = nested_enable_evmcs,
.get_evmcs_version = nested_get_evmcs_version,
+ .hv_inject_synthetic_vmexit_post_tlb_flush = vmx_hv_inject_synthetic_vmexit_post_tlb_flush,
};
--
2.37.3

2022-09-21 16:39:12

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 32/39] KVM: selftests: Sync 'struct hv_vp_assist_page' definition with hyperv-tlfs.h

'struct hv_vp_assist_page' definition doesn't match TLFS. Also, define
'struct hv_nested_enlightenments_control' and use it instead of opaque
'__u64'.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/evmcs.h | 22 ++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 4b6840df2979..efdc62704f27 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -20,14 +20,26 @@

extern bool enable_evmcs;

+struct hv_nested_enlightenments_control {
+ struct {
+ __u32 directhypercall:1;
+ __u32 reserved:31;
+ } features;
+ struct {
+ __u32 reserved;
+ } hypercallControls;
+} __packed;
+
+/* Define virtual processor assist page structure. */
struct hv_vp_assist_page {
__u32 apic_assist;
- __u32 reserved;
- __u64 vtl_control[2];
- __u64 nested_enlightenments_control[2];
- __u32 enlighten_vmentry;
+ __u32 reserved1;
+ __u64 vtl_control[3];
+ struct hv_nested_enlightenments_control nested_control;
+ __u8 enlighten_vmentry;
+ __u8 reserved2[7];
__u64 current_nested_vmcs;
-};
+} __packed;

struct hv_enlightened_vmcs {
u32 revision_id;
--
2.37.3

2022-09-21 16:39:12

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 38/39] KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test

Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
assist page.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/x86_64/hyperv_svm_test.c | 64 +++++++++++++++++--
1 file changed, 59 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
index a380ad7bb9b3..3bd9e4ceb33f 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_svm_test.c
@@ -41,8 +41,13 @@ struct hv_enlightenments {
*/
#define VMCB_HV_NESTED_ENLIGHTENMENTS (1U << 31)

+#define HV_SVM_EXITCODE_ENL 0xF0000000
+#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)
+
void l2_guest_code(void)
{
+ u64 unused;
+
GUEST_SYNC(3);
/* Exit to L1 */
vmmcall();
@@ -56,11 +61,30 @@ void l2_guest_code(void)

GUEST_SYNC(5);

+ /* L2 TLB flush tests */
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
+ HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES |
+ HV_FLUSH_ALL_PROCESSORS);
+ rdmsr(MSR_FS_BASE);
+ /*
+ * Note: hypercall status (RAX) is not preserved correctly by L1 after
+ * synthetic vmexit, use unchecked version.
+ */
+ __hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
+ HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES |
+ HV_FLUSH_ALL_PROCESSORS, &unused);
+ /* Make sure we're not issuing Hyper-V TLB flush call again */
+ __asm__ __volatile__ ("mov $0xdeadbeef, %rcx");
+
/* Done, exit to L1 and never come back. */
vmmcall();
}

-static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
+static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm,
+ struct hyperv_test_pages *hv_pages,
+ vm_vaddr_t pgs_gpa)
{
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
struct vmcb *vmcb = svm->vmcb;
@@ -69,13 +93,23 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)

GUEST_SYNC(1);

- wrmsr(HV_X64_MSR_GUEST_OS_ID, (u64)0x8100 << 48);
+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+ wrmsr(HV_X64_MSR_HYPERCALL, pgs_gpa);
+ enable_vp_assist(hv_pages->vp_assist_gpa, hv_pages->vp_assist);

GUEST_ASSERT(svm->vmcb_gpa);
/* Prepare for L2 execution. */
generic_svm_setup(svm, l2_guest_code,
&l2_guest_stack[L2_GUEST_STACK_SIZE]);

+ /* L2 TLB flush setup */
+ hve->partition_assist_page = hv_pages->partition_assist_gpa;
+ hve->hv_enlightenments_control.nested_flush_hypercall = 1;
+ hve->hv_vm_id = 1;
+ hve->hv_vp_id = 1;
+ current_vp_assist->nested_control.features.directhypercall = 1;
+ *(u32 *)(hv_pages->partition_assist) = 0;
+
GUEST_SYNC(2);
run_guest(vmcb, svm->vmcb_gpa);
GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
@@ -110,6 +144,20 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)
GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
vmcb->save.rip += 2; /* rdmsr */

+
+ /*
+ * L2 TLB flush test. First VMCALL should be handled directly by L0,
+ * no VMCALL exit expected.
+ */
+ run_guest(vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_MSR);
+ vmcb->save.rip += 2; /* rdmsr */
+ /* Enable synthetic vmexit */
+ *(u32 *)(hv_pages->partition_assist) = 1;
+ run_guest(vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT(vmcb->control.exit_code == HV_SVM_EXITCODE_ENL);
+ GUEST_ASSERT(vmcb->control.exit_info_1 == HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH);
+
run_guest(vmcb, svm->vmcb_gpa);
GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL);
GUEST_SYNC(6);
@@ -119,8 +167,8 @@ static void __attribute__((__flatten__)) guest_code(struct svm_test_data *svm)

int main(int argc, char *argv[])
{
- vm_vaddr_t nested_gva = 0;
-
+ vm_vaddr_t nested_gva = 0, hv_pages_gva = 0;
+ vm_vaddr_t hcall_page;
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
struct kvm_run *run;
@@ -134,7 +182,13 @@ int main(int argc, char *argv[])
vcpu_set_hv_cpuid(vcpu);
run = vcpu->run;
vcpu_alloc_svm(vm, &nested_gva);
- vcpu_args_set(vcpu, 1, nested_gva);
+ vcpu_alloc_hyperv_test_pages(vm, &hv_pages_gva);
+
+ hcall_page = vm_vaddr_alloc_pages(vm, 1);
+ memset(addr_gva2hva(vm, hcall_page), 0x0, getpagesize());
+
+ vcpu_args_set(vcpu, 3, nested_gva, hv_pages_gva, addr_gva2gpa(vm, hcall_page));
+ vcpu_set_msr(vcpu, HV_X64_MSR_VP_INDEX, vcpu->id);

for (stage = 1;; stage++) {
vcpu_run(vcpu);
--
2.37.3

2022-09-21 16:39:15

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 01/39] KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'

To make terminology between Hyper-V-on-KVM and KVM-on-Hyper-V consistent,
rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'. The change
eliminates the use of confusing 'direct' and adds the missing underscore.

No functional change.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/svm_onhyperv.c | 2 +-
arch/x86/kvm/svm/svm_onhyperv.h | 6 +++---
arch/x86/kvm/vmx/vmx.c | 6 +++---
arch/x86/kvm/x86.c | 6 +++---
6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 82ba4a564e58..6033b54963a4 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -123,7 +123,7 @@ KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
KVM_X86_OP(get_msr_feature)
KVM_X86_OP(can_emulate_instruction)
KVM_X86_OP(apic_init_signal_blocked)
-KVM_X86_OP_OPTIONAL(enable_direct_tlbflush)
+KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
KVM_X86_OP_OPTIONAL(migrate_timers)
KVM_X86_OP(msr_filter_changed)
KVM_X86_OP(complete_emulated_msr)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b3ce723efb43..504daf473092 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1624,7 +1624,7 @@ struct kvm_x86_ops {
void *insn, int insn_len);

bool (*apic_init_signal_blocked)(struct kvm_vcpu *vcpu);
- int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
+ int (*enable_l2_tlb_flush)(struct kvm_vcpu *vcpu);

void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm_onhyperv.c b/arch/x86/kvm/svm/svm_onhyperv.c
index 8cdc62c74a96..69a7014d1cef 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.c
+++ b/arch/x86/kvm/svm/svm_onhyperv.c
@@ -14,7 +14,7 @@
#include "kvm_onhyperv.h"
#include "svm_onhyperv.h"

-int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
{
struct hv_enlightenments *hve;
struct hv_partition_assist_pg **p_hv_pa_pg =
diff --git a/arch/x86/kvm/svm/svm_onhyperv.h b/arch/x86/kvm/svm/svm_onhyperv.h
index e2fc59380465..d6ec4aeebedb 100644
--- a/arch/x86/kvm/svm/svm_onhyperv.h
+++ b/arch/x86/kvm/svm/svm_onhyperv.h
@@ -13,7 +13,7 @@

static struct kvm_x86_ops svm_x86_ops;

-int svm_hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu);
+int svm_hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu);

static inline void svm_hv_init_vmcb(struct vmcb *vmcb)
{
@@ -51,8 +51,8 @@ static inline void svm_hv_hardware_setup(void)

vp_ap->nested_control.features.directhypercall = 1;
}
- svm_x86_ops.enable_direct_tlbflush =
- svm_hv_enable_direct_tlbflush;
+ svm_x86_ops.enable_l2_tlb_flush =
+ svm_hv_enable_l2_tlb_flush;
}
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 94c314dc2393..87ef9aefc4ac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -526,7 +526,7 @@ static unsigned long host_idt_base;
static bool __read_mostly enlightened_vmcs = true;
module_param(enlightened_vmcs, bool, 0444);

-static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+static int hv_enable_l2_tlb_flush(struct kvm_vcpu *vcpu)
{
struct hv_enlightened_vmcs *evmcs;
struct hv_partition_assist_pg **p_hv_pa_pg =
@@ -8479,8 +8479,8 @@ static int __init vmx_init(void)
}

if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
- vmx_x86_ops.enable_direct_tlbflush
- = hv_enable_direct_tlbflush;
+ vmx_x86_ops.enable_l2_tlb_flush
+ = hv_enable_l2_tlb_flush;

} else {
enlightened_vmcs = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b8328cb6c14..f62d5799fcd7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4467,7 +4467,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
kvm_x86_ops.nested_ops->get_state(NULL, NULL, 0) : 0;
break;
case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
- r = kvm_x86_ops.enable_direct_tlbflush != NULL;
+ r = kvm_x86_ops.enable_l2_tlb_flush != NULL;
break;
case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
r = kvm_x86_ops.nested_ops->enable_evmcs != NULL;
@@ -5483,10 +5483,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
return r;
case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
- if (!kvm_x86_ops.enable_direct_tlbflush)
+ if (!kvm_x86_ops.enable_l2_tlb_flush)
return -ENOTTY;

- return static_call(kvm_x86_enable_direct_tlbflush)(vcpu);
+ return static_call(kvm_x86_enable_l2_tlb_flush)(vcpu);

case KVM_CAP_HYPERV_ENFORCE_CPUID:
return kvm_hv_set_enforce_cpuid(vcpu, cap->args[0]);
--
2.37.3

2022-09-21 16:41:11

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 30/39] KVM: selftests: Hyper-V PV TLB flush selftest

Introduce a selftest for Hyper-V PV TLB flush hypercalls
(HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).

The test creates one 'sender' vCPU and two 'worker' vCPU which do busy
loop reading from a certain GVA checking the observed value. Sender
vCPU drops to the host to swap the data page with another page filled
with a different value. The expectation for workers is also
altered. Without TLB flush on worker vCPUs, they may continue to
observe old value. To guard against accidental TLB flushes for worker
vCPUs the test is repeated 100 times.

Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
fast' modes.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/hyperv.h | 1 +
.../selftests/kvm/x86_64/hyperv_tlb_flush.c | 644 ++++++++++++++++++
4 files changed, 647 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 70a853711f9f..8e9d208488a8 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -26,6 +26,7 @@
/x86_64/hyperv_features
/x86_64/hyperv_ipi
/x86_64/hyperv_svm_test
+/x86_64/hyperv_tlb_flush
/x86_64/max_vcpuid_cap_test
/x86_64/mmio_warning_test
/x86_64/monitor_mwait_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e13dbf35947b..149543b7fcd1 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -86,6 +86,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_ipi
TEST_GEN_PROGS_x86_64 += x86_64/hyperv_svm_test
+TEST_GEN_PROGS_x86_64 += x86_64/hyperv_tlb_flush
TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
index 605059f6b8d7..8a7d607e3a38 100644
--- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
+++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
@@ -187,6 +187,7 @@
/* hypercall options */
#define HV_HYPERCALL_FAST_BIT BIT(16)
#define HV_HYPERCALL_VARHEAD_OFFSET 17
+#define HV_HYPERCALL_REP_COMP_OFFSET 32

/*
* Issue a Hyper-V hypercall. Returns exception vector raised or 0, 'hv_status'
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
new file mode 100644
index 000000000000..1f0624b890fd
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
@@ -0,0 +1,644 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V HvFlushVirtualAddress{List,Space}{,Ex} tests
+ *
+ * Copyright (C) 2022, Red Hat, Inc.
+ *
+ */
+
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <pthread.h>
+#include <inttypes.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "hyperv.h"
+#include "test_util.h"
+#include "vmx.h"
+
+#define WORKER_VCPU_ID_1 2
+#define WORKER_VCPU_ID_2 65
+
+#define NTRY 100
+#define NTEST_PAGES 2
+
+struct hv_vpset {
+ u64 format;
+ u64 valid_bank_mask;
+ u64 bank_contents[];
+};
+
+enum HV_GENERIC_SET_FORMAT {
+ HV_GENERIC_SET_SPARSE_4K,
+ HV_GENERIC_SET_ALL,
+};
+
+#define HV_FLUSH_ALL_PROCESSORS BIT(0)
+#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES BIT(1)
+#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY BIT(2)
+#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT BIT(3)
+
+/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
+struct hv_tlb_flush {
+ u64 address_space;
+ u64 flags;
+ u64 processor_mask;
+ u64 gva_list[];
+} __packed;
+
+/* HvFlushVirtualAddressSpaceEx, HvFlushVirtualAddressListEx hypercalls */
+struct hv_tlb_flush_ex {
+ u64 address_space;
+ u64 flags;
+ struct hv_vpset hv_vp_set;
+ u64 gva_list[];
+} __packed;
+
+/*
+ * Pass the following info to 'workers' and 'sender'
+ * - Hypercall page's GVA
+ * - Hypercall page's GPA
+ * - Test pages GVA
+ * - GVAs of the test pages' PTEs
+ */
+struct test_data {
+ vm_vaddr_t hcall_gva;
+ vm_paddr_t hcall_gpa;
+ vm_vaddr_t test_pages;
+ vm_vaddr_t test_pages_pte[NTEST_PAGES];
+};
+
+/* 'Worker' vCPU code checking the contents of the test page */
+static void worker_guest_code(vm_vaddr_t test_data)
+{
+ struct test_data *data = (struct test_data *)test_data;
+ u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
+ unsigned char chr_exp1, chr_exp2, chr_cur;
+
+ x2apic_enable();
+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+
+ for (;;) {
+ /* Read the expected char, then check what's in the test pages and then
+ * check the expectation again to make sure it wasn't updated in the meantime.
+ */
+ chr_exp1 = READ_ONCE(*(unsigned char *)
+ (data->test_pages + PAGE_SIZE * NTEST_PAGES + vcpu_id));
+ asm volatile("lfence");
+ chr_cur = *(unsigned char *)data->test_pages;
+ asm volatile("lfence");
+ chr_exp2 = READ_ONCE(*(unsigned char *)
+ (data->test_pages + PAGE_SIZE * NTEST_PAGES + vcpu_id));
+ if (chr_exp1 && chr_exp1 == chr_exp2)
+ GUEST_ASSERT(chr_cur == chr_exp1);
+ asm volatile("nop");
+ }
+}
+
+/*
+ * Write per-CPU info indicating what each 'worker' CPU is supposed to see in
+ * test page. '0' means don't check.
+ */
+static void set_expected_char(void *addr, unsigned char chr, int vcpu_id)
+{
+ asm volatile("mfence");
+ *(unsigned char *)(addr + NTEST_PAGES * PAGE_SIZE + vcpu_id) = chr;
+}
+
+/* Update PTEs swapping two test pages */
+static void swap_two_test_pages(vm_paddr_t pte_gva1, vm_paddr_t pte_gva2)
+{
+ uint64_t pte[2];
+
+ pte[0] = *(uint64_t *)pte_gva1;
+ pte[1] = *(uint64_t *)pte_gva2;
+
+ *(uint64_t *)pte_gva1 = pte[1];
+ *(uint64_t *)pte_gva2 = pte[0];
+}
+
+/* Delay */
+static inline void rep_nop(void)
+{
+ int i;
+
+ for (i = 0; i < 1000000; i++)
+ asm volatile("nop");
+}
+
+/*
+ * Prepare to test: 'disable' workers by setting the expectation to '0',
+ * clear hypercall input page and then swap two test pages.
+ */
+static inline void prepare_to_test(struct test_data *data)
+{
+ /* Clear hypercall input page */
+ memset((void *)data->hcall_gva, 0, PAGE_SIZE);
+
+ /* 'Disable' workers */
+ set_expected_char((void *)data->test_pages, 0x0, WORKER_VCPU_ID_1);
+ set_expected_char((void *)data->test_pages, 0x0, WORKER_VCPU_ID_2);
+
+ /* Make sure workers have enough time to notice */
+ asm volatile("mfence");
+ rep_nop();
+
+ /* Swap test page mappings */
+ swap_two_test_pages(data->test_pages_pte[0], data->test_pages_pte[1]);
+}
+
+/*
+ * Finalize the test: check hypercall resule set the expected char for
+ * 'worker' CPUs and give them some time to test.
+ */
+static inline void post_test(struct test_data *data, char exp_char1, char exp_char2)
+{
+ /* Set the expectation for workers, '0' means don't test */
+ set_expected_char((void *)data->test_pages, exp_char1, WORKER_VCPU_ID_1);
+ set_expected_char((void *)data->test_pages, exp_char2, WORKER_VCPU_ID_2);
+
+ /* Make sure workers have enough time to test */
+ asm volatile("mfence");
+ rep_nop();
+}
+
+/* Main vCPU doing the test */
+static void sender_guest_code(vm_vaddr_t test_data)
+{
+ struct test_data *data = (struct test_data *)test_data;
+ struct hv_tlb_flush *flush = (struct hv_tlb_flush *)data->hcall_gva;
+ struct hv_tlb_flush_ex *flush_ex = (struct hv_tlb_flush_ex *)data->hcall_gva;
+ vm_paddr_t hcall_gpa = data->hcall_gpa;
+ int i, stage = 1;
+
+ wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
+ wrmsr(HV_X64_MSR_HYPERCALL, data->hcall_gpa);
+
+ /* "Slow" hypercalls */
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, hcall_gpa,
+ hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, 0x0);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+ flush->gva_list[0] = (u64)data->test_pages;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, 0x0);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
+ flush->processor_mask = 0;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE, hcall_gpa,
+ hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES | HV_FLUSH_ALL_PROCESSORS;
+ flush->gva_list[0] = (u64)data->test_pages;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, 0x0, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ /* bank_contents and gva_list occupy the same space, thus [1] */
+ flush_ex->gva_list[1] = (u64)data->test_pages;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET) |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, 0x0, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
+ BIT_ULL(WORKER_VCPU_ID_1 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+ flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+ (2 << HV_HYPERCALL_VARHEAD_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
+ BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+ flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ /* bank_contents and gva_list occupy the same space, thus [2] */
+ flush_ex->gva_list[2] = (u64)data->test_pages;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ (2 << HV_HYPERCALL_VARHEAD_OFFSET) |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX,
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+ flush_ex->gva_list[0] = (u64)data->test_pages;
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ hcall_gpa, hcall_gpa + PAGE_SIZE);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ /* "Fast" hypercalls */
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for WORKER_VCPU_ID_1 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+ hyperv_write_xmm_input(&flush->processor_mask, 1);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
+ HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, 0x0);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for WORKER_VCPU_ID_1 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->processor_mask = BIT(WORKER_VCPU_ID_1);
+ flush->gva_list[0] = (u64)data->test_pages;
+ hyperv_write_xmm_input(&flush->processor_mask, 1);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+ HV_HYPERCALL_FAST_BIT |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, 0x0);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE for HV_FLUSH_ALL_PROCESSORS */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ hyperv_write_xmm_input(&flush->processor_mask, 1);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE |
+ HV_HYPERCALL_FAST_BIT, 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES |
+ HV_FLUSH_ALL_PROCESSORS);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST for HV_FLUSH_ALL_PROCESSORS */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush->gva_list[0] = (u64)data->test_pages;
+ hyperv_write_xmm_input(&flush->processor_mask, 1);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST |
+ HV_HYPERCALL_FAST_BIT |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET), 0x0,
+ HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES |
+ HV_FLUSH_ALL_PROCESSORS);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for WORKER_VCPU_ID_2 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 2);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+ HV_HYPERCALL_FAST_BIT |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, 0x0, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for WORKER_VCPU_ID_2 */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ /* bank_contents and gva_list occupy the same space, thus [1] */
+ flush_ex->gva_list[1] = (u64)data->test_pages;
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 2);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ HV_HYPERCALL_FAST_BIT |
+ (1 << HV_HYPERCALL_VARHEAD_OFFSET) |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, 0x0, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for both vCPUs */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_2 / 64) |
+ BIT_ULL(WORKER_VCPU_ID_1 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+ flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 2);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+ HV_HYPERCALL_FAST_BIT |
+ (2 << HV_HYPERCALL_VARHEAD_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for both vCPUs */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
+ flush_ex->hv_vp_set.valid_bank_mask = BIT_ULL(WORKER_VCPU_ID_1 / 64) |
+ BIT_ULL(WORKER_VCPU_ID_2 / 64);
+ flush_ex->hv_vp_set.bank_contents[0] = BIT_ULL(WORKER_VCPU_ID_1 % 64);
+ flush_ex->hv_vp_set.bank_contents[1] = BIT_ULL(WORKER_VCPU_ID_2 % 64);
+ /* bank_contents and gva_list occupy the same space, thus [2] */
+ flush_ex->gva_list[2] = (u64)data->test_pages;
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 3);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ HV_HYPERCALL_FAST_BIT |
+ (2 << HV_HYPERCALL_VARHEAD_OFFSET) |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX for HV_GENERIC_SET_ALL */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 2);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX |
+ HV_HYPERCALL_FAST_BIT,
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_SYNC(stage++);
+
+ /* HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX for HV_GENERIC_SET_ALL */
+ for (i = 0; i < NTRY; i++) {
+ prepare_to_test(data);
+ flush_ex->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
+ flush_ex->hv_vp_set.format = HV_GENERIC_SET_ALL;
+ flush_ex->gva_list[0] = (u64)data->test_pages;
+ hyperv_write_xmm_input(&flush_ex->hv_vp_set, 2);
+ hyperv_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX |
+ HV_HYPERCALL_FAST_BIT |
+ (1UL << HV_HYPERCALL_REP_COMP_OFFSET),
+ 0x0, HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES);
+ post_test(data, i % 2 ? 0x1 : 0x2, i % 2 ? 0x1 : 0x2);
+ }
+
+ GUEST_DONE();
+}
+
+static void *vcpu_thread(void *arg)
+{
+ struct kvm_vcpu *vcpu = (struct kvm_vcpu *)arg;
+ struct ucall uc;
+ int old;
+ int r;
+ unsigned int exit_reason;
+
+ r = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old);
+ TEST_ASSERT(r == 0,
+ "pthread_setcanceltype failed on vcpu_id=%u with errno=%d",
+ vcpu->id, r);
+
+ vcpu_run(vcpu);
+ exit_reason = vcpu->run->exit_reason;
+
+ TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+ "vCPU %u exited with unexpected exit reason %u-%s, expected KVM_EXIT_IO",
+ vcpu->id, exit_reason, exit_reason_str(exit_reason));
+
+ if (get_ucall(vcpu, &uc) == UCALL_ABORT) {
+ TEST_ASSERT(false,
+ "vCPU %u exited with error: %s.\n",
+ vcpu->id, (const char *)uc.args[0]);
+ }
+
+ return NULL;
+}
+
+static void cancel_join_vcpu_thread(pthread_t thread, struct kvm_vcpu *vcpu)
+{
+ void *retval;
+ int r;
+
+ r = pthread_cancel(thread);
+ TEST_ASSERT(r == 0,
+ "pthread_cancel on vcpu_id=%d failed with errno=%d",
+ vcpu->id, r);
+
+ r = pthread_join(thread, &retval);
+ TEST_ASSERT(r == 0,
+ "pthread_join on vcpu_id=%d failed with errno=%d",
+ vcpu->id, r);
+ TEST_ASSERT(retval == PTHREAD_CANCELED,
+ "expected retval=%p, got %p", PTHREAD_CANCELED,
+ retval);
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu[3];
+ unsigned int exit_reason;
+ pthread_t threads[2];
+ vm_vaddr_t test_data_page, gva;
+ vm_paddr_t gpa;
+ uint64_t *pte;
+ struct test_data *data;
+ struct ucall uc;
+ int stage = 1, r, i;
+
+ vm = vm_create_with_one_vcpu(&vcpu[0], sender_guest_code);
+
+ /* Test data page */
+ test_data_page = vm_vaddr_alloc_page(vm);
+ data = (struct test_data *)addr_gva2hva(vm, test_data_page);
+
+ /* Hypercall input/output */
+ data->hcall_gva = vm_vaddr_alloc_pages(vm, 2);
+ data->hcall_gpa = addr_gva2gpa(vm, data->hcall_gva);
+ memset(addr_gva2hva(vm, data->hcall_gva), 0x0, 2 * PAGE_SIZE);
+
+ /*
+ * Test pages: the first one is filled with '0x1's, the second with '0x2's
+ * and the test will swap their mappings. The third page keeps the indication
+ * about the current state of mappings.
+ */
+ data->test_pages = vm_vaddr_alloc_pages(vm, NTEST_PAGES + 1);
+ for (i = 0; i < NTEST_PAGES; i++)
+ memset(addr_gva2hva(vm, data->test_pages + PAGE_SIZE * i),
+ (char)(i + 1), PAGE_SIZE);
+ set_expected_char(addr_gva2hva(vm, data->test_pages), 0x0, WORKER_VCPU_ID_1);
+ set_expected_char(addr_gva2hva(vm, data->test_pages), 0x0, WORKER_VCPU_ID_2);
+
+ /*
+ * Get PTE pointers for test pages and map them inside the guest.
+ * Use separate page for each PTE for simplicity.
+ */
+ gva = vm_vaddr_unused_gap(vm, NTEST_PAGES * PAGE_SIZE, KVM_UTIL_MIN_VADDR);
+ for (i = 0; i < NTEST_PAGES; i++) {
+ pte = _vm_get_page_table_entry(vm, vcpu[0], data->test_pages + i * PAGE_SIZE);
+ gpa = addr_hva2gpa(vm, pte);
+ __virt_pg_map(vm, gva + PAGE_SIZE * i, gpa & PAGE_MASK, PG_LEVEL_4K);
+ data->test_pages_pte[i] = gva + (gpa & ~PAGE_MASK);
+ }
+
+ /*
+ * Sender vCPU which performs the test: swaps test pages, sets expectation
+ * for 'workers' and issues TLB flush hypercalls.
+ */
+ vcpu_args_set(vcpu[0], 1, test_data_page);
+ vcpu_set_hv_cpuid(vcpu[0]);
+
+ /* Create worker vCPUs which check the contents of the test pages */
+ vcpu[1] = vm_vcpu_add(vm, WORKER_VCPU_ID_1, worker_guest_code);
+ vcpu_args_set(vcpu[1], 1, test_data_page);
+ vcpu_set_msr(vcpu[1], HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_1);
+ vcpu_set_hv_cpuid(vcpu[1]);
+
+ vcpu[2] = vm_vcpu_add(vm, WORKER_VCPU_ID_2, worker_guest_code);
+ vcpu_args_set(vcpu[2], 1, test_data_page);
+ vcpu_set_msr(vcpu[2], HV_X64_MSR_VP_INDEX, WORKER_VCPU_ID_2);
+ vcpu_set_hv_cpuid(vcpu[2]);
+
+ r = pthread_create(&threads[0], NULL, vcpu_thread, vcpu[1]);
+ TEST_ASSERT(r == 0,
+ "pthread_create failed errno=%d", errno);
+
+ r = pthread_create(&threads[1], NULL, vcpu_thread, vcpu[2]);
+ TEST_ASSERT(r == 0,
+ "pthread_create failed errno=%d", errno);
+
+ while (true) {
+ r = _vcpu_run(vcpu[0]);
+ exit_reason = vcpu[0]->run->exit_reason;
+
+ TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
+ TEST_ASSERT(exit_reason == KVM_EXIT_IO,
+ "unexpected exit reason: %u (%s)",
+ exit_reason, exit_reason_str(exit_reason));
+
+ switch (get_ucall(vcpu[0], &uc)) {
+ case UCALL_SYNC:
+ TEST_ASSERT(uc.args[1] == stage,
+ "Unexpected stage: %ld (%d expected)\n",
+ uc.args[1], stage);
+ break;
+ case UCALL_ABORT:
+ TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
+ __FILE__, uc.args[1]);
+ return 1;
+ case UCALL_DONE:
+ return 0;
+ }
+
+ stage++;
+ }
+
+ cancel_join_vcpu_thread(threads[0], vcpu[1]);
+ cancel_join_vcpu_thread(threads[1], vcpu[2]);
+ kvm_vm_free(vm);
+
+ return 0;
+}
--
2.37.3

2022-09-21 16:42:04

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 12/39] KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv' instead of on-stack 'sparse_banks'

To make kvm_hv_flush_tlb() ready to handle L2 TLB flush requests, KVM needs
to allow for all 64 sparse vCPU banks regardless of KVM_MAX_VCPUs as L1
may use vCPU overcommit for L2. To avoid growing on-stack allocation, make
'sparse_banks' part of per-vCPU 'struct kvm_vcpu_hv' which is allocated
dynamically.

Note: sparse_set_to_vcpu_mask() can't currently be used to handle L2
requests as KVM does not keep L2 VM_ID -> L2 VCPU_ID -> L1 vCPU mappings,
i.e. its vp_bitmap array is still bounded by the number of L1 vCPUs and so
can remain an on-stack allocation.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/hyperv.c | 6 ++++--
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index add0718798c1..025c0d6cda69 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -646,6 +646,9 @@ struct kvm_vcpu_hv {
} cpuid_cache;

struct kvm_vcpu_hv_tlb_flush_fifo tlb_flush_fifo[HV_NR_TLB_FLUSH_FIFOS];
+
+ /* Preallocated buffer for handling hypercalls passing sparse vCPU set */
+ u64 sparse_banks[HV_MAX_SPARSE_VCPU_BANKS];
};

/* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 23eb139b2936..0f3d04223d60 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1904,6 +1904,8 @@ void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)

static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ u64 *sparse_banks = hv_vcpu->sparse_banks;
struct kvm *kvm = vcpu->kvm;
struct hv_tlb_flush_ex flush_ex;
struct hv_tlb_flush flush;
@@ -1917,7 +1919,6 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
u64 __tlb_flush_entries[KVM_HV_TLB_FLUSH_FIFO_SIZE - 1];
u64 *tlb_flush_entries;
u64 valid_bank_mask;
- u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
struct kvm_vcpu *v;
unsigned long i;
bool all_cpus;
@@ -2069,11 +2070,12 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,

static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+ u64 *sparse_banks = hv_vcpu->sparse_banks;
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
u64 valid_bank_mask;
- u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
u32 vector;
bool all_cpus;

--
2.37.3

2022-09-21 16:42:10

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH v10 31/39] KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with hyperv-tlfs.h

'struct hv_enlightened_vmcs' definition in selftests is not '__packed'
and so we rely on the compiler doing the right padding. This is not
obvious so it seems beneficial to use the same definition as in kernel.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/evmcs.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/evmcs.h b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
index 58db74f68af2..4b6840df2979 100644
--- a/tools/testing/selftests/kvm/include/x86_64/evmcs.h
+++ b/tools/testing/selftests/kvm/include/x86_64/evmcs.h
@@ -41,6 +41,8 @@ struct hv_enlightened_vmcs {
u16 host_gs_selector;
u16 host_tr_selector;

+ u16 padding16_1;
+
u64 host_ia32_pat;
u64 host_ia32_efer;

@@ -159,7 +161,7 @@ struct hv_enlightened_vmcs {
u64 ept_pointer;

u16 virtual_processor_id;
- u16 padding16[3];
+ u16 padding16_2[3];

u64 padding64_2[5];
u64 guest_physical_address;
@@ -195,13 +197,13 @@ struct hv_enlightened_vmcs {
u64 guest_rip;

u32 hv_clean_fields;
- u32 hv_padding_32;
+ u32 padding32_1;
u32 hv_synthetic_controls;
struct {
u32 nested_flush_hypercall:1;
u32 msr_bitmap:1;
u32 reserved:30;
- } hv_enlightenments_control;
+ } __packed hv_enlightenments_control;
u32 hv_vp_id;
u32 padding32_2;
u64 hv_vm_id;
@@ -222,7 +224,7 @@ struct hv_enlightened_vmcs {
u64 host_ssp;
u64 host_ia32_int_ssp_table_addr;
u64 padding64_6;
-};
+} __packed;

#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE 0
#define HV_VMX_ENLIGHTENED_CLEAN_FIELD_IO_BITMAP BIT(0)
--
2.37.3

2022-09-21 16:46:54

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f62d5799fcd7..86504a8bfd9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
> */
> void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
> {
> - if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
> + if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
> kvm_vcpu_flush_tlb_current(vcpu);
> + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
mappings in Intel terminology, where flush_tlb_guest() and (IIUC) Hyper-V's paravirt
TLB flush both flesh "guest" TLB entries, i.e. linear and combined mappings.

Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
because I actually typed out "stage-2" first.

> + }
>
> - if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
> + if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
> + kvm_vcpu_flush_tlb_guest(vcpu);
> + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> + } else if (kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu)) {
> kvm_vcpu_flush_tlb_guest(vcpu);
> + }
> }
> EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
>
> --
> 2.37.3
>

2022-09-21 17:20:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

On Wed, Sep 21, 2022, Sean Christopherson wrote:
> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index f62d5799fcd7..86504a8bfd9a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
> > */
> > void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
> > {
> > - if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
> > + if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
> > kvm_vcpu_flush_tlb_current(vcpu);
> > + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
> mappings in Intel terminology, where flush_tlb_guest() and (IIUC) Hyper-V's paravirt
> TLB flush both flesh "guest" TLB entries, i.e. linear and combined mappings.
>
> Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
> because I actually typed out "stage-2" first.
>
> > + }
> >
> > - if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
> > + if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
> > + kvm_vcpu_flush_tlb_guest(vcpu);
> > + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

Looking at future patches where KVM needs to reset the FIFO when doing a "guest"
TLB flush, i.e. needs to do more than just clearing the request, what about putting
this in kvm_vcpu_flush_tlb_guest() right away?

Ah, and there's already a second caller to kvm_vcpu_flush_tlb_guest(). I doubt
KVM's paravirt TLB flush will ever collide with Hyper-V's paravirt TLB flush,
but logically a "guest" flush that is initiated through KVM's paravirt interface
should also clear Hyper-V's queue/request.

And for consistency, slot this in before this patch:

From: Sean Christopherson <[email protected]>
Date: Wed, 21 Sep 2022 09:35:34 -0700
Subject: [PATCH] KVM: x86: Move clearing of TLB_FLUSH_CURRENT to
kvm_vcpu_flush_tlb_all()

Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in
its sole caller that processes KVM_REQ_TLB_FLUSH. Regardless of why/when
kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also
flushes "current" TLB entries.

Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(),
and moving the handling "requires" extra work to document the ordering
requirement, but future Hyper-V paravirt TLB flushing support will add
similar logic for flush "guest" (Hyper-V can flush a subset of "guest"
entries). And in the Hyper-V case, KVM needs to do more than just clear
the request, the queue of GPAs to flush also needs to purged, and doing
all only in the request path is undesirable as kvm_vcpu_flush_tlb_guest()
does have multiple callers (though it's unlikely KVM's paravirt TLB flush
will coincide with Hyper-V's paravirt TLB flush).

Move the logic even though it adds extra "work" so that KVM will be
consistent with how flush requests are processed when the Hyper-V support
lands.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f62d5799fcd7..3ea2e51a8cb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3383,6 +3383,9 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
{
++vcpu->stat.tlb_flush;
static_call(kvm_x86_flush_tlb_all)(vcpu);
+
+ /* Flushing all ASIDs flushes the current ASID... */
+ kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}

static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
@@ -10462,12 +10465,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_mmu_sync_roots(vcpu);
if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
kvm_mmu_load_pgd(vcpu);
- if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
+
+ /*
+ * Note, the order matters here, as flushing "all" TLB entries
+ * also flushes the "current" TLB entries, i.e. servicing the
+ * flush "all" will clear any request to flush "current".
+ */
+ if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
kvm_vcpu_flush_tlb_all(vcpu);
-
- /* Flushing all ASIDs flushes the current ASID... */
- kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
- }
kvm_service_local_tlb_flush_requests(vcpu);

if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {

base-commit: ed102fe0b59586397b362a849bd7fb32582b77d8
--

2022-09-21 17:57:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 05/39] KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> void kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
> {
> struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> + u64 entries[KVM_HV_TLB_FLUSH_FIFO_SIZE];
> + int i, j, count;
> + gva_t gva;
>
> - kvm_vcpu_flush_tlb_guest(vcpu);
> -
> - if (!hv_vcpu)
> + if (!tdp_enabled || !hv_vcpu) {
> + kvm_vcpu_flush_tlb_guest(vcpu);
> return;
> + }
>
> tlb_flush_fifo = &hv_vcpu->tlb_flush_fifo;
>
> + count = kfifo_out(&tlb_flush_fifo->entries, entries, KVM_HV_TLB_FLUSH_FIFO_SIZE);
> +
> + for (i = 0; i < count; i++) {
> + if (entries[i] == KVM_HV_TLB_FLUSHALL_ENTRY)
> + goto out_flush_all;
> +
> + /*
> + * Lower 12 bits of 'address' encode the number of additional
> + * pages to flush.
> + */
> + gva = entries[i] & PAGE_MASK;
> + for (j = 0; j < (entries[i] & ~PAGE_MASK) + 1; j++)
> + static_call(kvm_x86_flush_tlb_gva)(vcpu, gva + j * PAGE_SIZE);
> +
> + ++vcpu->stat.tlb_flush;
> + }
> + return;
> +
> +out_flush_all:
> + kvm_vcpu_flush_tlb_guest(vcpu);
> kfifo_reset_out(&tlb_flush_fifo->entries);
> }

If kvm_vcpu_flush_tlb_guest() is done as a fallback, then this can use -ENOSPC,
which again I like from a documentation perspective.

out_flush_all:
kfifo_reset_out(&tlb_flush_fifo->entries);
return -ENOSPC;

2022-09-21 21:48:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 10/39] KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in kvm_hv_send_ipi()

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
> for a smaller number of vCPUs in the request. When Hyper-V TLB flush
> is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
> send IPIs to a large number of vCPUs (and are rarely used in general).
>
> Introduce hv_is_vp_in_sparse_set() to directly check if the specified
> VP_ID is present in sparse vCPU set.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 69891c48c12a..9764ebb7fd5f 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1741,6 +1741,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
> }
> }
>
> +static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
> +{
> + int bank, sbank = 0;
> +
> + if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
> + (unsigned long *)&valid_bank_mask))
> + return false;
> +
> + for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
> + KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
> + if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
> + break;
> + sbank++;

At the risk of being too clever, this can be heavily optimized, which given what
this helper is used for is probably worth doing. The index into sparse_banks is
the number of bits preceding the target bit, and POPCNT can determine the number
of bits. So, to get the index, simply strip off the upper bits and do hweight64().

And to avoid bugs while also optimizing for "small" VMs, the math can be skipped
if vp_id < 64, i.e. if bank==0, because in that case there can't possibly be
preceding bits.

Compile tested only...

int valid_bit_nr = vp_id / HV_VCPUS_PER_SPARSE_BANK;
unsigned long sbank;

if (!test_bit(valid_bit_nr, (unsigned long *)&valid_bank_mask))
return false;

/*
* The index into the sparse bank is the number of preceding bits in
* the valid mask. Optimize for VMs with <64 vCPUs by skipping the
* fancy math if there can't possibly be preceding bits.
*/
if (valid_bit_nr)
sbank = hweight64(valid_bank_mask & GENMASK_ULL(valid_bit_nr - 1, 0));
else
sbank = 0;

return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
(unsigned long *)&sparse_banks[sbank]);

yields this, where the "call __sw_hweight64" will be patched to POPCNT on 64-bit
hosts (POPCNT has been around for a long time).

call 0xffffffff810c3ea0 <__fentry__>
push %rbp
mov %edi,%eax
mov %rsp,%rbp
shr $0x6,%eax
sub $0x8,%rsp
mov %rsi,-0x8(%rbp)
mov %eax,%ecx
bt %rcx,-0x8(%rbp)
setb %cl
jae 0xffffffff81064784 <hv_is_vp_in_sparse_set+52>
test %eax,%eax
mov %edi,%r8d
jne 0xffffffff8106478c <hv_is_vp_in_sparse_set+60>
and $0x3f,%r8d
bt %r8,(%rdx)
setb %cl
leave
mov %ecx,%eax
jmp 0xffffffff81c02200 <__x86_return_thunk>
mov $0x40,%ecx
mov $0xffffffffffffffff,%rdi
sub %eax,%ecx
shr %cl,%rdi
and -0x8(%rbp),%rdi
call 0xffffffff815beea0 <__sw_hweight64>
lea (%rdx,%rax,8),%rdx
jmp 0xffffffff81064779 <hv_is_vp_in_sparse_set+41>

Alternatively, we could choose not to optimize bit==0 and just do:

sbank = hweight64(valid_bank_mask & GENMASK_ULL(valid_bit_nr, 0)) - 1;

but there's enough prep work needed for hweight64() that I think it's worth
opimtizing because "small" VMs are probably very common.

2022-09-21 21:54:18

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 18/39] KVM: x86: hyper-v: Introduce fast guest_hv_cpuid_has_l2_tlb_flush() check

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
> from L2 in L0 to process L2 TLB flush requests.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/hyperv.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 1b53dd4cff4d..3fff3a6f4bb9 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -174,6 +174,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
> kfifo_reset_out(&tlb_flush_fifo->entries);
> }
>
> +static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> + return hv_vcpu && (hv_vcpu->cpuid_cache.nested_eax & HV_X64_NESTED_DIRECT_FLUSH);

Nit, IMO this is long enough that it's worth wrapping to fit under the soft char limit.

return hv_vcpu &&
(hv_vcpu->cpuid_cache.nested_eax & HV_X64_NESTED_DIRECT_FLUSH);

> +}
> +
> static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
> {
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> --
> 2.37.3
>

2022-09-21 21:54:57

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 14/39] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
> assist page address to handle L2 TLB flush requests.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
> arch/x86/kvm/svm/nested.c | 2 ++
> 2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index 7d6d97968fb9..8cf702fed7e5 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -9,6 +9,7 @@
> #include <asm/mshyperv.h>
>
> #include "../hyperv.h"
> +#include "svm.h"
>
> /*
> * Hyper-V uses the software reserved 32 bytes in VMCB
> @@ -32,4 +33,19 @@ struct hv_enlightenments {
> */
> #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>
> +static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + struct hv_enlightenments *hve =
> + (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;

Eww :-)

I posted a small series to fix the casting[*], and as noted in the cover letter it's
going to conflict mightily. Ignoring merge order for the moment, looking at the
series as a whole, if the Hyper-V definitions are moved to hyperv-tlfs.h, then I'm
tempted to say there's no need for svm/hyperv.h.

There should never be users of this stuff outside of svm/nested.c, and IMO there's
not enough stuff to warrant a separate set of files. nested_svm_hv_update_vp_assist()
isn't SVM specific and fits better alongside kvm_hv_get_assist_page().

That leaves three functions and ~40 lines of code, which can easily go directly
into svm/nested.c.

I'm definitely not dead set against having hyperv.{ch}, but unless there's a high
probability of SVM+Hyper-V getting to eVMCS levels of enlightenment, my vote is
to put these helpers in svm/nested.c and move then if/when we do end up accumulating
more SVM+Hyper-V code.

As for merge order, I don't think there's a need for this series to take a
dependency on the cleanup, especially if these helpers land in nested.c. Fixing
up the casting and s/hv_enlightenments/hv_vmcb_enlightenments is straightforward.

[*] https://lore.kernel.org/all/[email protected]

2022-09-21 21:57:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 25/39] KVM: selftests: Move the function doing Hyper-V hypercall to a common header

+Vipin

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> All Hyper-V specific tests issuing hypercalls need this.
>
> Reviewed-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> .../selftests/kvm/include/x86_64/hyperv.h | 16 ++++++++++++++++
> .../selftests/kvm/x86_64/hyperv_features.c | 18 +-----------------
> 2 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> index f0a8a93694b2..285e9ff73573 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> @@ -185,6 +185,22 @@
> /* hypercall options */
> #define HV_HYPERCALL_FAST_BIT BIT(16)
>
> +static inline uint8_t hyperv_hypercall(u64 control, vm_vaddr_t input_address,
> + vm_vaddr_t output_address,
> + uint64_t *hv_status)
> +{
> + uint8_t vector;

Newline after the variable declaration.

> + /* Note both the hypercall and the "asm safe" clobber r9-r11. */
> + asm volatile("mov %[output_address], %%r8\n\t"
> + KVM_ASM_SAFE("vmcall")
> + : "=a" (*hv_status),
> + "+c" (control), "+d" (input_address),
> + KVM_ASM_SAFE_OUTPUTS(vector)
> + : [output_address] "r"(output_address)
> + : "cc", "memory", "r8", KVM_ASM_SAFE_CLOBBERS);
> + return vector;
> +}
> +
> /* Proper HV_X64_MSR_GUEST_OS_ID value */
> #define HYPERV_LINUX_OS_ID ((u64)0x8100 << 48)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> index 1144bd1ea626..c464d324cde0 100644
> --- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
> @@ -13,22 +13,6 @@
> #include "processor.h"
> #include "hyperv.h"
>
> -static inline uint8_t hypercall(u64 control, vm_vaddr_t input_address,
> - vm_vaddr_t output_address, uint64_t *hv_status)
> -{
> - uint8_t vector;
> -
> - /* Note both the hypercall and the "asm safe" clobber r9-r11. */
> - asm volatile("mov %[output_address], %%r8\n\t"
> - KVM_ASM_SAFE("vmcall")
> - : "=a" (*hv_status),
> - "+c" (control), "+d" (input_address),
> - KVM_ASM_SAFE_OUTPUTS(vector)
> - : [output_address] "r"(output_address)
> - : "cc", "memory", "r8", KVM_ASM_SAFE_CLOBBERS);
> - return vector;
> -}
> -
> struct msr_data {
> uint32_t idx;
> bool available;
> @@ -78,7 +62,7 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
> input = output = 0;
> }
>
> - vector = hypercall(hcall->control, input, output, &res);
> + vector = hyperv_hypercall(hcall->control, input, output, &res);
> if (hcall->ud_expected)
> GUEST_ASSERT_2(vector == UD_VECTOR, hcall->control, vector);
> else

Just out of sight here, but I broke this code in commit cc5851c6be86 ("KVM: selftests:
Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests"). I got too fancy and
inverted the ud_expected logic when checking the result. The broken code skips the
check when #UD _not_ expected.

I.e. this

if (hcall->ud_expected)
GUEST_ASSERT_2(vector == UD_VECTOR, hcall->control, vector);
else
GUEST_ASSERT_2(!vector, hcall->control, vector);

GUEST_ASSERT_2(!hcall->ud_expected || res == hcall->expect,
hcall->expect, res);

should be

if (hcall->ud_expected) {
GUEST_ASSERT_2(vector == UD_VECTOR, hcall->control, vector);
} else {
GUEST_ASSERT_2(!vector, hcall->control, vector);
GUEST_ASSERT_2(res == hcall->expect, hcall->expect, res);
}

The reason I bring this up here is because of the reason the test passes: gcc
zeros RAX before the hypercall (not entirely sure why), and so res=0 on #UD due
to nothing changing RAX. But clang doesn't zero RAX and so the test fails due
to RAX holding garbage (probably '1' from the lower 32 bits of HV_X64_MSR_HYPERCALL).

So, what do you think about explicitly setting hv_status, e.g. to -EFAULT, prior
to the hypercall, both to defend against selftest bugs and to verify that _KVM_
actually zeros the result?

2022-09-21 22:34:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 29/39] KVM: selftests: Export _vm_get_page_table_entry()

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> Make it possible for tests to mangle guest's page table entries in
> addition to just getting them (available with vm_get_page_table_entry()).
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> tools/testing/selftests/kvm/include/x86_64/processor.h | 2 ++
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 5 ++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 1c7805de8c27..500d711eb989 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -827,6 +827,8 @@ static inline uint8_t wrmsr_safe(uint32_t msr, uint64_t val)
> return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
> }
>
> +uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
> + uint64_t vaddr);
> uint64_t vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
> uint64_t vaddr);
> void vm_set_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 2e6e61bbe81b..5c135f896ada 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -214,9 +214,8 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
> __virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
> }
>
> -static uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm,
> - struct kvm_vcpu *vcpu,
> - uint64_t vaddr)
> +uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
> + uint64_t vaddr)

Ugh, obviously not your fault, but this is a terrible name. Aside from using a
single underscore, it's semantically very different than vm_get_page_table_entry(),
i.e. violates the stand "double underscores is an inner helper".

The innards of vm_{g,s}et_page_table_entry() are quite hilarious too as they cast
a "uint64_t *" to "uint64_t*" now that KVM no longer uses structs to manage PTEs
(commit f18b4aebe107 ("kvm: selftests: do not use bitfields larger than 32-bits
for PTEs")).

And looking at the sole usage in emulator_error_test.c, provide get+set helpers
is silly.

Rather than expose this weirdness, what about slotting in the below to drop the
wrappers and just let tests modify PTEs directly?

---
From: Sean Christopherson <[email protected]>
Date: Wed, 21 Sep 2022 15:08:49 -0700
Subject: [PATCH] KVM: selftests: Drop helpers to read/write page table entries

Drop vm_{g,s}et_page_table_entry() and instead expose the "inner"
helper (was _vm_get_page_table_entry()) that returns a _pointer_ to the
PTE, i.e. let tests directly modify PTEs instead of bouncing through
helpers that just make life difficult.

Opportunsitically use BIT_ULL() in emulator_error_test, and use the
MAXPHYADDR define to set the "rogue" GPA bit instead of open coding the
same value.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 6 ++----
.../selftests/kvm/lib/x86_64/processor.c | 21 ++-----------------
.../kvm/x86_64/emulator_error_test.c | 6 ++++--
3 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0cbc71b7af50..5999e974a150 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -825,10 +825,8 @@ static inline uint8_t wrmsr_safe(uint32_t msr, uint64_t val)
return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
}

-uint64_t vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
- uint64_t vaddr);
-void vm_set_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
- uint64_t vaddr, uint64_t pte);
+uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+ uint64_t vaddr);

uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
uint64_t a3);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 2e6e61bbe81b..5e4bbe71dbff 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -214,9 +214,8 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
__virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
}

-static uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm,
- struct kvm_vcpu *vcpu,
- uint64_t vaddr)
+uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
+ uint64_t vaddr)
{
uint16_t index[4];
uint64_t *pml4e, *pdpe, *pde;
@@ -286,22 +285,6 @@ static uint64_t *_vm_get_page_table_entry(struct kvm_vm *vm,
return &pte[index[0]];
}

-uint64_t vm_get_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
- uint64_t vaddr)
-{
- uint64_t *pte = _vm_get_page_table_entry(vm, vcpu, vaddr);
-
- return *(uint64_t *)pte;
-}
-
-void vm_set_page_table_entry(struct kvm_vm *vm, struct kvm_vcpu *vcpu,
- uint64_t vaddr, uint64_t pte)
-{
- uint64_t *new_pte = _vm_get_page_table_entry(vm, vcpu, vaddr);
-
- *(uint64_t *)new_pte = pte;
-}
-
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
uint64_t *pml4e, *pml4e_start;
diff --git a/tools/testing/selftests/kvm/x86_64/emulator_error_test.c b/tools/testing/selftests/kvm/x86_64/emulator_error_test.c
index 236e11755ba6..bde247f3c8a1 100644
--- a/tools/testing/selftests/kvm/x86_64/emulator_error_test.c
+++ b/tools/testing/selftests/kvm/x86_64/emulator_error_test.c
@@ -152,8 +152,9 @@ int main(int argc, char *argv[])
{
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
- uint64_t gpa, pte;
+ uint64_t *pte;
uint64_t *hva;
+ uint64_t gpa;
int rc;

/* Tell stdout not to buffer its content */
@@ -178,8 +179,9 @@ int main(int argc, char *argv[])
virt_map(vm, MEM_REGION_GVA, MEM_REGION_GPA, 1);
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
memset(hva, 0, PAGE_SIZE);
+
pte = vm_get_page_table_entry(vm, vcpu, MEM_REGION_GVA);
- vm_set_page_table_entry(vm, vcpu, MEM_REGION_GVA, pte | (1ull << 36));
+ *pte |= BIT_ULL(MAXPHYADDR);

vcpu_run(vcpu);
process_exit_on_emulation_error(vcpu);

base-commit: 3b69d246e2f1eef553508c79f5d3b2dfc4978bc1
--

2022-09-21 22:37:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 20/39] KVM: nVMX: hyper-v: Enable L2 TLB flush

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index 7ad56fbc4b4d..dd1589336e79 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -63,6 +63,15 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
> #define EVMCS1_UNSUPPORTED_VMENTRY_CTRL (0)
> #define EVMCS1_UNSUPPORTED_VMFUNC (VMX_VMFUNC_EPTP_SWITCHING)
>
> +/*
> + * Note, Hyper-V isn't actually stealing bit 28 from Intel, just abusing it by
> + * pairing it with architecturally impossible exit reasons. Bit 28 is set only
> + * on SMI exits to a SMI transfer monitor (STM) and if and only if a MTF VM-Exit
> + * is pending. I.e. it will never be set by hardware for non-SMI exits (there
> + * are only three), nor will it ever be set unless the VMM is an STM.
> + */
> +#define HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH 0x10000031

This definition should go into hyperv-tlfs.h since it's take verbatim from the TLFS.

https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/nested-virtualization#synthetic-vm-exit

2022-09-21 22:53:32

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 21/39] KVM: nSVM: hyper-v: Enable L2 TLB flush

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index dd2e393f84a0..7b01722838bf 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -33,6 +33,9 @@ struct hv_enlightenments {
> */
> #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>
> +#define HV_SVM_EXITCODE_ENL 0xF0000000
> +#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)

Same as the synthetic VMX exit reason, these should go in hyperv-tlfs.h. Keeping
these out of KVM also helps avoid the need for svm/hyperv.h.

https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/nested-virtualization#synthetic-vm-exit

> +
> static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> @@ -48,6 +51,33 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> hv_vcpu->nested.vp_id = hve->hv_vp_id;
> }
>
> +static inline bool

Strongly prefer 'int' with 0/-errno over a boolean. Hrm, maybe add a prep patch
to convert kvm_hv_get_assist_page() to return 0/-errno? That way this can still
return kvm_hv_get_assist_page() directly.

> nested_svm_hv_update_vp_assist(struct kvm_vcpu *vcpu)

Maybe s/update/verify since there isn't a true update anywhere?

> +{
> + if (!to_hv_vcpu(vcpu))

This check isn't necessary, it's covered by kvm_hv_assist_page_enabled().

> + return true;
> +
> + if (!kvm_hv_assist_page_enabled(vcpu))
> + return true;
> +
> + return kvm_hv_get_assist_page(vcpu);

As mentioned earlier, I think this belongs in arch/x86/kvm/hyperv.h.

2022-09-21 23:35:08

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 35/39] KVM: selftests: Create a vendor independent helper to allocate Hyper-V specific test pages

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/tools/testing/selftests/kvm/include/x86_64/hyperv.h b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> index 42213f5de17f..e00ce9e122f4 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/hyperv.h
> @@ -265,4 +265,19 @@ extern struct hv_vp_assist_page *current_vp_assist;
>
> int enable_vp_assist(uint64_t vp_assist_pa, void *vp_assist);
>
> +struct hyperv_test_pages {
> + /* VP assist page */
> + void *vp_assist_hva;
> + uint64_t vp_assist_gpa;
> + void *vp_assist;
> +
> + /* Enlightened VMCS */
> + void *enlightened_vmcs_hva;
> + uint64_t enlightened_vmcs_gpa;
> + void *enlightened_vmcs;

FYI (in case you or someone else is tempted to do further cleanup), at some point
there will be a patch to wrap these triplets[*] to cut down on the copy+paste.

[*] https://lore.kernel.org/all/[email protected]

> +};
> +
> +struct hyperv_test_pages *
> +vcpu_alloc_hyperv_test_pages(struct kvm_vm *vm, vm_vaddr_t *p_hv_pages_gva);

Please don't wrap before the function name.

2022-09-21 23:36:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 30/39] KVM: selftests: Hyper-V PV TLB flush selftest

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> +/* 'Worker' vCPU code checking the contents of the test page */
> +static void worker_guest_code(vm_vaddr_t test_data)
> +{
> + struct test_data *data = (struct test_data *)test_data;
> + u32 vcpu_id = rdmsr(HV_X64_MSR_VP_INDEX);
> + unsigned char chr_exp1, chr_exp2, chr_cur;

Any reason for "unsigned char" over uint8_t?

And the "chr_" prefix is rather weird, IMO it just makes the code harder to read.

Actually, why a single char? E.g. why not do a uint64_t? Oooh, because the
offset is only by vcpu_id, not by vcpu_id * PAGE_SIZE. Maybe add a comment about
that somewhere?

> +
> + x2apic_enable();
> + wrmsr(HV_X64_MSR_GUEST_OS_ID, HYPERV_LINUX_OS_ID);
> +
> + for (;;) {
> + /* Read the expected char, then check what's in the test pages and then
> + * check the expectation again to make sure it wasn't updated in the meantime.

Please wrap at the soft limit.

> + */

Except for apparently networking, kernel preferred style for block comments is:

/*
* This comment is for KVM.
*/

> + chr_exp1 = READ_ONCE(*(unsigned char *)
> + (data->test_pages + PAGE_SIZE * NTEST_PAGES + vcpu_id));

Use a local variable for the pointer, then these line lengths are much more sane.
Hmm, and if you give them descriptive names, I think it will make the code much
easier to follow. E.g. I've been staring at this test for ~10 minutes and am still
not entirely sure what shenanigans are going on.

> + asm volatile("lfence");

The kernel versions of these are provided by tools/arch/x86/include/asm/barrier.h,
which I think is available? I forget if we can use those in the selftests mess.

Regardless, this needs a comment explaining why LFENCE/rmb() is needed, and why
the writer needs MFENCE/mb().

> + chr_cur = *(unsigned char *)data->test_pages;

READ_ONCE()?

> + asm volatile("lfence");
> + chr_exp2 = READ_ONCE(*(unsigned char *)
> + (data->test_pages + PAGE_SIZE * NTEST_PAGES + vcpu_id));
> + if (chr_exp1 && chr_exp1 == chr_exp2)

IIUC, the "chr_exp1 != 0" check is the read side of "0 == disable". Splitting
that out and adding a comment would be helpful.

And if a local variable is used to hold the pointer, there's no need for an "exp2"
variable.

> + GUEST_ASSERT(chr_cur == chr_exp1);
> + asm volatile("nop");

Use cpu_relax(), which KVM selftests provide.

All in all, something like this?

for (;;) {
cpu_relax();

expected = READ_ONCE(*this_vcpu);

/* ??? */
rmb();
val = READ_ONCE(*???);
/* ??? */
rmb();

/*
* '0' indicates the sender is between iterations, wait until
* the sender is ready for this vCPU to start checking again.
*/
if (!expected)
continue;

/*
* Re-read the per-vCPU byte to ensure the sender didn't move
* onto a new iteration.
*/
if (expected != READ_ONCE(*this_vcpu))
continue;

GUEST_ASSERT(val == expected);
}

> + }
> +}
> +
> +/*
> + * Write per-CPU info indicating what each 'worker' CPU is supposed to see in
> + * test page. '0' means don't check.
> + */
> +static void set_expected_char(void *addr, unsigned char chr, int vcpu_id)
> +{
> + asm volatile("mfence");

Why MFENCE?

> + *(unsigned char *)(addr + NTEST_PAGES * PAGE_SIZE + vcpu_id) = chr;
> +}
> +
> +/* Update PTEs swapping two test pages */
> +static void swap_two_test_pages(vm_paddr_t pte_gva1, vm_paddr_t pte_gva2)
> +{
> + uint64_t pte[2];
> +
> + pte[0] = *(uint64_t *)pte_gva1;
> + pte[1] = *(uint64_t *)pte_gva2;
> +
> + *(uint64_t *)pte_gva1 = pte[1];
> + *(uint64_t *)pte_gva2 = pte[0];

xchg()? swap()?

> +}
> +
> +/* Delay */
> +static inline void rep_nop(void)

LOL, rep_nop() is a hilariously confusing function name. "REP NOP" is "PAUSE",
and for whatever reason the kernel proper use rep_nop() as the function name for
the wrapper. My reaction to the MFENCE+rep_nop() below was "how the hell does
MFENCE+PAUSE guarantee a delay?!?".

Anyways, why not do e.g. usleep(1)? And if you really need a udelay() and not a
usleep(), IMO it's worth adding exactly that instead of throwing NOPs at the CPU.
E.g. aarch64 KVM selftests already implements udelay(), so adding an x86 variant
would move us one step closer to being able to use it in common tests.


> +{
> + int i;
> +
> + for (i = 0; i < 1000000; i++)
> + asm volatile("nop");
> +}
> + r = pthread_create(&threads[0], NULL, vcpu_thread, vcpu[1]);
> + TEST_ASSERT(r == 0,

!r is preferred

> + "pthread_create failed errno=%d", errno);

TEST_ASSERT() already captures errno, e.g. these can be:

TEST_ASSERT(!r, "pthread_create() failed");

> +
> + r = pthread_create(&threads[1], NULL, vcpu_thread, vcpu[2]);
> + TEST_ASSERT(r == 0,
> + "pthread_create failed errno=%d", errno);
> +
> + while (true) {
> + r = _vcpu_run(vcpu[0]);
> + exit_reason = vcpu[0]->run->exit_reason;
> +
> + TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);

Pretty sure newlines in asserts aren't necessary, though I forget if they cause
weirdness or just end up being ignored.

> + TEST_ASSERT(exit_reason == KVM_EXIT_IO,
> + "unexpected exit reason: %u (%s)",
> + exit_reason, exit_reason_str(exit_reason));
> +
> + switch (get_ucall(vcpu[0], &uc)) {
> + case UCALL_SYNC:
> + TEST_ASSERT(uc.args[1] == stage,
> + "Unexpected stage: %ld (%d expected)\n",
> + uc.args[1], stage);
> + break;
> + case UCALL_ABORT:
> + TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0],
> + __FILE__, uc.args[1]);

REPORT_GUEST_ASSERT(uc);

2022-09-22 10:14:44

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

Sean Christopherson <[email protected]> writes:

> On Wed, Sep 21, 2022, Sean Christopherson wrote:
>> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
>> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> > index f62d5799fcd7..86504a8bfd9a 100644
>> > --- a/arch/x86/kvm/x86.c
>> > +++ b/arch/x86/kvm/x86.c
>> > @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>> > */
>> > void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>> > {
>> > - if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>> > + if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>> > kvm_vcpu_flush_tlb_current(vcpu);
>> > + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>>
>> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
>> mappings in Intel terminology, where flush_tlb_guest() and (IIUC) Hyper-V's paravirt
>> TLB flush both flesh "guest" TLB entries, i.e. linear and combined mappings.
>>
>> Amusing side topic, apparently I like arm's stage-2 terminology better than "TDP",
>> because I actually typed out "stage-2" first.
>>
>> > + }
>> >
>> > - if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
>> > + if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu)) {
>> > + kvm_vcpu_flush_tlb_guest(vcpu);
>> > + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> Looking at future patches where KVM needs to reset the FIFO when doing a "guest"
> TLB flush, i.e. needs to do more than just clearing the request, what about putting
> this in kvm_vcpu_flush_tlb_guest() right away?

Will do.

>
> Ah, and there's already a second caller to kvm_vcpu_flush_tlb_guest(). I doubt
> KVM's paravirt TLB flush will ever collide with Hyper-V's paravirt TLB flush,
> but logically a "guest" flush that is initiated through KVM's paravirt interface
> should also clear Hyper-V's queue/request.

I ignored this as a case which is not worth optimizing for,
i.e. over-flushing is always correct.

>
> And for consistency, slot this in before this patch:
>

Will do, thanks!

> From: Sean Christopherson <[email protected]>
> Date: Wed, 21 Sep 2022 09:35:34 -0700
> Subject: [PATCH] KVM: x86: Move clearing of TLB_FLUSH_CURRENT to
> kvm_vcpu_flush_tlb_all()
>
> Clear KVM_REQ_TLB_FLUSH_CURRENT in kvm_vcpu_flush_tlb_all() instead of in
> its sole caller that processes KVM_REQ_TLB_FLUSH. Regardless of why/when
> kvm_vcpu_flush_tlb_all() is called, flushing "all" TLB entries also
> flushes "current" TLB entries.
>
> Ideally, there will never be another caller of kvm_vcpu_flush_tlb_all(),
> and moving the handling "requires" extra work to document the ordering
> requirement, but future Hyper-V paravirt TLB flushing support will add
> similar logic for flush "guest" (Hyper-V can flush a subset of "guest"
> entries). And in the Hyper-V case, KVM needs to do more than just clear
> the request, the queue of GPAs to flush also needs to purged, and doing
> all only in the request path is undesirable as kvm_vcpu_flush_tlb_guest()
> does have multiple callers (though it's unlikely KVM's paravirt TLB flush
> will coincide with Hyper-V's paravirt TLB flush).
>
> Move the logic even though it adds extra "work" so that KVM will be
> consistent with how flush requests are processed when the Hyper-V support
> lands.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/x86.c | 15 ++++++++++-----
> 1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f62d5799fcd7..3ea2e51a8cb5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3383,6 +3383,9 @@ static void kvm_vcpu_flush_tlb_all(struct kvm_vcpu *vcpu)
> {
> ++vcpu->stat.tlb_flush;
> static_call(kvm_x86_flush_tlb_all)(vcpu);
> +
> + /* Flushing all ASIDs flushes the current ASID... */
> + kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> }
>
> static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
> @@ -10462,12 +10465,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_mmu_sync_roots(vcpu);
> if (kvm_check_request(KVM_REQ_LOAD_MMU_PGD, vcpu))
> kvm_mmu_load_pgd(vcpu);
> - if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> +
> + /*
> + * Note, the order matters here, as flushing "all" TLB entries
> + * also flushes the "current" TLB entries, i.e. servicing the
> + * flush "all" will clear any request to flush "current".
> + */
> + if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
> kvm_vcpu_flush_tlb_all(vcpu);
> -
> - /* Flushing all ASIDs flushes the current ASID... */
> - kvm_clear_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> - }
> kvm_service_local_tlb_flush_requests(vcpu);
>
> if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
>
> base-commit: ed102fe0b59586397b362a849bd7fb32582b77d8

--
Vitaly

2022-09-22 10:15:00

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

Sean Christopherson <[email protected]> writes:

> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f62d5799fcd7..86504a8bfd9a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3418,11 +3418,17 @@ static inline void kvm_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
>> */
>> void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
>> {
>> - if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
>> + if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu)) {
>> kvm_vcpu_flush_tlb_current(vcpu);
>> + kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> This isn't correct, flush_tlb_current() flushes "host" TLB entries, i.e. guest-physical
> mappings in Intel terminology, where flush_tlb_guest() and (IIUC) Hyper-V's paravirt
> TLB flush both flesh "guest" TLB entries, i.e. linear and combined
> mappings.

(Honestly, I was waiting for this comment when I first brought this, I
even put it in a separate patch with a provokative "KVM: x86:
KVM_REQ_TLB_FLUSH_CURRENT is a superset of KVM_REQ_HV_TLB_FLUSH too"
name but AFAIR the only comment I got was "please merge with the patch
which clears KVM_REQ_TLB_FLUSH_GUEST" so started thinking this was the
right thing to do :) Jokes aside,

This small optimization was done for nSVM case. When switching from L1
to L2 and vice versa, the code does nested_svm_transition_tlb_flush()
which is

kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

On AMD, both KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST are
the same thing (.flush_tlb_current == .flush_tlb_guest ==
svm_flush_tlb_current()) flushing the whole ASID so processing Hyper-V
TLB flush requests is ceratainly redundant.

Now let's get to VMX and the point of my confusion (and thanks in
advance for educating me!):
AFAIU, when EPT is in use:
KVM_REQ_TLB_FLUSH_CURRENT == invept
KVM_REQ_TLB_FLUSH_GUEST = invvpid

For "normal" mappings (which are mapped on both stages) this is the same
thing as they're 'tagged' with both VPID and 'EPT root'. The question is
what's left. Given your comment, do I understand correctly that in case
of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
will only be tagged with VPID but not with 'EPT root' (as the CPU never
reached to the second translation stage)? We certainly can't ignore
these. Another (probably pure theoretical question) is what are the
mappings which are tagged with 'EPT root' but don't have a VPID tag? Are
these the mapping which happen when e.g. vCPU has paging disabled? These
are probably unrelated to Hyper-V TLB flushing.

To preserve the 'small' optimization, we can probably move
kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

to nested_svm_transition_tlb_flush() or, in case this sounds too
hackish, we can drop it for now and add it to the (already overfull)
bucket of the "optimize nested_svm_transition_tlb_flush()".

--
Vitaly

2022-09-22 10:44:42

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v10 14/39] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id

Sean Christopherson <[email protected]> writes:

> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
>> Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition
>> assist page address to handle L2 TLB flush requests.
>>
>> Reviewed-by: Maxim Levitsky <[email protected]>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kvm/svm/hyperv.h | 16 ++++++++++++++++
>> arch/x86/kvm/svm/nested.c | 2 ++
>> 2 files changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
>> index 7d6d97968fb9..8cf702fed7e5 100644
>> --- a/arch/x86/kvm/svm/hyperv.h
>> +++ b/arch/x86/kvm/svm/hyperv.h
>> @@ -9,6 +9,7 @@
>> #include <asm/mshyperv.h>
>>
>> #include "../hyperv.h"
>> +#include "svm.h"
>>
>> /*
>> * Hyper-V uses the software reserved 32 bytes in VMCB
>> @@ -32,4 +33,19 @@ struct hv_enlightenments {
>> */
>> #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>>
>> +static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
>> +{
>> + struct vcpu_svm *svm = to_svm(vcpu);
>> + struct hv_enlightenments *hve =
>> + (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
>
> Eww :-)
>
> I posted a small series to fix the casting[*], and as noted in the cover letter it's
> going to conflict mightily. Ignoring merge order for the moment, looking at the
> series as a whole, if the Hyper-V definitions are moved to hyperv-tlfs.h, then I'm
> tempted to say there's no need for svm/hyperv.h.
>
> There should never be users of this stuff outside of svm/nested.c, and IMO there's
> not enough stuff to warrant a separate set of files. nested_svm_hv_update_vp_assist()
> isn't SVM specific and fits better alongside kvm_hv_get_assist_page().
>
> That leaves three functions and ~40 lines of code, which can easily go directly
> into svm/nested.c.
>
> I'm definitely not dead set against having hyperv.{ch}, but unless there's a high
> probability of SVM+Hyper-V getting to eVMCS levels of enlightenment, my vote is
> to put these helpers in svm/nested.c and move then if/when we do end up accumulating
> more SVM+Hyper-V code.

Well, there's more on the TODO list :-) There are even nSVM-only
features like "enlightened TLB" (to split ASID invalidations into two
stages) so I don't want to pollute 'nested.c'. In fact, I was thinking
about renaming vmx/evmcs.{ch} into vmx/hyperv.{ch} as we're doing more
than eVMCS there already. Also, having separate files help with the
newly introduces 'KVM X86 HYPER-V (KVM/hyper-v)' MAINTAINERS entry. Does
this sound like a good enough justification for keeping hyperv.{ch}?

>
> As for merge order, I don't think there's a need for this series to take a
> dependency on the cleanup, especially if these helpers land in nested.c. Fixing
> up the casting and s/hv_enlightenments/hv_vmcb_enlightenments is straightforward.
>
> [*] https://lore.kernel.org/all/[email protected]
>

I'll take a look, thanks!

--
Vitaly

2022-09-22 15:33:01

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
> Now let's get to VMX and the point of my confusion (and thanks in
> advance for educating me!):
> AFAIU, when EPT is in use:
> KVM_REQ_TLB_FLUSH_CURRENT == invept
> KVM_REQ_TLB_FLUSH_GUEST = invvpid
>
> For "normal" mappings (which are mapped on both stages) this is the same
> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
> what's left. Given your comment, do I understand correctly that in case
> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
> will only be tagged with VPID but not with 'EPT root' (as the CPU never
> reached to the second translation stage)? We certainly can't ignore
> these. Another (probably pure theoretical question) is what are the
> mappings which are tagged with 'EPT root' but don't have a VPID tag?

Intel puts mappings into three categories, which for non-root mode equates to:

linear == GVA => GPA
guest-physical == GPA => HPA
combined == GVA => HPA

and essentially the categories that consume the GVA are tagged with the VPID
(linear and combined), and categories that consume the GPA are tagged with the
EPTP address (guest-physical and combined).

> Are these the mapping which happen when e.g. vCPU has paging disabled?

No, these mappings can be created at all times. Even with CR0.PG=1, the guest
can generate GPAs without going through a GVA=>GPA translation, e.g. the page tables
themselves, RTIT (Intel PT) addresses, etc... And even for combined/full
translations, the CPU can insert TLB entries for just the GPA=>HPA part.

E.g. when a page is allocated by/for userspace, the kernel will zero the page using
the kernel's direct map, but userspace will access the page via a different GVA.
I.e. the guest effectively aliases GPA(x) with GVA(k) and GVA(u). By inserting
the GPA(x) => HPA(y) into the TLB, when guest userspace access GVA(u), the CPU
encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on GPA(x) => HPA(y).

Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
the partial TLB entries, e.g. a host change in the EPT tables will result in the
guest-physical and combined mappings being invalidated, but linear mappings can
be kept.

I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
not affecting NPT translations, AMD just doesn't provide a way for the host to
flush _only_ NPT translations. Maybe the performance benefits weren't significant
enough to justify the extra complexity?

> These are probably unrelated to Hyper-V TLB flushing.
>
> To preserve the 'small' optimization, we can probably move
> kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> to nested_svm_transition_tlb_flush() or, in case this sounds too
> hackish

Move it to svm_flush_tlb_current(), because the justification is that on SVM,
flushing "current" TLB entries also flushes "guest" TLB entries due to the more
coarse-grained ASID-based TLB flush. E.g.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd599afc85f5..a86b41503723 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ /*
+ * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
+ * A TLB flush for the current ASID flushes both "host" and "guest" TLB
+ * entries, and thus is a superset of Hyper-V's fine grained flushing.
+ */
+ kvm_hv_vcpu_purge_flush_tlb(vcpu);
+
/*
* Flush only the current ASID even if the TLB flush was invoked via
* kvm_flush_remote_tlbs(). Although flushing remote TLBs requires all

> we can drop it for now and add it to the (already overfull)
> bucket of the "optimize nested_svm_transition_tlb_flush()".

I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is the
correct/desired behavior. This doesn't really have anything to do with nSVM,
it's all about SVM not providing a way to flush only NPT entries.

2022-09-22 16:30:46

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 20/39] KVM: nVMX: hyper-v: Enable L2 TLB flush

On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 0634518a6719..1451a7a2c488 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1132,6 +1132,17 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> + /*
> + * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
> + * L2's VP_ID upon request from the guest. Make sure we check for
> + * pending entries for the case when the request got misplaced (e.g.

Kind of a nit, but I'd prefer to avoid "misplaced", as that implies KVM puts entries
into the wrong FIFO. The issue isn't that KVM puts entries in the wrong FIFO,
it's that the FIFO is filled asynchronously be other vCPUs and so it's possible
to switch to a FIFO that has valid entries without a pending request.

And thinking about this, KVM_REQ_HV_TLB_FLUSH shouldn't be handled in
kvm_service_local_tlb_flush_requests(). My initial reaction to this patch is that
queueing the request here is too late because the switch has already happened,
i.e. nVMX has already called kvm_service_local_tlb_flush_requests() and so the
request

But making the request for the _new_ context is correct _and_ necessary, e.g. given

vCPU0 vCPU1
FIFO[L1].insert
FIFO[L1].insert
L1 => L2 transition
FIFO[L1].insert
FIFO[L1].insert
KVM_REQ_HV_TLB_FLUSH

if nVMX made the request for the old contex, then this would happen

vCPU0 vCPU1
FIFO[L1].insert
FIFO[L1].insert
KVM_REQ_HV_TLB_FLUSH
service FIFO[L1]
L1 => L2 transition
FIFO[L1].insert
FIFO[L1].insert
KVM_REQ_HV_TLB_FLUSH
service FIFO[L2]
...
KVM_REQ_HV_TLB_FLUSH
service FIFO[L2]
L2 => L1 transition

Run L1 with FIFO[L1] entries!!!

whereas what is being done in this patch is:


vCPU0 vCPU1
FIFO[L1].insert
FIFO[L1].insert
L1 => L2 transition
KVM_REQ_HV_TLB_FLUSH
service FIFO[2]
FIFO[L1].insert
FIFO[L1].insert
KVM_REQ_HV_TLB_FLUSH
service FIFO[L2]
...
L2 => L1 transition
KVM_REQ_HV_TLB_FLUSH
service FIFO[L1]

which is correct and ensures that KVM will always consume FIFO entries prior to
running the associated context.

In other words, unlike KVM_REQ_TLB_FLUSH_CURRENT and KVM_REQ_TLB_FLUSH_GUEST,
KVM_REQ_HV_TLB_FLUSH is not a "local" request. It's much more like KVM_REQ_TLB_FLUSH
in that it can come from other vCPUs, i.e. is effectively a "remote" request.

So rather than handle KVM_REQ_TLB_FLUSH in the "local" path, it should be handled
only in the request path. Handling the request in kvm_service_local_tlb_flush_requests()
won't break anything, but conceptually it's wrong and as a result it's misleading
because it implies that nested transitions could also be handled by forcing
kvm_service_local_tlb_flush_requests() to service flushes for the current, i.e.
previous, context on nested transitions, but that wouldn't work (see example above).

I.e. we should end up with something like this:

/*
* Note, the order matters here, as flushing "all" TLB entries
* also flushes the "current" TLB entries, and flushing "guest"
* TLB entries is a superset of Hyper-V's fine-grained flushing.
* I.e. servicing the flush "all" will clear any request to
* flush "current", and flushing "guest" will clear any request
* to service Hyper-V's fine-grained flush.
*/
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
kvm_vcpu_flush_tlb_all(vcpu);

kvm_service_local_tlb_flush_requests(vcpu);

/*
* Fall back to a "full" guest flush if Hyper-V's precise
* flushing fails. Note, Hyper-V's flushing is per-vCPU, but
* the flushes are considered "remote" and not "local" because
* the requests can be initiated from other vCPUs.
*/
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu) &&
kvm_hv_vcpu_flush_tlb(vcpu))
kvm_vcpu_flush_tlb_guest(vcpu);



> + * a transition from L2->L1 happened while processing L2 TLB flush
> + * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
> + * anything if there are no requests in the corresponding buffer.
> + */
> + if (to_hv_vcpu(vcpu))

This should be:

if (to_hv_vcpu(vcpu) && enable_ept)

otherwise KVM will fall back to flushing the guest, which is the entire TLB, when
EPT is disabled. I'm guessing this applies to SVM+NPT as well.

> + kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);

2022-09-22 16:34:17

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v10 02/39] KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag

Sean Christopherson <[email protected]> writes:

> On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
>> Now let's get to VMX and the point of my confusion (and thanks in
>> advance for educating me!):
>> AFAIU, when EPT is in use:
>> KVM_REQ_TLB_FLUSH_CURRENT == invept
>> KVM_REQ_TLB_FLUSH_GUEST = invvpid
>>
>> For "normal" mappings (which are mapped on both stages) this is the same
>> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
>> what's left. Given your comment, do I understand correctly that in case
>> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
>> will only be tagged with VPID but not with 'EPT root' (as the CPU never
>> reached to the second translation stage)? We certainly can't ignore
>> these. Another (probably pure theoretical question) is what are the
>> mappings which are tagged with 'EPT root' but don't have a VPID tag?
>
> Intel puts mappings into three categories, which for non-root mode equates to:
>
> linear == GVA => GPA
> guest-physical == GPA => HPA
> combined == GVA => HPA
>
> and essentially the categories that consume the GVA are tagged with the VPID
> (linear and combined), and categories that consume the GPA are tagged with the
> EPTP address (guest-physical and combined).
>
>> Are these the mapping which happen when e.g. vCPU has paging disabled?
>
> No, these mappings can be created at all times. Even with CR0.PG=1, the guest
> can generate GPAs without going through a GVA=>GPA translation, e.g. the page tables
> themselves, RTIT (Intel PT) addresses, etc... And even for combined/full
> translations, the CPU can insert TLB entries for just the GPA=>HPA part.
>
> E.g. when a page is allocated by/for userspace, the kernel will zero the page using
> the kernel's direct map, but userspace will access the page via a different GVA.
> I.e. the guest effectively aliases GPA(x) with GVA(k) and GVA(u). By inserting
> the GPA(x) => HPA(y) into the TLB, when guest userspace access GVA(u), the CPU
> encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on GPA(x) => HPA(y).
>
> Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
> the partial TLB entries, e.g. a host change in the EPT tables will result in the
> guest-physical and combined mappings being invalidated, but linear mappings can
> be kept.
>

Thanks a bunch! For some reason I though it's always the full thing (combined)
which is tagged with both VPID/PCID and EPTP and linear/guest-physical
are just 'corner' cases (but are still combined and tagged). Apparently,
it's not like that.

> I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
> not affecting NPT translations, AMD just doesn't provide a way for the host to
> flush _only_ NPT translations. Maybe the performance benefits weren't significant
> enough to justify the extra complexity?
>
>> These are probably unrelated to Hyper-V TLB flushing.
>>
>> To preserve the 'small' optimization, we can probably move
>> kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>>
>> to nested_svm_transition_tlb_flush() or, in case this sounds too
>> hackish
>
> Move it to svm_flush_tlb_current(), because the justification is that on SVM,
> flushing "current" TLB entries also flushes "guest" TLB entries due to the more
> coarse-grained ASID-based TLB flush. E.g.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd599afc85f5..a86b41503723 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
>
> + /*
> + * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
> + * A TLB flush for the current ASID flushes both "host" and "guest" TLB
> + * entries, and thus is a superset of Hyper-V's fine grained flushing.
> + */
> + kvm_hv_vcpu_purge_flush_tlb(vcpu);
> +
> /*
> * Flush only the current ASID even if the TLB flush was invoked via
> * kvm_flush_remote_tlbs(). Although flushing remote TLBs requires all
>
>> we can drop it for now and add it to the (already overfull)
>> bucket of the "optimize nested_svm_transition_tlb_flush()".
>
> I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is the
> correct/desired behavior. This doesn't really have anything to do with nSVM,
> it's all about SVM not providing a way to flush only NPT entries.

True that, silly me forgot that even without any nesting, Hyper-V TLB
flush after svm_flush_tlb_current() makes no sense.

>

--
Vitaly

2022-09-22 19:58:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 14/39] KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id

On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
> > I'm definitely not dead set against having hyperv.{ch}, but unless there's a high
> > probability of SVM+Hyper-V getting to eVMCS levels of enlightenment, my vote is
> > to put these helpers in svm/nested.c and move then if/when we do end up accumulating
> > more SVM+Hyper-V code.
>
> Well, there's more on the TODO list :-) There are even nSVM-only
> features like "enlightened TLB" (to split ASID invalidations into two
> stages) so I don't want to pollute 'nested.c'. In fact, I was thinking
> about renaming vmx/evmcs.{ch} into vmx/hyperv.{ch} as we're doing more
> than eVMCS there already. Also, having separate files help with the
> newly introduces 'KVM X86 HYPER-V (KVM/hyper-v)' MAINTAINERS entry.

Ya, there is that.

> Does this sound like a good enough justification for keeping hyperv.{ch}?

Your call, I'm totally ok either way. If we do add svm/hyperv.{ch}, my vote is
to also rename vmx/evmcs.{ch} as you suggested. I like symmetry :-)

2022-10-03 13:12:49

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v10 30/39] KVM: selftests: Hyper-V PV TLB flush selftest

Sean Christopherson <[email protected]> writes:

> On Wed, Sep 21, 2022, Vitaly Kuznetsov wrote:

...

>> +}
>> +
>> +/* Delay */
>> +static inline void rep_nop(void)
>
> LOL, rep_nop() is a hilariously confusing function name. "REP NOP" is "PAUSE",
> and for whatever reason the kernel proper use rep_nop() as the function name for
> the wrapper. My reaction to the MFENCE+rep_nop() below was "how the hell does
> MFENCE+PAUSE guarantee a delay?!?".

Well, at least you got the joke :-)

>
> Anyways, why not do e.g. usleep(1)?

I was under the impression that all these 'sleep' functions result in a
syscall (and I do see TRIPLE_FAULT when I swap my rep_nop() with usleep())
and here we need to wait in the guest (sender) ...

> And if you really need a udelay() and not a
> usleep(), IMO it's worth adding exactly that instead of throwing NOPs at the CPU.
> E.g. aarch64 KVM selftests already implements udelay(), so adding an x86 variant
> would move us one step closer to being able to use it in common tests.

... so yes, I think we need a delay. The problem with implementing
udelay() is that TSC frequency is unknown. We can get it from kvmclock
but setting up kvmclock pages for all selftests looks like an
overkill. Hyper-V emulation gives us HV_X64_MSR_TSC_FREQUENCY but that's
not generic enough. Alternatively, we can use KVM_GET_TSC_KHZ when
creating a vCPU but we'll need to pass the value to guest code somehow.
AFAIR, we can use CPUID.0x15 and/or MSR_PLATFORM_INFO (0xce) or even
introduce a PV MSR for our purposes -- or am I missing an obvious "easy"
solution?

I'm thinking about being lazy here and implemnting a Hyper-V specific
udelay through HV_X64_MSR_TSC_FREQUENCY (unless you object, of course)
to avoid bloating this series beyond 46 patches it already has.

...

--
Vitaly

2022-10-03 16:01:22

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 30/39] KVM: selftests: Hyper-V PV TLB flush selftest

On Mon, Oct 03, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
> > Anyways, why not do e.g. usleep(1)?
>
> I was under the impression that all these 'sleep' functions result in a
> syscall (and I do see TRIPLE_FAULT when I swap my rep_nop() with usleep())
> and here we need to wait in the guest (sender) ...

Oh, duh, guest code.

> > And if you really need a udelay() and not a
> > usleep(), IMO it's worth adding exactly that instead of throwing NOPs at the CPU.
> > E.g. aarch64 KVM selftests already implements udelay(), so adding an x86 variant
> > would move us one step closer to being able to use it in common tests.
>
> ... so yes, I think we need a delay. The problem with implementing
> udelay() is that TSC frequency is unknown. We can get it from kvmclock
> but setting up kvmclock pages for all selftests looks like an
> overkill. Hyper-V emulation gives us HV_X64_MSR_TSC_FREQUENCY but that's
> not generic enough. Alternatively, we can use KVM_GET_TSC_KHZ when
> creating a vCPU but we'll need to pass the value to guest code somehow.
> AFAIR, we can use CPUID.0x15 and/or MSR_PLATFORM_INFO (0xce) or even
> introduce a PV MSR for our purposes -- or am I missing an obvious "easy"
> solution?

I don't think you're missing anything. Getting the value into the guest is the
biggest issue.

Vishal is solving a similar problem where the guest needs to know the "native"
hypercall. We can piggyback that hook to do KVM_GET_TSC_KHZ there during VM
creation, and then simply define udelay()'s behavior to always operate on the
"default" frequency. I.e. if a test wants to change the frequency _and_ use
udelay() _and_ cares about the precision of udelay(), then that test can go write
its own code.

> I'm thinking about being lazy here and implemnting a Hyper-V specific
> udelay through HV_X64_MSR_TSC_FREQUENCY (unless you object, of course)
> to avoid bloating this series beyond 46 patches it already has.

I'm totally fine being even lazier here and just using a loop of nops, but with a
different function name and a TODO (I completely forgot this was guest code when
making the usleep() suggestion). Then we can clean up the TODO via udelay() in a
follow-up series.

2022-10-03 16:39:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v10 30/39] KVM: selftests: Hyper-V PV TLB flush selftest

On Mon, Oct 03, 2022, Sean Christopherson wrote:
> On Mon, Oct 03, 2022, Vitaly Kuznetsov wrote:
> > Sean Christopherson <[email protected]> writes:
> > > And if you really need a udelay() and not a
> > > usleep(), IMO it's worth adding exactly that instead of throwing NOPs at the CPU.
> > > E.g. aarch64 KVM selftests already implements udelay(), so adding an x86 variant
> > > would move us one step closer to being able to use it in common tests.
> >
> > ... so yes, I think we need a delay. The problem with implementing
> > udelay() is that TSC frequency is unknown. We can get it from kvmclock
> > but setting up kvmclock pages for all selftests looks like an
> > overkill. Hyper-V emulation gives us HV_X64_MSR_TSC_FREQUENCY but that's
> > not generic enough. Alternatively, we can use KVM_GET_TSC_KHZ when
> > creating a vCPU but we'll need to pass the value to guest code somehow.
> > AFAIR, we can use CPUID.0x15 and/or MSR_PLATFORM_INFO (0xce) or even
> > introduce a PV MSR for our purposes -- or am I missing an obvious "easy"
> > solution?
>
> I don't think you're missing anything. Getting the value into the guest is the
> biggest issue.
>
> Vishal is solving a similar problem where the guest needs to know the "native"
> hypercall. We can piggyback that hook to do KVM_GET_TSC_KHZ there during VM
> creation, and then simply define udelay()'s behavior to always operate on the
> "default" frequency. I.e. if a test wants to change the frequency _and_ use
> udelay() _and_ cares about the precision of udelay(), then that test can go write
> its own code.

Forgot to connect the dots: https://lore.kernel.org/all/[email protected]