Changes since v1:
To address Sean's review comments:
- s,Direct,L2, everywhere.
- s,tlbflush,tlb_flush, everywhere.
- "KVM: x86: hyper-v: Add helper to read hypercall data for array" patch
added.
- "x86/hyperv: Introduce HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK
constants" patch added.
- "KVM: x86: hyper-v: Use HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK
instead of raw '64'" patch added.
- Other code improvements.
Other changes:
- Rebase to the latest kvm/queue.
- "KVM: selftests: add hyperv_svm_test to .gitignore" patch dropped
(already fixed).
- "KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'"
patch added.
- Fix a race in the newly introduced Hyper-V IPI test.
Original description:
Currently, KVM handles HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests
by flushing the whole VPID and this is sub-optimal. This series introduces
the required mechanism to make handling of these requests more
fine-grained by flushing individual GVAs only (when requested). On this
foundation, "Direct Virtual Flush" Hyper-V feature is implemented. The
feature allows L0 to handle Hyper-V TLB flush hypercalls directly at
L0 without the need to reflect the exit to L1. This has at least two
benefits: the reflected vmexit and the subsequent vmenter are avoided, and L0
has precise information on whether the target vCPU is actually running (and
thus requires a kick).
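For illustration only (the actual logic lands in the nSVM/nVMX patches in this
series), the L0-side decision boils down to a check along these lines on a
VMCALL/VMMCALL exit from L2:
    if (kvm_hv_l2_tlb_flush_exposed(vcpu) &&
        nested_svm_l2_tlb_flush_enabled(vcpu) &&
        kvm_hv_is_tlb_flush_hcall(vcpu))
            /* handle the hypercall in L0, don't reflect the exit to L1 */
            return NESTED_EXIT_HOST;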
Sean Christopherson (1):
KVM: x86: hyper-v: Add helper to read hypercall data for array
Vitaly Kuznetsov (33):
KVM: x86: hyper-v: Resurrect dedicated KVM_REQ_HV_TLB_FLUSH flag
KVM: x86: hyper-v: Introduce TLB flush ring
KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls
gently
KVM: x86: hyper-v: Expose support for extended gva ranges for flush
hypercalls
KVM: x86: Prepare kvm_hv_flush_tlb() to handle L2's GPAs
x86/hyperv: Introduce
HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK constants
KVM: x86: hyper-v: Use
HV_MAX_SPARSE_VCPU_BANKS/HV_VCPUS_PER_SPARSE_BANK instead of raw
'64'
KVM: x86: hyper-v: Don't use sparse_set_to_vcpu_mask() in
kvm_hv_send_ipi()
KVM: x86: hyper-v: Create a separate ring for L2 TLB flush
KVM: x86: hyper-v: Use preallocated buffer in 'struct kvm_vcpu_hv'
instead of on-stack 'sparse_banks'
KVM: nVMX: Keep track of hv_vm_id/hv_vp_id when eVMCS is in use
KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id
KVM: x86: Introduce .post_hv_l2_tlb_flush() nested hook
KVM: x86: hyper-v: Introduce kvm_hv_is_tlb_flush_hcall()
KVM: x86: hyper-v: L2 TLB flush
KVM: x86: hyper-v: Introduce fast kvm_hv_l2_tlb_flush_exposed() check
x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
KVM: nVMX: hyper-v: Enable L2 TLB flush
KVM: x86: KVM_REQ_TLB_FLUSH_CURRENT is a superset of
KVM_REQ_HV_TLB_FLUSH too
KVM: nSVM: hyper-v: Enable L2 TLB flush
KVM: x86: Expose Hyper-V L2 TLB flush feature
KVM: selftests: Better XMM read/write helpers
KVM: selftests: Hyper-V PV IPI selftest
KVM: selftests: Make it possible to replace PTEs with __virt_pg_map()
KVM: selftests: Hyper-V PV TLB flush selftest
KVM: selftests: Sync 'struct hv_enlightened_vmcs' definition with
hyperv-tlfs.h
KVM: selftests: nVMX: Allocate Hyper-V partition assist page
KVM: selftests: nSVM: Allocate Hyper-V partition assist and VP assist
pages
KVM: selftests: Sync 'struct hv_vp_assist_page' definition with
hyperv-tlfs.h
KVM: selftests: evmcs_test: Introduce L2 TLB flush test
KVM: selftests: Move Hyper-V VP assist page enablement out of evmcs.h
KVM: selftests: hyperv_svm_test: Introduce L2 TLB flush test
KVM: x86: Rename 'enable_direct_tlbflush' to 'enable_l2_tlb_flush'
arch/x86/include/asm/hyperv-tlfs.h | 6 +-
arch/x86/include/asm/kvm-x86-ops.h | 2 +-
arch/x86/include/asm/kvm_host.h | 37 +-
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/hyperv.c | 376 ++++++++--
arch/x86/kvm/hyperv.h | 48 ++
arch/x86/kvm/svm/hyperv.c | 18 +
arch/x86/kvm/svm/hyperv.h | 37 +
arch/x86/kvm/svm/nested.c | 25 +-
arch/x86/kvm/svm/svm_onhyperv.c | 2 +-
arch/x86/kvm/svm/svm_onhyperv.h | 6 +-
arch/x86/kvm/trace.h | 21 +-
arch/x86/kvm/vmx/evmcs.c | 24 +
arch/x86/kvm/vmx/evmcs.h | 11 +
arch/x86/kvm/vmx/nested.c | 32 +
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/x86.c | 20 +-
arch/x86/kvm/x86.h | 1 +
include/asm-generic/hyperv-tlfs.h | 5 +
include/asm-generic/mshyperv.h | 11 +-
tools/testing/selftests/kvm/.gitignore | 2 +
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/include/x86_64/evmcs.h | 40 +-
.../selftests/kvm/include/x86_64/hyperv.h | 35 +
.../selftests/kvm/include/x86_64/processor.h | 72 +-
.../selftests/kvm/include/x86_64/svm_util.h | 10 +
.../selftests/kvm/include/x86_64/vmx.h | 4 +
.../testing/selftests/kvm/lib/x86_64/hyperv.c | 21 +
.../selftests/kvm/lib/x86_64/processor.c | 6 +-
tools/testing/selftests/kvm/lib/x86_64/svm.c | 10 +
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 7 +
.../selftests/kvm/max_guest_memory_test.c | 2 +-
.../testing/selftests/kvm/x86_64/evmcs_test.c | 53 +-
.../selftests/kvm/x86_64/hyperv_features.c | 5 +-
.../testing/selftests/kvm/x86_64/hyperv_ipi.c | 374 ++++++++++
.../selftests/kvm/x86_64/hyperv_svm_test.c | 60 +-
.../selftests/kvm/x86_64/hyperv_tlb_flush.c | 647 ++++++++++++++++++
.../selftests/kvm/x86_64/mmu_role_test.c | 2 +-
38 files changed, 1883 insertions(+), 162 deletions(-)
create mode 100644 arch/x86/kvm/svm/hyperv.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/hyperv.c
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
--
2.35.1
Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
from L2 in L0 to process L2 TLB flush requests.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/hyperv.c | 6 ++++++
arch/x86/kvm/hyperv.h | 7 +++++++
3 files changed, 14 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ce62fde5f4ff..168600490bd1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
+ u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
} cpuid_cache;
struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 79aabe0c33ec..68a0df4e3f66 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
else
hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
+
+ entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
+ if (entry)
+ hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
+ else
+ hv_vcpu->cpuid_cache.nested_features_eax = 0;
}
int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index f593c9fd1dee..d8cb6d70dbc8 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
}
+static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
+}
+
static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
{
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
--
2.35.1
The 'extended GVA ranges' support bit seems to indicate whether the lower 12
bits of a GVA can be used to specify up to 4095 additional consecutive
GVAs to flush. This is only loosely described in the TLFS.
Previously, KVM handled HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
requests by flushing the whole VPID, so technically extended GVA
ranges were already supported. As such requests are now handled more
gently, advertising support for extended ranges starts making
sense as it reduces the size of TLB flush requests.
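For illustration, assuming the encoding described above (this sketch is not
part of the patch), a single GVA list entry would be decoded roughly as:
    gva   = entry & PAGE_MASK;       /* base GVA to flush */
    extra = entry & ~PAGE_MASK;      /* up to 4095 additional consecutive GVAs */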
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 2 ++
arch/x86/kvm/hyperv.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0a9407dc0859..5225a85c08c3 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -61,6 +61,8 @@
#define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE BIT(10)
/* Support for debug MSRs available */
#define HV_FEATURE_DEBUG_MSRS_AVAILABLE BIT(11)
+/* Support for extended gva ranges for flush hypercalls available */
+#define HV_FEATURE_EXT_GVA_RANGES_FLUSH BIT(14)
/*
* Support for returning hypercall output block via XMM
* registers is available
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 759e1a16e5c3..1a6f9628cee9 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2702,6 +2702,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_DEBUGGING;
ent->edx |= HV_X64_GUEST_DEBUGGING_AVAILABLE;
ent->edx |= HV_FEATURE_DEBUG_MSRS_AVAILABLE;
+ ent->edx |= HV_FEATURE_EXT_GVA_RANGES_FLUSH;
/*
* Direct Synthetic timers only make sense with in-kernel
--
2.35.1
Section 1.9 of TLFS v6.0b says:
"All structures are padded in such a way that fields are aligned
naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
and so on)".
'struct hv_enlightened_vmcs' has a glitch:
...
struct {
u32 nested_flush_hypercall:1; /* 836: 0 4 */
u32 msr_bitmap:1; /* 836: 1 4 */
u32 reserved:30; /* 836: 2 4 */
} hv_enlightenments_control; /* 836 4 */
u32 hv_vp_id; /* 840 4 */
u64 hv_vm_id; /* 844 8 */
u64 partition_assist_page; /* 852 8 */
...
And the observed values in 'partition_assist_page' make no sense at
all. Fix the layout by padding the structure properly.
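For illustration only (not part of the patch), the fix can be sanity-checked
by asserting natural alignment of the 64-bit fields following the new padding:
    BUILD_BUG_ON(offsetof(struct hv_enlightened_vmcs, hv_vm_id) % 8);
    BUILD_BUG_ON(offsetof(struct hv_enlightened_vmcs, partition_assist_page) % 8);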
Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 5225a85c08c3..e7ddae8e02c6 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -548,7 +548,7 @@ struct hv_enlightened_vmcs {
u64 guest_rip;
u32 hv_clean_fields;
- u32 hv_padding_32;
+ u32 padding32_1;
u32 hv_synthetic_controls;
struct {
u32 nested_flush_hypercall:1;
@@ -556,7 +556,7 @@ struct hv_enlightened_vmcs {
u32 reserved:30;
} __packed hv_enlightenments_control;
u32 hv_vp_id;
-
+ u32 padding32_2;
u64 hv_vm_id;
u64 partition_assist_page;
u64 padding64_4[4];
--
2.35.1
set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from 128-bit registers. Moreover, these helpers are not used. Borrow
_kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM, limiting them to
XMM0-XMM7 for now.
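A hypothetical usage sketch with the new helpers (the values are illustrative
only):
    union { sse128_t vec; u64 as_u64[2]; } xmm0;
    xmm0.as_u64[0] = 0x1111222233334444ull;
    xmm0.as_u64[1] = 0x5555666677778888ull;
    write_sse_reg(0, &xmm0.vec);     /* load XMM0 with both halves */
    read_sse_reg(0, &xmm0.vec);      /* read it back for verification */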
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 70 ++++++++++---------
1 file changed, 36 insertions(+), 34 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 37db341d4cc5..9ad7602a257b 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -296,71 +296,73 @@ static inline void cpuid(uint32_t *eax, uint32_t *ebx,
: "memory");
}
-#define SET_XMM(__var, __xmm) \
- asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
+typedef u32 __attribute__((vector_size(16))) sse128_t;
+#define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
+#define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; })
+#define sse128_hi(x) ({ __sse128_u t; t.vec = x; t.as_u64[1]; })
-static inline void set_xmm(int n, unsigned long val)
+static inline void read_sse_reg(int reg, sse128_t *data)
{
- switch (n) {
+ switch (reg) {
case 0:
- SET_XMM(val, xmm0);
+ asm("movdqa %%xmm0, %0" : "=m"(*data));
break;
case 1:
- SET_XMM(val, xmm1);
+ asm("movdqa %%xmm1, %0" : "=m"(*data));
break;
case 2:
- SET_XMM(val, xmm2);
+ asm("movdqa %%xmm2, %0" : "=m"(*data));
break;
case 3:
- SET_XMM(val, xmm3);
+ asm("movdqa %%xmm3, %0" : "=m"(*data));
break;
case 4:
- SET_XMM(val, xmm4);
+ asm("movdqa %%xmm4, %0" : "=m"(*data));
break;
case 5:
- SET_XMM(val, xmm5);
+ asm("movdqa %%xmm5, %0" : "=m"(*data));
break;
case 6:
- SET_XMM(val, xmm6);
+ asm("movdqa %%xmm6, %0" : "=m"(*data));
break;
case 7:
- SET_XMM(val, xmm7);
+ asm("movdqa %%xmm7, %0" : "=m"(*data));
break;
+ default:
+ BUG();
}
}
-#define GET_XMM(__xmm) \
-({ \
- unsigned long __val; \
- asm volatile("movq %%"#__xmm", %0" : "=r"(__val)); \
- __val; \
-})
-
-static inline unsigned long get_xmm(int n)
+static inline void write_sse_reg(int reg, const sse128_t *data)
{
- assert(n >= 0 && n <= 7);
-
- switch (n) {
+ switch (reg) {
case 0:
- return GET_XMM(xmm0);
+ asm("movdqa %0, %%xmm0" : : "m"(*data));
+ break;
case 1:
- return GET_XMM(xmm1);
+ asm("movdqa %0, %%xmm1" : : "m"(*data));
+ break;
case 2:
- return GET_XMM(xmm2);
+ asm("movdqa %0, %%xmm2" : : "m"(*data));
+ break;
case 3:
- return GET_XMM(xmm3);
+ asm("movdqa %0, %%xmm3" : : "m"(*data));
+ break;
case 4:
- return GET_XMM(xmm4);
+ asm("movdqa %0, %%xmm4" : : "m"(*data));
+ break;
case 5:
- return GET_XMM(xmm5);
+ asm("movdqa %0, %%xmm5" : : "m"(*data));
+ break;
case 6:
- return GET_XMM(xmm6);
+ asm("movdqa %0, %%xmm6" : : "m"(*data));
+ break;
case 7:
- return GET_XMM(xmm7);
+ asm("movdqa %0, %%xmm7" : : "m"(*data));
+ break;
+ default:
+ BUG();
}
-
- /* never reached */
- return 0;
}
static inline void cpu_relax(void)
--
2.35.1
In preparation for testing Hyper-V L2 TLB flush hypercalls, allocate the
so-called Partition assist page and link it to 'struct vmx_pages'.
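A hypothetical guest-side usage sketch (assuming the 'current_evmcs' accessor
from the selftests' evmcs.h; not part of this patch):
    current_evmcs->partition_assist_page = vmx->partition_assist_gpa;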
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
tools/testing/selftests/kvm/include/x86_64/vmx.h | 4 ++++
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 7 +++++++
2 files changed, 11 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 583ceb0d1457..f99922ca8259 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -567,6 +567,10 @@ struct vmx_pages {
uint64_t enlightened_vmcs_gpa;
void *enlightened_vmcs;
+ void *partition_assist_hva;
+ uint64_t partition_assist_gpa;
+ void *partition_assist;
+
void *eptp_hva;
uint64_t eptp_gpa;
void *eptp;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index d089d8b850b5..3db21e0e1a8f 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -124,6 +124,13 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
vmx->enlightened_vmcs_gpa =
addr_gva2gpa(vm, (uintptr_t)vmx->enlightened_vmcs);
+ /* Setup of a region of guest memory for the partition assist page. */
+ vmx->partition_assist = (void *)vm_vaddr_alloc_page(vm);
+ vmx->partition_assist_hva =
+ addr_gva2hva(vm, (uintptr_t)vmx->partition_assist);
+ vmx->partition_assist_gpa =
+ addr_gva2gpa(vm, (uintptr_t)vmx->partition_assist);
+
*p_vmx_gva = vmx_gva;
return vmx;
}
--
2.35.1
Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
both in the extended 'nested controls' in the VMCB and in the partition
assist page.
According to the Hyper-V TLFS, the synthetic vmexit to L1 is performed with
- HV_SVM_EXITCODE_ENL exit_code.
- HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/svm/hyperv.c | 7 +++++++
arch/x86/kvm/svm/hyperv.h | 19 +++++++++++++++++++
arch/x86/kvm/svm/nested.c | 22 +++++++++++++++++++++-
3 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
index c0749fc282fe..3842548bb88c 100644
--- a/arch/x86/kvm/svm/hyperv.c
+++ b/arch/x86/kvm/svm/hyperv.c
@@ -8,4 +8,11 @@
void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
+ svm->vmcb->control.exit_code_hi = 0;
+ svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
+ svm->vmcb->control.exit_info_2 = 0;
+ nested_svm_vmexit(svm);
}
diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
index a2b0d7580b0d..cd33e89f9f61 100644
--- a/arch/x86/kvm/svm/hyperv.h
+++ b/arch/x86/kvm/svm/hyperv.h
@@ -33,6 +33,9 @@ struct hv_enlightenments {
*/
#define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
+#define HV_SVM_EXITCODE_ENL 0xF0000000
+#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)
+
static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -48,6 +51,22 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
hv_vcpu->nested.vp_id = hve->hv_vp_id;
}
+static inline bool nested_svm_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct hv_enlightenments *hve =
+ (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
+ struct hv_vp_assist_page assist_page;
+
+ if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
+ return false;
+
+ if (!hve->hv_enlightenments_control.nested_flush_hypercall)
+ return false;
+
+ return assist_page.nested_control.features.directhypercall;
+}
+
void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
#endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de3f27301b5c..a6d9807c09b1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -172,7 +172,8 @@ void recalc_intercepts(struct vcpu_svm *svm)
}
/* We don't want to see VMMCALLs from a nested guest */
- vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
+ if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
+ vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
for (i = 0; i < MAX_INTERCEPT; i++)
c->intercepts[i] |= g->intercepts[i];
@@ -488,6 +489,17 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
{
+ /*
+ * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
+ * L2's VP_ID upon request from the guest. Make sure we check for
+ * pending entries for the case when the request got misplaced (e.g.
+ * a transition from L2->L1 happened while processing L2 TLB flush
+ * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
+ * anything if there are no requests in the corresponding buffer.
+ */
+ if (to_hv_vcpu(vcpu))
+ kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
+
/*
* TODO: optimize unconditional TLB flush/MMU sync. A partial list of
* things to fix before this can be conditional:
@@ -1357,6 +1369,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
int nested_svm_exit_special(struct vcpu_svm *svm)
{
u32 exit_code = svm->vmcb->control.exit_code;
+ struct kvm_vcpu *vcpu = &svm->vcpu;
switch (exit_code) {
case SVM_EXIT_INTR:
@@ -1375,6 +1388,13 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
return NESTED_EXIT_HOST;
break;
}
+ case SVM_EXIT_VMMCALL:
+ /* Hyper-V L2 TLB flush hypercall is handled by L0 */
+ if (kvm_hv_l2_tlb_flush_exposed(vcpu) &&
+ nested_svm_l2_tlb_flush_enabled(vcpu) &&
+ kvm_hv_is_tlb_flush_hcall(vcpu))
+ return NESTED_EXIT_HOST;
+ break;
default:
break;
}
--
2.35.1
To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
VP_ID, which are set by the L1 hypervisor. The 'Partition assist page' address
is also needed to handle the post-flush exit to L1 upon request.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/vmx/nested.c | 15 +++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 837c07e213de..8b2a52bf26c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -622,6 +622,12 @@ struct kvm_vcpu_hv {
/* Preallocated buffer for handling hypercalls passing sparse vCPU set */
u64 sparse_banks[64];
+
+ struct {
+ u64 pa_page_gpa;
+ u64 vm_id;
+ u32 vp_id;
+ } nested;
};
/* Xen HVM per vcpu emulation context */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a6688663da4d..ee88921c6156 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -225,6 +225,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
struct vcpu_vmx *vmx = to_vmx(vcpu);
if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
@@ -233,6 +234,12 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
}
vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
+
+ if (hv_vcpu) {
+ hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
+ hv_vcpu->nested.vm_id = 0;
+ hv_vcpu->nested.vp_id = 0;
+ }
}
static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
@@ -1591,11 +1598,19 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
{
struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu);
/* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
vmcs12->tpr_threshold = evmcs->tpr_threshold;
vmcs12->guest_rip = evmcs->guest_rip;
+ if (unlikely(!(hv_clean_fields &
+ HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL))) {
+ hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
+ hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
+ hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
+ }
+
if (unlikely(!(hv_clean_fields &
HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC))) {
vmcs12->guest_rsp = evmcs->guest_rsp;
--
2.35.1
From: Sean Christopherson <[email protected]>
Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
reading a guest-provided array can be reused in the future, e.g. for
getting a list of virtual addresses whose TLB entries need to be flushed.
Opportunistically swap the order of the data and XMM adjustment so that
the XMM/gpa offsets are bundled together.
No functional change intended.
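A hedged sketch of the anticipated reuse (the names and array size are
illustrative, not from this patch): reading a guest-provided list of GVAs to
flush with the new helper would look roughly like
    u64 entries[16];    /* illustrative cap */
    if (kvm_hv_get_hc_data(kvm, hc, hc->rep_cnt, ARRAY_SIZE(entries),
                           entries, consumed_xmm_halves, data_offset))
            return HV_STATUS_INVALID_HYPERCALL_INPUT;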
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 53 +++++++++++++++++++++++++++----------------
1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fb716cf919ed..d66c27fd1e8a 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1758,38 +1758,51 @@ struct kvm_hv_hcall {
sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
};
-static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
- int consumed_xmm_halves,
- u64 *sparse_banks, gpa_t offset)
-{
- u16 var_cnt;
- int i;
- if (hc->var_cnt > 64)
- return -EINVAL;
-
- /* Ignore banks that cannot possibly contain a legal VP index. */
- var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
+static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
+ u16 orig_cnt, u16 cnt_cap, u64 *data,
+ int consumed_xmm_halves, gpa_t offset)
+{
+ /*
+ * Preserve the original count when ignoring entries via a "cap", KVM
+ * still needs to validate the guest input (though the non-XMM path
+ * punts on the checks).
+ */
+ u16 cnt = min(orig_cnt, cnt_cap);
+ int i, j;
if (hc->fast) {
/*
* Each XMM holds two sparse banks, but do not count halves that
* have already been consumed for hypercall parameters.
*/
- if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
+ if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
return HV_STATUS_INVALID_HYPERCALL_INPUT;
- for (i = 0; i < var_cnt; i++) {
- int j = i + consumed_xmm_halves;
+
+ for (i = 0; i < cnt; i++) {
+ j = i + consumed_xmm_halves;
if (j % 2)
- sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
+ data[i] = sse128_hi(hc->xmm[j / 2]);
else
- sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
+ data[i] = sse128_lo(hc->xmm[j / 2]);
}
return 0;
}
- return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
- var_cnt * sizeof(*sparse_banks));
+ return kvm_read_guest(kvm, hc->ingpa + offset, data,
+ cnt * sizeof(*data));
+}
+
+static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
+ u64 *sparse_banks, int consumed_xmm_halves,
+ gpa_t offset)
+{
+ if (hc->var_cnt > 64)
+ return -EINVAL;
+
+ /* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
+ return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
+ sparse_banks, consumed_xmm_halves, offset);
}
static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
@@ -1937,7 +1950,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if (!hc->var_cnt)
goto ret_success;
- if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
+ if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
offsetof(struct hv_tlb_flush_ex,
hv_vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
@@ -2048,7 +2061,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if (!hc->var_cnt)
goto ret_success;
- if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
+ if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
offsetof(struct hv_send_ipi_ex,
vp_set.bank_contents)))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
--
2.35.1
It may not be immediately clear where the magical '64' value used in
__cpumask_to_vpset() comes from. Moreover, '64' means both the maximum
number of sparse banks and the number of vCPUs per bank. Add defines
to make things clear. These defines are also going to be used by KVM.
No functional change.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
include/asm-generic/hyperv-tlfs.h | 5 +++++
include/asm-generic/mshyperv.h | 11 ++++++-----
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index fdce7a4cfc6f..020ca9bdbb79 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -399,6 +399,11 @@ struct hv_vpset {
u64 bank_contents[];
} __packed;
+/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
+#define HV_MAX_SPARSE_VCPU_BANKS (64)
+/* The number of vCPUs in one sparse bank */
+#define HV_VCPUS_PER_SPARSE_BANK (64)
+
/* HvCallSendSyntheticClusterIpi hypercall */
struct hv_send_ipi {
u32 vector;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c08758b6b364..0abe91df1ef6 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
{
int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
int this_cpu = smp_processor_id();
+ int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
- /* valid_bank_mask can represent up to 64 banks */
- if (hv_max_vp_index / 64 >= 64)
+ /* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
+ if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
return 0;
/*
@@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
* structs are not cleared between calls, we risk flushing unneeded
* vCPUs otherwise.
*/
- for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
+ for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
vpset->bank_contents[vcpu_bank] = 0;
/*
@@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
vcpu = hv_cpu_number_to_vp_number(cpu);
if (vcpu == VP_INVAL)
return -1;
- vcpu_bank = vcpu / 64;
- vcpu_offset = vcpu % 64;
+ vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
+ vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
__set_bit(vcpu_offset, (unsigned long *)
&vpset->bank_contents[vcpu_bank]);
if (vcpu_bank >= nr_bank)
--
2.35.1
Get rid of the on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
for the case of a small number of vCPUs in the request. When Hyper-V TLB flush
is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
send IPIs to a large number of vCPUs (and are rarely used in general).
Introduce hv_is_vp_in_sparse_set() to directly check whether the specified
VP_ID is present in the sparse vCPU set.
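As an illustrative worked example of the new helper (not part of the patch):
with banks 0 and 5 valid, VP_ID 325 falls into bank 5 (325 / 64 == 5), which
is the second populated entry of sparse_banks[], and bit 325 % 64 == 5 of that
entry decides the result:
    u64 valid_bank_mask = BIT_ULL(0) | BIT_ULL(5);
    u64 sparse_banks[2] = { 0, BIT_ULL(5) };        /* banks 0 and 5 */
    /* returns true: bit 5 of sparse_banks[1] (bank 5) is set */
    hv_is_vp_in_sparse_set(325, valid_bank_mask, sparse_banks);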
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 3cf68645a2e6..aebbb598ad1d 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1746,6 +1746,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
}
}
+static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
+{
+ int bank, sbank = 0;
+
+ if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
+ (unsigned long *)&valid_bank_mask))
+ return false;
+
+ for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
+ KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
+ if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
+ break;
+ sbank++;
+ }
+
+ return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
+ (unsigned long *)&sparse_banks[sbank]);
+}
+
struct kvm_hv_hcall {
u64 param;
u64 ingpa;
@@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
}
-static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
- unsigned long *vcpu_bitmap)
+static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
+ u64 *sparse_banks, u64 valid_bank_mask)
{
struct kvm_lapic_irq irq = {
.delivery_mode = APIC_DM_FIXED,
@@ -2100,7 +2119,10 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
unsigned long i;
kvm_for_each_vcpu(i, vcpu, kvm) {
- if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
+ if (sparse_banks &&
+ !hv_is_vp_in_sparse_set(kvm_hv_get_vpindex(vcpu),
+ valid_bank_mask,
+ sparse_banks))
continue;
/* We fail only when APIC is disabled */
@@ -2113,7 +2135,6 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
- DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
unsigned long valid_bank_mask;
u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
u32 vector;
@@ -2175,13 +2196,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
- if (all_cpus) {
- kvm_send_ipi_to_many(kvm, vector, NULL);
- } else {
- sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
-
- kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
- }
+ kvm_hv_send_ipi_to_many(kvm, vector, all_cpus ? NULL : sparse_banks, valid_bank_mask);
ret_success:
return HV_STATUS_SUCCESS;
--
2.35.1
From: Wei Liu <[email protected]> Sent: Monday, April 25, 2022 8:47 AM
> On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> > It may not come clear from where the magical '64' value used in
> > __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> > sparse bank number as well as the number of vCPUs per bank. Add defines
> > to make things clear. These defines are also going to be used by KVM.
> >
> > No functional change.
> >
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > ---
> > include/asm-generic/hyperv-tlfs.h | 5 +++++
> > include/asm-generic/mshyperv.h | 11 ++++++-----
> > 2 files changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> > index fdce7a4cfc6f..020ca9bdbb79 100644
> > --- a/include/asm-generic/hyperv-tlfs.h
> > +++ b/include/asm-generic/hyperv-tlfs.h
> > @@ -399,6 +399,11 @@ struct hv_vpset {
> > u64 bank_contents[];
> > } __packed;
> >
> > +/* The maximum number of sparse vCPU banks which can be encoded by 'struct
> hv_vpset' */
> > +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> > +/* The number of vCPUs in one sparse bank */
> > +#define HV_VCPUS_PER_SPARSE_BANK (64)
>
> I think replacing the magic number with a macro is a good thing.
>
> Where do you get these names? Did you make them up yourself?
>
> I'm trying to dig into internal code to find the most appropriate names,
> but I couldn't find any so far. Michael, do you have insight here?
>
> Thanks,
> Wei.
These names look good to me. The "sparse" and "bank" terminology
comes from the Hyper-V TLFS, sections 7.8.7.3 thru 7.8.7.5. The TLFS
uses the constant "64", but for two different purposes as Vitaly
points out. But in both cases the "64" accrues from the use of
a uint64 value as a bitmap.
Michael
On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> It may not come clear from where the magical '64' value used in
> __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> sparse bank number as well as the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
>
> No functional change.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> include/asm-generic/hyperv-tlfs.h | 5 +++++
> include/asm-generic/mshyperv.h | 11 ++++++-----
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
> u64 bank_contents[];
> } __packed;
>
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
I think replacing the magic number with a macro is a good thing.
Where do you get these names? Did you make them up yourself?
I'm trying to dig into internal code to find the most appropriate names,
but I couldn't find any so far. Michael, do you have insight here?
Thanks,
Wei.
Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :
> It may not come clear from where the magical '64' value used in
> __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> sparse bank number as well as the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
>
> No functional change.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> include/asm-generic/hyperv-tlfs.h | 5 +++++
> include/asm-generic/mshyperv.h | 11 ++++++-----
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
> u64 bank_contents[];
> } __packed;
>
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
> /* HvCallSendSyntheticClusterIpi hypercall */
> struct hv_send_ipi {
> u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> {
> int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
> int this_cpu = smp_processor_id();
> + int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>
> - /* valid_bank_mask can represent up to 64 banks */
> - if (hv_max_vp_index / 64 >= 64)
> + /* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> + if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
> return 0;
>
> /*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> * structs are not cleared between calls, we risk flushing unneeded
> * vCPUs otherwise.
> */
> - for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> + for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
> vpset->bank_contents[vcpu_bank] = 0;
and here:
bitmap_clear(vpset->bank_contents, 0, hv_max_vp_index);
or maybe even if it is safe to do so:
bitmap_zero(vpset->bank_contents, hv_max_vp_index);
CJ
>
> /*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> vcpu = hv_cpu_number_to_vp_number(cpu);
> if (vcpu == VP_INVAL)
> return -1;
> - vcpu_bank = vcpu / 64;
> - vcpu_offset = vcpu % 64;
> + vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> + vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
> __set_bit(vcpu_offset, (unsigned long *)
> &vpset->bank_contents[vcpu_bank]);
> if (vcpu_bank >= nr_bank)
Hi,
Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :
> It may not come clear from where the magical '64' value used in
> __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> sparse bank number as well as the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
>
> No functional change.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> include/asm-generic/hyperv-tlfs.h | 5 +++++
> include/asm-generic/mshyperv.h | 11 ++++++-----
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
> u64 bank_contents[];
> } __packed;
>
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
> /* HvCallSendSyntheticClusterIpi hypercall */
> struct hv_send_ipi {
> u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> {
> int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
> int this_cpu = smp_processor_id();
> + int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>
> - /* valid_bank_mask can represent up to 64 banks */
> - if (hv_max_vp_index / 64 >= 64)
> + /* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> + if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
> return 0;
>
> /*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> * structs are not cleared between calls, we risk flushing unneeded
> * vCPUs otherwise.
> */
> - for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> + for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
> vpset->bank_contents[vcpu_bank] = 0;
>
> /*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> vcpu = hv_cpu_number_to_vp_number(cpu);
> if (vcpu == VP_INVAL)
> return -1;
> - vcpu_bank = vcpu / 64;
> - vcpu_offset = vcpu % 64;
> + vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> + vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
> __set_bit(vcpu_offset, (unsigned long *)
> &vpset->bank_contents[vcpu_bank]);
Here, we could also use directly:
__set_bit(vcpu, vpset->bank_contents);
This is simpler, more readable (IMHO) and also makes 'vcpu_offset' useless.
And in case gcc is not able to optimize it by itself, this should also
save a few cycles.
Just my 2c,
CJ
> if (vcpu_bank >= nr_bank)
Christophe JAILLET <[email protected]> writes:
> Le 14/04/2022 à 15:19, Vitaly Kuznetsov a écrit :
...
>> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
>> * structs are not cleared between calls, we risk flushing unneeded
>> * vCPUs otherwise.
>> */
>> - for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
>> + for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
>> vpset->bank_contents[vcpu_bank] = 0;
>
> and here:
> bitmap_clear(vpset->bank_contents, 0, hv_max_vp_index);
> or maybe even if it is safe to do so:
> bitmap_zero(vpset->bank_contents, hv_max_vp_index);
Both your suggestions (including the one for "[PATCH v3 07/34]") look
good to me, thanks! I'd however want to send them to linux-hyperv@
separately when this series lands through KVM tree just to not make this
heavy series even heavier.
--
Vitaly
On Thu, Apr 14, 2022 at 03:19:46PM +0200, Vitaly Kuznetsov wrote:
> It may not come clear from where the magical '64' value used in
> __cpumask_to_vpset() come from. Moreover, '64' means both the maximum
> sparse bank number as well as the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
>
> No functional change.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
Acked-by: Wei Liu <[email protected]>
Vitaly Kuznetsov <[email protected]> writes:
> Changes since v1:
This should've been 'since v2', obviously.
...
>
> Currently, KVM handles HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} requests
> by flushing the whole VPID and this is sub-optimal. This series introduces
> the required mechanism to make handling of these requests more
> fine-grained by flushing individual GVAs only (when requested). On this
> foundation, "Direct Virtual Flush" Hyper-V feature is implemented. The
> feature allows L0 to handle Hyper-V TLB flush hypercalls directly at
> L0 without the need to reflect the exit to L1. This has at least two
> benefits: reflecting vmexit and the consequent vmenter are avoided + L0
> has precise information whether the target vCPU is actually running (and
> thus requires a kick).
FWIW, patches still apply cleanly to kvm/queue so probably there's no
need to resend.
--
Vitaly
On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled
> both in extended 'nested controls' in VMCB and partition assist page.
> According to Hyper-V TLFS, synthetic vmexit to L1 is performed with
> - HV_SVM_EXITCODE_ENL exit_code.
> - HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/svm/hyperv.c | 7 +++++++
> arch/x86/kvm/svm/hyperv.h | 19 +++++++++++++++++++
> arch/x86/kvm/svm/nested.c | 22 +++++++++++++++++++++-
> 3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/hyperv.c b/arch/x86/kvm/svm/hyperv.c
> index c0749fc282fe..3842548bb88c 100644
> --- a/arch/x86/kvm/svm/hyperv.c
> +++ b/arch/x86/kvm/svm/hyperv.c
> @@ -8,4 +8,11 @@
>
> void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu)
> {
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + svm->vmcb->control.exit_code = HV_SVM_EXITCODE_ENL;
> + svm->vmcb->control.exit_code_hi = 0;
> + svm->vmcb->control.exit_info_1 = HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH;
> + svm->vmcb->control.exit_info_2 = 0;
> + nested_svm_vmexit(svm);
> }
> diff --git a/arch/x86/kvm/svm/hyperv.h b/arch/x86/kvm/svm/hyperv.h
> index a2b0d7580b0d..cd33e89f9f61 100644
> --- a/arch/x86/kvm/svm/hyperv.h
> +++ b/arch/x86/kvm/svm/hyperv.h
> @@ -33,6 +33,9 @@ struct hv_enlightenments {
> */
> #define VMCB_HV_NESTED_ENLIGHTENMENTS VMCB_SW
>
> +#define HV_SVM_EXITCODE_ENL 0xF0000000
> +#define HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH (1)
> +
> static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> @@ -48,6 +51,22 @@ static inline void nested_svm_hv_update_vm_vp_ids(struct kvm_vcpu *vcpu)
> hv_vcpu->nested.vp_id = hve->hv_vp_id;
> }
>
> +static inline bool nested_svm_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> + struct hv_enlightenments *hve =
> + (struct hv_enlightenments *)svm->nested.ctl.reserved_sw;
> + struct hv_vp_assist_page assist_page;
> +
> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> + return false;
> +
> + if (!hve->hv_enlightenments_control.nested_flush_hypercall)
> + return false;
> +
> + return assist_page.nested_control.features.directhypercall;
> +}
> +
> void svm_post_hv_l2_tlb_flush(struct kvm_vcpu *vcpu);
>
> #endif /* __ARCH_X86_KVM_SVM_HYPERV_H__ */
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index de3f27301b5c..a6d9807c09b1 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -172,7 +172,8 @@ void recalc_intercepts(struct vcpu_svm *svm)
> }
>
> /* We don't want to see VMMCALLs from a nested guest */
Minor nitpick: Maybe update the comment?
> - vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> + if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
> + vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
>
> for (i = 0; i < MAX_INTERCEPT; i++)
> c->intercepts[i] |= g->intercepts[i];
> @@ -488,6 +489,17 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
>
> static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
> {
> + /*
> + * KVM_REQ_HV_TLB_FLUSH flushes entries from either L1's VP_ID or
> + * L2's VP_ID upon request from the guest. Make sure we check for
> + * pending entries for the case when the request got misplaced (e.g.
> + * a transition from L2->L1 happened while processing L2 TLB flush
> + * request or vice versa). kvm_hv_vcpu_flush_tlb() will not flush
> + * anything if there are no requests in the corresponding buffer.
> + */
> + if (to_hv_vcpu(vcpu))
> + kvm_make_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
> +
> /*
> * TODO: optimize unconditional TLB flush/MMU sync. A partial list of
> * things to fix before this can be conditional:
> @@ -1357,6 +1369,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
> int nested_svm_exit_special(struct vcpu_svm *svm)
> {
> u32 exit_code = svm->vmcb->control.exit_code;
> + struct kvm_vcpu *vcpu = &svm->vcpu;
>
> switch (exit_code) {
> case SVM_EXIT_INTR:
> @@ -1375,6 +1388,13 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
> return NESTED_EXIT_HOST;
> break;
> }
> + case SVM_EXIT_VMMCALL:
> + /* Hyper-V L2 TLB flush hypercall is handled by L0 */
> + if (kvm_hv_l2_tlb_flush_exposed(vcpu) &&
> + nested_svm_l2_tlb_flush_enabled(vcpu) &&
> + kvm_hv_is_tlb_flush_hcall(vcpu))
> + return NESTED_EXIT_HOST;
> + break;
> default:
> break;
> }
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Extended GVA ranges support bit seems to indicate whether lower 12
> bits of GVA can be used to specify up to 4095 additional consequent
> GVAs to flush. This is somewhat described in TLFS.
>
> Previously, KVM was handling HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX}
> requests by flushing the whole VPID so technically, extended GVA
> ranges were already supported. As such requests are handled more
> gently now, advertizing support for extended ranges starts making
> sense to reduce the size of TLB flush requests.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 2 ++
> arch/x86/kvm/hyperv.c | 1 +
> 2 files changed, 3 insertions(+)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 0a9407dc0859..5225a85c08c3 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -61,6 +61,8 @@
> #define HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE BIT(10)
> /* Support for debug MSRs available */
> #define HV_FEATURE_DEBUG_MSRS_AVAILABLE BIT(11)
> +/* Support for extended gva ranges for flush hypercalls available */
> +#define HV_FEATURE_EXT_GVA_RANGES_FLUSH BIT(14)
> /*
> * Support for returning hypercall output block via XMM
> * registers is available
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 759e1a16e5c3..1a6f9628cee9 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2702,6 +2702,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> ent->ebx |= HV_DEBUGGING;
> ent->edx |= HV_X64_GUEST_DEBUGGING_AVAILABLE;
> ent->edx |= HV_FEATURE_DEBUG_MSRS_AVAILABLE;
> + ent->edx |= HV_FEATURE_EXT_GVA_RANGES_FLUSH;
>
> /*
> * Direct Synthetic timers only make sense with in-kernel
I do think that we need to ask Microsoft to document this,
since from the spec (v6.0b) the only mention of this is
"Bit 14: ExtendedGvaRangesForFlushVirtualAddressListAvailable"
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> In preparation to testing Hyper-V L2 TLB flush hypercalls, allocate
> so-called Partition assist page and link it to 'struct vmx_pages'.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> tools/testing/selftests/kvm/include/x86_64/vmx.h | 4 ++++
> tools/testing/selftests/kvm/lib/x86_64/vmx.c | 7 +++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> index 583ceb0d1457..f99922ca8259 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
> @@ -567,6 +567,10 @@ struct vmx_pages {
> uint64_t enlightened_vmcs_gpa;
> void *enlightened_vmcs;
>
> + void *partition_assist_hva;
> + uint64_t partition_assist_gpa;
> + void *partition_assist;
> +
> void *eptp_hva;
> uint64_t eptp_gpa;
> void *eptp;
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> index d089d8b850b5..3db21e0e1a8f 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
> @@ -124,6 +124,13 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
> vmx->enlightened_vmcs_gpa =
> addr_gva2gpa(vm, (uintptr_t)vmx->enlightened_vmcs);
>
> + /* Setup of a region of guest memory for the partition assist page. */
> + vmx->partition_assist = (void *)vm_vaddr_alloc_page(vm);
> + vmx->partition_assist_hva =
> + addr_gva2hva(vm, (uintptr_t)vmx->partition_assist);
> + vmx->partition_assist_gpa =
> + addr_gva2gpa(vm, (uintptr_t)vmx->partition_assist);
> +
> *p_vmx_gva = vmx_gva;
> return vmx;
> }
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> To handle L2 TLB flush requests, KVM needs to keep track of L2's VM_ID/
> VP_IDs which are set by L1 hypervisor. 'Partition assist page' address is
> also needed to handle post-flush exit to L1 upon request.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 6 ++++++
> arch/x86/kvm/vmx/nested.c | 15 +++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 837c07e213de..8b2a52bf26c0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -622,6 +622,12 @@ struct kvm_vcpu_hv {
>
> /* Preallocated buffer for handling hypercalls passing sparse vCPU set */
> u64 sparse_banks[64];
> +
> + struct {
> + u64 pa_page_gpa;
> + u64 vm_id;
> + u32 vp_id;
> + } nested;
> };
>
> /* Xen HVM per vcpu emulation context */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index a6688663da4d..ee88921c6156 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -225,6 +225,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
>
> static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
> {
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) {
> @@ -233,6 +234,12 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
> }
>
> vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID;
> +
> + if (hv_vcpu) {
> + hv_vcpu->nested.pa_page_gpa = INVALID_GPA;
> + hv_vcpu->nested.vm_id = 0;
> + hv_vcpu->nested.vp_id = 0;
> + }
> }
>
> static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
> @@ -1591,11 +1598,19 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields
> {
> struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12;
> struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs;
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu);
>
> /* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */
> vmcs12->tpr_threshold = evmcs->tpr_threshold;
> vmcs12->guest_rip = evmcs->guest_rip;
>
> + if (unlikely(!(hv_clean_fields &
> + HV_VMX_ENLIGHTENED_CLEAN_FIELD_ENLIGHTENMENTSCONTROL))) {
> + hv_vcpu->nested.pa_page_gpa = evmcs->partition_assist_page;
> + hv_vcpu->nested.vm_id = evmcs->hv_vm_id;
> + hv_vcpu->nested.vp_id = evmcs->hv_vp_id;
> + }
> +
> if (unlikely(!(hv_clean_fields &
> HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_BASIC))) {
> vmcs12->guest_rsp = evmcs->guest_rsp;
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Get rid of on-stack allocation of vcpu_mask and optimize kvm_hv_send_ipi()
> for a smaller number of vCPUs in the request. When Hyper-V TLB flush
> is in use, HvSendSyntheticClusterIpi{,Ex} calls are not commonly used to
> send IPIs to a large number of vCPUs (and are rarely used in general).
>
> Introduce hv_is_vp_in_sparse_set() to directly check if the specified
> VP_ID is present in sparse vCPU set.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/hyperv.c | 37 ++++++++++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 3cf68645a2e6..aebbb598ad1d 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1746,6 +1746,25 @@ static void sparse_set_to_vcpu_mask(struct kvm *kvm, u64 *sparse_banks,
> }
> }
>
> +static bool hv_is_vp_in_sparse_set(u32 vp_id, u64 valid_bank_mask, u64 sparse_banks[])
> +{
> + int bank, sbank = 0;
> +
> + if (!test_bit(vp_id / HV_VCPUS_PER_SPARSE_BANK,
> + (unsigned long *)&valid_bank_mask))
> + return false;
> +
> + for_each_set_bit(bank, (unsigned long *)&valid_bank_mask,
> + KVM_HV_MAX_SPARSE_VCPU_SET_BITS) {
> + if (bank == vp_id / HV_VCPUS_PER_SPARSE_BANK)
> + break;
> + sbank++;
> + }
> +
> + return test_bit(vp_id % HV_VCPUS_PER_SPARSE_BANK,
> + (unsigned long *)&sparse_banks[sbank]);
> +}
> +
> struct kvm_hv_hcall {
> u64 param;
> u64 ingpa;
> @@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> ((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
> }
>
> -static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
> - unsigned long *vcpu_bitmap)
> +static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
> + u64 *sparse_banks, u64 valid_bank_mask)
I think the indentation is wrong here (was wrong before as well)
> {
> struct kvm_lapic_irq irq = {
> .delivery_mode = APIC_DM_FIXED,
> @@ -2100,7 +2119,10 @@ static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
> unsigned long i;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> - if (vcpu_bitmap && !test_bit(i, vcpu_bitmap))
> + if (sparse_banks &&
> + !hv_is_vp_in_sparse_set(kvm_hv_get_vpindex(vcpu),
> + valid_bank_mask,
> + sparse_banks))
> continue;
>
> /* We fail only when APIC is disabled */
> @@ -2113,7 +2135,6 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> struct kvm *kvm = vcpu->kvm;
> struct hv_send_ipi_ex send_ipi_ex;
> struct hv_send_ipi send_ipi;
> - DECLARE_BITMAP(vcpu_mask, KVM_MAX_VCPUS);
> unsigned long valid_bank_mask;
> u64 sparse_banks[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
> u32 vector;
> @@ -2175,13 +2196,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
>
> - if (all_cpus) {
> - kvm_send_ipi_to_many(kvm, vector, NULL);
> - } else {
> - sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask, vcpu_mask);
> -
> - kvm_send_ipi_to_many(kvm, vector, vcpu_mask);
> - }
> + kvm_hv_send_ipi_to_many(kvm, vector, all_cpus ? NULL : sparse_banks, valid_bank_mask);
>
> ret_success:
> return HV_STATUS_SUCCESS;
Overall looks good to me, but I might have missed something.
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Section 1.9 of TLFS v6.0b says:
>
> "All structures are padded in such a way that fields are aligned
> naturally (that is, an 8-byte field is aligned to an offset of 8 bytes
> and so on)".
>
> 'struct hv_enlightened_vmcs' has a glitch:
>
> ...
> struct {
> u32 nested_flush_hypercall:1; /* 836: 0 4 */
> u32 msr_bitmap:1; /* 836: 1 4 */
> u32 reserved:30; /* 836: 2 4 */
> } hv_enlightenments_control; /* 836 4 */
> u32 hv_vp_id; /* 840 4 */
> u64 hv_vm_id; /* 844 8 */
> u64 partition_assist_page; /* 852 8 */
> ...
>
> And the observed values in 'partition_assist_page' make no sense at
> all. Fix the layout by padding the structure properly.
>
> Fixes: 68d1eb72ee99 ("x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits")
> Reviewed-by: Michael Kelley <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index 5225a85c08c3..e7ddae8e02c6 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -548,7 +548,7 @@ struct hv_enlightened_vmcs {
> u64 guest_rip;
>
> u32 hv_clean_fields;
> - u32 hv_padding_32;
> + u32 padding32_1;
> u32 hv_synthetic_controls;
> struct {
> u32 nested_flush_hypercall:1;
> @@ -556,7 +556,7 @@ struct hv_enlightened_vmcs {
> u32 reserved:30;
> } __packed hv_enlightenments_control;
> u32 hv_vp_id;
> -
> + u32 padding32_2;
> u64 hv_vm_id;
> u64 partition_assist_page;
> u64 padding64_4[4];
Makes sense.
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
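
As a side note, the alignment problem the patch fixes can be reproduced with a much
smaller illustrative struct (hypothetical names, not the real eVMCS layout): packing
a u64 right after a lone u32 puts it at offset 4, while an explicit u32 pad restores
the natural 8-byte alignment the TLFS requires.

#include <stddef.h>
#include <stdint.h>

struct no_pad {
	uint32_t id;
	uint64_t addr;
} __attribute__((packed));

struct with_pad {
	uint32_t id;
	uint32_t padding32;
	uint64_t addr;
} __attribute__((packed));

_Static_assert(offsetof(struct no_pad, addr) == 4,
	       "without padding the u64 lands at offset 4");
_Static_assert(offsetof(struct with_pad, addr) == 8,
	       "an explicit u32 pad restores natural 8-byte alignment");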
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> It may not be clear where the magical '64' value used in
> __cpumask_to_vpset() comes from. Moreover, '64' denotes both the maximum
> sparse bank number and the number of vCPUs per bank. Add defines
> to make things clear. These defines are also going to be used by KVM.
>
> No functional change.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> include/asm-generic/hyperv-tlfs.h | 5 +++++
> include/asm-generic/mshyperv.h | 11 ++++++-----
> 2 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdce7a4cfc6f..020ca9bdbb79 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -399,6 +399,11 @@ struct hv_vpset {
> u64 bank_contents[];
> } __packed;
>
> +/* The maximum number of sparse vCPU banks which can be encoded by 'struct hv_vpset' */
> +#define HV_MAX_SPARSE_VCPU_BANKS (64)
> +/* The number of vCPUs in one sparse bank */
> +#define HV_VCPUS_PER_SPARSE_BANK (64)
> +
> /* HvCallSendSyntheticClusterIpi hypercall */
> struct hv_send_ipi {
> u32 vector;
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index c08758b6b364..0abe91df1ef6 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -214,9 +214,10 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> {
> int cpu, vcpu, vcpu_bank, vcpu_offset, nr_bank = 1;
> int this_cpu = smp_processor_id();
> + int max_vcpu_bank = hv_max_vp_index / HV_VCPUS_PER_SPARSE_BANK;
>
> - /* valid_bank_mask can represent up to 64 banks */
> - if (hv_max_vp_index / 64 >= 64)
> + /* vpset.valid_bank_mask can represent up to HV_MAX_SPARSE_VCPU_BANKS banks */
> + if (max_vcpu_bank >= HV_MAX_SPARSE_VCPU_BANKS)
> return 0;
>
> /*
> @@ -224,7 +225,7 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> * structs are not cleared between calls, we risk flushing unneeded
> * vCPUs otherwise.
> */
> - for (vcpu_bank = 0; vcpu_bank <= hv_max_vp_index / 64; vcpu_bank++)
> + for (vcpu_bank = 0; vcpu_bank <= max_vcpu_bank; vcpu_bank++)
> vpset->bank_contents[vcpu_bank] = 0;
>
> /*
> @@ -236,8 +237,8 @@ static inline int __cpumask_to_vpset(struct hv_vpset *vpset,
> vcpu = hv_cpu_number_to_vp_number(cpu);
> if (vcpu == VP_INVAL)
> return -1;
> - vcpu_bank = vcpu / 64;
> - vcpu_offset = vcpu % 64;
> + vcpu_bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;
> + vcpu_offset = vcpu % HV_VCPUS_PER_SPARSE_BANK;
> __set_bit(vcpu_offset, (unsigned long *)
> &vpset->bank_contents[vcpu_bank]);
> if (vcpu_bank >= nr_bank)
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
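
For illustration, this is roughly how a VP index decomposes with the new constants
(a standalone sketch under the assumption that both values stay 64, mirroring the
bank/offset math in __cpumask_to_vpset(); not the kernel code itself):

#include <stdint.h>

#define HV_MAX_SPARSE_VCPU_BANKS	64
#define HV_VCPUS_PER_SPARSE_BANK	64

/* Set one VP index in a vpset-style bank array. */
static void vpset_set(uint64_t bank_contents[HV_MAX_SPARSE_VCPU_BANKS], uint32_t vcpu)
{
	uint32_t bank = vcpu / HV_VCPUS_PER_SPARSE_BANK;   /* which u64 bank */
	uint32_t offset = vcpu % HV_VCPUS_PER_SPARSE_BANK; /* bit within the bank */

	bank_contents[bank] |= 1ULL << offset;
}

E.g. VP index 130 lands in bank 2, bit 2, and the encoding can address at most
64 * 64 = 4096 VP indices.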
On Thu, 2022-04-14 at 15:20 +0200, Vitaly Kuznetsov wrote:
> set_xmm()/get_xmm() helpers are fairly useless as they only read/write 64 bits
> of the 128-bit registers. Moreover, these helpers are not used. Borrow
> _kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM, limiting them to
> XMM0-XMM7 for now.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> .../selftests/kvm/include/x86_64/processor.h | 70 ++++++++++---------
> 1 file changed, 36 insertions(+), 34 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 37db341d4cc5..9ad7602a257b 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -296,71 +296,73 @@ static inline void cpuid(uint32_t *eax, uint32_t *ebx,
> : "memory");
> }
>
> -#define SET_XMM(__var, __xmm) \
> - asm volatile("movq %0, %%"#__xmm : : "r"(__var) : #__xmm)
> +typedef u32 __attribute__((vector_size(16))) sse128_t;
> +#define __sse128_u union { sse128_t vec; u64 as_u64[2]; u32 as_u32[4]; }
> +#define sse128_lo(x) ({ __sse128_u t; t.vec = x; t.as_u64[0]; })
> +#define sse128_hi(x) ({ __sse128_u t; t.vec = x; t.as_u64[1]; })
>
> -static inline void set_xmm(int n, unsigned long val)
> +static inline void read_sse_reg(int reg, sse128_t *data)
> {
> - switch (n) {
> + switch (reg) {
> case 0:
> - SET_XMM(val, xmm0);
> + asm("movdqa %%xmm0, %0" : "=m"(*data));
> break;
> case 1:
> - SET_XMM(val, xmm1);
> + asm("movdqa %%xmm1, %0" : "=m"(*data));
> break;
> case 2:
> - SET_XMM(val, xmm2);
> + asm("movdqa %%xmm2, %0" : "=m"(*data));
> break;
> case 3:
> - SET_XMM(val, xmm3);
> + asm("movdqa %%xmm3, %0" : "=m"(*data));
> break;
> case 4:
> - SET_XMM(val, xmm4);
> + asm("movdqa %%xmm4, %0" : "=m"(*data));
> break;
> case 5:
> - SET_XMM(val, xmm5);
> + asm("movdqa %%xmm5, %0" : "=m"(*data));
> break;
> case 6:
> - SET_XMM(val, xmm6);
> + asm("movdqa %%xmm6, %0" : "=m"(*data));
> break;
> case 7:
> - SET_XMM(val, xmm7);
> + asm("movdqa %%xmm7, %0" : "=m"(*data));
> break;
> + default:
> + BUG();
> }
> }
>
> -#define GET_XMM(__xmm) \
> -({ \
> - unsigned long __val; \
> - asm volatile("movq %%"#__xmm", %0" : "=r"(__val)); \
> - __val; \
> -})
> -
> -static inline unsigned long get_xmm(int n)
> +static inline void write_sse_reg(int reg, const sse128_t *data)
> {
> - assert(n >= 0 && n <= 7);
> -
> - switch (n) {
> + switch (reg) {
> case 0:
> - return GET_XMM(xmm0);
> + asm("movdqa %0, %%xmm0" : : "m"(*data));
> + break;
> case 1:
> - return GET_XMM(xmm1);
> + asm("movdqa %0, %%xmm1" : : "m"(*data));
> + break;
> case 2:
> - return GET_XMM(xmm2);
> + asm("movdqa %0, %%xmm2" : : "m"(*data));
> + break;
> case 3:
> - return GET_XMM(xmm3);
> + asm("movdqa %0, %%xmm3" : : "m"(*data));
> + break;
> case 4:
> - return GET_XMM(xmm4);
> + asm("movdqa %0, %%xmm4" : : "m"(*data));
> + break;
> case 5:
> - return GET_XMM(xmm5);
> + asm("movdqa %0, %%xmm5" : : "m"(*data));
> + break;
> case 6:
> - return GET_XMM(xmm6);
> + asm("movdqa %0, %%xmm6" : : "m"(*data));
> + break;
> case 7:
> - return GET_XMM(xmm7);
> + asm("movdqa %0, %%xmm7" : : "m"(*data));
> + break;
> + default:
> + BUG();
> }
> -
> - /* never reached */
> - return 0;
> }
>
> static inline void cpu_relax(void)
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
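
A possible guest-side round-trip with the new helpers could look like the following
(a hypothetical snippet, assuming the selftests GUEST_ASSERT() macro and the sse128_*
accessors introduced above are in scope via processor.h):

static void guest_xmm_roundtrip(void)
{
	sse128_t val = { 0x11111111, 0x22222222, 0x33333333, 0x44444444 };
	sse128_t readback;

	write_sse_reg(0, &val);       /* load the full 128 bits into XMM0 */
	read_sse_reg(0, &readback);   /* store XMM0 back to memory */
	GUEST_ASSERT(sse128_lo(readback) == sse128_lo(val) &&
		     sse128_hi(readback) == sse128_hi(val));
}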
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> From: Sean Christopherson <[email protected]>
>
> Move the guts of kvm_get_sparse_vp_set() to a helper so that the code for
> reading a guest-provided array can be reused in the future, e.g. for
> getting a list of virtual addresses whose TLB entries need to be flushed.
>
> Opportunistically swap the order of the data and XMM adjustment so that
> the XMM/gpa offsets are bundled together.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/hyperv.c | 53 +++++++++++++++++++++++++++----------------
> 1 file changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index fb716cf919ed..d66c27fd1e8a 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1758,38 +1758,51 @@ struct kvm_hv_hcall {
> sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];
> };
>
> -static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> - int consumed_xmm_halves,
> - u64 *sparse_banks, gpa_t offset)
> -{
> - u16 var_cnt;
> - int i;
>
> - if (hc->var_cnt > 64)
> - return -EINVAL;
> -
> - /* Ignore banks that cannot possibly contain a legal VP index. */
> - var_cnt = min_t(u16, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS);
> +static int kvm_hv_get_hc_data(struct kvm *kvm, struct kvm_hv_hcall *hc,
> + u16 orig_cnt, u16 cnt_cap, u64 *data,
> + int consumed_xmm_halves, gpa_t offset)
> +{
> + /*
> + * Preserve the original count when ignoring entries via a "cap", KVM
> + * still needs to validate the guest input (though the non-XMM path
> + * punts on the checks).
> + */
> + u16 cnt = min(orig_cnt, cnt_cap);
> + int i, j;
>
> if (hc->fast) {
> /*
> * Each XMM holds two sparse banks, but do not count halves that
> * have already been consumed for hypercall parameters.
> */
> - if (hc->var_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
> + if (orig_cnt > 2 * HV_HYPERCALL_MAX_XMM_REGISTERS - consumed_xmm_halves)
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
> - for (i = 0; i < var_cnt; i++) {
> - int j = i + consumed_xmm_halves;
> +
> + for (i = 0; i < cnt; i++) {
> + j = i + consumed_xmm_halves;
> if (j % 2)
> - sparse_banks[i] = sse128_hi(hc->xmm[j / 2]);
> + data[i] = sse128_hi(hc->xmm[j / 2]);
> else
> - sparse_banks[i] = sse128_lo(hc->xmm[j / 2]);
> + data[i] = sse128_lo(hc->xmm[j / 2]);
> }
> return 0;
> }
>
> - return kvm_read_guest(kvm, hc->ingpa + offset, sparse_banks,
> - var_cnt * sizeof(*sparse_banks));
> + return kvm_read_guest(kvm, hc->ingpa + offset, data,
> + cnt * sizeof(*data));
> +}
> +
> +static u64 kvm_get_sparse_vp_set(struct kvm *kvm, struct kvm_hv_hcall *hc,
> + u64 *sparse_banks, int consumed_xmm_halves,
> + gpa_t offset)
> +{
> + if (hc->var_cnt > 64)
> + return -EINVAL;
> +
> + /* Cap var_cnt to ignore banks that cannot contain a legal VP index. */
> + return kvm_hv_get_hc_data(kvm, hc, hc->var_cnt, KVM_HV_MAX_SPARSE_VCPU_SET_BITS,
> + sparse_banks, consumed_xmm_halves, offset);
> }
>
> static inline int hv_tlb_flush_ring_free(struct kvm_vcpu_hv *hv_vcpu,
> @@ -1937,7 +1950,7 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> if (!hc->var_cnt)
> goto ret_success;
>
> - if (kvm_get_sparse_vp_set(kvm, hc, 2, sparse_banks,
> + if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 2,
> offsetof(struct hv_tlb_flush_ex,
> hv_vp_set.bank_contents)))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
> @@ -2048,7 +2061,7 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> if (!hc->var_cnt)
> goto ret_success;
>
> - if (kvm_get_sparse_vp_set(kvm, hc, 1, sparse_banks,
> + if (kvm_get_sparse_vp_set(kvm, hc, sparse_banks, 1,
> offsetof(struct hv_send_ipi_ex,
> vp_set.bank_contents)))
> return HV_STATUS_INVALID_HYPERCALL_INPUT;
I don't see anything wrong, but I don't know this area that well, so I might have
missed something.
Best regards,
Maxim Levitsky
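
To make the fast-path indexing easier to follow, here is a standalone restatement of
the XMM-half math (a sketch, not the kernel code; sse128_t and the sse128_lo()/hi()
accessors are assumed to match the KVM definitions): data item i lives in 64-bit half
j = i + consumed_xmm_halves, i.e. register j / 2, high half when j is odd.

static uint64_t hc_data_from_xmm(const sse128_t *xmm, int i, int consumed_xmm_halves)
{
	/* Skip halves already used for the hypercall's fixed parameters. */
	int j = i + consumed_xmm_halves;

	return (j % 2) ? sse128_hi(xmm[j / 2]) : sse128_lo(xmm[j / 2]);
}

With consumed_xmm_halves == 2, data[0] comes from the low half of XMM1, data[1] from
the high half of XMM1, and so on.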
On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
> from L2 in L0 to process L2 TLB flush requests.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/hyperv.c | 6 ++++++
> arch/x86/kvm/hyperv.h | 7 +++++++
> 3 files changed, 14 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ce62fde5f4ff..168600490bd1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
> u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
> u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
> u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
> + u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
> } cpuid_cache;
>
> struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 79aabe0c33ec..68a0df4e3f66 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
> hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
> else
> hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
> +
> + entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
> + if (entry)
> + hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
> + else
> + hv_vcpu->cpuid_cache.nested_features_eax = 0;
> }
>
> int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index f593c9fd1dee..d8cb6d70dbc8 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
> tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> }
>
> +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> +
> + return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
> +}
Tiny nitpick (feel free to ignore): maybe use 'supported' instead of 'exposed',
as we don't use this term in KVM often.
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
> +
> static inline bool kvm_hv_is_tlb_flush_hcall(struct kvm_vcpu *vcpu)
> {
> struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
On Wed, May 11, 2022, Maxim Levitsky wrote:
> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > @@ -2089,8 +2108,8 @@ static u64 kvm_hv_flush_tlb(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
> > ((u64)hc->rep_cnt << HV_HYPERCALL_REP_COMP_OFFSET);
> > }
> >
> > -static void kvm_send_ipi_to_many(struct kvm *kvm, u32 vector,
> > - unsigned long *vcpu_bitmap)
> > +static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,
> > + u64 *sparse_banks, u64 valid_bank_mask)
> I think the indentation is wrong here (was wrong before as well)
It's correct; the "+" from the diff/patch misaligns the first line because there's
no tab to eat the extra character. Amusingly, the misalignment just gets worse the
more ">" / quotes that get added to the front.
I usually end up applying a patch to double check if I suspect indentation is
wrong; it's too hard for me to tell based on the raw patch alone unless it's super
bad/obvious.
On Thu, 2022-05-19 at 15:25 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky <[email protected]> writes:
>
> > On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
> > > Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
> > > from L2 in L0 to process L2 TLB flush requests.
> > >
> > > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 1 +
> > > arch/x86/kvm/hyperv.c | 6 ++++++
> > > arch/x86/kvm/hyperv.h | 7 +++++++
> > > 3 files changed, 14 insertions(+)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index ce62fde5f4ff..168600490bd1 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
> > > u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
> > > u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
> > > u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
> > > + u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
> > > } cpuid_cache;
> > >
> > > struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
> > > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > > index 79aabe0c33ec..68a0df4e3f66 100644
> > > --- a/arch/x86/kvm/hyperv.c
> > > +++ b/arch/x86/kvm/hyperv.c
> > > @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
> > > hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
> > > else
> > > hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
> > > +
> > > + entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
> > > + if (entry)
> > > + hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
> > > + else
> > > + hv_vcpu->cpuid_cache.nested_features_eax = 0;
> > > }
> > >
> > > int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
> > > diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> > > index f593c9fd1dee..d8cb6d70dbc8 100644
> > > --- a/arch/x86/kvm/hyperv.h
> > > +++ b/arch/x86/kvm/hyperv.h
> > > @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
> > > tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
> > > }
> > >
> > > +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
> > > +{
> > > + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> > > +
> > > + return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
> > > +}
> >
> > Tiny nitpick (feel free to ignore): maybe use 'supported' instead of 'exposed',
> > as we don't use this term in KVM often.
> >
>
> Indeed we don't. Basically, this is guest_cpuid_has() but for a Hyper-V
> bit. I don't quite like 'supported' because we don't actually check
> whether KVM or even the L1 guest 'supports' this feature; we check
> whether the feature was 'exposed' to L1 so it can actually use it. I'm
> going to rename this to
>
> guest_hv_cpuid_has_l2_tlb_flush()
Sounds perfect!
Best regards,
Maxim Levitsky
>
> then.
>
> > Reviewed-by: Maxim Levitsky <[email protected]>
> >
>
> Thanks!
>
Maxim Levitsky <[email protected]> writes:
> On Thu, 2022-04-14 at 15:19 +0200, Vitaly Kuznetsov wrote:
>> Introduce a helper to quickly check if KVM needs to handle VMCALL/VMMCALL
>> from L2 in L0 to process L2 TLB flush requests.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/hyperv.c | 6 ++++++
>> arch/x86/kvm/hyperv.h | 7 +++++++
>> 3 files changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ce62fde5f4ff..168600490bd1 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -616,6 +616,7 @@ struct kvm_vcpu_hv {
>> u32 enlightenments_eax; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EAX */
>> u32 enlightenments_ebx; /* HYPERV_CPUID_ENLIGHTMENT_INFO.EBX */
>> u32 syndbg_cap_eax; /* HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES.EAX */
>> + u32 nested_features_eax; /* HYPERV_CPUID_NESTED_FEATURES.EAX */
>> } cpuid_cache;
>>
>> struct kvm_vcpu_hv_tlb_flush_ring tlb_flush_ring[HV_NR_TLB_FLUSH_RINGS];
>> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
>> index 79aabe0c33ec..68a0df4e3f66 100644
>> --- a/arch/x86/kvm/hyperv.c
>> +++ b/arch/x86/kvm/hyperv.c
>> @@ -2281,6 +2281,12 @@ void kvm_hv_set_cpuid(struct kvm_vcpu *vcpu)
>> hv_vcpu->cpuid_cache.syndbg_cap_eax = entry->eax;
>> else
>> hv_vcpu->cpuid_cache.syndbg_cap_eax = 0;
>> +
>> + entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_NESTED_FEATURES, 0);
>> + if (entry)
>> + hv_vcpu->cpuid_cache.nested_features_eax = entry->eax;
>> + else
>> + hv_vcpu->cpuid_cache.nested_features_eax = 0;
>> }
>>
>> int kvm_hv_set_enforce_cpuid(struct kvm_vcpu *vcpu, bool enforce)
>> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
>> index f593c9fd1dee..d8cb6d70dbc8 100644
>> --- a/arch/x86/kvm/hyperv.h
>> +++ b/arch/x86/kvm/hyperv.h
>> @@ -168,6 +168,13 @@ static inline void kvm_hv_vcpu_empty_flush_tlb(struct kvm_vcpu *vcpu)
>> tlb_flush_ring->read_idx = tlb_flush_ring->write_idx;
>> }
>>
>> +static inline bool kvm_hv_l2_tlb_flush_exposed(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
>> +
>> + return hv_vcpu && (hv_vcpu->cpuid_cache.nested_features_eax & HV_X64_NESTED_DIRECT_FLUSH);
>> +}
>
> Tiny nitpick (feel free to ignore): maybe use 'supported' instead of 'exposed',
> as we don't use this term in KVM often.
>
Indeed we don't. Basically, this is guest_cpuid_has() but for a Hyper-V
bit. I don't quite like 'supported' because we don't actually check
whether KVM or even the L1 guest 'supports' this feature; we check
whether the feature was 'exposed' to L1 so it can actually use it. I'm
going to rename this to
guest_hv_cpuid_has_l2_tlb_flush()
then.
> Reviewed-by: Maxim Levitsky <[email protected]>
>
Thanks!
--
Vitaly