2018-10-13 14:56:00

by Tianyu Lan

Subject: [PATCH V4 00/15] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

From: Lan Tianyu <[email protected]>

For nested memory virtualization, Hyper-V does not write-protect the L1
hypervisor's EPT page directory and page table nodes to track changes;
instead it relies on the guest to report changes via the
HvFlushGuestAddressList hypercall. The HvFlushGuestAddressList hypercall
provides a way to flush EPT page table entries for ranges specified by
the L1 hypervisor.

If the L1 hypervisor uses INVEPT or the HvFlushGuestAddressSpace hypercall
to flush the EPT TLB, Hyper-V invalidates the associated EPT shadow page
table and resynchronizes it with L1's EPT table on the next EPT page fault.
The HvFlushGuestAddressList hypercall helps to avoid such redundant EPT
page faults and shadow page table synchronization.
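
Each entry passed to HvFlushGuestAddressList encodes a base guest frame
number plus a count of additional pages, using the layout that patch 12/15
adds to hyperv-tlfs.h (union hv_gpa_page_range). A minimal sketch, with
illustrative values only:

    /*
     * Entry layout from patch 12/15; the example values below are
     * illustrative and not taken from the series.
     */
    union hv_gpa_page_range {
    	u64 address_space;
    	struct {
    		u64 additional_pages:11;	/* pages beyond the first */
    		u64 largepage:1;
    		u64 basepfn:52;
    	} page;
    };

    /* Describe a flush of 10 guest pages starting at GFN 0x1000. */
    static void example_fill_entry(union hv_gpa_page_range *e)
    {
    	e->page.basepfn = 0x1000;
    	e->page.additional_pages = 9;	/* first page + 9 more = 10 pages */
    	e->page.largepage = 0;
    }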


Change since v3:
1) Remove code of updating "tlbs_dirty" in kvm_flush_remote_tlbs_with_range()
2) Remove directly tlb flush in the kvm_handle_hva_range()
3) Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()
4) Combine Vitaly's "don't pass EPT configuration info to
vmx_hv_remote_flush_tlb()" fix

Change since v2:
1) Fix comment in the kvm_flush_remote_tlbs_with_range()
2) Move HV_MAX_FLUSH_PAGES and HV_MAX_FLUSH_REP_COUNT to
hyperv-tlfs.h.
3) Calculate HV_MAX_FLUSH_REP_COUNT in the macro definition
4) Use HV_MAX_FLUSH_REP_COUNT to define length of gpa_list in
struct hv_guest_mapping_flush_list.

Change since v1:
1) Convert "end_gfn" of struct kvm_tlb_range to "pages" in order
to avoid confusion as to whether "end_gfn" is inclusive or exclusive.
2) Add hyperv tlb range struct and replace kvm tlb range struct
with new struct in order to avoid using kvm struct in the hyperv
code directly.



Lan Tianyu (15):
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM/MMU: Add tlb flush with range helper function
KVM: Replace old tlb flush function with new one to flush a specified
range.
KVM: Make kvm_set_spte_hva() return int
KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to
kvm_mmu_notifier_change_pte()
KVM/MMU: Flush tlb directly in the kvm_set_pte_rmapp()
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()
KVM: Add flush_link and parent_pte in the struct kvm_mmu_page
KVM: Add spte's pointer in the struct kvm_mmu_page
KVM/MMU: Replace tlb flush function with range list flush function
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
x86/Hyper-v: Add trace in the
hyperv_nested_flush_guest_mapping_range()
KVM/VMX: Change hv flush logic when ept tables are mismatched.
KVM/VMX: Add hv tlb range flush support

arch/arm/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/mmu.c | 3 +-
arch/powerpc/include/asm/kvm_host.h | 2 +-
arch/powerpc/kvm/book3s.c | 3 +-
arch/powerpc/kvm/e500_mmu_host.c | 3 +-
arch/x86/hyperv/nested.c | 85 ++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 32 +++++++++
arch/x86/include/asm/kvm_host.h | 12 +++-
arch/x86/include/asm/mshyperv.h | 16 +++++
arch/x86/include/asm/trace/hyperv.h | 14 ++++
arch/x86/kvm/mmu.c | 138 ++++++++++++++++++++++++++++++------
arch/x86/kvm/paging_tmpl.h | 10 ++-
arch/x86/kvm/vmx.c | 70 +++++++++++++++---
virt/kvm/arm/mmu.c | 6 +-
virt/kvm/kvm_main.c | 5 +-
17 files changed, 360 insertions(+), 45 deletions(-)

--
2.14.4



2018-10-13 14:56:00

by Tianyu Lan

Subject: [PATCH V4 1/15] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

From: Lan Tianyu <[email protected]>

Add a flush-range callback to kvm_x86_ops so that each platform can
register its associated function. The parameter "kvm_tlb_range" accepts
either a single range or a flush list which contains a list of ranges.
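
Later patches in the series (2/15 and 11/15) invoke the callback through
small wrappers; condensed, the single-range path with its fallback looks
roughly like this (a sketch, not part of this patch):

    /*
     * Sketch of the range-based path added by patch 2/15: try the
     * platform callback, fall back to a full remote TLB flush.
     */
    static void example_flush_range(struct kvm *kvm, u64 start_gfn, u64 pages)
    {
    	struct kvm_tlb_range range = {
    		.start_gfn  = start_gfn,
    		.pages      = pages,
    		.flush_list = NULL,	/* single contiguous range */
    	};

    	if (kvm_x86_ops->tlb_remote_flush_with_range &&
    	    !kvm_x86_ops->tlb_remote_flush_with_range(kvm, &range))
    		return;

    	kvm_flush_remote_tlbs(kvm);
    }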

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v1:
Change "end_gfn" to "pages" to aviod confusion as to whether
"end_gfn" is inclusive or exlusive.
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4b09d4aa9bf4..fea95aa77319 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -439,6 +439,12 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
};

+struct kvm_tlb_range {
+ u64 start_gfn;
+ u64 pages;
+ struct list_head *flush_list;
+};
+
enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -1039,6 +1045,8 @@ struct kvm_x86_ops {

void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
int (*tlb_remote_flush)(struct kvm *kvm);
+ int (*tlb_remote_flush_with_range)(struct kvm *kvm,
+ struct kvm_tlb_range *range);

/*
* Flush any TLB entries associated with the given GVA.
--
2.14.4


2018-10-13 14:56:00

by Tianyu Lan

Subject: [PATCH V4 3/15] KVM: Replace old tlb flush function with new one to flush a specified range.

From: Lan Tianyu <[email protected]>

This patch replaces kvm_flush_remote_tlbs() with
kvm_flush_remote_tlbs_with_address() in some functions, without any
change in logic.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 31 +++++++++++++++++++++----------
arch/x86/kvm/paging_tmpl.h | 3 ++-
2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ff656d85903a..9b9db36df103 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1490,8 +1490,12 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)

static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
{
- if (__drop_large_spte(vcpu->kvm, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ if (__drop_large_spte(vcpu->kvm, sptep)) {
+ struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
}

/*
@@ -1959,7 +1963,8 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);

kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
}

int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
@@ -2475,7 +2480,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
account_shadowed(vcpu->kvm, sp);
if (level == PT_PAGE_TABLE_LEVEL &&
rmap_write_protect(vcpu, gfn))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);

if (level > PT_PAGE_TABLE_LEVEL && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
@@ -2595,7 +2600,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
return;

drop_parent_pte(child, sptep);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
}
}

@@ -3019,8 +3024,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
ret = RET_PF_EMULATE;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
+
if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
+ KVM_PAGES_PER_HPAGE(level));

if (unlikely(is_mmio_spte(*sptep)))
ret = RET_PF_EMULATE;
@@ -5695,7 +5702,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
* on PT_WRITABLE_MASK anymore.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}

static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
@@ -5759,7 +5767,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
* dirty_bitmap.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);

@@ -5777,7 +5786,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
lockdep_assert_held(&kvm->slots_lock);

if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);

@@ -5794,7 +5804,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,

/* see kvm_mmu_slot_leaf_clear_dirty */
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7cf2185b7eb5..6bdca39829bc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -894,7 +894,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);

if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm,
+ sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));

if (!rmap_can_add(vcpu))
break;
--
2.14.4


2018-10-13 14:56:00

by Tianyu Lan

Subject: [PATCH V4 4/15] KVM: Make kvm_set_spte_hva() return int

From: Lan Tianyu <[email protected]>

This patch makes kvm_set_spte_hva() return int so that the caller can
check the return value to determine whether a TLB flush is needed.
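
The caller-side pattern this enables is added in patch 5/15; in short:

    /* From patch 5/15 (kvm_mmu_notifier_change_pte): flush only when asked. */
    if (kvm_set_spte_hva(kvm, address, pte))
    	kvm_flush_remote_tlbs(kvm);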

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/arm/include/asm/kvm_host.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/mmu.c | 3 ++-
arch/powerpc/include/asm/kvm_host.h | 2 +-
arch/powerpc/kvm/book3s.c | 3 ++-
arch/powerpc/kvm/e500_mmu_host.c | 3 ++-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 3 ++-
virt/kvm/arm/mmu.c | 6 ++++--
10 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 3ad482d2f1eb..efb820bdad2c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -225,7 +225,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);

unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d6d7336f871..2e506c0b3eb7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -358,7 +358,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 2c1c53d12179..71c3f21d80d5 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -933,7 +933,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct kvm_vcpu *vcpu,
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);

diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index d8dcdb350405..97e538a8c1be 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -551,7 +551,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gfn_t gfn, gfn_t gfn_end,
(pte_dirty(old_pte) && !pte_dirty(hva_pte));
}

-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
unsigned long end = hva + PAGE_SIZE;
int ret;
@@ -559,6 +559,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
ret = handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pte);
if (ret)
kvm_mips_callbacks->flush_shadow_all(kvm);
+ return 0;
}

static int kvm_age_hva_handler(struct kvm *kvm, gfn_t gfn, gfn_t gfn_end,
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index fac6f631ed29..ab23379c53a9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -72,7 +72,7 @@ extern int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);

#define HPTEG_CACHE_NUM (1 << 15)
#define HPTEG_HASH_BITS_PTE 13
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index fd9893bc7aa1..437613bb609a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -850,9 +850,10 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return kvm->arch.kvm_ops->test_age_hva(kvm, hva);
}

-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
kvm->arch.kvm_ops->set_spte_hva(kvm, hva, pte);
+ return 0;
}

void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 8f2985e46f6f..c3f312b2bcb3 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -757,10 +757,11 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return 0;
}

-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
/* The page will get remapped properly on its next fault */
kvm_unmap_hva(kvm, hva);
+ return 0;
}

/*****************************************/
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fea95aa77319..19985c602ed6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1504,7 +1504,7 @@ asmlinkage void kvm_spurious_fault(void);
int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9b9db36df103..fd24a4dc45e9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1918,9 +1918,10 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)
return kvm_handle_hva_range(kvm, start, end, 0, kvm_unmap_rmapp);
}

-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
+ return 0;
}

static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed162a6c57c5..89a9c5fa9fd7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1845,14 +1845,14 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
}


-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
unsigned long end = hva + PAGE_SIZE;
kvm_pfn_t pfn = pte_pfn(pte);
pte_t stage2_pte;

if (!kvm->arch.pgd)
- return;
+ return 0;

trace_kvm_set_spte_hva(hva);

@@ -1863,6 +1863,8 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
clean_dcache_guest_page(pfn, PAGE_SIZE);
stage2_pte = pfn_pte(pfn, PAGE_S2);
handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
+
+ return 0;
}

static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
--
2.14.4


2018-10-13 14:56:00

by Tianyu Lan

Subject: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

From: Lan Tianyu <[email protected]>

This patch adds wrapper functions for the tlb_remote_flush_with_range
callback.
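
A typical call site, added by patch 3/15, flushes only the pages backed by
one shadow page rather than the whole TLB:

    kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
    				       KVM_PAGES_PER_HPAGE(sp->role.level));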

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since V3:
Remove code of updating "tlbs_dirty"
Change since V2:
Fix comment in the kvm_flush_remote_tlbs_with_range()
---
arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c73d9f650de7..ff656d85903a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
static union kvm_mmu_page_role
kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);

+
+static inline bool kvm_available_flush_tlb_with_range(void)
+{
+ return kvm_x86_ops->tlb_remote_flush_with_range;
+}
+
+static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
+{
+ int ret = -ENOTSUPP;
+
+ if (range && kvm_x86_ops->tlb_remote_flush_with_range)
+ ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
+
+ if (ret)
+ kvm_flush_remote_tlbs(kvm);
+}
+
+static void kvm_flush_remote_tlbs_with_list(struct kvm *kvm,
+ struct list_head *flush_list)
+{
+ struct kvm_tlb_range range;
+
+ range.flush_list = flush_list;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
+static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
+ u64 start_gfn, u64 pages)
+{
+ struct kvm_tlb_range range;
+
+ range.start_gfn = start_gfn;
+ range.pages = pages;
+ range.flush_list = NULL;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
{
BUG_ON((mmio_mask & mmio_value) != mmio_value);
--
2.14.4


2018-10-13 14:56:01

by Tianyu Lan

Subject: [PATCH V4 5/15] KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte()

From: Lan Tianyu <[email protected]>

This patch moves the TLB flush from kvm_set_pte_rmapp() to
kvm_mmu_notifier_change_pte() in order to avoid a redundant TLB flush.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 8 ++------
virt/kvm/kvm_main.c | 5 ++++-
2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fd24a4dc45e9..5d3a180c57e2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1781,10 +1781,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
}
}

- if (need_flush)
- kvm_flush_remote_tlbs(kvm);
-
- return 0;
+ return need_flush;
}

struct slot_rmap_walk_iterator {
@@ -1920,8 +1917,7 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end)

int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
{
- kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
- return 0;
+ return kvm_handle_hva(kvm, hva, (unsigned long)&pte, kvm_set_pte_rmapp);
}

static int kvm_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index acc951cc2663..bd026d74541e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -354,7 +354,10 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);
kvm->mmu_notifier_seq++;
- kvm_set_spte_hva(kvm, address, pte);
+
+ if (kvm_set_spte_hva(kvm, address, pte))
+ kvm_flush_remote_tlbs(kvm);
+
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
--
2.14.4


2018-10-13 14:56:10

by Tianyu Lan

Subject: [PATCH V4 6/15] KVM/MMU: Flush tlb directly in the kvm_set_pte_rmapp()

From: Lan Tianyu <[email protected]>

This patch flushes the TLB with a range directly in kvm_set_pte_rmapp()
when a range flush is available, and returns 0 so that the caller does
not flush again.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d3a180c57e2..f3742ff4ec18 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1781,6 +1781,11 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
}
}

+ if (need_flush && kvm_available_flush_tlb_with_range()) {
+ kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
+ return 0;
+ }
+
return need_flush;
}

--
2.14.4


2018-10-13 14:56:19

by Tianyu Lan

Subject: [PATCH V4 7/15] KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

From: Lan Tianyu <[email protected]>

Originally, the TLB flush is done by slot_handle_level_range(). This
patch flushes the TLB directly in kvm_zap_gfn_range() when a range
flush is available.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f3742ff4ec18..c4f7679f12c3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5647,6 +5647,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+ bool flush = false;
int i;

spin_lock(&kvm->mmu_lock);
@@ -5654,18 +5655,27 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
slots = __kvm_memslots(kvm, i);
kvm_for_each_memslot(memslot, slots) {
gfn_t start, end;
+ bool flush_tlb = true;

start = max(gfn_start, memslot->base_gfn);
end = min(gfn_end, memslot->base_gfn + memslot->npages);
if (start >= end)
continue;

- slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
- PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true);
+ if (kvm_available_flush_tlb_with_range())
+ flush_tlb = false;
+
+ flush = slot_handle_level_range(kvm, memslot,
+ kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
+ PT_MAX_HUGEPAGE_LEVEL, start,
+ end - 1, flush_tlb);
}
}

+ if (flush && kvm_available_flush_tlb_with_range())
+ kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
+ gfn_end - gfn_start + 1);
+
spin_unlock(&kvm->mmu_lock);
}

--
2.14.4


2018-10-13 14:56:25

by Tianyu Lan

Subject: [PATCH V4 8/15] KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()

From: Lan Tianyu <[email protected]>

kvm_mmu_zap_collapsible_spte() returns a flush request to
slot_handle_leaf(), and the latter flushes on demand. When a range
flush is available, make kvm_mmu_zap_collapsible_spte() flush the TLB
with the range directly, to avoid passing the request back to
slot_handle_leaf().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c4f7679f12c3..e984a0067a43 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5743,7 +5743,13 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompoundMap(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
- need_tlb_flush = 1;
+
+ if (kvm_available_flush_tlb_with_range())
+ kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
+ else
+ need_tlb_flush = 1;
+
goto restart;
}
}
--
2.14.4


2018-10-13 14:56:31

by Tianyu Lan

Subject: [PATCH V4 9/15] KVM: Add flush_link and parent_pte in the struct kvm_mmu_page

From: Lan Tianyu <[email protected]>

The PV EPT TLB flush function will accept a list of flush ranges and
use struct kvm_mmu_page as the list entry.
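
The usage pattern appears in patch 11/15: collect shadow pages on a local
list, then flush all of their ranges at once. Condensed from
kvm_mmu_commit_zap_page() there (sp and invalid_list come from that
function's context):

    LIST_HEAD(flush_list);

    list_for_each_entry(sp, invalid_list, link)
    	kvm_mmu_queue_flush_request(sp, &flush_list);

    if (!list_empty(&flush_list))
    	kvm_flush_remote_tlbs_with_list(kvm, &flush_list);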

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19985c602ed6..8279235285f8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -316,6 +316,7 @@ struct kvm_rmap_head {

struct kvm_mmu_page {
struct list_head link;
+ struct list_head flush_link;
struct hlist_node hash_link;
bool unsync;

--
2.14.4


2018-10-13 14:56:38

by Tianyu Lan

Subject: [PATCH V4 10/15] KVM: Add spte's pointer in the struct kvm_mmu_page

From: Lan Tianyu <[email protected]>

It is necessary to check whether an MMU page is a last-level or large
page when adding it to the flush list. The "spte" is needed for such a
check, so add an spte pointer to struct kvm_mmu_page.
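
The check that needs sp->sptep is added in patch 11/15
(kvm_mmu_queue_flush_request); only last-level mappings are queued:

    if (sp->sptep && is_last_spte(*sp->sptep, sp->role.level))
    	list_add(&sp->flush_link, flush_list);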

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 5 +++++
arch/x86/kvm/paging_tmpl.h | 2 ++
3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8279235285f8..c986ebefc9ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -333,6 +333,7 @@ struct kvm_mmu_page {
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
+ u64 *sptep;

/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
unsigned long mmu_valid_gen;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e984a0067a43..393f4048dd7a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3165,6 +3165,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
iterator.level - 1, 1, ACC_ALL);
+ sp->sptep = iterator.sptep;

link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3602,6 +3603,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu->shadow_root_level, 1, ACC_ALL);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu->root_hpa = __pa(sp->spt);
} else if (vcpu->arch.mmu->shadow_root_level == PT32E_ROOT_LEVEL) {
@@ -3618,6 +3620,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu->pae_root[i] = root | PT_PRESENT_MASK;
}
@@ -3658,6 +3661,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
vcpu->arch.mmu->shadow_root_level, 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu->root_hpa = root;
return 0;
@@ -3695,6 +3699,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);

vcpu->arch.mmu->pae_root[i] = root | pm_mask;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6bdca39829bc..833e8855bbc9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -633,6 +633,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
+ sp->sptep = it.sptep;
}

/*
@@ -663,6 +664,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
true, direct_access);
+ sp->sptep = it.sptep;
link_shadow_page(vcpu, it.sptep, sp);
}

--
2.14.4


2018-10-13 14:56:41

by Tianyu Lan

Subject: [PATCH V4 11/15] KVM/MMU: Replace tlb flush function with range list flush function

From: Lan Tianyu <[email protected]>

This patch uses the range-list flush function in mmu_sync_children(),
kvm_mmu_commit_zap_page() and FNAME(sync_page)().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++++---
arch/x86/kvm/paging_tmpl.h | 5 ++++-
2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 393f4048dd7a..69e4cff1115d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1100,6 +1100,13 @@ static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
}
}

+static void kvm_mmu_queue_flush_request(struct kvm_mmu_page *sp,
+ struct list_head *flush_list)
+{
+ if (sp->sptep && is_last_spte(*sp->sptep, sp->role.level))
+ list_add(&sp->flush_link, flush_list);
+}
+
void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
{
update_gfn_disallow_lpage_count(slot, gfn, 1);
@@ -2372,12 +2379,16 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,

while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
+ LIST_HEAD(flush_list);

- for_each_sp(pages, sp, parents, i)
+ for_each_sp(pages, sp, parents, i) {
protected |= rmap_write_protect(vcpu, sp->gfn);
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+ }

if (protected) {
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm,
+ &flush_list);
flush = false;
}

@@ -2713,6 +2724,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
{
struct kvm_mmu_page *sp, *nsp;
+ LIST_HEAD(flush_list);

if (list_empty(invalid_list))
return;
@@ -2726,7 +2738,15 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
* In addition, kvm_flush_remote_tlbs waits for all vcpus to exit
* guest mode and/or lockless shadow page table walks.
*/
- kvm_flush_remote_tlbs(kvm);
+ if (kvm_available_flush_tlb_with_range()) {
+ list_for_each_entry(sp, invalid_list, link)
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+
+ if (!list_empty(&flush_list))
+ kvm_flush_remote_tlbs_with_list(kvm, &flush_list);
+ } else {
+ kvm_flush_remote_tlbs(kvm);
+ }

list_for_each_entry_safe(sp, nsp, invalid_list, link) {
WARN_ON(!sp->role.invalid || sp->root_count);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 833e8855bbc9..e44737ce6bad 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -973,6 +973,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
bool host_writable;
gpa_t first_pte_gpa;
int set_spte_ret = 0;
+ LIST_HEAD(flush_list);

/* direct kvm_mmu_page can not be unsync. */
BUG_ON(sp->role.direct);
@@ -1033,10 +1034,12 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
pte_access, PT_PAGE_TABLE_LEVEL,
gfn, spte_to_pfn(sp->spt[i]),
true, false, host_writable);
+ if (set_spte_ret && kvm_available_flush_tlb_with_range())
+ kvm_mmu_queue_flush_request(sp, &flush_list);
}

if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm, &flush_list);

return nr_present;
}
--
2.14.4


2018-10-13 14:56:52

by Tianyu Lan

Subject: [PATCH V4 12/15] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

From: Lan Tianyu <[email protected]>

Hyper-V provides the HvFlushGuestAddressList() hypercall to flush the
EPT TLB for specified ranges. This patch adds support for the hypercall.
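
A caller-side sketch of the single-range case (patch 15/15 builds this
structure from a struct kvm_tlb_range; the GFN and page count below are
illustrative, and ept_pointer/ret come from the VMX caller's context):

    struct hyperv_tlb_range flush_range = {
    	.start_gfn  = 0x1000,		/* example GFN */
    	.pages      = 16,		/* example page count */
    	.flush_list = NULL,		/* contiguous range, no list */
    };

    ret = hyperv_flush_guest_mapping_range(ept_pointer & PAGE_MASK,
    					   &flush_range);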

Reviewed-by: Michael Kelley <[email protected]>
Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v2:
Fix some coding style issues
- Move HV_MAX_FLUSH_PAGES and HV_MAX_FLUSH_REP_COUNT to
hyperv-tlfs.h.
- Calculate HV_MAX_FLUSH_REP_COUNT in the macro definition
- Use HV_MAX_FLUSH_REP_COUNT to define length of gpa_list in
struct hv_guest_mapping_flush_list.

Change since v1:
Add a hyperv tlb flush struct to avoid using the kvm tlb flush struct
in the hyperv code.
---
arch/x86/hyperv/nested.c | 84 ++++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 32 +++++++++++++++
arch/x86/include/asm/mshyperv.h | 16 ++++++++
3 files changed, 132 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index b8e60cc50461..a6fdfec63c7d 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -7,6 +7,7 @@
*
* Author : Lan Tianyu <[email protected]>
*/
+#define pr_fmt(fmt) "Hyper-V: " fmt


#include <linux/types.h>
@@ -54,3 +55,86 @@ int hyperv_flush_guest_mapping(u64 as)
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
+
+static int fill_flush_list(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 pages)
+{
+ int gpa_n = offset;
+ u64 cur = start_gfn;
+ u64 additional_pages;
+
+ do {
+ /*
+ * If flush requests exceed max flush count, go back to
+ * flush tlbs without range.
+ */
+ if (gpa_n >= HV_MAX_FLUSH_REP_COUNT)
+ return -ENOSPC;
+
+ additional_pages = min_t(u64, pages, HV_MAX_FLUSH_PAGES) - 1;
+
+ gpa_list[gpa_n].page.additional_pages = additional_pages;
+ gpa_list[gpa_n].page.largepage = false;
+ gpa_list[gpa_n].page.basepfn = cur;
+
+ pages -= additional_pages + 1;
+ cur += additional_pages + 1;
+ gpa_n++;
+ } while (pages > 0);
+
+ return gpa_n;
+}
+
+int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
+{
+ struct hv_guest_mapping_flush_list **flush_pcpu;
+ struct hv_guest_mapping_flush_list *flush;
+ u64 status = 0;
+ unsigned long flags;
+ int ret = -ENOTSUPP;
+ int gpa_n = 0;
+
+ if (!hv_hypercall_pg)
+ goto fault;
+
+ local_irq_save(flags);
+
+ flush_pcpu = (struct hv_guest_mapping_flush_list **)
+ this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ flush = *flush_pcpu;
+ if (unlikely(!flush)) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ flush->address_space = as;
+ flush->flags = 0;
+
+ if (!range->flush_list)
+ gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
+ range->start_gfn, range->pages);
+ else if (range->parse_flush_list_func)
+ gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n,
+ range->flush_list, fill_flush_list);
+ else
+ gpa_n = -1;
+
+ if (gpa_n < 0) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
+ gpa_n, 0, flush, NULL);
+
+ local_irq_restore(flags);
+
+ if (!(status & HV_HYPERCALL_RESULT_MASK))
+ ret = 0;
+ else
+ ret = status;
+fault:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 00e01d215f74..cf59250c284a 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -10,6 +10,7 @@
#define _ASM_X86_HYPERV_TLFS_H

#include <linux/types.h>
+#include <asm/page.h>

/*
* The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
@@ -353,6 +354,7 @@ struct hv_tsc_emulation_status {
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
+#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0

#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
@@ -752,6 +754,36 @@ struct hv_guest_mapping_flush {
u64 flags;
};

+/*
+ * HV_MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
+ * by the bitwidth of "additional_pages" in union hv_gpa_page_range.
+ */
+#define HV_MAX_FLUSH_PAGES (2048)
+
+/* HvFlushGuestPhysicalAddressList hypercall */
+union hv_gpa_page_range {
+ u64 address_space;
+ struct {
+ u64 additional_pages:11;
+ u64 largepage:1;
+ u64 basepfn:52;
+ } page;
+};
+
+/*
+ * All input flush parameters must fit in a single page. The max flush
+ * count equals the number of union hv_gpa_page_range entries that can
+ * be populated in the input parameter page.
+ */
+#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) / \
+ sizeof(union hv_gpa_page_range))
+
+struct hv_guest_mapping_flush_list {
+ u64 address_space;
+ u64 flags;
+ union hv_gpa_page_range gpa_list[HV_MAX_FLUSH_REP_COUNT];
+};
+
/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
struct hv_tlb_flush {
u64 address_space;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f37704497d8f..19f49fbcf94d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -22,6 +22,16 @@ struct ms_hyperv_info {

extern struct ms_hyperv_info ms_hyperv;

+struct hyperv_tlb_range {
+ u64 start_gfn;
+ u64 pages;
+ struct list_head *flush_list;
+ int (*parse_flush_list_func)(union hv_gpa_page_range gpa_list[],
+ int offset, struct list_head *flush_list,
+ int (*fill_flush_list)(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 end_gfn));
+};
+
/*
* Generate the guest ID.
*/
@@ -348,6 +358,7 @@ void set_hv_tscchange_cb(void (*cb)(void));
void clear_hv_tscchange_cb(void);
void hyperv_stop_tsc_emulation(void);
int hyperv_flush_guest_mapping(u64 as);
+int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range);

#ifdef CONFIG_X86_64
void hv_apic_init(void);
@@ -368,6 +379,11 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
return NULL;
}
static inline int hyperv_flush_guest_mapping(u64 as) { return -1; }
+static inline int hyperv_flush_guest_mapping_range(u64 as,
+ struct hyperv_tlb_range *range)
+{
+ return -1;
+}
#endif /* CONFIG_HYPERV */

#ifdef CONFIG_HYPERV_TSCPAGE
--
2.14.4


2018-10-13 14:57:12

by Tianyu Lan

Subject: [PATCH V4 13/15] x86/Hyper-v: Add trace in the hyperv_nested_flush_guest_mapping_range()

From: Lan Tianyu <[email protected]>

This patch adds a trace point for
hyperv_nested_flush_guest_mapping_range().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/hyperv/nested.c | 1 +
arch/x86/include/asm/trace/hyperv.h | 14 ++++++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index a6fdfec63c7d..4850c74508f3 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -135,6 +135,7 @@ int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
else
ret = status;
fault:
+ trace_hyperv_nested_flush_guest_mapping_range(as, ret);
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index 2e6245a023ef..ace464f09681 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -42,6 +42,20 @@ TRACE_EVENT(hyperv_nested_flush_guest_mapping,
TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
);

+TRACE_EVENT(hyperv_nested_flush_guest_mapping_range,
+ TP_PROTO(u64 as, int ret),
+ TP_ARGS(as, ret),
+
+ TP_STRUCT__entry(
+ __field(u64, as)
+ __field(int, ret)
+ ),
+ TP_fast_assign(__entry->as = as;
+ __entry->ret = ret;
+ ),
+ TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
+ );
+
TRACE_EVENT(hyperv_send_ipi_mask,
TP_PROTO(const struct cpumask *cpus,
int vector),
--
2.14.4


2018-10-13 14:57:26

by Tianyu Lan

Subject: [PATCH V4 14/15] KVM/VMX: Change hv flush logic when ept tables are mismatched.

From: Lan Tianyu <[email protected]>

If the EPT table pointers are mismatched, flushing the TLB for each
vcpu via the hv flush interface still helps to reduce the vmexits which
are triggered by IPIs and INVEPT emulation.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/vmx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6f44d3a63434..8ff13f3aed11 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1571,7 +1571,8 @@ static void check_ept_pointer_match(struct kvm *kvm)

static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
{
- int ret;
+ struct kvm_vcpu *vcpu;
+ int ret = -ENOTSUPP, i;

spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);

@@ -1579,14 +1580,14 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
check_ept_pointer_match(kvm);

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
- ret = -ENOTSUPP;
- goto out;
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ ret |= hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ } else {
+ ret = hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
}

- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
-
-out:
spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
--
2.14.4


2018-10-13 14:59:07

by Tianyu Lan

Subject: [PATCH V4 15/15] KVM/VMX: Add hv tlb range flush support

From: Lan Tianyu <[email protected]>

This patch registers the hv tlb range flush function as the
tlb_remote_flush_with_range callback.

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v3:
Merge Vitaly's don't pass EPT configuration info to
vmx_hv_remote_flush_tlb() fix.
Change since v1:
Pass flush range with new hyper-v tlb flush struct rather
than KVM tlb flush struct.
---
arch/x86/kvm/vmx.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8ff13f3aed11..a93c2cc8a293 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1569,7 +1569,48 @@ static void check_ept_pointer_match(struct kvm *kvm)
to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
}

-static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
+int kvm_parse_flush_list_func(union hv_gpa_page_range gpa_list[],
+ int offset, struct list_head *flush_list,
+ int (*fill_flush_list)(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 end_gfn))
+{
+ struct kvm_mmu_page *sp;
+
+ list_for_each_entry(sp, flush_list,
+ flush_link) {
+ offset = fill_flush_list(gpa_list, offset,
+ sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
+
+ return offset;
+}
+
+static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
+{
+ u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
+ struct hyperv_tlb_range flush_range;
+
+ if (range) {
+ flush_range.start_gfn = range->start_gfn;
+ flush_range.pages = range->pages;
+ flush_range.flush_list = range->flush_list;
+ flush_range.parse_flush_list_func = kvm_parse_flush_list_func;
+
+ /*
+ * FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
+ * of the base of EPT PML4 table, strip off EPT configuration
+ * information.
+ */
+ return hyperv_flush_guest_mapping_range(ept_pointer & PAGE_MASK,
+ &flush_range);
+ } else {
+ return hyperv_flush_guest_mapping(ept_pointer & PAGE_MASK);
+ }
+}
+
+static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
{
struct kvm_vcpu *vcpu;
int ret = -ENOTSUPP, i;
@@ -1581,16 +1622,21 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm)
- ret |= hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ ret |= __hv_remote_flush_tlb_with_range(
+ kvm, vcpu, range);
} else {
- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
+ ret = __hv_remote_flush_tlb_with_range(kvm,
+ kvm_get_vcpu(kvm, 0), range);
}

spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
+
+static int hv_remote_flush_tlb(struct kvm *kvm)
+{
+ return hv_remote_flush_tlb_with_range(kvm, NULL);
+}
#else /* !IS_ENABLED(CONFIG_HYPERV) */
static inline void evmcs_write64(unsigned long field, u64 value) {}
static inline void evmcs_write32(unsigned long field, u32 value) {}
@@ -7929,8 +7975,11 @@ static __init int hardware_setup(void)

#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
- && enable_ept)
- kvm_x86_ops->tlb_remote_flush = vmx_hv_remote_flush_tlb;
+ && enable_ept) {
+ kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ kvm_x86_ops->tlb_remote_flush_with_range =
+ hv_remote_flush_tlb_with_range;
+ }
#endif

if (!cpu_has_vmx_ple()) {
--
2.14.4


2018-10-14 07:29:54

by Liran Alon

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function



> On 13 Oct 2018, at 17:53, [email protected] wrote:
>
> From: Lan Tianyu <[email protected]>
>
> This patch is to add wrapper functions for tlb_remote_flush_with_range
> callback.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
> Change sicne V3:
> Remove code of updating "tlbs_dirty"
> Change since V2:
> Fix comment in the kvm_flush_remote_tlbs_with_range()
> ---
> arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c73d9f650de7..ff656d85903a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> static union kvm_mmu_page_role
> kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
>
> +
> +static inline bool kvm_available_flush_tlb_with_range(void)
> +{
> + return kvm_x86_ops->tlb_remote_flush_with_range;
> +}

Seems that kvm_available_flush_tlb_with_range() is not used in this patch…

> +
> +static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
> + struct kvm_tlb_range *range)
> +{
> + int ret = -ENOTSUPP;
> +
> + if (range && kvm_x86_ops->tlb_remote_flush_with_range)
> + ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
> +
> + if (ret)
> + kvm_flush_remote_tlbs(kvm);
> +}
> +
> +static void kvm_flush_remote_tlbs_with_list(struct kvm *kvm,
> + struct list_head *flush_list)
> +{
> + struct kvm_tlb_range range;
> +
> + range.flush_list = flush_list;
> +
> + kvm_flush_remote_tlbs_with_range(kvm, &range);
> +}
> +
> +static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> + u64 start_gfn, u64 pages)
> +{
> + struct kvm_tlb_range range;
> +
> + range.start_gfn = start_gfn;
> + range.pages = pages;
> + range.flush_list = NULL;
> +
> + kvm_flush_remote_tlbs_with_range(kvm, &range);
> +}
> +
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
> {
> BUG_ON((mmio_mask & mmio_value) != mmio_value);
> --
> 2.14.4
>


2018-10-14 08:18:26

by Thomas Gleixner

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

On Sun, 14 Oct 2018, Liran Alon wrote:
> > On 13 Oct 2018, at 17:53, [email protected] wrote:
> >
> > From: Lan Tianyu <[email protected]>
> >
> > This patch is to add wrapper functions for tlb_remote_flush_with_range
> > callback.
> >
> > Signed-off-by: Lan Tianyu <[email protected]>
> > ---
> > Change sicne V3:
> > Remove code of updating "tlbs_dirty"
> > Change since V2:
> > Fix comment in the kvm_flush_remote_tlbs_with_range()
> > ---
> > arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 40 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index c73d9f650de7..ff656d85903a 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> > static union kvm_mmu_page_role
> > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> >
> > +
> > +static inline bool kvm_available_flush_tlb_with_range(void)
> > +{
> > + return kvm_x86_ops->tlb_remote_flush_with_range;
> > +}
>
> Seems that kvm_available_flush_tlb_with_range() is not used in this patch…

What's wrong with that?

It provides the implementation and later patches make use of it. It's a
sensible way to split patches into small, self contained entities.

Thanks,

tglx

2018-10-14 09:22:07

by Liran Alon

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function



> On 14 Oct 2018, at 11:16, Thomas Gleixner <[email protected]> wrote:
>
> On Sun, 14 Oct 2018, Liran Alon wrote:
>>> On 13 Oct 2018, at 17:53, [email protected] wrote:
>>>
>>> +
>>> +static inline bool kvm_available_flush_tlb_with_range(void)
>>> +{
>>> + return kvm_x86_ops->tlb_remote_flush_with_range;
>>> +}
>>
>> Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
>
> What's wrong with that?
>
> It provides the implementation and later patches make use of it. It's a
> sensible way to split patches into small, self contained entities.
>
> Thanks,
>
> tglx
>

I guess it’s a matter of taste, but I prefer not to add dead code in patches,
so that each commit compiles cleanly without warnings about declared but unused functions.
I would prefer to just add this utility function in the patch that actually uses it.

-Liran


2018-10-14 09:28:39

by Russell King (Oracle)

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

On Sun, Oct 14, 2018 at 10:16:56AM +0200, Thomas Gleixner wrote:
> On Sun, 14 Oct 2018, Liran Alon wrote:
> > > On 13 Oct 2018, at 17:53, [email protected] wrote:
> > >
> > > From: Lan Tianyu <[email protected]>
> > >
> > > This patch is to add wrapper functions for tlb_remote_flush_with_range
> > > callback.
> > >
> > > Signed-off-by: Lan Tianyu <[email protected]>
> > > ---
> > > Change sicne V3:
> > > Remove code of updating "tlbs_dirty"
> > > Change since V2:
> > > Fix comment in the kvm_flush_remote_tlbs_with_range()
> > > ---
> > > arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 40 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > index c73d9f650de7..ff656d85903a 100644
> > > --- a/arch/x86/kvm/mmu.c
> > > +++ b/arch/x86/kvm/mmu.c
> > > @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> > > static union kvm_mmu_page_role
> > > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> > >
> > > +
> > > +static inline bool kvm_available_flush_tlb_with_range(void)
> > > +{
> > > + return kvm_x86_ops->tlb_remote_flush_with_range;
> > > +}
> >
> > Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
>
> What's wrong with that?
>
> It provides the implementation and later patches make use of it. It's a
> sensible way to split patches into small, self contained entities.

From what I can see of the patches that follow _this_ patch in the
series, this function remains unused. So, not only is it not used
in this patch, it's not used in this series.

I think the real question that needs asking is - what is the plan
for this function, and when will it be used?

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-10-14 09:36:33

by Russell King (Oracle)

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

On Sun, Oct 14, 2018 at 10:27:34AM +0100, Russell King - ARM Linux wrote:
> On Sun, Oct 14, 2018 at 10:16:56AM +0200, Thomas Gleixner wrote:
> > On Sun, 14 Oct 2018, Liran Alon wrote:
> > > > On 13 Oct 2018, at 17:53, [email protected] wrote:
> > > >
> > > > From: Lan Tianyu <[email protected]>
> > > >
> > > > This patch is to add wrapper functions for tlb_remote_flush_with_range
> > > > callback.
> > > >
> > > > Signed-off-by: Lan Tianyu <[email protected]>
> > > > ---
> > > > Change sicne V3:
> > > > Remove code of updating "tlbs_dirty"
> > > > Change since V2:
> > > > Fix comment in the kvm_flush_remote_tlbs_with_range()
> > > > ---
> > > > arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 40 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > > index c73d9f650de7..ff656d85903a 100644
> > > > --- a/arch/x86/kvm/mmu.c
> > > > +++ b/arch/x86/kvm/mmu.c
> > > > @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> > > > static union kvm_mmu_page_role
> > > > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> > > >
> > > > +
> > > > +static inline bool kvm_available_flush_tlb_with_range(void)
> > > > +{
> > > > + return kvm_x86_ops->tlb_remote_flush_with_range;
> > > > +}
> > >
> > > Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
> >
> > What's wrong with that?
> >
> > It provides the implementation and later patches make use of it. It's a
> > sensible way to split patches into small, self contained entities.
>
> From what I can see of the patches that follow _this_ patch in the
> series, this function remains unused. So, not only is it not used
> in this patch, it's not used in this series.

Note - I seem to have only received patches 1 through 4, so this is
based on the patches I've received.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-10-14 12:58:14

by Tianyu Lan

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

Hi Liran & Thomas:
Thanks for your review.


On Sun, Oct 14, 2018 at 5:20 PM Liran Alon <[email protected]> wrote:
>
>
>
> > On 14 Oct 2018, at 11:16, Thomas Gleixner <[email protected]> wrote:
> >
> > On Sun, 14 Oct 2018, Liran Alon wrote:
> >>> On 13 Oct 2018, at 17:53, [email protected] wrote:
> >>>
> >>> +
> >>> +static inline bool kvm_available_flush_tlb_with_range(void)
> >>> +{
> >>> + return kvm_x86_ops->tlb_remote_flush_with_range;
> >>> +}
> >>
> >> Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
> >
> > What's wrong with that?
> >
> > It provides the implementation and later patches make use of it. It's a
> > sensible way to split patches into small, self contained entities.
> >
> > Thanks,
> >
> > tglx
> >
>
> I guess it’s a matter of taste, but I prefer to not add dead-code for patches
> in order for each commit to compile nicely without warnings of declared and unused functions.
> I would prefer to just add this utility function on the patch that actually use it.
>
> -Liran
>

Normally, I also prefer to put the function definition into the patch
which uses it. But the following patch "KVM: Replace old tlb flush
function with new one to flush a specified range" and the other patches
which use the new functions change a lot of places. That is not friendly
for review, so I split them into pieces.
--
Best regards
Tianyu Lan

2018-10-14 13:22:11

by Tianyu Lan

Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

Hi Russell:
Thanks for your review.

On Sun, Oct 14, 2018 at 5:36 PM Russell King - ARM Linux
<[email protected]> wrote:
>
> On Sun, Oct 14, 2018 at 10:27:34AM +0100, Russell King - ARM Linux wrote:
> > On Sun, Oct 14, 2018 at 10:16:56AM +0200, Thomas Gleixner wrote:
> > > On Sun, 14 Oct 2018, Liran Alon wrote:
> > > > > On 13 Oct 2018, at 17:53, [email protected] wrote:
> > > > >
> > > > > From: Lan Tianyu <[email protected]>
> > > > >
> > > > > This patch is to add wrapper functions for tlb_remote_flush_with_range
> > > > > callback.
> > > > >
> > > > > Signed-off-by: Lan Tianyu <[email protected]>
> > > > > ---
> > > > > Change sicne V3:
> > > > > Remove code of updating "tlbs_dirty"
> > > > > Change since V2:
> > > > > Fix comment in the kvm_flush_remote_tlbs_with_range()
> > > > > ---
> > > > > arch/x86/kvm/mmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 40 insertions(+)
> > > > >
> > > > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > > > > index c73d9f650de7..ff656d85903a 100644
> > > > > --- a/arch/x86/kvm/mmu.c
> > > > > +++ b/arch/x86/kvm/mmu.c
> > > > > @@ -264,6 +264,46 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
> > > > > static union kvm_mmu_page_role
> > > > > kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
> > > > >
> > > > > +
> > > > > +static inline bool kvm_available_flush_tlb_with_range(void)
> > > > > +{
> > > > > + return kvm_x86_ops->tlb_remote_flush_with_range;
> > > > > +}
> > > >
> > > > Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
> > >
> > > What's wrong with that?
> > >
> > > It provides the implementation and later patches make use of it. It's a
> > > sensible way to split patches into small, self contained entities.
> >
> > From what I can see of the patches that follow _this_ patch in the
> > series, this function remains unused. So, not only is it not used
> > in this patch, it's not used in this series.
>
> Note - I seem to have only received patches 1 through 4, so this is
> based on the patches I've received.
>

Sorry for the confusion. I got the CC list from the get_maintainer.pl script.
The next patch "[PATCH V4 3/15] KVM: Replace old tlb flush function with
new one to flush a specified range" calls the new function.
https://lkml.org/lkml/2018/10/13/254

The patch "[PATCH V4 4/15] KVM: Make kvm_set_spte_hva() return int"
changes under ARM directory. Please have a look. Thanks.
--
Best regards
Tianyu Lan

2018-10-14 13:34:55

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

On Sun, Oct 14, 2018 at 09:21:23PM +0800, Tianyu Lan wrote:
> Sorry for the confusion. I took the CC list from the get_maintainer.pl script.

Unfortunately you seem to have made a mistake. My email address is
'[email protected]' not '[email protected]'. There is no
'[email protected]' in MAINTAINERS, yet your emails appear to be
copied to this address.

Consequently, if it was your intention to copy me with the entire
series, that didn't happen.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

2018-10-15 10:05:14

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 7/15] KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

On 13/10/2018 16:53, [email protected] wrote:
> + bool flush = false;
> int i;
>
> spin_lock(&kvm->mmu_lock);
> @@ -5654,18 +5655,27 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> slots = __kvm_memslots(kvm, i);
> kvm_for_each_memslot(memslot, slots) {
> gfn_t start, end;
> + bool flush_tlb = true;
>
> start = max(gfn_start, memslot->base_gfn);
> end = min(gfn_end, memslot->base_gfn + memslot->npages);
> if (start >= end)
> continue;
>
> - slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
> - PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
> - start, end - 1, true);
> + if (kvm_available_flush_tlb_with_range())
> + flush_tlb = false;

This should be moved outside the for, because it's invariant.

> + flush = slot_handle_level_range(kvm, memslot,
> + kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
> + PT_MAX_HUGEPAGE_LEVEL, start,
> + end - 1, flush_tlb);

... and this should be "flush |= ".
> }
> }
>
> + if (flush && kvm_available_flush_tlb_with_range())
> + kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> + gfn_end - gfn_start + 1);
> +

... and this can be just if (flush), because if flush_tlb is true then
slot_handle_level_range always returns false.
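
Putting the three remarks together, the whole hunk could then look
roughly like this (just an untested sketch, reusing the names from the
posted patch):

    bool flush = false;
    /* Invariant for the whole function, so hoist it out of the loops. */
    bool flush_tlb = !kvm_available_flush_tlb_with_range();
    int i;

    spin_lock(&kvm->mmu_lock);
    for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
        slots = __kvm_memslots(kvm, i);
        kvm_for_each_memslot(memslot, slots) {
            gfn_t start, end;

            start = max(gfn_start, memslot->base_gfn);
            end = min(gfn_end, memslot->base_gfn + memslot->npages);
            if (start >= end)
                continue;

            /* Accumulate instead of overwriting the flush request. */
            flush |= slot_handle_level_range(kvm, memslot,
                    kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
                    PT_MAX_HUGEPAGE_LEVEL, start,
                    end - 1, flush_tlb);
        }
    }

    if (flush)
        kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
                gfn_end - gfn_start + 1);

    spin_unlock(&kvm->mmu_lock);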

Paolo

2018-10-15 10:14:25

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 9/15] KVM: Add flush_link and parent_pte in the struct kvm_mmu_page

On 13/10/2018 16:54, [email protected] wrote:
> From: Lan Tianyu <[email protected]>
>
> PV EPT tlb flush function will accept a list of flush ranges and
> use struct kvm_mmu_page as the list entry.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 19985c602ed6..8279235285f8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -316,6 +316,7 @@ struct kvm_rmap_head {
>
> struct kvm_mmu_page {
> struct list_head link;
> + struct list_head flush_link;

This can be an hlist. However, you are not documenting what the
locking is here. There are many places in which KVM does a
"cond_resched_lock(&vcpu->kvm->mmu_lock);" and you need to explain how
flush_link is not live across that.
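
For the hlist/locking point, what I have in mind is roughly this
(untested sketch; the comment of course has to state whatever rule the
code actually follows):

    struct kvm_mmu_page {
        struct list_head link;
        /*
         * Entry in a flush list that is built and consumed entirely
         * under kvm->mmu_lock and is always empty again before any
         * cond_resched_lock(&kvm->mmu_lock).
         */
        struct hlist_node flush_link;
        struct hlist_node hash_link;
        bool unsync;
        ...
    };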

I would start from a simpler patch that just uses the list-based flush
in kvm_mmu_commit_zap_page, where you already have the list of things to
flush as invalid_list.

> struct hlist_node hash_link;
> bool unsync;
>
>

Also this is not adding parent_pte, so the subject is incorrect.

Paolo

2018-10-15 10:32:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 12/15] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

On 13/10/2018 16:54, [email protected] wrote:
>
> +static int fill_flush_list(union hv_gpa_page_range gpa_list[],
> + int offset, u64 start_gfn, u64 pages)

Pass the entire struct hv_guest_mapping_flush_list to this function;
it's simpler, and it hides the gpa_list argument from
range->parse_flush_list_func.

> + if (!range->flush_list)
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + range->start_gfn, range->pages);
> + else if (range->parse_flush_list_func)
> + gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n,
> + range->flush_list, fill_flush_list);
> + else

You are making the code more complicated in order to avoid making
fill_flush_list public. Just make it public and always go through the
parse_flush_list_func case. In fact:

- make parse_flush_list_func an argument of hyperv_flush_guest_mapping_range

- drop struct hyperv_tlb_range completely, instead just pass a void* to
hyperv_flush_guest_mapping_range and pass it back to the callback. The
void * can point to the start_gfn/pages pair, it can be the flush_list,
or anything else.
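
In other words, something along these lines on the Hyper-V side
(untested sketch, names approximate):

    /* Fills flush->gpa_list from caller-owned data; returns the number
     * of entries written. */
    typedef int (*hyperv_fill_flush_list_func)(
        struct hv_guest_mapping_flush_list *flush, void *data);

    int hyperv_flush_guest_mapping_range(u64 as,
            hyperv_fill_flush_list_func fill_func, void *data);

KVM then decides what "data" points to: a start_gfn/pages pair for the
single-range case, or the flush list for the list-based case, and
decodes it in its own fill_func.  No struct hyperv_tlb_range is needed
at all.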

Paolo

2018-10-15 11:17:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 14/15] KVM/VMX: Change hv flush logic when ept tables are mismatched.

On 13/10/2018 16:54, [email protected] wrote:
> From: Lan Tianyu <[email protected]>
>
> If ept table pointers are mismatched, flushing tlb for each vcpu via
> hv flush interface still helps to reduce vmexits which are triggered
> by IPI and INVEPT emulation.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6f44d3a63434..8ff13f3aed11 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1571,7 +1571,8 @@ static void check_ept_pointer_match(struct kvm *kvm)
>
> static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
> {
> - int ret;
> + struct kvm_vcpu *vcpu;
> + int ret = -ENOTSUPP, i;
>
> spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);
>
> @@ -1579,14 +1580,14 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
> check_ept_pointer_match(kvm);
>
> if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
> - ret = -ENOTSUPP;
> - goto out;
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + ret |= hyperv_flush_guest_mapping(
> + to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
> + } else {
> + ret = hyperv_flush_guest_mapping(
> + to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
> }
>
> - ret = hyperv_flush_guest_mapping(
> - to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
> -
> -out:
> spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
> return ret;
> }
>

I think this is an independent change that can be applied separately?

Thanks,

Paolo

2018-10-15 11:52:08

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 11/15] KVM/MMU: Replace tlb flush function with range list flush function

On 13/10/2018 16:54, [email protected] wrote:
> while (mmu_unsync_walk(parent, &pages)) {
> bool protected = false;
> + LIST_HEAD(flush_list);
>
> - for_each_sp(pages, sp, parents, i)
> + for_each_sp(pages, sp, parents, i) {
> protected |= rmap_write_protect(vcpu, sp->gfn);
> + kvm_mmu_queue_flush_request(sp, &flush_list);
> + }

Here you already know that the page has to be flushed, because you are
dealing with shadow page tables and those always use 4K pages. So the
check on is_last_page is unnecessary.

>
> pte_access, PT_PAGE_TABLE_LEVEL,
> gfn, spte_to_pfn(sp->spt[i]),
> true, false, host_writable);
> + if (set_spte_ret && kvm_available_flush_tlb_with_range())
> + kvm_mmu_queue_flush_request(sp, &flush_list);
> }

This is wrong, I think. sp is always the same throughout the loop, so
you are adding it multiple times to flush_list.

Instead, you need to add a separate range for each virtual address (in
this case L2 GPA) that is synced; but for each PTE for which you call
set_spte here, you could be syncing multiple L2 GPAs if a single
page is reused multiple times by the guest's EPT page tables.

And actually I may be missing something, but doesn't this apply to all
call sites? For mmu_sync_children you can do the flush in
__rmap_write_protect and return false, similar to the first part of the
series, but not for kvm_mmu_commit_zap_page and sync_page.

Can you simplify this series to only have hv_remote_flush_tlb_with_range
and remove all the flush_list stuff? That first part is safe and well
understood, because it uses the rmap and so it's clear that you have L2
GPAs at hand. Most of the remarks I made on the Hyper-V API will still
apply.

Paolo

> if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
> - kvm_flush_remote_tlbs(vcpu->kvm);
> + kvm_flush_remote_tlbs_with_list(vcpu->kvm, &flush_list);
>
> return nr_present;


2018-10-15 11:53:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 6/15] KVM/MMU: Flush tlb directly in the kvm_set_pte_rmapp()

On 13/10/2018 16:53, [email protected] wrote:
> @@ -1781,6 +1781,11 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
> }
> }
>
> + if (need_flush && kvm_available_flush_tlb_with_range()) {
> + kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
> + return 0;
> + }
> +

Here you're passing an L1 GPA, not an L2 GPA. Is it correct?

Paolo

2018-10-15 12:03:56

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 2/15] KVM/MMU: Add tlb flush with range helper function

On 14/10/2018 10:16, Thomas Gleixner wrote:
>>> +static inline bool kvm_available_flush_tlb_with_range(void)
>>> +{
>>> + return kvm_x86_ops->tlb_remote_flush_with_range;
>>> +}
>> Seems that kvm_available_flush_tlb_with_range() is not used in this patch…
> What's wrong with that?
>
> It provides the implementation and later patches make use of it. It's a
> sensible way to split patches into small, self contained entities.

That's true; on the other hand, I do have a concern with this patch:
this series is not bisectable at all, because all the new code is dead
until the very last patch. Uses of the new feature should come _after_
the implementation.

I don't have any big problem with what Liran pointed out (and I can live
with the unused static functions that would warn with -Wunused, too),
but the above should be fixed in v5, basically by moving patches 12-15
at the beginning of the series.

Paolo

2018-10-15 12:05:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH V4 00/15] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

On 13/10/2018 16:53, [email protected] wrote:
> From: Lan Tianyu <[email protected]>
>
> For nested memory virtualization, Hyper-v doesn't set write-protect
> L1 hypervisor EPT page directory and page table node to track changes
> while it relies on guest to tell it changes via HvFlushGuestAddressLlist
> hypercall. HvFlushGuestAddressLlist hypercall provides a way to flush
> EPT page table with ranges which are specified by L1 hypervisor.
>
> If L1 hypervisor uses INVEPT or HvFlushGuestAddressSpace hypercall to
> flush EPT tlb, Hyper-V will invalidate associated EPT shadow page table
> and sync L1's EPT table when next EPT page fault is triggered.
> HvFlushGuestAddressLlist hypercall helps to avoid such redundant EPT
> page fault and synchronization of shadow page table.

So I just told you that the first part is well understood but I must
retract that; after carefully reviewing the whole series, I think one of
us is actually very confused.

I am not afraid to say it can be me, but my understanding is that you're
passing L1 GPAs to the hypercall and instead the spec says it expects L2
GPAs. (Consider that, because KVM's shadow paging code is shared
between nested EPT and !EPT cases, every time you see gpa/gfn in the
code it is for L1, while nested EPT GPAs are really what the code calls
gva.)

What's going on?

Paolo

2018-10-15 13:09:35

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH V4 7/15] KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

Hi Paolo:
Thanks for your review.

On Mon, Oct 15, 2018 at 6:04 PM Paolo Bonzini <[email protected]> wrote:
>
> On 13/10/2018 16:53, [email protected] wrote:
> > + bool flush = false;
> > int i;
> >
> > spin_lock(&kvm->mmu_lock);
> > @@ -5654,18 +5655,27 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> > slots = __kvm_memslots(kvm, i);
> > kvm_for_each_memslot(memslot, slots) {
> > gfn_t start, end;
> > + bool flush_tlb = true;
> >
> > start = max(gfn_start, memslot->base_gfn);
> > end = min(gfn_end, memslot->base_gfn + memslot->npages);
> > if (start >= end)
> > continue;
> >
> > - slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
> > - PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
> > - start, end - 1, true);
> > + if (kvm_available_flush_tlb_with_range())
> > + flush_tlb = false;
>
> This should be moved outside the for, because it's invariant.
>
> > + flush = slot_handle_level_range(kvm, memslot,
> > + kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
> > + PT_MAX_HUGEPAGE_LEVEL, start,
> > + end - 1, flush_tlb);
>
> ... and this should be "flush |= ".
> > }
> > }
> >
> > + if (flush && kvm_available_flush_tlb_with_range())
> > + kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> > + gfn_end - gfn_start + 1);
> > +
>
> ... and this can be just if (flush), because if flush_tlb is true then
> slot_handle_level_range always returns false.

OK. Will update.

--
Best regards
Tianyu Lan