LinuxLists.cc - [PATCH V2 00/13] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

2018-09-18 03:18:41

Subject: [PATCH V2 00/13] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

For nested memory virtualization, Hyper-v doesn't set write-protect
L1 hypervisor EPT page directory and page table node to track changes
while it relies on guest to tell it changes via HvFlushGuestAddressLlist
hypercall. HvFlushGuestAddressLlist hypercall provides a way to flush
EPT page table with ranges which are specified by L1 hypervisor.

If L1 hypervisor uses INVEPT or HvFlushGuestAddressSpace hypercall to
flush EPT tlb, Hyper-V will invalidate associated EPT shadow page table
and sync L1's EPT table when next EPT page fault is triggered.
HvFlushGuestAddressLlist hypercall helps to avoid such redundant EPT
page fault and synchronization of shadow page table.

Change since v1:
1) Convert "end_gfn" of struct kvm_tlb_range to "pages" in order
to avoid confusion as to whether "end_gfn" is inclusive or exlusive.
2) Add hyperv tlb range struct and replace kvm tlb range struct
with new struct in order to avoid using kvm struct in the hyperv
code directly.

Lan Tianyu (13):
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM/MMU: Add tlb flush with range helper function
KVM: Replace old tlb flush function with new one to flush a specified
range.
KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()
KVM: Add flush_link and parent_pte in the struct kvm_mmu_page
KVM: Add spte's point in the struct kvm_mmu_page
KVM/MMU: Replace tlb flush function with range list flush function
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
x86/Hyper-v: Add trace in the
hyperv_nested_flush_guest_mapping_range()
KVM/VMX: Change hv flush logic when ept tables are mismatched.
KVM/VMX: Add hv tlb range flush support

arch/x86/hyperv/nested.c | 104 ++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 17 +++++
arch/x86/include/asm/kvm_host.h | 10 +++
arch/x86/include/asm/mshyperv.h | 16 ++++
arch/x86/include/asm/trace/hyperv.h | 14 ++++
arch/x86/kvm/mmu.c | 143 +++++++++++++++++++++++++++++++-----
arch/x86/kvm/paging_tmpl.h | 16 +++-
arch/x86/kvm/vmx.c | 65 +++++++++++++---
8 files changed, 354 insertions(+), 31 deletions(-)

--
2.14.4

2018-09-18 03:18:43

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 1/13] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

Add flush range call back in the kvm_x86_ops and platform can use it
to register its associated function. The parameter "kvm_tlb_range"
accepts a single range and flush list which contains a list of ranges.

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v1:
Change "end_gfn" to "pages" to aviod confusion as to whether
"end_gfn" is inclusive or exlusive.
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8e90488c3d56..a6b77978502e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -402,6 +402,12 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
};

+struct kvm_tlb_range {
+ u64 start_gfn;
+ u64 pages;
+ struct list_head *flush_list;
+};
+
enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -991,6 +997,8 @@ struct kvm_x86_ops {

void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
int (*tlb_remote_flush)(struct kvm *kvm);
+ int (*tlb_remote_flush_with_range)(struct kvm *kvm,
+ struct kvm_tlb_range *range);

/*
* Flush any TLB entries associated with the given GVA.
--
2.14.4

2018-09-18 03:19:10

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 3/13] KVM: Replace old tlb flush function with new one to flush a specified range.

This patch is to replace kvm_flush_remote_tlbs() with kvm_flush_
remote_tlbs_with_address() in some functions without logic change.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 33 ++++++++++++++++++++++-----------
arch/x86/kvm/paging_tmpl.h | 3 ++-
2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac3b748f58f8..822e170881a4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1482,8 +1482,12 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)

static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
{
- if (__drop_large_spte(vcpu->kvm, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ if (__drop_large_spte(vcpu->kvm, sptep)) {
+ struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
}

/*
@@ -1770,7 +1774,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
}

if (need_flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);

return 0;
}
@@ -1951,7 +1955,8 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);

kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
}

int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
@@ -2467,7 +2472,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
account_shadowed(vcpu->kvm, sp);
if (level == PT_PAGE_TABLE_LEVEL &&
rmap_write_protect(vcpu, gfn))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);

if (level > PT_PAGE_TABLE_LEVEL && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
@@ -2587,7 +2592,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
return;

drop_parent_pte(child, sptep);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
}
}

@@ -3011,8 +3016,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
ret = RET_PF_EMULATE;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
+
if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
+ KVM_PAGES_PER_HPAGE(level));

if (unlikely(is_mmio_spte(*sptep)))
ret = RET_PF_EMULATE;
@@ -5621,7 +5628,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
* on PT_WRITABLE_MASK anymore.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}

static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
@@ -5685,7 +5693,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
* dirty_bitmap.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);

@@ -5703,7 +5712,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
lockdep_assert_held(&kvm->slots_lock);

if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);

@@ -5720,7 +5730,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,

/* see kvm_mmu_slot_leaf_clear_dirty */
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->npages);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 14ffd973df54..708a5e44861a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -893,7 +893,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);

if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm,
+ sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));

if (!rmap_can_add(vcpu))
break;
--
2.14.4

2018-09-18 03:19:23

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

This patch is to flush tlb directly in the kvm_handle_hva_range()
when range flush is available.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 822e170881a4..dfd2a0710417 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1888,6 +1888,13 @@ static int kvm_handle_hva_range(struct kvm *kvm,
&iterator)
ret |= handler(kvm, iterator.rmap, memslot,
iterator.gfn, iterator.level, data);
+
+ if (ret && kvm_available_flush_tlb_with_range()) {
+ kvm_flush_remote_tlbs_with_address(kvm,
+ gfn_start,
+ gfn_end - gfn_start);
+ ret = 0;
+ }
}
}

--
2.14.4

2018-09-18 03:19:37

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 5/13] KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

Originally, flush tlb is done by slot_handle_level_range(). This patch
is to flush tlb directly in the kvm_zap_gfn_range() when range
flush is available.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dfd2a0710417..56538fa6c017 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5578,6 +5578,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+ bool flush = false;
int i;

spin_lock(&kvm->mmu_lock);
@@ -5585,18 +5586,27 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
slots = __kvm_memslots(kvm, i);
kvm_for_each_memslot(memslot, slots) {
gfn_t start, end;
+ bool flush_tlb = true;

start = max(gfn_start, memslot->base_gfn);
end = min(gfn_end, memslot->base_gfn + memslot->npages);
if (start >= end)
continue;

- slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
- PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true);
+ if (kvm_available_flush_tlb_with_range())
+ flush_tlb = false;
+
+ flush = slot_handle_level_range(kvm, memslot,
+ kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
+ PT_MAX_HUGEPAGE_LEVEL, start,
+ end - 1, flush_tlb);
}
}

+ if (flush && kvm_available_flush_tlb_with_range())
+ kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
+ gfn_end - gfn_start + 1);
+
spin_unlock(&kvm->mmu_lock);
}

--
2.14.4

2018-09-18 03:19:48

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 6/13] KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()

kvm_mmu_zap_collapsible_spte() returns flush request to the
slot_handle_leaf() and the latter does flush on demand. When
range flush is available, make kvm_mmu_zap_collapsible_spte()
to flush tlb with range directly to avoid returning range back
to slot_handle_leaf().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 56538fa6c017..85a81a62e0a7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5674,7 +5674,13 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompoundMap(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
- need_tlb_flush = 1;
+
+ if (kvm_available_flush_tlb_with_range())
+ kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+ KVM_PAGES_PER_HPAGE(sp->role.level));
+ else
+ need_tlb_flush = 1;
+
goto restart;
}
}
--
2.14.4

2018-09-18 03:20:01

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 9/13] KVM/MMU: Replace tlb flush function with range list flush function

This patch is to use range list flush function in the
mmu_sync_children(), kvm_mmu_commit_zap_page() and
FNAME(sync_page)().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++++---
arch/x86/kvm/paging_tmpl.h | 5 ++++-
2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8f27cb8c3989..0390e67715ee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1092,6 +1092,13 @@ static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
}
}

+static void kvm_mmu_queue_flush_request(struct kvm_mmu_page *sp,
+ struct list_head *flush_list)
+{
+ if (sp->sptep && is_last_spte(*sp->sptep, sp->role.level))
+ list_add(&sp->flush_link, flush_list);
+}
+
void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
{
update_gfn_disallow_lpage_count(slot, gfn, 1);
@@ -2369,12 +2376,16 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,

while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
+ LIST_HEAD(flush_list);

- for_each_sp(pages, sp, parents, i)
+ for_each_sp(pages, sp, parents, i) {
protected |= rmap_write_protect(vcpu, sp->gfn);
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+ }

if (protected) {
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm,
+ &flush_list);
flush = false;
}

@@ -2710,6 +2721,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
{
struct kvm_mmu_page *sp, *nsp;
+ LIST_HEAD(flush_list);

if (list_empty(invalid_list))
return;
@@ -2723,7 +2735,15 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
* In addition, kvm_flush_remote_tlbs waits for all vcpus to exit
* guest mode and/or lockless shadow page table walks.
*/
- kvm_flush_remote_tlbs(kvm);
+ if (kvm_available_flush_tlb_with_range()) {
+ list_for_each_entry(sp, invalid_list, link)
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+
+ if (!list_empty(&flush_list))
+ kvm_flush_remote_tlbs_with_list(kvm, &flush_list);
+ } else {
+ kvm_flush_remote_tlbs(kvm);
+ }

list_for_each_entry_safe(sp, nsp, invalid_list, link) {
WARN_ON(!sp->role.invalid || sp->root_count);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 5cbaf7c4a729..a5f967e81429 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -972,6 +972,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
bool host_writable;
gpa_t first_pte_gpa;
int set_spte_ret = 0;
+ LIST_HEAD(flush_list);

/* direct kvm_mmu_page can not be unsync. */
BUG_ON(sp->role.direct);
@@ -1032,10 +1033,12 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
pte_access, PT_PAGE_TABLE_LEVEL,
gfn, spte_to_pfn(sp->spt[i]),
true, false, host_writable);
+ if (set_spte_ret && kvm_available_flush_tlb_with_range())
+ kvm_mmu_queue_flush_request(sp, &flush_list);
}

if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm, &flush_list);

return nr_present;
}
--
2.14.4

2018-09-18 03:20:13

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 2/13] KVM/MMU: Add tlb flush with range helper function

This patch is to add wrapper functions for tlb_remote_flush_with_range
callback.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c67f09086378..ac3b748f58f8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -253,6 +253,54 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
static union kvm_mmu_page_role
kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);

+
+static inline bool kvm_available_flush_tlb_with_range(void)
+{
+ return kvm_x86_ops->tlb_remote_flush_with_range;
+}
+
+static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
+{
+ int ret = -ENOTSUPP;
+
+ if (range && kvm_x86_ops->tlb_remote_flush_with_range) {
+ /*
+ * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
+ * kvm_make_all_cpus_request.
+ */
+ long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
+
+ ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
+ cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+ }
+
+ if (ret)
+ kvm_flush_remote_tlbs(kvm);
+}
+
+static void kvm_flush_remote_tlbs_with_list(struct kvm *kvm,
+ struct list_head *flush_list)
+{
+ struct kvm_tlb_range range;
+
+ range.flush_list = flush_list;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
+static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
+ u64 start_gfn, u64 pages)
+{
+ struct kvm_tlb_range range;
+
+ range.start_gfn = start_gfn;
+ range.pages = pages;
+ range.flush_list = NULL;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
{
BUG_ON((mmio_mask & mmio_value) != mmio_value);
--
2.14.4

2018-09-18 03:20:13

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
with specified ranges. This patch is to add the hypercall support.

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v1:
Add hyperv tlb flush struct to avoid use kvm tlb flush struct
in the hyperv file.
---
arch/x86/hyperv/nested.c | 103 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 17 ++++++
arch/x86/include/asm/mshyperv.h | 16 ++++++
3 files changed, 136 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index b8e60cc50461..40ddbfd54573 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -7,15 +7,32 @@
*
* Author : Lan Tianyu <[email protected]>
*/
+#define pr_fmt(fmt) "Hyper-V: " fmt

#include <linux/types.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/tlbflush.h>
+#include <asm/kvm_host.h>

#include <asm/trace/hyperv.h>

+/*
+ * MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
+ * by the bitwidth of "additional_pages" in union hv_gpa_page_range.
+ */
+#define MAX_FLUSH_PAGES (2048)
+
+/*
+ * All input flush parameters are in single page. The max flush count
+ * is equal with how many entries of union hv_gpa_page_range can be
+ * populated in the input parameter page. MAX_FLUSH_REP_COUNT
+ * = (4096 - 16) / 8. (“Page Size” - "Address Space" - "Flags") /
+ * "GPA Range".
+ */
+#define MAX_FLUSH_REP_COUNT (510)
+
int hyperv_flush_guest_mapping(u64 as)
{
struct hv_guest_mapping_flush **flush_pcpu;
@@ -54,3 +71,89 @@ int hyperv_flush_guest_mapping(u64 as)
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
+
+static int fill_flush_list(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 pages)
+{
+ int gpa_n = offset;
+ u64 cur = start_gfn;
+ u64 additional_pages;
+
+ do {
+ if (gpa_n >= MAX_FLUSH_REP_COUNT) {
+ pr_warn("Request exceeds HvFlushGuestList max flush count.");
+ return -ENOSPC;
+ }
+
+ if (pages > MAX_FLUSH_PAGES) {
+ additional_pages = MAX_FLUSH_PAGES - 1;
+ pages -= MAX_FLUSH_PAGES;
+ } else {
+ additional_pages = pages - 1;
+ pages = 0;
+ }
+
+ gpa_list[gpa_n].page.additional_pages = additional_pages;
+ gpa_list[gpa_n].page.largepage = false;
+ gpa_list[gpa_n].page.basepfn = cur;
+
+ cur += additional_pages + 1;
+ gpa_n++;
+ } while (pages > 0);
+
+ return gpa_n;
+}
+
+int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
+{
+ struct hv_guest_mapping_flush_list **flush_pcpu;
+ struct hv_guest_mapping_flush_list *flush;
+ u64 status = 0;
+ unsigned long flags;
+ int ret = -ENOTSUPP;
+ int gpa_n = 0;
+
+ if (!hv_hypercall_pg)
+ goto fault;
+
+ local_irq_save(flags);
+
+ flush_pcpu = (struct hv_guest_mapping_flush_list **)
+ this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ flush = *flush_pcpu;
+ if (unlikely(!flush)) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ flush->address_space = as;
+ flush->flags = 0;
+
+ if (!range->flush_list)
+ gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
+ range->start_gfn, range->pages);
+ else if (range->parse_flush_list_func)
+ gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n,
+ range->flush_list, fill_flush_list);
+ else
+ gpa_n = -1;
+
+ if (gpa_n < 0) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
+ gpa_n, 0, flush, NULL);
+
+ local_irq_restore(flags);
+
+ if (!(status & HV_HYPERCALL_RESULT_MASK))
+ ret = 0;
+ else
+ ret = status;
+fault:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e977b6b3a538..512f22b49999 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -353,6 +353,7 @@ struct hv_tsc_emulation_status {
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
+#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0

#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
@@ -750,6 +751,22 @@ struct hv_guest_mapping_flush {
u64 flags;
};

+/* HvFlushGuestPhysicalAddressList hypercall */
+union hv_gpa_page_range {
+ u64 address_space;
+ struct {
+ u64 additional_pages:11;
+ u64 largepage:1;
+ u64 basepfn:52;
+ } page;
+};
+
+struct hv_guest_mapping_flush_list {
+ u64 address_space;
+ u64 flags;
+ union hv_gpa_page_range gpa_list[];
+};
+
/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
struct hv_tlb_flush {
u64 address_space;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f37704497d8f..19f49fbcf94d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -22,6 +22,16 @@ struct ms_hyperv_info {

extern struct ms_hyperv_info ms_hyperv;

+struct hyperv_tlb_range {
+ u64 start_gfn;
+ u64 pages;
+ struct list_head *flush_list;
+ int (*parse_flush_list_func)(union hv_gpa_page_range gpa_list[],
+ int offset, struct list_head *flush_list,
+ int (*fill_flush_list)(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 end_gfn));
+};
+
/*
* Generate the guest ID.
*/
@@ -348,6 +358,7 @@ void set_hv_tscchange_cb(void (*cb)(void));
void clear_hv_tscchange_cb(void);
void hyperv_stop_tsc_emulation(void);
int hyperv_flush_guest_mapping(u64 as);
+int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range);

#ifdef CONFIG_X86_64
void hv_apic_init(void);
@@ -368,6 +379,11 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
return NULL;
}
static inline int hyperv_flush_guest_mapping(u64 as) { return -1; }
+static inline int hyperv_flush_guest_mapping_range(u64 as,
+ struct hyperv_tlb_range *range)
+{
+ return -1;
+}
#endif /* CONFIG_HYPERV */

#ifdef CONFIG_HYPERV_TSCPAGE
--
2.14.4

2018-09-18 03:20:14

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 11/13] x86/Hyper-v: Add trace in the hyperv_nested_flush_guest_mapping_range()

This patch is to trace log in the hyperv_nested_flush_
guest_mapping_range().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/hyperv/nested.c | 1 +
arch/x86/include/asm/trace/hyperv.h | 14 ++++++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index 40ddbfd54573..ae7181c6ede4 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -154,6 +154,7 @@ int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
else
ret = status;
fault:
+ trace_hyperv_nested_flush_guest_mapping_range(as, ret);
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index 2e6245a023ef..ace464f09681 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -42,6 +42,20 @@ TRACE_EVENT(hyperv_nested_flush_guest_mapping,
TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
);

+TRACE_EVENT(hyperv_nested_flush_guest_mapping_range,
+ TP_PROTO(u64 as, int ret),
+ TP_ARGS(as, ret),
+
+ TP_STRUCT__entry(
+ __field(u64, as)
+ __field(int, ret)
+ ),
+ TP_fast_assign(__entry->as = as;
+ __entry->ret = ret;
+ ),
+ TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
+ );
+
TRACE_EVENT(hyperv_send_ipi_mask,
TP_PROTO(const struct cpumask *cpus,
int vector),
--
2.14.4

2018-09-18 03:20:23

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 12/13] KVM/VMX: Change hv flush logic when ept tables are mismatched.

If ept table pointers are mismatched, flushing tlb for each vcpus via
hv flush interface still helps to reduce vmexits which are triggered
by IPI and INEPT emulation.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/vmx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 533a327372c8..2869c3e78168 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1557,7 +1557,8 @@ static void check_ept_pointer_match(struct kvm *kvm)

static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
{
- int ret;
+ struct kvm_vcpu *vcpu;
+ int ret = -ENOTSUPP, i;

spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);

@@ -1565,14 +1566,14 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
check_ept_pointer_match(kvm);

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
- ret = -ENOTSUPP;
- goto out;
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ ret |= hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ } else {
+ ret = hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
}

- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
-
-out:
spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
--
2.14.4

2018-09-18 03:20:41

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 13/13] KVM/VMX: Add hv tlb range flush support

This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.

Signed-off-by: Lan Tianyu <[email protected]>
---
Change since v1:
Pass flush range with new hyper-v tlb flush struct rather
than KVM tlb flush struct.
---
arch/x86/kvm/vmx.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2869c3e78168..70e1f916bfc9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1555,7 +1555,43 @@ static void check_ept_pointer_match(struct kvm *kvm)
to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
}

-static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
+int kvm_parse_flush_list_func(union hv_gpa_page_range gpa_list[],
+ int offset, struct list_head *flush_list,
+ int (*fill_flush_list)(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 end_gfn))
+{
+ struct kvm_mmu_page *sp;
+
+ list_for_each_entry(sp, flush_list,
+ flush_link) {
+ offset = fill_flush_list(gpa_list, offset,
+ sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
+
+ return offset;
+}
+
+static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
+{
+ u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
+ struct hyperv_tlb_range flush_range;
+
+ if (range) {
+ flush_range.start_gfn = range->start_gfn;
+ flush_range.pages = range->pages;
+ flush_range.flush_list = range->flush_list;
+ flush_range.parse_flush_list_func = kvm_parse_flush_list_func;
+
+ return hyperv_flush_guest_mapping_range(ept_pointer,
+ &flush_range);
+ } else {
+ return hyperv_flush_guest_mapping(ept_pointer);
+ }
+}
+
+static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
{
struct kvm_vcpu *vcpu;
int ret = -ENOTSUPP, i;
@@ -1567,16 +1603,21 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm)
- ret |= hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ ret |= __hv_remote_flush_tlb_with_range(
+ kvm, vcpu, range);
} else {
- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
+ ret = __hv_remote_flush_tlb_with_range(kvm,
+ kvm_get_vcpu(kvm, 0), range);
}

spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
+
+static int hv_remote_flush_tlb(struct kvm *kvm)
+{
+ return hv_remote_flush_tlb_with_range(kvm, NULL);
+}
#else /* !IS_ENABLED(CONFIG_HYPERV) */
static inline void evmcs_write64(unsigned long field, u64 value) {}
static inline void evmcs_write32(unsigned long field, u32 value) {}
@@ -7918,8 +7959,11 @@ static __init int hardware_setup(void)

#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
- && enable_ept)
- kvm_x86_ops->tlb_remote_flush = vmx_hv_remote_flush_tlb;
+ && enable_ept) {
+ kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ kvm_x86_ops->tlb_remote_flush_with_range =
+ hv_remote_flush_tlb_with_range;
+ }
#endif

if (!cpu_has_vmx_ple()) {
--
2.14.4

2018-09-18 03:21:09

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 7/13] KVM: Add flush_link and parent_pte in the struct kvm_mmu_page

PV EPT tlb flush function will accept a list of flush ranges and
use struct kvm_mmu_page as the list entry.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6b77978502e..c96bc4cbe4b7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -279,6 +279,7 @@ struct kvm_rmap_head {

struct kvm_mmu_page {
struct list_head link;
+ struct list_head flush_link;
struct hlist_node hash_link;

/*
--
2.14.4

2018-09-18 03:21:11

by Tianyu Lan

[permalink] [raw]

Subject: [PATCH V2 8/13] KVM: Add spte's point in the struct kvm_mmu_page

It's necessary to check whether mmu page is last or large page when add
mmu page into flush list. "spte" is needed for such check and so add
spte point in the struct kvm_mmu_page.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 5 +++++
arch/x86/kvm/paging_tmpl.h | 2 ++
3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c96bc4cbe4b7..d42d96e637b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -296,6 +296,7 @@ struct kvm_mmu_page {
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
+ u64 *sptep;

/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
unsigned long mmu_valid_gen;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85a81a62e0a7..8f27cb8c3989 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3162,6 +3162,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
iterator.level - 1, 1, ACC_ALL);
+ sp->sptep = iterator.sptep;

link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3599,6 +3600,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
} else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
@@ -3615,6 +3617,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
}
@@ -3655,6 +3658,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
@@ -3692,6 +3696,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);

vcpu->arch.mmu.pae_root[i] = root | pm_mask;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 708a5e44861a..5cbaf7c4a729 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -632,6 +632,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
+ sp->sptep = it.sptep;
}

/*
@@ -662,6 +663,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
true, direct_access);
+ sp->sptep = it.sptep;
link_shadow_page(vcpu, it.sptep, sp);
}

--
2.14.4

2018-09-19 16:37:28

by Michael Kelley (EOSG)

[permalink] [raw]

Subject: RE: [PATCH V2 2/13] KVM/MMU: Add tlb flush with range helper function

From: Tianyu Lan Sent: Monday, September 17, 2018 8:18 PM
>
> +static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
> + struct kvm_tlb_range *range)
> +{
> + int ret = -ENOTSUPP;
> +
> + if (range && kvm_x86_ops->tlb_remote_flush_with_range) {
> + /*
> + * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
> + * kvm_make_all_cpus_request.
> + */
> + long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
> +
> + ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
> + cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> + }

The comment and the code that manipulates kvm->tlbs_dirty appears
to have been copied from kvm_flush_remote_tlbs(). But the above
code doesn't call kvm_make_all_cpus_request(). I haven't traced
all the details, but it seems like the comment should be updated,
or the code isn't needed.

Michael

2018-09-19 16:38:58

by Michael Kelley (EOSG)

[permalink] [raw]

Subject: RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM
> +
> + if (ret && kvm_available_flush_tlb_with_range()) {
> + kvm_flush_remote_tlbs_with_address(kvm,
> + gfn_start,
> + gfn_end - gfn_start);

Does the above need to be gfn_end - gfn_start + 1?

> + ret = 0;
> + }

Michael

2018-09-19 17:20:17

by Michael Kelley (EOSG)

[permalink] [raw]

Subject: RE: [PATCH V2 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM
>
> #include <linux/types.h>
> #include <asm/hyperv-tlfs.h>
> #include <asm/mshyperv.h>
> #include <asm/tlbflush.h>
> +#include <asm/kvm_host.h>

Hopefully asm/kvm_host.h does not need to be #included, given
the new code structure.

>
> #include <asm/trace/hyperv.h>
>
> +/*
> + * MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
> + * by the bitwidth of "additional_pages" in union hv_gpa_page_range.
> + */
> +#define MAX_FLUSH_PAGES (2048)
> +
> +/*
> + * All input flush parameters are in single page. The max flush count
> + * is equal with how many entries of union hv_gpa_page_range can be
> + * populated in the input parameter page. MAX_FLUSH_REP_COUNT
> + * = (4096 - 16) / 8. (“Page Size” - "Address Space" - "Flags") /
> + * "GPA Range".
> + */
> +#define MAX_FLUSH_REP_COUNT (510)
> +

I would recommend putting the above two definitions in
hyperv-tlfs.h. They are directly tied to the data structures defined
by Hyper-V in the TLFS. Put MAX_FLUSH_PAGES immediately after
the definition for hv_gpa_page_range so that the dependency is
obvious.

For MAX_FLUSH_REP_COUNT, can you do the calculation in
the #define rather than just in the comment? Alternatively, define
the gpa_list[] array to be of MAX_FLUSH_REP_COUNT size, and then
add a compile time assert that the size of struct
hv_guest_mapping_flush_list is exactly one page in size. It's just
a good way to use the compiler to help check for mistakes.

Also prefix them both with HV_ since they will be more
globally visible as part of hyperv-tlfs.h.

> int hyperv_flush_guest_mapping(u64 as)
> {
> struct hv_guest_mapping_flush **flush_pcpu;
> @@ -54,3 +71,89 @@ int hyperv_flush_guest_mapping(u64 as)
> return ret;
> }
> EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
> +
> +static int fill_flush_list(union hv_gpa_page_range gpa_list[],
> + int offset, u64 start_gfn, u64 pages)
> +{
> + int gpa_n = offset;
> + u64 cur = start_gfn;
> + u64 additional_pages;
> +
> + do {
> + if (gpa_n >= MAX_FLUSH_REP_COUNT) {
> + pr_warn("Request exceeds HvFlushGuestList max flush count.");
> + return -ENOSPC;

I wonder if the warning is really needed. When the error is returned up
through the higher levels of code, won't the higher levels just fallback to
the non-enlightened flush code? So nothing is actually goes wrong; it's just
that a slower code path gets taken. A comment about such expectations
might be helpful.

> + }
> +
> + if (pages > MAX_FLUSH_PAGES) {
> + additional_pages = MAX_FLUSH_PAGES - 1;
> + pages -= MAX_FLUSH_PAGES;
> + } else {
> + additional_pages = pages - 1;
> + pages = 0;
> + }

The above code is really doing:

additional_pages = min(pages, MAX_FLUSH_PAGES) - 1;
pages -= additional_pages + 1;

And you might want to move the decrement of 'pages' down to the
bottom of the loop where you update the other loop variables.

> +
> + gpa_list[gpa_n].page.additional_pages = additional_pages;
> + gpa_list[gpa_n].page.largepage = false;
> + gpa_list[gpa_n].page.basepfn = cur;
> +
> + cur += additional_pages + 1;
> + gpa_n++;
> + } while (pages > 0);
> +
> + return gpa_n;
> +}
> +
> +int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
> +{
> + struct hv_guest_mapping_flush_list **flush_pcpu;
> + struct hv_guest_mapping_flush_list *flush;
> + u64 status = 0;
> + unsigned long flags;
> + int ret = -ENOTSUPP;
> + int gpa_n = 0;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush_list **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + if (!range->flush_list)
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + range->start_gfn, range->pages);
> + else if (range->parse_flush_list_func)
> + gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n,
> + range->flush_list, fill_flush_list);
> + else
> + gpa_n = -1;
> +
> + if (gpa_n < 0) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
> + gpa_n, 0, flush, NULL);
> +
> + local_irq_restore(flags);
> +
> + if (!(status & HV_HYPERCALL_RESULT_MASK))
> + ret = 0;
> + else
> + ret = status;
> +fault:
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index e977b6b3a538..512f22b49999 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -353,6 +353,7 @@ struct hv_tsc_emulation_status {
> #define HVCALL_POST_MESSAGE 0x005c
> #define HVCALL_SIGNAL_EVENT 0x005d
> #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
> +#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
>
> #define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
> #define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
> @@ -750,6 +751,22 @@ struct hv_guest_mapping_flush {
> u64 flags;
> };
>
> +/* HvFlushGuestPhysicalAddressList hypercall */
> +union hv_gpa_page_range {
> + u64 address_space;
> + struct {
> + u64 additional_pages:11;
> + u64 largepage:1;
> + u64 basepfn:52;
> + } page;
> +};
> +
> +struct hv_guest_mapping_flush_list {
> + u64 address_space;
> + u64 flags;
> + union hv_gpa_page_range gpa_list[];
> +};
> +
> /* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
> struct hv_tlb_flush {
> u64 address_space;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index f37704497d8f..19f49fbcf94d 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -22,6 +22,16 @@ struct ms_hyperv_info {
>
> extern struct ms_hyperv_info ms_hyperv;
>
> +struct hyperv_tlb_range {
> + u64 start_gfn;
> + u64 pages;
> + struct list_head *flush_list;
> + int (*parse_flush_list_func)(union hv_gpa_page_range gpa_list[],
> + int offset, struct list_head *flush_list,
> + int (*fill_flush_list)(union hv_gpa_page_range gpa_list[],
> + int offset, u64 start_gfn, u64 end_gfn));
> +};
> +
> /*
> * Generate the guest ID.
> */
> @@ -348,6 +358,7 @@ void set_hv_tscchange_cb(void (*cb)(void));
> void clear_hv_tscchange_cb(void);
> void hyperv_stop_tsc_emulation(void);
> int hyperv_flush_guest_mapping(u64 as);
> +int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range);
>
> #ifdef CONFIG_X86_64
> void hv_apic_init(void);
> @@ -368,6 +379,11 @@ static inline struct hv_vp_assist_page
> *hv_get_vp_assist_page(unsigned int cpu)
> return NULL;
> }
> static inline int hyperv_flush_guest_mapping(u64 as) { return -1; }
> +static inline int hyperv_flush_guest_mapping_range(u64 as,
> + struct hyperv_tlb_range *range)
> +{
> + return -1;
> +}
> #endif /* CONFIG_HYPERV */
>
> #ifdef CONFIG_HYPERV_TSCPAGE
> --
> 2.14.4

2018-09-20 14:32:10

by Tianyu Lan

[permalink] [raw]

Subject: Re: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

On 9/20/2018 12:08 AM, Michael Kelley (EOSG) wrote:
> From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM
>> +
>> + if (ret && kvm_available_flush_tlb_with_range()) {
>> + kvm_flush_remote_tlbs_with_address(kvm,
>> + gfn_start,
>> + gfn_end - gfn_start);
>
> Does the above need to be gfn_end - gfn_start + 1?

The flush range depends on the input parameter frame start and frame end
of for_each_slot_rmap_range().

for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
PT_MAX_HUGEPAGE_LEVEL,
gfn_start, gfn_end - 1,
&iterator)
ret |= handler(kvm, iterator.rmap, memslot,
iterator.gfn, iterator.level, data);

The start is "gfn_start" and the end is "gfn_end - 1". The flush size is
(gfn_end - 1) - gfn_start + 1 = gfn_end - gfn_start.

>
>> + ret = 0;
>> + }
>
> Michael
>

2018-09-20 19:06:36

by Michael Kelley (EOSG)

[permalink] [raw]

Subject: RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

From: Tianyu Lan Sent: Thursday, September 20, 2018 7:30 AM
> On 9/20/2018 12:08 AM, Michael Kelley (EOSG) wrote:
> > From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM
> >> +
> >> + if (ret && kvm_available_flush_tlb_with_range()) {
> >> + kvm_flush_remote_tlbs_with_address(kvm,
> >> + gfn_start,
> >> + gfn_end - gfn_start);
> >
> > Does the above need to be gfn_end - gfn_start + 1?
>
> The flush range depends on the input parameter frame start and frame end
> of for_each_slot_rmap_range().
>
> for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
> PT_MAX_HUGEPAGE_LEVEL,
> gfn_start, gfn_end - 1,
> &iterator)
> ret |= handler(kvm, iterator.rmap, memslot,
> iterator.gfn, iterator.level, data);
>
>
> The start is "gfn_start" and the end is "gfn_end - 1". The flush size is
> (gfn_end - 1) - gfn_start + 1 = gfn_end - gfn_start.
>

Got it. I agree.

Michael