2018-09-10 08:41:13

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 00/13] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

For nested memory virtualization, Hyper-v doesn't set write-protect
L1 hypervisor EPT page directory and page table node to track changes
while it relies on guest to tell it changes via HvFlushGuestAddressLlist
hypercall. HvFlushGuestAddressLlist hypercall provides a way to flush
EPT page table with ranges which are specified by L1 hypervisor.

If L1 hypervisor uses INVEPT or HvFlushGuestAddressSpace hypercall to
flush EPT tlb, Hyper-V will invalidate associated EPT shadow page table
and sync L1's EPT table when next EPT page fault is triggered.
HvFlushGuestAddressLlist hypercall helps to avoid such redundant EPT
page fault and synchronization of shadow page table.

Lan Tianyu (13):
KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
KVM/MMU: Add tlb flush with range helper function
KVM: Replace old tlb flush function with new one to flush a specified
range.
KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()
KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()
KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()
KVM: Add flush_link and parent_pte in the struct kvm_mmu_page
KVM: Add spte's point in the struct kvm_mmu_page
KVM/MMU: Replace tlb flush function with range list flush function
x86/hyper-v: Add HvFlushGuestAddressList hypercall support
x86/Hyper-v: Add trace in the
hyperv_nested_flush_guest_mapping_range()
KVM/VMX: Change hv flush logic when ept tables are mismatched.
KVM/VMX: Add hv tlb range flush support

arch/x86/hyperv/nested.c | 111 +++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 17 +++++
arch/x86/include/asm/kvm_host.h | 10 +++
arch/x86/include/asm/mshyperv.h | 7 ++
arch/x86/include/asm/trace/hyperv.h | 14 ++++
arch/x86/kvm/mmu.c | 146 +++++++++++++++++++++++++++++++-----
arch/x86/kvm/paging_tmpl.h | 16 +++-
arch/x86/kvm/vmx.c | 41 +++++++---
8 files changed, 331 insertions(+), 31 deletions(-)

--
2.14.4


2018-09-10 08:39:58

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 1/13] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

Add flush range call back in the kvm_x86_ops and platform can use it
to register its associated function. The parameter "kvm_tlb_range"
accepts a single range and flush list which contains a list of ranges.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e12916e7c2fb..dcdf8cc16388 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -402,6 +402,12 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
};

+struct kvm_tlb_range {
+ u64 start_gfn;
+ u64 end_gfn;
+ struct list_head *flush_list;
+};
+
enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -991,6 +997,8 @@ struct kvm_x86_ops {

void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
int (*tlb_remote_flush)(struct kvm *kvm);
+ int (*tlb_remote_flush_with_range)(struct kvm *kvm,
+ struct kvm_tlb_range *range);

/*
* Flush any TLB entries associated with the given GVA.
--
2.14.4

2018-09-10 08:40:15

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 2/13] KVM/MMU: Add tlb flush with range helper function

This patch is to add wrapper functions for tlb_remote_flush_with_range
callback.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 456d4d0f7eb7..8cf47befc0f2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -253,6 +253,54 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
static union kvm_mmu_page_role
kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);

+
+static inline bool kvm_available_flush_tlb_with_range(void)
+{
+ return kvm_x86_ops->tlb_remote_flush_with_range;
+}
+
+static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
+{
+ int ret = -ENOTSUPP;
+
+ if (range && kvm_x86_ops->tlb_remote_flush_with_range) {
+ /*
+ * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
+ * kvm_make_all_cpus_request.
+ */
+ long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
+
+ ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
+ cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+ }
+
+ if (ret)
+ kvm_flush_remote_tlbs(kvm);
+}
+
+static void kvm_flush_remote_tlbs_with_list(struct kvm *kvm,
+ struct list_head *flush_list)
+{
+ struct kvm_tlb_range range;
+
+ range.flush_list = flush_list;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
+static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
+ u64 start_gfn, u64 end_gfn)
+{
+ struct kvm_tlb_range range;
+
+ range.start_gfn = start_gfn;
+ range.end_gfn = end_gfn;
+ range.flush_list = NULL;
+
+ kvm_flush_remote_tlbs_with_range(kvm, &range);
+}
+
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
{
BUG_ON((mmio_mask & mmio_value) != mmio_value);
--
2.14.4

2018-09-10 08:40:34

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

This patch is to flush tlb directly in the kvm_handle_hva_range()
when range flush is available.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0ee310bad2c6..cadb6a0b5247 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1888,6 +1888,12 @@ static int kvm_handle_hva_range(struct kvm *kvm,
&iterator)
ret |= handler(kvm, iterator.rmap, memslot,
iterator.gfn, iterator.level, data);
+
+ if (ret && kvm_available_flush_tlb_with_range()) {
+ kvm_flush_remote_tlbs_with_address(kvm,
+ gfn_start, gfn_end - 1);
+ ret = 0;
+ }
}
}

--
2.14.4

2018-09-10 08:40:43

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 5/13] KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

Originally, flush tlb is done by slot_handle_level_range(). This patch
is to flush tlb directly in the kvm_zap_gfn_range() when range
flush is available.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cadb6a0b5247..9ae5887c8d1c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5583,6 +5583,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+ bool flush = false;
int i;

spin_lock(&kvm->mmu_lock);
@@ -5590,18 +5591,26 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
slots = __kvm_memslots(kvm, i);
kvm_for_each_memslot(memslot, slots) {
gfn_t start, end;
+ bool flush_tlb = true;

start = max(gfn_start, memslot->base_gfn);
end = min(gfn_end, memslot->base_gfn + memslot->npages);
if (start >= end)
continue;

- slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
- PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true);
+ if (kvm_available_flush_tlb_with_range())
+ flush_tlb = false;
+
+ flush = slot_handle_level_range(kvm, memslot,
+ kvm_zap_rmapp, PT_PAGE_TABLE_LEVEL,
+ PT_MAX_HUGEPAGE_LEVEL, start,
+ end - 1, flush_tlb);
}
}

+ if (flush && kvm_available_flush_tlb_with_range())
+ kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+
spin_unlock(&kvm->mmu_lock);
}

--
2.14.4

2018-09-10 08:40:47

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 3/13] KVM: Replace old tlb flush function with new one to flush a specified range.

This patch is to replace kvm_flush_remote_tlbs() with kvm_flush_
remote_tlbs_with_address() in some functions without logic change.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 34 +++++++++++++++++++++++-----------
arch/x86/kvm/paging_tmpl.h | 9 +++++++--
2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8cf47befc0f2..0ee310bad2c6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1482,8 +1482,12 @@ static bool __drop_large_spte(struct kvm *kvm, u64 *sptep)

static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
{
- if (__drop_large_spte(vcpu->kvm, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ if (__drop_large_spte(vcpu->kvm, sptep)) {
+ struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level) - 1);
+ }
}

/*
@@ -1770,7 +1774,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
}

if (need_flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, gfn, gfn);

return 0;
}
@@ -1956,7 +1960,8 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);

kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+ sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level) - 1);
}

int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
@@ -2472,7 +2477,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
account_shadowed(vcpu->kvm, sp);
if (level == PT_PAGE_TABLE_LEVEL &&
rmap_write_protect(vcpu, gfn))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, gfn);

if (level > PT_PAGE_TABLE_LEVEL && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
@@ -2592,7 +2597,8 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
return;

drop_parent_pte(child, sptep);
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn,
+ child->gfn);
}
}

@@ -3016,8 +3022,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
ret = RET_PF_EMULATE;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
+
if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
+ gfn + KVM_PAGES_PER_HPAGE(level) - 1);

if (unlikely(is_mmio_spte(*sptep)))
ret = RET_PF_EMULATE;
@@ -5626,7 +5634,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
* on PT_WRITABLE_MASK anymore.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1);
}

static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
@@ -5690,7 +5699,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
* dirty_bitmap.
*/
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);

@@ -5708,7 +5718,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
lockdep_assert_held(&kvm->slots_lock);

if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);

@@ -5725,7 +5736,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,

/* see kvm_mmu_slot_leaf_clear_dirty */
if (flush)
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1);
}
EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 14ffd973df54..f45f1a5fbfe8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -892,8 +892,13 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
pte_gpa = FNAME(get_level1_sp_gpa)(sp);
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);

- if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
- kvm_flush_remote_tlbs(vcpu->kvm);
+ if (mmu_page_zap_pte(vcpu->kvm, sp, sptep)) {
+ u64 gfn_end = sp->gfn +
+ KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
+
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm,
+ sp->gfn, gfn_end);
+ }

if (!rmap_can_add(vcpu))
break;
--
2.14.4

2018-09-10 08:41:08

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 7/13] KVM: Add flush_link and parent_pte in the struct kvm_mmu_page

PV EPT tlb flush function will accept a list of flush ranges and
use struct kvm_mmu_page as the list entry.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dcdf8cc16388..acdad302612a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -279,6 +279,7 @@ struct kvm_rmap_head {

struct kvm_mmu_page {
struct list_head link;
+ struct list_head flush_link;
struct hlist_node hash_link;

/*
--
2.14.4

2018-09-10 08:41:13

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 6/13] KVM/MMU: Flush tlb directly in kvm_mmu_zap_collapsible_spte()

kvm_mmu_zap_collapsible_spte() returns flush request to the
slot_handle_leaf() and the latter does flush on demand. When
range flush is available, make kvm_mmu_zap_collapsible_spte()
to flush tlb with range directly to avoid returning range back
to slot_handle_leaf().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9ae5887c8d1c..167ecfbab9bd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -5678,7 +5678,17 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompoundMap(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
- need_tlb_flush = 1;
+
+ if (kvm_available_flush_tlb_with_range()) {
+ u64 gfn_end = sp->gfn +
+ KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
+
+ kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+ gfn_end);
+ } else {
+ need_tlb_flush = 1;
+ }
+
goto restart;
}
}
--
2.14.4

2018-09-10 08:41:20

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 8/13] KVM: Add spte's point in the struct kvm_mmu_page

It's necessary to check whether mmu page is last or large page when add
mmu page into flush list. "spte" is needed for such check and so add
spte point in the struct kvm_mmu_page.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 5 +++++
arch/x86/kvm/paging_tmpl.h | 2 ++
3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index acdad302612a..03c7290f4187 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -296,6 +296,7 @@ struct kvm_mmu_page {
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
+ u64 *sptep;

/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
unsigned long mmu_valid_gen;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 167ecfbab9bd..73e19ce589e7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3167,6 +3167,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
iterator.level - 1, 1, ACC_ALL);
+ sp->sptep = iterator.sptep;

link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3604,6 +3605,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
} else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
@@ -3620,6 +3622,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
}
@@ -3660,6 +3663,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
@@ -3697,6 +3701,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
+ sp->sptep = NULL;
spin_unlock(&vcpu->kvm->mmu_lock);

vcpu->arch.mmu.pae_root[i] = root | pm_mask;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f45f1a5fbfe8..bb8c2cdf70c3 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -632,6 +632,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
false, access);
+ sp->sptep = it.sptep;
}

/*
@@ -662,6 +663,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,

sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
true, direct_access);
+ sp->sptep = it.sptep;
link_shadow_page(vcpu, it.sptep, sp);
}

--
2.14.4

2018-09-10 08:41:25

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 9/13] KVM/MMU: Replace tlb flush function with range list flush function

This patch is to use range list flush function in the
mmu_sync_children(), kvm_mmu_commit_zap_page() and
FNAME(sync_page)().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 26 +++++++++++++++++++++++---
arch/x86/kvm/paging_tmpl.h | 5 ++++-
2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 73e19ce589e7..a071da797a15 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1092,6 +1092,13 @@ static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot,
}
}

+static void kvm_mmu_queue_flush_request(struct kvm_mmu_page *sp,
+ struct list_head *flush_list)
+{
+ if (sp->sptep && is_last_spte(*sp->sptep, sp->role.level))
+ list_add(&sp->flush_link, flush_list);
+}
+
void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn)
{
update_gfn_disallow_lpage_count(slot, gfn, 1);
@@ -2373,12 +2380,16 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,

while (mmu_unsync_walk(parent, &pages)) {
bool protected = false;
+ LIST_HEAD(flush_list);

- for_each_sp(pages, sp, parents, i)
+ for_each_sp(pages, sp, parents, i) {
protected |= rmap_write_protect(vcpu, sp->gfn);
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+ }

if (protected) {
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm,
+ &flush_list);
flush = false;
}

@@ -2715,6 +2726,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
{
struct kvm_mmu_page *sp, *nsp;
+ LIST_HEAD(flush_list);

if (list_empty(invalid_list))
return;
@@ -2728,7 +2740,15 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
* In addition, kvm_flush_remote_tlbs waits for all vcpus to exit
* guest mode and/or lockless shadow page table walks.
*/
- kvm_flush_remote_tlbs(kvm);
+ if (kvm_available_flush_tlb_with_range()) {
+ list_for_each_entry(sp, invalid_list, link)
+ kvm_mmu_queue_flush_request(sp, &flush_list);
+
+ if (!list_empty(&flush_list))
+ kvm_flush_remote_tlbs_with_list(kvm, &flush_list);
+ } else {
+ kvm_flush_remote_tlbs(kvm);
+ }

list_for_each_entry_safe(sp, nsp, invalid_list, link) {
WARN_ON(!sp->role.invalid || sp->root_count);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bb8c2cdf70c3..aa450e0596a4 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -976,6 +976,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
bool host_writable;
gpa_t first_pte_gpa;
int set_spte_ret = 0;
+ LIST_HEAD(flush_list);

/* direct kvm_mmu_page can not be unsync. */
BUG_ON(sp->role.direct);
@@ -1036,10 +1037,12 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
pte_access, PT_PAGE_TABLE_LEVEL,
gfn, spte_to_pfn(sp->spt[i]),
true, false, host_writable);
+ if (set_spte_ret && kvm_available_flush_tlb_with_range())
+ kvm_mmu_queue_flush_request(sp, &flush_list);
}

if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
- kvm_flush_remote_tlbs(vcpu->kvm);
+ kvm_flush_remote_tlbs_with_list(vcpu->kvm, &flush_list);

return nr_present;
}
--
2.14.4

2018-09-10 08:41:30

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
with specified ranges. This patch is to add the hypercall support.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/hyperv/nested.c | 110 +++++++++++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 17 ++++++
arch/x86/include/asm/mshyperv.h | 8 +++
3 files changed, 135 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index b8e60cc50461..bb716ed94320 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -7,15 +7,32 @@
*
* Author : Lan Tianyu <[email protected]>
*/
+#define pr_fmt(fmt) "Hyper-V: " fmt


#include <linux/types.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
#include <asm/tlbflush.h>
+#include <asm/kvm_host.h>

#include <asm/trace/hyperv.h>

+/*
+ * MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
+ * by the bitwidth of "additional_pages" in union hv_gpa_page_range.
+ */
+#define MAX_FLUSH_PAGES (2048)
+
+/*
+ * All input flush parameters are in single page. The max flush count
+ * is equal with how many entries of union hv_gpa_page_range can be
+ * populated in the input parameter page. MAX_FLUSH_REP_COUNT
+ * = (4096 - 16) / 8. (“Page Size” - "Address Space" - "Flags") /
+ * "GPA Range".
+ */
+#define MAX_FLUSH_REP_COUNT (510)
+
int hyperv_flush_guest_mapping(u64 as)
{
struct hv_guest_mapping_flush **flush_pcpu;
@@ -54,3 +71,96 @@ int hyperv_flush_guest_mapping(u64 as)
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
+
+static inline int fill_flush_list(union hv_gpa_page_range gpa_list[],
+ int offset, u64 start_gfn, u64 end_gfn)
+{
+ int gpa_n = offset;
+ u64 cur = start_gfn;
+ u64 pages = end_gfn - start_gfn + 1;
+ u64 additional_pages;
+
+ if (end_gfn < start_gfn)
+ return -EINVAL;
+
+ do {
+ if (gpa_n == MAX_FLUSH_REP_COUNT) {
+ pr_warn("Request exceeds HvFlushGuestList max flush count.\n");
+ return -ENOSPC;
+ }
+
+ if (pages > MAX_FLUSH_PAGES) {
+ additional_pages = MAX_FLUSH_PAGES - 1;
+ pages -= MAX_FLUSH_PAGES;
+ } else {
+ additional_pages = pages - 1;
+ pages = 0;
+ }
+
+ gpa_list[gpa_n].page.additional_pages = additional_pages;
+ gpa_list[gpa_n].page.largepage = false;
+ gpa_list[gpa_n].page.basepfn = cur;
+
+ cur += additional_pages + 1;
+ gpa_n++;
+ } while (pages > 0);
+
+ return gpa_n;
+}
+
+int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range)
+{
+ struct kvm_mmu_page *sp;
+ struct hv_guest_mapping_flush_list **flush_pcpu;
+ struct hv_guest_mapping_flush_list *flush;
+ u64 status = 0;
+ unsigned long flags;
+ int ret = -ENOTSUPP;
+ int gpa_n = 0;
+
+ if (!hv_hypercall_pg)
+ goto fault;
+
+ local_irq_save(flags);
+
+ flush_pcpu = (struct hv_guest_mapping_flush_list **)
+ this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ flush = *flush_pcpu;
+ if (unlikely(!flush)) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ flush->address_space = as;
+ flush->flags = 0;
+
+ if (!range->flush_list) {
+ gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
+ range->start_gfn, range->end_gfn);
+ } else {
+ list_for_each_entry(sp, range->flush_list,
+ flush_link) {
+ u64 end_gfn = sp->gfn +
+ KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
+ gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
+ sp->gfn, end_gfn);
+ }
+ }
+
+ if (gpa_n < 0) {
+ local_irq_restore(flags);
+ goto fault;
+ }
+
+ status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
+ gpa_n, 0, flush, NULL);
+
+ local_irq_restore(flags);
+
+ if (!(status & HV_HYPERCALL_RESULT_MASK))
+ ret = 0;
+fault:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index e977b6b3a538..512f22b49999 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -353,6 +353,7 @@ struct hv_tsc_emulation_status {
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
+#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0

#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE 0x00000001
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT 12
@@ -750,6 +751,22 @@ struct hv_guest_mapping_flush {
u64 flags;
};

+/* HvFlushGuestPhysicalAddressList hypercall */
+union hv_gpa_page_range {
+ u64 address_space;
+ struct {
+ u64 additional_pages:11;
+ u64 largepage:1;
+ u64 basepfn:52;
+ } page;
+};
+
+struct hv_guest_mapping_flush_list {
+ u64 address_space;
+ u64 flags;
+ union hv_gpa_page_range gpa_list[];
+};
+
/* HvFlushVirtualAddressSpace, HvFlushVirtualAddressList hypercalls */
struct hv_tlb_flush {
u64 address_space;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f37704497d8f..da68574404bf 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -22,6 +22,8 @@ struct ms_hyperv_info {

extern struct ms_hyperv_info ms_hyperv;

+struct kvm_tlb_range;
+
/*
* Generate the guest ID.
*/
@@ -348,6 +350,7 @@ void set_hv_tscchange_cb(void (*cb)(void));
void clear_hv_tscchange_cb(void);
void hyperv_stop_tsc_emulation(void);
int hyperv_flush_guest_mapping(u64 as);
+int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range);

#ifdef CONFIG_X86_64
void hv_apic_init(void);
@@ -368,6 +371,11 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
return NULL;
}
static inline int hyperv_flush_guest_mapping(u64 as) { return -1; }
+static inline int hyperv_flush_guest_mapping_range(u64 as,
+ struct kvm_tlb_range *range)
+{
+ return -1;
+}
#endif /* CONFIG_HYPERV */

#ifdef CONFIG_HYPERV_TSCPAGE
--
2.14.4

2018-09-10 08:41:55

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 11/13] x86/Hyper-v: Add trace in the hyperv_nested_flush_guest_mapping_range()

This patch is to trace log in the hyperv_nested_flush_
guest_mapping_range().

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/hyperv/nested.c | 1 +
arch/x86/include/asm/trace/hyperv.h | 14 ++++++++++++++
2 files changed, 15 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index bb716ed94320..90e5eee1b410 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -161,6 +161,7 @@ int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range)
if (!(status & HV_HYPERCALL_RESULT_MASK))
ret = 0;
fault:
+ trace_hyperv_nested_flush_guest_mapping_range(as, ret);
return ret;
}
EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/trace/hyperv.h b/arch/x86/include/asm/trace/hyperv.h
index 2e6245a023ef..ace464f09681 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -42,6 +42,20 @@ TRACE_EVENT(hyperv_nested_flush_guest_mapping,
TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
);

+TRACE_EVENT(hyperv_nested_flush_guest_mapping_range,
+ TP_PROTO(u64 as, int ret),
+ TP_ARGS(as, ret),
+
+ TP_STRUCT__entry(
+ __field(u64, as)
+ __field(int, ret)
+ ),
+ TP_fast_assign(__entry->as = as;
+ __entry->ret = ret;
+ ),
+ TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
+ );
+
TRACE_EVENT(hyperv_send_ipi_mask,
TP_PROTO(const struct cpumask *cpus,
int vector),
--
2.14.4

2018-09-10 08:42:13

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 13/13] KVM/VMX: Add hv tlb range flush support

This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++-------
1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3f9ccbafd0d8..ae904693b14f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1555,7 +1555,19 @@ static void check_ept_pointer_match(struct kvm *kvm)
to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
}

-static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
+static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
+{
+ u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
+
+ if (range)
+ return hyperv_flush_guest_mapping_range(ept_pointer, range);
+ else
+ return hyperv_flush_guest_mapping(ept_pointer);
+}
+
+static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
+ struct kvm_tlb_range *range)
{
struct kvm_vcpu *vcpu;
int ret = -ENOTSUPP, i;
@@ -1567,16 +1579,21 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm)
- ret |= hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ ret |= __hv_remote_flush_tlb_with_range(
+ kvm, vcpu, range);
} else {
- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
+ ret = __hv_remote_flush_tlb_with_range(kvm,
+ kvm_get_vcpu(kvm, 0), range);
}

spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
+
+static int hv_remote_flush_tlb(struct kvm *kvm)
+{
+ return hv_remote_flush_tlb_with_range(kvm, NULL);
+}
#else /* !IS_ENABLED(CONFIG_HYPERV) */
static inline void evmcs_write64(unsigned long field, u64 value) {}
static inline void evmcs_write32(unsigned long field, u32 value) {}
@@ -7918,8 +7935,11 @@ static __init int hardware_setup(void)

#if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
- && enable_ept)
- kvm_x86_ops->tlb_remote_flush = vmx_hv_remote_flush_tlb;
+ && enable_ept) {
+ kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+ kvm_x86_ops->tlb_remote_flush_with_range =
+ hv_remote_flush_tlb_with_range;
+ }
#endif

if (!cpu_has_vmx_ple()) {
--
2.14.4

2018-09-10 08:42:49

by Tianyu Lan

[permalink] [raw]
Subject: [PATCH 12/13] KVM/VMX: Change hv flush logic when ept tables are mismatched.

If ept table pointers are mismatched, flushing tlb for each vcpus via
hv flush interface still helps to reduce vmexits which are triggered
by IPI and INEPT emulation.

Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/vmx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f910d33858d9..3f9ccbafd0d8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1557,7 +1557,8 @@ static void check_ept_pointer_match(struct kvm *kvm)

static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
{
- int ret;
+ struct kvm_vcpu *vcpu;
+ int ret = -ENOTSUPP, i;

spin_lock(&to_kvm_vmx(kvm)->ept_pointer_lock);

@@ -1565,14 +1566,14 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
check_ept_pointer_match(kvm);

if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
- ret = -ENOTSUPP;
- goto out;
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ ret |= hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, i))->ept_pointer);
+ } else {
+ ret = hyperv_flush_guest_mapping(
+ to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
}

- ret = hyperv_flush_guest_mapping(
- to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer);
-
-out:
spin_unlock(&to_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
}
--
2.14.4

2018-09-10 14:23:41

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 1/13] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

On Mon, 2018-09-10 at 08:38 +0000, Tianyu Lan wrote:
> Add flush range call back in the kvm_x86_ops and platform can use it
> to register its associated function. The parameter "kvm_tlb_range"
> accepts a single range and flush list which contains a list of ranges.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
>  arch/x86/include/asm/kvm_host.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e12916e7c2fb..dcdf8cc16388 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -402,6 +402,12 @@ struct kvm_mmu {
>   u64 pdptrs[4]; /* pae */
>  };
>  
> +struct kvm_tlb_range {
> + u64 start_gfn;
> + u64 end_gfn;

IMO this struct and all functions should pass around the number of pages
instead of end_gfn to avoid confusion as to whether end_gfn is inclusive
or exlusive.

> + struct list_head *flush_list;
> +};
> +
>  enum pmc_type {
>   KVM_PMC_GP = 0,
>   KVM_PMC_FIXED,
> @@ -991,6 +997,8 @@ struct kvm_x86_ops {
>  
>   void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
>   int  (*tlb_remote_flush)(struct kvm *kvm);
> + int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
> + struct kvm_tlb_range *range);
>  
>   /*
>    * Flush any TLB entries associated with the given GVA.

2018-09-12 00:22:50

by Michael Kelley (EOSG)

[permalink] [raw]
Subject: RE: [PATCH 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

From: Tianyu Lan Sent: Monday, September 10, 2018 1:39 AM
> +
> +int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range)

I'm really concerned about defining the Hyper-V function to flush
guest mappings in terms of a KVM struct definition. Your patch puts
this function in arch/x86/hyperv/nested.c. I haven't investigated all
the details, but on its face this approach seems like it would cause
trouble in the long run, and it doesn't support the case of a
hypervisor other than KVM running at L1.

I know that KVM code has taken a dependency on Hyper-V types and
code, but that's because KVM is emulating a lot of Hyper-V functionality
and it's taking advantage of Hyper-V enlightenments. Is there a top
level reason I haven't thought of for Hyper-V code to take a
dependency on KVM definitions? I would think we want Hyper-V
code to be generic, using Hyper-V data structure definitions. Then in
keeping with what's already been done, KVM code would use those
definitions where it needs to make calls to Hyper-V code.

> +{
> + struct kvm_mmu_page *sp;
> + struct hv_guest_mapping_flush_list **flush_pcpu;
> + struct hv_guest_mapping_flush_list *flush;
> + u64 status = 0;
> + unsigned long flags;
> + int ret = -ENOTSUPP;
> + int gpa_n = 0;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush_list **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + if (!range->flush_list) {
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + range->start_gfn, range->end_gfn);
> + } else {
> + list_for_each_entry(sp, range->flush_list,
> + flush_link) {
> + u64 end_gfn = sp->gfn +
> + KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + sp->gfn, end_gfn);
> + }

Per the previous comment, if this loop really needs to walk a KVM
data structure, look for a different way to organize things so that
the handling of KVM-specific data structures is in code that’s part
of KVM, rather than in Hyper-V code.

Michael

2018-09-12 13:32:29

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

Hi Michael:
Thanks for your review.

On 9/12/2018 8:22 AM, Michael Kelley (EOSG) wrote:
> From: Tianyu Lan Sent: Monday, September 10, 2018 1:39 AM
>> +
>> +int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range)
>
> I'm really concerned about defining the Hyper-V function to flush
> guest mappings in terms of a KVM struct definition. Your patch puts
> this function in arch/x86/hyperv/nested.c. I haven't investigated all
> the details, but on its face this approach seems like it would cause
> trouble in the long run, and it doesn't support the case of a
> hypervisor other than KVM running at L1.
>
> I know that KVM code has taken a dependency on Hyper-V types and
> code, but that's because KVM is emulating a lot of Hyper-V functionality
> and it's taking advantage of Hyper-V enlightenments. Is there a top
> level reason I haven't thought of for Hyper-V code to take a
> dependency on KVM definitions? I would think we want Hyper-V
> code to be generic, using Hyper-V data structure definitions. Then in
> keeping with what's already been done, KVM code would use those
> definitions where it needs to make calls to Hyper-V code.
>

I think KVM is only one kernel-based hypervisor and is only caller of
nested hypercalls. So I reused KVM data structure in the new function.
I will make it more general in the next version.

>> +{
>> + struct kvm_mmu_page *sp;
>> + struct hv_guest_mapping_flush_list **flush_pcpu;
>> + struct hv_guest_mapping_flush_list *flush;
>> + u64 status = 0;
>> + unsigned long flags;
>> + int ret = -ENOTSUPP;
>> + int gpa_n = 0;
>> +
>> + if (!hv_hypercall_pg)
>> + goto fault;
>> +
>> + local_irq_save(flags);
>> +
>> + flush_pcpu = (struct hv_guest_mapping_flush_list **)
>> + this_cpu_ptr(hyperv_pcpu_input_arg);
>> +
>> + flush = *flush_pcpu;
>> + if (unlikely(!flush)) {
>> + local_irq_restore(flags);
>> + goto fault;
>> + }
>> +
>> + flush->address_space = as;
>> + flush->flags = 0;
>> +
>> + if (!range->flush_list) {
>> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
>> + range->start_gfn, range->end_gfn);
>> + } else {
>> + list_for_each_entry(sp, range->flush_list,
>> + flush_link) {
>> + u64 end_gfn = sp->gfn +
>> + KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
>> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
>> + sp->gfn, end_gfn);
>> + }
>
> Per the previous comment, if this loop really needs to walk a KVM
> data structure, look for a different way to organize things so that
> the handling of KVM-specific data structures is in code that’s part
> of KVM, rather than in Hyper-V code.
>
> Michael
>

2018-09-12 13:41:49

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH 1/13] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

Hi Sean:
Thanks for your review.

On 9/10/2018 10:21 PM, Sean Christopherson wrote:
> On Mon, 2018-09-10 at 08:38 +0000, Tianyu Lan wrote:
>> Add flush range call back in the kvm_x86_ops and platform can use it
>> to register its associated function. The parameter "kvm_tlb_range"
>> accepts a single range and flush list which contains a list of ranges.
>>
>> Signed-off-by: Lan Tianyu <[email protected]>
>> ---
>>  arch/x86/include/asm/kvm_host.h | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index e12916e7c2fb..dcdf8cc16388 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -402,6 +402,12 @@ struct kvm_mmu {
>>   u64 pdptrs[4]; /* pae */
>>  };
>>
>> +struct kvm_tlb_range {
>> + u64 start_gfn;
>> + u64 end_gfn;
>
> IMO this struct and all functions should pass around the number of pages
> instead of end_gfn to avoid confusion as to whether end_gfn is inclusive
> or exlusive.

That's good suggestion. Thanks.

>
>> + struct list_head *flush_list;
>> +};
>> +
>>  enum pmc_type {
>>   KVM_PMC_GP = 0,
>>   KVM_PMC_FIXED,
>> @@ -991,6 +997,8 @@ struct kvm_x86_ops {
>>
>>   void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
>>   int  (*tlb_remote_flush)(struct kvm *kvm);
>> + int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
>> + struct kvm_tlb_range *range);
>>
>>   /*
>>    * Flush any TLB entries associated with the given GVA.