2020-07-15 04:30:07

by Sean Christopherson

Subject: [PATCH 0/8] KVM: x86/mmu: ITLB multi-hit workaround fixes

Patch 1 is a minor fix for a very theoretical bug where KVM could skip
the final "commit zap" when recovering shadow pages for the NX huge
page mitigation.
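
For reference, the fixed recovery loop ends up shaped roughly as below.
This is a condensed sketch of kvm_recover_nx_lpages(), not the literal
patch; the SRCU dance and a few other details are elided:

static void kvm_recover_nx_lpages(struct kvm *kvm)
{
        unsigned int ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
        ulong to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;
        LIST_HEAD(invalid_list);
        struct kvm_mmu_page *sp;

        spin_lock(&kvm->mmu_lock);

        for ( ; to_zap; --to_zap) {
                if (list_empty(&kvm->arch.lpage_disallowed_mmu_pages))
                        break;

                sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
                                      struct kvm_mmu_page,
                                      lpage_disallowed_link);
                kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);

                /* Periodically commit and yield the lock under contention. */
                if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
                        kvm_mmu_commit_zap_page(kvm, &invalid_list);
                        cond_resched_lock(&kvm->mmu_lock);
                }
        }

        /*
         * Patch 1's point: always commit whatever was prepared but not yet
         * committed once the loop exits.
         */
        kvm_mmu_commit_zap_page(kvm, &invalid_list);

        spin_unlock(&kvm->mmu_lock);
}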

Patch 2 is cleanup that's made possible by patch 1.

Patches 3-5 are the main course and fix bugs in the NX huge page
accounting where shadow pages are incorrectly added to the list of
disallowed huge pages. KVM doesn't check whether the page could actually
have been a huge page when adding it to the disallowed list. This results
in what are effectively spurious zaps. The biggest issue is
likely with shadow pages in the upper levels, i.e. levels 3 and 4, as they
are either unlikely to be huge (1gb) or flat out can't be huge (512gb).
And because of the way KVM zaps, the upper levels will be zapped first,
i.e. KVM is likely zapping and rebuilding a decent number of its shadow
pages for zero benefit.
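
Concretely, with the requested level captured, the accounting in
__direct_map() (and the FNAME(fetch) equivalent) ends up gated roughly
along these lines (a sketch of the idea, not the literal diff):

        if (!is_shadow_present_pte(*it.sptep)) {
                sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
                                      it.level - 1, true, ACC_ALL);
                link_shadow_page(vcpu, it.sptep, sp);

                /*
                 * Account the new shadow page as NX-disallowed only if a
                 * huge page at this level was actually requested, i.e. only
                 * if zapping it later could plausibly yield a huge mapping.
                 */
                if (huge_page_disallowed && req_level >= it.level)
                        account_huge_nx_page(vcpu->kvm, sp);
        }

Patch 4 only captures req_level (its diff is further down); the actual
gating change is patch 5.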

Ideally, patches 3-5 would be a single patch to ease backporting. In the
end, I decided the change is probably not suitable for stable as, at worst,
the bug creates an infrequent performance spike (assuming the admin isn't
going crazy with the recovery frequency), and the fix is far from
straightforward or risk-free. Cramming everything into a single patch was
a mess.

Patches 6-8 are cleanups in related code. The 'hlevel' name in particular
has been on my todo list for a while.

Sean Christopherson (8):
KVM: x86/mmu: Commit zap of remaining invalid pages when recovering
lpages
KVM: x86/mmu: Refactor the zap loop for recovering NX lpages
KVM: x86/mmu: Move "huge page disallowed" calculation into mapping
helpers
KVM: x86/mmu: Capture requested page level before NX huge page
workaround
KVM: x86/mmu: Account NX huge page disallowed iff huge page was
requested
KVM: x86/mmu: Rename 'hlevel' to 'level' in FNAME(fetch)
KVM: x86/mmu: Hoist ITLB multi-hit workaround check up a level
KVM: x86/mmu: Track write/user faults using bools

arch/x86/kvm/mmu/mmu.c | 58 +++++++++++++++++++++-------------
arch/x86/kvm/mmu/paging_tmpl.h | 39 ++++++++++++-----------
2 files changed, 57 insertions(+), 40 deletions(-)

--
2.26.0


2020-07-15 04:31:32

by Sean Christopherson

Subject: [PATCH 6/8] KVM: x86/mmu: Rename 'hlevel' to 'level' in FNAME(fetch)

Rename 'hlevel', which presumably stands for 'host level', to simply
'level' in FNAME(fetch). The variable hasn't tracked the host level for
quite some time.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 39578a1839ca4..35867a1a1ee89 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -636,7 +636,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
- int top_level, hlevel, req_level, ret;
+ int top_level, level, req_level, ret;
gfn_t base_gfn = gw->gfn;

direct_access = gw->pte_access;
@@ -682,8 +682,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
link_shadow_page(vcpu, it.sptep, sp);
}

- hlevel = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
- huge_page_disallowed, &req_level);
+ level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
+ huge_page_disallowed, &req_level);

trace_kvm_mmu_spte_requested(addr, gw->level, pfn);

@@ -694,10 +694,10 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
- disallowed_hugepage_adjust(it, gw->gfn, &pfn, &hlevel);
+ disallowed_hugepage_adjust(it, gw->gfn, &pfn, &level);

base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
- if (it.level == hlevel)
+ if (it.level == level)
break;

validate_direct_spte(vcpu, it.sptep, direct_access);
--
2.26.0

2020-07-15 04:32:11

by Sean Christopherson

Subject: [PATCH 4/8] KVM: x86/mmu: Capture requested page level before NX huge page workaround

Apply the "huge page disallowed" adjustment of the max level only after
capturing the original requested level. The requested level will be
used in a future patch to skip adding pages to the list of disallowed
huge pages if a huge page wasn't possible anyway, e.g. if the page
isn't mapped as a huge page in the host.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 22 +++++++++++++++-------
arch/x86/kvm/mmu/paging_tmpl.h | 8 +++-----
2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bbd7e8be2b936..974c9a89c2454 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3238,7 +3238,8 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
}

static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
- int max_level, kvm_pfn_t *pfnp)
+ int max_level, kvm_pfn_t *pfnp,
+ bool huge_page_disallowed, int *req_level)
{
struct kvm_memory_slot *slot;
struct kvm_lpage_info *linfo;
@@ -3246,6 +3247,8 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
kvm_pfn_t mask;
int level;

+ *req_level = PG_LEVEL_4K;
+
if (unlikely(max_level == PG_LEVEL_4K))
return PG_LEVEL_4K;

@@ -3270,7 +3273,14 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
if (level == PG_LEVEL_4K)
return level;

- level = min(level, max_level);
+ *req_level = level = min(level, max_level);
+
+ /*
+ * Enforce the iTLB multihit workaround after capturing the requested
+ * level, which will be used to do precise, accurate accounting.
+ */
+ if (huge_page_disallowed)
+ return PG_LEVEL_4K;

/*
* mmu_notifier_retry() was successful and mmu_lock is held, so
@@ -3316,17 +3326,15 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
- int level, ret;
+ int level, req_level, ret;
gfn_t gfn = gpa >> PAGE_SHIFT;
gfn_t base_gfn = gfn;

if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
return RET_PF_RETRY;

- if (huge_page_disallowed)
- max_level = PG_LEVEL_4K;
-
- level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn);
+ level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+ huge_page_disallowed, &req_level);

trace_kvm_mmu_spte_requested(gpa, level, pfn);
for_each_shadow_entry(vcpu, gpa, it) {
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5536b2004dac8..b92d936c0900d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -636,7 +636,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
- int top_level, hlevel, ret;
+ int top_level, hlevel, req_level, ret;
gfn_t base_gfn = gw->gfn;

direct_access = gw->pte_access;
@@ -682,10 +682,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
link_shadow_page(vcpu, it.sptep, sp);
}

- if (huge_page_disallowed)
- max_level = PG_LEVEL_4K;
-
- hlevel = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn);
+ hlevel = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
+ huge_page_disallowed, &req_level);

trace_kvm_mmu_spte_requested(addr, gw->level, pfn);

--
2.26.0