The current KVM page fault handlers pass many arguments around between
functions. To simplify those arguments and the local variables derived
from them, introduce a data structure, struct kvm_page_fault, to hold
them. struct kvm_page_fault is allocated on the stack by the caller of
the KVM fault handlers, kvm_mmu_do_page_fault(), and passed around.
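As a rough, standalone illustration of the pattern (a simplified sketch,
not the actual kernel code; the struct name, PFERR_* bit values and
helper functions here are stand-ins, the real definitions are in the
first patch below):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t gpa_t;
#define PFERR_WRITE_MASK (1u << 1)
#define PFERR_FETCH_MASK (1u << 4)

struct page_fault {
	const gpa_t addr;
	const uint32_t error_code;
	const bool prefault;
	/* Derived from error_code, computed once. */
	const bool write;
	const bool exec;
};

static int handle_fault(struct page_fault *fault)
{
	/* Every handler in the chain now takes a single pointer. */
	printf("fault at %#llx, write=%d exec=%d prefault=%d\n",
	       (unsigned long long)fault->addr, fault->write, fault->exec,
	       fault->prefault);
	return 0;
}

static int do_page_fault(gpa_t cr2_or_gpa, uint32_t err, bool prefault)
{
	/* Allocated on the caller's stack and filled in exactly once. */
	struct page_fault fault = {
		.addr = cr2_or_gpa,
		.error_code = err,
		.prefault = prefault,
		.write = err & PFERR_WRITE_MASK,
		.exec = err & PFERR_FETCH_MASK,
	};

	return handle_fault(&fault);
}

int main(void)
{
	return do_page_fault(0x1000, PFERR_WRITE_MASK, false);
}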
Later in the series, my patches are interleaved with David's work to
add the memory slot to the struct and avoid repeated lookups. Along the
way you will find some cleanups of functions with a ludicrous number of
arguments, so that they use struct kvm_page_fault as much as possible
or at least receive related information from a single argument. make_spte
in particular goes from 11 to 10 arguments (yeah I know) despite gaining
two for kvm_mmu_page and kvm_memory_slot.
Some of this can be a bit debatable (for example, struct kvm_mmu_page
is used a little more on the TDP MMU paths), but overall I think the
result is an improvement. For example, the SET_SPTE_* constants go
away, and they absolutely didn't belong in the TDP MMU. But if you
disagree with some of the changes, please speak up loudly!
Testing: survives kvm-unit-tests on Intel with all of ept=0, ept=1
tdp_mmu=0, and ept=1. I will do more testing before committing it to
kvm/next, of course.
Paolo
David Matlack (5):
KVM: x86/mmu: Fold rmap_recycle into rmap_add
KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault
KVM: x86/mmu: Avoid memslot lookup in page_fault_handle_page_track
KVM: x86/mmu: Avoid memslot lookup in rmap_add
KVM: x86/mmu: Avoid memslot lookup in make_spte and
mmu_try_to_unsync_pages
Paolo Bonzini (25):
KVM: MMU: pass unadulterated gpa to direct_page_fault
KVM: MMU: Introduce struct kvm_page_fault
KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault
KVM: MMU: change direct_page_fault() arguments to kvm_page_fault
KVM: MMU: change page_fault_handle_page_track() arguments to
kvm_page_fault
KVM: MMU: change kvm_faultin_pfn() arguments to kvm_page_fault
KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault
KVM: MMU: change __direct_map() arguments to kvm_page_fault
KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault
KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault
KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to
kvm_page_fault
KVM: MMU: change fast_page_fault() arguments to kvm_page_fault
KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault
KVM: MMU: change disallowed_hugepage_adjust() arguments to
kvm_page_fault
KVM: MMU: change tracepoints arguments to kvm_page_fault
KVM: MMU: mark page dirty in make_spte
KVM: MMU: unify tdp_mmu_map_set_spte_atomic and
tdp_mmu_set_spte_atomic_no_dirty_log
KVM: MMU: inline set_spte in mmu_set_spte
KVM: MMU: inline set_spte in FNAME(sync_page)
KVM: MMU: clean up make_spte return value
KVM: MMU: remove unnecessary argument to mmu_set_spte
KVM: MMU: set ad_disabled in TDP MMU role
KVM: MMU: pass kvm_mmu_page struct to make_spte
KVM: MMU: pass struct kvm_page_fault to mmu_set_spte
KVM: MMU: make spte an in-out argument in make_spte
Sean Christopherson (1):
KVM: x86/mmu: Verify shadow walk doesn't terminate early in page
faults
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/include/asm/kvm_page_track.h | 4 +-
arch/x86/kvm/mmu.h | 84 +++++-
arch/x86/kvm/mmu/mmu.c | 408 +++++++++++---------------
arch/x86/kvm/mmu/mmu_internal.h | 22 +-
arch/x86/kvm/mmu/mmutrace.h | 18 +-
arch/x86/kvm/mmu/page_track.c | 6 +-
arch/x86/kvm/mmu/paging_tmpl.h | 137 +++++----
arch/x86/kvm/mmu/spte.c | 29 +-
arch/x86/kvm/mmu/spte.h | 14 +-
arch/x86/kvm/mmu/tdp_mmu.c | 123 +++-----
arch/x86/kvm/mmu/tdp_mmu.h | 4 +-
12 files changed, 390 insertions(+), 463 deletions(-)
--
2.27.0
Create a single structure for arguments that are passed from
kvm_mmu_do_page_fault to the page fault handlers. Later
the structure will grow to include various output parameters
that are passed back to the next steps in the page fault
handling.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e9688a9f7b57..0553ef92946e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -114,17 +114,45 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
vcpu->arch.mmu->shadow_root_level);
}
+struct kvm_page_fault {
+ /* arguments to kvm_mmu_do_page_fault. */
+ const gpa_t addr;
+ const u32 error_code;
+ const bool prefault;
+
+ /* Derived from error_code. */
+ const bool exec;
+ const bool write;
+ const bool present;
+ const bool rsvd;
+ const bool user;
+
+ /* Derived from mmu. */
+ const bool is_tdp;
+};
+
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
bool prefault);
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
{
+ struct kvm_page_fault fault = {
+ .addr = cr2_or_gpa,
+ .error_code = err,
+ .exec = err & PFERR_FETCH_MASK,
+ .write = err & PFERR_WRITE_MASK,
+ .present = err & PFERR_PRESENT_MASK,
+ .rsvd = err & PFERR_RSVD_MASK,
+ .user = err & PFERR_USER_MASK,
+ .prefault = prefault,
+ .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ };
#ifdef CONFIG_RETPOLINE
- if (likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault))
- return kvm_tdp_page_fault(vcpu, cr2_or_gpa, err, prefault);
+ if (fault.is_tdp)
+ return kvm_tdp_page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
#endif
- return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
+ return vcpu->arch.mmu->page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
}
/*
--
2.27.0
Pass struct kvm_page_fault to mmu->page_fault() instead of
extracting the arguments from the struct. FNAME(page_fault) can use
the precomputed bools from the error code.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++--
arch/x86/kvm/mmu.h | 7 +++----
arch/x86/kvm/mmu/mmu.c | 15 ++++++++-------
arch/x86/kvm/mmu/paging_tmpl.h | 22 +++++++++++-----------
4 files changed, 24 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cb35ef26ab3..8e2e79e909e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -407,6 +407,7 @@ struct kvm_mmu_root_info {
#define KVM_HAVE_MMU_RWLOCK
struct kvm_mmu_page;
+struct kvm_page_fault;
/*
* x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
@@ -416,8 +417,7 @@ struct kvm_mmu_page;
struct kvm_mmu {
unsigned long (*get_guest_pgd)(struct kvm_vcpu *vcpu);
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
- int (*page_fault)(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err,
- bool prefault);
+ int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0553ef92946e..ee58177bc282 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -131,8 +131,7 @@ struct kvm_page_fault {
const bool is_tdp;
};
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault);
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
@@ -150,9 +149,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
- return kvm_tdp_page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
+ return kvm_tdp_page_fault(vcpu, &fault);
#endif
- return vcpu->arch.mmu->page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
+ return vcpu->arch.mmu->page_fault(vcpu, &fault);
}
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 376e90f4f413..3ca4b1c69e03 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4012,13 +4012,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
return r;
}
-static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
- u32 error_code, bool prefault)
+static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, gpa, error_code, prefault,
+ return direct_page_fault(vcpu, fault->addr,
+ fault->error_code, fault->prefault,
PG_LEVEL_2M, false);
}
@@ -4055,10 +4056,10 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
}
EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault)
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
int max_level;
+ gpa_t gpa = fault->addr;
for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
max_level > PG_LEVEL_4K;
@@ -4070,8 +4071,8 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
break;
}
- return direct_page_fault(vcpu, gpa, error_code, prefault,
- max_level, true);
+ return direct_page_fault(vcpu, gpa, fault->error_code,
+ fault->prefault, max_level, true);
}
static void nonpaging_init_context(struct kvm_mmu *context)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index b908d2ff6d4c..8eee1200117a 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -833,11 +833,10 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
* Returns: 1 if we need to emulate the instruction, 0 otherwise, or
* a negative value on error.
*/
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
- bool prefault)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool write_fault = error_code & PFERR_WRITE_MASK;
- bool user_fault = error_code & PFERR_USER_MASK;
+ gpa_t addr = fault->addr;
+ u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
kvm_pfn_t pfn;
@@ -847,6 +846,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
int max_level;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
+ WARN_ON_ONCE(fault->is_tdp);
/*
* If PFEC.RSVD is set, this is a shadow page fault.
@@ -864,7 +864,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
*/
if (!r) {
pgprintk("%s: guest page fault\n", __func__);
- if (!prefault)
+ if (!fault->prefault)
kvm_inject_emulated_page_fault(vcpu, &walker.fault);
return RET_PF_RETRY;
@@ -882,7 +882,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
vcpu->arch.write_fault_to_shadow_pgtable = false;
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
- &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
+ &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
if (is_self_change_mapping)
max_level = PG_LEVEL_4K;
@@ -892,8 +892,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
- write_fault, &map_writable, &r))
+ if (kvm_faultin_pfn(vcpu, fault->prefault, walker.gfn, addr, &pfn, &hva,
+ fault->write, &map_writable, &r))
return r;
if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
@@ -903,8 +903,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
* Do not change pte_access if the pfn is a mmio page, otherwise
* we will cache the incorrect access into mmio spte.
*/
- if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
- !is_cr0_wp(vcpu->arch.mmu) && !user_fault && !is_noslot_pfn(pfn)) {
+ if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
+ !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(pfn)) {
walker.pte_access |= ACC_WRITE_MASK;
walker.pte_access &= ~ACC_USER_MASK;
@@ -928,7 +928,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
if (r)
goto out_unlock;
r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
- map_writable, prefault);
+ map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
out_unlock:
--
2.27.0
Add fields to struct kvm_page_fault corresponding to outputs of
kvm_faultin_pfn(). For now they have to be extracted again from struct
kvm_page_fault in the subsequent steps, but this is temporary until
other functions in the chain are switched over as well.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 5 ++++
arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++++------------------
arch/x86/kvm/mmu/paging_tmpl.h | 19 ++++++-------
3 files changed, 36 insertions(+), 38 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a5c2d4069964..6697571197a5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -135,6 +135,11 @@ struct kvm_page_fault {
/* Shifted addr, or result of guest page table walk if addr is a gva. */
gfn_t gfn;
+
+ /* Outputs of kvm_faultin_pfn. */
+ kvm_pfn_t pfn;
+ hva_t hva;
+ bool map_writable;
};
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 41dc6796b80b..c2d2d019634b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3889,11 +3889,9 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
}
-static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
- gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
- bool write, bool *writable, int *r)
+static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
{
- struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
bool async;
/*
@@ -3907,8 +3905,8 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
if (!kvm_is_visible_memslot(slot)) {
/* Don't expose private memslots to L2. */
if (is_guest_mode(vcpu)) {
- *pfn = KVM_PFN_NOSLOT;
- *writable = false;
+ fault->pfn = KVM_PFN_NOSLOT;
+ fault->map_writable = false;
return false;
}
/*
@@ -3925,23 +3923,25 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
}
async = false;
- *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
- write, writable, hva);
+ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
+ fault->write, &fault->map_writable,
+ &fault->hva);
if (!async)
return false; /* *pfn has correct page already */
- if (!prefault && kvm_can_do_async_pf(vcpu)) {
- trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
- if (kvm_find_async_pf_gfn(vcpu, gfn)) {
- trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
+ if (!fault->prefault && kvm_can_do_async_pf(vcpu)) {
+ trace_kvm_try_async_get_page(fault->addr, fault->gfn);
+ if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) {
+ trace_kvm_async_pf_doublefault(fault->addr, fault->gfn);
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
goto out_retry;
- } else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
+ } else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn))
goto out_retry;
}
- *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
- write, writable, hva);
+ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
+ fault->write, &fault->map_writable,
+ &fault->hva);
out_retry:
*r = RET_PF_RETRY;
@@ -3953,11 +3953,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
gpa_t gpa = fault->addr;
u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
- bool map_writable;
unsigned long mmu_seq;
- kvm_pfn_t pfn;
- hva_t hva;
int r;
fault->gfn = gpa >> PAGE_SHIFT;
@@ -3975,12 +3972,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, fault->prefault, fault->gfn, gpa, &pfn, &hva,
- fault->write, &map_writable, &r))
+ if (kvm_faultin_pfn(vcpu, fault, &r))
return r;
if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
- fault->gfn, pfn, ACC_ALL, &r))
+ fault->gfn, fault->pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
@@ -3990,25 +3986,25 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
else
write_lock(&vcpu->kvm->mmu_lock);
- if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+ if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, fault->max_level,
- pfn, fault->prefault);
+ r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
+ fault->pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, map_writable, fault->max_level, pfn,
- fault->prefault, fault->is_tdp);
+ r = __direct_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
+ fault->pfn, fault->prefault, fault->is_tdp);
out_unlock:
if (is_tdp_mmu_fault)
read_unlock(&vcpu->kvm->mmu_lock);
else
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_pfn_clean(fault->pfn);
return r;
}
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 44a19dde5e70..72f0b415be63 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -839,10 +839,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
- kvm_pfn_t pfn;
- hva_t hva;
unsigned long mmu_seq;
- bool map_writable, is_self_change_mapping;
+ bool is_self_change_mapping;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
WARN_ON_ONCE(fault->is_tdp);
@@ -892,11 +890,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, fault->prefault, fault->gfn, addr, &pfn, &hva,
- fault->write, &map_writable, &r))
+ if (kvm_faultin_pfn(vcpu, fault, &r))
return r;
- if (handle_abnormal_pfn(vcpu, addr, fault->gfn, pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, addr, fault->gfn, fault->pfn, walker.pte_access, &r))
return r;
/*
@@ -904,7 +901,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* we will cache the incorrect access into mmio spte.
*/
if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
- !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(pfn)) {
+ !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(fault->pfn)) {
walker.pte_access |= ACC_WRITE_MASK;
walker.pte_access &= ~ACC_USER_MASK;
@@ -920,20 +917,20 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
- if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+ if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;
kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, pfn,
- map_writable, fault->prefault);
+ r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, fault->pfn,
+ fault->map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
out_unlock:
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_pfn_clean(fault->pfn);
return r;
}
--
2.27.0
tdp_mmu_map_set_spte_atomic no longer takes care of dirty logging; the
only remaining difference is that it takes a vCPU instead of a struct
kvm. Merge the two functions.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 40 ++++++++++----------------------------
1 file changed, 10 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b41b6f5ea82b..2d92a5b54ded 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -489,8 +489,8 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
}
/*
- * tdp_mmu_set_spte_atomic_no_dirty_log - Set a TDP MMU SPTE atomically
- * and handle the associated bookkeeping, but do not mark the page dirty
+ * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically
+ * and handle the associated bookkeeping. Do not mark the page dirty
* in KVM's dirty bitmaps.
*
* @kvm: kvm instance
@@ -499,9 +499,9 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
* Returns: true if the SPTE was set, false if it was not. If false is returned,
* this function will have no side-effects.
*/
-static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
+static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
+ struct tdp_iter *iter,
+ u64 new_spte)
{
lockdep_assert_held_read(&kvm->mmu_lock);
@@ -527,24 +527,6 @@ static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
return true;
}
-/*
- * tdp_mmu_map_set_spte_atomic - Set a leaf TDP MMU SPTE atomically to resolve a
- * TDP page fault.
- *
- * @vcpu: The vcpu instance that took the TDP page fault.
- * @iter: a tdp_iter instance currently on the SPTE that should be set
- * @new_spte: The value the SPTE should be set to
- *
- * Returns: true if the SPTE was set, false if it was not. If false is returned,
- * this function will have no side-effects.
- */
-static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- return tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, iter, new_spte);
-}
-
static inline bool tdp_mmu_zap_spte_atomic(struct kvm *kvm,
struct tdp_iter *iter)
{
@@ -554,7 +536,7 @@ static inline bool tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* immediately installing a present entry in its place
* before the TLBs are flushed.
*/
- if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, REMOVED_SPTE))
+ if (!tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE))
return false;
kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
@@ -928,7 +910,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
if (new_spte == iter->old_spte)
ret = RET_PF_SPURIOUS;
- else if (!tdp_mmu_map_set_spte_atomic(vcpu, iter, new_spte))
+ else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
return RET_PF_RETRY;
/*
@@ -1020,7 +1002,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
new_spte = make_nonleaf_spte(child_pt,
!shadow_accessed_mask);
- if (tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, &iter, new_spte)) {
+ if (tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, new_spte)) {
tdp_mmu_link_page(vcpu->kvm, sp,
fault->huge_page_disallowed &&
fault->req_level >= iter.level);
@@ -1208,8 +1190,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
- if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
- new_spte)) {
+ if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
/*
* The iter must explicitly re-read the SPTE because
* the atomic cmpxchg failed.
@@ -1277,8 +1258,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
continue;
}
- if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
- new_spte)) {
+ if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
/*
* The iter must explicitly re-read the SPTE because
* the atomic cmpxchg failed.
--
2.27.0
Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of
extracting the arguments from the struct. Tweak the conditions a bit
to avoid long lines.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 19 ++++++++-----------
arch/x86/kvm/mmu/mmu_internal.h | 3 +--
arch/x86/kvm/mmu/paging_tmpl.h | 3 +--
arch/x86/kvm/mmu/tdp_mmu.c | 3 +--
4 files changed, 11 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 877d0bda0f5e..7491dc685842 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2957,12 +2957,10 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
fault->pfn &= ~mask;
}
-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, u8 *goal_levelp)
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
{
- int level = *goal_levelp;
-
- if (cur_level == level && level > PG_LEVEL_4K &&
+ if (cur_level > PG_LEVEL_4K &&
+ cur_level == fault->goal_level &&
is_shadow_present_pte(spte) &&
!is_large_pte(spte)) {
/*
@@ -2972,10 +2970,10 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
* patching back for them into pfn the next 9 bits of
* the address.
*/
- u64 page_mask = KVM_PAGES_PER_HPAGE(level) -
- KVM_PAGES_PER_HPAGE(level - 1);
- *pfnp |= gfn & page_mask;
- (*goal_levelp)--;
+ u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+ KVM_PAGES_PER_HPAGE(cur_level - 1);
+ fault->pfn |= fault->gfn & page_mask;
+ fault->goal_level--;
}
}
@@ -2995,8 +2993,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* large page, as the leaf could be executable.
*/
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, *it.sptep, it.level);
base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == fault->goal_level)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ae0c7bc3b19b..f0295ad51f69 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -159,8 +159,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t pfn, int max_level);
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, u8 *goal_levelp);
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 20f616963ff4..4a263f4511a5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -740,8 +740,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* large page, as the leaf could be executable.
*/
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, *it.sptep, it.level);
base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == fault->goal_level)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b48256b88930..737af596adaf 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1001,8 +1001,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
- iter.level, &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
if (iter.level == fault->goal_level)
break;
--
2.27.0
Pass struct kvm_page_fault to tdp_mmu_map_handle_target_level() instead of
extracting the arguments from the struct.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4a5bb0b5b639..6cfba8c28ea2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -929,21 +929,20 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
* Installs a last-level SPTE to handle a TDP page fault.
* (NPT/EPT violation/misconfiguration)
*/
-static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
- int map_writable,
- struct tdp_iter *iter,
- kvm_pfn_t pfn, bool prefault)
+static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault,
+ struct tdp_iter *iter)
{
u64 new_spte;
int ret = RET_PF_FIXED;
int make_spte_ret = 0;
- if (unlikely(is_noslot_pfn(pfn)))
+ if (unlikely(is_noslot_pfn(fault->pfn)))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
- pfn, iter->old_spte, prefault, true,
- map_writable, !shadow_accessed_mask,
+ fault->pfn, iter->old_spte, fault->prefault, true,
+ fault->map_writable, !shadow_accessed_mask,
&new_spte);
if (new_spte == iter->old_spte)
@@ -957,7 +956,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
* the vCPU would have the same fault again.
*/
if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
- if (write)
+ if (fault->write)
ret = RET_PF_EMULATE;
}
@@ -1064,8 +1063,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return RET_PF_RETRY;
}
- ret = tdp_mmu_map_handle_target_level(vcpu, fault->write, fault->map_writable, &iter,
- fault->pfn, fault->prefault);
+ ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
rcu_read_unlock();
return ret;
--
2.27.0
Add fields to struct kvm_page_fault corresponding to the arguments
of page_fault_handle_page_track(). The fields are initialized in the
callers, and page_fault_handle_page_track() receives a struct
kvm_page_fault instead of having to extract the arguments out of it.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 18 +++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++---
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 8d001b56f7b5..a5c2d4069964 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -132,6 +132,9 @@ struct kvm_page_fault {
/* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
u8 max_level;
+
+ /* Shifted addr, or result of guest page table walk if addr is a gva. */
+ gfn_t gfn;
};
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7685b4270d8c..41dc6796b80b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3846,20 +3846,19 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
}
static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
- u32 error_code, gfn_t gfn)
+ struct kvm_page_fault *fault)
{
- if (unlikely(error_code & PFERR_RSVD_MASK))
+ if (unlikely(fault->rsvd))
return false;
- if (!(error_code & PFERR_PRESENT_MASK) ||
- !(error_code & PFERR_WRITE_MASK))
+ if (!fault->present || !fault->write)
return false;
/*
* guest is writing the page which is write tracked which can
* not be fixed by page fault handler.
*/
- if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_page_track_is_active(vcpu, fault->gfn, KVM_PAGE_TRACK_WRITE))
return true;
return false;
@@ -3956,13 +3955,13 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
bool map_writable;
- gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
kvm_pfn_t pfn;
hva_t hva;
int r;
- if (page_fault_handle_page_track(vcpu, error_code, gfn))
+ fault->gfn = gpa >> PAGE_SHIFT;
+ if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;
r = fast_page_fault(vcpu, gpa, error_code);
@@ -3976,11 +3975,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, fault->prefault, gfn, gpa, &pfn, &hva,
+ if (kvm_faultin_pfn(vcpu, fault->prefault, fault->gfn, gpa, &pfn, &hva,
fault->write, &map_writable, &r))
return r;
- if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
+ fault->gfn, pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index a39881a8ba78..44a19dde5e70 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -869,7 +869,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_RETRY;
}
- if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+ fault->gfn = walker.gfn;
+ if (page_fault_handle_page_track(vcpu, fault)) {
shadow_page_table_clear_flood(vcpu, addr);
return RET_PF_EMULATE;
}
@@ -891,11 +892,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, fault->prefault, walker.gfn, addr, &pfn, &hva,
+ if (kvm_faultin_pfn(vcpu, fault->prefault, fault->gfn, addr, &pfn, &hva,
fault->write, &map_writable, &r))
return r;
- if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, addr, fault->gfn, pfn, walker.pte_access, &r))
return r;
/*
--
2.27.0
Pass struct kvm_page_fault to FNAME(fetch)() instead of
extracting the arguments from the struct.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 52 ++++++++++++++--------------------
1 file changed, 22 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0fa7a678b907..afd2ad8c5173 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -655,21 +655,18 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
* If the guest tries to write a write-protected page, we need to
* emulate this operation, return 1 to indicate this case.
*/
-static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
- struct guest_walker *gw, u32 error_code,
- int max_level, kvm_pfn_t pfn, bool map_writable,
- bool prefault)
+static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ struct guest_walker *gw)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write_fault = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned int direct_access, access;
int top_level, level, req_level, ret;
- gfn_t base_gfn = gw->gfn;
+ gfn_t base_gfn = fault->gfn;
+ WARN_ON_ONCE(gw->gfn != base_gfn);
direct_access = gw->pte_access;
top_level = vcpu->arch.mmu->root_level;
@@ -687,7 +684,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
goto out_gpte_changed;
- for (shadow_walk_init(&it, vcpu, addr);
+ for (shadow_walk_init(&it, vcpu, fault->addr);
shadow_walk_okay(&it) && it.level > gw->level;
shadow_walk_next(&it)) {
gfn_t table_gfn;
@@ -699,7 +696,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
access = gw->pt_access[it.level - 2];
- sp = kvm_mmu_get_page(vcpu, table_gfn, addr,
+ sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
it.level-1, false, access);
/*
* We must synchronize the pagetable before linking it
@@ -733,10 +730,10 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
link_shadow_page(vcpu, it.sptep, sp);
}
- level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);
- trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);
for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
clear_sp_write_flooding_count(it.sptep);
@@ -746,10 +743,10 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
* large page, as the leaf could be executable.
*/
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, gw->gfn, it.level,
- &pfn, &level);
+ disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
+ &fault->pfn, &level);
- base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+ base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
break;
@@ -758,7 +755,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
drop_large_spte(vcpu, it.sptep);
if (!is_shadow_present_pte(*it.sptep)) {
- sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
+ sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
it.level - 1, true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
if (huge_page_disallowed && req_level >= it.level)
@@ -766,8 +763,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
}
}
- ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
- it.level, base_gfn, pfn, prefault, map_writable);
+ ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
+ it.level, base_gfn, fault->pfn, fault->prefault,
+ fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
@@ -835,26 +833,21 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
*/
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- gpa_t addr = fault->addr;
- u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
unsigned long mmu_seq;
bool is_self_change_mapping;
- pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
+ pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
WARN_ON_ONCE(fault->is_tdp);
/*
+ * Look up the guest pte for the faulting address.
* If PFEC.RSVD is set, this is a shadow page fault.
* The bit needs to be cleared before walking guest page tables.
*/
- error_code &= ~PFERR_RSVD_MASK;
-
- /*
- * Look up the guest pte for the faulting address.
- */
- r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
+ r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
+ fault->error_code & ~PFERR_RSVD_MASK);
/*
* The page is not mapped by the guest. Let the guest handle it.
@@ -869,7 +862,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
fault->gfn = walker.gfn;
if (page_fault_handle_page_track(vcpu, fault)) {
- shadow_page_table_clear_flood(vcpu, addr);
+ shadow_page_table_clear_flood(vcpu, fault->addr);
return RET_PF_EMULATE;
}
@@ -924,8 +917,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, fault->pfn,
- fault->map_writable, fault->prefault);
+ r = FNAME(fetch)(vcpu, fault, &walker);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
out_unlock:
--
2.27.0
Since the two callers of set_spte do different things with the results,
inlining it actually makes the code simpler to reason about. For example,
FNAME(sync_page) already has a struct kvm_mmu_page *, but set_spte had to
fish it back out of sptep's private page data.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 21 ---------------------
arch/x86/kvm/mmu/paging_tmpl.h | 21 ++++++++++++---------
2 files changed, 12 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6ba7c60bd4f8..19c2fd2189a3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2674,27 +2674,6 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
return 0;
}
-static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
- unsigned int pte_access, int level,
- gfn_t gfn, kvm_pfn_t pfn, bool speculative,
- bool can_unsync, bool host_writable)
-{
- u64 spte;
- struct kvm_mmu_page *sp;
- int ret;
-
- sp = sptep_to_sp(sptep);
-
- ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
- can_unsync, host_writable, sp_ad_disabled(sp), &spte);
-
- if (*sptep == spte)
- ret |= SET_SPTE_SPURIOUS;
- else if (mmu_spte_update(sptep, spte))
- ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
- return ret;
-}
-
static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned int pte_access, bool write_fault, int level,
gfn_t gfn, kvm_pfn_t pfn, bool speculative,
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e4c7bf3deac8..500962dceda0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1061,7 +1061,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
int i;
bool host_writable;
gpa_t first_pte_gpa;
- int set_spte_ret = 0;
+ bool flush = false;
/*
* Ignore various flags when verifying that it's safe to sync a shadow
@@ -1091,6 +1091,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+ u64 *sptep, spte;
unsigned pte_access;
pt_element_t gpte;
gpa_t pte_gpa;
@@ -1106,7 +1107,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return -1;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
- set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
+ flush = true;
continue;
}
@@ -1120,19 +1121,21 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
- set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
+ flush = true;
continue;
}
- host_writable = sp->spt[i] & shadow_host_writable_mask;
+ sptep = &sp->spt[i];
+ spte = *sptep;
+ host_writable = spte & shadow_host_writable_mask;
+ make_spte(vcpu, pte_access, PG_LEVEL_4K, gfn,
+ spte_to_pfn(spte), spte, true, false,
+ host_writable, sp_ad_disabled(sp), &spte);
- set_spte_ret |= set_spte(vcpu, &sp->spt[i],
- pte_access, PG_LEVEL_4K,
- gfn, spte_to_pfn(sp->spt[i]),
- true, false, host_writable);
+ flush |= mmu_spte_update(sptep, spte);
}
- return set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH;
+ return flush;
}
#undef pt_element_t
--
2.27.0
This simplifies set_spte, which we want to remove, and unifies code
between the shadow MMU and the TDP MMU. The warning will be added
back later to make_spte as well.
There is a small disadvantage in the TDP MMU; it may unnecessarily mark
a page as dirty twice if two vCPUs end up mapping the same page twice.
However, this is a very small cost for a case that is already rare.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 ---
arch/x86/kvm/mmu/spte.c | 3 +++
arch/x86/kvm/mmu/tdp_mmu.c | 21 +--------------------
3 files changed, 4 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8b6dc276935f..5a757953b98b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2688,9 +2688,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
can_unsync, host_writable, sp_ad_disabled(sp), &spte);
- if (spte & PT_WRITABLE_MASK)
- kvm_vcpu_mark_page_dirty(vcpu, gfn);
-
if (*sptep == spte)
ret |= SET_SPTE_SPURIOUS;
else if (mmu_spte_update(sptep, spte))
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index a33c581aabd6..66be9452ded1 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -179,6 +179,9 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+ if (spte & PT_WRITABLE_MASK)
+ kvm_vcpu_mark_page_dirty(vcpu, gfn);
+
*new_spte = spte;
return ret;
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3bf85a8c7d15..b41b6f5ea82b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -542,26 +542,7 @@ static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
struct tdp_iter *iter,
u64 new_spte)
{
- struct kvm *kvm = vcpu->kvm;
-
- if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, new_spte))
- return false;
-
- /*
- * Use kvm_vcpu_gfn_to_memslot() instead of going through
- * handle_changed_spte_dirty_log() to leverage vcpu->last_used_slot.
- */
- if (is_writable_pte(new_spte)) {
- struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, iter->gfn);
-
- if (slot && kvm_slot_dirty_track_enabled(slot)) {
- /* Enforced by kvm_mmu_hugepage_adjust. */
- WARN_ON_ONCE(iter->level > PG_LEVEL_4K);
- mark_page_dirty_in_slot(kvm, slot, iter->gfn);
- }
- }
-
- return true;
+ return tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, iter, new_spte);
}
static inline bool tdp_mmu_zap_spte_atomic(struct kvm *kvm,
--
2.27.0
Pass the old SPTE in the same variable that receives the new SPTE. This
reduces the number of arguments from 11 to 10.
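As a standalone illustration of the in/out convention (a minimal sketch
with made-up names, not the kernel code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_WRITABLE_BIT (1ull << 1)

/*
 * 'spte' is an in-out parameter: it carries the old value on entry and
 * receives the new value on return, which saves one argument.
 */
static bool toy_make_spte(bool writable, uint64_t *spte)
{
	uint64_t old_spte = *spte;	/* read the old value */
	uint64_t new_spte = old_spte;

	if (writable)
		new_spte |= TOY_WRITABLE_BIT;

	*spte = new_spte;		/* write the new value back */
	return new_spte != old_spte;	/* did anything change? */
}

int main(void)
{
	uint64_t spte = 0x1;

	if (toy_make_spte(true, &spte))
		printf("spte updated to %#llx\n", (unsigned long long)spte);
	return 0;
}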
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 18 ++++++++----------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/mmu/spte.c | 7 ++++---
arch/x86/kvm/mmu/spte.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 9 +++++----
5 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91292009780a..b363433bcd2c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2682,8 +2682,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
int was_rmapped = 0;
int ret = RET_PF_FIXED;
bool flush = false;
+ u64 spte = *sptep;
bool wrprot;
- u64 spte;
/* Prefetching always gets a writable pfn. */
bool host_writable = !fault || fault->map_writable;
@@ -2691,35 +2691,33 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
bool write_fault = fault && fault->write;
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
- *sptep, write_fault, gfn);
+ spte, write_fault, gfn);
if (unlikely(is_noslot_pfn(pfn))) {
mark_mmio_spte(vcpu, sptep, gfn, pte_access);
return RET_PF_EMULATE;
}
- if (is_shadow_present_pte(*sptep)) {
+ if (is_shadow_present_pte(spte)) {
/*
* If we overwrite a PTE page pointer with a 2MB PMD, unlink
* the parent of the now unreachable PTE.
*/
- if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) {
+ if (level > PG_LEVEL_4K && !is_large_pte(spte)) {
struct kvm_mmu_page *child;
- u64 pte = *sptep;
-
- child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
+ child = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
drop_parent_pte(child, sptep);
flush = true;
- } else if (pfn != spte_to_pfn(*sptep)) {
+ } else if (pfn != spte_to_pfn(spte)) {
pgprintk("hfn old %llx new %llx\n",
- spte_to_pfn(*sptep), pfn);
+ spte_to_pfn(spte), pfn);
drop_spte(vcpu->kvm, sptep);
flush = true;
} else
was_rmapped = 1;
}
- wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, speculative,
+ wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, speculative,
true, host_writable, &spte);
if (*sptep == spte) {
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d8889e02c4b7..88551cfd06c6 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1130,7 +1130,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
host_writable = spte & shadow_host_writable_mask;
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
make_spte(vcpu, sp, slot, pte_access, gfn,
- spte_to_pfn(spte), spte, true, false,
+ spte_to_pfn(spte), true, false,
host_writable, &spte);
flush |= mmu_spte_update(sptep, spte);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 871f6114b0fa..91525388032e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -92,10 +92,11 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
- u64 old_spte, bool speculative, bool can_unsync,
- bool host_writable, u64 *new_spte)
+ bool speculative, bool can_unsync,
+ bool host_writable, u64 *sptep)
{
int level = sp->role.level;
+ u64 old_spte = *sptep;
u64 spte = SPTE_MMU_PRESENT_MASK;
bool wrprot = false;
@@ -187,7 +188,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
}
- *new_spte = spte;
+ *sptep = spte;
return wrprot;
}
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7c0b09461349..231531c6015a 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -337,7 +337,7 @@ static inline u64 get_mmio_spte_generation(u64 spte)
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
- u64 old_spte, bool speculative, bool can_unsync,
+ bool speculative, bool can_unsync,
bool host_writable, u64 *new_spte);
u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 953f24ded6bc..29b739c7bba4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -903,13 +903,14 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
bool wrprot = false;
WARN_ON(sp->role.level != fault->goal_level);
- if (unlikely(!fault->slot))
+ if (unlikely(!fault->slot)) {
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
- else
+ } else {
+ new_spte = iter->old_spte;
wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
- fault->pfn, iter->old_spte, fault->prefault, true,
+ fault->pfn, fault->prefault, true,
fault->map_writable, &new_spte);
-
+ }
if (new_spte == iter->old_spte)
ret = RET_PF_SPURIOUS;
else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
--
2.27.0
The level and A/D bit support of the new SPTE can be found in the role,
which is stored in the kvm_mmu_page struct. This merges two arguments
into one.
For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
not use it if the SPTE is already present) so we fetch it just before
calling make_spte.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/mmu/paging_tmpl.h | 4 ++--
arch/x86/kvm/mmu/spte.c | 11 ++++++-----
arch/x86/kvm/mmu/spte.h | 8 ++++----
arch/x86/kvm/mmu/tdp_mmu.c | 7 ++++---
5 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91303006faaf..c208f001c302 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2716,8 +2716,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
was_rmapped = 1;
}
- wrprot = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
- true, host_writable, sp_ad_disabled(sp), &spte);
+ wrprot = make_spte(vcpu, sp, pte_access, gfn, pfn, *sptep, speculative,
+ true, host_writable, &spte);
if (*sptep == spte) {
ret = RET_PF_SPURIOUS;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 7f2c6eeed04f..fbbaa3f5fb4e 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1128,9 +1128,9 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
sptep = &sp->spt[i];
spte = *sptep;
host_writable = spte & shadow_host_writable_mask;
- make_spte(vcpu, pte_access, PG_LEVEL_4K, gfn,
+ make_spte(vcpu, sp, pte_access, gfn,
spte_to_pfn(spte), spte, true, false,
- host_writable, sp_ad_disabled(sp), &spte);
+ host_writable, &spte);
flush |= mmu_spte_update(sptep, spte);
}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 29ea996201b4..2c5c14fbfbe9 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -89,15 +89,16 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
E820_TYPE_RAM);
}
-bool make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
- gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
- bool can_unsync, bool host_writable, bool ad_disabled,
- u64 *new_spte)
+bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
+ u64 old_spte, bool speculative, bool can_unsync,
+ bool host_writable, u64 *new_spte)
{
+ int level = sp->role.level;
u64 spte = SPTE_MMU_PRESENT_MASK;
bool wrprot = false;
- if (ad_disabled)
+ if (sp->role.ad_disabled)
spte |= SPTE_TDP_AD_DISABLED_MASK;
else if (kvm_vcpu_ad_need_write_protect(vcpu))
spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1998ec559196..cbb02a961ac2 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -334,10 +334,10 @@ static inline u64 get_mmio_spte_generation(u64 spte)
return gen;
}
-bool make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
- gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
- bool can_unsync, bool host_writable, bool ad_disabled,
- u64 *new_spte);
+bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
+ u64 old_spte, bool speculative, bool can_unsync,
+ bool host_writable, u64 *new_spte);
u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
u64 mark_spte_for_access_track(u64 spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1cdb5618bb76..6dbf28924bc2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -897,17 +897,18 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault,
struct tdp_iter *iter)
{
+ struct kvm_mmu_page *sp = sptep_to_sp(iter->sptep);
u64 new_spte;
int ret = RET_PF_FIXED;
bool wrprot = false;
+ WARN_ON(sp->role.level != fault->goal_level);
if (unlikely(!fault->slot))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
- wrprot = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
+ wrprot = make_spte(vcpu, sp, ACC_ALL, iter->gfn,
fault->pfn, iter->old_spte, fault->prefault, true,
- fault->map_writable, !shadow_accessed_mask,
- &new_spte);
+ fault->map_writable, &new_spte);
if (new_spte == iter->old_spte)
ret = RET_PF_SPURIOUS;
--
2.27.0
mmu_set_spte is called for either PTE prefetching or page faults. The
three boolean arguments write_fault, speculative and host_writable are
always false/true/true, respectively, for prefetching, and come from a
struct kvm_page_fault for page faults.
Let mmu_set_spte distinguish these two situations by accepting a
possibly NULL struct kvm_page_fault argument.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 17 ++++++++++-------
arch/x86/kvm/mmu/paging_tmpl.h | 13 +++----------
2 files changed, 13 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c208f001c302..4b304f60cf44 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2675,9 +2675,8 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
}
static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
- unsigned int pte_access, bool write_fault,
- gfn_t gfn, kvm_pfn_t pfn, bool speculative,
- bool host_writable)
+ unsigned int pte_access, gfn_t gfn,
+ kvm_pfn_t pfn, struct kvm_page_fault *fault)
{
struct kvm_mmu_page *sp = sptep_to_sp(sptep);
int level = sp->role.level;
@@ -2687,6 +2686,11 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
bool wrprot;
u64 spte;
+ /* Prefetching always gets a writable pfn. */
+ bool host_writable = !fault || fault->map_writable;
+ bool speculative = !fault || fault->prefault;
+ bool write_fault = fault && fault->write;
+
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
*sptep, write_fault, gfn);
@@ -2778,8 +2782,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
for (i = 0; i < ret; i++, gfn++, start++) {
- mmu_set_spte(vcpu, start, access, false, gfn,
- page_to_pfn(pages[i]), true, true);
+ mmu_set_spte(vcpu, start, access, gfn,
+ page_to_pfn(pages[i]), NULL);
put_page(pages[i]);
}
@@ -2981,8 +2985,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return -EFAULT;
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- fault->write, base_gfn, fault->pfn,
- fault->prefault, fault->map_writable);
+ base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index fbbaa3f5fb4e..8c07c42a4d73 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -578,13 +578,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (is_error_pfn(pfn))
return false;
- /*
- * we call mmu_set_spte() with host_writable = true because
- * pte_prefetch_gfn_to_pfn always gets a writable pfn.
- */
- mmu_set_spte(vcpu, spte, pte_access, false, gfn, pfn,
- true, true);
-
+ mmu_set_spte(vcpu, spte, pte_access, gfn, pfn, NULL);
kvm_release_pfn_clean(pfn);
return true;
}
@@ -763,9 +757,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (WARN_ON_ONCE(it.level != fault->goal_level))
return -EFAULT;
- ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
- base_gfn, fault->pfn, fault->prefault,
- fault->map_writable);
+ ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access,
+ base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
--
2.27.0
From: David Matlack <[email protected]>
The memslot for the faulting gfn is used throughout the page fault
handling code, so capture it in kvm_page_fault as soon as we know the
gfn and use it in the page fault handling code that has direct access
to the kvm_page_fault struct. Replace various tests using is_noslot_pfn
with more direct tests on fault->slot being NULL.
This, in combination with the subsequent patch, improves "Populate
memory time" in dirty_log_perf_test by 5% when using the legacy MMU.
There is no discernible improvement to the performance of the TDP MMU.
No functional change intended.
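Schematically, the slot is captured right after the gfn is known and
then tested directly wherever is_noslot_pfn was used (condensed from
the hunks below):

	fault->gfn  = fault->addr >> PAGE_SHIFT;
	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
	...
	if (unlikely(!fault->slot))	/* was: is_noslot_pfn(fault->pfn) */
		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);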
Suggested-by: Ben Gardon <[email protected]>
Signed-off-by: David Matlack <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++--------------------
arch/x86/kvm/mmu/paging_tmpl.h | 6 ++++--
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
4 files changed, 20 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 01a4d1bc5053..75367af1a6d3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -158,6 +158,9 @@ struct kvm_page_fault {
/* Shifted addr, or result of guest page table walk if addr is a gva. */
gfn_t gfn;
+ /* The memslot containing gfn. May be NULL. */
+ struct kvm_memory_slot *slot;
+
/* Outputs of kvm_faultin_pfn. */
kvm_pfn_t pfn;
hva_t hva;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5a757953b98b..754578458cb7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2907,7 +2907,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- struct kvm_memory_slot *slot;
+ struct kvm_memory_slot *slot = fault->slot;
kvm_pfn_t mask;
fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
@@ -2918,8 +2918,7 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
return;
- slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
- if (!slot)
+ if (kvm_slot_dirty_track_enabled(slot))
return;
/*
@@ -3043,7 +3042,7 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
return true;
}
- if (unlikely(is_noslot_pfn(fault->pfn))) {
+ if (unlikely(!fault->slot)) {
gva_t gva = fault->is_tdp ? 0 : fault->addr;
vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
@@ -3097,13 +3096,9 @@ static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
* someone else modified the SPTE from its original value.
*/
static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
u64 *sptep, u64 old_spte, u64 new_spte)
{
- gfn_t gfn;
-
- WARN_ON(!sp->role.direct);
-
/*
* Theoretically we could also set dirty bit (and flush TLB) here in
* order to eliminate unnecessary PML logging. See comments in
@@ -3119,14 +3114,8 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
return false;
- if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
- /*
- * The gfn of direct spte is stable since it is
- * calculated by sp->gfn.
- */
- gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
- kvm_vcpu_mark_page_dirty(vcpu, gfn);
- }
+ if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
+ mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
return true;
}
@@ -3251,7 +3240,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* since the gfn is not stable for indirect shadow page. See
* Documentation/virt/kvm/locking.rst to get more detail.
*/
- if (fast_pf_fix_direct_spte(vcpu, sp, sptep, spte, new_spte)) {
+ if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) {
ret = RET_PF_FIXED;
break;
}
@@ -3863,7 +3852,7 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
{
- struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+ struct kvm_memory_slot *slot = fault->slot;
bool async;
/*
@@ -3877,6 +3866,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (!kvm_is_visible_memslot(slot)) {
/* Don't expose private memslots to L2. */
if (is_guest_mode(vcpu)) {
+ fault->slot = NULL;
fault->pfn = KVM_PFN_NOSLOT;
fault->map_writable = false;
return false;
@@ -3928,6 +3918,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
int r;
fault->gfn = fault->addr >> PAGE_SHIFT;
+ fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;
@@ -3955,7 +3947,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
else
write_lock(&vcpu->kvm->mmu_lock);
- if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
+ if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;
r = make_mmu_pages_available(vcpu);
if (r)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 7a8a2d14a3c7..e4c7bf3deac8 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -861,6 +861,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
}
fault->gfn = walker.gfn;
+ fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
if (page_fault_handle_page_track(vcpu, fault)) {
shadow_page_table_clear_flood(vcpu, fault->addr);
return RET_PF_EMULATE;
@@ -894,7 +896,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* we will cache the incorrect access into mmio spte.
*/
if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
- !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(fault->pfn)) {
+ !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
walker.pte_access |= ACC_WRITE_MASK;
walker.pte_access &= ~ACC_USER_MASK;
@@ -910,7 +912,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
- if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
+ if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;
kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 2d92a5b54ded..3e10658cf0d7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -900,7 +900,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
int ret = RET_PF_FIXED;
int make_spte_ret = 0;
- if (unlikely(is_noslot_pfn(fault->pfn)))
+ if (unlikely(!fault->slot))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
--
2.27.0
Add fields to struct kvm_page_fault corresponding to
the arguments of direct_page_fault(). The fields are
initialized in the callers, which now hand the whole
struct kvm_page_fault to direct_page_fault() instead
of extracting the individual arguments out of it.
Also adjust FNAME(page_fault) to store the max_level in
struct kvm_page_fault, to keep it similar to the direct
map path.
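For reference, kvm_tdp_page_fault() then whittles down fault->max_level
in place instead of computing a local max_level (condensed from the
mmu.c hunk below; the field starts out as KVM_MAX_HUGEPAGE_LEVEL in
kvm_mmu_do_page_fault):

	while (fault->max_level > PG_LEVEL_4K) {
		int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
		gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);

		if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
			break;

		--fault->max_level;
	}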
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 5 ++++
arch/x86/kvm/mmu/mmu.c | 43 +++++++++++++++-------------------
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++---
3 files changed, 27 insertions(+), 28 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ee58177bc282..8d001b56f7b5 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -129,6 +129,9 @@ struct kvm_page_fault {
/* Derived from mmu. */
const bool is_tdp;
+
+ /* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
+ u8 max_level;
};
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
@@ -146,6 +149,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefault = prefault,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+
+ .max_level = KVM_MAX_HUGEPAGE_LEVEL,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3ca4b1c69e03..7685b4270d8c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3949,11 +3949,11 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return true;
}
-static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault, int max_level, bool is_tdp)
+static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
+ gpa_t gpa = fault->addr;
+ u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
- bool write = error_code & PFERR_WRITE_MASK;
bool map_writable;
gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -3976,11 +3976,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
- if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
- write, &map_writable, &r))
+ if (kvm_faultin_pfn(vcpu, fault->prefault, gfn, gpa, &pfn, &hva,
+ fault->write, &map_writable, &r))
return r;
- if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
@@ -3997,11 +3997,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
goto out_unlock;
if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
- pfn, prefault);
+ r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, fault->max_level,
+ pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
- prefault, is_tdp);
+ r = __direct_map(vcpu, gpa, error_code, map_writable, fault->max_level, pfn,
+ fault->prefault, fault->is_tdp);
out_unlock:
if (is_tdp_mmu_fault)
@@ -4015,12 +4015,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
- pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
+ pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);
/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, fault->addr,
- fault->error_code, fault->prefault,
- PG_LEVEL_2M, false);
+ fault->max_level = PG_LEVEL_2M;
+ return direct_page_fault(vcpu, fault);
}
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
@@ -4058,21 +4057,17 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- int max_level;
- gpa_t gpa = fault->addr;
-
- for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
- max_level > PG_LEVEL_4K;
- max_level--) {
- int page_num = KVM_PAGES_PER_HPAGE(max_level);
- gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1);
+ while (fault->max_level > PG_LEVEL_4K) {
+ int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
+ gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);
if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
break;
+
+ --fault->max_level;
}
- return direct_page_fault(vcpu, gpa, fault->error_code,
- fault->prefault, max_level, true);
+ return direct_page_fault(vcpu, fault);
}
static void nonpaging_init_context(struct kvm_mmu *context)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 8eee1200117a..a39881a8ba78 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -843,7 +843,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
hva_t hva;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
- int max_level;
pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
WARN_ON_ONCE(fault->is_tdp);
@@ -885,9 +884,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
&walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
if (is_self_change_mapping)
- max_level = PG_LEVEL_4K;
+ fault->max_level = PG_LEVEL_4K;
else
- max_level = walker.level;
+ fault->max_level = walker.level;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
@@ -927,7 +926,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
+ r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, pfn,
map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
--
2.27.0
Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of
extracting the arguments from the struct; the results are also stored
in the struct, so the callers are adjusted accordingly.
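With this change max_level is the input limit, req_level is what the
host mapping would allow, and goal_level is the size that will actually
be installed. The callers then all follow the same pattern (a condensed
sketch of the hunks below):

	kvm_mmu_hugepage_adjust(vcpu, fault);

	for_each_shadow_entry(vcpu, fault->addr, it) {
		if (fault->nx_huge_page_workaround_enabled)
			disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
						   &fault->pfn, &fault->goal_level);

		if (it.level == fault->goal_level)
			break;	/* install the leaf SPTE at goal_level */
		...
	}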
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 35 +++++++++++++++++--
arch/x86/kvm/mmu/mmu.c | 60 +++++++++++++++------------------
arch/x86/kvm/mmu/mmu_internal.h | 12 ++-----
arch/x86/kvm/mmu/paging_tmpl.h | 16 ++++-----
arch/x86/kvm/mmu/tdp_mmu.c | 21 +++++-------
5 files changed, 77 insertions(+), 67 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6697571197a5..01a4d1bc5053 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -127,12 +127,34 @@ struct kvm_page_fault {
const bool rsvd;
const bool user;
- /* Derived from mmu. */
+ /* Derived from mmu and global state. */
const bool is_tdp;
+ const bool nx_huge_page_workaround_enabled;
- /* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
+ /*
+ * Whether a >4KB mapping can be created or is forbidden due to NX
+ * hugepages.
+ */
+ bool huge_page_disallowed;
+
+ /*
+ * Maximum page size that can be created for this fault; input to
+ * FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
+ */
u8 max_level;
+ /*
+ * Page size that can be created based on the max_level and the
+ * page size used by the host mapping.
+ */
+ u8 req_level;
+
+ /*
+ * Page size that will be created based on the req_level and
+ * huge_page_disallowed.
+ */
+ u8 goal_level;
+
/* Shifted addr, or result of guest page table walk if addr is a gva. */
gfn_t gfn;
@@ -144,6 +166,12 @@ struct kvm_page_fault {
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+extern int nx_huge_pages;
+static inline bool is_nx_huge_page_enabled(void)
+{
+ return READ_ONCE(nx_huge_pages);
+}
+
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
{
@@ -157,8 +185,11 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefault = prefault,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
.max_level = KVM_MAX_HUGEPAGE_LEVEL,
+ .req_level = PG_LEVEL_4K,
+ .goal_level = PG_LEVEL_4K,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 36cbe5cba085..877d0bda0f5e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2920,48 +2920,45 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
return min(host_level, max_level);
}
-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
- int max_level, kvm_pfn_t *pfnp,
- bool huge_page_disallowed, int *req_level)
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_memory_slot *slot;
- kvm_pfn_t pfn = *pfnp;
kvm_pfn_t mask;
- int level;
- *req_level = PG_LEVEL_4K;
+ fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;
- if (unlikely(max_level == PG_LEVEL_4K))
- return PG_LEVEL_4K;
+ if (unlikely(fault->max_level == PG_LEVEL_4K))
+ return;
- if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
- return PG_LEVEL_4K;
+ if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
+ return;
- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
if (!slot)
- return PG_LEVEL_4K;
+ return;
/*
* Enforce the iTLB multihit workaround after capturing the requested
* level, which will be used to do precise, accurate accounting.
*/
- *req_level = level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level);
- if (level == PG_LEVEL_4K || huge_page_disallowed)
- return PG_LEVEL_4K;
+ fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+ fault->gfn, fault->pfn,
+ fault->max_level);
+ if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+ return;
/*
* mmu_notifier_retry() was successful and mmu_lock is held, so
* the pmd can't be split from under us.
*/
- mask = KVM_PAGES_PER_HPAGE(level) - 1;
- VM_BUG_ON((gfn & mask) != (pfn & mask));
- *pfnp = pfn & ~mask;
-
- return level;
+ fault->goal_level = fault->req_level;
+ mask = KVM_PAGES_PER_HPAGE(fault->goal_level) - 1;
+ VM_BUG_ON((fault->gfn & mask) != (fault->pfn & mask));
+ fault->pfn &= ~mask;
}
void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, int *goal_levelp)
+ kvm_pfn_t *pfnp, u8 *goal_levelp)
{
int level = *goal_levelp;
@@ -2984,28 +2981,25 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
- int level, req_level, ret;
+ int ret;
gfn_t base_gfn = fault->gfn;
- level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);
- trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &level);
+ &fault->pfn, &fault->goal_level);
base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
- if (it.level == level)
+ if (it.level == fault->goal_level)
break;
drop_large_spte(vcpu, it.sptep);
@@ -3016,13 +3010,13 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
it.level - 1, true, ACC_ALL);
link_shadow_page(vcpu, it.sptep, sp);
- if (fault->is_tdp && huge_page_disallowed &&
- req_level >= it.level)
+ if (fault->is_tdp && fault->huge_page_disallowed &&
+ fault->req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- fault->write, level, base_gfn, fault->pfn,
+ fault->write, fault->goal_level, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 2ba12ef46cb0..ae0c7bc3b19b 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -118,12 +118,6 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
kvm_x86_ops.cpu_dirty_log_size;
}
-extern int nx_huge_pages;
-static inline bool is_nx_huge_page_enabled(void)
-{
- return READ_ONCE(nx_huge_pages);
-}
-
int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
bool speculative);
@@ -164,11 +158,9 @@ enum {
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t pfn, int max_level);
-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
- int max_level, kvm_pfn_t *pfnp,
- bool huge_page_disallowed, int *req_level);
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, int *goal_levelp);
+ kvm_pfn_t *pfnp, u8 *goal_levelp);
void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index afd2ad8c5173..20f616963ff4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -658,12 +658,10 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
struct guest_walker *gw)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned int direct_access, access;
- int top_level, level, req_level, ret;
+ int top_level, ret;
gfn_t base_gfn = fault->gfn;
WARN_ON_ONCE(gw->gfn != base_gfn);
@@ -730,8 +728,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
link_shadow_page(vcpu, it.sptep, sp);
}
- level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);
trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);
@@ -742,12 +739,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &level);
+ &fault->pfn, &fault->goal_level);
base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
- if (it.level == level)
+ if (it.level == fault->goal_level)
break;
validate_direct_spte(vcpu, it.sptep, direct_access);
@@ -758,7 +755,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
it.level - 1, true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
- if (huge_page_disallowed && req_level >= it.level)
+ if (fault->huge_page_disallowed &&
+ fault->req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6cfba8c28ea2..b48256b88930 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -986,30 +986,25 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
*/
int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu *mmu = vcpu->arch.mmu;
struct tdp_iter iter;
struct kvm_mmu_page *sp;
u64 *child_pt;
u64 new_spte;
int ret;
- int level;
- int req_level;
- level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);
- trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
rcu_read_lock();
tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
- iter.level, &fault->pfn, &level);
+ iter.level, &fault->pfn, &fault->goal_level);
- if (iter.level == level)
+ if (iter.level == fault->goal_level)
break;
/*
@@ -1047,8 +1042,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
if (tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, &iter, new_spte)) {
tdp_mmu_link_page(vcpu->kvm, sp,
- huge_page_disallowed &&
- req_level >= iter.level);
+ fault->huge_page_disallowed &&
+ fault->req_level >= iter.level);
trace_kvm_mmu_get_page(sp, true);
} else {
@@ -1058,7 +1053,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
}
}
- if (iter.level != level) {
+ if (iter.level != fault->goal_level) {
rcu_read_unlock();
return RET_PF_RETRY;
}
--
2.27.0
Prepare for removing the ad_disabled argument of make_spte; instead it can
be found in the role of a struct kvm_mmu_page. First of all, the TDP MMU
must set the role accurately.
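This prepares for make_spte to read the flag from the shadow page
instead of taking it as a parameter, roughly like this (a sketch only;
the actual conversion is done by the separate make_spte patch in this
series):

	/* caller */
	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
	wrprot = make_spte(vcpu, sp, ...);

	/* in make_spte(), instead of the ad_disabled argument */
	if (sp->role.ad_disabled)
		spte |= SPTE_TDP_AD_DISABLED_MASK;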
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6de2c957edd6..1cdb5618bb76 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -167,6 +167,7 @@ static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu,
role.direct = true;
role.gpte_is_8_bytes = true;
role.access = ACC_ALL;
+ role.ad_disabled = !shadow_accessed_mask;
return role;
}
--
2.27.0
Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of
extracting the arguments from the struct.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/tdp_mmu.c | 23 +++++++++--------------
arch/x86/kvm/mmu/tdp_mmu.h | 4 +---
3 files changed, 11 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c84e978d76b0..b2020b481db2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3988,8 +3988,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
goto out_unlock;
if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
- fault->pfn, fault->prefault);
+ r = kvm_tdp_mmu_map(vcpu, fault);
else
r = __direct_map(vcpu, fault);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7a5a24ca50e4..4a5bb0b5b639 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -985,35 +985,30 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
* Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
* page tables and SPTEs to translate the faulting guest physical address.
*/
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault)
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu *mmu = vcpu->arch.mmu;
struct tdp_iter iter;
struct kvm_mmu_page *sp;
u64 *child_pt;
u64 new_spte;
int ret;
- gfn_t gfn = gpa >> PAGE_SHIFT;
int level;
int req_level;
- level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);
- trace_kvm_mmu_spte_requested(gpa, level, pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
rcu_read_lock();
- tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+ tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(iter.old_spte, gfn,
- iter.level, &pfn, &level);
+ disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
+ iter.level, &fault->pfn, &level);
if (iter.level == level)
break;
@@ -1069,8 +1064,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
return RET_PF_RETRY;
}
- ret = tdp_mmu_map_handle_target_level(vcpu, write, map_writable, &iter,
- pfn, prefault);
+ ret = tdp_mmu_map_handle_target_level(vcpu, fault->write, fault->map_writable, &iter,
+ fault->pfn, fault->prefault);
rcu_read_unlock();
return ret;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 358f447d4012..ceaf7ff3ca7c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -48,9 +48,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm);
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault);
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
bool flush);
--
2.27.0
Now that make_spte is called directly by the shadow MMU (rather than
wrapped by set_spte), it only has to return one boolean value.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 8 ++++----
arch/x86/kvm/mmu/mmu_internal.h | 5 -----
arch/x86/kvm/mmu/spte.c | 8 ++++----
arch/x86/kvm/mmu/spte.h | 7 +------
arch/x86/kvm/mmu/tdp_mmu.c | 6 +++---
5 files changed, 12 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 19c2fd2189a3..dcbe7df2f890 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2683,7 +2683,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
int was_rmapped = 0;
int ret = RET_PF_FIXED;
bool flush = false;
- int make_spte_ret;
+ bool wrprot;
u64 spte;
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
@@ -2715,8 +2715,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
was_rmapped = 1;
}
- make_spte_ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
- true, host_writable, sp_ad_disabled(sp), &spte);
+ wrprot = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
+ true, host_writable, sp_ad_disabled(sp), &spte);
if (*sptep == spte) {
ret = RET_PF_SPURIOUS;
@@ -2725,7 +2725,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
flush |= mmu_spte_update(sptep, spte);
}
- if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
+ if (wrprot) {
if (write_fault)
ret = RET_PF_EMULATE;
}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f0295ad51f69..94f4e754facb 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -150,11 +150,6 @@ enum {
RET_PF_SPURIOUS,
};
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
-#define SET_SPTE_SPURIOUS BIT(2)
-
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t pfn, int max_level);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 66be9452ded1..29ea996201b4 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -89,13 +89,13 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
E820_TYPE_RAM);
}
-int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+bool make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
bool can_unsync, bool host_writable, bool ad_disabled,
u64 *new_spte)
{
u64 spte = SPTE_MMU_PRESENT_MASK;
- int ret = 0;
+ bool wrprot = false;
if (ad_disabled)
spte |= SPTE_TDP_AD_DISABLED_MASK;
@@ -162,7 +162,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync, speculative)) {
pgprintk("%s: found shadow page for %llx, marking ro\n",
__func__, gfn);
- ret |= SET_SPTE_WRITE_PROTECTED_PT;
+ wrprot = true;
pte_access &= ~ACC_WRITE_MASK;
spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
}
@@ -183,7 +183,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
kvm_vcpu_mark_page_dirty(vcpu, gfn);
*new_spte = spte;
- return ret;
+ return wrprot;
}
u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index eb7b227fc6cf..1998ec559196 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -334,12 +334,7 @@ static inline u64 get_mmio_spte_generation(u64 spte)
return gen;
}
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
-#define SET_SPTE_SPURIOUS BIT(2)
-
-int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
+bool make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
bool can_unsync, bool host_writable, bool ad_disabled,
u64 *new_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3e10658cf0d7..6de2c957edd6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -898,12 +898,12 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
{
u64 new_spte;
int ret = RET_PF_FIXED;
- int make_spte_ret = 0;
+ bool wrprot = false;
if (unlikely(!fault->slot))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
- make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
+ wrprot = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
fault->pfn, iter->old_spte, fault->prefault, true,
fault->map_writable, !shadow_accessed_mask,
&new_spte);
@@ -918,7 +918,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
* protected, emulation is needed. If the emulation was skipped,
* the vCPU would have the same fault again.
*/
- if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
+ if (wrprot) {
if (fault->write)
ret = RET_PF_EMULATE;
}
--
2.27.0
The level of the new SPTE can be found in the kvm_mmu_page struct; there
is no need to pass it down.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 7 ++++---
arch/x86/kvm/mmu/paging_tmpl.h | 6 +++---
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dcbe7df2f890..91303006faaf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2675,11 +2675,12 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
}
static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
- unsigned int pte_access, bool write_fault, int level,
+ unsigned int pte_access, bool write_fault,
gfn_t gfn, kvm_pfn_t pfn, bool speculative,
bool host_writable)
{
struct kvm_mmu_page *sp = sptep_to_sp(sptep);
+ int level = sp->role.level;
int was_rmapped = 0;
int ret = RET_PF_FIXED;
bool flush = false;
@@ -2777,7 +2778,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
for (i = 0; i < ret; i++, gfn++, start++) {
- mmu_set_spte(vcpu, start, access, false, sp->role.level, gfn,
+ mmu_set_spte(vcpu, start, access, false, gfn,
page_to_pfn(pages[i]), true, true);
put_page(pages[i]);
}
@@ -2980,7 +2981,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return -EFAULT;
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- fault->write, fault->goal_level, base_gfn, fault->pfn,
+ fault->write, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 500962dceda0..7f2c6eeed04f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -582,7 +582,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* we call mmu_set_spte() with host_writable = true because
* pte_prefetch_gfn_to_pfn always gets a writable pfn.
*/
- mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfn,
+ mmu_set_spte(vcpu, spte, pte_access, false, gfn, pfn,
true, true);
kvm_release_pfn_clean(pfn);
@@ -764,8 +764,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
return -EFAULT;
ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
- fault->goal_level, base_gfn, fault->pfn,
- fault->prefault, fault->map_writable);
+ base_gfn, fault->pfn, fault->prefault,
+ fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
--
2.27.0
From: David Matlack <[email protected]>
mmu_try_to_unsync_pages checks if page tracking is active for the given
gfn, which requires knowing the memslot. We can pass down the memslot
via make_spte to avoid this lookup.
The memslot is also handy for make_spte's marking of the gfn as dirty:
we can test whether dirty page tracking is enabled and, if so, assert
that writable mappings are only created with 4K granularity. Apart from
the new warning, no functional change is intended.
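Condensed from the spte.c hunk below, the dirty marking becomes
slot-based and the assertion covers the dirty-tracked case:

	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
		/* Enforced by kvm_mmu_hugepage_adjust. */
		WARN_ON(level > PG_LEVEL_4K);
		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
	}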
Signed-off-by: David Matlack <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 2 --
arch/x86/kvm/mmu/mmu.c | 8 ++++----
arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
arch/x86/kvm/mmu/page_track.c | 14 +++-----------
arch/x86/kvm/mmu/paging_tmpl.h | 4 +++-
arch/x86/kvm/mmu/spte.c | 10 +++++++---
arch/x86/kvm/mmu/spte.h | 1 +
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
8 files changed, 21 insertions(+), 24 deletions(-)
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 9cd9230e5cc8..5c12f97ce934 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -59,8 +59,6 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
void kvm_slot_page_track_remove_page(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn,
enum kvm_page_track_mode mode);
-bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
- enum kvm_page_track_mode mode);
bool kvm_slot_page_track_is_active(struct kvm_memory_slot *slot, gfn_t gfn,
enum kvm_page_track_mode mode);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7ff2c6c896a8..91292009780a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2572,8 +2572,8 @@ static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
* were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
* be write-protected.
*/
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
- bool speculative)
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+ gfn_t gfn, bool can_unsync, bool speculative)
{
struct kvm_mmu_page *sp;
bool locked = false;
@@ -2583,7 +2583,7 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
* track machinery is used to write-protect upper-level shadow pages,
* i.e. this guards the role.level == 4K assertion below!
*/
- if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_slot_page_track_is_active(slot, gfn, KVM_PAGE_TRACK_WRITE))
return -EPERM;
/*
@@ -2719,7 +2719,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
was_rmapped = 1;
}
- wrprot = make_spte(vcpu, sp, pte_access, gfn, pfn, *sptep, speculative,
+ wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, speculative,
true, host_writable, &spte);
if (*sptep == spte) {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 94f4e754facb..585146a712d2 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -118,8 +118,8 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
kvm_x86_ops.cpu_dirty_log_size;
}
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
- bool speculative);
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+ gfn_t gfn, bool can_unsync, bool speculative);
void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 859800f7bb95..16e7176c97a5 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -136,6 +136,9 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
}
EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
bool kvm_slot_page_track_is_active(struct kvm_memory_slot *slot, gfn_t gfn,
enum kvm_page_track_mode mode)
{
@@ -151,17 +154,6 @@ bool kvm_slot_page_track_is_active(struct kvm_memory_slot *slot, gfn_t gfn,
return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
}
-/*
- * check if the corresponding access on the specified guest page is tracked.
- */
-bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
- enum kvm_page_track_mode mode)
-{
- struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-
- return kvm_slot_page_track_is_active(slot, gfn, mode);
-}
-
void kvm_page_track_cleanup(struct kvm *kvm)
{
struct kvm_page_track_notifier_head *head;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 44361f7e70c8..d8889e02c4b7 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1091,6 +1091,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
u64 *sptep, spte;
+ struct kvm_memory_slot *slot;
unsigned pte_access;
pt_element_t gpte;
gpa_t pte_gpa;
@@ -1127,7 +1128,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
sptep = &sp->spt[i];
spte = *sptep;
host_writable = spte & shadow_host_writable_mask;
- make_spte(vcpu, sp, pte_access, gfn,
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ make_spte(vcpu, sp, slot, pte_access, gfn,
spte_to_pfn(spte), spte, true, false,
host_writable, &spte);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2c5c14fbfbe9..871f6114b0fa 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -90,6 +90,7 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
}
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
u64 old_spte, bool speculative, bool can_unsync,
bool host_writable, u64 *new_spte)
@@ -160,7 +161,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* e.g. it's write-tracked (upper-level SPs) or has one or more
* shadow pages and unsync'ing pages is not allowed.
*/
- if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync, speculative)) {
+ if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, speculative)) {
pgprintk("%s: found shadow page for %llx, marking ro\n",
__func__, gfn);
wrprot = true;
@@ -180,8 +181,11 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
- if (spte & PT_WRITABLE_MASK)
- kvm_vcpu_mark_page_dirty(vcpu, gfn);
+ if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
+ /* Enforced by kvm_mmu_hugepage_adjust. */
+ WARN_ON(level > PG_LEVEL_4K);
+ mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
+ }
*new_spte = spte;
return wrprot;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index cbb02a961ac2..7c0b09461349 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -335,6 +335,7 @@ static inline u64 get_mmio_spte_generation(u64 spte)
}
bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+ struct kvm_memory_slot *slot,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
u64 old_spte, bool speculative, bool can_unsync,
bool host_writable, u64 *new_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6dbf28924bc2..953f24ded6bc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -906,7 +906,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
if (unlikely(!fault->slot))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
- wrprot = make_spte(vcpu, sp, ACC_ALL, iter->gfn,
+ wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
fault->pfn, iter->old_spte, fault->prefault, true,
fault->map_writable, &new_spte);
--
2.27.0
Do not bother removing the low bits of the gpa. This masking dates back
to the very first commit of KVM but it is unnecessary, as exemplified
by the other call in kvm_tdp_page_fault.
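Concretely, direct_page_fault() computes the gfn by shifting, which
already discards the low bits:

	gfn_t gfn = gpa >> PAGE_SHIFT;	/* same with or without gpa & PAGE_MASK */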
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7ef9c001d1b6..376e90f4f413 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4018,7 +4018,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
+ return direct_page_fault(vcpu, gpa, error_code, prefault,
PG_LEVEL_2M, false);
}
--
2.27.0
Pass struct kvm_page_fault to fast_page_fault() instead of
extracting the arguments from the struct.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++----------------------
1 file changed, 17 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b2020b481db2..36cbe5cba085 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3083,18 +3083,17 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
return false;
}
-static bool page_fault_can_be_fast(u32 error_code)
+static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
{
/*
* Do not fix the mmio spte with invalid generation number which
* need to be updated by slow page fault path.
*/
- if (unlikely(error_code & PFERR_RSVD_MASK))
+ if (fault->rsvd)
return false;
/* See if the page fault is due to an NX violation */
- if (unlikely(((error_code & (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))
- == (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))))
+ if (unlikely(fault->exec && fault->present))
return false;
/*
@@ -3111,9 +3110,7 @@ static bool page_fault_can_be_fast(u32 error_code)
* accesses to a present page.
*/
- return shadow_acc_track_mask != 0 ||
- ((error_code & (PFERR_WRITE_MASK | PFERR_PRESENT_MASK))
- == (PFERR_WRITE_MASK | PFERR_PRESENT_MASK));
+ return shadow_acc_track_mask != 0 || (fault->write && fault->present);
}
/*
@@ -3155,12 +3152,12 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return true;
}
-static bool is_access_allowed(u32 fault_err_code, u64 spte)
+static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
{
- if (fault_err_code & PFERR_FETCH_MASK)
+ if (fault->exec)
return is_executable_pte(spte);
- if (fault_err_code & PFERR_WRITE_MASK)
+ if (fault->write)
return is_writable_pte(spte);
/* Fault was on Read access */
@@ -3193,7 +3190,7 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
/*
* Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS.
*/
-static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
+static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_mmu_page *sp;
int ret = RET_PF_INVALID;
@@ -3201,7 +3198,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
u64 *sptep = NULL;
uint retry_count = 0;
- if (!page_fault_can_be_fast(error_code))
+ if (!page_fault_can_be_fast(fault))
return ret;
walk_shadow_page_lockless_begin(vcpu);
@@ -3210,9 +3207,9 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
u64 new_spte;
if (is_tdp_mmu(vcpu->arch.mmu))
- sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, gpa, &spte);
+ sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
else
- sptep = fast_pf_get_last_sptep(vcpu, gpa, &spte);
+ sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
if (!is_shadow_present_pte(spte))
break;
@@ -3231,7 +3228,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
* Need not check the access of upper level table entries since
* they are always ACC_ALL.
*/
- if (is_access_allowed(error_code, spte)) {
+ if (is_access_allowed(fault, spte)) {
ret = RET_PF_SPURIOUS;
break;
}
@@ -3246,7 +3243,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
* be removed in the fast path only if the SPTE was
* write-protected for dirty-logging or access tracking.
*/
- if ((error_code & PFERR_WRITE_MASK) &&
+ if (fault->write &&
spte_can_locklessly_be_made_writable(spte)) {
new_spte |= PT_WRITABLE_MASK;
@@ -3267,7 +3264,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
/* Verify that the fault can be handled in the fast path */
if (new_spte == spte ||
- !is_access_allowed(error_code, new_spte))
+ !is_access_allowed(fault, new_spte))
break;
/*
@@ -3288,7 +3285,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
} while (true);
- trace_fast_page_fault(vcpu, gpa, error_code, sptep, spte, ret);
+ trace_fast_page_fault(vcpu, fault->addr, fault->error_code, sptep, spte, ret);
walk_shadow_page_lockless_end(vcpu);
return ret;
@@ -3946,18 +3943,16 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- gpa_t gpa = fault->addr;
- u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
unsigned long mmu_seq;
int r;
- fault->gfn = gpa >> PAGE_SHIFT;
+ fault->gfn = fault->addr >> PAGE_SHIFT;
if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;
- r = fast_page_fault(vcpu, gpa, error_code);
+ r = fast_page_fault(vcpu, fault);
if (r != RET_PF_INVALID)
return r;
--
2.27.0
Pass struct kvm_page_fault to __direct_map() instead of
extracting the arguments from the struct.
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6821d05c0557..c84e978d76b0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2982,34 +2982,29 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
}
}
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault, bool is_tdp)
+static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
int level, req_level, ret;
- gfn_t gfn = gpa >> PAGE_SHIFT;
- gfn_t base_gfn = gfn;
+ gfn_t base_gfn = fault->gfn;
- level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);
- trace_kvm_mmu_spte_requested(gpa, level, pfn);
- for_each_shadow_entry(vcpu, gpa, it) {
+ trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, gfn, it.level,
- &pfn, &level);
+ disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
+ &fault->pfn, &level);
- base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+ base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
break;
@@ -3021,14 +3016,14 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
it.level - 1, true, ACC_ALL);
link_shadow_page(vcpu, it.sptep, sp);
- if (is_tdp && huge_page_disallowed &&
+ if (fault->is_tdp && huge_page_disallowed &&
req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- write, level, base_gfn, pfn, prefault,
- map_writable);
+ fault->write, level, base_gfn, fault->pfn,
+ fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
@@ -3996,8 +3991,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
fault->pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
- fault->pfn, fault->prefault, fault->is_tdp);
+ r = __direct_map(vcpu, fault);
out_unlock:
if (is_tdp_mmu_fault)
--
2.27.0
Pass struct kvm_page_fault to tracepoints instead of extracting the
arguments from the struct. This also lets the kvm_mmu_spte_requested
tracepoint pick the gfn directly from fault->gfn, instead of using
the address.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/mmu/mmutrace.h | 18 +++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7491dc685842..5ba0a844f576 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2986,7 +2986,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
kvm_mmu_hugepage_adjust(vcpu, fault);
- trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);
for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
@@ -3276,7 +3276,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
} while (true);
- trace_fast_page_fault(vcpu, fault->addr, fault->error_code, sptep, spte, ret);
+ trace_fast_page_fault(vcpu, fault, sptep, spte, ret);
walk_shadow_page_lockless_end(vcpu);
return ret;
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2924a4081a19..b8151bbca36a 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -252,9 +252,9 @@ TRACE_EVENT(
TRACE_EVENT(
fast_page_fault,
- TP_PROTO(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 error_code,
+ TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
u64 *sptep, u64 old_spte, int ret),
- TP_ARGS(vcpu, cr2_or_gpa, error_code, sptep, old_spte, ret),
+ TP_ARGS(vcpu, fault, sptep, old_spte, ret),
TP_STRUCT__entry(
__field(int, vcpu_id)
@@ -268,8 +268,8 @@ TRACE_EVENT(
TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
- __entry->cr2_or_gpa = cr2_or_gpa;
- __entry->error_code = error_code;
+ __entry->cr2_or_gpa = fault->addr;
+ __entry->error_code = fault->error_code;
__entry->sptep = sptep;
__entry->old_spte = old_spte;
__entry->new_spte = *sptep;
@@ -367,8 +367,8 @@ TRACE_EVENT(
TRACE_EVENT(
kvm_mmu_spte_requested,
- TP_PROTO(gpa_t addr, int level, kvm_pfn_t pfn),
- TP_ARGS(addr, level, pfn),
+ TP_PROTO(struct kvm_page_fault *fault),
+ TP_ARGS(fault),
TP_STRUCT__entry(
__field(u64, gfn)
@@ -377,9 +377,9 @@ TRACE_EVENT(
),
TP_fast_assign(
- __entry->gfn = addr >> PAGE_SHIFT;
- __entry->pfn = pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
- __entry->level = level;
+ __entry->gfn = fault->gfn;
+ __entry->pfn = fault->pfn | (fault->gfn & (KVM_PAGES_PER_HPAGE(fault->goal_level) - 1));
+ __entry->level = fault->goal_level;
),
TP_printk("gfn %llx pfn %llx level %d",
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4a263f4511a5..6bc0dbc0baff 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -730,7 +730,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
kvm_mmu_hugepage_adjust(vcpu, fault);
- trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);
for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
clear_sp_write_flooding_count(it.sptep);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 737af596adaf..3bf85a8c7d15 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -995,7 +995,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
kvm_mmu_hugepage_adjust(vcpu, fault);
- trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);
rcu_read_lock();
--
2.27.0
Pass struct kvm_page_fault to handle_abnormal_pfn() instead of
extracting the individual arguments at the call sites. The gva used for
MMIO caching is now computed inside the function: 0 for TDP faults,
where fault->addr is a GPA rather than a GVA.
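Condensed from the hunk below, the noslot/MMIO path inside the function
becomes:

	if (unlikely(is_noslot_pfn(fault->pfn))) {
		gva_t gva = fault->is_tdp ? 0 : fault->addr;

		vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
				     access & shadow_mmio_access_mask);
		/* MMIO emulation continues as before. */
	}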
Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 18 +++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c2d2d019634b..6821d05c0557 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3060,18 +3060,19 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
return -EFAULT;
}
-static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
- kvm_pfn_t pfn, unsigned int access,
- int *ret_val)
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ unsigned int access, int *ret_val)
{
/* The pfn is invalid, report the error! */
- if (unlikely(is_error_pfn(pfn))) {
- *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+ if (unlikely(is_error_pfn(fault->pfn))) {
+ *ret_val = kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
return true;
}
- if (unlikely(is_noslot_pfn(pfn))) {
- vcpu_cache_mmio_info(vcpu, gva, gfn,
+ if (unlikely(is_noslot_pfn(fault->pfn))) {
+ gva_t gva = fault->is_tdp ? 0 : fault->addr;
+
+ vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
access & shadow_mmio_access_mask);
/*
* If MMIO caching is disabled, emulate immediately without
@@ -3975,8 +3976,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (kvm_faultin_pfn(vcpu, fault, &r))
return r;
- if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
- fault->gfn, fault->pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r))
return r;
r = RET_PF_RETRY;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 72f0b415be63..0fa7a678b907 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -893,7 +893,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (kvm_faultin_pfn(vcpu, fault, &r))
return r;
- if (handle_abnormal_pfn(vcpu, addr, fault->gfn, fault->pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r))
return r;
/*
--
2.27.0
From: Sean Christopherson <[email protected]>
WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE. The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next(). In other words, the walk is guaranteed to
stop if and only if it reaches the target level, precisely because
non-leaf SPTEs are installed along the way to keep the walk valid.
Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.
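As a condensed sketch (base_gfn computation, hugepage adjustment and NX
hugepage accounting elided; see the hunks below and in the earlier
__direct_map patch for the real code), the walk ends up as:

	for_each_shadow_entry(vcpu, fault->addr, it) {
		if (it.level == fault->goal_level)
			break;

		if (!is_shadow_present_pte(*it.sptep)) {
			/* Install a non-leaf SPTE in the loop body... */
			sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr,
					      it.level - 1, true, ACC_ALL);
			link_shadow_page(vcpu, it.sptep, sp);
		}
		/* ...which __shadow_walk_next() consumes to descend a level. */
	}

	/* The walk can only stop at the target level; bail out if it did not. */
	if (WARN_ON_ONCE(it.level != fault->goal_level))
		return -EFAULT;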
Reviewed-by: Lai Jiangshan <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5ba0a844f576..2ddbabad5bd2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3012,6 +3012,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
account_huge_nx_page(vcpu->kvm, sp);
}
+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
fault->write, fault->goal_level, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 6bc0dbc0baff..7a8a2d14a3c7 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -760,9 +760,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
}
}
+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
- it.level, base_gfn, fault->pfn, fault->prefault,
- fault->map_writable);
+ fault->goal_level, base_gfn, fault->pfn,
+ fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
--
2.27.0
Since the two callers of set_spte do different things with the results,
inlining it actually makes the code simpler to reason about. For example,
mmu_set_spte looks quite like tdp_mmu_map_handle_target_level, but the
similarity is hidden by set_spte.
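Condensed from the hunk below, the inlined sequence in mmu_set_spte()
boils down to:

	make_spte_ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep,
				  speculative, true, host_writable,
				  sp_ad_disabled(sp), &spte);

	if (*sptep == spte) {
		ret = RET_PF_SPURIOUS;		/* nothing changed */
	} else {
		trace_kvm_mmu_set_spte(level, gfn, sptep);
		flush |= mmu_spte_update(sptep, spte);
	}

	/*
	 * tdp_mmu_map_handle_target_level() has the same shape, except that
	 * it compares against iter->old_spte and installs the new value
	 * with tdp_mmu_set_spte_atomic() instead of mmu_spte_update().
	 */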
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d63fe7b10bd1..6ba7c60bd4f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2700,10 +2700,12 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
gfn_t gfn, kvm_pfn_t pfn, bool speculative,
bool host_writable)
{
+ struct kvm_mmu_page *sp = sptep_to_sp(sptep);
int was_rmapped = 0;
- int set_spte_ret;
int ret = RET_PF_FIXED;
bool flush = false;
+ int make_spte_ret;
+ u64 spte;
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
*sptep, write_fault, gfn);
@@ -2734,30 +2736,29 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
was_rmapped = 1;
}
- set_spte_ret = set_spte(vcpu, sptep, pte_access, level, gfn, pfn,
- speculative, true, host_writable);
- if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
+ make_spte_ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
+ true, host_writable, sp_ad_disabled(sp), &spte);
+
+ if (*sptep == spte) {
+ ret = RET_PF_SPURIOUS;
+ } else {
+ trace_kvm_mmu_set_spte(level, gfn, sptep);
+ flush |= mmu_spte_update(sptep, spte);
+ }
+
+ if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
if (write_fault)
ret = RET_PF_EMULATE;
}
- if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
+ if (flush)
kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
KVM_PAGES_PER_HPAGE(level));
- /*
- * The fault is fully spurious if and only if the new SPTE and old SPTE
- * are identical, and emulation is not required.
- */
- if ((set_spte_ret & SET_SPTE_SPURIOUS) && ret == RET_PF_FIXED) {
- WARN_ON_ONCE(!was_rmapped);
- return RET_PF_SPURIOUS;
- }
-
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
- trace_kvm_mmu_set_spte(level, gfn, sptep);
if (!was_rmapped) {
+ WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
kvm_update_page_stats(vcpu->kvm, level, 1);
rmap_add(vcpu, sptep, gfn);
}
--
2.27.0
From: David Matlack <[email protected]>
Now that kvm_page_fault has a pointer to the memslot, it can be passed
down to the page tracking code to avoid a redundant slot lookup.
No functional change intended.
Signed-off-by: David Matlack <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 2 ++
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/mmu/page_track.c | 20 +++++++++++++-------
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 6a5f3acf2b33..9cd9230e5cc8 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -61,6 +61,8 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
enum kvm_page_track_mode mode);
bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
enum kvm_page_track_mode mode);
+bool kvm_slot_page_track_is_active(struct kvm_memory_slot *slot, gfn_t gfn,
+ enum kvm_page_track_mode mode);
void
kvm_page_track_register_notifier(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 754578458cb7..d63fe7b10bd1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3819,7 +3819,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
* guest is writing the page which is write tracked which can
* not be fixed by page fault handler.
*/
- if (kvm_page_track_is_active(vcpu, fault->gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_slot_page_track_is_active(fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
return true;
return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 21427e84a82e..859800f7bb95 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -136,19 +136,14 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
}
EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
-/*
- * check if the corresponding access on the specified guest page is tracked.
- */
-bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
- enum kvm_page_track_mode mode)
+bool kvm_slot_page_track_is_active(struct kvm_memory_slot *slot, gfn_t gfn,
+ enum kvm_page_track_mode mode)
{
- struct kvm_memory_slot *slot;
int index;
if (WARN_ON(!page_track_mode_is_valid(mode)))
return false;
- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
if (!slot)
return false;
@@ -156,6 +151,17 @@ bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
}
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
+bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
+ enum kvm_page_track_mode mode)
+{
+ struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+
+ return kvm_slot_page_track_is_active(slot, gfn, mode);
+}
+
void kvm_page_track_cleanup(struct kvm *kvm)
{
struct kvm_page_track_notifier_head *head;
--
2.27.0
From: David Matlack <[email protected]>
Avoid the memslot lookup in rmap_add by passing the slot down from the
fault handling code to mmu_set_spte and then to rmap_add.
No functional change intended.
Signed-off-by: David Matlack <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 27 +++++++--------------------
arch/x86/kvm/mmu/paging_tmpl.h | 12 +++++++++---
2 files changed, 16 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4b304f60cf44..7ff2c6c896a8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1625,16 +1625,15 @@ static bool kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
#define RMAP_RECYCLE_THRESHOLD 1000
-static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
+static void rmap_add(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+ u64 *spte, gfn_t gfn)
{
- struct kvm_memory_slot *slot;
struct kvm_mmu_page *sp;
struct kvm_rmap_head *rmap_head;
int rmap_count;
sp = sptep_to_sp(spte);
kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
rmap_count = pte_list_add(vcpu, spte, rmap_head);
@@ -2674,8 +2673,8 @@ int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync,
return 0;
}
-static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
- unsigned int pte_access, gfn_t gfn,
+static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+ u64 *sptep, unsigned int pte_access, gfn_t gfn,
kvm_pfn_t pfn, struct kvm_page_fault *fault)
{
struct kvm_mmu_page *sp = sptep_to_sp(sptep);
@@ -2744,24 +2743,12 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (!was_rmapped) {
WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
kvm_update_page_stats(vcpu->kvm, level, 1);
- rmap_add(vcpu, sptep, gfn);
+ rmap_add(vcpu, slot, sptep, gfn);
}
return ret;
}
-static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
- bool no_dirty_log)
-{
- struct kvm_memory_slot *slot;
-
- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
- if (!slot)
- return KVM_PFN_ERR_FAULT;
-
- return gfn_to_pfn_memslot_atomic(slot, gfn);
-}
-
static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp,
u64 *start, u64 *end)
@@ -2782,7 +2769,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
for (i = 0; i < ret; i++, gfn++, start++) {
- mmu_set_spte(vcpu, start, access, gfn,
+ mmu_set_spte(vcpu, slot, start, access, gfn,
page_to_pfn(pages[i]), NULL);
put_page(pages[i]);
}
@@ -2984,7 +2971,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
if (WARN_ON_ONCE(it.level != fault->goal_level))
return -EFAULT;
- ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
+ ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 8c07c42a4d73..44361f7e70c8 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -561,6 +561,7 @@ static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, pt_element_t gpte, bool no_dirty_log)
{
+ struct kvm_memory_slot *slot;
unsigned pte_access;
gfn_t gfn;
kvm_pfn_t pfn;
@@ -573,12 +574,17 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
gfn = gpte_to_gfn(gpte);
pte_access = sp->role.access & FNAME(gpte_access)(gpte);
FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
- pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
+
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
no_dirty_log && (pte_access & ACC_WRITE_MASK));
+ if (!slot)
+ return false;
+
+ pfn = gfn_to_pfn_memslot_atomic(slot, gfn);
if (is_error_pfn(pfn))
return false;
- mmu_set_spte(vcpu, spte, pte_access, gfn, pfn, NULL);
+ mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL);
kvm_release_pfn_clean(pfn);
return true;
}
@@ -757,7 +763,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (WARN_ON_ONCE(it.level != fault->goal_level))
return -EFAULT;
- ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access,
+ ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
--
2.27.0
From: David Matlack <[email protected]>
Consolidate rmap_recycle and rmap_add into a single function since they
are only ever called together (and only from one place). This has the
nice side effect of eliminating an extra kvm_vcpu_gfn_to_memslot(). In
addition, it makes mmu_set_spte(), which is a very long function, a
little shorter.
No functional change intended.
Signed-off-by: David Matlack <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 40 ++++++++++++++--------------------------
1 file changed, 14 insertions(+), 26 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2ddbabad5bd2..8b6dc276935f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1071,20 +1071,6 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
return kvm_mmu_memory_cache_nr_free_objects(mc);
}
-static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
-{
- struct kvm_memory_slot *slot;
- struct kvm_mmu_page *sp;
- struct kvm_rmap_head *rmap_head;
-
- sp = sptep_to_sp(spte);
- kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
- return pte_list_add(vcpu, spte, rmap_head);
-}
-
static void rmap_remove(struct kvm *kvm, u64 *spte)
{
struct kvm_memslots *slots;
@@ -1097,9 +1083,9 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
/*
- * Unlike rmap_add and rmap_recycle, rmap_remove does not run in the
- * context of a vCPU so have to determine which memslots to use based
- * on context information in sp->role.
+ * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
+ * so we have to determine which memslots to use based on context
+ * information in sp->role.
*/
slots = kvm_memslots_for_spte_role(kvm, sp->role);
@@ -1639,19 +1625,24 @@ static bool kvm_test_age_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
#define RMAP_RECYCLE_THRESHOLD 1000
-static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
+static void rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
{
struct kvm_memory_slot *slot;
- struct kvm_rmap_head *rmap_head;
struct kvm_mmu_page *sp;
+ struct kvm_rmap_head *rmap_head;
+ int rmap_count;
sp = sptep_to_sp(spte);
+ kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+ rmap_count = pte_list_add(vcpu, spte, rmap_head);
- kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, __pte(0));
- kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
- KVM_PAGES_PER_HPAGE(sp->role.level));
+ if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
+ kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, __pte(0));
+ kvm_flush_remote_tlbs_with_address(
+ vcpu->kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+ }
}
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -2713,7 +2704,6 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
bool host_writable)
{
int was_rmapped = 0;
- int rmap_count;
int set_spte_ret;
int ret = RET_PF_FIXED;
bool flush = false;
@@ -2772,9 +2762,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (!was_rmapped) {
kvm_update_page_stats(vcpu->kvm, level, 1);
- rmap_count = rmap_add(vcpu, sptep, gfn);
- if (rmap_count > RMAP_RECYCLE_THRESHOLD)
- rmap_recycle(vcpu, sptep, gfn);
+ rmap_add(vcpu, sptep, gfn);
}
return ret;
--
2.27.0
On Fri, Sep 24, 2021 at 12:31:52PM -0400, Paolo Bonzini wrote:
> Pass the old SPTE in the same variable that receives the new SPTE. This
> reduces the number of arguments from 11 to 10.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 18 ++++++++----------
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> arch/x86/kvm/mmu/spte.c | 7 ++++---
> arch/x86/kvm/mmu/spte.h | 2 +-
> arch/x86/kvm/mmu/tdp_mmu.c | 9 +++++----
> 5 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 91292009780a..b363433bcd2c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2682,8 +2682,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> int was_rmapped = 0;
> int ret = RET_PF_FIXED;
> bool flush = false;
> + u64 spte = *sptep;
> bool wrprot;
> - u64 spte;
>
> /* Prefetching always gets a writable pfn. */
> bool host_writable = !fault || fault->map_writable;
> @@ -2691,35 +2691,33 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
> bool write_fault = fault && fault->write;
>
> pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
> - *sptep, write_fault, gfn);
> + spte, write_fault, gfn);
>
> if (unlikely(is_noslot_pfn(pfn))) {
> mark_mmio_spte(vcpu, sptep, gfn, pte_access);
> return RET_PF_EMULATE;
> }
>
> - if (is_shadow_present_pte(*sptep)) {
> + if (is_shadow_present_pte(spte)) {
> /*
> * If we overwrite a PTE page pointer with a 2MB PMD, unlink
> * the parent of the now unreachable PTE.
> */
> - if (level > PG_LEVEL_4K && !is_large_pte(*sptep)) {
> + if (level > PG_LEVEL_4K && !is_large_pte(spte)) {
> struct kvm_mmu_page *child;
> - u64 pte = *sptep;
> -
> - child = to_shadow_page(pte & PT64_BASE_ADDR_MASK);
> + child = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> drop_parent_pte(child, sptep);
> flush = true;
> - } else if (pfn != spte_to_pfn(*sptep)) {
> + } else if (pfn != spte_to_pfn(spte)) {
> pgprintk("hfn old %llx new %llx\n",
> - spte_to_pfn(*sptep), pfn);
> + spte_to_pfn(spte), pfn);
> drop_spte(vcpu->kvm, sptep);
> flush = true;
> } else
> was_rmapped = 1;
> }
>
> - wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, speculative,
> + wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, speculative,
> true, host_writable, &spte);
>
> if (*sptep == spte) {
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index d8889e02c4b7..88551cfd06c6 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -1130,7 +1130,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> host_writable = spte & shadow_host_writable_mask;
> slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> make_spte(vcpu, sp, slot, pte_access, gfn,
> - spte_to_pfn(spte), spte, true, false,
> + spte_to_pfn(spte), true, false,
> host_writable, &spte);
>
> flush |= mmu_spte_update(sptep, spte);
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 871f6114b0fa..91525388032e 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -92,10 +92,11 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
> bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> struct kvm_memory_slot *slot,
> unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
> - u64 old_spte, bool speculative, bool can_unsync,
> - bool host_writable, u64 *new_spte)
> + bool speculative, bool can_unsync,
> + bool host_writable, u64 *sptep)
I'd prefer a different name since `sptep` has specific meaning
throughout the mmu code. (It's the address of the spte in the page
table.)
Case in point, I was going to suggest we can get rid of struct
kvm_mmu_page since it can be derived from the sptep and then realized
how wrong that was :).
Instead of receiving the new spte as a parameter, what do you think about
changing make_spte to return the new spte? I think that would make the
code more readable (but won't reduce the number of arguments because
you'd have to add wrprot).
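Roughly (a hypothetical sketch of that suggestion, not something this
series implements), the prototype would become:

	/*
	 * Hypothetical alternative: return the new SPTE and report write
	 * protection through an out parameter instead.
	 */
	u64 make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
		      struct kvm_memory_slot *slot,
		      unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
		      u64 old_spte, bool speculative, bool can_unsync,
		      bool host_writable, bool *wrprot);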
> {
> int level = sp->role.level;
> + u64 old_spte = *sptep;
> u64 spte = SPTE_MMU_PRESENT_MASK;
> bool wrprot = false;
>
> @@ -187,7 +188,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
> }
>
> - *new_spte = spte;
> + *sptep = spte;
> return wrprot;
> }
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 7c0b09461349..231531c6015a 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -337,7 +337,7 @@ static inline u64 get_mmio_spte_generation(u64 spte)
> bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> struct kvm_memory_slot *slot,
> unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
> - u64 old_spte, bool speculative, bool can_unsync,
> + bool speculative, bool can_unsync,
> bool host_writable, u64 *new_spte);
> u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
> u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 953f24ded6bc..29b739c7bba4 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -903,13 +903,14 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
> bool wrprot = false;
>
> WARN_ON(sp->role.level != fault->goal_level);
> - if (unlikely(!fault->slot))
> + if (unlikely(!fault->slot)) {
> new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
> - else
> + } else {
> + new_spte = iter->old_spte;
> wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
> - fault->pfn, iter->old_spte, fault->prefault, true,
> + fault->pfn, fault->prefault, true,
> fault->map_writable, &new_spte);
> -
> + }
> if (new_spte == iter->old_spte)
> ret = RET_PF_SPURIOUS;
> else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
> --
> 2.27.0
>
On Fri, Sep 24, 2021 at 12:31:23PM -0400, Paolo Bonzini wrote:
> Create a single structure for arguments that are passed from
> kvm_mmu_do_page_fault to the page fault handlers. Later
> the structure will grow to include various output parameters
> that are passed back to the next steps in the page fault
> handling.
>
> Suggested-by: Isaku Yamahata <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu.h | 34 +++++++++++++++++++++++++++++++---
> 1 file changed, 31 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index e9688a9f7b57..0553ef92946e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -114,17 +114,45 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu->shadow_root_level);
> }
>
> +struct kvm_page_fault {
> + /* arguments to kvm_mmu_do_page_fault. */
> + const gpa_t addr;
> + const u32 error_code;
> + const bool prefault;
This is somewhat tangential to your change but... I notice KVM uses
"prefetch" and "prefault" interchangeably. If we changed prefault to
prefetch here and in kvm_mmu_do_page_fault then that would make the
naming consistent throughout KVM ("prefetch" for everything).
> +
> + /* Derived from error_code. */
> + const bool exec;
> + const bool write;
> + const bool present;
> + const bool rsvd;
> + const bool user;
> +
> + /* Derived from mmu. */
> + const bool is_tdp;
> +};
> +
> int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> bool prefault);
>
> static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> u32 err, bool prefault)
> {
> + struct kvm_page_fault fault = {
> + .addr = cr2_or_gpa,
> + .error_code = err,
> + .exec = err & PFERR_FETCH_MASK,
> + .write = err & PFERR_WRITE_MASK,
> + .present = err & PFERR_PRESENT_MASK,
> + .rsvd = err & PFERR_RSVD_MASK,
> + .user = err & PFERR_USER_MASK,
> + .prefault = prefault,
> + .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
> + };
> #ifdef CONFIG_RETPOLINE
> - if (likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault))
> - return kvm_tdp_page_fault(vcpu, cr2_or_gpa, err, prefault);
> + if (fault.is_tdp)
> + return kvm_tdp_page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
> #endif
> - return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
> + return vcpu->arch.mmu->page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
> }
>
> /*
> --
> 2.27.0
>
>
On Fri, Sep 24, 2021 at 12:31:21PM -0400, Paolo Bonzini wrote:
> The current kvm page fault handlers passes around many arguments to the
> functions. To simplify those arguments and local variables, introduce
> a data structure, struct kvm_page_fault, to hold those arguments and
> variables. struct kvm_page_fault is allocated on stack on the caller
> of kvm fault handler, kvm_mmu_do_page_fault(), and passed around.
>
> Later in the series, my patches are interleaved with David's work to
> add the memory slot to the struct and avoid repeated lookups. Along the
> way you will find some cleanups of functions with a ludicrous number of
> arguments, so that they use struct kvm_page_fault as much as possible
> or at least receive related information from a single argument. make_spte
> in particular goes from 11 to 10 arguments (yeah I know) despite gaining
> two for kvm_mmu_page and kvm_memory_slot.
>
> This can be sometimes a bit debatable (for example struct kvm_mmu_page
> is used a little more on the TDP MMU paths), but overall I think the
> result is an improvement. For example the SET_SPTE_* constants go
> away, and they absolutely didn't belong in the TDP MMU. But if you
> disagree with some of the changes, please speak up loudly!
Thanks for getting this cleaned up and sent out. The series looks good
overall. I had 2 small comments but otherwise:
Reviewed-by: David Matlack <[email protected]>
>
> Testing: survives kvm-unit-tests on Intel with all of ept=0, ept=1
> tdp_mmu=0, ept=1. Will do more before committing to it in kvm/next of
> course.
>
> Paolo
>
> David Matlack (5):
> KVM: x86/mmu: Fold rmap_recycle into rmap_add
> KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault
> KVM: x86/mmu: Avoid memslot lookup in page_fault_handle_page_track
> KVM: x86/mmu: Avoid memslot lookup in rmap_add
> KVM: x86/mmu: Avoid memslot lookup in make_spte and
> mmu_try_to_unsync_pages
>
> Paolo Bonzini (25):
> KVM: MMU: pass unadulterated gpa to direct_page_fault
> KVM: MMU: Introduce struct kvm_page_fault
> KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault
> KVM: MMU: change direct_page_fault() arguments to kvm_page_fault
> KVM: MMU: change page_fault_handle_page_track() arguments to
> kvm_page_fault
> KVM: MMU: change kvm_faultin_pfn() arguments to kvm_page_fault
> KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault
> KVM: MMU: change __direct_map() arguments to kvm_page_fault
> KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault
> KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault
> KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to
> kvm_page_fault
> KVM: MMU: change fast_page_fault() arguments to kvm_page_fault
> KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault
> KVM: MMU: change disallowed_hugepage_adjust() arguments to
> kvm_page_fault
> KVM: MMU: change tracepoints arguments to kvm_page_fault
> KVM: MMU: mark page dirty in make_spte
> KVM: MMU: unify tdp_mmu_map_set_spte_atomic and
> tdp_mmu_set_spte_atomic_no_dirty_log
> KVM: MMU: inline set_spte in mmu_set_spte
> KVM: MMU: inline set_spte in FNAME(sync_page)
> KVM: MMU: clean up make_spte return value
> KVM: MMU: remove unnecessary argument to mmu_set_spte
> KVM: MMU: set ad_disabled in TDP MMU role
> KVM: MMU: pass kvm_mmu_page struct to make_spte
> KVM: MMU: pass struct kvm_page_fault to mmu_set_spte
> KVM: MMU: make spte an in-out argument in make_spte
>
> Sean Christopherson (1):
> KVM: x86/mmu: Verify shadow walk doesn't terminate early in page
> faults
>
> arch/x86/include/asm/kvm_host.h | 4 +-
> arch/x86/include/asm/kvm_page_track.h | 4 +-
> arch/x86/kvm/mmu.h | 84 +++++-
> arch/x86/kvm/mmu/mmu.c | 408 +++++++++++---------------
> arch/x86/kvm/mmu/mmu_internal.h | 22 +-
> arch/x86/kvm/mmu/mmutrace.h | 18 +-
> arch/x86/kvm/mmu/page_track.c | 6 +-
> arch/x86/kvm/mmu/paging_tmpl.h | 137 +++++----
> arch/x86/kvm/mmu/spte.c | 29 +-
> arch/x86/kvm/mmu/spte.h | 14 +-
> arch/x86/kvm/mmu/tdp_mmu.c | 123 +++-----
> arch/x86/kvm/mmu/tdp_mmu.h | 4 +-
> 12 files changed, 390 insertions(+), 463 deletions(-)
>
> --
> 2.27.0
>
On 29/09/21 01:25, David Matlack wrote:
>> +struct kvm_page_fault {
>> + /* arguments to kvm_mmu_do_page_fault. */
>> + const gpa_t addr;
>> + const u32 error_code;
>> + const bool prefault;
> This is somewhat tangential to your change but... I notice KVM uses
> "prefetch" and "prefault" interchangeably. If we changed prefault to
> prefetch here and in kvm_mmu_do_page_fault then that would make the
> naming consistent throughout KVM ("prefetch" for everything).
>
Sounds good.
Paolo
On 29/09/21 01:20, David Matlack wrote:
>> bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>> struct kvm_memory_slot *slot,
>> unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
>> - u64 old_spte, bool speculative, bool can_unsync,
>> - bool host_writable, u64 *new_spte)
>> + bool speculative, bool can_unsync,
>> + bool host_writable, u64 *sptep)
> I'd prefer a different name since `sptep` has specific meaning
> throughout the mmu code. (It's the address of the spte in the page
> table.)
>
> Case in point, I was going to suggest we can get rid of struct
> kvm_mmu_page since it can be derived from the sptep and then realized
> how wrong that was :).
>
> Instead of receiving the new spte as a parameter what do you think about
> changing make_spte to return the new spte? I think that would make the
> code more readable (but won't reduce the number of arguments because
> you'd have to add wrprot).
>
You have a point. I've dropped this patch for now.
Paolo