2021-08-07 13:51:04

by Paolo Bonzini

Subject: [PATCH 00/16] KVM: x86: pass arguments on the page fault path via struct kvm_page_fault

This is a revival of Isaku's patches from
https://lore.kernel.org/kvm/[email protected]/.
The current KVM page fault handlers pass many arguments from function
to function. To simplify those argument lists and local variables,
introduce a data structure, struct kvm_page_fault, to hold them.
struct kvm_page_fault is allocated on the stack by the caller of the
fault handlers, kvm_mmu_do_page_fault(), and passed down the call chain.
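
To make the shape of the change concrete, here is a minimal sketch of
the before/after calling convention (names simplified; the real struct
and conversions follow in the individual patches):

    /* Before: every function on the fault path forwards the same scalars. */
    int handler(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err, bool prefault);

    /* After: the arguments live in one struct on the caller's stack. */
    struct kvm_page_fault fault = {
            .addr       = cr2_or_gpa,
            .error_code = err,
            .prefault   = prefault,
    };
    return vcpu->arch.mmu->page_fault(vcpu, &fault);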

The patches were redone from scratch based on the suggested struct layout
from the review (https://lore.kernel.org/kvm/[email protected]/)
and the subjects of Isaku's patches, so I kept authorship for myself
and gave him a "Suggested-by" tag.

The first two steps are unrelated cleanups that come in handy later on.

Paolo

Paolo Bonzini (16):
KVM: MMU: pass unadulterated gpa to direct_page_fault
KVM: x86: clamp host mapping level to max_level in
kvm_mmu_max_mapping_level
KVM: MMU: Introduce struct kvm_page_fault
KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault
KVM: MMU: change direct_page_fault() arguments to kvm_page_fault
KVM: MMU: change page_fault_handle_page_track() arguments to
kvm_page_fault
KVM: MMU: change try_async_pf() arguments to kvm_page_fault
KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault
KVM: MMU: change __direct_map() arguments to kvm_page_fault
KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault
KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault
KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to
kvm_page_fault
KVM: MMU: change fast_page_fault() arguments to kvm_page_fault
KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault
KVM: MMU: change disallowed_hugepage_adjust() arguments to
kvm_page_fault
KVM: MMU: change tracepoints arguments to kvm_page_fault

arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/mmu.h | 81 ++++++++++-
arch/x86/kvm/mmu/mmu.c | 241 ++++++++++++++------------------
arch/x86/kvm/mmu/mmu_internal.h | 13 +-
arch/x86/kvm/mmu/mmutrace.h | 18 +--
arch/x86/kvm/mmu/paging_tmpl.h | 96 ++++++-------
arch/x86/kvm/mmu/tdp_mmu.c | 49 +++----
arch/x86/kvm/mmu/tdp_mmu.h | 4 +-
8 files changed, 253 insertions(+), 253 deletions(-)

--
2.27.0


2021-08-07 13:51:04

by Paolo Bonzini

Subject: [PATCH 03/16] KVM: MMU: Introduce struct kvm_page_fault

Create a single structure for the arguments that are passed from
kvm_mmu_do_page_fault() to the page fault handlers. Later the
structure will also grow to include various output parameters that
are handed on to subsequent steps of page fault handling.
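
The initializer added below decodes the architectural page-fault error
code once, into const bools. A standalone, compilable sketch of that
decoding (the PFERR_* values reproduce the x86 bit layout that KVM
uses; the sample error code is made up):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* x86 page-fault error code bits, as used by KVM. */
    #define PFERR_PRESENT_MASK (1U << 0)
    #define PFERR_WRITE_MASK   (1U << 1)
    #define PFERR_USER_MASK    (1U << 2)
    #define PFERR_RSVD_MASK    (1U << 3)
    #define PFERR_FETCH_MASK   (1U << 4)

    int main(void)
    {
            uint32_t err = PFERR_PRESENT_MASK | PFERR_WRITE_MASK; /* sample */
            bool exec    = err & PFERR_FETCH_MASK;
            bool write   = err & PFERR_WRITE_MASK;
            bool present = err & PFERR_PRESENT_MASK;

            printf("exec=%d write=%d present=%d\n", exec, write, present);
            return 0;
    }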

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 83e6c6965f1e..5c06e059e483 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -114,17 +114,45 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
vcpu->arch.mmu->shadow_root_level);
}

+struct kvm_page_fault {
+ /* arguments to kvm_mmu_do_page_fault. */
+ const gpa_t addr;
+ const u32 error_code;
+ const bool prefault;
+
+ /* Derived from error_code. */
+ const bool exec;
+ const bool write;
+ const bool present;
+ const bool rsvd;
+ const bool user;
+
+ /* Derived from mmu. */
+ const bool is_tdp;
+};
+
int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
bool prefault);

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
{
+ struct kvm_page_fault fault = {
+ .addr = cr2_or_gpa,
+ .error_code = err,
+ .exec = err & PFERR_FETCH_MASK,
+ .write = err & PFERR_WRITE_MASK,
+ .present = err & PFERR_PRESENT_MASK,
+ .rsvd = err & PFERR_RSVD_MASK,
+ .user = err & PFERR_USER_MASK,
+ .prefault = prefault,
+ .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ };
#ifdef CONFIG_RETPOLINE
- if (likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault))
- return kvm_tdp_page_fault(vcpu, cr2_or_gpa, err, prefault);
+ if (fault.is_tdp)
+ return kvm_tdp_page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
#endif
- return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
+ return vcpu->arch.mmu->page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
}

/*
--
2.27.0


2021-08-07 13:51:06

by Paolo Bonzini

Subject: [PATCH 04/16] KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault

Pass struct kvm_page_fault to mmu->page_fault() instead of
extracting the arguments from the struct. FNAME(page_fault) can use
the precomputed bools from the error code.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++--
arch/x86/kvm/mmu.h | 7 +++----
arch/x86/kvm/mmu/mmu.c | 15 ++++++++-------
arch/x86/kvm/mmu/paging_tmpl.h | 22 +++++++++++-----------
4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6a73ff7db5f9..3399470a44a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -401,6 +401,7 @@ struct kvm_mmu_root_info {
#define KVM_HAVE_MMU_RWLOCK

struct kvm_mmu_page;
+struct kvm_page_fault;

/*
* x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
@@ -410,8 +411,7 @@ struct kvm_mmu_page;
struct kvm_mmu {
unsigned long (*get_guest_pgd)(struct kvm_vcpu *vcpu);
u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
- int (*page_fault)(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err,
- bool prefault);
+ int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5c06e059e483..bbe5fe57c2af 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -131,8 +131,7 @@ struct kvm_page_fault {
const bool is_tdp;
};

-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault);
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
@@ -150,9 +149,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
- return kvm_tdp_page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
+ return kvm_tdp_page_fault(vcpu, &fault);
#endif
- return vcpu->arch.mmu->page_fault(vcpu, fault.addr, fault.error_code, fault.prefault);
+ return vcpu->arch.mmu->page_fault(vcpu, &fault);
}

/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5d4de39fe5a9..bb3a2c2aa62e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3941,13 +3941,14 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
return r;
}

-static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
- u32 error_code, bool prefault)
+static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
{
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);

/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, gpa, error_code, prefault,
+ return direct_page_fault(vcpu, fault->addr,
+ fault->error_code, fault->prefault,
PG_LEVEL_2M, false);
}

@@ -3984,10 +3985,10 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
}
EXPORT_SYMBOL_GPL(kvm_handle_page_fault);

-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault)
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
int max_level;
+ gpa_t gpa = fault->addr;

for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
max_level > PG_LEVEL_4K;
@@ -3999,8 +4000,8 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
break;
}

- return direct_page_fault(vcpu, gpa, error_code, prefault,
- max_level, true);
+ return direct_page_fault(vcpu, gpa, fault->error_code,
+ fault->prefault, max_level, true);
}

static void nonpaging_init_context(struct kvm_mmu *context)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ee044d357b5f..916a8106d0f4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -822,11 +822,10 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
* Returns: 1 if we need to emulate the instruction, 0 otherwise, or
* a negative value on error.
*/
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
- bool prefault)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool write_fault = error_code & PFERR_WRITE_MASK;
- bool user_fault = error_code & PFERR_USER_MASK;
+ gpa_t addr = fault->addr;
+ u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
kvm_pfn_t pfn;
@@ -836,6 +835,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
int max_level;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
+ WARN_ON_ONCE(fault->is_tdp);

/*
* If PFEC.RSVD is set, this is a shadow page fault.
@@ -853,7 +853,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
*/
if (!r) {
pgprintk("%s: guest page fault\n", __func__);
- if (!prefault)
+ if (!fault->prefault)
kvm_inject_emulated_page_fault(vcpu, &walker.fault);

return RET_PF_RETRY;
@@ -871,7 +871,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
vcpu->arch.write_fault_to_shadow_pgtable = false;

is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
- &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
+ &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);

if (is_self_change_mapping)
max_level = PG_LEVEL_4K;
@@ -881,8 +881,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
- write_fault, &map_writable))
+ if (try_async_pf(vcpu, fault->prefault, walker.gfn, addr, &pfn, &hva,
+ fault->write, &map_writable))
return RET_PF_RETRY;

if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
@@ -892,8 +892,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
* Do not change pte_access if the pfn is a mmio page, otherwise
* we will cache the incorrect access into mmio spte.
*/
- if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
- !is_cr0_wp(vcpu->arch.mmu) && !user_fault && !is_noslot_pfn(pfn)) {
+ if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
+ !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(pfn)) {
walker.pte_access |= ACC_WRITE_MASK;
walker.pte_access &= ~ACC_USER_MASK;

@@ -917,7 +917,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
if (r)
goto out_unlock;
r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
- map_writable, prefault);
+ map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);

out_unlock:
--
2.27.0


2021-08-07 13:51:24

by Paolo Bonzini

Subject: [PATCH 01/16] KVM: MMU: pass unadulterated gpa to direct_page_fault

Do not bother removing the low bits of the gpa. This masking dates
back to the very first commit of KVM, but it is unnecessary, and even
problematic: the gpa is later used to fill in the MMIO page cache, so
masking it off would drop the page offset that the cache (filled via
vcpu_cache_mmio_info() from handle_abnormal_pfn()) wants to record.
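
For illustration, a small standalone program showing which bits the
old masking threw away (PAGE_SHIFT of 12 as with 4 KiB pages on x86;
the gpa value is made up):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define PAGE_MASK  (~((1ULL << PAGE_SHIFT) - 1))

    int main(void)
    {
            uint64_t gpa = 0x1234abcULL; /* hypothetical faulting gpa */

            /* The low 12 bits (0xabc here) are exactly the page offset
             * that the MMIO page cache would want to remember. */
            printf("gpa=%#llx masked=%#llx\n",
                   (unsigned long long)gpa,
                   (unsigned long long)(gpa & PAGE_MASK));
            return 0;
    }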

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 964c797dcc46..7477f340d318 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3950,7 +3950,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);

/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
+ return direct_page_fault(vcpu, gpa, error_code, prefault,
PG_LEVEL_2M, false);
}

--
2.27.0


2021-08-07 13:51:28

by Paolo Bonzini

Subject: [PATCH 06/16] KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page_fault

Add a gfn field to struct kvm_page_fault, corresponding to the last
argument of page_fault_handle_page_track(). The field is initialized
in the callers, and page_fault_handle_page_track() receives the
struct instead of separate arguments extracted from it.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 18 +++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++---
3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 158263847bef..c3d404155ceb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -132,6 +132,9 @@ struct kvm_page_fault {

/* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
u8 max_level;
+
+ /* Shifted addr, or result of guest page table walk if addr is a gva. */
+ gfn_t gfn;
};

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e93ee0ebe5ff..3af49678d4f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3788,20 +3788,19 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
}

static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
- u32 error_code, gfn_t gfn)
+ struct kvm_page_fault *fault)
{
- if (unlikely(error_code & PFERR_RSVD_MASK))
+ if (unlikely(fault->rsvd))
return false;

- if (!(error_code & PFERR_PRESENT_MASK) ||
- !(error_code & PFERR_WRITE_MASK))
+ if (!fault->present || !fault->write)
return false;

/*
* guest is writing the page which is write tracked which can
* not be fixed by page fault handler.
*/
- if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_page_track_is_active(vcpu, fault->gfn, KVM_PAGE_TRACK_WRITE))
return true;

return false;
@@ -3885,13 +3884,13 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
bool map_writable;

- gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
kvm_pfn_t pfn;
hva_t hva;
int r;

- if (page_fault_handle_page_track(vcpu, error_code, gfn))
+ fault->gfn = gpa >> PAGE_SHIFT;
+ if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;

r = fast_page_fault(vcpu, gpa, error_code);
@@ -3905,11 +3904,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, fault->prefault, gfn, gpa, &pfn, &hva,
+ if (try_async_pf(vcpu, fault->prefault, fault->gfn, gpa, &pfn, &hva,
fault->write, &map_writable))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
+ fault->gfn, pfn, ACC_ALL, &r))
return r;

r = RET_PF_RETRY;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5352323646b0..2ef219bf1745 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -858,7 +858,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
return RET_PF_RETRY;
}

- if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+ fault->gfn = walker.gfn;
+ if (page_fault_handle_page_track(vcpu, fault)) {
shadow_page_table_clear_flood(vcpu, addr);
return RET_PF_EMULATE;
}
@@ -880,11 +881,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, fault->prefault, walker.gfn, addr, &pfn, &hva,
+ if (try_async_pf(vcpu, fault->prefault, fault->gfn, addr, &pfn, &hva,
fault->write, &map_writable))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, addr, fault->gfn, pfn, walker.pte_access, &r))
return r;

/*
--
2.27.0


2021-08-07 13:51:45

by Paolo Bonzini

Subject: [PATCH 07/16] KVM: MMU: change try_async_pf() arguments to kvm_page_fault

Add fields to struct kvm_page_fault corresponding to the outputs of
try_async_pf(). For now the values still have to be fetched back out
of struct kvm_page_fault by the subsequent steps, but this is
temporary until the other functions in the chain are switched over as
well.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 5 ++++
arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++++------------------
arch/x86/kvm/mmu/paging_tmpl.h | 19 ++++++-------
3 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c3d404155ceb..2ff2b340a206 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -135,6 +135,11 @@ struct kvm_page_fault {

/* Shifted addr, or result of guest page table walk if addr is a gva. */
gfn_t gfn;
+
+ /* Outputs of try_async_pf. */
+ kvm_pfn_t pfn;
+ hva_t hva;
+ bool map_writable;
};

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3af49678d4f8..a6366f1c4197 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3834,11 +3834,9 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
}

-static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
- gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
- bool write, bool *writable)
+static bool try_async_pf(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
bool async;

/*
@@ -3851,29 +3849,31 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,

/* Don't expose private memslots to L2. */
if (is_guest_mode(vcpu) && !kvm_is_visible_memslot(slot)) {
- *pfn = KVM_PFN_NOSLOT;
- *writable = false;
+ fault->pfn = KVM_PFN_NOSLOT;
+ fault->map_writable = false;
return false;
}

async = false;
- *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
- write, writable, hva);
+ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
+ fault->write, &fault->map_writable,
+ &fault->hva);
if (!async)
return false; /* *pfn has correct page already */

- if (!prefault && kvm_can_do_async_pf(vcpu)) {
- trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
- if (kvm_find_async_pf_gfn(vcpu, gfn)) {
- trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
+ if (!fault->prefault && kvm_can_do_async_pf(vcpu)) {
+ trace_kvm_try_async_get_page(fault->addr, fault->gfn);
+ if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) {
+ trace_kvm_async_pf_doublefault(fault->addr, fault->gfn);
kvm_make_request(KVM_REQ_APF_HALT, vcpu);
return true;
- } else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
+ } else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn))
return true;
}

- *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
- write, writable, hva);
+ fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
+ fault->write, &fault->map_writable,
+ &fault->hva);
return false;
}

@@ -3882,11 +3882,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
gpa_t gpa = fault->addr;
u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
- bool map_writable;

unsigned long mmu_seq;
- kvm_pfn_t pfn;
- hva_t hva;
int r;

fault->gfn = gpa >> PAGE_SHIFT;
@@ -3904,12 +3901,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, fault->prefault, fault->gfn, gpa, &pfn, &hva,
- fault->write, &map_writable))
+ if (try_async_pf(vcpu, fault))
return RET_PF_RETRY;

if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
- fault->gfn, pfn, ACC_ALL, &r))
+ fault->gfn, fault->pfn, ACC_ALL, &r))
return r;

r = RET_PF_RETRY;
@@ -3919,25 +3915,25 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
else
write_lock(&vcpu->kvm->mmu_lock);

- if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+ if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;

if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, fault->max_level,
- pfn, fault->prefault);
+ r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
+ fault->pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, map_writable, fault->max_level, pfn,
- fault->prefault, fault->is_tdp);
+ r = __direct_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
+ fault->pfn, fault->prefault, fault->is_tdp);

out_unlock:
if (is_tdp_mmu_fault)
read_unlock(&vcpu->kvm->mmu_lock);
else
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_pfn_clean(fault->pfn);
return r;
}

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 2ef219bf1745..6ead472674ad 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -828,10 +828,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
- kvm_pfn_t pfn;
- hva_t hva;
unsigned long mmu_seq;
- bool map_writable, is_self_change_mapping;
+ bool is_self_change_mapping;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
WARN_ON_ONCE(fault->is_tdp);
@@ -881,11 +879,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, fault->prefault, fault->gfn, addr, &pfn, &hva,
- fault->write, &map_writable))
+ if (try_async_pf(vcpu, fault))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, addr, fault->gfn, pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, addr, fault->gfn, fault->pfn, walker.pte_access, &r))
return r;

/*
@@ -893,7 +890,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* we will cache the incorrect access into mmio spte.
*/
if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
- !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(pfn)) {
+ !is_cr0_wp(vcpu->arch.mmu) && !fault->user && !is_noslot_pfn(fault->pfn)) {
walker.pte_access |= ACC_WRITE_MASK;
walker.pte_access &= ~ACC_USER_MASK;

@@ -909,20 +906,20 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault

r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
- if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+ if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
goto out_unlock;

kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, pfn,
- map_writable, fault->prefault);
+ r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, fault->pfn,
+ fault->map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);

out_unlock:
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(pfn);
+ kvm_release_pfn_clean(fault->pfn);
return r;
}

--
2.27.0


2021-08-07 13:51:48

by Paolo Bonzini

Subject: [PATCH 05/16] KVM: MMU: change direct_page_fault() arguments to kvm_page_fault

Add fields to struct kvm_page_fault corresponding to the arguments
of direct_page_fault(). The fields are initialized in the callers,
and direct_page_fault() receives the struct instead of separate
arguments extracted from it. As part of this, kvm_tdp_page_fault()
now lowers fault->max_level in place instead of computing a local
max_level.

Also adjust FNAME(page_fault) to store the max_level in
struct kvm_page_fault, to keep it similar to the direct
map path.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 5 ++++
arch/x86/kvm/mmu/mmu.c | 43 +++++++++++++++-------------------
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++---
3 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bbe5fe57c2af..158263847bef 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -129,6 +129,9 @@ struct kvm_page_fault {

/* Derived from mmu. */
const bool is_tdp;
+
+ /* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
+ u8 max_level;
};

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
@@ -146,6 +149,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefault = prefault,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+
+ .max_level = KVM_MAX_HUGEPAGE_LEVEL,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bb3a2c2aa62e..e93ee0ebe5ff 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3878,11 +3878,11 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return false;
}

-static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- bool prefault, int max_level, bool is_tdp)
+static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
+ gpa_t gpa = fault->addr;
+ u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
- bool write = error_code & PFERR_WRITE_MASK;
bool map_writable;

gfn_t gfn = gpa >> PAGE_SHIFT;
@@ -3905,11 +3905,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

- if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva,
- write, &map_writable))
+ if (try_async_pf(vcpu, fault->prefault, gfn, gpa, &pfn, &hva,
+ fault->write, &map_writable))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
return r;

r = RET_PF_RETRY;
@@ -3926,11 +3926,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
goto out_unlock;

if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
- pfn, prefault);
+ r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, fault->max_level,
+ pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
- prefault, is_tdp);
+ r = __direct_map(vcpu, gpa, error_code, map_writable, fault->max_level, pfn,
+ fault->prefault, fault->is_tdp);

out_unlock:
if (is_tdp_mmu_fault)
@@ -3944,12 +3944,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
- pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
+ pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);

/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
- return direct_page_fault(vcpu, fault->addr,
- fault->error_code, fault->prefault,
- PG_LEVEL_2M, false);
+ fault->max_level = PG_LEVEL_2M;
+ return direct_page_fault(vcpu, fault);
}

int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
@@ -3987,21 +3986,17 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault);

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- int max_level;
- gpa_t gpa = fault->addr;
-
- for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
- max_level > PG_LEVEL_4K;
- max_level--) {
- int page_num = KVM_PAGES_PER_HPAGE(max_level);
- gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1);
+ while (fault->max_level > PG_LEVEL_4K) {
+ int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
+ gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);

if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
break;
+
+ --fault->max_level;
}

- return direct_page_fault(vcpu, gpa, fault->error_code,
- fault->prefault, max_level, true);
+ return direct_page_fault(vcpu, fault);
}

static void nonpaging_init_context(struct kvm_mmu *context)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 916a8106d0f4..5352323646b0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -832,7 +832,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
hva_t hva;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
- int max_level;

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
WARN_ON_ONCE(fault->is_tdp);
@@ -874,9 +873,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
&walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);

if (is_self_change_mapping)
- max_level = PG_LEVEL_4K;
+ fault->max_level = PG_LEVEL_4K;
else
- max_level = walker.level;
+ fault->max_level = walker.level;

mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
@@ -916,7 +915,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
+ r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, pfn,
map_writable, fault->prefault);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);

--
2.27.0


2021-08-07 13:51:53

by Paolo Bonzini

Subject: [PATCH 08/16] KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault

Pass struct kvm_page_fault to handle_abnormal_pfn() instead of
extracting the arguments from the struct. The choice of the gva to
cache (0 for TDP faults, fault->addr otherwise) moves from the caller
into handle_abnormal_pfn() itself.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 17 ++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a6366f1c4197..cec59ac2e1cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3024,18 +3024,18 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
return -EFAULT;
}

-static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
- kvm_pfn_t pfn, unsigned int access,
- int *ret_val)
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ unsigned int access, int *ret_val)
{
/* The pfn is invalid, report the error! */
- if (unlikely(is_error_pfn(pfn))) {
- *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+ if (unlikely(is_error_pfn(fault->pfn))) {
+ *ret_val = kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
return true;
}

- if (unlikely(is_noslot_pfn(pfn))) {
- vcpu_cache_mmio_info(vcpu, gva, gfn,
+ if (unlikely(is_noslot_pfn(fault->pfn))) {
+ gva_t gva = fault->is_tdp ? 0 : fault->addr;
+ vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
access & shadow_mmio_access_mask);
/*
* If MMIO caching is disabled, emulate immediately without
@@ -3904,8 +3904,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (try_async_pf(vcpu, fault))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
- fault->gfn, fault->pfn, ACC_ALL, &r))
+ if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r))
return r;

r = RET_PF_RETRY;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 6ead472674ad..9b90097dea22 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -882,7 +882,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (try_async_pf(vcpu, fault))
return RET_PF_RETRY;

- if (handle_abnormal_pfn(vcpu, addr, fault->gfn, fault->pfn, walker.pte_access, &r))
+ if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r))
return r;

/*
--
2.27.0


2021-08-07 13:51:53

by Paolo Bonzini

Subject: [PATCH 15/16] KVM: MMU: change disallowed_hugepage_adjust() arguments to kvm_page_fault

Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of
extracting the arguments from the struct. Tweak the conditions a bit
to avoid long lines.
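
As a worked example of the pfn patching done here (standalone sketch;
the KVM_PAGES_PER_HPAGE() definition reproduces the x86 one, and the
gfn value is made up):

    #include <stdint.h>
    #include <stdio.h>

    /* On x86, each page-table level covers 9 more address bits. */
    #define KVM_PAGES_PER_HPAGE(level) (1ULL << (((level) - 1) * 9))

    int main(void)
    {
            int cur_level = 2;          /* PG_LEVEL_2M */
            uint64_t gfn = 0x12345;     /* hypothetical */
            uint64_t page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
                                 KVM_PAGES_PER_HPAGE(cur_level - 1);

            /* 512 - 1 = 0x1ff: the 9 bits that select a 4K page inside
             * the 2M region are folded back into the pfn. */
            printf("page_mask=%#llx patched bits=%#llx\n",
                   (unsigned long long)page_mask,
                   (unsigned long long)(gfn & page_mask));
            return 0;
    }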

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 19 ++++++++-----------
arch/x86/kvm/mmu/mmu_internal.h | 3 +--
arch/x86/kvm/mmu/paging_tmpl.h | 3 +--
arch/x86/kvm/mmu/tdp_mmu.c | 3 +--
4 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6df7da9d1d77..a41325f452f4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2921,12 +2921,10 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
fault->pfn &= ~mask;
}

-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, u8 *goal_levelp)
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
{
- int level = *goal_levelp;
-
- if (cur_level == level && level > PG_LEVEL_4K &&
+ if (cur_level > PG_LEVEL_4K &&
+ cur_level == fault->goal_level &&
is_shadow_present_pte(spte) &&
!is_large_pte(spte)) {
/*
@@ -2936,10 +2934,10 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
* patching back for them into pfn the next 9 bits of
* the address.
*/
- u64 page_mask = KVM_PAGES_PER_HPAGE(level) -
- KVM_PAGES_PER_HPAGE(level - 1);
- *pfnp |= gfn & page_mask;
- (*goal_levelp)--;
+ u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+ KVM_PAGES_PER_HPAGE(cur_level - 1);
+ fault->pfn |= fault->gfn & page_mask;
+ fault->goal_level--;
}
}

@@ -2959,8 +2957,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* large page, as the leaf could be executable.
*/
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, *it.sptep, it.level);

base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == fault->goal_level)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f84e0e90e442..c80d89242c45 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -155,8 +155,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t pfn, int max_level);
void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, u8 *goal_levelp);
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);

void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index add1b41b0f1a..3f82f469abdf 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -729,8 +729,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* large page, as the leaf could be executable.
*/
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, *it.sptep, it.level);

base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == fault->goal_level)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f0a9009dff96..803da0334933 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1010,8 +1010,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
if (fault->nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
- iter.level, &fault->pfn, &fault->goal_level);
+ disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);

if (iter.level == fault->goal_level)
break;
--
2.27.0


2021-08-07 13:52:07

by Paolo Bonzini

Subject: [PATCH 12/16] KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to kvm_page_fault

Pass struct kvm_page_fault to tdp_mmu_map_handle_target_level() instead of
extracting the arguments from the struct.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 340ae8a34fe3..d8b1735739c0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -937,21 +937,20 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
* Installs a last-level SPTE to handle a TDP page fault.
* (NPT/EPT violation/misconfiguration)
*/
-static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
- int map_writable,
- struct tdp_iter *iter,
- kvm_pfn_t pfn, bool prefault)
+static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault,
+ struct tdp_iter *iter)
{
u64 new_spte;
int ret = RET_PF_FIXED;
int make_spte_ret = 0;

- if (unlikely(is_noslot_pfn(pfn)))
+ if (unlikely(is_noslot_pfn(fault->pfn)))
new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
else
make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
- pfn, iter->old_spte, prefault, true,
- map_writable, !shadow_accessed_mask,
+ fault->pfn, iter->old_spte, fault->prefault, true,
+ fault->map_writable, !shadow_accessed_mask,
&new_spte);

if (new_spte == iter->old_spte)
@@ -965,7 +964,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
* the vCPU would have the same fault again.
*/
if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
- if (write)
+ if (fault->write)
ret = RET_PF_EMULATE;
kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
}
@@ -1073,8 +1072,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return RET_PF_RETRY;
}

- ret = tdp_mmu_map_handle_target_level(vcpu, fault->write, fault->map_writable, &iter,
- fault->pfn, fault->prefault);
+ ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
rcu_read_unlock();

return ret;
--
2.27.0


2021-08-07 13:52:16

by Paolo Bonzini

Subject: [PATCH 11/16] KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault

Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of
extracting the arguments from the struct.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/tdp_mmu.c | 23 +++++++++--------------
arch/x86/kvm/mmu/tdp_mmu.h | 4 +---
3 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 624277da4586..75635fc0720d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3916,8 +3916,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
goto out_unlock;

if (is_tdp_mmu_fault)
- r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
- fault->pfn, fault->prefault);
+ r = kvm_tdp_mmu_map(vcpu, fault);
else
r = __direct_map(vcpu, fault);

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index dab6cb46cdb2..340ae8a34fe3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -994,35 +994,30 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
* Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
* page tables and SPTEs to translate the faulting guest physical address.
*/
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault)
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu *mmu = vcpu->arch.mmu;
struct tdp_iter iter;
struct kvm_mmu_page *sp;
u64 *child_pt;
u64 new_spte;
int ret;
- gfn_t gfn = gpa >> PAGE_SHIFT;
int level;
int req_level;

- level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);

- trace_kvm_mmu_spte_requested(gpa, level, pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);

rcu_read_lock();

- tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
+ tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(iter.old_spte, gfn,
- iter.level, &pfn, &level);
+ disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
+ iter.level, &fault->pfn, &level);

if (iter.level == level)
break;
@@ -1078,8 +1073,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
return RET_PF_RETRY;
}

- ret = tdp_mmu_map_handle_target_level(vcpu, write, map_writable, &iter,
- pfn, prefault);
+ ret = tdp_mmu_map_handle_target_level(vcpu, fault->write, fault->map_writable, &iter,
+ fault->pfn, fault->prefault);
rcu_read_unlock();

return ret;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index b224d126adf9..330d43d09e97 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -51,9 +51,7 @@ void kvm_tdp_mmu_zap_all(struct kvm *kvm);
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);

-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault);
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);

bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
bool flush);
--
2.27.0


2021-08-07 13:52:51

by Paolo Bonzini

Subject: [PATCH 10/16] KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault

Pass struct kvm_page_fault to FNAME(fetch)() instead of
extracting the arguments from the struct.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 54 +++++++++++++++-------------------
1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 9b90097dea22..261100d813af 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -663,21 +663,18 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
* If the guest tries to write a write-protected page, we need to
* emulate this operation, return 1 to indicate this case.
*/
-static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
- struct guest_walker *gw, u32 error_code,
- int max_level, kvm_pfn_t pfn, bool map_writable,
- bool prefault)
+static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ struct guest_walker *gw)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write_fault = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned int direct_access, access;
int top_level, level, req_level, ret;
- gfn_t base_gfn = gw->gfn;
+ gfn_t base_gfn = fault->gfn;

+ WARN_ON_ONCE(gw->gfn != base_gfn);
direct_access = gw->pte_access;

top_level = vcpu->arch.mmu->root_level;
@@ -695,7 +692,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
goto out_gpte_changed;

- for (shadow_walk_init(&it, vcpu, addr);
+ for (shadow_walk_init(&it, vcpu, fault->addr);
shadow_walk_okay(&it) && it.level > gw->level;
shadow_walk_next(&it)) {
gfn_t table_gfn;
@@ -707,8 +704,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
access = gw->pt_access[it.level - 2];
- sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
- false, access);
+ sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
+ it.level-1, false, access);
}

/*
@@ -722,10 +719,10 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
link_shadow_page(vcpu, it.sptep, sp);
}

- level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);

- trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);

for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
clear_sp_write_flooding_count(it.sptep);
@@ -735,10 +732,10 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
* large page, as the leaf could be executable.
*/
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, gw->gfn, it.level,
- &pfn, &level);
+ disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
+ &fault->pfn, &level);

- base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+ base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
break;

@@ -747,7 +744,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
drop_large_spte(vcpu, it.sptep);

if (!is_shadow_present_pte(*it.sptep)) {
- sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
+ sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
it.level - 1, true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
if (huge_page_disallowed && req_level >= it.level)
@@ -755,8 +752,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
}
}

- ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
- it.level, base_gfn, pfn, prefault, map_writable);
+ ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
+ it.level, base_gfn, fault->pfn, fault->prefault,
+ fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;

@@ -824,26 +822,21 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
*/
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- gpa_t addr = fault->addr;
- u32 error_code = fault->error_code;
struct guest_walker walker;
int r;
unsigned long mmu_seq;
bool is_self_change_mapping;

- pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
+ pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
WARN_ON_ONCE(fault->is_tdp);

/*
+ * Look up the guest pte for the faulting address.
* If PFEC.RSVD is set, this is a shadow page fault.
* The bit needs to be cleared before walking guest page tables.
*/
- error_code &= ~PFERR_RSVD_MASK;
-
- /*
- * Look up the guest pte for the faulting address.
- */
- r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
+ r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
+ fault->error_code & ~PFERR_RSVD_MASK);

/*
* The page is not mapped by the guest. Let the guest handle it.
@@ -858,7 +851,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault

fault->gfn = walker.gfn;
if (page_fault_handle_page_track(vcpu, fault)) {
- shadow_page_table_clear_flood(vcpu, addr);
+ shadow_page_table_clear_flood(vcpu, fault->addr);
return RET_PF_EMULATE;
}

@@ -913,8 +906,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = make_mmu_pages_available(vcpu);
if (r)
goto out_unlock;
- r = FNAME(fetch)(vcpu, addr, &walker, error_code, fault->max_level, fault->pfn,
- fault->map_writable, fault->prefault);
+ r = FNAME(fetch)(vcpu, fault, &walker);
kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);

out_unlock:
--
2.27.0


2021-08-07 13:52:59

by Paolo Bonzini

Subject: [PATCH 16/16] KVM: MMU: change tracepoints arguments to kvm_page_fault

Pass struct kvm_page_fault to tracepoints instead of
extracting the arguments from the struct.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/mmu/mmutrace.h | 18 +++++++++---------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a41325f452f4..0c0061893ebe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2950,7 +2950,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

kvm_mmu_hugepage_adjust(vcpu, fault);

- trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);
for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
@@ -3242,7 +3242,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

} while (true);

- trace_fast_page_fault(vcpu, fault->addr, fault->error_code, sptep, spte, ret);
+ trace_fast_page_fault(vcpu, fault, sptep, spte, ret);
walk_shadow_page_lockless_end(vcpu);

return ret;
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2924a4081a19..0c37fb9d532e 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -252,9 +252,9 @@ TRACE_EVENT(

TRACE_EVENT(
fast_page_fault,
- TP_PROTO(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 error_code,
+ TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
u64 *sptep, u64 old_spte, int ret),
- TP_ARGS(vcpu, cr2_or_gpa, error_code, sptep, old_spte, ret),
+ TP_ARGS(vcpu, fault, sptep, old_spte, ret),

TP_STRUCT__entry(
__field(int, vcpu_id)
@@ -268,8 +268,8 @@ TRACE_EVENT(

TP_fast_assign(
__entry->vcpu_id = vcpu->vcpu_id;
- __entry->cr2_or_gpa = cr2_or_gpa;
- __entry->error_code = error_code;
+ __entry->cr2_or_gpa = fault->addr;
+ __entry->error_code = fault->error_code;
__entry->sptep = sptep;
__entry->old_spte = old_spte;
__entry->new_spte = *sptep;
@@ -367,8 +367,8 @@ TRACE_EVENT(

TRACE_EVENT(
kvm_mmu_spte_requested,
- TP_PROTO(gpa_t addr, int level, kvm_pfn_t pfn),
- TP_ARGS(addr, level, pfn),
+ TP_PROTO(struct kvm_page_fault *fault),
+ TP_ARGS(fault),

TP_STRUCT__entry(
__field(u64, gfn)
@@ -377,9 +377,9 @@ TRACE_EVENT(
),

TP_fast_assign(
- __entry->gfn = addr >> PAGE_SHIFT;
- __entry->pfn = pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
- __entry->level = level;
+ __entry->gfn = fault->addr >> PAGE_SHIFT;
+ __entry->pfn = fault->pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(fault->goal_level) - 1));
+ __entry->level = fault->goal_level;
),

TP_printk("gfn %llx pfn %llx level %d",
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 3f82f469abdf..1c182ef61c79 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -719,7 +719,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,

kvm_mmu_hugepage_adjust(vcpu, fault);

- trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);

for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
clear_sp_write_flooding_count(it.sptep);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 803da0334933..7f7de01cf665 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1004,7 +1004,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

kvm_mmu_hugepage_adjust(vcpu, fault);

- trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault);

rcu_read_lock();

--
2.27.0

2021-08-07 13:53:01

by Paolo Bonzini

Subject: [PATCH 13/16] KVM: MMU: change fast_page_fault() arguments to kvm_page_fault

Pass struct kvm_page_fault to fast_page_fault() instead of
extracting the arguments from the struct.
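
The page_fault_can_be_fast() rewrite below relies on a simple
equivalence: comparing against a two-bit mask is the same as ANDing
the two precomputed bools. A standalone check (PFERR_* values as in
the x86 error-code layout):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define PFERR_PRESENT_MASK (1U << 0)
    #define PFERR_FETCH_MASK   (1U << 4)

    int main(void)
    {
            for (uint32_t err = 0; err < 32; err++) {
                    bool old = (err & (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))
                               == (PFERR_FETCH_MASK | PFERR_PRESENT_MASK);
                    bool exec = err & PFERR_FETCH_MASK;
                    bool present = err & PFERR_PRESENT_MASK;

                    assert(old == (exec && present));
            }
            return 0;
    }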

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++----------------------
1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 75635fc0720d..cd43cf8d247e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3046,18 +3046,17 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fa
return false;
}

-static bool page_fault_can_be_fast(u32 error_code)
+static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
{
/*
* Do not fix the mmio spte with invalid generation number which
* need to be updated by slow page fault path.
*/
- if (unlikely(error_code & PFERR_RSVD_MASK))
+ if (fault->rsvd)
return false;

/* See if the page fault is due to an NX violation */
- if (unlikely(((error_code & (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))
- == (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))))
+ if (unlikely(fault->exec && fault->present))
return false;

/*
@@ -3074,9 +3073,7 @@ static bool page_fault_can_be_fast(u32 error_code)
* accesses to a present page.
*/

- return shadow_acc_track_mask != 0 ||
- ((error_code & (PFERR_WRITE_MASK | PFERR_PRESENT_MASK))
- == (PFERR_WRITE_MASK | PFERR_PRESENT_MASK));
+ return shadow_acc_track_mask != 0 || (fault->write && fault->present);
}

/*
@@ -3118,12 +3115,12 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return true;
}

-static bool is_access_allowed(u32 fault_err_code, u64 spte)
+static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
{
- if (fault_err_code & PFERR_FETCH_MASK)
+ if (fault->exec)
return is_executable_pte(spte);

- if (fault_err_code & PFERR_WRITE_MASK)
+ if (fault->write)
return is_writable_pte(spte);

/* Fault was on Read access */
@@ -3159,7 +3156,7 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
/*
* Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS.
*/
-static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
+static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_mmu_page *sp;
int ret = RET_PF_INVALID;
@@ -3167,7 +3164,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
u64 *sptep = NULL;
uint retry_count = 0;

- if (!page_fault_can_be_fast(error_code))
+ if (!page_fault_can_be_fast(fault))
return ret;

walk_shadow_page_lockless_begin(vcpu);
@@ -3176,9 +3173,9 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
u64 new_spte;

if (is_tdp_mmu(vcpu->arch.mmu))
- sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, gpa, &spte);
+ sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
else
- sptep = fast_pf_get_last_sptep(vcpu, gpa, &spte);
+ sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);

if (!is_shadow_present_pte(spte))
break;
@@ -3197,7 +3194,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
* Need not check the access of upper level table entries since
* they are always ACC_ALL.
*/
- if (is_access_allowed(error_code, spte)) {
+ if (is_access_allowed(fault, spte)) {
ret = RET_PF_SPURIOUS;
break;
}
@@ -3212,7 +3209,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
* be removed in the fast path only if the SPTE was
* write-protected for dirty-logging or access tracking.
*/
- if ((error_code & PFERR_WRITE_MASK) &&
+ if (fault->write &&
spte_can_locklessly_be_made_writable(spte)) {
new_spte |= PT_WRITABLE_MASK;

@@ -3233,7 +3230,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)

/* Verify that the fault can be handled in the fast path */
if (new_spte == spte ||
- !is_access_allowed(error_code, new_spte))
+ !is_access_allowed(fault, new_spte))
break;

/*
@@ -3254,7 +3251,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)

} while (true);

- trace_fast_page_fault(vcpu, gpa, error_code, sptep, spte, ret);
+ trace_fast_page_fault(vcpu, fault->addr, fault->error_code, sptep, spte, ret);
walk_shadow_page_lockless_end(vcpu);

return ret;
@@ -3874,18 +3871,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- gpa_t gpa = fault->addr;
- u32 error_code = fault->error_code;
bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);

unsigned long mmu_seq;
int r;

- fault->gfn = gpa >> PAGE_SHIFT;
+ fault->gfn = fault->addr >> PAGE_SHIFT;
if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;

- r = fast_page_fault(vcpu, gpa, error_code);
+ r = fast_page_fault(vcpu, fault);
if (r != RET_PF_INVALID)
return r;

--
2.27.0
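
For reference, the fault->rsvd/exec/write/present checks used above are
possible because the PFERR_* bits are decoded exactly once, when the
fault is constructed. A condensed sketch of the relevant designated
initializers in kvm_mmu_do_page_fault() (not a complete listing of the
struct setup):

	struct kvm_page_fault fault = {
		.addr = cr2_or_gpa,
		.error_code = err,
		/* each flag is derived once from the error code bits */
		.exec = err & PFERR_FETCH_MASK,
		.write = err & PFERR_WRITE_MASK,
		.present = err & PFERR_PRESENT_MASK,
		.rsvd = err & PFERR_RSVD_MASK,
		.user = err & PFERR_USER_MASK,
	};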


2021-08-07 13:53:06

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 09/16] KVM: MMU: change __direct_map() arguments to kvm_page_fault

Pass struct kvm_page_fault to __direct_map() instead of
having the caller extract the individual arguments from the struct.

Suggested-by: Isaku Yamahata <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cec59ac2e1cd..624277da4586 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2946,34 +2946,29 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
}
}

-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
- int map_writable, int max_level, kvm_pfn_t pfn,
- bool prefault, bool is_tdp)
+static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool write = error_code & PFERR_WRITE_MASK;
- bool exec = error_code & PFERR_FETCH_MASK;
- bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
+ bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
int level, req_level, ret;
- gfn_t gfn = gpa >> PAGE_SHIFT;
- gfn_t base_gfn = gfn;
+ gfn_t base_gfn = fault->gfn;

- level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
+ level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
huge_page_disallowed, &req_level);

- trace_kvm_mmu_spte_requested(gpa, level, pfn);
- for_each_shadow_entry(vcpu, gpa, it) {
+ trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
if (nx_huge_page_workaround_enabled)
- disallowed_hugepage_adjust(*it.sptep, gfn, it.level,
- &pfn, &level);
+ disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
+ &fault->pfn, &level);

- base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+ base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
if (it.level == level)
break;

@@ -2985,14 +2980,14 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
it.level - 1, true, ACC_ALL);

link_shadow_page(vcpu, it.sptep, sp);
- if (is_tdp && huge_page_disallowed &&
+ if (fault->is_tdp && huge_page_disallowed &&
req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}

ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- write, level, base_gfn, pfn, prefault,
- map_writable);
+ fault->write, level, base_gfn, fault->pfn,
+ fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;

@@ -3924,8 +3919,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
fault->pfn, fault->prefault);
else
- r = __direct_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
- fault->pfn, fault->prefault, fault->is_tdp);
+ r = __direct_map(vcpu, fault);

out_unlock:
if (is_tdp_mmu_fault)
--
2.27.0
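
A note on the base_gfn computation kept in the loop above: it is the
standard power-of-two alignment trick. A worked example with made-up
numbers, purely for illustration: at the 2MiB level,
KVM_PAGES_PER_HPAGE(it.level) is 512, so the expression clears the low
9 bits of the gfn:

	gfn_t gfn  = 0x12345;			/* faulting gfn, illustrative */
	gfn_t base = gfn & ~(512ULL - 1);	/* == 0x12200, 2MiB-aligned */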


2021-08-07 13:53:11

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 14/16] KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault

Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of
having the caller extract the individual arguments from the struct; the
results are also stored back into the struct, so the callers are
adjusted accordingly.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.h | 35 +++++++++++++++++--
arch/x86/kvm/mmu/mmu.c | 60 +++++++++++++++------------------
arch/x86/kvm/mmu/mmu_internal.h | 12 ++-----
arch/x86/kvm/mmu/paging_tmpl.h | 16 ++++-----
arch/x86/kvm/mmu/tdp_mmu.c | 21 +++++-------
5 files changed, 77 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2ff2b340a206..13c20af6fa07 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -127,12 +127,34 @@ struct kvm_page_fault {
const bool rsvd;
const bool user;

- /* Derived from mmu. */
+ /* Derived from mmu and global state. */
const bool is_tdp;
+ const bool nx_huge_page_workaround_enabled;

- /* Input to FNAME(fetch), __direct_map and kvm_tdp_mmu_map. */
+ /*
+ * Whether a >4KB mapping can be created or is forbidden due to NX
+ * hugepages.
+ */
+ bool huge_page_disallowed;
+
+ /*
+ * Maximum page size that can be created for this fault; input to
+ * FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
+ */
u8 max_level;

+ /*
+ * Page size that can be created based on the max_level and the
+ * page size used by the host mapping.
+ */
+ u8 req_level;
+
+ /*
+ * Page size that will be created based on the req_level and
+ * huge_page_disallowed.
+ */
+ u8 goal_level;
+
/* Shifted addr, or result of guest page table walk if addr is a gva. */
gfn_t gfn;

@@ -144,6 +166,12 @@ struct kvm_page_fault {

int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);

+extern int nx_huge_pages;
+static inline bool is_nx_huge_page_enabled(void)
+{
+ return READ_ONCE(nx_huge_pages);
+}
+
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefault)
{
@@ -157,8 +185,11 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefault = prefault,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+ .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),

.max_level = KVM_MAX_HUGEPAGE_LEVEL,
+ .req_level = PG_LEVEL_4K,
+ .goal_level = PG_LEVEL_4K,
};
#ifdef CONFIG_RETPOLINE
if (fault.is_tdp)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cd43cf8d247e..6df7da9d1d77 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2884,48 +2884,45 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
return min(host_level, max_level);
}

-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
- int max_level, kvm_pfn_t *pfnp,
- bool huge_page_disallowed, int *req_level)
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_memory_slot *slot;
- kvm_pfn_t pfn = *pfnp;
kvm_pfn_t mask;
- int level;

- *req_level = PG_LEVEL_4K;
+ fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;

- if (unlikely(max_level == PG_LEVEL_4K))
- return PG_LEVEL_4K;
+ if (unlikely(fault->max_level == PG_LEVEL_4K))
+ return;

- if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
- return PG_LEVEL_4K;
+ if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
+ return;

- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, fault->gfn, true);
if (!slot)
- return PG_LEVEL_4K;
+ return;

/*
* Enforce the iTLB multihit workaround after capturing the requested
* level, which will be used to do precise, accurate accounting.
*/
- *req_level = level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level);
- if (level == PG_LEVEL_4K || huge_page_disallowed)
- return PG_LEVEL_4K;
+ fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+ fault->gfn, fault->pfn,
+ fault->max_level);
+ if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+ return;

/*
* mmu_notifier_retry() was successful and mmu_lock is held, so
* the pmd can't be split from under us.
*/
- mask = KVM_PAGES_PER_HPAGE(level) - 1;
- VM_BUG_ON((gfn & mask) != (pfn & mask));
- *pfnp = pfn & ~mask;
-
- return level;
+ fault->goal_level = fault->req_level;
+ mask = KVM_PAGES_PER_HPAGE(fault->goal_level) - 1;
+ VM_BUG_ON((fault->gfn & mask) != (fault->pfn & mask));
+ fault->pfn &= ~mask;
}

void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, int *goal_levelp)
+ kvm_pfn_t *pfnp, u8 *goal_levelp)
{
int level = *goal_levelp;

@@ -2948,28 +2945,25 @@ void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,

static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_shadow_walk_iterator it;
struct kvm_mmu_page *sp;
- int level, req_level, ret;
+ int ret;
gfn_t base_gfn = fault->gfn;

- level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);

- trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);
for_each_shadow_entry(vcpu, fault->addr, it) {
/*
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &level);
+ &fault->pfn, &fault->goal_level);

base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
- if (it.level == level)
+ if (it.level == fault->goal_level)
break;

drop_large_spte(vcpu, it.sptep);
@@ -2980,13 +2974,13 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
it.level - 1, true, ACC_ALL);

link_shadow_page(vcpu, it.sptep, sp);
- if (fault->is_tdp && huge_page_disallowed &&
- req_level >= it.level)
+ if (fault->is_tdp && fault->huge_page_disallowed &&
+ fault->req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}

ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
- fault->write, level, base_gfn, fault->pfn,
+ fault->write, fault->goal_level, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ca7b7595bbfc..f84e0e90e442 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -116,12 +116,6 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
kvm_x86_ops.cpu_dirty_log_size;
}

-extern int nx_huge_pages;
-static inline bool is_nx_huge_page_enabled(void)
-{
- return READ_ONCE(nx_huge_pages);
-}
-
int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync);

void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
@@ -160,11 +154,9 @@ enum {
int kvm_mmu_max_mapping_level(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn,
kvm_pfn_t pfn, int max_level);
-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
- int max_level, kvm_pfn_t *pfnp,
- bool huge_page_disallowed, int *req_level);
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
- kvm_pfn_t *pfnp, int *goal_levelp);
+ kvm_pfn_t *pfnp, u8 *goal_levelp);

void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 261100d813af..add1b41b0f1a 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -666,12 +666,10 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
struct guest_walker *gw)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned int direct_access, access;
- int top_level, level, req_level, ret;
+ int top_level, ret;
gfn_t base_gfn = fault->gfn;

WARN_ON_ONCE(gw->gfn != base_gfn);
@@ -719,8 +717,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
link_shadow_page(vcpu, it.sptep, sp);
}

- level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);

trace_kvm_mmu_spte_requested(fault->addr, gw->level, fault->pfn);

@@ -731,12 +728,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* We cannot overwrite existing page tables with an NX
* large page, as the leaf could be executable.
*/
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(*it.sptep, fault->gfn, it.level,
- &fault->pfn, &level);
+ &fault->pfn, &fault->goal_level);

base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
- if (it.level == level)
+ if (it.level == fault->goal_level)
break;

validate_direct_spte(vcpu, it.sptep, direct_access);
@@ -747,7 +744,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
it.level - 1, true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
- if (huge_page_disallowed && req_level >= it.level)
+ if (fault->huge_page_disallowed &&
+ fault->req_level >= it.level)
account_huge_nx_page(vcpu->kvm, sp);
}
}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d8b1735739c0..f0a9009dff96 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -995,30 +995,25 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
*/
int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
- bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
- bool huge_page_disallowed = fault->exec && nx_huge_page_workaround_enabled;
struct kvm_mmu *mmu = vcpu->arch.mmu;
struct tdp_iter iter;
struct kvm_mmu_page *sp;
u64 *child_pt;
u64 new_spte;
int ret;
- int level;
- int req_level;

- level = kvm_mmu_hugepage_adjust(vcpu, fault->gfn, fault->max_level, &fault->pfn,
- huge_page_disallowed, &req_level);
+ kvm_mmu_hugepage_adjust(vcpu, fault);

- trace_kvm_mmu_spte_requested(fault->addr, level, fault->pfn);
+ trace_kvm_mmu_spte_requested(fault->addr, fault->goal_level, fault->pfn);

rcu_read_lock();

tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
- if (nx_huge_page_workaround_enabled)
+ if (fault->nx_huge_page_workaround_enabled)
disallowed_hugepage_adjust(iter.old_spte, fault->gfn,
- iter.level, &fault->pfn, &level);
+ iter.level, &fault->pfn, &fault->goal_level);

- if (iter.level == level)
+ if (iter.level == fault->goal_level)
break;

/*
@@ -1056,8 +1051,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

if (tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, &iter, new_spte)) {
tdp_mmu_link_page(vcpu->kvm, sp, true,
- huge_page_disallowed &&
- req_level >= iter.level);
+ fault->huge_page_disallowed &&
+ fault->req_level >= iter.level);

trace_kvm_mmu_get_page(sp, true);
} else {
@@ -1067,7 +1062,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
}
}

- if (iter.level != level) {
+ if (iter.level != fault->goal_level) {
rcu_read_unlock();
return RET_PF_RETRY;
}
--
2.27.0
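
Taken together, the three level fields added in this patch narrow
monotonically. A compressed sketch of the invariant established by
kvm_mmu_hugepage_adjust() (illustrative, not the function body):

	/*
	 * max_level:  input cap from the caller
	 * req_level:  max_level further clamped by the host mapping level
	 * goal_level: req_level, unless NX hugepages forbid a huge mapping
	 */
	fault->req_level  = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
						      fault->gfn, fault->pfn,
						      fault->max_level);
	fault->goal_level = fault->huge_page_disallowed ? PG_LEVEL_4K
							: fault->req_level;

Note that disallowed_hugepage_adjust() can still lower goal_level while
walking the page tables, which is why the walkers recheck it at every
level.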


2021-08-12 18:51:09

by David Matlack

[permalink] [raw]
Subject: Re: [PATCH 00/16] KVM: x86: pass arguments on the page fault path via struct kvm_page_fault

On Sat, Aug 7, 2021 at 6:49 AM Paolo Bonzini <[email protected]> wrote:
>
> This is a revival of Isaku's patches from
> https://lore.kernel.org/kvm/[email protected]/.
> The current kvm page fault handlers passes around many arguments to the
> functions. To simplify those arguments and local variables, introduce
> a data structure, struct kvm_page_fault, to hold those arguments and
> variables. struct kvm_page_fault is allocated on stack on the caller
> of kvm fault handler, kvm_mmu_do_page_fault(), and passed around.

(I was out of office for the past few days so I'm just getting around
to this series now.)

Overall it looks good. Thanks for getting it cleaned up and merged
into kvm/queue. I'll get Ben's memslot series applied on top of this,
do a bit of performance testing, and send it out probably tomorrow or
early next week.


2021-09-01 22:59:05

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 01/16] KVM: MMU: pass unadulterated gpa to direct_page_fault

On Sat, Aug 07, 2021, Paolo Bonzini wrote:
> Do not bother removing the low bits of the gpa. This masking dates back
> to the very first commit of KVM but it is unnecessary---or even
> problematic, because the gpa is later used to fill in the MMIO page cache.

I don't disagree with the code change, but I don't see how stripping the offset
can be problematic for the MMIO page cache. I assume you're referring to
handle_abnormal_pfn() -> vcpu_cache_mmio_info(). The "gva" is masked with
PAGE_MASK, i.e. the offset is stripped anyway. And fundamentally, that cache
is tied to the granularity of the memslots; tracking the offset would be wrong.
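
(Concretely, only the page-granular part of the address ever reaches the
cache — a sketch of the effect rather than the exact helper body:

	vcpu->arch.mmio_gva = gva & PAGE_MASK;	/* offset bits never survive */

so stripping the low bits in the caller is redundant either way.)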

2021-09-01 23:09:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 07/16] KVM: MMU: change try_async_pf() arguments to kvm_page_fault

On Sat, Aug 07, 2021, Paolo Bonzini wrote:
> @@ -3919,25 +3915,25 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> else
> write_lock(&vcpu->kvm->mmu_lock);
>
> - if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
> + if (!is_noslot_pfn(fault->pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
> goto out_unlock;
> r = make_mmu_pages_available(vcpu);
> if (r)
> goto out_unlock;
>
> if (is_tdp_mmu_fault)
> - r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, fault->max_level,
> - pfn, fault->prefault);
> + r = kvm_tdp_mmu_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
> + fault->pfn, fault->prefault);
> else
> - r = __direct_map(vcpu, gpa, error_code, map_writable, fault->max_level, pfn,
> - fault->prefault, fault->is_tdp);
> + r = __direct_map(vcpu, gpa, error_code, fault->map_writable, fault->max_level,
> + fault->pfn, fault->prefault, fault->is_tdp);

More unwanted spaces!

ERROR: code indent should use tabs where possible
#95: FILE: arch/x86/kvm/mmu/mmu.c:3951:
+^I^I^I^I fault->write, &fault->map_writable,$

ERROR: code indent should use tabs where possible
#96: FILE: arch/x86/kvm/mmu/mmu.c:3952:
+^I^I^I^I &fault->hva);$

ERROR: code indent should use tabs where possible
#123: FILE: arch/x86/kvm/mmu/mmu.c:3987:
+^I fault->gfn, fault->pfn, ACC_ALL, &r))$

ERROR: code indent should use tabs where possible
#147: FILE: arch/x86/kvm/mmu/mmu.c:4008:
+^I^I fault->pfn, fault->prefault, fault->is_tdp);$

total: 4 errors, 0 warnings, 180 lines checked

2021-09-01 23:17:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 08/16] KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault

On Sat, Aug 07, 2021, Paolo Bonzini wrote:
> Pass struct kvm_page_fault to handle_abnormal_pfn() instead of
> extracting the arguments from the struct.
>
> Suggested-by: Isaku Yamahata <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 17 ++++++++---------
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a6366f1c4197..cec59ac2e1cd 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3024,18 +3024,18 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> return -EFAULT;
> }
>
> -static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
> - kvm_pfn_t pfn, unsigned int access,
> - int *ret_val)
> +static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> + unsigned int access, int *ret_val)
> {
> /* The pfn is invalid, report the error! */
> - if (unlikely(is_error_pfn(pfn))) {
> - *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
> + if (unlikely(is_error_pfn(fault->pfn))) {
> + *ret_val = kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
> return true;
> }
>
> - if (unlikely(is_noslot_pfn(pfn))) {
> - vcpu_cache_mmio_info(vcpu, gva, gfn,
> + if (unlikely(is_noslot_pfn(fault->pfn))) {
> + gva_t gva = fault->is_tdp ? 0 : fault->addr;

Checkpatch wants a newline. I'm also surprised you didn't abuse bitwise math:

gva_t gva = fault->addr & ((u64)fault->is_tdp - 1);
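	/* is_tdp == true:  (u64)true - 1  == 0     => gva == 0           */
	/* is_tdp == false: (u64)false - 1 == ~0ull => gva == fault->addr */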

I am _not_ suggesting you actually do that ;-)

> + vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
> access & shadow_mmio_access_mask);
> /*
> * If MMIO caching is disabled, emulate immediately without

2021-09-01 23:50:46

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 06/16] KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page_fault

On Sat, Aug 07, 2021, Paolo Bonzini wrote:
> - if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
> + if (handle_abnormal_pfn(vcpu, fault->is_tdp ? 0 : gpa,
> + fault->gfn, pfn, ACC_ALL, &r))

Spaces!

ERROR: code indent should use tabs where possible
#90: FILE: arch/x86/kvm/mmu/mmu.c:3991:
+^I fault->gfn, pfn, ACC_ALL, &r))$

total: 1 errors, 0 warnings, 84 lines checked


> return r;
>
> r = RET_PF_RETRY;

2021-09-01 23:51:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 16/16] KVM: MMU: change tracepoints arguments to kvm_page_fault

On Sat, Aug 07, 2021, Paolo Bonzini wrote:
> @@ -377,9 +377,9 @@ TRACE_EVENT(
> ),
>
> TP_fast_assign(
> - __entry->gfn = addr >> PAGE_SHIFT;
> - __entry->pfn = pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
> - __entry->level = level;
> + __entry->gfn = fault->addr >> PAGE_SHIFT;

Eww. The existing code also bastardizes addr vs. gpa, but this just looks even
more wrong because we have fault->gfn.

Maybe do this as a prep patch at the beginning of the series? And then use
fault->gfn directly.

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 7d03e9b7ccfa..b159749300b5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -725,7 +725,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
huge_page_disallowed, &req_level);

- trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
+ trace_kvm_mmu_spte_requested(gw->gfn << PAGE_SHIFT, gw->level, pfn);

for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
clear_sp_write_flooding_count(it.sptep);

> + __entry->pfn = fault->pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(fault->goal_level) - 1));

Similar thing here, it could use fault->gfn directly.
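
i.e., with the prep patch in place (untested sketch):

	__entry->pfn = fault->pfn |
		       (fault->gfn & (KVM_PAGES_PER_HPAGE(fault->goal_level) - 1));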

> + __entry->level = fault->goal_level;
> ),
>
> TP_printk("gfn %llx pfn %llx level %d",