2023-02-16 15:40:17

by Lai Jiangshan

Subject: [PATCH V3 00/14] kvm: x86/mmu: Share the same code to invalidate each vTLB entry

From: Lai Jiangshan <[email protected]>

FNAME(invlpg) and FNAME(sync_page) invalidate vTLB entries, but in
slightly different ways.

Make them use the same method and share the same code.

Patch 1: Address a subtle bug reported by Sean Christopherson.
Patches 2-6: Add FNAME(sync_spte), driven by a common loop in mmu.c.
Patches 7-14: Refactor the code that uses FNAME(invlpg) and finally replace FNAME(invlpg) with FNAME(sync_spte).

Changes from V2:
Convert the address type and fix a subtle bug
Check the mmu->sync_page pointer before calling it
Fix the definition of KVM_MMU_ROOT_XXX

[V2]: https://lore.kernel.org/lkml/[email protected]/
[V1]: https://lore.kernel.org/lkml/[email protected]/

Lai Jiangshan (13):
KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bug
kvm: x86/mmu: Move the check in FNAME(sync_page) as
kvm_sync_page_check()
kvm: x86/mmu: Check mmu->sync_page pointer in kvm_sync_page_check()
kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging
kvm: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into
mmu.c
kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_spte)
kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_addr()
kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in kvm_mmu_invpcid_gva()
kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in
nested_ept_invalidate_addr()
kvm: x86/mmu: Allow the roots to be invalid in FNAME(invlpg)
kvm: x86/mmu: Remove FNAME(invlpg) and use FNAME(sync_spte) to update
vTLB instead.
kvm: x86/mmu: Remove @no_dirty_log from FNAME(prefetch_gpte)
kvm: x86/mmu: Skip calling mmu->sync_spte() when the spte is 0

Sean Christopherson (1):
KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()

arch/x86/include/asm/kvm_host.h | 17 ++-
arch/x86/kvm/mmu/mmu.c | 201 ++++++++++++++++++++----------
arch/x86/kvm/mmu/paging_tmpl.h | 209 +++++++++-----------------------
arch/x86/kvm/vmx/nested.c | 5 +-
arch/x86/kvm/x86.c | 4 +-
5 files changed, 205 insertions(+), 231 deletions(-)

--
2.19.1.6.gb485710b



2023-02-16 15:40:23

by Lai Jiangshan

Subject: [PATCH V3 01/14] KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bug

From: Lai Jiangshan <[email protected]>

FNAME(invlpg)() and kvm_mmu_invalidate_gva() take a gva_t, i.e. an
unsigned long, as the type of the address to invalidate. On 32-bit
kernels, the upper 32 bits of the GPA get dropped when an L2 GPA is
invalidated in the shadowed TDP MMU.

Convert the address type to u64 to fix the problem.
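
For reference, a minimal standalone illustration of the truncation (toy
userspace code, not part of the patch; the function names are hypothetical):

#include <stdint.h>
#include <stdio.h>

/* Old-style prototype: gva_t is unsigned long, i.e. 32 bits on a 32-bit build. */
static void invalidate_gva(unsigned long addr)
{
	printf("invalidate 0x%lx\n", addr);
}

/* New-style prototype: u64 keeps the full L2 GPA on every build. */
static void invalidate_addr(uint64_t addr)
{
	printf("invalidate 0x%llx\n", (unsigned long long)addr);
}

int main(void)
{
	uint64_t l2_gpa = 0x123456789ULL;	/* a GPA above 4 GiB */

	invalidate_gva(l2_gpa);		/* 32-bit build: prints 0x23456789, upper bits lost */
	invalidate_addr(l2_gpa);	/* prints 0x123456789 on every build */
	return 0;
}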

Reported-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 +++---
arch/x86/kvm/mmu/mmu.c | 16 ++++++++--------
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++---
arch/x86/kvm/x86.c | 4 ++--
4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..5466f4152c67 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -443,7 +443,7 @@ struct kvm_mmu {
struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
- void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
+ void (*invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
union kvm_mmu_page_role root_role;
@@ -2025,8 +2025,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
-void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa);
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ u64 addr, hpa_t root_hpa);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..91f8e1d1d4cc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5706,25 +5706,25 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

-void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa)
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ u64 addr, hpa_t root_hpa)
{
int i;

/* It's actually a GPA for vcpu->arch.guest_mmu. */
if (mmu != &vcpu->arch.guest_mmu) {
/* INVLPG on a non-canonical address is a NOP according to the SDM. */
- if (is_noncanonical_address(gva, vcpu))
+ if (is_noncanonical_address(addr, vcpu))
return;

- static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);
+ static_call(kvm_x86_flush_tlb_gva)(vcpu, addr);
}

if (!mmu->invlpg)
return;

if (root_hpa == INVALID_PAGE) {
- mmu->invlpg(vcpu, gva, mmu->root.hpa);
+ mmu->invlpg(vcpu, addr, mmu->root.hpa);

/*
* INVLPG is required to invalidate any global mappings for the VA,
@@ -5739,15 +5739,15 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
*/
for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
if (VALID_PAGE(mmu->prev_roots[i].hpa))
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
+ mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
} else {
- mmu->invlpg(vcpu, gva, root_hpa);
+ mmu->invlpg(vcpu, addr, root_hpa);
}
}

void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
- kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE);
+ kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE);
++vcpu->stat.invlpg;
}
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..c7b1de064be5 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -887,7 +887,8 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
}

-static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
+/* Note, @addr is a GPA when invlpg() invalidates an L2 GPA translation in shadowed TDP */
+static void FNAME(invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa)
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -895,7 +896,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
int level;
u64 *sptep;

- vcpu_clear_mmio_info(vcpu, gva);
+ vcpu_clear_mmio_info(vcpu, addr);

/*
* No need to check return value here, rmap_can_add() can
@@ -909,7 +910,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
}

write_lock(&vcpu->kvm->mmu_lock);
- for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
+ for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
level = iterator.level;
sptep = iterator.sptep;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..b9663623c128 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -798,8 +798,8 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
*/
if ((fault->error_code & PFERR_PRESENT_MASK) &&
!(fault->error_code & PFERR_RSVD_MASK))
- kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
- fault_mmu->root.hpa);
+ kvm_mmu_invalidate_addr(vcpu, fault_mmu, fault->address,
+ fault_mmu->root.hpa);

fault_mmu->inject_page_fault(vcpu, fault);
}
--
2.19.1.6.gb485710b


2023-02-16 15:40:28

by Lai Jiangshan

Subject: [PATCH V3 02/14] kvm: x86/mmu: Move the check in FNAME(sync_page) as kvm_sync_page_check()

From: Lai Jiangshan <[email protected]>

Prepare to check mmu->sync_page pointer before calling it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 43 +++++++++++++++++++++++++++++++++-
arch/x86/kvm/mmu/paging_tmpl.h | 27 ---------------------
2 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91f8e1d1d4cc..ee2837ea18d4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1914,10 +1914,51 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \
if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else

+static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+ union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
+
+ /*
+ * Ignore various flags when verifying that it's safe to sync a shadow
+ * page using the current MMU context.
+ *
+ * - level: not part of the overall MMU role and will never match as the MMU's
+ * level tracks the root level
+ * - access: updated based on the new guest PTE
+ * - quadrant: not part of the overall MMU role (similar to level)
+ */
+ const union kvm_mmu_page_role sync_role_ign = {
+ .level = 0xf,
+ .access = 0x7,
+ .quadrant = 0x3,
+ .passthrough = 0x1,
+ };
+
+ /*
+ * Direct pages can never be unsync, and KVM should never attempt to
+ * sync a shadow page for a different MMU context, e.g. if the role
+ * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
+ * reserved bits checks will be wrong, etc...
+ */
+ if (WARN_ON_ONCE(sp->role.direct ||
+ (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
+ return false;
+
+ return true;
+}
+
+static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+ if (!kvm_sync_page_check(vcpu, sp))
+ return -1;
+
+ return vcpu->arch.mmu->sync_page(vcpu, sp);
+}
+
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct list_head *invalid_list)
{
- int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+ int ret = __kvm_sync_page(vcpu, sp);

if (ret < 0)
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c7b1de064be5..e0aae0a7f646 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -984,38 +984,11 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
*/
static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
- union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
int i;
bool host_writable;
gpa_t first_pte_gpa;
bool flush = false;

- /*
- * Ignore various flags when verifying that it's safe to sync a shadow
- * page using the current MMU context.
- *
- * - level: not part of the overall MMU role and will never match as the MMU's
- * level tracks the root level
- * - access: updated based on the new guest PTE
- * - quadrant: not part of the overall MMU role (similar to level)
- */
- const union kvm_mmu_page_role sync_role_ign = {
- .level = 0xf,
- .access = 0x7,
- .quadrant = 0x3,
- .passthrough = 0x1,
- };
-
- /*
- * Direct pages can never be unsync, and KVM should never attempt to
- * sync a shadow page for a different MMU context, e.g. if the role
- * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
- * reserved bits checks will be wrong, etc...
- */
- if (WARN_ON_ONCE(sp->role.direct ||
- (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
- return -1;
-
first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);

for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
--
2.19.1.6.gb485710b


2023-02-16 15:40:33

by Lai Jiangshan

Subject: [PATCH V3 03/14] kvm: x86/mmu: Check mmu->sync_page pointer in kvm_sync_page_check()

From: Lai Jiangshan <[email protected]>

Check the pointer before calling it to catch any possible mistake.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ee2837ea18d4..69ab0d1bb0ec 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1940,7 +1940,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
* differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
* reserved bits checks will be wrong, etc...
*/
- if (WARN_ON_ONCE(sp->role.direct ||
+ if (WARN_ON_ONCE(sp->role.direct || !vcpu->arch.mmu->sync_page ||
(sp->role.word ^ root_role.word) & ~sync_role_ign.word))
return false;

--
2.19.1.6.gb485710b


2023-02-16 15:40:45

by Lai Jiangshan

Subject: [PATCH V3 04/14] kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging

From: Lai Jiangshan <[email protected]>

mmu->sync_page for direct paging is never called.

Both mmu->sync_page and mmu->invlpg only make sense in shadow paging.
Setting mmu->sync_page to NULL for direct paging makes it consistent
with mmu->invlpg, which is already set to NULL in that case.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69ab0d1bb0ec..f50f82bb3662 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1789,12 +1789,6 @@ static void mark_unsync(u64 *spte)
kvm_mmu_mark_parents_unsync(sp);
}

-static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp)
-{
- return -1;
-}
-
#define KVM_PAGE_ARRAY_NR 16

struct kvm_mmu_pages {
@@ -4510,7 +4504,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
- context->sync_page = nonpaging_sync_page;
+ context->sync_page = NULL;
context->invlpg = NULL;
}

@@ -5198,7 +5192,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->cpu_role.as_u64 = cpu_role.as_u64;
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
- context->sync_page = nonpaging_sync_page;
+ context->sync_page = NULL;
context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
--
2.19.1.6.gb485710b


2023-02-16 15:40:52

by Lai Jiangshan

Subject: [PATCH V3 05/14] kvm: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into mmu.c

From: Lai Jiangshan <[email protected]>

Rename mmu->sync_page to mmu->sync_spte and move the code out
of FNAME(sync_page)'s loop body into mmu.c.

No functional change intended.
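
As a standalone sketch of the shape of this refactor (toy types and names,
assuming the same return-value convention; not kvm code):

#include <stdio.h>

#define TOY_ENT_PER_PAGE 4

struct toy_sp {
	unsigned long spt[TOY_ENT_PER_PAGE];
};

/* After the patch: the per-MMU hook syncs exactly one entry. */
static int toy_sync_spte(struct toy_sp *sp, int i)
{
	if (!sp->spt[i])
		return 0;	/* nothing mapped, nothing to do */
	/* ... re-derive the entry from the guest PTE here ... */
	return 1;		/* pretend the entry changed: flush required */
}

/* The loop and the flush accounting now live in generic code, as in __kvm_sync_page(). */
static int toy_sync_page(struct toy_sp *sp)
{
	int i, flush = 0;

	for (i = 0; i < TOY_ENT_PER_PAGE; i++) {
		int ret = toy_sync_spte(sp, i);

		if (ret < 0)
			return -1;	/* caller zaps the whole shadow page */
		flush |= ret;
	}
	return flush;			/* > 0: TLB flush required */
}

int main(void)
{
	struct toy_sp sp = { .spt = { 0, 0x1, 0, 0x2 } };

	printf("flush=%d\n", toy_sync_page(&sp));
	return 0;
}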

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/mmu/mmu.c | 34 ++++++++--
arch/x86/kvm/mmu/paging_tmpl.h | 114 +++++++++++++-------------------
3 files changed, 76 insertions(+), 76 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5466f4152c67..b71b52fdb5ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -441,8 +441,8 @@ struct kvm_mmu {
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
- int (*sync_page)(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp);
+ int (*sync_spte)(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp, int i);
void (*invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f50f82bb3662..a8231b73ad4d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1934,7 +1934,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
* differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
* reserved bits checks will be wrong, etc...
*/
- if (WARN_ON_ONCE(sp->role.direct || !vcpu->arch.mmu->sync_page ||
+ if (WARN_ON_ONCE(sp->role.direct || !vcpu->arch.mmu->sync_spte ||
(sp->role.word ^ root_role.word) & ~sync_role_ign.word))
return false;

@@ -1943,10 +1943,30 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)

static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
+ int flush = 0;
+ int i;
+
if (!kvm_sync_page_check(vcpu, sp))
return -1;

- return vcpu->arch.mmu->sync_page(vcpu, sp);
+ for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
+ int ret = vcpu->arch.mmu->sync_spte(vcpu, sp, i);
+
+ if (ret < -1)
+ return -1;
+ flush |= ret;
+ }
+
+ /*
+ * Note, any flush is purely for KVM's correctness, e.g. when dropping
+ * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
+ * unmap or dirty logging event doesn't fail to flush. The guest is
+ * responsible for flushing the TLB to ensure any changes in protection
+ * bits are recognized, i.e. until the guest flushes or page faults on
+ * a relevant address, KVM is architecturally allowed to let vCPUs use
+ * cached translations with the old protection bits.
+ */
+ return flush;
}

static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
@@ -4504,7 +4524,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
- context->sync_page = NULL;
+ context->sync_spte = NULL;
context->invlpg = NULL;
}

@@ -5095,7 +5115,7 @@ static void paging64_init_context(struct kvm_mmu *context)
{
context->page_fault = paging64_page_fault;
context->gva_to_gpa = paging64_gva_to_gpa;
- context->sync_page = paging64_sync_page;
+ context->sync_spte = paging64_sync_spte;
context->invlpg = paging64_invlpg;
}

@@ -5103,7 +5123,7 @@ static void paging32_init_context(struct kvm_mmu *context)
{
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
- context->sync_page = paging32_sync_page;
+ context->sync_spte = paging32_sync_spte;
context->invlpg = paging32_invlpg;
}

@@ -5192,7 +5212,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->cpu_role.as_u64 = cpu_role.as_u64;
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
- context->sync_page = NULL;
+ context->sync_spte = NULL;
context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
@@ -5324,7 +5344,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,

context->page_fault = ept_page_fault;
context->gva_to_gpa = ept_gva_to_gpa;
- context->sync_page = ept_sync_page;
+ context->sync_spte = ept_sync_spte;
context->invlpg = ept_invlpg;

update_permission_bitmask(context, true);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e0aae0a7f646..0ea938276ba8 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -978,87 +978,67 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
* can't change unless all sptes pointing to it are nuked first.
*
* Returns
- * < 0: the sp should be zapped
- * 0: the sp is synced and no tlb flushing is required
- * > 0: the sp is synced and tlb flushing is required
+ * < 0: failed to sync spte
+ * 0: the spte is synced and no tlb flushing is required
+ * > 0: the spte is synced and tlb flushing is required
*/
-static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
{
- int i;
bool host_writable;
gpa_t first_pte_gpa;
- bool flush = false;
-
- first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
-
- for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
- u64 *sptep, spte;
- struct kvm_memory_slot *slot;
- unsigned pte_access;
- pt_element_t gpte;
- gpa_t pte_gpa;
- gfn_t gfn;
-
- if (!sp->spt[i])
- continue;
+ u64 *sptep, spte;
+ struct kvm_memory_slot *slot;
+ unsigned pte_access;
+ pt_element_t gpte;
+ gpa_t pte_gpa;
+ gfn_t gfn;

- pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
+ if (!sp->spt[i])
+ return 0;

- if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
- sizeof(pt_element_t)))
- return -1;
+ first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
+ pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);

- if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
- flush = true;
- continue;
- }
+ if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
+ sizeof(pt_element_t)))
+ return -1;

- gfn = gpte_to_gfn(gpte);
- pte_access = sp->role.access;
- pte_access &= FNAME(gpte_access)(gpte);
- FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
+ if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte))
+ return 1;

- if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
- continue;
+ gfn = gpte_to_gfn(gpte);
+ pte_access = sp->role.access;
+ pte_access &= FNAME(gpte_access)(gpte);
+ FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

- /*
- * Drop the SPTE if the new protections would result in a RWX=0
- * SPTE or if the gfn is changing. The RWX=0 case only affects
- * EPT with execute-only support, i.e. EPT without an effective
- * "present" bit, as all other paging modes will create a
- * read-only SPTE if pte_access is zero.
- */
- if ((!pte_access && !shadow_present_mask) ||
- gfn != kvm_mmu_page_get_gfn(sp, i)) {
- drop_spte(vcpu->kvm, &sp->spt[i]);
- flush = true;
- continue;
- }
+ if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
+ return 0;

- /* Update the shadowed access bits in case they changed. */
- kvm_mmu_page_set_access(sp, i, pte_access);
+ /*
+ * Drop the SPTE if the new protections would result in a RWX=0
+ * SPTE or if the gfn is changing. The RWX=0 case only affects
+ * EPT with execute-only support, i.e. EPT without an effective
+ * "present" bit, as all other paging modes will create a
+ * read-only SPTE if pte_access is zero.
+ */
+ if ((!pte_access && !shadow_present_mask) ||
+ gfn != kvm_mmu_page_get_gfn(sp, i)) {
+ drop_spte(vcpu->kvm, &sp->spt[i]);
+ return 1;
+ }

- sptep = &sp->spt[i];
- spte = *sptep;
- host_writable = spte & shadow_host_writable_mask;
- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- make_spte(vcpu, sp, slot, pte_access, gfn,
- spte_to_pfn(spte), spte, true, false,
- host_writable, &spte);
+ /* Update the shadowed access bits in case they changed. */
+ kvm_mmu_page_set_access(sp, i, pte_access);

- flush |= mmu_spte_update(sptep, spte);
- }
+ sptep = &sp->spt[i];
+ spte = *sptep;
+ host_writable = spte & shadow_host_writable_mask;
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ make_spte(vcpu, sp, slot, pte_access, gfn,
+ spte_to_pfn(spte), spte, true, false,
+ host_writable, &spte);

- /*
- * Note, any flush is purely for KVM's correctness, e.g. when dropping
- * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
- * unmap or dirty logging event doesn't fail to flush. The guest is
- * responsible for flushing the TLB to ensure any changes in protection
- * bits are recognized, i.e. until the guest flushes or page faults on
- * a relevant address, KVM is architecturally allowed to let vCPUs use
- * cached translations with the old protection bits.
- */
- return flush;
+ return mmu_spte_update(sptep, spte);
}

#undef pt_element_t
--
2.19.1.6.gb485710b


2023-02-16 15:41:02

by Lai Jiangshan

Subject: [PATCH V3 06/14] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_spte)

From: Lai Jiangshan <[email protected]>

Sometimes when the guest updates its pagetable, it only adds new gptes
without changing any existing ones, so there is no point in updating
the sptes for those unchanged gptes.

Moreover, when the sptes for these unchanged gptes are updated, the AD
bits are also cleared since make_spte() is called with prefetch=true,
which might result in unneeded TLB flushing.

Just do nothing if the gpte's permissions are unchanged.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0ea938276ba8..7db167876cd7 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -1026,6 +1026,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
drop_spte(vcpu->kvm, &sp->spt[i]);
return 1;
}
+ /*
+ * Do nothing if the permissions are unchanged.
+ */
+ if (kvm_mmu_page_get_access(sp, i) == pte_access)
+ return 0;

/* Update the shadowed access bits in case they changed. */
kvm_mmu_page_set_access(sp, i, pte_access);
--
2.19.1.6.gb485710b


2023-02-16 15:41:12

by Lai Jiangshan

Subject: [PATCH V3 07/14] KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()

From: Sean Christopherson <[email protected]>

Tweak KVM_MMU_ROOTS_ALL to precisely cover all current+previous root
flags, and add a sanity check in kvm_mmu_free_roots() to verify that the set
of roots to free doesn't stray outside KVM_MMU_ROOTS_ALL.
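
For reference, a standalone sanity check of the resulting mask layout (toy
code mirroring the definitions added below, not the kernel's own headers):

#include <stdio.h>

#define BIT(n)				(1UL << (n))	/* stand-in for the kernel's BIT() */
#define KVM_MMU_NUM_PREV_ROOTS		3
#define KVM_MMU_ROOT_CURRENT		BIT(0)
#define KVM_MMU_ROOT_PREVIOUS(i)	BIT(1 + (i))
#define KVM_MMU_ROOTS_ALL		(BIT(1 + KVM_MMU_NUM_PREV_ROOTS) - 1)

int main(void)
{
	unsigned long all = KVM_MMU_ROOT_CURRENT;
	int i;

	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
		all |= KVM_MMU_ROOT_PREVIOUS(i);

	/*
	 * Prints "0xf 0xf": the tightened KVM_MMU_ROOTS_ALL covers exactly
	 * the current root plus the three previous roots, instead of ~0UL.
	 */
	printf("0x%lx 0x%lx\n", all, KVM_MMU_ROOTS_ALL);
	return 0;
}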

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 2 ++
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b71b52fdb5ee..5bd91c49c8b3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -422,6 +422,10 @@ struct kvm_mmu_root_info {

#define KVM_MMU_NUM_PREV_ROOTS 3

+#define KVM_MMU_ROOT_CURRENT BIT(0)
+#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
+#define KVM_MMU_ROOTS_ALL (BIT(1 + KVM_MMU_NUM_PREV_ROOTS) - 1)
+
#define KVM_HAVE_MMU_RWLOCK

struct kvm_mmu_page;
@@ -1978,10 +1982,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
return !!(*irq_state);
}

-#define KVM_MMU_ROOT_CURRENT BIT(0)
-#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
-#define KVM_MMU_ROOTS_ALL (~0UL)
-
int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8231b73ad4d..a4793cb8d64a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3568,6 +3568,8 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
LIST_HEAD(invalid_list);
bool free_active_root;

+ WARN_ON_ONCE(roots_to_free & ~KVM_MMU_ROOTS_ALL);
+
BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);

/* Before acquiring the MMU lock, see if we need to do any real work. */
--
2.19.1.6.gb485710b


2023-02-16 15:41:21

by Lai Jiangshan

Subject: [PATCH V3 08/14] kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_addr()

From: Lai Jiangshan <[email protected]>

kvm_mmu_invalidate_addr() is currently called with @root_hpa set to
either @mmu->root.hpa or INVALID_PAGE, where @mmu->root.hpa means
invalidating the gva for the current root (the same meaning as
KVM_MMU_ROOT_CURRENT) and INVALID_PAGE means invalidating the gva for
all roots (the same meaning as KVM_MMU_ROOTS_ALL).

Change the argument of kvm_mmu_invalidate_addr() to a KVM_MMU_ROOT_XXX
bitmask so that the function can be reused by kvm_mmu_invpcid_gva() and
nested_ept_invalidate_addr() to invalidate the gva for different sets
of roots.

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++----------------
arch/x86/kvm/x86.c | 2 +-
3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5bd91c49c8b3..cce4243d6688 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2026,7 +2026,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- u64 addr, hpa_t root_hpa);
+ u64 addr, unsigned long roots);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a4793cb8d64a..9f261e444a32 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5764,10 +5764,12 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- u64 addr, hpa_t root_hpa)
+ u64 addr, unsigned long roots)
{
int i;

+ WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);
+
/* It's actually a GPA for vcpu->arch.guest_mmu. */
if (mmu != &vcpu->arch.guest_mmu) {
/* INVLPG on a non-canonical address is a NOP according to the SDM. */
@@ -5780,31 +5782,30 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
if (!mmu->invlpg)
return;

- if (root_hpa == INVALID_PAGE) {
+ if (roots & KVM_MMU_ROOT_CURRENT)
mmu->invlpg(vcpu, addr, mmu->root.hpa);

- /*
- * INVLPG is required to invalidate any global mappings for the VA,
- * irrespective of PCID. Since it would take us roughly similar amount
- * of work to determine whether any of the prev_root mappings of the VA
- * is marked global, or to just sync it blindly, so we might as well
- * just always sync it.
- *
- * Mappings not reachable via the current cr3 or the prev_roots will be
- * synced when switching to that cr3, so nothing needs to be done here
- * for them.
- */
- for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
- if (VALID_PAGE(mmu->prev_roots[i].hpa))
- mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
- } else {
- mmu->invlpg(vcpu, addr, root_hpa);
+ for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+ if ((roots & KVM_MMU_ROOT_PREVIOUS(i)) &&
+ VALID_PAGE(mmu->prev_roots[i].hpa))
+ mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
}
}

void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
- kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE);
+ /*
+ * INVLPG is required to invalidate any global mappings for the VA,
+ * irrespective of PCID. Since it would take us roughly similar amount
+ * of work to determine whether any of the prev_root mappings of the VA
+ * is marked global, or to just sync it blindly, so we might as well
+ * just always sync it.
+ *
+ * Mappings not reachable via the current cr3 or the prev_roots will be
+ * synced when switching to that cr3, so nothing needs to be done here
+ * for them.
+ */
+ kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
++vcpu->stat.invlpg;
}
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9663623c128..37958763ae2f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -799,7 +799,7 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
if ((fault->error_code & PFERR_PRESENT_MASK) &&
!(fault->error_code & PFERR_RSVD_MASK))
kvm_mmu_invalidate_addr(vcpu, fault_mmu, fault->address,
- fault_mmu->root.hpa);
+ KVM_MMU_ROOT_CURRENT);

fault_mmu->inject_page_fault(vcpu, fault);
}
--
2.19.1.6.gb485710b


2023-02-16 15:41:26

by Lai Jiangshan

Subject: [PATCH V3 09/14] kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in kvm_mmu_invpcid_gva()

From: Lai Jiangshan <[email protected]>

Use kvm_mmu_invalidate_addr() instead of open-coded calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9f261e444a32..c48f98fbd6ae 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5814,27 +5814,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
- bool tlb_flush = false;
+ unsigned long roots = 0;
uint i;

- if (pcid == kvm_get_active_pcid(vcpu)) {
- if (mmu->invlpg)
- mmu->invlpg(vcpu, gva, mmu->root.hpa);
- tlb_flush = true;
- }
+ if (pcid == kvm_get_active_pcid(vcpu))
+ roots |= KVM_MMU_ROOT_CURRENT;

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
if (VALID_PAGE(mmu->prev_roots[i].hpa) &&
- pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd)) {
- if (mmu->invlpg)
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
- tlb_flush = true;
- }
+ pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd))
+ roots |= KVM_MMU_ROOT_PREVIOUS(i);
}

- if (tlb_flush)
- static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);
-
+ if (roots)
+ kvm_mmu_invalidate_addr(vcpu, mmu, gva, roots);
++vcpu->stat.invlpg;

/*
--
2.19.1.6.gb485710b


2023-02-16 23:52:27

by Lai Jiangshan

Subject: [PATCH V3 10/14] kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in nested_ept_invalidate_addr()

From: Lai Jiangshan <[email protected]>

Use kvm_mmu_invalidate_addr() instead of open-coded calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/vmx/nested.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c48f98fbd6ae..9b5e3afbcdb4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5791,6 +5791,7 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
}
}
+EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);

void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 557b9c468734..cb502bbaee87 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -358,6 +358,7 @@ static bool nested_ept_root_matches(hpa_t root_hpa, u64 root_eptp, u64 eptp)
static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
gpa_t addr)
{
+ unsigned long roots = 0;
uint i;
struct kvm_mmu_root_info *cached_root;

@@ -368,8 +369,10 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,

if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd,
eptp))
- vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa);
+ roots |= KVM_MMU_ROOT_PREVIOUS(i);
}
+ if (roots)
+ kvm_mmu_invalidate_addr(vcpu, vcpu->arch.mmu, addr, roots);
}

static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
--
2.19.1.6.gb485710b


2023-02-16 23:52:35

by Lai Jiangshan

Subject: [PATCH V3 11/14] kvm: x86/mmu: Allow the roots to be invalid in FNAME(invlpg)

From: Lai Jiangshan <[email protected]>

Don't assume the current root is valid; just check it and remove the
WARN().

Also move the code that checks whether the root is valid into
FNAME(invlpg) to simplify the code.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
arch/x86/kvm/mmu/paging_tmpl.h | 4 +---
2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9b5e3afbcdb4..7d5ff2b0f6d5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5786,8 +5786,7 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
mmu->invlpg(vcpu, addr, mmu->root.hpa);

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
- if ((roots & KVM_MMU_ROOT_PREVIOUS(i)) &&
- VALID_PAGE(mmu->prev_roots[i].hpa))
+ if (roots & KVM_MMU_ROOT_PREVIOUS(i))
mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
}
}
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 7db167876cd7..9be5a0f22a9f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -904,10 +904,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa)
*/
mmu_topup_memory_caches(vcpu, true);

- if (!VALID_PAGE(root_hpa)) {
- WARN_ON(1);
+ if (!VALID_PAGE(root_hpa))
return;
- }

write_lock(&vcpu->kvm->mmu_lock);
for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
--
2.19.1.6.gb485710b


2023-02-16 23:52:38

by Lai Jiangshan

Subject: [PATCH V3 12/14] kvm: x86/mmu: Remove FNAME(invlpg) and use FNAME(sync_spte) to update vTLB instead.

From: Lai Jiangshan <[email protected]>

In a hardware TLB, invalidating a TLB entry means the translation is
removed from the TLB.

In the KVM shadowed vTLB, the translations (combinations of shadow
paging and the hardware TLB) are generally maintained as long as they
remain clean when the TLB of an address space (i.e. a PCID or all) is
flushed, with the help of write-protection, sp->unsync, and
kvm_sync_page().

However, a single vTLB entry is always removed in FNAME(invlpg) if
sp->unsync and then recreated, so a remote flush is required even if
the original vTLB entry is clean.

Besides that, FNAME(invlpg) is a duplicate implementation of
FNAME(sync_spte) as far as invalidating a single vTLB entry goes.

To address this, use FNAME(sync_spte) to share the code and slightly
modify the semantics, where clean vTLB entries are kept.
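
A toy model of the semantic difference (all names hypothetical, not kvm
code): the old invlpg path always removed and re-created the entry and
therefore always required a remote flush, whereas syncing flushes only
when the re-derived entry actually differs:

#include <stdio.h>

/* One "vTLB entry": sync it against the value re-derived from the guest PTE. */
static int toy_sync_entry(unsigned long *entry, unsigned long derived)
{
	if (*entry == derived)
		return 0;	/* clean entry is kept, no flush needed */
	*entry = derived;	/* stale entry is updated */
	return 1;		/* flush required */
}

int main(void)
{
	unsigned long entry = 0x42;

	printf("flush=%d\n", toy_sync_entry(&entry, 0x42));	/* clean: 0 */
	printf("flush=%d\n", toy_sync_entry(&entry, 0x43));	/* stale: 1 */
	return 0;
}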

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu/mmu.c | 56 ++++++++++++++++++++++----------
arch/x86/kvm/mmu/paging_tmpl.h | 57 ---------------------------------
3 files changed, 39 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cce4243d6688..79dbf20ca026 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -447,7 +447,6 @@ struct kvm_mmu {
struct x86_exception *exception);
int (*sync_spte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, int i);
- void (*invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
union kvm_mmu_page_role root_role;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7d5ff2b0f6d5..a8ac37d51287 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1073,14 +1073,6 @@ static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
}

-static bool rmap_can_add(struct kvm_vcpu *vcpu)
-{
- struct kvm_mmu_memory_cache *mc;
-
- mc = &vcpu->arch.mmu_pte_list_desc_cache;
- return kvm_mmu_memory_cache_nr_free_objects(mc);
-}
-
static void rmap_remove(struct kvm *kvm, u64 *spte)
{
struct kvm_memslots *slots;
@@ -4527,7 +4519,6 @@ static void nonpaging_init_context(struct kvm_mmu *context)
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
context->sync_spte = NULL;
- context->invlpg = NULL;
}

static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
@@ -5118,7 +5109,6 @@ static void paging64_init_context(struct kvm_mmu *context)
context->page_fault = paging64_page_fault;
context->gva_to_gpa = paging64_gva_to_gpa;
context->sync_spte = paging64_sync_spte;
- context->invlpg = paging64_invlpg;
}

static void paging32_init_context(struct kvm_mmu *context)
@@ -5126,7 +5116,6 @@ static void paging32_init_context(struct kvm_mmu *context)
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
context->sync_spte = paging32_sync_spte;
- context->invlpg = paging32_invlpg;
}

static union kvm_cpu_role
@@ -5215,7 +5204,6 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
context->sync_spte = NULL;
- context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
context->inject_page_fault = kvm_inject_page_fault;
@@ -5347,7 +5335,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
context->page_fault = ept_page_fault;
context->gva_to_gpa = ept_gva_to_gpa;
context->sync_spte = ept_sync_spte;
- context->invlpg = ept_invlpg;

update_permission_bitmask(context, true);
context->pkru_mask = 0;
@@ -5388,7 +5375,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
* L2 page tables are never shadowed, so there is no need to sync
* SPTEs.
*/
- g_context->invlpg = NULL;
+ g_context->sync_spte = NULL;

/*
* Note that arch.mmu->gva_to_gpa translates l2_gpa to l1_gpa using
@@ -5763,6 +5750,41 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

+static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ u64 addr, hpa_t root_hpa)
+{
+ struct kvm_shadow_walk_iterator iterator;
+
+ vcpu_clear_mmio_info(vcpu, addr);
+
+ if (!VALID_PAGE(root_hpa))
+ return;
+
+ write_lock(&vcpu->kvm->mmu_lock);
+ for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
+ struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);
+
+ if (sp->unsync) {
+ /*
+ * Get the gfn beforehand for later flushing.
+ * Although mmu->sync_spte() doesn't change it, but just
+ * avoid the dependence.
+ */
+ gfn_t gfn = kvm_mmu_page_get_gfn(sp, iterator.index);
+ int ret = mmu->sync_spte(vcpu, sp, iterator.index);
+
+ if (ret < 0)
+ mmu_page_zap_pte(vcpu->kvm, sp, iterator.sptep, NULL);
+ if (ret)
+ kvm_flush_remote_tlbs_gfn(vcpu->kvm, gfn, PG_LEVEL_4K);
+ }
+
+ if (!sp->unsync_children)
+ break;
+ }
+ write_unlock(&vcpu->kvm->mmu_lock);
+}
+
void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 addr, unsigned long roots)
{
@@ -5779,15 +5801,15 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
static_call(kvm_x86_flush_tlb_gva)(vcpu, addr);
}

- if (!mmu->invlpg)
+ if (!mmu->sync_spte)
return;

if (roots & KVM_MMU_ROOT_CURRENT)
- mmu->invlpg(vcpu, addr, mmu->root.hpa);
+ __kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa);

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
if (roots & KVM_MMU_ROOT_PREVIOUS(i))
- mmu->invlpg(vcpu, addr, mmu->prev_roots[i].hpa);
+ __kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
}
}
EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 9be5a0f22a9f..fca5ce349d9d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -887,63 +887,6 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
}

-/* Note, @addr is a GPA when invlpg() invalidates an L2 GPA translation in shadowed TDP */
-static void FNAME(invlpg)(struct kvm_vcpu *vcpu, u64 addr, hpa_t root_hpa)
-{
- struct kvm_shadow_walk_iterator iterator;
- struct kvm_mmu_page *sp;
- u64 old_spte;
- int level;
- u64 *sptep;
-
- vcpu_clear_mmio_info(vcpu, addr);
-
- /*
- * No need to check return value here, rmap_can_add() can
- * help us to skip pte prefetch later.
- */
- mmu_topup_memory_caches(vcpu, true);
-
- if (!VALID_PAGE(root_hpa))
- return;
-
- write_lock(&vcpu->kvm->mmu_lock);
- for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
- level = iterator.level;
- sptep = iterator.sptep;
-
- sp = sptep_to_sp(sptep);
- old_spte = *sptep;
- if (is_last_spte(old_spte, level)) {
- pt_element_t gpte;
- gpa_t pte_gpa;
-
- if (!sp->unsync)
- break;
-
- pte_gpa = FNAME(get_level1_sp_gpa)(sp);
- pte_gpa += spte_index(sptep) * sizeof(pt_element_t);
-
- mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
- if (is_shadow_present_pte(old_spte))
- kvm_flush_remote_tlbs_sptep(vcpu->kvm, sptep);
-
- if (!rmap_can_add(vcpu))
- break;
-
- if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
- sizeof(pt_element_t)))
- break;
-
- FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
- }
-
- if (!sp->unsync_children)
- break;
- }
- write_unlock(&vcpu->kvm->mmu_lock);
-}
-
/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t addr, u64 access,
--
2.19.1.6.gb485710b


2023-02-16 23:52:47

by Lai Jiangshan

Subject: [PATCH V3 13/14] kvm: x86/mmu: Remove @no_dirty_log from FNAME(prefetch_gpte)

From: Lai Jiangshan <[email protected]>

FNAME(prefetch_gpte) is always called with @no_dirty_log=true.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index fca5ce349d9d..e04950015dc4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -519,7 +519,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,

static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
- u64 *spte, pt_element_t gpte, bool no_dirty_log)
+ u64 *spte, pt_element_t gpte)
{
struct kvm_memory_slot *slot;
unsigned pte_access;
@@ -535,8 +535,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pte_access = sp->role.access & FNAME(gpte_access)(gpte);
FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
- no_dirty_log && (pte_access & ACC_WRITE_MASK));
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, pte_access & ACC_WRITE_MASK);
if (!slot)
return false;

@@ -605,7 +604,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
if (is_shadow_present_pte(*spte))
continue;

- if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i], true))
+ if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i]))
break;
}
}
--
2.19.1.6.gb485710b


2023-02-16 23:52:58

by Lai Jiangshan

Subject: [PATCH V3 14/14] kvm: x86/mmu: Skip calling mmu->sync_spte() when the spte is 0

From: Lai Jiangshan <[email protected]>

Sync the spte only when the spte is set and avoid the indirect branch.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8ac37d51287..cd8c38463c97 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1942,7 +1942,7 @@ static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return -1;

for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
- int ret = vcpu->arch.mmu->sync_spte(vcpu, sp, i);
+ int ret = sp->spt[i] ? vcpu->arch.mmu->sync_spte(vcpu, sp, i) : 0;

if (ret < -1)
return -1;
@@ -5764,7 +5764,7 @@ static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
for_each_shadow_entry_using_root(vcpu, root_hpa, addr, iterator) {
struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);

- if (sp->unsync) {
+ if (sp->unsync && *iterator.sptep) {
/*
* Get the gfn beforehand for later flushing.
* Although mmu->sync_spte() doesn't change it, but just
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index e04950015dc4..3373d6705634 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -933,7 +933,7 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
gpa_t pte_gpa;
gfn_t gfn;

- if (!sp->spt[i])
+ if (WARN_ON_ONCE(!sp->spt[i]))
return 0;

first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
--
2.19.1.6.gb485710b


2023-03-23 22:54:51

by Sean Christopherson

Subject: Re: [PATCH V3 00/14] kvm: x86/mmu: Share the same code to invalidate each vTLB entry

On Thu, 16 Feb 2023 23:41:06 +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> FNAME(invlpg) and FNAME(sync_page) invalidate vTLB entries but in
> slightly different methods.
>
> Make them use the same method and share the same code.
>
> [...]

Applied to kvm-x86 mmu, thanks! Made a few tweaks, I'll respond to individual
patches (if I haven't already; and if I forget, I apologize in advance).

[01/14] KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bug
https://github.com/kvm-x86/linux/commit/753b43c9d1b7
[02/14] kvm: x86/mmu: Move the check in FNAME(sync_page) as kvm_sync_page_check()
https://github.com/kvm-x86/linux/commit/90e444702a7c
[03/14] kvm: x86/mmu: Check mmu->sync_page pointer in kvm_sync_page_check()
https://github.com/kvm-x86/linux/commit/51dddf6c49b9
[04/14] kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging
https://github.com/kvm-x86/linux/commit/8ef228c20cae
[05/14] kvm: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into mmu.c
https://github.com/kvm-x86/linux/commit/c3c6c9fc5d24
[06/14] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_spte)
https://github.com/kvm-x86/linux/commit/e6722d9211b2
[07/14] KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()
https://github.com/kvm-x86/linux/commit/f94db0c8b9fa
[08/14] kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_addr()
https://github.com/kvm-x86/linux/commit/cd42853e9530
[09/14] kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in kvm_mmu_invpcid_gva()
https://github.com/kvm-x86/linux/commit/9ebc3f51da6f
[10/14] kvm: x86/mmu: Use kvm_mmu_invalidate_addr() in nested_ept_invalidate_addr()
https://github.com/kvm-x86/linux/commit/2c86c444e275
[11/14] kvm: x86/mmu: Allow the roots to be invalid in FNAME(invlpg)
https://github.com/kvm-x86/linux/commit/ed335278bd12
[12/14] kvm: x86/mmu: Remove FNAME(invlpg) and use FNAME(sync_spte) to update vTLB instead.
https://github.com/kvm-x86/linux/commit/9fd4a4e3a3d9
[13/14] kvm: x86/mmu: Remove @no_dirty_log from FNAME(prefetch_gpte)
https://github.com/kvm-x86/linux/commit/91ca7672dc73
[14/14] kvm: x86/mmu: Skip calling mmu->sync_spte() when the spte is 0
https://github.com/kvm-x86/linux/commit/19ace7d6ca15

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes