2023-02-07 15:56:27

by Lai Jiangshan

Subject: [PATCH V2 0/8] kvm: x86/mmu: Share the same code to invalidate each vTLB entry

From: Lai Jiangshan <[email protected]>

FNAME(invlpg) and FNAME(sync_page) both invalidate vTLB entries, but in
slightly different ways.

Make them use the same method and share the same code.
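For orientation, a rough sketch of the converged flow (simplified from
patches 5 and 6 below; locking, role checks, error handling and TLB
flushing are omitted):

/* Syncing a whole unsync shadow page (was FNAME(sync_page)): */
static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
	int i, flush = 0;

	for (i = 0; i < SPTE_ENT_PER_PAGE; i++)
		flush |= vcpu->arch.mmu->sync_spte(vcpu, sp, i);

	return flush;
}

/* Invalidating a single gva (was FNAME(invlpg)): */
static void __kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
				     gva_t gva, hpa_t root_hpa)
{
	struct kvm_shadow_walk_iterator iterator;

	for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
		struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);

		if (sp->unsync && *iterator.sptep)
			mmu->sync_spte(vcpu, sp, iterator.index);

		if (!sp->unsync_children)
			break;
	}
}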

[V1]: https://lore.kernel.org/lkml/[email protected]/

Lai Jiangshan (8):
kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_gva()
kvm: x86/mmu: Use kvm_mmu_invalidate_gva() in kvm_mmu_invpcid_gva()
kvm: x86/mmu: Use kvm_mmu_invalidate_gva() in
nested_ept_invalidate_addr()
kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging
kvm: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into
mmu.c
kvm: x86/mmu: Remove FNAME(invlpg)
kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)
kvm: x86/mmu: Remove @no_dirty_log from FNAME(prefetch_gpte)

arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/mmu/mmu.c | 177 +++++++++++++++++----------
arch/x86/kvm/mmu/paging_tmpl.h | 209 +++++++++-----------------------
arch/x86/kvm/vmx/nested.c | 5 +-
arch/x86/kvm/x86.c | 2 +-
5 files changed, 178 insertions(+), 222 deletions(-)

--
2.19.1.6.gb485710b



2023-02-07 15:56:32

by Lai Jiangshan

Subject: [PATCH V2 1/8] kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_gva()

From: Lai Jiangshan <[email protected]>

kvm_mmu_invalidate_gva() is currently passed either @mmu->root.hpa or
INVALID_PAGE as @root_hpa, where @mmu->root.hpa means "invalidate the gva
for the current root" (the same meaning as KVM_MMU_ROOT_CURRENT) and
INVALID_PAGE means "invalidate the gva for all roots" (the same meaning as
KVM_MMU_ROOTS_ALL).

Change the argument to a KVM_MMU_ROOT_XXX bitmask instead so that the
function can be reused by kvm_mmu_invpcid_gva() and
nested_ept_invalidate_addr(), which need to invalidate the gva for
different sets of roots.

No functional change intended.
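For example, with the new signature a caller selects roots via the bitmask
(illustrative call sites only; the real users are added by the later
patches):

	/* Invalidate @gva in the current root only: */
	kvm_mmu_invalidate_gva(vcpu, mmu, gva, KVM_MMU_ROOT_CURRENT);

	/* Invalidate @gva in the current root and cached previous root 0: */
	kvm_mmu_invalidate_gva(vcpu, mmu, gva,
			       KVM_MMU_ROOT_CURRENT | KVM_MMU_ROOT_PREVIOUS(0));

	/* Invalidate @gva in all roots, as kvm_mmu_invlpg() now does: */
	kvm_mmu_invalidate_gva(vcpu, mmu, gva, KVM_MMU_ROOTS_ALL);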

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++----------------
arch/x86/kvm/x86.c | 2 +-
3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..81429a5640d6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2026,7 +2026,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa);
+ gva_t gva, unsigned long roots);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..958e8eb977ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5707,10 +5707,12 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa)
+ gva_t gva, unsigned long roots)
{
int i;

+ WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);
+
/* It's actually a GPA for vcpu->arch.guest_mmu. */
if (mmu != &vcpu->arch.guest_mmu) {
/* INVLPG on a non-canonical address is a NOP according to the SDM. */
@@ -5723,31 +5725,30 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
if (!mmu->invlpg)
return;

- if (root_hpa == INVALID_PAGE) {
+ if ((roots & KVM_MMU_ROOT_CURRENT) && VALID_PAGE(mmu->root.hpa))
mmu->invlpg(vcpu, gva, mmu->root.hpa);

- /*
- * INVLPG is required to invalidate any global mappings for the VA,
- * irrespective of PCID. Since it would take us roughly similar amount
- * of work to determine whether any of the prev_root mappings of the VA
- * is marked global, or to just sync it blindly, so we might as well
- * just always sync it.
- *
- * Mappings not reachable via the current cr3 or the prev_roots will be
- * synced when switching to that cr3, so nothing needs to be done here
- * for them.
- */
- for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
- if (VALID_PAGE(mmu->prev_roots[i].hpa))
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
- } else {
- mmu->invlpg(vcpu, gva, root_hpa);
+ for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
+ if ((roots & KVM_MMU_ROOT_PREVIOUS(i)) &&
+ VALID_PAGE(mmu->prev_roots[i].hpa))
+ mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
}
}

void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
- kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE);
+ /*
+ * INVLPG is required to invalidate any global mappings for the VA,
+ * irrespective of PCID. Since it would take us roughly similar amount
+ * of work to determine whether any of the prev_root mappings of the VA
+ * is marked global, or to just sync it blindly, so we might as well
+ * just always sync it.
+ *
+ * Mappings not reachable via the current cr3 or the prev_roots will be
+ * synced when switching to that cr3, so nothing needs to be done here
+ * for them.
+ */
+ kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
++vcpu->stat.invlpg;
}
EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..a81937a8fe0c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -799,7 +799,7 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
if ((fault->error_code & PFERR_PRESENT_MASK) &&
!(fault->error_code & PFERR_RSVD_MASK))
kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address,
- fault_mmu->root.hpa);
+ KVM_MMU_ROOT_CURRENT);

fault_mmu->inject_page_fault(vcpu, fault);
}
--
2.19.1.6.gb485710b


2023-02-07 15:56:37

by Lai Jiangshan

Subject: [PATCH V2 2/8] kvm: x86/mmu: Use kvm_mmu_invalidate_gva() in kvm_mmu_invpcid_gva()

From: Lai Jiangshan <[email protected]>

Use kvm_mmu_invalidate_gva() instead of open-coded calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 958e8eb977ed..8563b52b8bb7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5757,27 +5757,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
{
struct kvm_mmu *mmu = vcpu->arch.mmu;
- bool tlb_flush = false;
+ unsigned long roots = 0;
uint i;

- if (pcid == kvm_get_active_pcid(vcpu)) {
- if (mmu->invlpg)
- mmu->invlpg(vcpu, gva, mmu->root.hpa);
- tlb_flush = true;
- }
+ if (pcid == kvm_get_active_pcid(vcpu))
+ roots |= KVM_MMU_ROOT_CURRENT;

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
if (VALID_PAGE(mmu->prev_roots[i].hpa) &&
- pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd)) {
- if (mmu->invlpg)
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
- tlb_flush = true;
- }
+ pcid == kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd))
+ roots |= KVM_MMU_ROOT_PREVIOUS(i);
}

- if (tlb_flush)
- static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);
-
+ if (roots)
+ kvm_mmu_invalidate_gva(vcpu, mmu, gva, roots);
++vcpu->stat.invlpg;

/*
--
2.19.1.6.gb485710b


2023-02-07 15:56:49

by Lai Jiangshan

Subject: [PATCH V2 3/8] kvm: x86/mmu: Use kvm_mmu_invalidate_gva() in nested_ept_invalidate_addr()

From: Lai Jiangshan <[email protected]>

Use kvm_mmu_invalidate_gva() instead of open-coded calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/vmx/nested.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8563b52b8bb7..e03cf5558773 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5734,6 +5734,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
}
}
+EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_gva);

void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
{
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 557b9c468734..f552f3c454b1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -358,6 +358,7 @@ static bool nested_ept_root_matches(hpa_t root_hpa, u64 root_eptp, u64 eptp)
static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
gpa_t addr)
{
+ unsigned long roots = 0;
uint i;
struct kvm_mmu_root_info *cached_root;

@@ -368,8 +369,10 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,

if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd,
eptp))
- vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa);
+ roots |= KVM_MMU_ROOT_PREVIOUS(i);
}
+ if (roots)
+ kvm_mmu_invalidate_gva(vcpu, vcpu->arch.mmu, addr, roots);
}

static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
--
2.19.1.6.gb485710b


2023-02-07 15:57:05

by Lai Jiangshan

Subject: [PATCH V2 4/8] kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging

From: Lai Jiangshan <[email protected]>

mmu->sync_page is never called for direct paging.

Both mmu->sync_page and mmu->invlpg only make sense for shadow paging, so
setting mmu->sync_page to NULL for direct paging makes it consistent with
mmu->invlpg, which is already NULL in that case.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e03cf5558773..e30ca652f6ff 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1789,12 +1789,6 @@ static void mark_unsync(u64 *spte)
kvm_mmu_mark_parents_unsync(sp);
}

-static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp)
-{
- return -1;
-}
-
#define KVM_PAGE_ARRAY_NR 16

struct kvm_mmu_pages {
@@ -4469,7 +4463,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
- context->sync_page = nonpaging_sync_page;
+ context->sync_page = NULL;
context->invlpg = NULL;
}

@@ -5157,7 +5151,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->cpu_role.as_u64 = cpu_role.as_u64;
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
- context->sync_page = nonpaging_sync_page;
+ context->sync_page = NULL;
context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
--
2.19.1.6.gb485710b


2023-02-07 15:57:31

by Lai Jiangshan

Subject: [PATCH V2 5/8] kvm: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into mmu.c

From: Lai Jiangshan <[email protected]>

Rename mmu->sync_page to mmu->sync_spte and move the code out
of FNAME(sync_page)'s loop body into mmu.c.

No functional change intended.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/mmu/mmu.c | 64 +++++++++++++--
arch/x86/kvm/mmu/paging_tmpl.h | 139 +++++++++++---------------------
3 files changed, 106 insertions(+), 101 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 81429a5640d6..6c64ebfbd778 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -441,8 +441,8 @@ struct kvm_mmu {
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u64 access,
struct x86_exception *exception);
- int (*sync_page)(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp);
+ int (*sync_spte)(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp, int i);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e30ca652f6ff..c271d0a1ed54 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1908,10 +1908,62 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
&(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)]) \
if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else

+static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+{
+ union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
+ bool flush = false;
+ int i;
+
+ /*
+ * Ignore various flags when verifying that it's safe to sync a shadow
+ * page using the current MMU context.
+ *
+ * - level: not part of the overall MMU role and will never match as the MMU's
+ * level tracks the root level
+ * - access: updated based on the new guest PTE
+ * - quadrant: not part of the overall MMU role (similar to level)
+ */
+ const union kvm_mmu_page_role sync_role_ign = {
+ .level = 0xf,
+ .access = 0x7,
+ .quadrant = 0x3,
+ .passthrough = 0x1,
+ };
+
+ /*
+ * Direct pages can never be unsync, and KVM should never attempt to
+ * sync a shadow page for a different MMU context, e.g. if the role
+ * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
+ * reserved bits checks will be wrong, etc...
+ */
+ if (WARN_ON_ONCE(sp->role.direct ||
+ (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
+ return -1;
+
+ for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
+ int ret = vcpu->arch.mmu->sync_spte(vcpu, sp, i);
+
+ if (ret < -1)
+ return -1;
+ flush |= ret;
+ }
+
+ /*
+ * Note, any flush is purely for KVM's correctness, e.g. when dropping
+ * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
+ * unmap or dirty logging event doesn't fail to flush. The guest is
+ * responsible for flushing the TLB to ensure any changes in protection
+ * bits are recognized, i.e. until the guest flushes or page faults on
+ * a relevant address, KVM is architecturally allowed to let vCPUs use
+ * cached translations with the old protection bits.
+ */
+ return flush;
+}
+
static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
struct list_head *invalid_list)
{
- int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+ int ret = __kvm_sync_page(vcpu, sp);

if (ret < 0)
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
@@ -4463,7 +4515,7 @@ static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
- context->sync_page = NULL;
+ context->sync_spte = NULL;
context->invlpg = NULL;
}

@@ -5054,7 +5106,7 @@ static void paging64_init_context(struct kvm_mmu *context)
{
context->page_fault = paging64_page_fault;
context->gva_to_gpa = paging64_gva_to_gpa;
- context->sync_page = paging64_sync_page;
+ context->sync_spte = paging64_sync_spte;
context->invlpg = paging64_invlpg;
}

@@ -5062,7 +5114,7 @@ static void paging32_init_context(struct kvm_mmu *context)
{
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
- context->sync_page = paging32_sync_page;
+ context->sync_spte = paging32_sync_spte;
context->invlpg = paging32_invlpg;
}

@@ -5151,7 +5203,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->cpu_role.as_u64 = cpu_role.as_u64;
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
- context->sync_page = NULL;
+ context->sync_spte = NULL;
context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
@@ -5283,7 +5335,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,

context->page_fault = ept_page_fault;
context->gva_to_gpa = ept_gva_to_gpa;
- context->sync_page = ept_sync_page;
+ context->sync_spte = ept_sync_spte;
context->invlpg = ept_invlpg;

update_permission_bitmask(context, true);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..5ab9e974fdac 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -977,114 +977,67 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
* can't change unless all sptes pointing to it are nuked first.
*
* Returns
- * < 0: the sp should be zapped
- * 0: the sp is synced and no tlb flushing is required
- * > 0: the sp is synced and tlb flushing is required
+ * < 0: failed to sync spte
+ * 0: the spte is synced and no tlb flushing is required
+ * > 0: the spte is synced and tlb flushing is required
*/
-static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
+static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int i)
{
- union kvm_mmu_page_role root_role = vcpu->arch.mmu->root_role;
- int i;
bool host_writable;
gpa_t first_pte_gpa;
- bool flush = false;
-
- /*
- * Ignore various flags when verifying that it's safe to sync a shadow
- * page using the current MMU context.
- *
- * - level: not part of the overall MMU role and will never match as the MMU's
- * level tracks the root level
- * - access: updated based on the new guest PTE
- * - quadrant: not part of the overall MMU role (similar to level)
- */
- const union kvm_mmu_page_role sync_role_ign = {
- .level = 0xf,
- .access = 0x7,
- .quadrant = 0x3,
- .passthrough = 0x1,
- };
+ u64 *sptep, spte;
+ struct kvm_memory_slot *slot;
+ unsigned pte_access;
+ pt_element_t gpte;
+ gpa_t pte_gpa;
+ gfn_t gfn;

- /*
- * Direct pages can never be unsync, and KVM should never attempt to
- * sync a shadow page for a different MMU context, e.g. if the role
- * differs then the memslot lookup (SMM vs. non-SMM) will be bogus, the
- * reserved bits checks will be wrong, etc...
- */
- if (WARN_ON_ONCE(sp->role.direct ||
- (sp->role.word ^ root_role.word) & ~sync_role_ign.word))
- return -1;
+ if (!sp->spt[i])
+ return 0;

first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
+ pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);

- for (i = 0; i < SPTE_ENT_PER_PAGE; i++) {
- u64 *sptep, spte;
- struct kvm_memory_slot *slot;
- unsigned pte_access;
- pt_element_t gpte;
- gpa_t pte_gpa;
- gfn_t gfn;
-
- if (!sp->spt[i])
- continue;
-
- pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
-
- if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
- sizeof(pt_element_t)))
- return -1;
-
- if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
- flush = true;
- continue;
- }
+ if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
+ sizeof(pt_element_t)))
+ return -1;

- gfn = gpte_to_gfn(gpte);
- pte_access = sp->role.access;
- pte_access &= FNAME(gpte_access)(gpte);
- FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
+ if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte))
+ return 1;

- if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
- continue;
+ gfn = gpte_to_gfn(gpte);
+ pte_access = sp->role.access;
+ pte_access &= FNAME(gpte_access)(gpte);
+ FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

- /*
- * Drop the SPTE if the new protections would result in a RWX=0
- * SPTE or if the gfn is changing. The RWX=0 case only affects
- * EPT with execute-only support, i.e. EPT without an effective
- * "present" bit, as all other paging modes will create a
- * read-only SPTE if pte_access is zero.
- */
- if ((!pte_access && !shadow_present_mask) ||
- gfn != kvm_mmu_page_get_gfn(sp, i)) {
- drop_spte(vcpu->kvm, &sp->spt[i]);
- flush = true;
- continue;
- }
+ if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
+ return 0;

- /* Update the shadowed access bits in case they changed. */
- kvm_mmu_page_set_access(sp, i, pte_access);
+ /*
+ * Drop the SPTE if the new protections would result in a RWX=0
+ * SPTE or if the gfn is changing. The RWX=0 case only affects
+ * EPT with execute-only support, i.e. EPT without an effective
+ * "present" bit, as all other paging modes will create a
+ * read-only SPTE if pte_access is zero.
+ */
+ if ((!pte_access && !shadow_present_mask) ||
+ gfn != kvm_mmu_page_get_gfn(sp, i)) {
+ drop_spte(vcpu->kvm, &sp->spt[i]);
+ return 1;
+ }

- sptep = &sp->spt[i];
- spte = *sptep;
- host_writable = spte & shadow_host_writable_mask;
- slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
- make_spte(vcpu, sp, slot, pte_access, gfn,
- spte_to_pfn(spte), spte, true, false,
- host_writable, &spte);
+ /* Update the shadowed access bits in case they changed. */
+ kvm_mmu_page_set_access(sp, i, pte_access);

- flush |= mmu_spte_update(sptep, spte);
- }
+ sptep = &sp->spt[i];
+ spte = *sptep;
+ host_writable = spte & shadow_host_writable_mask;
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+ make_spte(vcpu, sp, slot, pte_access, gfn,
+ spte_to_pfn(spte), spte, true, false,
+ host_writable, &spte);

- /*
- * Note, any flush is purely for KVM's correctness, e.g. when dropping
- * an existing SPTE or clearing W/A/D bits to ensure an mmu_notifier
- * unmap or dirty logging event doesn't fail to flush. The guest is
- * responsible for flushing the TLB to ensure any changes in protection
- * bits are recognized, i.e. until the guest flushes or page faults on
- * a relevant address, KVM is architecturally allowed to let vCPUs use
- * cached translations with the old protection bits.
- */
- return flush;
+ return mmu_spte_update(sptep, spte);
}

#undef pt_element_t
--
2.19.1.6.gb485710b


2023-02-07 15:57:42

by Lai Jiangshan

Subject: [PATCH V2 6/8] kvm: x86/mmu: Remove FNAME(invlpg)

From: Lai Jiangshan <[email protected]>

Use FNAME(sync_spte) to invalidate vTLB entries instead.

In a hardware TLB, invalidating a TLB entry means the translation is
removed from the TLB.

In KVM's shadowed vTLB, vTLB translations (combinations of shadow paging
and the hardware TLB) are usually kept as long as they are clean when the
TLB of an address space (a single PCID or all of them) is flushed, with
the help of write-protection, sp->unsync and kvm_sync_page().

FNAME(invlpg), however, always removes the vTLB entry if sp->unsync and
then prefetches it again: even a clean vTLB entry is removed and a new one
is recreated. The new one might fail to be created or might differ (with
more permissions), so a remote flush is always required.

In short, FNAME(invlpg) is a duplicate implementation of FNAME(sync_spte)
for invalidating a single vTLB entry.

Use FNAME(sync_spte) to share the code, with one slight semantic change: a
clean vTLB entry is now kept.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu/mmu.c | 48 +++++++++++++++++----------
arch/x86/kvm/mmu/paging_tmpl.h | 58 ---------------------------------
3 files changed, 31 insertions(+), 76 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6c64ebfbd778..86ae8f6419f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -443,7 +443,6 @@ struct kvm_mmu {
struct x86_exception *exception);
int (*sync_spte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, int i);
- void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
struct kvm_mmu_root_info root;
union kvm_cpu_role cpu_role;
union kvm_mmu_page_role root_role;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c271d0a1ed54..3880f98a9cb6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1073,14 +1073,6 @@ static struct kvm_rmap_head *gfn_to_rmap(gfn_t gfn, int level,
return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
}

-static bool rmap_can_add(struct kvm_vcpu *vcpu)
-{
- struct kvm_mmu_memory_cache *mc;
-
- mc = &vcpu->arch.mmu_pte_list_desc_cache;
- return kvm_mmu_memory_cache_nr_free_objects(mc);
-}
-
static void rmap_remove(struct kvm *kvm, u64 *spte)
{
struct kvm_memslots *slots;
@@ -4516,7 +4508,6 @@ static void nonpaging_init_context(struct kvm_mmu *context)
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
context->sync_spte = NULL;
- context->invlpg = NULL;
}

static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
@@ -5107,7 +5098,6 @@ static void paging64_init_context(struct kvm_mmu *context)
context->page_fault = paging64_page_fault;
context->gva_to_gpa = paging64_gva_to_gpa;
context->sync_spte = paging64_sync_spte;
- context->invlpg = paging64_invlpg;
}

static void paging32_init_context(struct kvm_mmu *context)
@@ -5115,7 +5105,6 @@ static void paging32_init_context(struct kvm_mmu *context)
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
context->sync_spte = paging32_sync_spte;
- context->invlpg = paging32_invlpg;
}

static union kvm_cpu_role
@@ -5204,7 +5193,6 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu,
context->root_role.word = root_role.word;
context->page_fault = kvm_tdp_page_fault;
context->sync_spte = NULL;
- context->invlpg = NULL;
context->get_guest_pgd = get_cr3;
context->get_pdptr = kvm_pdptr_read;
context->inject_page_fault = kvm_inject_page_fault;
@@ -5336,7 +5324,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
context->page_fault = ept_page_fault;
context->gva_to_gpa = ept_gva_to_gpa;
context->sync_spte = ept_sync_spte;
- context->invlpg = ept_invlpg;

update_permission_bitmask(context, true);
context->pkru_mask = 0;
@@ -5377,7 +5364,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu,
* L2 page tables are never shadowed, so there is no need to sync
* SPTEs.
*/
- g_context->invlpg = NULL;
+ g_context->sync_spte = NULL;

/*
* Note that arch.mmu->gva_to_gpa translates l2_gpa to l1_gpa using
@@ -5752,6 +5739,33 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

+static void __kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ gva_t gva, hpa_t root_hpa)
+{
+ struct kvm_shadow_walk_iterator iterator;
+
+ vcpu_clear_mmio_info(vcpu, gva);
+
+ write_lock(&vcpu->kvm->mmu_lock);
+ for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
+ struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);
+
+ if (sp->unsync && *iterator.sptep) {
+ gfn_t gfn = kvm_mmu_page_get_gfn(sp, iterator.index);
+ int ret = mmu->sync_spte(vcpu, sp, iterator.index);
+
+ if (ret < 0)
+ mmu_page_zap_pte(vcpu->kvm, sp, iterator.sptep, NULL);
+ if (ret)
+ kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
+ }
+
+ if (!sp->unsync_children)
+ break;
+ }
+ write_unlock(&vcpu->kvm->mmu_lock);
+}
+
void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gva_t gva, unsigned long roots)
{
@@ -5768,16 +5782,16 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
static_call(kvm_x86_flush_tlb_gva)(vcpu, gva);
}

- if (!mmu->invlpg)
+ if (!mmu->sync_spte)
return;

if ((roots & KVM_MMU_ROOT_CURRENT) && VALID_PAGE(mmu->root.hpa))
- mmu->invlpg(vcpu, gva, mmu->root.hpa);
+ __kvm_mmu_invalidate_gva(vcpu, mmu, gva, mmu->root.hpa);

for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
if ((roots & KVM_MMU_ROOT_PREVIOUS(i)) &&
VALID_PAGE(mmu->prev_roots[i].hpa))
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
+ __kvm_mmu_invalidate_gva(vcpu, mmu, gva, mmu->prev_roots[i].hpa);
}
}
EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_gva);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5ab9e974fdac..0031fe22af3d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -887,64 +887,6 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
}

-static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
-{
- struct kvm_shadow_walk_iterator iterator;
- struct kvm_mmu_page *sp;
- u64 old_spte;
- int level;
- u64 *sptep;
-
- vcpu_clear_mmio_info(vcpu, gva);
-
- /*
- * No need to check return value here, rmap_can_add() can
- * help us to skip pte prefetch later.
- */
- mmu_topup_memory_caches(vcpu, true);
-
- if (!VALID_PAGE(root_hpa)) {
- WARN_ON(1);
- return;
- }
-
- write_lock(&vcpu->kvm->mmu_lock);
- for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
- level = iterator.level;
- sptep = iterator.sptep;
-
- sp = sptep_to_sp(sptep);
- old_spte = *sptep;
- if (is_last_spte(old_spte, level)) {
- pt_element_t gpte;
- gpa_t pte_gpa;
-
- if (!sp->unsync)
- break;
-
- pte_gpa = FNAME(get_level1_sp_gpa)(sp);
- pte_gpa += spte_index(sptep) * sizeof(pt_element_t);
-
- mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
- if (is_shadow_present_pte(old_spte))
- kvm_flush_remote_tlbs_sptep(vcpu->kvm, sptep);
-
- if (!rmap_can_add(vcpu))
- break;
-
- if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
- sizeof(pt_element_t)))
- break;
-
- FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
- }
-
- if (!sp->unsync_children)
- break;
- }
- write_unlock(&vcpu->kvm->mmu_lock);
-}
-
/* Note, @addr is a GPA when gva_to_gpa() translates an L2 GPA to an L1 GPA. */
static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t addr, u64 access,
--
2.19.1.6.gb485710b


2023-02-07 15:57:51

by Lai Jiangshan

Subject: [PATCH V2 7/8] kvm: x86/mmu: Reduce the update to the spte in FNAME(sync_page)

From: Lai Jiangshan <[email protected]>

Sometimes when the guest updates its page tables, it only adds new gptes
without changing any existing ones, so there is no point in updating the
sptes for those existing gptes.

Worse, when the sptes for these unchanged gptes are updated, the A/D bits
are also cleared because make_spte() is called with prefetch=true, which
might result in unneeded TLB flushing.

Do nothing if the gpte's permissions are unchanged.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0031fe22af3d..fca5ce349d9d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -967,6 +967,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
drop_spte(vcpu->kvm, &sp->spt[i]);
return 1;
}
+ /*
+ * Do nothing if the permissions are unchanged.
+ */
+ if (kvm_mmu_page_get_access(sp, i) == pte_access)
+ return 0;

/* Update the shadowed access bits in case they changed. */
kvm_mmu_page_set_access(sp, i, pte_access);
--
2.19.1.6.gb485710b


2023-02-07 15:57:56

by Lai Jiangshan

Subject: [PATCH V2 8/8] kvm: x86/mmu: Remove @no_dirty_log from FNAME(prefetch_gpte)

From: Lai Jiangshan <[email protected]>

FNAME(prefetch_gpte) is always called with @no_dirty_log=true, so drop
the now-unneeded parameter.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index fca5ce349d9d..e04950015dc4 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -519,7 +519,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,

static bool
FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
- u64 *spte, pt_element_t gpte, bool no_dirty_log)
+ u64 *spte, pt_element_t gpte)
{
struct kvm_memory_slot *slot;
unsigned pte_access;
@@ -535,8 +535,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pte_access = sp->role.access & FNAME(gpte_access)(gpte);
FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);

- slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
- no_dirty_log && (pte_access & ACC_WRITE_MASK));
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, pte_access & ACC_WRITE_MASK);
if (!slot)
return false;

@@ -605,7 +604,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
if (is_shadow_present_pte(*spte))
continue;

- if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i], true))
+ if (!FNAME(prefetch_gpte)(vcpu, sp, spte, gptep[i]))
break;
}
}
--
2.19.1.6.gb485710b


2023-02-07 16:18:58

by Paolo Bonzini

Subject: Re: [PATCH V2 4/8] kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging

On 2/7/23 16:57, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> mmu->sync_page for direct paging is never called.
>
> And both mmu->sync_page and mm->invlpg only make sense in shadowpaging.
> Setting mmu->sync_page as NULL for direct paging makes it consistent
> with mm->invlpg which is set NULL for the case.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

I'd rather have a WARN_ON_ONCE in kvm_sync_page(), otherwise looks good.

Paolo


2023-02-10 00:50:35

by Sean Christopherson

Subject: Re: [PATCH V2 1/8] kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_gva()

On Tue, Feb 07, 2023, Lai Jiangshan wrote:
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c91ee2927dd7..958e8eb977ed 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5707,10 +5707,12 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
>
> void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> - gva_t gva, hpa_t root_hpa)
> + gva_t gva, unsigned long roots)
> {
> int i;
>
> + WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);

This does nothing since KVM_MMU_ROOTS_ALL is ~0ul. What I wanted is below, can
you slot that in? Compile tested only.

--
From: Sean Christopherson <[email protected]>
Date: Thu, 9 Feb 2023 16:37:43 -0800
Subject: [PATCH 2/8] KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()

Tweak KVM_MMU_ROOTS_ALL to precisely cover all current+previous root
flags, and add a sanity check in kvm_mmu_free_roots() to verify that the
set of roots to free doesn't stray outside KVM_MMU_ROOTS_ALL.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 2 ++
2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..01d34f7d009d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -422,6 +422,10 @@ struct kvm_mmu_root_info {

#define KVM_MMU_NUM_PREV_ROOTS 3

+#define KVM_MMU_ROOT_CURRENT BIT(0)
+#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
+#define KVM_MMU_ROOTS_ALL (BIT(1 + KVM_MMU_NUM_PREV_ROOTS) - 1)
+
#define KVM_HAVE_MMU_RWLOCK

struct kvm_mmu_page;
@@ -1978,10 +1982,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
return !!(*irq_state);
}

-#define KVM_MMU_ROOT_CURRENT BIT(0)
-#define KVM_MMU_ROOT_PREVIOUS(i) BIT(1+i)
-#define KVM_MMU_ROOTS_ALL (~0UL)
-
int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..896abf9c1126 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3513,6 +3513,8 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
LIST_HEAD(invalid_list);
bool free_active_root;

+ WARN_ON_ONCE(roots_to_free & ~KVM_MMU_ROOTS_ALL);
+
BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);

/* Before acquiring the MMU lock, see if we need to do any real work. */
--


2023-02-10 00:53:09

by Sean Christopherson

Subject: Re: [PATCH V2 3/8] kvm: x86/mmu: Use kvm_mmu_invalidate_gva() in nested_ept_invalidate_addr()

On Tue, Feb 07, 2023, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> Use kvm_mmu_invalidate_gva() instead open calls to mmu->invlpg().
>
> No functional change intended.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 1 +
> arch/x86/kvm/vmx/nested.c | 5 ++++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 8563b52b8bb7..e03cf5558773 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5734,6 +5734,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
> }
> }
> +EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_gva);
>
> void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
> {
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 557b9c468734..f552f3c454b1 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -358,6 +358,7 @@ static bool nested_ept_root_matches(hpa_t root_hpa, u64 root_eptp, u64 eptp)
> static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
> gpa_t addr)
> {
> + unsigned long roots = 0;
> uint i;
> struct kvm_mmu_root_info *cached_root;
>
> @@ -368,8 +369,10 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp,
>
> if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd,
> eptp))
> - vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa);
> + roots |= KVM_MMU_ROOT_PREVIOUS(i);
> }
> + if (roots)
> + kvm_mmu_invalidate_gva(vcpu, vcpu->arch.mmu, addr, roots);

This is subtly buggy. On 32-bit kernels, the upper 32 bits of the GPA will get
dropped. "addr" here is a gpa_t, i.e. u64, whereas kvm_mmu_invalidate_gva() takes
a gva_t, i.e. unsigned long.

FNAME(gva_to_gpa) just eats the misnomer, but I think I'd rather rename this path,
e.g. kvm_mmu_invalidate_addr()?
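A standalone user-space sketch of the truncation described above (made-up
address, purely illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t gpa = 0x1ffff0000ULL;	/* 64-bit L2 GPA with bit 32 set */
	unsigned long gva = gpa;	/* gva_t is unsigned long */

	/*
	 * On a 64-bit build this prints 0x1ffff0000, but on a 32-bit build
	 * unsigned long is 32 bits, the upper half is silently dropped and
	 * it prints 0xffff0000, i.e. the wrong address would be invalidated.
	 */
	printf("0x%lx\n", gva);
	return 0;
}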

2023-02-10 00:59:07

by Sean Christopherson

Subject: Re: [PATCH V2 4/8] kvm: x86/mmu: Set mmu->sync_page as NULL for direct paging

On Tue, Feb 07, 2023, Paolo Bonzini wrote:
> On 2/7/23 16:57, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > mmu->sync_page for direct paging is never called.
> >
> > And both mmu->sync_page and mm->invlpg only make sense in shadowpaging.
> > Setting mmu->sync_page as NULL for direct paging makes it consistent
> > with mm->invlpg which is set NULL for the case.
> >
> > Signed-off-by: Lai Jiangshan <[email protected]>
>
> I'd rather have a WARN_ON_ONCE in kvm_sync_page(), otherwise looks good.

Agreed, there's even a WARN_ON_ONCE() to piggyback:

if (WARN_ON_ONCE(sp->role.direct || !vcpu->arch.mmu->sync_spte ||
(sp->role.word ^ root_role.word) & ~sync_role_ign.word))
return -1;

Of course, the direct + match checks on the role effectively do the same, but I
would rather be explicit.

2023-02-10 01:11:12

by Sean Christopherson

Subject: Re: [PATCH V2 6/8] kvm: x86/mmu: Remove FNAME(invlpg)

On Tue, Feb 07, 2023, Lai Jiangshan wrote:
> Use FNAME(sync_spte) to share the code which has a slight semantics
> changed: clean vTLB entry is kept.

...

> +static void __kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> + gva_t gva, hpa_t root_hpa)
> +{
> + struct kvm_shadow_walk_iterator iterator;
> +
> + vcpu_clear_mmio_info(vcpu, gva);
> +
> + write_lock(&vcpu->kvm->mmu_lock);
> + for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
> + struct kvm_mmu_page *sp = sptep_to_sp(iterator.sptep);
> +
> + if (sp->unsync && *iterator.sptep) {

Please make the !0 change in a separate patch. It took me a while to connect the
dots, and to also understand what I suspect is a major motivation: sync_spte()
already has this check, i.e. the change is happening regardless, so might as well
avoid the indirect branch.

> + gfn_t gfn = kvm_mmu_page_get_gfn(sp, iterator.index);
> + int ret = mmu->sync_spte(vcpu, sp, iterator.index);
> +
> + if (ret < 0)
> + mmu_page_zap_pte(vcpu->kvm, sp, iterator.sptep, NULL);
> + if (ret)
> + kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);

Why open code kvm_flush_remote_tlbs_sptep()? Does it actually shave enough
cycles to be visible?

If open coding is really justified, can you rebase on one of the two branches?
And then change this to kvm_flush_remote_tlbs_gfn().

https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/mmu

2023-02-16 04:16:33

by Lai Jiangshan

Subject: Re: [PATCH V2 6/8] kvm: x86/mmu: Remove FNAME(invlpg)

On Fri, Feb 10, 2023 at 9:11 AM Sean Christopherson <[email protected]> wrote:

>
> > + gfn_t gfn = kvm_mmu_page_get_gfn(sp, iterator.index);
> > + int ret = mmu->sync_spte(vcpu, sp, iterator.index);
> > +
> > + if (ret < 0)
> > + mmu_page_zap_pte(vcpu->kvm, sp, iterator.sptep, NULL);
> > + if (ret)
> > + kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
>
> Why open code kvm_flush_remote_tlbs_sptep()? Does it actually shave enough
> cycles to be visible?


Although I have read the code of sync_page() many times, I don't know why
I kept assuming that it could possibly change sp->gfns[] (now
sp->shadowed_translation[]).

I will add the following comment before calling kvm_mmu_page_get_gfn():

Get the gfn beforehand for later flushing. Although mmu->sync_spte()
doesn't change it, avoid depending on that.

Or I will use kvm_flush_remote_tlbs_sptep() with a comment stating that
the gfn will not be changed.

>
> If open coding is really justified, can you rebase on one of the two branches?
> And then change this to kvm_flush_remote_tlbs_gfn().
>
> https://github.com/kvm-x86/linux/tree/next
> https://github.com/kvm-x86/linux/tree/mmu

The code was based on https://github.com/kvm-x86/linux/tree/mmu.

Thanks
Lai