2021-11-24 13:43:55

by Lai Jiangshan

Subject: [PATCH 00/12] KVM: X86: misc fixes and cleanup

From: Lai Jiangshan <[email protected]>

This patchset contains misc fixes and cleanups that are only loosely
dependent on each other. It includes a previously posted patch that
inverts and renames gpte_is_8_bytes.

The other patches focus on the usage of vcpu->arch.walk_mmu and
vcpu->arch.mmu, root_level, and lpage_level.
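
For orientation, a rough sketch of the MMU contexts these patches touch
(simplified from arch/x86/include/asm/kvm_host.h; the comments are my own
summary rather than the upstream ones):

struct kvm_vcpu_arch {
	/*
	 * Used to build and walk the shadow/TDP page tables: normally
	 * &root_mmu, or &guest_mmu when shadowing the L1 hypervisor's
	 * nested page tables.
	 */
	struct kvm_mmu *mmu;

	/*
	 * Used for gva_to_gpa translations: normally &root_mmu, or
	 * &nested_mmu when L2 runs with nested TDP.
	 */
	struct kvm_mmu *walk_mmu;

	struct kvm_mmu root_mmu;
	struct kvm_mmu guest_mmu;
	struct kvm_mmu nested_mmu;
};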

Patches 1, 4, 5, 6 and 7 fix something, but I don't think patches 4, 5, 6
and 7 are worth backporting to the stable tree.

Lai Jiangshan (12):
KVM: X86: Fix when shadow_root_level=5 && guest root_level<4
KVM: X86: Add parameter struct kvm_mmu *mmu into mmu->gva_to_gpa()
KVM: X86: Remove mmu->translate_gpa
KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg()
KVM: X86: Change the type of a parameter of kvm_mmu_invalidate_gva()
and mmu->invlpg() to gpa_t
KVM: X86: Add huge_page_level to __reset_rsvds_bits_mask_ept()
KVM: X86: Add parameter huge_page_level to kvm_init_shadow_ept_mmu()
KVM: VMX: Use ept_caps_to_lpage_level() in hardware_setup()
KVM: X86: Rename gpte_is_8_bytes to has_4_byte_gpte and invert the
direction
KVM: X86: Remove mmu parameter from load_pdptrs()
KVM: X86: Check root_level only in fast_pgd_switch()
KVM: X86: Walk shadow page starting with shadow_root_level

Documentation/virt/kvm/mmu.rst | 8 +--
arch/x86/include/asm/kvm_host.h | 23 ++++----
arch/x86/kvm/mmu.h | 16 ++++-
arch/x86/kvm/mmu/mmu.c | 100 +++++++++++++++-----------------
arch/x86/kvm/mmu/mmu_audit.c | 5 +-
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/paging_tmpl.h | 55 ++++--------------
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/svm/nested.c | 4 +-
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/capabilities.h | 9 +++
arch/x86/kvm/vmx/nested.c | 12 ++--
arch/x86/kvm/vmx/vmx.c | 12 +---
arch/x86/kvm/x86.c | 55 +++++++++++-------
14 files changed, 145 insertions(+), 160 deletions(-)

--
2.19.1.6.gb485710b



2021-11-24 13:44:03

by Lai Jiangshan

Subject: [PATCH 03/12] KVM: X86: Remove mmu->translate_gpa

From: Lai Jiangshan <[email protected]>

Reduce an indirect function call (a retpoline, when mitigations are
enabled) and some initialization code.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ----
arch/x86/kvm/mmu.h | 13 +++++++++++++
arch/x86/kvm/mmu/mmu.c | 11 +----------
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++----
arch/x86/kvm/x86.c | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8419aff7136f..dd16fdedc0e8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -429,8 +429,6 @@ struct kvm_mmu {
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gpa_t gva_or_gpa, u32 access,
struct x86_exception *exception);
- gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
- struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
@@ -1764,8 +1762,6 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
ulong roots_to_free);
void kvm_mmu_free_guest_mode_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu);
-gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
- struct x86_exception *exception);
gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva,
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 9ae6168d381e..97e13c2988b3 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -351,4 +351,17 @@ static inline void kvm_update_page_stats(struct kvm *kvm, int level, int count)
{
atomic64_add(count, &kvm->stat.pages[level - 1]);
}
+
+gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
+ struct x86_exception *exception);
+
+static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *mmu,
+ gpa_t gpa, u32 access,
+ struct x86_exception *exception)
+{
+ if (mmu != &vcpu->arch.nested_mmu)
+ return gpa;
+ return translate_nested_gpa(vcpu, gpa, access, exception);
+}
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3e00a54e23b6..f3aa91db4a7e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -335,12 +335,6 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
return likely(kvm_gen == spte_gen);
}

-static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access,
- struct x86_exception *exception)
-{
- return gpa;
-}
-
static int is_cpuid_PSE36(void)
{
return 1;
@@ -3734,7 +3728,7 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
{
if (exception)
exception->error_code = 0;
- return mmu->translate_gpa(vcpu, vaddr, access, exception);
+ return kvm_translate_gpa(vcpu, mmu, vaddr, access, exception);
}

static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
@@ -5487,7 +5481,6 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)

mmu->root_hpa = INVALID_PAGE;
mmu->root_pgd = 0;
- mmu->translate_gpa = translate_gpa;
for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;

@@ -5549,8 +5542,6 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;

- vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
-
ret = __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu);
if (ret)
return ret;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4e203fe703b0..5c78300fc7d9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -403,9 +403,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
walker->table_gfn[walker->level - 1] = table_gfn;
walker->pte_gpa[walker->level - 1] = pte_gpa;

- real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
- nested_access,
- &walker->fault);
+ real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(table_gfn),
+ nested_access, &walker->fault);

/*
* FIXME: This can happen if emulation (for of an INS/OUTS
@@ -467,7 +466,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
if (PTTYPE == 32 && walker->level > PG_LEVEL_4K && is_cpuid_PSE36())
gfn += pse36_gfn_delta(pte);

- real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access, &walker->fault);
+ real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(gfn), access, &walker->fault);
if (real_gpa == UNMAPPED_GVA)
return 0;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 808786677b2b..25e278ba4666 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -810,8 +810,8 @@ int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
* If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
* to an L1 GPA.
*/
- real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(pdpt_gfn),
- PFERR_USER_MASK | PFERR_WRITE_MASK, NULL);
+ real_gpa = kvm_translate_gpa(vcpu, mmu, gfn_to_gpa(pdpt_gfn),
+ PFERR_USER_MASK | PFERR_WRITE_MASK, NULL);
if (real_gpa == UNMAPPED_GVA)
return 0;

--
2.19.1.6.gb485710b


2021-11-24 13:44:12

by Lai Jiangshan

Subject: [PATCH 01/12] KVM: X86: Fix when shadow_root_level=5 && guest root_level<4

From: Lai Jiangshan <[email protected]>

If there is an L1 hypervisor using nNPT in 32-bit mode, the shadow walk
starts with pae_root.
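
For illustration, a sketch of the case being fixed (my own summary, using
the field names from the diff below):

/*
 * shadow_root_level = 5    (host shadows the L1 NPT with 5 levels)
 * root_level        = 2/3  (L1 uses 32-bit or PAE nested page tables)
 * direct_map        = false
 *
 * The usable portion of the shadow page table hangs off pae_root, so the
 * walk must begin at PT32E_ROOT_LEVEL (3).  The old code only
 * special-cased shadow_root_level == 4 and decremented the level by one,
 * which never reaches PT32E_ROOT_LEVEL when the host uses 5-level paging.
 */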

Fixes: a717a780fc4e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6948f2d696c3..701c67c55239 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2171,10 +2171,10 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
iterator->shadow_addr = root;
iterator->level = vcpu->arch.mmu->shadow_root_level;

- if (iterator->level == PT64_ROOT_4LEVEL &&
+ if (iterator->level >= PT64_ROOT_4LEVEL &&
vcpu->arch.mmu->root_level < PT64_ROOT_4LEVEL &&
!vcpu->arch.mmu->direct_map)
- --iterator->level;
+ iterator->level = PT32E_ROOT_LEVEL;

if (iterator->level == PT32E_ROOT_LEVEL) {
/*
--
2.19.1.6.gb485710b


2021-11-24 13:44:26

by Lai Jiangshan

Subject: [PATCH 06/12] KVM: X86: Add huge_page_level to __reset_rsvds_bits_mask_ept()

From: Lai Jiangshan <[email protected]>

Whether bit 7 of a PTE is reserved depends on the supported large-page
level: if 1GiB (or 2MiB) pages are not supported, bit 7 is reserved in the
corresponding PDPTE (or PDE).
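
For reference, a simplified sketch of the rsvd_bits() helper used in the
hunks below (the upstream version in arch/x86/kvm/mmu.h also carries
build-time sanity checks):

/* Build a mask with bits s..e (inclusive) set, marking them reserved. */
static inline u64 rsvd_bits(int s, int e)
{
	if (e < s)
		return 0;

	return ((2ULL << (e - s)) - 1) << s;	/* rsvd_bits(7, 7) == bit 7 */
}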

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d3bad4ae72fb..8a371d6c2291 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4339,22 +4339,28 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,

static void
__reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
- u64 pa_bits_rsvd, bool execonly)
+ u64 pa_bits_rsvd, bool execonly, int huge_page_level)
{
u64 high_bits_rsvd = pa_bits_rsvd & rsvd_bits(0, 51);
+ u64 large_1g_rsvd = 0, large_2m_rsvd = 0;
u64 bad_mt_xwr;

+ if (huge_page_level < PG_LEVEL_1G)
+ large_1g_rsvd = rsvd_bits(7, 7);
+ if (huge_page_level < PG_LEVEL_2M)
+ large_2m_rsvd = rsvd_bits(7, 7);
+
rsvd_check->rsvd_bits_mask[0][4] = high_bits_rsvd | rsvd_bits(3, 7);
rsvd_check->rsvd_bits_mask[0][3] = high_bits_rsvd | rsvd_bits(3, 7);
- rsvd_check->rsvd_bits_mask[0][2] = high_bits_rsvd | rsvd_bits(3, 6);
- rsvd_check->rsvd_bits_mask[0][1] = high_bits_rsvd | rsvd_bits(3, 6);
+ rsvd_check->rsvd_bits_mask[0][2] = high_bits_rsvd | rsvd_bits(3, 6) | large_1g_rsvd;
+ rsvd_check->rsvd_bits_mask[0][1] = high_bits_rsvd | rsvd_bits(3, 6) | large_2m_rsvd;
rsvd_check->rsvd_bits_mask[0][0] = high_bits_rsvd;

/* large page */
rsvd_check->rsvd_bits_mask[1][4] = rsvd_check->rsvd_bits_mask[0][4];
rsvd_check->rsvd_bits_mask[1][3] = rsvd_check->rsvd_bits_mask[0][3];
- rsvd_check->rsvd_bits_mask[1][2] = high_bits_rsvd | rsvd_bits(12, 29);
- rsvd_check->rsvd_bits_mask[1][1] = high_bits_rsvd | rsvd_bits(12, 20);
+ rsvd_check->rsvd_bits_mask[1][2] = high_bits_rsvd | rsvd_bits(12, 29) | large_1g_rsvd;
+ rsvd_check->rsvd_bits_mask[1][1] = high_bits_rsvd | rsvd_bits(12, 20) | large_2m_rsvd;
rsvd_check->rsvd_bits_mask[1][0] = rsvd_check->rsvd_bits_mask[0][0];

bad_mt_xwr = 0xFFull << (2 * 8); /* bits 3..5 must not be 2 */
@@ -4370,10 +4376,11 @@ __reset_rsvds_bits_mask_ept(struct rsvd_bits_validate *rsvd_check,
}

static void reset_rsvds_bits_mask_ept(struct kvm_vcpu *vcpu,
- struct kvm_mmu *context, bool execonly)
+ struct kvm_mmu *context, bool execonly, int huge_page_level)
{
__reset_rsvds_bits_mask_ept(&context->guest_rsvd_check,
- vcpu->arch.reserved_gpa_bits, execonly);
+ vcpu->arch.reserved_gpa_bits, execonly,
+ huge_page_level);
}

static inline u64 reserved_hpa_bits(void)
@@ -4449,7 +4456,8 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
false, true);
else
__reset_rsvds_bits_mask_ept(shadow_zero_check,
- reserved_hpa_bits(), false);
+ reserved_hpa_bits(), false,
+ max_huge_page_level);

if (!shadow_me_mask)
return;
@@ -4469,7 +4477,8 @@ reset_ept_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context, bool execonly)
{
__reset_rsvds_bits_mask_ept(&context->shadow_zero_check,
- reserved_hpa_bits(), execonly);
+ reserved_hpa_bits(), execonly,
+ max_huge_page_level);
}

#define BYTE_MASK(access) \
@@ -4904,7 +4913,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,

update_permission_bitmask(context, true);
update_pkru_bitmask(context);
- reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+ reset_rsvds_bits_mask_ept(vcpu, context, execonly, max_huge_page_level);
reset_ept_shadow_zero_bits_mask(vcpu, context, execonly);
}
EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
--
2.19.1.6.gb485710b


2021-11-24 13:44:29

by Lai Jiangshan

Subject: [PATCH 05/12] KVM: X86: Change the type of a parameter of kvm_mmu_invalidate_gva() and mmu->invlpg() to gpa_t

From: Lai Jiangshan <[email protected]>

When kvm_mmu_invalidate_gva() is called for nested TDP, the @gva is an L2
GPA, so the type of the parameter should be gpa_t, like mmu->gva_to_gpa().

The parameter is also renamed to gva_or_l2pa for self-documentation.
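
For context, the relevant typedefs (as in include/linux/kvm_types.h); on a
32-bit build gva_t is only 32 bits wide, so it cannot hold an arbitrary L2
guest physical address:

typedef unsigned long  gva_t;	/* guest virtual address, native word size */
typedef u64            gpa_t;	/* guest physical address, always 64 bits */
typedef u64            gfn_t;	/* guest frame number */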

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 +++---
arch/x86/kvm/mmu/mmu.c | 14 +++++++-------
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++---
3 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dd16fdedc0e8..e382596baa1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -427,11 +427,11 @@ struct kvm_mmu {
void (*inject_page_fault)(struct kvm_vcpu *vcpu,
struct x86_exception *fault);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gpa_t gva_or_gpa, u32 access,
+ gpa_t gva_or_l2pa, u32 access,
struct x86_exception *exception);
int (*sync_page)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp);
- void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
+ void (*invlpg)(struct kvm_vcpu *vcpu, gpa_t gva_or_l2pa, hpa_t root_hpa);
hpa_t root_hpa;
gpa_t root_pgd;
union kvm_mmu_role mmu_role;
@@ -1785,7 +1785,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa);
+ gpa_t gva_or_l2pa, hpa_t root_hpa);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 72ce0d78435e..d3bad4ae72fb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5313,24 +5313,24 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
- gva_t gva, hpa_t root_hpa)
+ gpa_t gva_or_l2pa, hpa_t root_hpa)
{
int i;

- /* It's actually a GPA for vcpu->arch.guest_mmu. */
+ /* It's actually a L2 GPA for vcpu->arch.guest_mmu. */
if (mmu != &vcpu->arch.guest_mmu) {
/* INVLPG on a non-canonical address is a NOP according to the SDM. */
- if (is_noncanonical_address(gva, vcpu))
+ if (is_noncanonical_address(gva_or_l2pa, vcpu))
return;

- static_call(kvm_x86_tlb_flush_gva)(vcpu, gva);
+ static_call(kvm_x86_tlb_flush_gva)(vcpu, gva_or_l2pa);
}

if (!mmu->invlpg)
return;

if (root_hpa == INVALID_PAGE) {
- mmu->invlpg(vcpu, gva, mmu->root_hpa);
+ mmu->invlpg(vcpu, gva_or_l2pa, mmu->root_hpa);

/*
* INVLPG is required to invalidate any global mappings for the VA,
@@ -5345,9 +5345,9 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
*/
for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
if (VALID_PAGE(mmu->prev_roots[i].hpa))
- mmu->invlpg(vcpu, gva, mmu->prev_roots[i].hpa);
+ mmu->invlpg(vcpu, gva_or_l2pa, mmu->prev_roots[i].hpa);
} else {
- mmu->invlpg(vcpu, gva, root_hpa);
+ mmu->invlpg(vcpu, gva_or_l2pa, root_hpa);
}
}

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5c78300fc7d9..7b86209e73f9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -928,7 +928,8 @@ static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
}

-static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
+/* Note, @gva_or_l2pa is a GPA when invlpg() invalidates an L2 GPA. */
+static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gpa_t gva_or_l2pa, hpa_t root_hpa)
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -936,7 +937,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
int level;
u64 *sptep;

- vcpu_clear_mmio_info(vcpu, gva);
+ vcpu_clear_mmio_info(vcpu, gva_or_l2pa);

/*
* No need to check return value here, rmap_can_add() can
@@ -950,7 +951,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
}

write_lock(&vcpu->kvm->mmu_lock);
- for_each_shadow_entry_using_root(vcpu, root_hpa, gva, iterator) {
+ for_each_shadow_entry_using_root(vcpu, root_hpa, gva_or_l2pa, iterator) {
level = iterator.level;
sptep = iterator.sptep;

--
2.19.1.6.gb485710b


2021-11-24 13:46:53

by Lai Jiangshan

Subject: [PATCH 09/12] KVM: X86: Rename gpte_is_8_bytes to has_4_byte_gpte and invert the direction

From: Lai Jiangshan <[email protected]>

It is confusing to set gpte_is_8_bytes=1 when there are no guest PTEs at
all.  Invert it and rename it to has_4_byte_gpte, which is set only when
there are guest PTEs and the guest PTE size is 4 bytes.  In the non-paging
case the value is therefore not inverted: it remains false.
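
As an illustration (my own summary of the role calculation in the
kvm_calc_shadow_root_page_role_common() hunk below), the new flag works
out as:

role.base.has_4_byte_gpte = ____is_cr0_pg(regs) && !____is_cr4_pae(regs);
/*
 * non-paging   (cr0.pg=0):          no gptes       -> has_4_byte_gpte = 0
 * 32-bit       (cr0.pg=1, pae=0):   4-byte gptes   -> has_4_byte_gpte = 1
 * PAE / 64-bit (cr0.pg=1, pae=1):   8-byte gptes   -> has_4_byte_gpte = 0
 * direct TDP / nested EPT:          8-byte or none -> has_4_byte_gpte = 0
 */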

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
Documentation/virt/kvm/mmu.rst | 8 ++++----
arch/x86/include/asm/kvm_host.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 12 ++++++------
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/mmu.rst b/Documentation/virt/kvm/mmu.rst
index f60f5488e121..5b1ebad24c77 100644
--- a/Documentation/virt/kvm/mmu.rst
+++ b/Documentation/virt/kvm/mmu.rst
@@ -161,7 +161,7 @@ Shadow pages contain the following information:
If clear, this page corresponds to a guest page table denoted by the gfn
field.
role.quadrant:
- When role.gpte_is_8_bytes=0, the guest uses 32-bit gptes while the host uses 64-bit
+ When role.has_4_byte_gpte=1, the guest uses 32-bit gptes while the host uses 64-bit
sptes. That means a guest page table contains more ptes than the host,
so multiple shadow pages are needed to shadow one guest page.
For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the
@@ -177,9 +177,9 @@ Shadow pages contain the following information:
The page is invalid and should not be used. It is a root page that is
currently pinned (by a cpu hardware register pointing to it); once it is
unpinned it will be destroyed.
- role.gpte_is_8_bytes:
- Reflects the size of the guest PTE for which the page is valid, i.e. '1'
- if 64-bit gptes are in use, '0' if 32-bit gptes are in use.
+ role.has_4_byte_gpte:
+ Reflects the size of the guest PTE for which the page is valid, i.e. '0'
+ if direct map or 64-bit gptes are in use, '1' if 32-bit gptes are in use.
role.efer_nx:
Contains the value of efer.nx for which the page is valid.
role.cr0_wp:
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e382596baa1d..01e50703c878 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -296,14 +296,14 @@ struct kvm_kernel_irq_routing_entry;
*
* - invalid shadow pages are not accounted, so the bits are effectively 18
*
- * - quadrant will only be used if gpte_is_8_bytes=0 (non-PAE paging);
+ * - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
* execonly and ad_disabled are only used for nested EPT which has
- * gpte_is_8_bytes=1. Therefore, 2 bits are always unused.
+ * has_4_byte_gpte=0. Therefore, 2 bits are always unused.
*
* - the 4 bits of level are effectively limited to the values 2/3/4/5,
* as 4k SPs are not tracked (allowed to go unsync). In addition non-PAE
* paging has exactly one upper level, making level completely redundant
- * when gpte_is_8_bytes=0.
+ * when has_4_byte_gpte=1.
*
* - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
* cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
@@ -315,7 +315,7 @@ union kvm_mmu_page_role {
u32 word;
struct {
unsigned level:4;
- unsigned gpte_is_8_bytes:1;
+ unsigned has_4_byte_gpte:1;
unsigned quadrant:2;
unsigned direct:1;
unsigned access:3;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f5a1da112daf..9fb9927264d8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2077,7 +2077,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
role.level = level;
role.direct = direct;
role.access = access;
- if (!direct_mmu && !role.gpte_is_8_bytes) {
+ if (role.has_4_byte_gpte) {
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
role.quadrant = quadrant;
@@ -4727,7 +4727,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
role.base.ad_disabled = (shadow_accessed_mask == 0);
role.base.level = kvm_mmu_get_tdp_level(vcpu);
role.base.direct = true;
- role.base.gpte_is_8_bytes = true;
+ role.base.has_4_byte_gpte = false;

return role;
}
@@ -4772,7 +4772,7 @@ kvm_calc_shadow_root_page_role_common(struct kvm_vcpu *vcpu,

role.base.smep_andnot_wp = role.ext.cr4_smep && !____is_cr0_wp(regs);
role.base.smap_andnot_wp = role.ext.cr4_smap && !____is_cr0_wp(regs);
- role.base.gpte_is_8_bytes = ____is_cr0_pg(regs) && ____is_cr4_pae(regs);
+ role.base.has_4_byte_gpte = ____is_cr0_pg(regs) && !____is_cr4_pae(regs);

return role;
}
@@ -4871,7 +4871,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
role.base.smm = vcpu->arch.root_mmu.mmu_role.base.smm;

role.base.level = level;
- role.base.gpte_is_8_bytes = true;
+ role.base.has_4_byte_gpte = false;
role.base.direct = false;
role.base.ad_disabled = !accessed_dirty;
role.base.guest_mode = true;
@@ -5155,7 +5155,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
gpa, bytes, sp->role.word);

offset = offset_in_page(gpa);
- pte_size = sp->role.gpte_is_8_bytes ? 8 : 4;
+ pte_size = sp->role.has_4_byte_gpte ? 4 : 8;

/*
* Sometimes, the OS only writes the last one bytes to update status
@@ -5179,7 +5179,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
page_offset = offset_in_page(gpa);
level = sp->role.level;
*nspte = 1;
- if (!sp->role.gpte_is_8_bytes) {
+ if (sp->role.has_4_byte_gpte) {
page_offset <<= 1; /* 32->64 */
/*
* A 32-bit pde maps 4MB while the shadow pdes map
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index b8151bbca36a..de5e8e4e1aa7 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -35,7 +35,7 @@
" %snxe %sad root %u %s%c", \
__entry->mmu_valid_gen, \
__entry->gfn, role.level, \
- role.gpte_is_8_bytes ? 8 : 4, \
+ role.has_4_byte_gpte ? 4 : 8, \
role.quadrant, \
role.direct ? " direct" : "", \
access_str[role.access], \
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 377a96718a2e..fb602c025d9d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -165,7 +165,7 @@ static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu,
role = vcpu->arch.mmu->mmu_role.base;
role.level = level;
role.direct = true;
- role.gpte_is_8_bytes = true;
+ role.has_4_byte_gpte = false;
role.access = ACC_ALL;
role.ad_disabled = !shadow_accessed_mask;

--
2.19.1.6.gb485710b


2021-11-24 13:48:53

by Lai Jiangshan

Subject: [PATCH 10/12] KVM: X86: Remove mmu parameter from load_pdptrs()

From: Lai Jiangshan <[email protected]>

All callers pass vcpu->arch.walk_mmu, so drop the parameter and use it
directly in load_pdptrs().

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/nested.c | 4 ++--
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/nested.c | 4 ++--
arch/x86/kvm/x86.c | 12 ++++++------
5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 01e50703c878..c106ad7efe23 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1591,7 +1591,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);

-int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
const void *val, int bytes);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 598843cfe6c4..6bcea96cdb92 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -461,7 +461,7 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
return -EINVAL;

if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
- CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)))
+ CC(!load_pdptrs(vcpu, cr3)))
return -EINVAL;

if (!nested_npt)
@@ -1518,7 +1518,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
* the guest CR3 might be restored prior to setting the nested
* state which can lead to a load of wrong PDPTRs.
*/
- if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3)))
+ if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
return false;

if (!nested_svm_vmrun_msrpm(svm)) {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d855ba664fc2..e0c18682cbd0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1588,7 +1588,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
switch (reg) {
case VCPU_EXREG_PDPTR:
BUG_ON(!npt_enabled);
- load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
+ load_pdptrs(vcpu, kvm_read_cr3(vcpu));
break;
default:
KVM_BUG_ON(1, vcpu->kvm);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 20e126de1c96..d97588bebaaf 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1097,7 +1097,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
* must not be dereferenced.
*/
if (reload_pdptrs && !nested_ept && is_pae_paging(vcpu) &&
- CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))) {
+ CC(!load_pdptrs(vcpu, cr3))) {
*entry_failure_code = ENTRY_FAIL_PDPTE;
return -EINVAL;
}
@@ -3142,7 +3142,7 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
* the guest CR3 might be restored prior to setting the nested
* state which can lead to a load of wrong PDPTRs.
*/
- if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3)))
+ if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
return false;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 25e278ba4666..f94b0ebe9a4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -798,8 +798,9 @@ static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
/*
* Load the pae pdptrs. Return 1 if they are all valid, 0 otherwise.
*/
-int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
+int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
{
+ struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
gpa_t real_gpa;
int i;
@@ -887,7 +888,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
#endif
if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) &&
is_pae(vcpu) && ((cr0 ^ old_cr0) & pdptr_bits) &&
- !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)))
+ !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
return 1;

if (!(cr0 & X86_CR0_PG) && kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE))
@@ -1063,8 +1064,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
} else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
&& ((cr4 ^ old_cr4) & pdptr_bits)
- && !load_pdptrs(vcpu, vcpu->arch.walk_mmu,
- kvm_read_cr3(vcpu)))
+ && !load_pdptrs(vcpu, kvm_read_cr3(vcpu)))
return 1;

if ((cr4 & X86_CR4_PCIDE) && !(old_cr4 & X86_CR4_PCIDE)) {
@@ -1153,7 +1153,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
return 1;

- if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
+ if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
return 1;

if (cr3 != kvm_read_cr3(vcpu))
@@ -10553,7 +10553,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
if (update_pdptrs) {
idx = srcu_read_lock(&vcpu->kvm->srcu);
if (is_pae_paging(vcpu)) {
- load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));
+ load_pdptrs(vcpu, kvm_read_cr3(vcpu));
*mmu_reset_needed = 1;
}
srcu_read_unlock(&vcpu->kvm->srcu, idx);
--
2.19.1.6.gb485710b


2021-11-24 13:51:26

by Lai Jiangshan

Subject: [PATCH 12/12] KVM: X86: Walk shadow page starting with shadow_root_level

From: Lai Jiangshan <[email protected]>

Walking from the root page of the shadow page table should start with
the level of the shadow page table: shadow_root_level.

Also fix a small defect in audit_mappings(): the current walking level is
more valuable to print than root_level.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu_audit.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
index 9e7dcf999f08..6bbbf85b3e46 100644
--- a/arch/x86/kvm/mmu/mmu_audit.c
+++ b/arch/x86/kvm/mmu/mmu_audit.c
@@ -63,7 +63,7 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
hpa_t root = vcpu->arch.mmu->root_hpa;

sp = to_shadow_page(root);
- __mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
+ __mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->shadow_root_level);
return;
}

@@ -119,8 +119,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
hpa = pfn << PAGE_SHIFT;
if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
- "ent %llxn", vcpu->arch.mmu->root_level, pfn,
- hpa, *sptep);
+ "ent %llxn", level, pfn, hpa, *sptep);
}

static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
--
2.19.1.6.gb485710b


2021-11-24 13:52:06

by Lai Jiangshan

Subject: [PATCH 11/12] KVM: X86: Check root_level only in fast_pgd_switch()

From: Lai Jiangshan <[email protected]>

If root_level >= 4, shadow_root_level must be >= 4 too, since the shadow
page table never has fewer levels than the guest page table it shadows.
Checking only root_level therefore saves a check.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9fb9927264d8..1dc8bfd12ecd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4136,8 +4136,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
* having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
* later if necessary.
*/
- if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
- mmu->root_level >= PT64_ROOT_4LEVEL)
+ if (mmu->root_level >= PT64_ROOT_4LEVEL)
return cached_root_available(vcpu, new_pgd, new_role);

return false;
--
2.19.1.6.gb485710b


2021-11-26 12:58:33

by Paolo Bonzini

Subject: Re: [PATCH 11/12] KVM: X86: Check root_level only in fast_pgd_switch()

On 11/24/21 13:20, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> If root_level >= 4, shadow_root_level must be >= 4 too.
> Checking only root_level can reduce a check.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 9fb9927264d8..1dc8bfd12ecd 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4136,8 +4136,7 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd,
> * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> * later if necessary.
> */
> - if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
> - mmu->root_level >= PT64_ROOT_4LEVEL)
> + if (mmu->root_level >= PT64_ROOT_4LEVEL)
> return cached_root_available(vcpu, new_pgd, new_role);
>
> return false;
>

Hmm, I think this is more confusing. I *think* that adding support for
PAE would be mostly an issue with the guest PDPTRs, and not with the
shadow PDPTRs, but without thinking more about it I'm leaning towards
not applying this patch.

Paolo


2021-11-26 13:00:22

by Paolo Bonzini

Subject: Re: [PATCH 12/12] KVM: X86: Walk shadow page starting with shadow_root_level

On 11/24/21 13:20, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> Walking from the root page of the shadow page table should start with
> the level of the shadow page table: shadow_root_level.
>
> Also change a small defect in audit_mappings(), it is believed
> that the current walking level is more valuable to print.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu_audit.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c
> index 9e7dcf999f08..6bbbf85b3e46 100644
> --- a/arch/x86/kvm/mmu/mmu_audit.c
> +++ b/arch/x86/kvm/mmu/mmu_audit.c
> @@ -63,7 +63,7 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspect_spte_fn fn)
> hpa_t root = vcpu->arch.mmu->root_hpa;
>
> sp = to_shadow_page(root);
> - __mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level);
> + __mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->shadow_root_level);
> return;
> }
>
> @@ -119,8 +119,7 @@ static void audit_mappings(struct kvm_vcpu *vcpu, u64 *sptep, int level)
> hpa = pfn << PAGE_SHIFT;
> if ((*sptep & PT64_BASE_ADDR_MASK) != hpa)
> audit_printk(vcpu->kvm, "levels %d pfn %llx hpa %llx "
> - "ent %llxn", vcpu->arch.mmu->root_level, pfn,
> - hpa, *sptep);
> + "ent %llxn", level, pfn, hpa, *sptep);
> }
>
> static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
>

Queued all except patch 11, thanks.

Paolo


2021-12-09 01:16:30

by Sean Christopherson

Subject: Re: [PATCH 1/12] KVM: X86: Fix when shadow_root_level=5 && guest root_level<4

On Wed, Nov 24, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> If the is an L1 with nNPT in 32bit, the shadow walk starts with
> pae_root.
>
> Fixes: a717a780fc4e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host)

Have you actually run with 5-level nNPT? I don't have access to hardware, at least
not that I know of :-)

I'm staring at kvm_mmu_sync_roots() and don't see how it can possibly work for
5-level nNPT with a 4-level NPT guest.

2021-12-09 01:21:54

by Sean Christopherson

Subject: Re: [PATCH 1/12] KVM: X86: Fix when shadow_root_level=5 && guest root_level<4

On Thu, Dec 09, 2021, Sean Christopherson wrote:
> On Wed, Nov 24, 2021, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > If the is an L1 with nNPT in 32bit, the shadow walk starts with
> > pae_root.
> >
> > Fixes: a717a780fc4e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host)
>
> Have you actually run with 5-level nNPT? I don't have access to hardware, at least
> not that I know of :-)
>
> I'm staring at kvm_mmu_sync_roots() and don't see how it can possibly work for
> 5-level nNPT with a 4-level NPT guest.

Oh, and fast_pgd_switch() will also break kvm_mmu_sync_prev_roots() / is_unsync_root()
by putting a root into the prev_roots array that doesn't have a shadow page associated
with the root.

2021-12-10 09:34:43

by Lai Jiangshan

Subject: Re: [PATCH 1/12] KVM: X86: Fix when shadow_root_level=5 && guest root_level<4



On 2021/12/9 09:16, Sean Christopherson wrote:
> On Wed, Nov 24, 2021, Lai Jiangshan wrote:
>> From: Lai Jiangshan <[email protected]>
>>
>> If the is an L1 with nNPT in 32bit, the shadow walk starts with
>> pae_root.
>>
>> Fixes: a717a780fc4e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host)
>
> Have you actually run with 5-level nNPT? I don't have access to hardware, at least
> not that I know of :-)

The code is just obviously incorrect for shadow_root_level=5 && guest root_level<4.

>
> I'm staring at kvm_mmu_sync_roots() and don't see how it can possibly work for
> 5-level nNPT with a 4-level NPT guest.
>

It doesn't use pml5_root for 5-level nNPT with a 4-level NPT guest, so
kvm_mmu_sync_roots() can work, silently, with an "unexpected" root shadow
page. But there are still problems for 5-level nNPT with a 4-level NPT guest.

See:
https://lore.kernel.org/lkml/[email protected]/

especially patch4.

Your reply motivated me to complete the changelog of a patchset and send it,
thanks!

Although the patchset is immature, posting it is better than losing it.