From: Lai Jiangshan <[email protected]>
Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
set up the special roots. The initialization code is complex, and the
roots are not associated with a struct kvm_mmu_page, which makes the code
even more complex.
So add new local shadow pages to simplify it.
The local shadow pages are associated with struct kvm_mmu_page and are
VCPU-local.
The local shadow pages are created and freed when the roots are changed
(in other words, they are one-off), which could be optimized, but not in
this patchset, since re-creating them is lightweight (in the normal case
only the struct kvm_mmu_page needs to be re-allocated and sp->spt doesn't,
because it is likely to be mmu->pae_root).
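To give a rough idea of what the allocation of a local shadow page looks
like, here is a minimal sketch (the function name and details are
illustrative only; see the patches for the real code):

/* Sketch only: the name and details are illustrative, not from the patches. */
static struct kvm_mmu_page *
kvm_mmu_alloc_local_shadow_page(struct kvm_vcpu *vcpu,
				union kvm_mmu_page_role role)
{
	struct kvm_mmu_page *sp;

	/* Only the header is re-allocated on each root change. */
	sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
	sp->gfn = 0;
	sp->role = role;
	/*
	 * The backing page is reused; for a PAE root it is likely
	 * mmu->pae_root, so sp->spt doesn't need to be re-allocated.
	 */
	sp->spt = vcpu->arch.mmu->pae_root;
	return sp;
}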
The patchset also fixes a possible bug described in:
https://lore.kernel.org/lkml/[email protected]/
as patch1.
The fix is then simplified in patch9 with the help of the local shadow pages.
Note:
using_local_root_page() can be implemented in two ways.
static bool using_local_root_page(struct kvm_mmu *mmu)
{
	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
}

static bool using_local_root_page(struct kvm_mmu *mmu)
{
	if (mmu->root_role.direct)
		return mmu->root_role.level == PT32E_ROOT_LEVEL;
	else
		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
}
I prefer the second way, but when I wrote the documentation for them, I
couldn't explain the second way well enough. Maybe I explained it from
the wrong angle, or my English is not good enough to explain it.
So I put the first way in patch2 and the second way in patch3. Patch3
adds much more documentation and changes the first way into the second
way. Patch3 can be discarded.
Changed from v2:
Add documentation for using_local_root_page()
Update much of the documentation
Address review comments
Add a patch that fixes a possible bug (and split other patches for patch9)
Changed from v1:
Rebase onto the newest kvm/queue. Slightly update patch4.
[V2]: https://lore.kernel.org/lkml/[email protected]/
[V1]: https://lore.kernel.org/lkml/[email protected]/
Lai Jiangshan (12):
KVM: X86/MMU: Verify PDPTE for nested NPT in PAE paging mode when page
fault
KVM: X86/MMU: Add using_local_root_page()
KVM: X86/MMU: Reduce a check in using_local_root_page() for common
cases
KVM: X86/MMU: Add local shadow pages
KVM: X86/MMU: Link PAE root pagetable with its children
KVM: X86/MMU: Activate local shadow pages and remove old logic
KVM: X86/MMU: Remove the check of the return value of to_shadow_page()
KVM: X86/MMU: Allocate mmu->pae_root for PAE paging on-demand
KVM: X86/MMU: Move the verifying of NPT's PDPTE in FNAME(fetch)
KVM: X86/MMU: Remove unused INVALID_PAE_ROOT and IS_VALID_PAE_ROOT
KVM: X86/MMU: Don't use mmu->pae_root when shadowing PAE NPT in 64-bit
host
KVM: X86/MMU: Remove mmu_alloc_special_roots()
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/kvm/mmu/mmu.c | 575 ++++++++++++++------------------
arch/x86/kvm/mmu/mmu_internal.h | 10 -
arch/x86/kvm/mmu/paging_tmpl.h | 51 ++-
arch/x86/kvm/mmu/spte.c | 7 +
arch/x86/kvm/mmu/spte.h | 1 +
arch/x86/kvm/mmu/tdp_mmu.h | 7 +-
arch/x86/kvm/x86.c | 4 +-
8 files changed, 303 insertions(+), 357 deletions(-)
--
2.19.1.6.gb485710b
From: Lai Jiangshan <[email protected]>
They are unused now: invalid PAE roots are stored as 0ull like other zero
sptes, and validity is checked with is_shadow_present_pte().
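For example, a check that used to be written as (this is a sketch of the
pattern, not a hunk from this series):

	if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
		continue;

is now written as:

	if (!is_shadow_present_pte(mmu->pae_root[i]))
		continue;

and an invalid PAE root is stored as plain 0ull.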
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu_internal.h | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index bd2a26897b97..4feb1ac2742c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -20,16 +20,6 @@ extern bool dbg;
#define MMU_WARN_ON(x) do { } while (0)
#endif
-/*
- * Unlike regular MMU roots, PAE "roots", a.k.a. PDPTEs/PDPTRs, have a PRESENT
- * bit, and thus are guaranteed to be non-zero when valid. And, when a guest
- * PDPTR is !PRESENT, its corresponding PAE root cannot be set to INVALID_PAGE,
- * as the CPU would treat that as PRESENT PDPTR with reserved bits set. Use
- * '0' instead of INVALID_PAGE to indicate an invalid PAE root.
- */
-#define INVALID_PAE_ROOT 0
-#define IS_VALID_PAE_ROOT(x) (!!(x))
-
typedef u64 __rcu *tdp_ptep_t;
struct kvm_mmu_page {
--
2.19.1.6.gb485710b
From: Lai Jiangshan <[email protected]>
In most cases, mmu->root_role.direct is true and mmu->root_role.level is
not PT32E_ROOT_LEVEL, which means using_local_root_page() often has to
evaluate all three tests, which is not good for fast paths.
Morph the conditions in using_local_root_page() into an equivalent form
to reduce one check.
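Concretely, the check changes from:

	return mmu->root_role.level == PT32E_ROOT_LEVEL ||
	       (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);

to:

	if (mmu->root_role.direct)
		return mmu->root_role.level == PT32E_ROOT_LEVEL;
	else
		return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;

so the common direct case tests only mmu->root_role fields. The full
equivalence argument is in the comment added below.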
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 45 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 624b6d2473f7..240ebe589caf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1716,11 +1716,52 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
*
* (There is no "mmu->root_role.level > PT32E_ROOT_LEVEL" here, because it is
* already ensured that mmu->root_role.level >= PT32E_ROOT_LEVEL)
+ *
+ * But mmu->root_role.direct is normally true and mmu->root_role.level is
+ * normally not PT32E_ROOT_LEVEL. To reduce a check for the fast path of
+ * fast_pgd_switch() in the normal case, mmu->root_role.direct is checked first.
+ *
+ * The return value is:
+ * mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * (mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * (!mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * (mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * (!mmu->root_role.direct &&
+ * (mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ * mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL))
+ * => (for !direct, mmu->root_role.level == PT32E_ROOT_LEVEL implies
+ * mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * =>
+ * (mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL) ||
+ * (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * In other words:
+ *
+ * For the first and third cases, it is
+ * mmu->root_role.direct && mmu->root_role.level == PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be one of the two cases.
+ *
+ * For the 2nd, 4th and 5th cases, it is
+ * !mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * And if this condition is true, it must be one of the three cases although
+ * it is not so intuitive. It can be split into:
+ * mmu->root_role.level == PT32E_ROOT_LEVEL &&
+ * (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ * which is for the 2nd and 4th cases and
+ * mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ * !mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * which is the last case.
*/
static bool using_local_root_page(struct kvm_mmu *mmu)
{
- return mmu->root_role.level == PT32E_ROOT_LEVEL ||
- (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+ if (mmu->root_role.direct)
+ return mmu->root_role.level == PT32E_ROOT_LEVEL;
+ else
+ return mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL;
}
static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
--
2.19.1.6.gb485710b
From: Lai Jiangshan <[email protected]>
mmu_alloc_special_roots() allocates mmu->pae_root for non-PAE paging
(e.g. for shadowing 32-bit NPT on a 64-bit host) as well as mmu->pml4_root
and mmu->pml5_root.
But mmu->pml4_root and mmu->pml5_root are no longer used, and neither is
mmu->pae_root for non-PAE paging.
So remove mmu_alloc_special_roots(), mmu->pml4_root and mmu->pml5_root.
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 --
arch/x86/kvm/mmu/mmu.c | 77 ---------------------------------
2 files changed, 80 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9751dfc1a7..ec44e6c3d5ea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -458,9 +458,6 @@ struct kvm_mmu {
u8 permissions[16];
u64 *pae_root;
- u64 *pml4_root;
- u64 *pml5_root;
-
/*
* check zero bits on shadow page table entries, these
* bits include not only hardware reserved bits but also
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 73e6a8e1e1a9..b8eed217314d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3691,78 +3691,6 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
return r;
}
-static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
-{
- struct kvm_mmu *mmu = vcpu->arch.mmu;
- bool need_pml5 = mmu->root_role.level > PT64_ROOT_4LEVEL;
- u64 *pml5_root = NULL;
- u64 *pml4_root = NULL;
- u64 *pae_root;
-
- /*
- * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
- * tables are allocated and initialized at root creation as there is no
- * equivalent level in the guest's NPT to shadow. Allocate the tables
- * on demand, as running a 32-bit L1 VMM on 64-bit KVM is very rare.
- */
- if (mmu->root_role.direct ||
- mmu->cpu_role.base.level >= PT64_ROOT_4LEVEL ||
- mmu->root_role.level < PT64_ROOT_4LEVEL)
- return 0;
-
- /*
- * NPT, the only paging mode that uses this horror, uses a fixed number
- * of levels for the shadow page tables, e.g. all MMUs are 4-level or
- * all MMus are 5-level. Thus, this can safely require that pml5_root
- * is allocated if the other roots are valid and pml5 is needed, as any
- * prior MMU would also have required pml5.
- */
- if (mmu->pae_root && mmu->pml4_root && (!need_pml5 || mmu->pml5_root))
- return 0;
-
- /*
- * The special roots should always be allocated in concert. Yell and
- * bail if KVM ends up in a state where only one of the roots is valid.
- */
- if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root ||
- (need_pml5 && mmu->pml5_root)))
- return -EIO;
-
- /*
- * Unlike 32-bit NPT, the PDP table doesn't need to be in low mem, and
- * doesn't need to be decrypted.
- */
- pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
- if (!pae_root)
- return -ENOMEM;
-
-#ifdef CONFIG_X86_64
- pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
- if (!pml4_root)
- goto err_pml4;
-
- if (need_pml5) {
- pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
- if (!pml5_root)
- goto err_pml5;
- }
-#endif
-
- mmu->pae_root = pae_root;
- mmu->pml4_root = pml4_root;
- mmu->pml5_root = pml5_root;
-
- return 0;
-
-#ifdef CONFIG_X86_64
-err_pml5:
- free_page((unsigned long)pml4_root);
-err_pml4:
- free_page((unsigned long)pae_root);
- return -ENOMEM;
-#endif
-}
-
static bool is_unsync_root(hpa_t root)
{
struct kvm_mmu_page *sp;
@@ -5166,9 +5094,6 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
r = mmu_alloc_pae_root(vcpu);
if (r)
return r;
- r = mmu_alloc_special_roots(vcpu);
- if (r)
- goto out;
if (vcpu->arch.mmu->root_role.direct)
r = mmu_alloc_direct_roots(vcpu);
else
@@ -5626,8 +5551,6 @@ static void free_mmu_pages(struct kvm_mmu *mmu)
if (!tdp_enabled && mmu->pae_root)
set_memory_encrypted((unsigned long)mmu->pae_root, 1);
free_page((unsigned long)mmu->pae_root);
- free_page((unsigned long)mmu->pml4_root);
- free_page((unsigned long)mmu->pml5_root);
}
static void __kvm_mmu_create(struct kvm_mmu *mmu)
--
2.19.1.6.gb485710b
On Sat, May 21, 2022 at 9:16 PM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> set up the special roots. The initialization code is complex, and the
> roots are not associated with a struct kvm_mmu_page, which makes the code
> even more complex.
>
> So add new local shadow pages to simplify it.
>
> The local shadow pages are associated with struct kvm_mmu_page and are
> VCPU-local.
>
> The local shadow pages are created and freed when the roots are changed
> (in other words, they are one-off), which could be optimized, but not in
> this patchset, since re-creating them is lightweight (in the normal case
> only the struct kvm_mmu_page needs to be re-allocated and sp->spt doesn't,
> because it is likely to be mmu->pae_root).
>
> The patchset also fixes a possible bug described in:
> https://lore.kernel.org/lkml/[email protected]/
> as patch1.
>
Ping, and please ignore patch1 and patch9. Dropping both of them together
does not cause any conflict in the rest of the series.
The fix is wrong (see the new discussion in the above link), so the
eventual correct fix will not conflict with this patchset of one-off
local shadow pages. I don't want to add extra stuff to this patchset
anymore.
Thanks
Lai
On Thu, May 26, 2022 at 04:49:09PM +0800, Lai Jiangshan wrote:
> On Sat, May 21, 2022 at 9:16 PM Lai Jiangshan <[email protected]> wrote:
> >
> > From: Lai Jiangshan <[email protected]>
> >
> > Current code uses mmu->pae_root, mmu->pml4_root, and mmu->pml5_root to
> > set up the special roots. The initialization code is complex, and the
> > roots are not associated with a struct kvm_mmu_page, which makes the code
> > even more complex.
> >
> > So add new local shadow pages to simplify it.
> >
> > The local shadow pages are associated with struct kvm_mmu_page and are
> > VCPU-local.
> >
> > The local shadow pages are created and freed when the roots are changed
> > (in other words, they are one-off), which could be optimized, but not in
> > this patchset, since re-creating them is lightweight (in the normal case
> > only the struct kvm_mmu_page needs to be re-allocated and sp->spt doesn't,
> > because it is likely to be mmu->pae_root).
> >
> > The patchset also fixes a possible bug described in:
> > https://lore.kernel.org/lkml/[email protected]/
> > as patch1.
> >
>
> Ping, and please ignore patch1 and patch9. Dropping both of them together
> does not cause any conflict in the rest of the series.
>
> The fix is wrong (see the new discussion in the above link), so the
> eventual correct fix will not conflict with this patchset of one-off
> local shadow pages. I don't want to add extra stuff to this patchset
> anymore.
Yeah, I agree with splitting this fix out into a separate patchset,
ordered after this cleanup so that it can be done in one patch.
When you get around to it, can you also implement a kvm-unit-test to
demonstrate the bug? It would be good to have a regression test.
>
> Thanks
> Lai
On Sat, May 21, 2022, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> They are unused now: invalid PAE roots are stored as 0ull like other zero
> sptes, and validity is checked with is_shadow_present_pte().
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
Reviewed-by: Sean Christopherson <[email protected]>