From: Lai Jiangshan <[email protected]>
In some cases, local root pages are used for the MMU, and
to_shadow_page(mmu->root.hpa) is often used to check whether a local
root page is in use.

Add using_local_root_page() to directly check whether local root pages
are used, or need to be used, even when mmu->root.hpa is not yet set.

This prepares for making to_shadow_page(mmu->root.hpa) return non-NULL
when local shadow [root] pages are used.
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index efe5a3dca1e0..624b6d2473f7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1690,6 +1690,39 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
mmu_spte_clear_no_track(parent_pte);
}
+/*
+ * KVM uses the vCPU's local root page (vcpu->mmu->pae_root) when either the
+ * shadow pagetable is using PAE paging or the host is shadowing nested NPT
+ * for a 32-bit L1 hypervisor.
+ *
+ * It covers the following cases:
+ *   nonpaging when !tdp_enabled (direct paging)
+ *   shadow paging for a 32-bit guest when !tdp_enabled (shadow paging)
+ *   NPT in a 32-bit host (not shadowing nested NPT) (direct paging)
+ *   shadow nested NPT for a 32-bit L1 hypervisor in a 32-bit host (shadow paging)
+ *   shadow nested NPT for a 32-bit L1 hypervisor in a 64-bit host (shadow paging)
+ *
+ * For the first four cases, mmu->root_role.level is PT32E_ROOT_LEVEL and the
+ * shadow pagetable uses PAE paging.
+ *
+ * The last case is the only one where
+ *   mmu->root_role.level > PT32E_ROOT_LEVEL &&
+ *   !mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL
+ * holds, i.e. if this condition is true, it must be the last case.
+ *
+ * Combining the two conditions, the check becomes:
+ *   mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ *   (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL)
+ *
+ * (No "mmu->root_role.level > PT32E_ROOT_LEVEL" check is needed in the second
+ * clause because mmu->root_role.level >= PT32E_ROOT_LEVEL is guaranteed.)
+ */
+static bool using_local_root_page(struct kvm_mmu *mmu)
+{
+ return mmu->root_role.level == PT32E_ROOT_LEVEL ||
+ (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
+}
+
static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
{
struct kvm_mmu_page *sp;
@@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
{
/*
* For now, limit the caching to 64-bit hosts+VMs in order to avoid
- * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
- * later if necessary.
+ * having to deal with PDPTEs. Local roots cannot be put into
+ * mmu->prev_roots[] because mmu->pae_root cannot be shared by
+ * different roots at the same time.
*/
- if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
+ if (unlikely(using_local_root_page(mmu)))
kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
if (VALID_PAGE(mmu->root.hpa))
--
2.19.1.6.gb485710b
On Thu, May 26, 2022, David Matlack wrote:
> On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > [...]
> >
> > +/*
> > + * KVM uses the vCPU's local root page (vcpu->mmu->pae_root) when either the
> > + * shadow pagetable is using PAE paging or the host is shadowing nested NPT
> > + * for a 32-bit L1 hypervisor.
>
> How about using the terms "private" and "shared" instead of "local" and
> "non-local"? I think that more accurately conveys what is special about
> these pages: they are private to the vCPU using them. And then "shared"
> is more intuitive to understand than "non-local" (which is used
> elsewhere in this series).
Please avoid "private" and "shared". I haven't read the full context of the
discussion, but those terms have already been claimed by confidential VMs.
FWIW, I believe similar discussions happened around mm/ and kmap(), and they ended
up with thread_local and kmap_local(). Maybe "vCPU local" and "common"?
On Sat, May 21, 2022 at 09:16:50PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> [...]
>
> +/*
> + * KVM uses the vCPU's local root page (vcpu->mmu->pae_root) when either the
> + * shadow pagetable is using PAE paging or the host is shadowing nested NPT
> + * for a 32-bit L1 hypervisor.
How about using the terms "private" and "shared" instead of "local" and
"non-local"? I think that more accurately conveys what is special about
these pages: they are private to the vCPU using them. And then "shared"
is more intuitive to understand than "non-local" (which is used
elsewhere in this series).
>
> [...]
>
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
> {
> /*
> * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> - * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> - * later if necessary.
> + * having to deal with PDPTEs. Local roots cannot be put into
> + * mmu->prev_roots[] because mmu->pae_root cannot be shared by
> + * different roots at the same time.
This comment ends up being a little confusing by the end of this series
because using_local_root_page() does not necessarily imply pae_root is
in use, i.e. case 5 (shadow nested NPT for a 32-bit L1 hypervisor in a
64-bit host) does not use pae_root.

How about rewording this comment to say something like:
If the vCPU is using a private root, it might be using pae_root, which
cannot be shared for different roots at the same time.
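
Applied to the hunk above, the reworded comment might read as follows (a
sketch only, keeping the existing first sentence; the exact wording is a
suggestion, not part of the patch):

	/*
	 * For now, limit the caching to 64-bit hosts+VMs in order to
	 * avoid having to deal with PDPTEs. If the vCPU is using a
	 * private root, it might be using pae_root, which cannot be
	 * shared for different roots at the same time.
	 */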
> */
> - if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> + if (unlikely(using_local_root_page(mmu)))
> kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
On Sat, May 21, 2022, Lai Jiangshan wrote:
> +static bool using_local_root_page(struct kvm_mmu *mmu)
Hmm, I agree with David that "local" isn't the most intuitive terminology. But
I also want to avoid "private" vs. "shared" so as not to clash with confidential VMs.
Luckily, I don't think we need to come up with new terminology, just be literal
and call 'em "per-vCPU root pages". E.g.
static bool kvm_mmu_has_per_vcpu_root_page()
That way readers don't have to understand what "local" means, and the name also
captures that per-vCPU roots are the exception, i.e. that most roots are NOT per-vCPU.
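
A minimal sketch of that rename, assuming the same signature and predicate as
using_local_root_page() above (only the name changes):

	static bool kvm_mmu_has_per_vcpu_root_page(struct kvm_mmu *mmu)
	{
		return mmu->root_role.level == PT32E_ROOT_LEVEL ||
		       (!mmu->root_role.direct &&
			mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
	}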
> +{
> + return mmu->root_role.level == PT32E_ROOT_LEVEL ||
> + (!mmu->root_role.direct && mmu->cpu_role.base.level <= PT32E_ROOT_LEVEL);
> +}
>
> [...]
>
> @@ -4252,10 +4285,11 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
> {
> /*
> * For now, limit the caching to 64-bit hosts+VMs in order to avoid
> - * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
> - * later if necessary.
> + * having to deal with PDPTEs. Local roots cannot be put into
> + * mmu->prev_roots[] because mmu->pae_root cannot be shared by
> + * different roots at the same time.
> */
> - if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa))
> + if (unlikely(using_local_root_page(mmu)))
I don't know that I like using the local/per-vCPU helper here. The problem isn't
_just_ that KVM is using a per-vCPU root; KVM is also deliberately punting on
dealing with PDPTRs. E.g. the per-vCPU aspect doesn't explain why KVM doesn't allow
reusing the current root, and using_local_root_page() obfuscates that check.
My preference would be to revert to a streamlined variation of the
code prior to commit 5499ea73e7db ("KVM: x86/mmu: look for a cached PGD when going
from 32-bit to 64-bit").
KVM switched to the !to_shadow_page() check to _avoid_ consuming (what is now)
mmu->root_role because, at the time of the patch, mmu held the _old_ data, which
was wrong/stale for nested virtualization transitions.
In other words, I would prefer that we explicitly do (in a separate patch):
	/*
	 * For now, limit the fast switch to 64-bit VMs in order to avoid having
	 * to deal with PDPTEs.  32-bit VMs can be supported later if necessary.
	 */
	if (new_role.level < PT64_ROOT_4LEVEL)
		kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
The "hosts+VMs" can be shortened to just "VMs", because running a 64-bit VM with
a 32-bit host just doesn't work for a variety of reasons, i.e. it doesn't need to
be called out here.
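
Putting it together, fast_pgd_switch() would then look something like this (a
sketch; the cached_root_find_and_keep_current()/cached_root_find_without_current()
structure is assumed from the code after commit 5499ea73e7db, only the freeing
check changes):

	static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
				    gpa_t new_pgd, union kvm_mmu_page_role new_role)
	{
		/*
		 * For now, limit the fast switch to 64-bit VMs in order to
		 * avoid having to deal with PDPTEs.  32-bit VMs can be
		 * supported later if necessary.
		 */
		if (new_role.level < PT64_ROOT_4LEVEL)
			kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);

		if (VALID_PAGE(mmu->root.hpa))
			return cached_root_find_and_keep_current(kvm, mmu, new_pgd,
								 new_role);

		return cached_root_find_without_current(kvm, mmu, new_pgd, new_role);
	}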