2023-05-12 00:13:46

by Sean Christopherson

Subject: [PATCH 6/9] KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled

Promote the ASSERT(), which is quite dead code in KVM, into a KVM_BUG_ON()
for KVM's sanity check that CR4.PAE=1 if the vCPU is in long mode when
performing a walk of guest page tables. The sanity check is quite cheap since
neither EFER nor CR4.PAE requires a VMREAD, especially relative to the
cost of walking the guest page tables.
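
For the curious, both predicates reduce to reads of state KVM already
caches in the vCPU. Roughly (an illustrative approximation of the
helpers in arch/x86/kvm, not the verbatim in-tree code):

  static inline bool is_long_mode(struct kvm_vcpu *vcpu)
  {
	/* EFER is cached in vcpu->arch.efer; LMA set => long mode. */
	return !!(vcpu->arch.efer & EFER_LMA);
  }

  static inline bool is_pae(struct kvm_vcpu *vcpu)
  {
	/* CR4 is likewise cached/shadowed, so no VMREAD is needed. */
	return kvm_is_cr4_bit_set(vcpu, X86_CR4_PAE);
  }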

More importantly, the sanity check would have prevented the true badness
fixed by commit 112e66017bff ("KVM: nVMX: add missing consistency checks
for CR0 and CR4"). The missed consistency check resulted in some versions
of KVM corrupting the on-stack guest_walker structure due to KVM thinking
there are 4/5 levels of page tables, but wiring up the MMU hooks to point
at the paging32 implementation, which only allocates space for two levels
of page tables in "struct guest_walker32".
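
To make the corruption mechanism concrete, here's a condensed sketch of
the pattern in paging_tmpl.h (field list abbreviated): the header is
compiled once per PTTYPE, and PT_MAX_FULL_LEVELS sizes the walker's
on-stack arrays, so a walk that believes it has 4/5 levels overruns the
arrays in the 32-bit instantiation:

  #if PTTYPE == 32
    #define PT_MAX_FULL_LEVELS 2
  #else
    #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
  #endif

  struct guest_walker {
	int level;
	unsigned max_level;
	gfn_t table_gfn[PT_MAX_FULL_LEVELS];
	pt_element_t ptes[PT_MAX_FULL_LEVELS];
	pt_element_t __user *ptep_user[PT_MAX_FULL_LEVELS];
	/* ... pte_access, fault info, etc. ... */
  };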

Queue a page fault for injection if the assertion fails, as the sole
caller, FNAME(gva_to_gpa), assumes that walker.fault contains sane info
on a walk failure, i.e. avoid making the situation worse between the time
the assertion fails and when KVM kicks the vCPU out to userspace (because
the VM is bugged).

Move the check below the initialization of "pte_access" so that the
aforementioned to-be-injected page fault doesn't consume uninitialized
stack data. The information _shouldn't_ reach the guest or userspace,
but there's zero downside to being paranoid in this case.
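
Condensed, the resulting ordering looks like the below; the EPT
instantiation's error path is the one that consumes pte_access
(abbreviated sketch of paging_tmpl.h, not verbatim):

	pte_access = ~0;

	if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm))
		goto error;
	...
  error:
	...
  #if PTTYPE == PTTYPE_EPT
	/* pte_access holds the raw RWX bits from the EPTE. */
	vcpu->arch.exit_qualification |= (pte_access & VMX_EPT_RWX_MASK) <<
					 EPT_VIOLATION_RWX_SHIFT;
  #endif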

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index a3fc7c1a7f8d..f297e9311dcd 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -338,7 +338,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
}
#endif
walker->max_level = walker->level;
- ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));

/*
* FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
@@ -348,6 +347,10 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;

pte_access = ~0;
+
+ if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm))
+ goto error;
+
++walker->level;

do {
--
2.40.1.606.ga4b1b128d6-goog



2023-05-12 23:38:26

by David Matlack

Subject: Re: [PATCH 6/9] KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled

On Thu, May 11, 2023 at 04:59:14PM -0700, Sean Christopherson wrote:
> Promote the ASSERT(), which is quite dead code in KVM, into a KVM_BUG_ON()
> for KVM's sanity check that CR4.PAE=1 if the vCPU is in long mode when
> performing a walk of guest page tables. The sanity check is quite cheap since
> neither EFER nor CR4.PAE requires a VMREAD, especially relative to the
> cost of walking the guest page tables.
>
> More importantly, the sanity check would have prevented the true badness
> fixed by commit 112e66017bff ("KVM: nVMX: add missing consistency checks
> for CR0 and CR4"). The missed consistency check resulted in some versions
> of KVM corrupting the on-stack guest_walker structure due to KVM thinking
> there are 4/5 levels of page tables, but wiring up the MMU hooks to point
> at the paging32 implementation, which only allocates space for two levels
> of page tables in "struct guest_walker32".
>
> Queue a page fault for injection if the assertion fails, as the sole
> caller, FNAME(gva_to_gpa), assumes that walker.fault contains sane info

FNAME(page_fault)->FNAME(walk_addr)->FNAME(walk_addr_generic) is another
caller but I think the same reasoning here applies.
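
For reference, the second path goes through a thin wrapper (condensed
from paging_tmpl.h), so a failed walk there also relies on
walker->fault being sane:

  static int FNAME(walk_addr)(struct guest_walker *walker,
			      struct kvm_vcpu *vcpu, gpa_t addr, u64 access)
  {
	return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr,
					access);
  }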

> on a walk failure, i.e. avoid making the situation worse between the time
> the assertion fails and when KVM kicks the vCPU out to userspace (because
> the VM is bugged).
>
> Move the check below the initialization of "pte_access" so that the
> aforementioned to-be-injected page fault doesn't consume uninitialized
> stack data. The information _shouldn't_ reach the guest or userspace,
> but there's zero downside to being paranoid in this case.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/paging_tmpl.h | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index a3fc7c1a7f8d..f297e9311dcd 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -338,7 +338,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> }
> #endif
> walker->max_level = walker->level;
> - ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
>
> /*
> * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
> @@ -348,6 +347,10 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
>
> pte_access = ~0;
> +
> + if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm))
> + goto error;

This if() deserves a comment since it's queueing a page fault for what
is likely a KVM bug. As a reader that'd be pretty jarring to see.
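
Something like this, perhaps (hypothetical wording, just to
illustrate):

	/*
	 * Reaching this point with a vCPU in long mode but not PAE
	 * means KVM itself is buggy.  Queue a page fault anyway so
	 * that walker->fault holds sane data for the caller while
	 * the VM is being torn down.
	 */
	if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm))
		goto error;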

> +
> ++walker->level;
>
> do {
> --
> 2.40.1.606.ga4b1b128d6-goog
>

2023-05-12 23:50:51

by Sean Christopherson

Subject: Re: [PATCH 6/9] KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled

On Fri, May 12, 2023, David Matlack wrote:
> On Thu, May 11, 2023 at 04:59:14PM -0700, Sean Christopherson wrote:
> > Promote the ASSERT(), which is quite dead code in KVM, into a KVM_BUG_ON()
> > for KVM's sanity check that CR4.PAE=1 if the vCPU is in long mode when
> > performing a walk of guest page tables. The sanity check is quite cheap since
> > neither EFER nor CR4.PAE requires a VMREAD, especially relative to the
> > cost of walking the guest page tables.
> >
> > More importantly, the sanity check would have prevented the true badness
> > fixed by commit 112e66017bff ("KVM: nVMX: add missing consistency checks
> > for CR0 and CR4"). The missed consistency check resulted in some versions
> > of KVM corrupting the on-stack guest_walker structure due to KVM thinking
> > there are 4/5 levels of page tables, but wiring up the MMU hooks to point
> > at the paging32 implementation, which only allocates space for two levels
> > of page tables in "struct guest_walker32".
> >
> > Queue a page fault for injection if the assertion fails, as the sole
> > caller, FNAME(gva_to_gpa), assumes that walker.fault contains sane info
>
> FNAME(page_fault)->FNAME(walk_addr)->FNAME(walk_addr_generic) is another
> caller but I think the same reasoning here applies.

Huh. No idea what I was doing. Missed the super obvious use case... I'll make
sure the call from walk_addr() does something not awful.

> > on a walk failure, i.e. avoid making the situation worse between the time
> > the assertion fails and when KVM kicks the vCPU out to userspace (because
> > the VM is bugged).
> >
> > Move the check below the initialization of "pte_access" so that the
> > aforementioned to-be-injected page fault doesn't consume uninitialized
> > stack data. The information _shouldn't_ reach the guest or userspace,
> > but there's zero downside to being paranoid in this case.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/mmu/paging_tmpl.h | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> > index a3fc7c1a7f8d..f297e9311dcd 100644
> > --- a/arch/x86/kvm/mmu/paging_tmpl.h
> > +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> > @@ -338,7 +338,6 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> > }
> > #endif
> > walker->max_level = walker->level;
> > - ASSERT(!(is_long_mode(vcpu) && !is_pae(vcpu)));
> >
> > /*
> > * FIXME: on Intel processors, loads of the PDPTE registers for PAE paging
> > @@ -348,6 +347,10 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> > nested_access = (have_ad ? PFERR_WRITE_MASK : 0) | PFERR_USER_MASK;
> >
> > pte_access = ~0;
> > +
> > + if (KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm))
> > + goto error;
>
> This if() deserves a comment since it's queueing a page fault for what
> is likely a KVM bug. As a reader that'd be pretty jarring to see.

Will add.