2023-07-22 01:41:25

by Sean Christopherson

Subject: [PATCH 0/5] KVM: x86/mmu: Don't synthesize triple fault on bad root

Rework the handling of !visible guest root gfns to wait until the guest
actually tries to access memory before synthesizing a fault. KVM currently
just immediately synthesizes a triple fault, which causes problems for nVMX
and nSVM, as immediately injecting the fault causes KVM to try to forward the
fault to L1 (as a VM-Exit) before completing nested VM-Enter, e.g. if L1
runs L2 with a "bad" nested TDP root.

To get around the conundrum of not wanting to shadow garbage, load a dummy
root, backed by the zero page, into CR3/EPTP/nCR3, and then inject an
appropriate page fault when the guest (likely) hits a !PRESENT fault.
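
Concretely, the dummy root is just the host zero page (see the helpers
added in patch 5), i.e. "loading" it boils down to a one-liner along the
lines of:

	/* All-zeroes root => any guest access hits a !PRESENT fault. */
	mmu->root.hpa = my_zero_pfn(0) << PAGE_SHIFT;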

Note, KVM's behavior is still not strictly correct with respect to the x86
architecture; the primary goal is purely to prevent triggering KVM's WARN
at will. No real world guest intentionally loads CR3 (or EPTP or nCR3)
with a GPA that points at MMIO and expects it to work (and KVM has a long
and storied history of punting on emulated MMIO corner cases).

I didn't Cc any of this for stable because syzkaller is really the only
thing that I expect to care, and the whole dummy root thing isn't exactly
risk free. If someone _really_ wants to squash the WARN in LTS kernels,
the way to do that would be to exempt triple fault shutdown VM-Exits from
the sanity checks in nVMX and nSVM, i.e. sweep the problem under the rug.
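
Purely as illustration, NOT what this series does: on the nVMX side such
a hack could look something like the sketch below (untested, and nSVM
would need an equivalent exemption):

	union vmx_exit_reason exit_reason = to_vmx(vcpu)->exit_reason;

	/* Hypothetical stable-only hack: exempt shutdown VM-Exits. */
	if (exit_reason.basic != EXIT_REASON_TRIPLE_FAULT)
		WARN_ON_ONCE(to_vmx(vcpu)->nested.nested_run_pending);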

I have a KUT test for this that I'll post next week.

Sean Christopherson (5):
KVM: x86/mmu: Add helper to convert root hpa to shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest
roots

arch/x86/kvm/mmu/mmu.c | 88 ++++++++++++++++++---------------
arch/x86/kvm/mmu/mmu_internal.h | 10 ++++
arch/x86/kvm/mmu/paging_tmpl.h | 18 ++++++-
arch/x86/kvm/mmu/spte.h | 12 +++++
arch/x86/kvm/mmu/tdp_iter.c | 11 +++--
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
6 files changed, 93 insertions(+), 48 deletions(-)


base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0.487.g6d72f3e995-goog



2023-07-22 01:42:08

by Sean Christopherson

Subject: [PATCH 4/5] KVM: x86/mmu: Disallow guest from using !visible slots for page tables

Explicitly inject a page fault if the guest attempts to use a !visible gfn
as a page table. kvm_vcpu_gfn_to_hva_prot() will naturally handle the
case where there is no memslot, but doesn't catch the scenario where the
gfn points at a KVM-internal memslot.
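
For reference, the visibility check used below is KVM's existing helper
in include/linux/kvm_host.h, which rejects missing, invalid, and
KVM-internal slots (sketched here; double-check against the tree):

	static inline bool kvm_is_visible_memslot(struct kvm_memory_slot *memslot)
	{
		return (memslot && memslot->id < KVM_USER_MEM_SLOTS &&
			!(memslot->flags & KVM_MEMSLOT_INVALID));
	}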

Letting the guest backdoor its way into accessing KVM-internal memslots
isn't dangerous on its own, e.g. at worst the guest can crash itself, but
disallowing the behavior will simplify fixing how KVM handles !visible
guest root gfns (immediately synthesizing a triple fault when loading the
root is architecturally wrong).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..122bfc0124d3 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -351,6 +351,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
++walker->level;

do {
+ struct kvm_memory_slot *slot;
unsigned long host_addr;

pt_access = pte_access;
@@ -381,7 +382,11 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
if (unlikely(real_gpa == INVALID_GPA))
return 0;

- host_addr = kvm_vcpu_gfn_to_hva_prot(vcpu, gpa_to_gfn(real_gpa),
+ slot = kvm_vcpu_gfn_to_memslot(vcpu, gpa_to_gfn(real_gpa));
+ if (!kvm_is_visible_memslot(slot))
+ goto error;
+
+ host_addr = gfn_to_hva_memslot_prot(slot, gpa_to_gfn(real_gpa),
&walker->pte_writable[walker->level - 1]);
if (unlikely(kvm_is_error_hva(host_addr)))
goto error;
--
2.41.0.487.g6d72f3e995-goog


2023-07-22 01:42:59

by Sean Christopherson

Subject: [PATCH 2/5] KVM: x86/mmu: Harden new PGD against roots without shadow pages

Harden kvm_mmu_new_pgd() against NULL pointer dereference bugs by sanity
checking that the target root has an associated shadow page prior to
dereferencing said shadow page. The code in question is guaranteed to
only see roots with shadow pages, as fast_pgd_switch() explicitly frees the
current root if it doesn't have a shadow page, i.e. is a PAE root, and that
in turn prevents such shadow-page-less roots from being cached, but that's
all very subtle.
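
For context, root_to_sp() is the helper added by patch 1 (not quoted in
this thread); per the spte.h hunk in patch 5, it's a thin wrapper along
the lines of:

	static inline struct kvm_mmu_page *root_to_sp(hpa_t root)
	{
		/*
		 * The "root" may be a special root, e.g. a PAE entry, treat it
		 * as a SPTE to ensure any non-PA bits are dropped.
		 */
		return spte_to_child_sp(root);
	}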

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1eadfcde30be..dd8cc46551b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4560,9 +4560,19 @@ static void nonpaging_init_context(struct kvm_mmu *context)
static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
union kvm_mmu_page_role role)
{
- return (role.direct || pgd == root->pgd) &&
- VALID_PAGE(root->hpa) &&
- role.word == root_to_sp(root->hpa)->role.word;
+ struct kvm_mmu_page *sp;
+
+ if (!VALID_PAGE(root->hpa))
+ return false;
+
+ if (!role.direct && pgd != root->pgd)
+ return false;
+
+ sp = root_to_sp(root->hpa);
+ if (WARN_ON_ONCE(!sp))
+ return false;
+
+ return role.word == sp->role.word;
}

/*
@@ -4682,9 +4692,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
* If this is a direct root page, it doesn't have a write flooding
* count. Otherwise, clear the write flooding count.
*/
- if (!new_role.direct)
- __clear_sp_write_flooding_count(
- root_to_sp(vcpu->arch.mmu->root.hpa));
+ if (!new_role.direct) {
+ struct kvm_mmu_page *sp = root_to_sp(vcpu->arch.mmu->root.hpa);
+
+ if (!WARN_ON_ONCE(!sp))
+ __clear_sp_write_flooding_count(sp);
+ }
}
EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);

--
2.41.0.487.g6d72f3e995-goog


2023-07-22 01:43:57

by Sean Christopherson

Subject: [PATCH 5/5] KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots

When attempting to allocate a shadow root for a !visible guest root gfn,
e.g. that resides in MMIO space, load a dummy root that is backed by the
zero page instead of immediately synthesizing a triple fault shutdown
(using the zero page ensures any attempt to translate memory will generate
a !PRESENT fault and thus VM-Exit).

Unless the vCPU is racing with memslot activity, KVM will inject a page
fault due to not finding a visible slot in FNAME(walk_addr_generic), i.e.
the end result is mostly the same, but critically KVM will inject a fault only
*after* KVM runs the vCPU with the bogus root.

Waiting to inject a fault until after running the vCPU fixes a bug where
KVM would bail from nested VM-Enter if L1 tried to run L2 with TDP enabled
and a !visible root. Even though a bad root will *probably* lead to
shutdown, (a) it's not guaranteed and (b) the CPU won't read the
underlying memory until after VM-Enter succeeds. E.g. if L1 runs L2 with
a VMX preemption timer value of '0', then architecturally the preemption
timer VM-Exit is guaranteed to occur before the CPU executes any
instruction, i.e. before the CPU needs to translate a GPA to a HPA (so
long as there are no injected events with higher priority than the
preemption timer).
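
A kvm-unit-tests style sketch of that scenario, assuming KUT's standard
VMX harness and field names (the actual test will be posted separately):

	vmcs_write(PIN_CONTROLS, vmcs_read(PIN_CONTROLS) | PIN_PREEMPT);
	vmcs_write(PREEMPT_TIMER_VALUE, 0);
	vmcs_write(GUEST_CR3, mmio_gpa);	/* !visible root for L2 */
	enter_guest();	/* expect a preemption timer exit, not shutdown */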

If KVM manages to get to FNAME(fetch) with a dummy root, e.g. because
userspace created a memslot between installing the dummy root and handling
the page fault, simply unload the MMU to allocate a new root and retry the
instruction.

Reported-by: Reima Ishii <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 41 +++++++++++++++------------------
arch/x86/kvm/mmu/mmu_internal.h | 10 ++++++++
arch/x86/kvm/mmu/paging_tmpl.h | 11 +++++++++
arch/x86/kvm/mmu/spte.h | 3 +++
4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dd8cc46551b2..20e289e872eb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3620,7 +3620,9 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
&invalid_list);

if (free_active_root) {
- if (root_to_sp(mmu->root.hpa)) {
+ if (kvm_mmu_is_dummy_root(mmu->root.hpa)) {
+ /* Nothing to cleanup for dummy roots. */
+ } else if (root_to_sp(mmu->root.hpa)) {
mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
} else if (mmu->pae_root) {
for (i = 0; i < 4; ++i) {
@@ -3668,19 +3670,6 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
}
EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);

-
-static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
-{
- int ret = 0;
-
- if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
- kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
- ret = 1;
- }
-
- return ret;
-}
-
static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
u8 level)
{
@@ -3818,8 +3807,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
root_gfn = root_pgd >> PAGE_SHIFT;

- if (mmu_check_root(vcpu, root_gfn))
- return 1;
+ if (!kvm_vcpu_is_visible_gfn(vcpu, root_gfn)) {
+ mmu->root.hpa = kvm_mmu_get_dummy_root();
+ return 0;
+ }

/*
* On SVM, reading PDPTRs might access guest memory, which might fault
@@ -3831,8 +3822,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
if (!(pdptrs[i] & PT_PRESENT_MASK))
continue;

- if (mmu_check_root(vcpu, pdptrs[i] >> PAGE_SHIFT))
- return 1;
+ if (!kvm_vcpu_is_visible_gfn(vcpu, pdptrs[i] >> PAGE_SHIFT))
+ pdptrs[i] = 0;
}
}

@@ -3999,7 +3990,7 @@ static bool is_unsync_root(hpa_t root)
{
struct kvm_mmu_page *sp;

- if (!VALID_PAGE(root))
+ if (!VALID_PAGE(root) || kvm_mmu_is_dummy_root(root))
return false;

/*
@@ -4405,6 +4396,10 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
{
int r;

+ /* Dummy roots are used only for shadowing bad guest roots. */
+ if (WARN_ON_ONCE(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa)))
+ return RET_PF_RETRY;
+
if (page_fault_handle_page_track(vcpu, fault))
return RET_PF_EMULATE;

@@ -4642,9 +4637,8 @@ static bool fast_pgd_switch(struct kvm *kvm, struct kvm_mmu *mmu,
gpa_t new_pgd, union kvm_mmu_page_role new_role)
{
/*
- * For now, limit the caching to 64-bit hosts+VMs in order to avoid
- * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs
- * later if necessary.
+ * Limit reuse to 64-bit hosts+VMs without "special" roots in order to
+ * avoid having to deal with PDPTEs and other complexities.
*/
if (VALID_PAGE(mmu->root.hpa) && !root_to_sp(mmu->root.hpa))
kvm_mmu_free_roots(kvm, mmu, KVM_MMU_ROOT_CURRENT);
@@ -5561,7 +5555,8 @@ static bool is_obsolete_root(struct kvm *kvm, hpa_t root_hpa)
* positives and free roots that don't strictly need to be freed, but
* such false positives are relatively rare:
*
- * (a) only PAE paging and nested NPT has roots without shadow pages
+ * (a) only PAE paging and nested NPT have roots without shadow pages
+ * (or any shadow paging flavor with a dummy root)
* (b) remote reloads due to a memslot update obsoletes _all_ roots
* (c) KVM doesn't track previous roots for PAE paging, and the guest
* is unlikely to zap an in-use PGD.
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d39af5639ce9..3ca986450393 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -44,6 +44,16 @@ extern bool dbg;
#define INVALID_PAE_ROOT 0
#define IS_VALID_PAE_ROOT(x) (!!(x))

+static inline hpa_t kvm_mmu_get_dummy_root(void)
+{
+ return my_zero_pfn(0) << PAGE_SHIFT;
+}
+
+static inline bool kvm_mmu_is_dummy_root(hpa_t shadow_page)
+{
+ return is_zero_pfn(shadow_page >> PAGE_SHIFT);
+}
+
typedef u64 __rcu *tdp_ptep_t;

struct kvm_mmu_page {
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 122bfc0124d3..e9d4d7b66111 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -646,6 +646,17 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
goto out_gpte_changed;

+ /*
+ * Load a new root and retry the faulting instruction in the extremely
+ * unlikely scenario that the guest root gfn became visible between
+ * loading a dummy root and handling the resulting page fault, e.g. if
+ * userspace creates a memslot in the interim.
+ */
+ if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
+ kvm_mmu_unload(vcpu);
+ goto out_gpte_changed;
+ }
+
for_each_shadow_entry(vcpu, fault->addr, it) {
gfn_t table_gfn;

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 9f8e8cda89e8..ac8ad12f9698 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -238,6 +238,9 @@ static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)

static inline struct kvm_mmu_page *root_to_sp(hpa_t root)
{
+ if (kvm_mmu_is_dummy_root(root))
+ return NULL;
+
/*
* The "root" may be a special root, e.g. a PAE entry, treat it as a
* SPTE to ensure any non-PA bits are dropped.
--
2.41.0.487.g6d72f3e995-goog


2023-07-25 11:52:52

by Yu Zhang

Subject: Re: [PATCH 5/5] KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots

> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 122bfc0124d3..e9d4d7b66111 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -646,6 +646,17 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> goto out_gpte_changed;
>
> + /*
> + * Load a new root and retry the faulting instruction in the extremely
> + * unlikely scenario that the guest root gfn became visible between
> + * loading a dummy root and handling the resulting page fault, e.g. if
> + * userspace creates a memslot in the interim.
> + */
> + if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
> + kvm_mmu_unload(vcpu);

Do we really need a kvm_mmu_unload()? Could we just set
vcpu->arch.mmu->root.hpa to INVALID_PAGE here?

> + goto out_gpte_changed;
> + }
> +
> for_each_shadow_entry(vcpu, fault->addr, it) {
> gfn_t table_gfn;

B.R.
Yu

2023-07-25 17:14:45

by Sean Christopherson

Subject: Re: [PATCH 5/5] KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots

On Tue, Jul 25, 2023, Yu Zhang wrote:
> > diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> > index 122bfc0124d3..e9d4d7b66111 100644
> > --- a/arch/x86/kvm/mmu/paging_tmpl.h
> > +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> > @@ -646,6 +646,17 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> > if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> > goto out_gpte_changed;
> >
> > + /*
> > + * Load a new root and retry the faulting instruction in the extremely
> > + * unlikely scenario that the guest root gfn became visible between
> > + * loading a dummy root and handling the resulting page fault, e.g. if
> > + * userspace creates a memslot in the interim.
> > + */
> > + if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
> > + kvm_mmu_unload(vcpu);
>
> Do we really need a kvm_mmu_unload()? Could we just set
> vcpu->arch.mmu->root.hpa to INVALID_PAGE here?

Oof, yeah. Not only is a full unload overkill, if this code were hit it would
lead to deadlock because kvm_mmu_free_roots() expects to be called *without*
mmu_lock held.

Hmm, but I don't love the idea of open coding a free/reset of the current root.
I'm leaning toward

kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu);

since it's conceptually similar to KVM unloading roots when a memslot is deleted
or moved, just reversed. That would obviously tie this code to KVM's handling of
the dummy root just as much as manually invalidating root.hpa (probably more so),
but that might actually be a good thing because then the rule for the dummy root
is that it's always considered obsolete (when checked), and that could be
explicitly documented in is_obsolete_root().
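
I.e. the FNAME(fetch) hunk would become something like (untested):

	if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
		kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu);
		goto out_gpte_changed;
	}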

2023-07-26 10:43:42

by Yu Zhang

Subject: Re: [PATCH 5/5] KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots

On Tue, Jul 25, 2023 at 08:53:19AM -0700, Sean Christopherson wrote:
> On Tue, Jul 25, 2023, Yu Zhang wrote:
> > > diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> > > index 122bfc0124d3..e9d4d7b66111 100644
> > > --- a/arch/x86/kvm/mmu/paging_tmpl.h
> > > +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> > > @@ -646,6 +646,17 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> > > if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> > > goto out_gpte_changed;
> > >
> > > + /*
> > > + * Load a new root and retry the faulting instruction in the extremely
> > > + * unlikely scenario that the guest root gfn became visible between
> > > + * loading a dummy root and handling the resulting page fault, e.g. if
> > > + * userspace creates a memslot in the interim.
> > > + */
> > > + if (unlikely(kvm_mmu_is_dummy_root(vcpu->arch.mmu->root.hpa))) {
> > > + kvm_mmu_unload(vcpu);
> >
> > Do we really need a kvm_mmu_unload()? Could we just set
> > vcpu->arch.mmu->root.hpa to INVALID_PAGE here?
>
> Oof, yeah. Not only is a full unload overkill, if this code were hit it would
> lead to deadlock because kvm_mmu_free_roots() expects to be called *without*
> mmu_lock held.
>
> Hmm, but I don't love the idea of open coding a free/reset of the current root.
> I'm leaning toward
>
> kvm_make_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu);
>
> since it's conceptually similar to KVM unloading roots when a memslot is deleted
> or moved, just reversed. That would obviously tie this code to KVM's handling of
> the dummy root just as much as manually invalidating root.hpa (probably more so),
> but that might actually be a good thing because then the rule for the dummy root
> is that it's always considered obsolete (when checked), and that could be
> explicitly documented in is_obsolete_root().
>

Oh, right. KVM_REQ_MMU_FREE_OBSOLETE_ROOTS should work. Thanks!

B.R.
Yu