2021-08-12 15:15:15

by Lai Jiangshan

Subject: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()

From: Lai Jiangshan <[email protected]>

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next(). But checking for a present PTE in __shadow_walk_next()
is the more prudent way of programming, and the loop bodies then no longer
need to check it themselves. It allows us to remove the
is_shadow_present_pte() checks from the loop bodies.
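
For context, the check added here is consumed by the loop control of the
for_each_shadow_entry* macros, which call shadow_walk_okay() before each
iteration and __shadow_walk_next() after it. A condensed sketch of those
macros (paraphrased from the mmu.c of this era; the exact upstream text may
differ slightly):

/*
 * Condensed sketch of the walker macros.  shadow_walk_next() is a thin
 * wrapper that calls __shadow_walk_next(iterator, *iterator->sptep).
 */
#define for_each_shadow_entry(_vcpu, _addr, _walker)			\
	for (shadow_walk_init(&(_walker), _vcpu, _addr);		\
	     shadow_walk_okay(&(_walker));				\
	     shadow_walk_next(&(_walker)))

#define for_each_shadow_entry_lockless(_vcpu, _addr, _walker, spte)	\
	for (shadow_walk_init(&(_walker), _vcpu, _addr);		\
	     shadow_walk_okay(&(_walker)) &&				\
		({ spte = mmu_spte_get_lockless(_walker.sptep); 1; });	\
	     __shadow_walk_next(&(_walker), spte))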

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..c48ecb25d5f8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
u64 spte)
{
- if (is_last_spte(spte, iterator->level)) {
+ if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
iterator->level = 0;
return;
}
--
2.19.1.6.gb485710b


2021-08-12 15:17:31

by Lai Jiangshan

Subject: [PATCH 2/2] KVM: X86: Remove the present check from for_each_shadow_entry* loop body

From: Lai Jiangshan <[email protected]>

__shadow_walk_next(), which the for_each_shadow_entry* loops use for their
loop control, now performs the present check itself and propagates the
result to shadow_walk_okay(), so the loop bodies no longer need to check it.
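
For reference, __shadow_walk_next() terminates a walk by zeroing
iterator->level, and shadow_walk_okay() turns that into the loop's exit
condition. A simplified sketch of shadow_walk_okay() (body and names
paraphrased from the mmu.c of this era):

	static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
	{
		/* level is 0 here if __shadow_walk_next() hit a terminal SPTE */
		if (iterator->level < PG_LEVEL_4K)
			return false;

		iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
		iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
		return true;
	}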

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c48ecb25d5f8..42eebba6782e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
sptep = iterator.sptep;
*spte = old_spte;
-
- if (!is_shadow_present_pte(old_spte))
- break;
}

return sptep;
@@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
spte = mmu_spte_get_lockless(iterator.sptep);

sptes[leaf] = spte;
-
- if (!is_shadow_present_pte(spte))
- break;
}

return leaf;
@@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
u64 spte;

walk_shadow_page_lockless_begin(vcpu);
- for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+ for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
clear_sp_write_flooding_count(iterator.sptep);
- if (!is_shadow_present_pte(spte))
- break;
- }
walk_shadow_page_lockless_end(vcpu);
}

--
2.19.1.6.gb485710b

2021-08-12 16:06:22

by Sean Christopherson

Subject: Re: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()

On Thu, Aug 12, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> So far, the loop bodies already ensure the PTE is present before calling
> __shadow_walk_next(). But checking for a present PTE in __shadow_walk_next()
> is the more prudent way of programming, and the loop bodies then no longer
> need to check it themselves. It allows us to remove the
> is_shadow_present_pte() checks from the loop bodies.

There needs to be more analysis in the changelog, as there are many more callers
to __shadow_walk_next() than the three that are modified in the next patch. It
might even make sense to squash the two patches together, i.e. make it a "move"
instead of an "add + remove", and then explicitly explain why it's ok to add the
check for paths that do _not_ currently have a !is_shadow_present_pte() in the
loop body.

Specifically, FNAME(fetch) via shadow_walk_next() and __direct_map() via
for_each_shadow_entry() do not currently terminate their walks with a !PRESENT,
but they get away with it because they install present non-leaf SPTEs in the loop
itself.
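
For example, __direct_map()'s walk only ever descends through SPTEs it has
just made present; a heavily condensed sketch of the loop shape (not the
actual function body, with error handling and huge page logic omitted):

	for_each_shadow_entry(vcpu, gpa, it) {
		if (it.level == fault->goal_level)
			break;

		drop_large_spte(vcpu, it.sptep);
		if (is_shadow_present_pte(*it.sptep))
			continue;

		/* Install a present non-leaf SPTE before stepping into it. */
		sp = kvm_mmu_get_page(vcpu, base_gfn, it.addr, it.level - 1,
				      true, ACC_ALL);
		link_shadow_page(vcpu, it.sptep, sp);
	}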

The other argument for the audit (changelog+patch) of all users is that the next
patch misses FNAME(invlpg), e.g.

@@ -977,7 +980,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
FNAME(update_pte)(vcpu, sp, sptep, &gpte);
}

- if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+ if (!sp->unsync_children)
break;
}
write_unlock(&vcpu->kvm->mmu_lock);

It would also be worthwhile to document via the changelog that terminating on
!is_shadow_present_pte() is 100% the correct behavior, as walking past a !PRESENT
SPTE would lead to attempting to read the next-level SPTE from a garbage
iter->shadow_addr.
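
Concretely, the non-terminating tail of __shadow_walk_next() derives the
next table's address from the SPTE value, so a !PRESENT SPTE would feed
garbage into the next iteration. Pre-patch body, lightly paraphrased (mask
name as in the mmu.c of this era):

	static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
				       u64 spte)
	{
		if (is_last_spte(spte, iterator->level)) {
			iterator->level = 0;
			return;
		}

		/*
		 * If spte is !PRESENT and not "last", the masked value below is
		 * not a valid page table address, yet shadow_walk_okay() will
		 * happily compute sptep from it on the next iteration.
		 */
		iterator->shadow_addr = spte & PT64_BASE_ADDR_MASK;
		--iterator->level;
	}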

And for clarity and safety, I think it would be worth adding the patch below as
a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
in a page should never terminate early.


From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <[email protected]>
Date: Thu, 12 Aug 2021 08:38:35 -0700
Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
page faults

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE. The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next(). In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..2a243ae1d64c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2992,6 +2992,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
account_huge_nx_page(vcpu->kvm, sp);
}

+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
fault->write, fault->goal_level, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..3a8a7b2f9979 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
}
}

+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
- it.level, base_gfn, fault->pfn, fault->prefault,
- fault->map_writable);
+ fault->goal_level, base_gfn, fault->pfn,
+ fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;

--
2.33.0.rc1.237.g0d66db33f3-goog

2021-08-13 14:11:26

by Lai Jiangshan

Subject: [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

From: Lai Jiangshan <[email protected]>

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next(): Some loop bodies simply exit with a !PRESENT
directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
do not currently terminate their walks with a !PRESENT, but they get away
with it because they install present non-leaf SPTEs in the loop itself.

But checking pte present in __shadow_walk_next() is a more prudent way of
programing and loop bodies will not need to always check it. It allows us
removing unneded is_shadow_present_pte() in the loop bodies.

Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
walking past a !PRESENT SPTE would lead to attempting to read a the next
level SPTE from a garbage iter->shadow_addr. Even some paths that do _not_
currently have a !is_shadow_present_pte() in the loop body is Ok since
they will install present non-leaf SPTEs and the additinal present check
is just an NOP.

The checking result in __shadow_walk_next() will be propagated to
shadow_walk_okay() for being used in any for(;;) loop.

Signed-off-by: Lai Jiangshan <[email protected]>
---
Changed from V1:
Merge the two patches
Update changelog
Remove !is_shadow_present_pte() in FNAME(invlpg)
arch/x86/kvm/mmu/mmu.c | 13 ++-----------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a272ccbddfa1..42eebba6782e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
u64 spte)
{
- if (is_last_spte(spte, iterator->level)) {
+ if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
iterator->level = 0;
return;
}
@@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
sptep = iterator.sptep;
*spte = old_spte;
-
- if (!is_shadow_present_pte(old_spte))
- break;
}

return sptep;
@@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
spte = mmu_spte_get_lockless(iterator.sptep);

sptes[leaf] = spte;
-
- if (!is_shadow_present_pte(spte))
- break;
}

return leaf;
@@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
u64 spte;

walk_shadow_page_lockless_begin(vcpu);
- for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+ for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
clear_sp_write_flooding_count(iterator.sptep);
- if (!is_shadow_present_pte(spte))
- break;
- }
walk_shadow_page_lockless_end(vcpu);
}

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f70afecbf3a2..13138b03cc69 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -977,7 +977,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
FNAME(update_pte)(vcpu, sp, sptep, &gpte);
}

- if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+ if (!sp->unsync_children)
break;
}
write_unlock(&vcpu->kvm->mmu_lock);
--
2.19.1.6.gb485710b

2021-08-14 09:49:22

by Lai Jiangshan

Subject: Re: [PATCH 1/2] KVM: X86: Check pte present first in __shadow_walk_next()



On 2021/8/13 00:03, Sean Christopherson wrote:

>
> And for clarity and safety, I think it would be worth adding the patch below as
> a prep patch to document and enforce that walking the non-leaf SPTEs when faulting
> in a page should never terminate early.

I've sent a V2 patch which was updated as you suggested.
The patch you provided below doesn't need to be updated. It and my V2 patch
do not depend on each other, so I didn't resend your patch together with mine.

For your patch

Acked-by: Lai Jiangshan <[email protected]>

And it can be queued earlier.

>
>
> From 1c202a7e82b1931e4eb37b23aa9d7108340a6cd2 Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <[email protected]>
> Date: Thu, 12 Aug 2021 08:38:35 -0700
> Subject: [PATCH] KVM: x86/mmu: Verify shadow walk doesn't terminate early in
> page faults
>
> WARN and bail if the shadow walk for faulting in a SPTE terminates early,
> i.e. doesn't reach the expected level because the walk encountered a
> terminal SPTE. The shadow walks for page faults are subtle in that they
> install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
> body, and consume the newly created non-leaf SPTE in the loop control,
> e.g. __shadow_walk_next(). In other words, the walks guarantee that the
> walk will stop if and only if the target level is reached by installing
> non-leaf SPTEs to guarantee the walk remains valid.
>
> Opportunistically use fault->goal_level instead of it.level in
> FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
> the target level.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 +++
> arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..2a243ae1d64c 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2992,6 +2992,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> account_huge_nx_page(vcpu->kvm, sp);
> }
>
> + if (WARN_ON_ONCE(it.level != fault->goal_level))
> + return -EFAULT;
> +
> ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
> fault->write, fault->goal_level, base_gfn, fault->pfn,
> fault->prefault, fault->map_writable);
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index f70afecbf3a2..3a8a7b2f9979 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> }
> }
>
> + if (WARN_ON_ONCE(it.level != fault->goal_level))
> + return -EFAULT;
> +
> ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
> - it.level, base_gfn, fault->pfn, fault->prefault,
> - fault->map_writable);
> + fault->goal_level, base_gfn, fault->pfn,
> + fault->prefault, fault->map_writable);
> if (ret == RET_PF_SPURIOUS)
> return ret;
>
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>

2021-08-24 08:36:19

by Lai Jiangshan

Subject: Re: [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

Hello, Paolo

Could you review this, please?

Thanks
Lai

On Fri, Aug 13, 2021 at 9:01 PM Lai Jiangshan <[email protected]> wrote:
>
> From: Lai Jiangshan <[email protected]>
>
> So far, the loop bodies already ensure the PTE is present before calling
> __shadow_walk_next(): Some loop bodies simply exit with a !PRESENT
> directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
> do not currently terminate their walks with a !PRESENT, but they get away
> with it because they install present non-leaf SPTEs in the loop itself.
>
> But checking pte present in __shadow_walk_next() is a more prudent way of
> programing and loop bodies will not need to always check it. It allows us
> removing unneded is_shadow_present_pte() in the loop bodies.
>
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read a the next
> level SPTE from a garbage iter->shadow_addr. Even some paths that do _not_
> currently have a !is_shadow_present_pte() in the loop body is Ok since
> they will install present non-leaf SPTEs and the additinal present check
> is just an NOP.
>
> The checking result in __shadow_walk_next() will be propagated to
> shadow_walk_okay() for being used in any for(;;) loop.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> Changed from V1:
> Merge the two patches
> Update changelog
> Remove !is_shadow_present_pte() in FNAME(invlpg)
> arch/x86/kvm/mmu/mmu.c | 13 ++-----------
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> 2 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..42eebba6782e 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2231,7 +2231,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
> static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
> u64 spte)
> {
> - if (is_last_spte(spte, iterator->level)) {
> + if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
> iterator->level = 0;
> return;
> }
> @@ -3152,9 +3152,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
> for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
> sptep = iterator.sptep;
> *spte = old_spte;
> -
> - if (!is_shadow_present_pte(old_spte))
> - break;
> }
>
> return sptep;
> @@ -3694,9 +3691,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
> spte = mmu_spte_get_lockless(iterator.sptep);
>
> sptes[leaf] = spte;
> -
> - if (!is_shadow_present_pte(spte))
> - break;
> }
>
> return leaf;
> @@ -3811,11 +3805,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
> u64 spte;
>
> walk_shadow_page_lockless_begin(vcpu);
> - for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> + for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
> clear_sp_write_flooding_count(iterator.sptep);
> - if (!is_shadow_present_pte(spte))
> - break;
> - }
> walk_shadow_page_lockless_end(vcpu);
> }
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index f70afecbf3a2..13138b03cc69 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -977,7 +977,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
> FNAME(update_pte)(vcpu, sp, sptep, &gpte);
> }
>
> - if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
> + if (!sp->unsync_children)
> break;
> }
> write_unlock(&vcpu->kvm->mmu_lock);
> --
> 2.19.1.6.gb485710b
>

2021-09-02 20:58:38

by Sean Christopherson

Subject: Re: [PATCH V2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

On Fri, Aug 13, 2021, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> So far, the loop bodies already ensure the PTE is present before calling
> __shadow_walk_next(): Some loop bodies simply exit with a !PRESENT
> directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
> do not currently terminate their walks with a !PRESENT, but they get away
> with it because they install present non-leaf SPTEs in the loop itself.
>
> But checking pte present in __shadow_walk_next() is a more prudent way of
> programing and loop bodies will not need to always check it. It allows us
> removing unneded is_shadow_present_pte() in the loop bodies.
^^^^^^^
unneeded

>
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read a the next
> level SPTE from a garbage iter->shadow_addr. Even some paths that do _not_
> currently have a !is_shadow_present_pte() in the loop body is Ok since
> they will install present non-leaf SPTEs and the additinal present check
^^^^^^^^^
additional
> is just an NOP.
>
> The checking result in __shadow_walk_next() will be propagated to
> shadow_walk_okay() for being used in any for(;;) loop.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---

Nits aside,

Reviewed-by: Sean Christopherson <[email protected]>

2021-09-06 12:29:50

by Lai Jiangshan

Subject: [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults

From: Sean Christopherson <[email protected]>

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE. The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next(). In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal_level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.

Reviewed-by: Lai Jiangshan <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/mmu/paging_tmpl.h | 7 +++++--
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4853c033e6ce..538be037549d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3006,6 +3006,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
account_huge_nx_page(vcpu->kvm, sp);
}

+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
fault->write, fault->goal_level, base_gfn, fault->pfn,
fault->prefault, fault->map_writable);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 50ade6450ace..4d559d2d4d66 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -749,9 +749,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
}
}

+ if (WARN_ON_ONCE(it.level != fault->goal_level))
+ return -EFAULT;
+
ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, fault->write,
- it.level, base_gfn, fault->pfn, fault->prefault,
- fault->map_writable);
+ fault->goal_level, base_gfn, fault->pfn,
+ fault->prefault, fault->map_writable);
if (ret == RET_PF_SPURIOUS)
return ret;

--
2.19.1.6.gb485710b

2021-09-06 12:30:08

by Lai Jiangshan

Subject: [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

From: Lai Jiangshan <[email protected]>

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next(): some loop bodies simply exit on a !PRESENT SPTE,
while other loop bodies, i.e. FNAME(fetch) and __direct_map(), do not
currently terminate their walks on a !PRESENT SPTE but get away with it
because they install present non-leaf SPTEs in the loop itself.

But checking for a present PTE in __shadow_walk_next() is the more prudent
way of programming, and the loop bodies then no longer need to check it
themselves. It allows us to remove unneeded is_shadow_present_pte() checks
from the loop bodies.

Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
walking past a !PRESENT SPTE would lead to attempting to read the
next-level SPTE from a garbage iter->shadow_addr. Even the paths that do
_not_ currently have a !is_shadow_present_pte() check in the loop body are
OK, since they install present non-leaf SPTEs and the additional present
check is just a NOP.

The result of the check in __shadow_walk_next() is propagated to
shadow_walk_okay() and is thus honored by every for_each_shadow_entry*
loop.

Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
Changed from V2:
Fix typo in the changelog reported by Sean
Add Reviewed-by from Sean
Changed from V1:
Merge the two patches
Update changelog
Remove !is_shadow_present_pte() in FNAME(invlpg)
arch/x86/kvm/mmu/mmu.c | 13 ++-----------
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 538be037549d..26f6bd238a77 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2223,7 +2223,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
u64 spte)
{
- if (is_last_spte(spte, iterator->level)) {
+ if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
iterator->level = 0;
return;
}
@@ -3159,9 +3159,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
sptep = iterator.sptep;
*spte = old_spte;
-
- if (!is_shadow_present_pte(old_spte))
- break;
}

return sptep;
@@ -3721,9 +3718,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
spte = mmu_spte_get_lockless(iterator.sptep);

sptes[leaf] = spte;
-
- if (!is_shadow_present_pte(spte))
- break;
}

return leaf;
@@ -3838,11 +3832,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
u64 spte;

walk_shadow_page_lockless_begin(vcpu);
- for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+ for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
clear_sp_write_flooding_count(iterator.sptep);
- if (!is_shadow_present_pte(spte))
- break;
- }
walk_shadow_page_lockless_end(vcpu);
}

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4d559d2d4d66..72f358613786 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -982,7 +982,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
FNAME(update_pte)(vcpu, sp, sptep, &gpte);
}

- if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+ if (!sp->unsync_children)
break;
}
write_unlock(&vcpu->kvm->mmu_lock);
--
2.19.1.6.gb485710b

2021-09-24 09:38:47

by Paolo Bonzini

Subject: Re: [PATCH V3 1/2] KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults

On 06/09/21 14:25, Lai Jiangshan wrote:
> Opportunistically use fault->goal_level instead of it.level in
> FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
> the target level.

This argument will go away when mmu_set_spte starts fishing out the
level from the kvm_mmu_page role, but it makes a good point for now.

Paolo

2021-09-24 09:58:48

by Paolo Bonzini

Subject: Re: [PATCH V3 2/2] KVM: X86: Move PTE present check from loop body to __shadow_walk_next()

On 06/09/21 14:25, Lai Jiangshan wrote:
> But checking for a present PTE in __shadow_walk_next() is the more prudent
> way of programming, and the loop bodies then no longer need to check it
> themselves. It allows us to remove unneeded is_shadow_present_pte() checks
> from the loop bodies.
>
> Terminating on !is_shadow_present_pte() is 100% the correct behavior, as
> walking past a !PRESENT SPTE would lead to attempting to read the
> next-level SPTE from a garbage iter->shadow_addr. Even the paths that do
> _not_ currently have a !is_shadow_present_pte() check in the loop body are
> OK, since they install present non-leaf SPTEs and the additional present
> check is just a NOP.
>
> The result of the check in __shadow_walk_next() is propagated to
> shadow_walk_okay() and is thus honored by every for_each_shadow_entry*
> loop.
>
> Reviewed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> Changed from V2:
> Fix typo in the changelog reported by Sean
> Add Reviewed-by from Sean
> Changed from V1:
> Merge the two patches
> Update changelog
> Remove !is_shadow_present_pte() in FNAME(invlpg)
> arch/x86/kvm/mmu/mmu.c | 13 ++-----------
> arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
> 2 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 538be037549d..26f6bd238a77 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2223,7 +2223,7 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
> static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
> u64 spte)
> {
> - if (is_last_spte(spte, iterator->level)) {
> + if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
> iterator->level = 0;
> return;
> }
> @@ -3159,9 +3159,6 @@ static u64 *fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, gpa_t gpa, u64 *spte)
> for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
> sptep = iterator.sptep;
> *spte = old_spte;
> -
> - if (!is_shadow_present_pte(old_spte))
> - break;
> }
>
> return sptep;
> @@ -3721,9 +3718,6 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
> spte = mmu_spte_get_lockless(iterator.sptep);
>
> sptes[leaf] = spte;
> -
> - if (!is_shadow_present_pte(spte))
> - break;
> }
>
> return leaf;
> @@ -3838,11 +3832,8 @@ static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr)
> u64 spte;
>
> walk_shadow_page_lockless_begin(vcpu);
> - for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
> + for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
> clear_sp_write_flooding_count(iterator.sptep);
> - if (!is_shadow_present_pte(spte))
> - break;
> - }
> walk_shadow_page_lockless_end(vcpu);
> }
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index 4d559d2d4d66..72f358613786 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -982,7 +982,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
> FNAME(update_pte)(vcpu, sp, sptep, &gpte);
> }
>
> - if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
> + if (!sp->unsync_children)

Queued both, thanks.

Paolo