2023-09-01 15:13:32

by Li zeming

Subject: [PATCH] x86/kvm/mmu: Remove unnecessary 'NULL' values from sptep

sptep is always assigned before it is read, so there is no need to initialize
it to NULL.

Signed-off-by: Li zeming <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ec169f5c7dce..95f745aec4aa 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3457,7 +3457,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
         struct kvm_mmu_page *sp;
         int ret = RET_PF_INVALID;
         u64 spte = 0ull;
-        u64 *sptep = NULL;
+        u64 *sptep;
         uint retry_count = 0;

         if (!page_fault_can_be_fast(fault))
--
2.18.2



2023-09-05 16:23:40

by Sean Christopherson

Subject: Re: [PATCH] x86/kvm/mmu: Remove unnecessary 'NULL' values from sptep

On Sun, Sep 03, 2023, Li zeming wrote:
> sptep is always assigned before it is read, so there is no need to initialize
> it to NULL.
>
> Signed-off-by: Li zeming <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index ec169f5c7dce..95f745aec4aa 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3457,7 +3457,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>          struct kvm_mmu_page *sp;
>          int ret = RET_PF_INVALID;
>          u64 spte = 0ull;
> -        u64 *sptep = NULL;
> +        u64 *sptep;
>          uint retry_count = 0;
>
>          if (!page_fault_can_be_fast(fault))

Hmm, this is safe, but there's some ugliness lurking. Theoretically, it's possible
for spte to be left untouched by the walkers. That _shouldn't_ happen, as it means
there's a bug somewhere in KVM. But if that did happen, on the second or later
iteration, it's (again, theoretically) possible to consume a stale spte.

        if (tdp_mmu_enabled)
                sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
        else
                sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);

        if (!is_shadow_present_pte(spte))               <=== could consume stale data
                break;
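
For reference, here's a condensed sketch of the surrounding retry loop
(paraphrased from fast_page_fault(), not a verbatim copy) to show where a stale
spte from a previous iteration could be consumed:

        do {
                /* The walkers are expected to (re)write spte on every pass... */
                if (tdp_mmu_enabled)
                        sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
                else
                        sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);

                /*
                 * ...but if a buggy walker bailed without writing spte, this
                 * check would see whatever the previous iteration (or the
                 * function-scope initializer) left behind.
                 */
                if (!is_shadow_present_pte(spte))
                        break;

                /* <attempt the fast fix, retry on contention> */

                if (++retry_count > 4)
                        break;
        } while (true);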

If we're going to tidy up sptep, I think we should also give spte similar treatment
and harden KVM in the process, e.g.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6325bb3e8c2b..ae2f87bbbf0a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3430,8 +3430,8 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
         struct kvm_mmu_page *sp;
         int ret = RET_PF_INVALID;
-        u64 spte = 0ull;
-        u64 *sptep = NULL;
+        u64 spte;
+        u64 *sptep;
         uint retry_count = 0;

         if (!page_fault_can_be_fast(fault))
@@ -3447,6 +3447,14 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
         else
                 sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);

+        /*
+         * It's entirely possible for the mapping to have been zapped
+         * by a different task, but the root page should always be
+         * available as the vCPU holds a reference to its root(s).
+         */
+        if (WARN_ON_ONCE(!sptep))
+                spte = REMOVED_SPTE;
+
         if (!is_shadow_present_pte(spte))
                 break;


2023-09-05 16:24:43

by Huang, Kai

Subject: Re: [PATCH] x86/kvm/mmu: Remove unnecessary 'NULL' values from sptep

On Fri, 2023-09-01 at 09:48 -0700, Sean Christopherson wrote:
> On Sun, Sep 03, 2023, Li zeming wrote:
> > sptep is always assigned before it is read, so there is no need to initialize
> > it to NULL.
> >
> > Signed-off-by: Li zeming <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index ec169f5c7dce..95f745aec4aa 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3457,7 +3457,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >          struct kvm_mmu_page *sp;
> >          int ret = RET_PF_INVALID;
> >          u64 spte = 0ull;
> > -        u64 *sptep = NULL;
> > +        u64 *sptep;
> >          uint retry_count = 0;
> >
> >          if (!page_fault_can_be_fast(fault))
>
> Hmm, this is safe, but there's some ugliness lurking. Theoretically, it's possible
> for spte to be left untouched by the walkers. That _shouldn't_ happen, as it means
> there's a bug somewhere in KVM. But if that did happen, on the second or later
> iteration, it's (again, theoretically) possible to consume a stale spte.
>
>         if (tdp_mmu_enabled)
>                 sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
>         else
>                 sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
>
>         if (!is_shadow_present_pte(spte))               <=== could consume stale data
>                 break;
>
> If we're going to tidy up sptep, I think we should also give spte similar treatment
> and harden KVM in the process, e.g.
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6325bb3e8c2b..ae2f87bbbf0a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3430,8 +3430,8 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>          struct kvm_mmu_page *sp;
>          int ret = RET_PF_INVALID;
> -        u64 spte = 0ull;
> -        u64 *sptep = NULL;
> +        u64 spte;
> +        u64 *sptep;
>          uint retry_count = 0;
>
>          if (!page_fault_can_be_fast(fault))
> @@ -3447,6 +3447,14 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>          else
>                  sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
>
> +        /*
> +         * It's entirely possible for the mapping to have been zapped
> +         * by a different task, but the root page should always be
> +         * available as the vCPU holds a reference to its root(s).
> +         */
> +        if (WARN_ON_ONCE(!sptep))
> +                spte = REMOVED_SPTE;

If I recall correctly, REMOVED_SPTE is only used by the TDP MMU code. Should we
use 0 (or the initial SPTE value, for cases like TDX) instead of REMOVED_SPTE?
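
Something like this, perhaps (just a sketch; whether a literal 0 is the right
non-present value in every configuration is exactly the question):

        if (WARN_ON_ONCE(!sptep))
                spte = 0ull;    /* assumes a zeroed SPTE reads as not-present */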

And I agree this makes the code more error-proof (although only theoretically
for now).

> +
>          if (!is_shadow_present_pte(spte))
>                  break;
>
>