2020-09-23 22:08:27

by Sean Christopherson

Subject: [PATCH 0/4] KVM: x86/mmu: Page fault handling cleanups

Cleanups for page fault handling that were encountered during early TDX
enabling, but are worthwhile on their own. Specifically, patch 4 fixes an
issue where KVM doesn't detect a spurious page fault (due to the fault
being fixed by a different pCPU+vCPU) and so runs the full gamut of
writing the SPTE, updating stats, and prefetching SPTEs.
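
For reference, the gist of patch 4's detection is a straight comparison of
the old and new SPTE values before the write. A minimal sketch of the idea
(the actual patch threads this through set_spte()'s return value, see
patch 4 below):

	/*
	 * Sketch: if the SPTE about to be written is identical to the one
	 * already installed, another pCPU/vCPU fixed the fault first, so
	 * treat the fault as spurious and skip the write, the prefetch,
	 * and the pf_fixed accounting.
	 */
	if (*sptep == new_spte)
		return RET_PF_SPURIOUS;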

Sean Christopherson (4):
KVM: x86/mmu: Return -EIO if page fault returns RET_PF_INVALID
KVM: x86/mmu: Invert RET_PF_* check when falling through to emulation
KVM: x86/mmu: Return unique RET_PF_* values if the fault was fixed
KVM: x86/mmu: Bail early from final #PF handling on spurious faults

arch/x86/kvm/mmu/mmu.c | 70 +++++++++++++++++++++-------------
arch/x86/kvm/mmu/mmutrace.h | 13 +++----
arch/x86/kvm/mmu/paging_tmpl.h | 3 ++
3 files changed, 52 insertions(+), 34 deletions(-)

--
2.28.0


2020-09-23 22:08:34

by Sean Christopherson

Subject: [PATCH 4/4] KVM: x86/mmu: Bail early from final #PF handling on spurious faults

Detect spurious page faults, e.g. page faults that occur when multiple
vCPUs simultaneously access a not-present page, and skip the SPTE write,
prefetch, and stats update for spurious faults.

Note, the performance benefits of skipping the write and prefetch are
likely negligible, and the false positive stats adjustment is probably
lost in the noise. The primary motivation is to play nice with TDX's
SEPT in the long term. SEAMCALLs (to program SEPT entries) are quite
costly, e.g. thousands of cycles, and a spurious SEPT update will result
in a SEAMCALL error (which KVM will ideally treat as fatal).
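
For context, patch 3 gives the fault paths distinct RET_PF_* values, and
patch 1 turns a final RET_PF_INVALID into -EIO. Roughly, the resulting set
looks like the sketch below (names are from this series; the exact values,
ordering, and comments live in mmu.c):

	enum {
		RET_PF_RETRY,		/* let the vCPU fault again on the address */
		RET_PF_EMULATE,		/* MMIO page fault, emulate the instruction */
		RET_PF_INVALID,		/* SPTE is invalid, take the real fault path */
		RET_PF_FIXED,		/* the fault was fixed by this call */
		RET_PF_SPURIOUS,	/* the fault was already fixed, nothing done */
	};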

Reported-by: Kai Huang <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 17 ++++++++++++++++-
arch/x86/kvm/mmu/paging_tmpl.h | 3 +++
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9253bcecbfe3..9bd657a8e78b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2973,6 +2973,7 @@ static bool kvm_is_mmio_pfn(kvm_pfn_t pfn)
/* Bits which may be returned by set_spte() */
#define SET_SPTE_WRITE_PROTECTED_PT BIT(0)
#define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
+#define SET_SPTE_SPURIOUS BIT(2)

static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned int pte_access, int level,
@@ -3061,7 +3062,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte = mark_spte_for_access_track(spte);

set_pte:
- if (mmu_spte_update(sptep, spte))
+ if (*sptep == spte)
+ ret |= SET_SPTE_SPURIOUS;
+ else if (mmu_spte_update(sptep, spte))
ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
return ret;
}
@@ -3116,6 +3119,15 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (unlikely(is_mmio_spte(*sptep)))
ret = RET_PF_EMULATE;

+ /*
+ * The fault is fully spurious if and only if the new SPTE and old SPTE
+ * are identical, and emulation is not required.
+ */
+ if ((set_spte_ret & SET_SPTE_SPURIOUS) && ret == RET_PF_FIXED) {
+ WARN_ON_ONCE(!was_rmapped);
+ return RET_PF_SPURIOUS;
+ }
+
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
trace_kvm_mmu_set_spte(level, gfn, sptep);
if (!was_rmapped && is_large_pte(*sptep))
@@ -3352,6 +3364,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
write, level, base_gfn, pfn, prefault,
map_writable);
+ if (ret == RET_PF_SPURIOUS)
+ return ret;
+
direct_pte_prefetch(vcpu, it.sptep);
++vcpu->stat.pf_fixed;
return ret;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 4dd6b1e5b8cf..dda09c27ff25 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -711,6 +711,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,

ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
it.level, base_gfn, pfn, prefault, map_writable);
+ if (ret == RET_PF_SPURIOUS)
+ return ret;
+
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
++vcpu->stat.pf_fixed;
return ret;
--
2.28.0

2020-09-25 20:40:56

by Paolo Bonzini

Subject: Re: [PATCH 0/4] KVM: x86/mmu: Page fault handling cleanups

On 24/09/20 00:04, Sean Christopherson wrote:
> Cleanups for page fault handling that were encountered during early TDX
> enabling, but are worthwhile on their own. [...]

Queued, thanks. Looking at the KVM_BUG_ON now since patch 1 is somewhat
related.

Paolo

2020-09-25 20:49:48

by Sean Christopherson

Subject: Re: [PATCH 0/4] KVM: x86/mmu: Page fault handling cleanups

On Fri, Sep 25, 2020 at 10:38:02PM +0200, Paolo Bonzini wrote:
> On 24/09/20 00:04, Sean Christopherson wrote:
> > Cleanups for page fault handling that were encountered during early TDX
> > enabling, but are worthwhile on their own. [...]
>
> Queued, thanks. Looking at the KVM_BUG_ON now since patch 1 is somewhat
> related.

Ha, very prescient of you; that's actually a KVM_BUG_ON() in my "kitchen
sink" combo of everything :-)
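
For readers following the thread: per its subject, patch 1 makes the page
fault path return -EIO if the handler's final result is RET_PF_INVALID. A
KVM_BUG_ON() flavor of that check would presumably look something like the
sketch below, assuming a KVM_BUG_ON(cond, kvm) helper that marks the VM as
bugged and evaluates to the condition (at the time of this thread such a
helper was still a work in progress):

	if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
		return -EIO;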