2024-05-07 16:18:37

by Paolo Bonzini

Subject: [PATCH v2 00/17] KVM: x86/mmu: Page fault and MMIO cleanups

This is an updated version of the series at
https://patchew.org/linux/[email protected]/
which has been used for SEV-SNP development, taking into account
all review comments. Patch 8 is the only completely new patch
(and even then it had been posted together with the various
TDX/SNP prep series).

Here is an annotated git range-diff:

==============================================================================
KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation
- tweak commit message, add comment

@@ Commit message
Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
triggers emulation of any kind, as KVM doesn't currently support emulating
access to guest private memory. Practically speaking, private faults and
- emulation are already mutually exclusive, but there are edge cases upon
- edge cases where KVM can return RET_PF_EMULATE, and adding one last check
- to harden against weird, unexpected combinations is inexpensive.
+ emulation are already mutually exclusive, but there are many flows that
+ can result in KVM returning RET_PF_EMULATE, and adding one last check
+ to harden against weird, unexpected combinations and/or KVM bugs is
+ inexpensive.

Suggested-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
@@ arch/x86/kvm/mmu/mmu_internal.h: static inline int kvm_mmu_do_page_fault(struct
else
r = vcpu->arch.mmu->page_fault(vcpu, &fault);

++ /*
++ * Not sure what's happening, but punt to userspace and hope that
++ * they can fix it by changing memory to shared, or they can
++ * provide a better error.
++ */
+ if (r == RET_PF_EMULATE && fault.is_private) {
++ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+ kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
+ return -EFAULT;
+ }


==============================================================================
KVM: x86: Remove separate "bit" defines for page fault error code masks
- do not use ilog2

@@ Commit message
just to see which flag corresponds to which bit is quite annoying, as is
having to define two macros just to add recognition of a new flag.

- Use ilog2() to derive the bit in permission_fault(), the one function that
- actually needs the bit number (it does clever shifting to manipulate flags
- in order to avoid conditional branches).
+ Use ternary operator to derive the bit in permission_fault(), the one
+ function that actually needs the bit number as part of clever shifting
+ to avoid conditional branches. Generally the compiler is able to turn
+ it into a conditional move, and if not it's not really a big deal.

No functional change intended.

@@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, str
u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
- int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
-+ int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
++ int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
u32 errcode = PFERR_PRESENT_MASK;
bool fault;

@@ arch/x86/kvm/mmu.h: static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+ pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;

/* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
- offset = (pfec & ~1) +
+- offset = (pfec & ~1) +
- ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
-+ ((pte_access & PT_USER_MASK) << (ilog2(PFERR_RSVD_MASK) - PT_USER_SHIFT));
++ offset = (pfec & ~1) | ((pte_access & PT_USER_MASK) ? PFERR_RSVD_MASK : 0);

pkru_bits &= mmu->pkru_mask >> offset;
errcode |= -pkru_bits & PFERR_PK_MASK;

==============================================================================
KVM: x86: Define more SEV+ page fault error bits/flags for #NPF
- match commit message to bits defined in header file

@@ Commit message
Define more #NPF error code flags that are relevant to SEV+ (mostly SNP)
guests, as specified by the APM:

+ * Bit 31 (RMP): Set to 1 if the fault was caused due to an RMP check or a
+ VMPL check failure, 0 otherwise.
* Bit 34 (ENC): Set to 1 if the guest’s effective C-bit was 1, 0 otherwise.
* Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between
PVALIDATE or RMPADJUST and the RMP, 0 otherwise.
* Bit 36 (VMPL): Set to 1 if the fault was caused by a VMPL permission
check failure, 0 otherwise.
- * Bit 37 (SSS): Set to VMPL permission mask SSS (bit 4) value if VmplSSS is
- enabled.

Note, the APM is *extremely* misleading, and strongly implies that the
above flags can _only_ be set for #NPF exits from SNP guests. That is a


==============================================================================
KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
- add a description of PFERR_IMPLICIT_ACCESS

@@ Commit message
Add a compile-time assert in the legacy #PF handler to make sure that KVM-
defined flags are covered by its existing sanity check on the upper bits.

+ Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
+ are removing the comment that defined it.
+
Signed-off-by: Sean Christopherson <[email protected]>
+ Reviewed-by: Kai Huang <[email protected]>
+ Reviewed-by: Binbin Wu <[email protected]>
Message-ID: <[email protected]>
+ Signed-off-by: Paolo Bonzini <[email protected]>
+
+ ## arch/x86/include/asm/kvm_host.h ##
+@@ arch/x86/include/asm/kvm_host.h: enum x86_intercept_stage;
+ #define PFERR_GUEST_ENC_MASK BIT_ULL(34)
+ #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
+ #define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
++
++/*
++ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
++ * when emulating instructions that triggers implicit access.
++ */
+ #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
++#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
+
+ #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
+ PFERR_WRITE_MASK | \

## arch/x86/kvm/mmu/mmu.c ##
@@ arch/x86/kvm/mmu/mmu.c: int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
- if (WARN_ON_ONCE(error_code >> 32))
- error_code = lower_32_bits(error_code);
+ return -EFAULT;
+ #endif

+ /* Ensure the above sanity check also covers KVM-defined flags. */
+ BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));

- move in front of the other synthetic page fault error code patches

==============================================================================
KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are non-zero
- commit message copy editing

@@ Commit message
and even more explicitly in the SDM as VMCS.VM_EXIT_INTR_ERROR_CODE is a
32-bit field.

- Simply drop the upper bits of hardware provides garbage, as spurious
+ Simply drop the upper bits if hardware provides garbage, as spurious
information should do no harm (though in all likelihood hardware is buggy
and the kernel is doomed).

@@ Commit message
which in turn will allow deriving PFERR_PRIVATE_ACCESS from AMD's
PFERR_GUEST_ENC_MASK without running afoul of the sanity check.

- Note, this also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
+ Note, this is also why Intel uses bit 15 for SGX (highest bit on Intel CPUs)
and AMD uses bit 31 for RMP (highest bit on AMD CPUs); using the highest
bit minimizes the probability of a collision with the "other" vendor,
without needing to plumb more bits through microcode.


==============================================================================
KVM: x86/mmu: Use synthetic page fault error code to indicate private faults
- patch reordering, no other changes

==============================================================================
KVM: x86/mmu: check for invalid async page faults involving private memory
- new patch coming from TDX/SNP prep series; test PFERR_PRIVATE_ACCESS, set arch.error_code

@@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
arch.token = alloc_apf_token(vcpu);
- arch.gfn = gfn;
+ arch.gfn = fault->gfn;
++ arch.error_code = fault->error_code;
arch.direct_map = vcpu->arch.mmu->root_role.direct;
arch.cr3 = kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu);

@@ arch/x86/kvm/mmu/mmu.c: static u32 alloc_apf_token(struct kvm_vcpu *vcpu)
{
int r;

-+ if (WARN_ON_ONCE(work->arch.error_code & PFERR_GUEST_ENC_MASK))
++ if (WARN_ON_ONCE(work->arch.error_code & PFERR_PRIVATE_ACCESS))
+ return;
+
if ((vcpu->arch.mmu->root_role.direct != work->arch.direct_map) ||


==============================================================================
KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults
- exit to userspace if the wrong case happens, test PFERR_PRIVATE_ACCESS

@@ Commit message

## arch/x86/kvm/mmu/mmu.c ##
@@ arch/x86/kvm/mmu/mmu.c: int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
- error_code |= PFERR_PRIVATE_ACCESS;

r = RET_PF_INVALID;
-- if (unlikely(error_code & PFERR_RSVD_MASK)) {
-+ if (unlikely((error_code & PFERR_RSVD_MASK) &&
-+ !WARN_ON_ONCE(error_code & PFERR_GUEST_ENC_MASK))) {
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
++ if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
++ return -EFAULT;
++
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
if (r == RET_PF_EMULATE)
goto emulate;

==============================================================================
KVM: x86/mmu: Move private vs. shared check above slot validity checks
- add comment about use of mmu_invalidate_seq

@@ arch/x86/kvm/mmu/mmu.c: static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, stru
return kvm_faultin_pfn_private(vcpu, fault);

@@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ {
+ int ret;
+
++ /*
++ * Note that the mmu_invalidate_seq also serves to detect a concurrent
++ * change in attributes. is_page_fault_stale() will detect an
++	 * invalidation related to fault->gfn and resume the guest without
++ * installing a mapping in the page tables.
++ */
fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
smp_rmb();

+ /*
-+ * Check for a private vs. shared mismatch *after* taking a snapshot of
-+ * mmu_invalidate_seq, as changes to gfn attributes are guarded by the
-+ * invalidation notifier.
++ * Now that we have a snapshot of mmu_invalidate_seq we can check for a
++ * private vs. shared mismatch.
+ */
+ if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+ kvm_mmu_prepare_memory_fault_exit(vcpu, fault);



==============================================================================
KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to kvm_faultin_pfn()
- differences in moved code, range-diff is unreadable


==============================================================================
KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()
- remove unnecessary change

@@ arch/x86/kvm/mmu/mmu.c: static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct
return kvm_handle_noslot_fault(vcpu, fault, access);

/*
-
- ## arch/x86/kvm/mmu/mmu_internal.h ##
-@@ arch/x86/kvm/mmu/mmu_internal.h: struct kvm_page_fault {
- /* The memslot containing gfn. May be NULL. */
- struct kvm_memory_slot *slot;
-
-- /* Outputs of kvm_faultin_pfn. */
-+ /* Outputs of kvm_faultin_pfn. */
- unsigned long mmu_seq;
- kvm_pfn_t pfn;
- hva_t hva;


Isaku Yamahata (1):
KVM: x86/mmu: Pass full 64-bit error code when handling page faults

Paolo Bonzini (1):
KVM: x86/mmu: check for invalid async page faults involving private
memory

Sean Christopherson (15):
KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits
emulation
KVM: x86: Remove separate "bit" defines for page fault error code
masks
KVM: x86: Define more SEV+ page fault error bits/flags for #NPF
KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler
KVM: x86/mmu: WARN if upper 32 bits of legacy #PF error code are
non-zero
KVM: x86/mmu: Use synthetic page fault error code to indicate private
faults
KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page
faults
KVM: x86/mmu: Move private vs. shared check above slot validity checks
KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC
internal slots
KVM: x86/mmu: Explicitly disallow private accesses to emulated MMIO
KVM: x86/mmu: Move slot checks from __kvm_faultin_pfn() to
kvm_faultin_pfn()
KVM: x86/mmu: Handle no-slot faults at the beginning of
kvm_faultin_pfn()
KVM: x86/mmu: Set kvm_page_fault.hva to KVM_HVA_ERR_BAD for "no slot"
faults
KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values
KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create
noslot pfns

arch/x86/include/asm/kvm_host.h | 46 ++++----
arch/x86/kvm/mmu.h | 5 +-
arch/x86/kvm/mmu/mmu.c | 182 ++++++++++++++++++++------------
arch/x86/kvm/mmu/mmu_internal.h | 28 ++++-
arch/x86/kvm/mmu/mmutrace.h | 2 +-
arch/x86/kvm/svm/svm.c | 9 ++
6 files changed, 174 insertions(+), 98 deletions(-)

--
2.43.0



2024-05-07 16:19:20

by Paolo Bonzini

Subject: [PATCH 03/17] KVM: x86: Define more SEV+ page fault error bits/flags for #NPF

From: Sean Christopherson <[email protected]>

Define more #NPF error code flags that are relevant to SEV+ (mostly SNP)
guests, as specified by the APM:

* Bit 31 (RMP): Set to 1 if the fault was caused due to an RMP check or a
VMPL check failure, 0 otherwise.
* Bit 34 (ENC): Set to 1 if the guest’s effective C-bit was 1, 0 otherwise.
* Bit 35 (SIZEM): Set to 1 if the fault was caused by a size mismatch between
PVALIDATE or RMPADJUST and the RMP, 0 otherwise.
* Bit 36 (VMPL): Set to 1 if the fault was caused by a VMPL permission
check failure, 0 otherwise.

Note, the APM is *extremely* misleading, and strongly implies that the
above flags can _only_ be set for #NPF exits from SNP guests. That is a
lie, as bit 34 (C-bit=1, i.e. was encrypted) can be set when running _any_
flavor of SEV guest on SNP capable hardware.

Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a047480da5af..58bbcf76ad1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -261,8 +261,12 @@ enum x86_intercept_stage;
#define PFERR_FETCH_MASK BIT(4)
#define PFERR_PK_MASK BIT(5)
#define PFERR_SGX_MASK BIT(15)
+#define PFERR_GUEST_RMP_MASK BIT_ULL(31)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
#define PFERR_GUEST_PAGE_MASK BIT_ULL(33)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(34)
+#define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
+#define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(48)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
--
2.43.0
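
As a quick illustration of how the new bits are consumed (a standalone sketch with the masks redefined locally, not code from this patch), decoding a sample SNP-style #NPF error code looks like:

#include <stdio.h>

#define PFERR_GUEST_RMP_MASK    (1ULL << 31)
#define PFERR_GUEST_ENC_MASK    (1ULL << 34)
#define PFERR_GUEST_SIZEM_MASK  (1ULL << 35)
#define PFERR_GUEST_VMPL_MASK   (1ULL << 36)

int main(void)
{
        /* Example value: an RMP check failure on an encrypted (C-bit=1) access. */
        unsigned long long err = PFERR_GUEST_RMP_MASK | PFERR_GUEST_ENC_MASK;

        printf("RMP=%d ENC=%d SIZEM=%d VMPL=%d\n",
               !!(err & PFERR_GUEST_RMP_MASK), !!(err & PFERR_GUEST_ENC_MASK),
               !!(err & PFERR_GUEST_SIZEM_MASK), !!(err & PFERR_GUEST_VMPL_MASK));
        return 0;
}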



2024-05-07 16:27:44

by Paolo Bonzini

Subject: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

From: Sean Christopherson <[email protected]>

Move the sanity check that hardware never sets bits that collide with KVM-
defined synthetic bits from kvm_mmu_page_fault() to npf_interception(),
i.e. make the sanity check #NPF specific. The legacy #PF path already
WARNs if _any_ of bits 63:32 are set, and the error code that comes from
VMX's EPT Violation and Misconfig is 100% synthesized (KVM morphs VMX's
EXIT_QUALIFICATION into error code flags).

Add a compile-time assert in the legacy #PF handler to make sure that KVM-
defined flags are covered by its existing sanity check on the upper bits.

Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
are removing the comment that defined it.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Reviewed-by: Binbin Wu <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/mmu/mmu.c | 14 +++-----------
arch/x86/kvm/svm/svm.c | 9 +++++++++
3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 58bbcf76ad1e..12e727301262 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -267,7 +267,13 @@ enum x86_intercept_stage;
#define PFERR_GUEST_ENC_MASK BIT_ULL(34)
#define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
#define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
+
+/*
+ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
+ * when emulating instructions that triggers implicit access.
+ */
#define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
+#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c72a2033ca96..5562d693880a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
return -EFAULT;
#endif

+ /* Ensure the above sanity check also covers KVM-defined flags. */
+ BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
+
vcpu->arch.l1tf_flush_l1d = true;
if (!flags) {
trace_kvm_page_fault(vcpu, fault_address, error_code);
@@ -5786,17 +5789,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
int r, emulation_type = EMULTYPE_PF;
bool direct = vcpu->arch.mmu->root_role.direct;

- /*
- * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
- * checks when emulating instructions that triggers implicit access.
- * WARN if hardware generates a fault with an error code that collides
- * with the KVM-defined value. Clear the flag and continue on, i.e.
- * don't terminate the VM, as KVM can't possibly be relying on a flag
- * that KVM doesn't know about.
- */
- if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
- error_code &= ~PFERR_IMPLICIT_ACCESS;
-
if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0f3b59da0d4a..535018f152a3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2047,6 +2047,15 @@ static int npf_interception(struct kvm_vcpu *vcpu)
u64 fault_address = svm->vmcb->control.exit_info_2;
u64 error_code = svm->vmcb->control.exit_info_1;

+ /*
+ * WARN if hardware generates a fault with an error code that collides
+	 * with KVM-defined synthetic flags. Clear the flags and continue on,
+ * i.e. don't terminate the VM, as KVM can't possibly be relying on a
+ * flag that KVM doesn't know about.
+ */
+ if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
+ error_code &= ~PFERR_SYNTHETIC_MASK;
+
trace_kvm_page_fault(vcpu, fault_address, error_code);
return kvm_mmu_page_fault(vcpu, fault_address, error_code,
static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
--
2.43.0
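
To make the intent of the new BUILD_BUG_ON() concrete: it only passes if every KVM-defined synthetic flag lives in bits 63:32, i.e. exactly the bits the legacy #PF handler already WARNs about and drops. A minimal compile-time sketch of the same invariant, using simplified local re-definitions rather than the kernel macros:

/* Simplified stand-ins for the kernel definitions (assumptions for this sketch). */
#define PFERR_IMPLICIT_ACCESS   (1ULL << 48)
#define PFERR_SYNTHETIC_MASK    (PFERR_IMPLICIT_ACCESS)
#define lower_32_bits(n)        ((unsigned int)((n) & 0xffffffff))

/* Fails to compile if a synthetic flag is ever (re)defined below bit 32. */
_Static_assert(lower_32_bits(PFERR_SYNTHETIC_MASK) == 0,
               "KVM-defined synthetic PFERR flags must live in bits 63:32");

A later patch adds PFERR_PRIVATE_ACCESS (bit 49) to the mask, and the assertion continues to hold.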



2024-05-07 16:31:10

by Paolo Bonzini

Subject: [PATCH 11/17] KVM: x86/mmu: Don't force emulation of L2 accesses to non-APIC internal slots

From: Sean Christopherson <[email protected]>

Allow mapping KVM's internal memslots used for EPT without unrestricted
guest into L2, i.e. allow mapping the hidden TSS and the identity mapped
page tables into L2. Unlike the APIC access page, there is no correctness
issue with letting L2 access the "hidden" memory. Allowing these memslots
to be mapped into L2 fixes a largely theoretical bug where KVM could
incorrectly emulate subsequent _L1_ accesses as MMIO, and also ensures
consistent KVM behavior for L2.

If KVM is using TDP, but L1 is using shadow paging for L2, then routing
through kvm_handle_noslot_fault() will incorrectly cache the gfn as MMIO,
and create an MMIO SPTE. Creating an MMIO SPTE is ok, but only because
kvm_mmu_page_role.guest_mode ensures KVM uses different roots for L1 vs.
L2. But vcpu->arch.mmio_gfn will remain valid, and could cause KVM to
incorrectly treat an L1 access to the hidden TSS or identity mapped page
tables as MMIO.

Furthermore, forcing L2 accesses to be treated as "no slot" faults doesn't
actually prevent exposing KVM's internal memslots to L2, it simply forces
KVM to emulate the access. In most cases, that will trigger MMIO,
amusingly due to filling vcpu->arch.mmio_gfn, but also because
vcpu_is_mmio_gpa() unconditionally treats APIC accesses as MMIO, i.e. APIC
accesses are ok. But the hidden TSS and identity mapped page tables could
go either way (MMIO or access the private memslot's backing memory).

Alternatively, the inconsistent emulator behavior could be addressed by
forcing MMIO emulation for L2 access to all internal memslots, not just to
the APIC. But that's arguably less correct than letting L2 access the
hidden TSS and identity mapped page tables, not to mention that it's
*extremely* unlikely anyone cares what KVM does in this case. From L1's
perspective there is R/W memory at those memslots, the memory just happens
to be initialized with non-zero data. Making the memory disappear when it
is accessed by L2 is far more magical and arbitrary than the memory
existing in the first place.

The APIC access page is special because KVM _must_ emulate the access to
do the right thing (emulate an APIC access instead of reading/writing the
APIC access page). And despite what commit 3a2936dedd20 ("kvm: mmu: Don't
expose private memslots to L2") said, it's not just necessary when L1 is
accelerating L2's virtual APIC, it's just as important (likely *more*
important) for correctness when L1 is passing through its own APIC to L2.

Fixes: 3a2936dedd20 ("kvm: mmu: Don't expose private memslots to L2")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ba50b93e93ed..a8e14c2b68a7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4298,8 +4298,18 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
return RET_PF_RETRY;

- if (!kvm_is_visible_memslot(slot)) {
- /* Don't expose private memslots to L2. */
+ if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
+ /*
+ * Don't map L1's APIC access page into L2, KVM doesn't support
+ * using APICv/AVIC to accelerate L2 accesses to L1's APIC,
+ * i.e. the access needs to be emulated. Emulating access to
+ * L1's APIC is also correct if L1 is accelerating L2's own
+ * virtual APIC, but for some reason L1 also maps _L1's_ APIC
+ * into L2. Note, vcpu_is_mmio_gpa() always treats access to
+ * the APIC as MMIO. Allow an MMIO SPTE to be created, as KVM
+ * uses different roots for L1 vs. L2, i.e. there is no danger
+ * of breaking APICv/AVIC for L1.
+ */
if (is_guest_mode(vcpu)) {
fault->slot = NULL;
fault->pfn = KVM_PFN_NOSLOT;
@@ -4312,8 +4322,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
* MMIO SPTE. That way the cache doesn't need to be purged
* when the AVIC is re-enabled.
*/
- if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
- !kvm_apicv_activated(vcpu->kvm))
+ if (!kvm_apicv_activated(vcpu->kvm))
return RET_PF_EMULATE;
}

--
2.43.0



2024-05-07 16:31:57

by Paolo Bonzini

Subject: [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()

From: Sean Christopherson <[email protected]>

Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
after the private versus shared check, so that there's no need to
repeatedly query whether or not a slot exists. This also makes it more
obvious that, except for private vs. shared attributes, the process of
faulting in a pfn simply doesn't apply to gfns without a slot.

Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
that it doesn't need to be duplicated in all paths that invoke
kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
the right fields.

Leave the existing handling behind, but convert it to a WARN, to guard
against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.

Cc: David Matlack <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b09c8034ed15..7630ad8cb022 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3270,6 +3270,10 @@ static int kvm_handle_noslot_fault(struct kvm_vcpu *vcpu,
vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
access & shadow_mmio_access_mask);

+ fault->slot = NULL;
+ fault->pfn = KVM_PFN_NOSLOT;
+ fault->map_writable = false;
+
/*
* If MMIO caching is disabled, emulate immediately without
* touching the shadow page tables as attempting to install an
@@ -4350,15 +4354,18 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
return -EFAULT;
}

+ if (unlikely(!slot))
+ return kvm_handle_noslot_fault(vcpu, fault, access);
+
/*
* Retry the page fault if the gfn hit a memslot that is being deleted
* or moved. This ensures any existing SPTEs for the old memslot will
* be zapped before KVM inserts a new MMIO SPTE for the gfn.
*/
- if (slot && (slot->flags & KVM_MEMSLOT_INVALID))
+ if (slot->flags & KVM_MEMSLOT_INVALID)
return RET_PF_RETRY;

- if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
+ if (slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT) {
/*
* Don't map L1's APIC access page into L2, KVM doesn't support
* using APICv/AVIC to accelerate L2 accesses to L1's APIC,
@@ -4370,12 +4377,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* uses different roots for L1 vs. L2, i.e. there is no danger
* of breaking APICv/AVIC for L1.
*/
- if (is_guest_mode(vcpu)) {
- fault->slot = NULL;
- fault->pfn = KVM_PFN_NOSLOT;
- fault->map_writable = false;
- goto faultin_done;
- }
+ if (is_guest_mode(vcpu))
+ return kvm_handle_noslot_fault(vcpu, fault, access);
+
/*
* If the APIC access page exists but is disabled, go directly
* to emulation without caching the MMIO access or creating a
@@ -4386,6 +4390,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
return RET_PF_EMULATE;
}

+ fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+ smp_rmb();
+
/*
* Check for a relevant mmu_notifier invalidation event before getting
* the pfn from the primary MMU, and before acquiring mmu_lock.
@@ -4407,19 +4414,17 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
* to detect retry guarantees the worst case latency for the vCPU.
*/
- if (fault->slot &&
- mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+ if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
return RET_PF_RETRY;

ret = __kvm_faultin_pfn(vcpu, fault);
if (ret != RET_PF_CONTINUE)
return ret;

-faultin_done:
if (unlikely(is_error_pfn(fault->pfn)))
return kvm_handle_error_pfn(vcpu, fault);

- if (unlikely(!fault->slot))
+ if (WARN_ON_ONCE(!fault->slot))
return kvm_handle_noslot_fault(vcpu, fault, access);

/*
--
2.43.0



2024-05-07 16:36:48

by Paolo Bonzini

Subject: [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns

From: Sean Christopherson <[email protected]>

WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
handle the unexpected behavior instead of continuing on with dangerous
state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
and so could install a bogus PFN into the guest.

The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
And checking only fault->slot would fall apart if KVM ever added a flag or
condition that forced emulation, similar to how KVM handles writes to
read-only memslots.

Cc: David Matlack <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Kai Huang <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d717d60c6f19..510eb1117012 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4425,7 +4425,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (unlikely(is_error_pfn(fault->pfn)))
return kvm_handle_error_pfn(vcpu, fault);

- if (WARN_ON_ONCE(!fault->slot))
+ if (WARN_ON_ONCE(!fault->slot || is_noslot_pfn(fault->pfn)))
return kvm_handle_noslot_fault(vcpu, fault, access);

/*
--
2.43.0


2024-05-07 17:20:34

by Paolo Bonzini

Subject: [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

From: Sean Christopherson <[email protected]>

Add and use a synthetic, KVM-defined page fault error code to indicate
whether a fault is to private vs. shared memory. TDX and SNP have
different mechanisms for reporting private vs. shared, and KVM's
software-protected VMs have no mechanism at all. Usurp an error code
flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
and friends.

Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
for TDX and software-protected VMs as appropriate, but that would require
*clearing* the flag for SEV and SEV-ES VMs, which support encrypted
memory at the hardware layer, but don't utilize private memory at the
KVM layer.

Opportunistically add a comment to call out that the logic for software-
protected VMs is (and was before this commit) broken for nested MMUs, i.e.
for nested TDP, as the GPA is an L2 GPA. Punt on trying to play nice with
nested MMUs as there is a _lot_ of functionality that simply doesn't work
for software-protected VMs, e.g. all of the paths where KVM accesses guest
memory need to be updated to be aware of private vs. shared memory.

Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 7 ++++++-
arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++++
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 12e727301262..0dc755a6dc0c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -273,7 +273,12 @@ enum x86_intercept_stage;
* when emulating instructions that triggers implicit access.
*/
#define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
-#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
+/*
+ * PRIVATE_ACCESS is a KVM-defined flag used to indicate that a fault occurred
+ * when the guest was accessing private memory.
+ */
+#define PFERR_PRIVATE_ACCESS BIT_ULL(49)
+#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_WRITE_MASK | \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3609167ba30e..eb041acec2dc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5799,6 +5799,20 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
return RET_PF_RETRY;

+ /*
+ * Except for reserved faults (emulated MMIO is shared-only), set the
+ * PFERR_PRIVATE_ACCESS flag for software-protected VMs based on the gfn's
+ * current attributes, which are the source of truth for such VMs. Note,
+	 * this is wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
+	 * currently support nested virtualization (among many other things)
+ * for software-protected VMs.
+ */
+ if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+ !(error_code & PFERR_RSVD_MASK) &&
+ vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+ kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+ error_code |= PFERR_PRIVATE_ACCESS;
+
r = RET_PF_INVALID;
if (unlikely(error_code & PFERR_RSVD_MASK)) {
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 797b80f996a7..dfd9ff383663 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.max_level = KVM_MAX_HUGEPAGE_LEVEL,
.req_level = PG_LEVEL_4K,
.goal_level = PG_LEVEL_4K,
- .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+ .is_private = err & PFERR_PRIVATE_ACCESS,
};
int r;

--
2.43.0
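
For context, a small standalone sketch of how SNP enablement is expected to feed this flag, mirroring the PFERR_GUEST_ENC_MASK alternative discussed above (an assumption about the follow-on SNP series, not code from this patch; masks redefined locally):

#include <stdio.h>

#define PFERR_GUEST_ENC_MASK    (1ULL << 34)    /* hardware: guest's effective C-bit was 1 */
#define PFERR_PRIVATE_ACCESS    (1ULL << 49)    /* KVM-defined synthetic flag */

/* Hypothetical SNP-side hook: for an SNP guest, ENC=1 means a private access. */
static unsigned long long snp_adjust_npf_error_code(unsigned long long error_code)
{
        if (error_code & PFERR_GUEST_ENC_MASK)
                error_code |= PFERR_PRIVATE_ACCESS;
        return error_code;
}

int main(void)
{
        unsigned long long err = snp_adjust_npf_error_code(PFERR_GUEST_ENC_MASK);

        printf("is_private=%d\n", !!(err & PFERR_PRIVATE_ACCESS));
        return 0;
}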



2024-05-07 17:21:50

by Paolo Bonzini

Subject: [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation

From: Sean Christopherson <[email protected]>

Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
triggers emulation of any kind, as KVM doesn't currently support emulating
access to guest private memory. Practically speaking, private faults and
emulation are already mutually exclusive, but there are many flows that
can result in KVM returning RET_PF_EMULATE, and adding one last check
to harden against weird, unexpected combinations and/or KVM bugs is
inexpensive.

Suggested-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 8 --------
arch/x86/kvm/mmu/mmu_internal.h | 19 +++++++++++++++++++
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 45b6d8f9e359..c72a2033ca96 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4257,14 +4257,6 @@ static inline u8 kvm_max_level_for_order(int order)
return PG_LEVEL_4K;
}

-static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault)
-{
- kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
- PAGE_SIZE, fault->write, fault->exec,
- fault->is_private);
-}
-
static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 5390a591a571..61f49967047a 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -279,6 +279,14 @@ enum {
RET_PF_SPURIOUS,
};

+static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
+{
+ kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+ PAGE_SIZE, fault->write, fault->exec,
+ fault->is_private);
+}
+
static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
u32 err, bool prefetch, int *emulation_type)
{
@@ -320,6 +328,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
else
r = vcpu->arch.mmu->page_fault(vcpu, &fault);

+ /*
+ * Not sure what's happening, but punt to userspace and hope that
+ * they can fix it by changing memory to shared, or they can
+ * provide a better error.
+ */
+ if (r == RET_PF_EMULATE && fault.is_private) {
+ pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
+ kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
+ return -EFAULT;
+ }
+
if (fault.write_fault_to_shadow_pgtable && emulation_type)
*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;

--
2.43.0
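
On the userspace side, the new exit surfaces as KVM_EXIT_MEMORY_FAULT. A minimal sketch of how a VMM might react, assuming the guest_memfd uAPI (struct kvm_run::memory_fault and KVM_SET_MEMORY_ATTRIBUTES) available in recent kernels; whether flipping the range to shared is the right fix is the VMM's policy decision, per the comment in the patch:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical handler: if KVM punted a private-memory fault to userspace,
 * convert the reported range back to shared and let the vCPU retry.
 */
static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
{
        struct kvm_memory_attributes attrs;

        if (run->exit_reason != KVM_EXIT_MEMORY_FAULT)
                return 0;       /* not ours */

        if (!(run->memory_fault.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE))
                return 0;       /* shared-side fault, nothing to flip */

        memset(&attrs, 0, sizeof(attrs));
        attrs.address = run->memory_fault.gpa;
        attrs.size = run->memory_fault.size;
        attrs.attributes = 0;   /* clear KVM_MEMORY_ATTRIBUTE_PRIVATE => shared */

        return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}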



2024-05-07 17:22:24

by Paolo Bonzini

Subject: [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

From: Sean Christopherson <[email protected]>

WARN and skip the emulated MMIO fastpath if a private, reserved page fault
is encountered, as private+reserved should be an impossible combination
(KVM should never create an MMIO SPTE for a private access).

Signed-off-by: Sean Christopherson <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d52794663290..0d884d0b0f35 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5819,6 +5819,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err

r = RET_PF_INVALID;
if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
+ return -EFAULT;
+
r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
if (r == RET_PF_EMULATE)
goto emulate;
--
2.43.0



2024-05-13 05:26:13

by Xiaoyao Li

Subject: Re: [PATCH 01/17] KVM: x86/mmu: Exit to userspace with -EFAULT if private fault hits emulation

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> Exit to userspace with -EFAULT / KVM_EXIT_MEMORY_FAULT if a private fault
> triggers emulation of any kind, as KVM doesn't currently support emulating
> access to guest private memory. Practically speaking, private faults and
> emulation are already mutually exclusive, but there are many flow that
> can result in KVM returning RET_PF_EMULATE, and adding one last check
> to harden against weird, unexpected combinations and/or KVM bugs is
> inexpensive.
>
> Suggested-by: Yan Zhao <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>

> ---
> arch/x86/kvm/mmu/mmu.c | 8 --------
> arch/x86/kvm/mmu/mmu_internal.h | 19 +++++++++++++++++++
> 2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 45b6d8f9e359..c72a2033ca96 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4257,14 +4257,6 @@ static inline u8 kvm_max_level_for_order(int order)
> return PG_LEVEL_4K;
> }
>
> -static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> - struct kvm_page_fault *fault)
> -{
> - kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> - PAGE_SIZE, fault->write, fault->exec,
> - fault->is_private);
> -}
> -
> static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault)
> {
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 5390a591a571..61f49967047a 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -279,6 +279,14 @@ enum {
> RET_PF_SPURIOUS,
> };
>
> +static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> + struct kvm_page_fault *fault)
> +{
> + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> + PAGE_SIZE, fault->write, fault->exec,
> + fault->is_private);
> +}
> +
> static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> u32 err, bool prefetch, int *emulation_type)
> {
> @@ -320,6 +328,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> else
> r = vcpu->arch.mmu->page_fault(vcpu, &fault);
>
> + /*
> + * Not sure what's happening, but punt to userspace and hope that
> + * they can fix it by changing memory to shared, or they can
> + * provide a better error.
> + */
> + if (r == RET_PF_EMULATE && fault.is_private) {
> + pr_warn_ratelimited("kvm: unexpected emulation request on private memory\n");
> + kvm_mmu_prepare_memory_fault_exit(vcpu, &fault);
> + return -EFAULT;
> + }
> +
> if (fault.write_fault_to_shadow_pgtable && emulation_type)
> *emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
>


2024-05-13 05:50:38

by Xiaoyao Li

Subject: Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> Move the sanity check that hardware never sets bits that collide with KVM-
> define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
> i.e. make the sanity check #NPF specific. The legacy #PF path already
> WARNs if _any_ of bits 63:32 are set, and the error code that comes from
> VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
> EXIT_QUALIFICATION into error code flags).
>
> Add a compile-time assert in the legacy #PF handler to make sure that KVM-
> define flags are covered by its existing sanity check on the upper bits.
>
> Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
> are removing the comment that defined it.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
> Reviewed-by: Binbin Wu <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 6 ++++++
> arch/x86/kvm/mmu/mmu.c | 14 +++-----------
> arch/x86/kvm/svm/svm.c | 9 +++++++++
> 3 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 58bbcf76ad1e..12e727301262 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,7 +267,13 @@ enum x86_intercept_stage;
> #define PFERR_GUEST_ENC_MASK BIT_ULL(34)
> #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
> #define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
> +
> +/*
> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> + * when emulating instructions that triggers implicit access.
> + */
> #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
> +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
>
> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> PFERR_WRITE_MASK | \
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c72a2033ca96..5562d693880a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> return -EFAULT;
> #endif
>
> + /* Ensure the above sanity check also covers KVM-defined flags. */

1. There is no sanity check above related to KVM-defined flags yet. It
has to be after Patch 6.

2. I somehow cannot parse the comment properly, though I know it's meant to
ensure that the KVM-defined PFERR_SYNTHETIC_MASK doesn't contain any bits below bit 32.

> + BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
> +
> vcpu->arch.l1tf_flush_l1d = true;
> if (!flags) {
> trace_kvm_page_fault(vcpu, fault_address, error_code);
> @@ -5786,17 +5789,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> int r, emulation_type = EMULTYPE_PF;
> bool direct = vcpu->arch.mmu->root_role.direct;
>
> - /*
> - * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> - * checks when emulating instructions that triggers implicit access.
> - * WARN if hardware generates a fault with an error code that collides
> - * with the KVM-defined value. Clear the flag and continue on, i.e.
> - * don't terminate the VM, as KVM can't possibly be relying on a flag
> - * that KVM doesn't know about.
> - */
> - if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> - error_code &= ~PFERR_IMPLICIT_ACCESS;
> -
> if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> return RET_PF_RETRY;
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0f3b59da0d4a..535018f152a3 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2047,6 +2047,15 @@ static int npf_interception(struct kvm_vcpu *vcpu)
> u64 fault_address = svm->vmcb->control.exit_info_2;
> u64 error_code = svm->vmcb->control.exit_info_1;
>
> + /*
> + * WARN if hardware generates a fault with an error code that collides
> + * with KVM-defined sythentic flags. Clear the flags and continue on,
> + * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> + * flag that KVM doesn't know about.
> + */
> + if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> + error_code &= ~PFERR_SYNTHETIC_MASK;
> +
> trace_kvm_page_fault(vcpu, fault_address, error_code);
> return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> static_cpu_has(X86_FEATURE_DECODEASSISTS) ?


2024-05-13 05:56:52

by Xiaoyao Li

Subject: Re: [PATCH 07/17] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> Add and use a synthetic, KVM-defined page fault error code to indicate
> whether a fault is to private vs. shared memory. TDX and SNP have
> different mechanisms for reporting private vs. shared, and KVM's
> software-protected VMs have no mechanism at all. Usurp an error code
> flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> and friends.
>
> Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> for TDX and software-protected VMs as appropriate, but that would require
> *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> memory at the hardware layer, but don't utilize private memory at the
> KVM layer.
>
> Opportunistically add a comment to call out that the logic for software-
> protected VMs is (and was before this commit) broken for nested MMUs, i.e.
> for nested TDP, as the GPA is an L2 GPA. Punt on trying to play nice with
> nested MMUs as there is a _lot_ of functionality that simply doesn't work
> for software-protected VMs, e.g. all of the paths where KVM accesses guest
> memory need to be updated to be aware of private vs. shared memory.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>

> ---
> arch/x86/include/asm/kvm_host.h | 7 ++++++-
> arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++++
> arch/x86/kvm/mmu/mmu_internal.h | 2 +-
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 12e727301262..0dc755a6dc0c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -273,7 +273,12 @@ enum x86_intercept_stage;
> * when emulating instructions that triggers implicit access.
> */
> #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
> -#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
> +/*
> + * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
> + * when the guest was accessing private memory.
> + */
> +#define PFERR_PRIVATE_ACCESS BIT_ULL(49)
> +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>
> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> PFERR_WRITE_MASK | \
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 3609167ba30e..eb041acec2dc 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5799,6 +5799,20 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> return RET_PF_RETRY;
>
> + /*
> + * Except for reserved faults (emulated MMIO is shared-only), set the
> + * PFERR_PRIVATE_ACCESS flag for software-protected VMs based on the gfn's
> + * current attributes, which are the source of truth for such VMs. Note,
> + * this wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> + * currently supported nested virtualization (among many other things)
> + * for software-protected VMs.
> + */
> + if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> + !(error_code & PFERR_RSVD_MASK) &&
> + vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> + kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> + error_code |= PFERR_PRIVATE_ACCESS;
> +
> r = RET_PF_INVALID;
> if (unlikely(error_code & PFERR_RSVD_MASK)) {
> r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 797b80f996a7..dfd9ff383663 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> .max_level = KVM_MAX_HUGEPAGE_LEVEL,
> .req_level = PG_LEVEL_4K,
> .goal_level = PG_LEVEL_4K,
> - .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> + .is_private = err & PFERR_PRIVATE_ACCESS,
> };
> int r;
>


2024-05-13 06:15:26

by Xiaoyao Li

Subject: Re: [PATCH 09/17] KVM: x86/mmu: WARN and skip MMIO cache on private, reserved page faults

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> WARN and skip the emulated MMIO fastpath if a private, reserved page fault
> is encountered, as private+reserved should be an impossible combination
> (KVM should never create an MMIO SPTE for a private access).
>
> Signed-off-by: Sean Christopherson <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>

> ---
> arch/x86/kvm/mmu/mmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d52794663290..0d884d0b0f35 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5819,6 +5819,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>
> r = RET_PF_INVALID;
> if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + if (WARN_ON_ONCE(error_code & PFERR_PRIVATE_ACCESS))
> + return -EFAULT;
> +
> r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> if (r == RET_PF_EMULATE)
> goto emulate;


2024-05-13 06:29:41

by Xiaoyao Li

Subject: Re: [PATCH 14/17] KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> Handle the "no memslot" case at the beginning of kvm_faultin_pfn(), just
> after the private versus shared check, so that there's no need to
> repeatedly query whether or not a slot exists. This also makes it more
> obvious that, except for private vs. shared attributes, the process of
> faulting in a pfn simply doesn't apply to gfns without a slot.
>
> Opportunistically stuff @fault's metadata in kvm_handle_noslot_fault() so
> that it doesn't need to be duplicated in all paths that invoke
> kvm_handle_noslot_fault(), and to minimize the probability of not stuffing
> the right fields.
>
> Leave the existing handle behind, but convert it to a WARN, to guard
> against __kvm_faultin_pfn() unexpectedly nullifying fault->slot.
>
> Cc: David Matlack <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>



2024-05-13 06:40:30

by Xiaoyao Li

Subject: Re: [PATCH 17/17] KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns

On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> From: Sean Christopherson <[email protected]>
>
> WARN if __kvm_faultin_pfn() generates a "no slot" pfn, and gracefully
> handle the unexpected behavior instead of continuing on with dangerous
> state, e.g. tdp_mmu_map_handle_target_level() _only_ checks fault->slot,
> and so could install a bogus PFN into the guest.
>
> The existing code is functionally ok, because kvm_faultin_pfn() pre-checks
> all of the cases that result in KVM_PFN_NOSLOT, but it is unnecessarily
> unsafe as it relies on __gfn_to_pfn_memslot() getting the _exact_ same
> memslot, i.e. not a re-retrieved pointer with KVM_MEMSLOT_INVALID set.
> And checking only fault->slot would fall apart if KVM ever added a flag or
> condition that forced emulation, similar to how KVM handles writes to
> read-only memslots.
>
> Cc: David Matlack <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Kai Huang <[email protected]>
> Message-ID: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>

> ---
> arch/x86/kvm/mmu/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d717d60c6f19..510eb1117012 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4425,7 +4425,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> if (unlikely(is_error_pfn(fault->pfn)))
> return kvm_handle_error_pfn(vcpu, fault);
>
> - if (WARN_ON_ONCE(!fault->slot))
> + if (WARN_ON_ONCE(!fault->slot || is_noslot_pfn(fault->pfn)))
> return kvm_handle_noslot_fault(vcpu, fault, access);
>
> /*


2024-05-13 17:31:38

by Sean Christopherson

Subject: Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

On Mon, May 13, 2024, Xiaoyao Li wrote:
> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> > From: Sean Christopherson <[email protected]>
> >
> > Move the sanity check that hardware never sets bits that collide with KVM-
> > define synthetic bits from kvm_mmu_page_fault() to npf_interception(),
> > i.e. make the sanity check #NPF specific. The legacy #PF path already
> > WARNs if _any_ of bits 63:32 are set, and the error code that comes from
> > VMX's EPT Violatation and Misconfig is 100% synthesized (KVM morphs VMX's
> > EXIT_QUALIFICATION into error code flags).
> >
> > Add a compile-time assert in the legacy #PF handler to make sure that KVM-
> > define flags are covered by its existing sanity check on the upper bits.
> >
> > Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
> > are removing the comment that defined it.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > Reviewed-by: Kai Huang <[email protected]>
> > Reviewed-by: Binbin Wu <[email protected]>
> > Message-ID: <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 6 ++++++
> > arch/x86/kvm/mmu/mmu.c | 14 +++-----------
> > arch/x86/kvm/svm/svm.c | 9 +++++++++
> > 3 files changed, 18 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 58bbcf76ad1e..12e727301262 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -267,7 +267,13 @@ enum x86_intercept_stage;
> > #define PFERR_GUEST_ENC_MASK BIT_ULL(34)
> > #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
> > #define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
> > +
> > +/*
> > + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> > + * when emulating instructions that triggers implicit access.
> > + */
> > #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
> > +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
> > #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> > PFERR_WRITE_MASK | \
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index c72a2033ca96..5562d693880a 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> > return -EFAULT;
> > #endif
> > + /* Ensure the above sanity check also covers KVM-defined flags. */
>
> 1. There is no sanity check above related to KVM-defined flags yet. It has
> to be after Patch 6.

Ya, it's not just the comment, the entire changelog expects this patch to land
after patch 6.
>
> 2. I somehow cannot parse the comment properly, though I know it's to ensure
> KVM-defined PFERR_SYNTHETIC_MASK not contain any bit below 32-bits.

Hmm, how about this?

/*
* Ensure that the above sanity check on hardware error code bits 63:32
* also prevents false positives on KVM-defined flags.
*/

2024-05-14 04:26:16

by Xiaoyao Li

Subject: Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

On 5/14/2024 1:31 AM, Sean Christopherson wrote:
> On Mon, May 13, 2024, Xiaoyao Li wrote:
>> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
>>> From: Sean Christopherson <[email protected]>
>>>
>>> Move the sanity check that hardware never sets bits that collide with KVM-
>>> defined synthetic bits from kvm_mmu_page_fault() to npf_interception(),
>>> i.e. make the sanity check #NPF specific. The legacy #PF path already
>>> WARNs if _any_ of bits 63:32 are set, and the error code that comes from
>>> VMX's EPT Violation and Misconfig is 100% synthesized (KVM morphs VMX's
>>> EXIT_QUALIFICATION into error code flags).
>>>
>>> Add a compile-time assert in the legacy #PF handler to make sure that KVM-
>>> defined flags are covered by its existing sanity check on the upper bits.
>>>
>>> Opportunistically add a description of PFERR_IMPLICIT_ACCESS, since we
>>> are removing the comment that defined it.
>>>
>>> Signed-off-by: Sean Christopherson <[email protected]>
>>> Reviewed-by: Kai Huang <[email protected]>
>>> Reviewed-by: Binbin Wu <[email protected]>
>>> Message-ID: <[email protected]>
>>> Signed-off-by: Paolo Bonzini <[email protected]>
>>> ---
>>> arch/x86/include/asm/kvm_host.h | 6 ++++++
>>> arch/x86/kvm/mmu/mmu.c | 14 +++-----------
>>> arch/x86/kvm/svm/svm.c | 9 +++++++++
>>> 3 files changed, 18 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>> index 58bbcf76ad1e..12e727301262 100644
>>> --- a/arch/x86/include/asm/kvm_host.h
>>> +++ b/arch/x86/include/asm/kvm_host.h
>>> @@ -267,7 +267,13 @@ enum x86_intercept_stage;
>>> #define PFERR_GUEST_ENC_MASK BIT_ULL(34)
>>> #define PFERR_GUEST_SIZEM_MASK BIT_ULL(35)
>>> #define PFERR_GUEST_VMPL_MASK BIT_ULL(36)
>>> +
>>> +/*
>>> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
>>> + * when emulating instructions that triggers implicit access.
>>> + */
>>> #define PFERR_IMPLICIT_ACCESS BIT_ULL(48)
>>> +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
>>> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
>>> PFERR_WRITE_MASK | \
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index c72a2033ca96..5562d693880a 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>>> return -EFAULT;
>>> #endif
>>> + /* Ensure the above sanity check also covers KVM-defined flags. */
>>
>> 1. There is no sanity check above related to KVM-defined flags yet. It has
>> to be after Patch 6.
>
> Ya, it's not just the comment, the entire changelog expects this patch to land
> after patch 6.
>>
>> 2. I somehow cannot parse the comment properly, though I know it's to ensure
>> that the KVM-defined PFERR_SYNTHETIC_MASK does not contain any bits below bit 32.
>
> Hmm, how about this?
>
> /*
> * Ensure that the above sanity check on hardware error code bits 63:32
> * also prevents false positives on KVM-defined flags.
> */
>

Maybe it's just my own inability, but I still cannot interpret it well.

Can't we put it above the sanity check of the error code, just with a
comment like

/*
 * Ensure KVM-defined flags do not occupy any bits below bit 32,
 * which are used by hardware.
 */
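
In other words, the suggestion is roughly the following placement (sketch only;
the assert form and the WARN below it are assumptions for illustration, not the
actual patch):

	/*
	 * Ensure KVM-defined flags do not occupy any bits below bit 32,
	 * which are used by hardware.
	 */
	BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));

	if (WARN_ON_ONCE(error_code >> 32))
		error_code = lower_32_bits(error_code);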

2024-05-14 15:32:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

On Tue, May 14, 2024, Xiaoyao Li wrote:
> On 5/14/2024 1:31 AM, Sean Christopherson wrote:
> > On Mon, May 13, 2024, Xiaoyao Li wrote:
> > > On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
> > > > +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
> > > > #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
> > > > PFERR_WRITE_MASK | \
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index c72a2033ca96..5562d693880a 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> > > > return -EFAULT;
> > > > #endif
> > > > + /* Ensure the above sanity check also covers KVM-defined flags. */
> > >
> > > 1. There is no sanity check above related to KVM-defined flags yet. It has
> > > to be after Patch 6.
> >
> > Ya, it's not just the comment, the entire changelog expects this patch to land
> > after patch 6.
> > >
> > > 2. I somehow cannot parse the comment properly, though I know it's to ensure
> > > that the KVM-defined PFERR_SYNTHETIC_MASK does not contain any bits below bit 32.
> >
> > Hmm, how about this?
> >
> > /*
> > * Ensure that the above sanity check on hardware error code bits 63:32
> > * also prevents false positives on KVM-defined flags.
> > */
> >
>
> Maybe it's just my own inability, but I still cannot interpret it well.
>
> Can't we put it above the sanity check of the error code, just with a
> comment like
>
> /*
> * Ensure KVM-defined flags do not occupy any bits below bit 32,
> * which are used by hardware.

This is somewhat misleading, as hardware does use bits 63:32 (for #NPF), just not
for #PF error codes. And the reason I'm using rather indirect wording is that
KVM _could_ define synthetic flags in bits 31:0, there's simply a higher probability
of needing to reshuffle bit numbers due to a conflict with a future feature.

Is this better? I think it captures what you're looking for, while hopefully also
capturing that staying out of bits 31:0 isn't a hard requirement.

/*
* Restrict KVM-defined flags to bits 63:32 so that it's impossible for
* them to conflict with #PF error codes, which are limited to 32 bits.
*/
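
As an aside, the same 63:32 restriction could also be expressed right next to
the flag definitions, e.g. with a static_assert in kvm_host.h; this is only an
illustration of the constraint, not what the patch actually does:

/* KVM-defined, software-only page fault error code flags. */
#define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS)

/* Keep synthetic flags out of bits 31:0, which carry the #PF error code. */
static_assert(!(PFERR_SYNTHETIC_MASK & GENMASK_ULL(31, 0)));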

2024-05-15 01:08:43

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 04/17] KVM: x86: Move synthetic PFERR_* sanity checks to SVM's #NPF handler

On 5/14/2024 11:32 PM, Sean Christopherson wrote:
> On Tue, May 14, 2024, Xiaoyao Li wrote:
>> On 5/14/2024 1:31 AM, Sean Christopherson wrote:
>>> On Mon, May 13, 2024, Xiaoyao Li wrote:
>>>> On 5/7/2024 11:58 PM, Paolo Bonzini wrote:
>>>>> +#define PFERR_SYNTHETIC_MASK (PFERR_IMPLICIT_ACCESS)
>>>>> #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
>>>>> PFERR_WRITE_MASK | \
>>>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>>>> index c72a2033ca96..5562d693880a 100644
>>>>> --- a/arch/x86/kvm/mmu/mmu.c
>>>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>>>> @@ -4502,6 +4502,9 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
>>>>> return -EFAULT;
>>>>> #endif
>>>>> + /* Ensure the above sanity check also covers KVM-defined flags. */
>>>>
>>>> 1. There is no sanity check above related to KVM-defined flags yet. It has
>>>> to be after Patch 6.
>>>
>>> Ya, it's not just the comment, the entire changelog expects this patch to land
>>> after patch 6.
>>>>
>>>> 2. I somehow cannot parse the comment properly, though I know it's to ensure
>>>> that the KVM-defined PFERR_SYNTHETIC_MASK does not contain any bits below bit 32.
>>>
>>> Hmm, how about this?
>>>
>>> /*
>>> * Ensure that the above sanity check on hardware error code bits 63:32
>>> * also prevents false positives on KVM-defined flags.
>>> */
>>>
>>
>> Maybe it's just my own inability, but I still cannot interpret it well.
>>
>> Can't we put it above the sanity check of the error code, just with a
>> comment like
>>
>> /*
>> * Ensure KVM-defined flags do not occupy any bits below bit 32,
>> * which are used by hardware.
>
> This is somewhat misleading, as hardware does use bits 63:32 (for #NPF), just not
> for #PF error codes. And the reason I'm using rather indirect wording is that
> KVM _could_ define synthetic flags in bits 31:0, there's simply a higher probability
> of needing to reshuffle bit numbers due to a conflict with a future feature.
>
> Is this better? I think it captures what you're looking for, while hopefully also
> capturing that staying out of bits 31:0 isn't a hard requirement.

yeah, it looks better!

> /*
> * Restrict KVM-defined flags to bits 63:32 so that it's impossible for
> * them to conflict with #PF error codes, which are limited to 32 bits.
> */