2024-05-18 00:04:48

by Sean Christopherson

Subject: [PATCH 0/9] KVM: x86: Fixes for KVM_INTEL_PROVE_VE

Fixes and debug help for the "prove #VE" (KVM_INTEL_PROVE_VE) support. I'm
not in love with the sanity check implementation, but I also don't love the
idea of plumbing @kvm into the low level SPTE helpers.
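
Concretely, the @kvm plumbing alternative would be something like this
(hypothetical sketch, not part of this series; kvm_is_ept_ve_possible() is
made up for illustration):

static void __set_spte(struct kvm *kvm, u64 *sptep, u64 spte)
{
        KVM_MMU_WARN_ON(kvm_is_ept_ve_possible(kvm, spte));
        WRITE_ONCE(*sptep, spte);
}

whereas the series instead keys off the global shadow_present_mask (see
patch 4).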

Not super well tested, but I wanted to get this posted asap in case someone
wants to debug the unexpected #VEs we're seeing.

Note, Isaku's patch needs his SoB.

Isaku Yamahata (1):
KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU

Sean Christopherson (8):
KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE
support
KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)
KVM: x86/mmu: Add sanity checks that KVM doesn't create EPT #VE SPTEs
KVM: VMX: Dump VMCS on unexpected #VE
KVM: x86/mmu: Print SPTEs on unexpected #VE
KVM: VMX: Don't kill the VM on an unexpected #VE
KVM: VMX: Enumerate EPT Violation #VE support in /proc/cpuinfo
KVM: x86: Disable KVM_INTEL_PROVE_VE by default

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/vmxfeatures.h | 2 +-
arch/x86/kvm/Kconfig | 6 ++--
arch/x86/kvm/mmu/mmu.c | 45 ++++++++++++++++++++++++------
arch/x86/kvm/mmu/spte.h | 9 ++++++
arch/x86/kvm/mmu/tdp_iter.h | 2 ++
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/vmx/nested.c | 5 ++++
arch/x86/kvm/vmx/vmx.c | 11 ++++++--
9 files changed, 67 insertions(+), 16 deletions(-)


base-commit: 4aad0b1893a141f114ba40ed509066f3c9bc24b0
--
2.45.0.215.g3402c0e53f-goog



2024-05-18 00:05:00

by Sean Christopherson

Subject: [PATCH 1/9] KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU

From: Isaku Yamahata <[email protected]>

Use SHADOW_NONPRESENT_VALUE when zapping TDP MMU SPTEs with mmu_lock held
for read; tdp_mmu_zap_spte_atomic() was simply missed during the initial
development.
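
For reference, the non-present value at issue keeps the EPT "suppress #VE"
bit set on 64-bit hosts (a sketch per arch/x86/kvm/mmu/spte.h, modulo churn):

/* Bit 63 doubles as VMX_EPT_SUPPRESS_VE_BIT, so non-present SPTEs suppress #VE. */
#define SHADOW_NONPRESENT_VALUE BIT_ULL(63)

Zapping to a raw 0 clears that bit, so a later EPT Violation on the zapped
SPTE can be morphed into a #VE instead of exiting to KVM.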

Fixes: 7f01cab84928 ("KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE")
Not-yet-signed-off-by: Isaku Yamahata <[email protected]>
[sean: write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1259dd63defc..36539c1b36cd 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -626,7 +626,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* SPTEs.
*/
handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
- 0, iter->level, true);
+ SHADOW_NONPRESENT_VALUE, iter->level, true);

return 0;
}
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:05:29

by Sean Christopherson

Subject: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support

Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
enabled and a VE info address pointing at pfn 0.
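
For reference, the info page that address points at is laid out roughly as
follows (per the SDM's #VE information area; field names per KVM, give or
take):

struct vmx_ve_information {
        u32 exit_reason;
        u32 delivery;                   /* the "busy" dword */
        u64 exit_qualification;
        u64 guest_linear_address;
        u64 guest_physical_address;
        u16 eptp_index;
};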

Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d5b832126e34..6798fadaa335 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2242,6 +2242,9 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
vmcs_write64(EPT_POINTER,
construct_eptp(&vmx->vcpu, 0, PT64_ROOT_4LEVEL));

+ if (vmx->ve_info)
+ vmcs_write64(VE_INFORMATION_ADDRESS, __pa(vmx->ve_info));
+
/* All VMFUNCs are currently emulated through L0 vmexits. */
if (cpu_has_vmx_vmfunc())
vmcs_write64(VM_FUNCTION_CONTROL, 0);
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:05:36

by Sean Christopherson

Subject: [PATCH 3/9] KVM: nVMX: Always handle #VEs in L0 (never forward #VEs from L2 to L1)

Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
are KVM's responsibility.
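
For reference, the predicate used below keys off the #VE vector, 20 (a sketch
of the vmcs.h helper, hedged against churn):

#define VE_VECTOR 20

static inline bool is_ve_fault(u32 intr_info)
{
        return is_exception_n(intr_info, VE_VECTOR);
}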

Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6798fadaa335..643935a0f70a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6233,6 +6233,8 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
else if (is_alignment_check(intr_info) &&
!vmx_guest_inject_ac(vcpu))
return true;
+ else if (is_ve_fault(intr_info))
+ return true;
return false;
case EXIT_REASON_EXTERNAL_INTERRUPT:
return true;
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:05:54

by Sean Christopherson

Subject: [PATCH 4/9] KVM: x86/mmu: Add sanity checks that KVM doesn't create EPT #VE SPTEs

Assert that KVM doesn't set a SPTE to a value that could trigger an EPT
Violation #VE on a non-MMIO SPTE, e.g. to help detect bugs even without
KVM_INTEL_PROVE_VE enabled, and to help debug actual #VE failures.

Note, this will run afoul of TDX support, which needs to reflect emulated
MMIO accesses into the guest as #VEs (which was the whole point of adding
EPT Violation #VE support in KVM). The obvious fix for that is to exempt
MMIO SPTEs, but that's annoyingly difficult now that is_mmio_spte() relies
on a per-VM value. However, resolving that conundrum is a future problem,
whereas getting KVM_INTEL_PROVE_VE healthy is a current problem.
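
Concretely, the helper flags any SPTE that hardware could morph into a #VE,
i.e. SUPPRESS_VE clear *and* not an EPT misconfig, and only when KVM is
actually running with EPT Violation #VE enabled (illustrative values,
assuming bit 63 is SUPPRESS_VE and write+execute-only is the misconfig
pattern KVM uses for MMIO):

  0x0000000000000000    /* bit 63 clear, RWX != WX  => #VE possible, WARN */
  0x8000000000000000    /* SUPPRESS_VE set          => suppressed, OK */
  RWX == VMX_EPT_MISCONFIG_WX_VALUE   /* misconfigs never become #VEs, OK */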

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 +++
arch/x86/kvm/mmu/spte.h | 9 +++++++++
arch/x86/kvm/mmu/tdp_iter.h | 2 ++
3 files changed, 14 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5095fb46713e..d2af077d8b34 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -336,16 +336,19 @@ static int is_cpuid_PSE36(void)
#ifdef CONFIG_X86_64
static void __set_spte(u64 *sptep, u64 spte)
{
+ KVM_MMU_WARN_ON(is_ept_ve_possible(spte));
WRITE_ONCE(*sptep, spte);
}

static void __update_clear_spte_fast(u64 *sptep, u64 spte)
{
+ KVM_MMU_WARN_ON(is_ept_ve_possible(spte));
WRITE_ONCE(*sptep, spte);
}

static u64 __update_clear_spte_slow(u64 *sptep, u64 spte)
{
+ KVM_MMU_WARN_ON(is_ept_ve_possible(spte));
return xchg(sptep, spte);
}

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 5dd5405fa07a..52fa004a1fbc 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -3,6 +3,8 @@
#ifndef KVM_X86_MMU_SPTE_H
#define KVM_X86_MMU_SPTE_H

+#include <asm/vmx.h>
+
#include "mmu.h"
#include "mmu_internal.h"

@@ -276,6 +278,13 @@ static inline bool is_shadow_present_pte(u64 pte)
return !!(pte & SPTE_MMU_PRESENT_MASK);
}

+static inline bool is_ept_ve_possible(u64 spte)
+{
+ return (shadow_present_mask & VMX_EPT_SUPPRESS_VE_BIT) &&
+ !(spte & VMX_EPT_SUPPRESS_VE_BIT) &&
+ (spte & VMX_EPT_RWX_MASK) != VMX_EPT_MISCONFIG_WX_VALUE;
+}
+
/*
* Returns true if A/D bits are supported in hardware and are enabled by KVM.
* When enabled, KVM uses A/D bits for all non-nested MMUs. Because L1 can
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index fae559559a80..2880fd392e0c 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -21,11 +21,13 @@ static inline u64 kvm_tdp_mmu_read_spte(tdp_ptep_t sptep)

static inline u64 kvm_tdp_mmu_write_spte_atomic(tdp_ptep_t sptep, u64 new_spte)
{
+ KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte));
return xchg(rcu_dereference(sptep), new_spte);
}

static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
{
+ KVM_MMU_WARN_ON(is_ept_ve_possible(new_spte));
WRITE_ONCE(*rcu_dereference(sptep), new_spte);
}

--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:06:15

by Sean Christopherson

Subject: [PATCH 5/9] KVM: VMX: Dump VMCS on unexpected #VE

Dump the VMCS on an unexpected #VE; otherwise it's practically impossible
to figure out why the #VE occurred.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 51b2cd13250a..0c68643d982b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5217,8 +5217,10 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
if (is_invalid_opcode(intr_info))
return handle_ud(vcpu);

- if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm))
+ if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm)) {
+ dump_vmcs(vcpu);
return -EIO;
+ }

error_code = 0;
if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:06:34

by Sean Christopherson

Subject: [PATCH 6/9] KVM: x86/mmu: Print SPTEs on unexpected #VE

Print the SPTEs that correspond to the faulting GPA on an unexpected EPT
Violation #VE to help the user debug failures, e.g. to pinpoint which SPTE
didn't have SUPPRESS_VE set.

Opportunistically assert that the underlying exit reason was indeed an EPT
Violation, as the CPU has *really* gone off the rails if a #VE occurs due
to a completely unexpected exit reason.
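
The resulting splat looks something like this (GPA and SPTE values are made
up; the format falls out of the pr_err()/pr_cont() calls below):

  #VE 12345000, spte[4] = 0x8000000000123827, spte[3] = 0x8000000000456827, spte[2] = 0x8000000000789827, spte[1] = 0x0

i.e. a leaf SPTE of 0, with SUPPRESS_VE clear, would be the smoking gun.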

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 42 ++++++++++++++++++++++++++-------
arch/x86/kvm/vmx/vmx.c | 5 ++++
3 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aabf1648a56a..9bb2e164c523 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2159,6 +2159,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
void *insn, int insn_len);
+void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 addr, unsigned long roots);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d2af077d8b34..f2c9580d9588 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4124,6 +4124,22 @@ static int get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level
return leaf;
}

+static int get_sptes_lockless(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
+ int *root_level)
+{
+ int leaf;
+
+ walk_shadow_page_lockless_begin(vcpu);
+
+ if (is_tdp_mmu_active(vcpu))
+ leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root_level);
+ else
+ leaf = get_walk(vcpu, addr, sptes, root_level);
+
+ walk_shadow_page_lockless_end(vcpu);
+ return leaf;
+}
+
/* return true if reserved bit(s) are detected on a valid, non-MMIO SPTE. */
static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
{
@@ -4132,15 +4148,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep)
int root, leaf, level;
bool reserved = false;

- walk_shadow_page_lockless_begin(vcpu);
-
- if (is_tdp_mmu_active(vcpu))
- leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes, &root);
- else
- leaf = get_walk(vcpu, addr, sptes, &root);
-
- walk_shadow_page_lockless_end(vcpu);
-
+ leaf = get_sptes_lockless(vcpu, addr, sptes, &root);
if (unlikely(leaf < 0)) {
*sptep = 0ull;
return reserved;
@@ -5963,6 +5971,22 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

+void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg)
+{
+ u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
+ int root_level, leaf, level;
+
+ leaf = get_sptes_lockless(vcpu, gpa, sptes, &root_level);
+ if (unlikely(leaf < 0))
+ return;
+
+ pr_err("%s %llx", msg, gpa);
+ for (level = root_level; level >= leaf; level--)
+ pr_cont(", spte[%d] = 0x%llx", level, sptes[level]);
+ pr_cont("\n");
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_print_sptes);
+
static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 addr, hpa_t root_hpa)
{
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0c68643d982b..2a3fce61c785 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5218,7 +5218,12 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
return handle_ud(vcpu);

if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm)) {
+ struct vmx_ve_information *ve_info = vmx->ve_info;
+
+ WARN_ONCE(ve_info->exit_reason != EXIT_REASON_EPT_VIOLATION,
+ "Unexpected #VE on VM-Exit reason 0x%x", ve_info->exit_reason);
dump_vmcs(vcpu);
+ kvm_mmu_print_sptes(vcpu, ve_info->guest_physical_address, "#VE");
return -EIO;
}

--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:06:46

by Sean Christopherson

Subject: [PATCH 7/9] KVM: VMX: Don't kill the VM on an unexpected #VE

Don't terminate the VM on an unexpected #VE, as it's extremely unlikely
the #VE is fatal to the guest, and even less likely that it presents a
danger to the host. Simply resume the guest on "failure", as the #VE info
page's BUSY field will prevent converting any more EPT Violations to #VEs
for the vCPU (at least, that's what the BUSY field is supposed to do).
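
For reference, the architectural contract being leaned on (per the SDM, using
the vmx_ve_information field names from patch 2; the snippet is a hypothetical
guest-side illustration, not KVM code):

/*
 * The CPU only converts an EPT Violation into a #VE when the "delivery"
 * (BUSY) dword is 0, and writes all-ones to it before injecting the #VE,
 * so a guest that never acks the info page takes at most one #VE.
 */
WRITE_ONCE(ve_info->delivery, 0);       /* guest re-arms #VE delivery */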

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2a3fce61c785..58832aae2248 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5217,14 +5217,14 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
if (is_invalid_opcode(intr_info))
return handle_ud(vcpu);

- if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm)) {
+ if (WARN_ON_ONCE(is_ve_fault(intr_info))) {
struct vmx_ve_information *ve_info = vmx->ve_info;

WARN_ONCE(ve_info->exit_reason != EXIT_REASON_EPT_VIOLATION,
"Unexpected #VE on VM-Exit reason 0x%x", ve_info->exit_reason);
dump_vmcs(vcpu);
kvm_mmu_print_sptes(vcpu, ve_info->guest_physical_address, "#VE");
- return -EIO;
+ return 1;
}

error_code = 0;
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:07:16

by Sean Christopherson

Subject: [PATCH 8/9] KVM: VMX: Enumerate EPT Violation #VE support in /proc/cpuinfo

Don't suppress printing EPT_VIOLATION_VE in /proc/cpuinfo, as knowing whether
or not KVM_INTEL_PROVE_VE actually does anything is extremely valuable.
A privileged user can get at the information by reading the raw MSR, but
the whole point of the VMX flags is to avoid needing to glean information
from raw MSR reads.
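
For the unfamiliar, the suppression comes from the comment convention in
vmxfeatures.h: the capflags generation script hides any flag whose comment
begins with an empty quoted string. I.e. (the derived flag name below is an
assumption):

/* "" Conditionally reflect ... */   -> hidden from /proc/cpuinfo
/* Conditionally reflect ... */      -> shown as ept_violation_ve in "vmx flags"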

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/vmxfeatures.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 266daf5b5b84..695f36664889 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -77,7 +77,7 @@
#define VMX_FEATURE_ENCLS_EXITING ( 2*32+ 15) /* "" VM-Exit on ENCLS (leaf dependent) */
#define VMX_FEATURE_RDSEED_EXITING ( 2*32+ 16) /* "" VM-Exit on RDSEED */
#define VMX_FEATURE_PAGE_MOD_LOGGING ( 2*32+ 17) /* "pml" Log dirty pages into buffer */
-#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* "" Conditionally reflect EPT violations as #VE exceptions */
+#define VMX_FEATURE_EPT_VIOLATION_VE ( 2*32+ 18) /* Conditionally reflect EPT violations as #VE exceptions */
#define VMX_FEATURE_PT_CONCEAL_VMX ( 2*32+ 19) /* "" Suppress VMX indicators in Processor Trace */
#define VMX_FEATURE_XSAVES ( 2*32+ 20) /* "" Enable XSAVES and XRSTORS in guest */
#define VMX_FEATURE_MODE_BASED_EPT_EXEC ( 2*32+ 22) /* "ept_mode_based_exec" Enable separate EPT EXEC bits for supervisor vs. user */
--
2.45.0.215.g3402c0e53f-goog


2024-05-18 00:07:19

by Sean Christopherson

Subject: [PATCH 9/9] KVM: x86: Disable KVM_INTEL_PROVE_VE by default

Disable KVM's "prove #VE" support by default, as it provides no functional
value, and even its sanity checking benefits are relatively limited. I.e.
it should be fully opt-in even on debug kernels, especially since EPT
Violation #VE suppression appears to be buggy on some CPUs.

Opportunistically add a line in the help text to make it abundantly clear
that KVM_INTEL_PROVE_VE should never be enabled in a production
environment.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2a7f69abcac3..3468efc4be55 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -97,15 +97,15 @@ config KVM_INTEL

config KVM_INTEL_PROVE_VE
bool "Check that guests do not receive #VE exceptions"
- default KVM_PROVE_MMU || DEBUG_KERNEL
- depends on KVM_INTEL
+ depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
help
-
Checks that KVM's page table management code will not incorrectly
let guests receive a virtualization exception. Virtualization
exceptions will be trapped by the hypervisor rather than injected
in the guest.

+ This should never be enabled in a production environment.
+
If unsure, say N.

config X86_SGX_KVM
--
2.45.0.215.g3402c0e53f-goog


2024-05-20 12:52:05

by Kai Huang

Subject: Re: [PATCH 1/9] KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU

On Fri, 2024-05-17 at 17:04 -0700, Sean Christopherson wrote:
> From: Isaku Yamahata <[email protected]>
>
> Use SHADOW_NONPRESENT_VALUE when zapping TDP MMU SPTEs with mmu_lock held
> for read; tdp_mmu_zap_spte_atomic() was simply missed during the initial
> development.
>
> Fixes: 7f01cab84928 ("KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE")
> Not-yet-signed-off-by: Isaku Yamahata <[email protected]>
> [sean: write changelog]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1259dd63defc..36539c1b36cd 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -626,7 +626,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> * SPTEs.
> */
> handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> - 0, iter->level, true);
> + SHADOW_NONPRESENT_VALUE, iter->level, true);
>
> return 0;
> }

Reviewed-by: Kai Huang <[email protected]>

2024-05-20 23:10:02

by Kai Huang

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support



On 18/05/2024 12:04 pm, Sean Christopherson wrote:
> Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
> initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
> enabled and a VE info address pointing at pfn 0.

How about we just clear the EPT_VIOLATION_VE bit in 2nd_exec_control
unconditionally for vmcs02? Your next patch says:

"
Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
are KVM's responsibility.
"

>
> Fixes: 8131cf5b4fd8 ("KVM: VMX: Introduce test mode related to EPT violation VE")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/nested.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d5b832126e34..6798fadaa335 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2242,6 +2242,9 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
> vmcs_write64(EPT_POINTER,
> construct_eptp(&vmx->vcpu, 0, PT64_ROOT_4LEVEL));
>
> + if (vmx->ve_info)
> + vmcs_write64(VE_INFORMATION_ADDRESS, __pa(vmx->ve_info));
> +
> /* All VMFUNCs are currently emulated through L0 vmexits. */
> if (cpu_has_vmx_vmfunc())
> vmcs_write64(VM_FUNCTION_CONTROL, 0);

2024-05-20 23:23:10

by Sean Christopherson

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support

On Tue, May 21, 2024, Kai Huang wrote:
> On 18/05/2024 12:04 pm, Sean Christopherson wrote:
> > Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
> > initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
> > enabled and a VE info address pointing at pfn 0.
>
> How about we just clear the EPT_VIOLATION_VE bit in 2nd_exec_control
> unconditionally for vmcs02?

Because then KVM wouldn't get any EPT Violation #VE coverage for L2, and as
evidenced by the KVM-Unit-Test failure, running L2 with EPT Violation #VEs enabled
provides unique coverage. Doing so definitely provides coverage beyond what is
strictly needed for TDX, but it's just as easy to set the VE info page in vmcs02
as it is to clear EPT_VIOLATION_VE, so why not.

> Your next patch says:
>
> "
> Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
> as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
> are KVM's responsibility.
> "

I don't see how that's relevant to whether or not KVM enables EPT Violation #VEs
while L2 is running. That patch simply routes all #VEs to L0, it doesn't affect
whether or not it's safe to enable EPT Violation #VEs for L2.

2024-05-20 23:50:18

by Kai Huang

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support



On 21/05/2024 11:22 am, Sean Christopherson wrote:
> On Tue, May 21, 2024, Kai Huang wrote:
>> On 18/05/2024 12:04 pm, Sean Christopherson wrote:
>>> Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
>>> initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
>>> enabled and a VE info address pointing at pfn 0.
>>
>> How about we just clear the EPT_VIOLATION_VE bit in 2nd_exec_control
>> unconditionally for vmcs02?
>
> Because then KVM wouldn't get any EPT Violation #VE coverage for L2, and as
> evidenced by the KVM-Unit-Test failure, running L2 with EPT Violation #VEs enabled
> provides unique coverage. Doing so definitely provides coverage beyond what is
> strictly needed for TDX, but it's just as easy to set the VE info page in vmcs02
> as it is to clear EPT_VIOLATION_VE, so why not.
>
>> Your next patch says:
>>
>> "
>> Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
>> as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
>> are KVM's responsibility.
>> "
>
> I don't see how that's relevant to whether or not KVM enables EPT Violation #VEs
> while L2 is running. That patch simply routes all #VEs to L0, it doesn't affect
> whether or not it's safe to enable EPT Violation #VEs for L2.

My logic is, if #VE exit cannot possibly happen for L2, then we don't
need to deal with whether to route #VE exits to L1. :-)

Well, actually I think conceptually, it kinda makes sense to route #VE
exits to L1:

L1 should never enable #VE related bits so L1 is certainly not expecting
to see #VE from L2. But how to act should depend on L1's logic?
E.g., it can choose to ignore, or just kill the L2 etc?

Unconditionally disabling #VE in vmcs02 can avoid such an issue because it's
just not possible for L2 to have the #VE exit.


2024-05-21 00:21:49

by Sean Christopherson

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support

On Tue, May 21, 2024, Kai Huang wrote:
> On 21/05/2024 11:22 am, Sean Christopherson wrote:
> > On Tue, May 21, 2024, Kai Huang wrote:
> > > On 18/05/2024 12:04 pm, Sean Christopherson wrote:
> > > > Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
> > > > initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
> > > > enabled and a VE info address pointing at pfn 0.
> > >
> > > How about we just clear the EPT_VIOLATION_VE bit in 2nd_exec_control
> > > unconditionally for vmcs02?
> >
> > Because then KVM wouldn't get any EPT Violation #VE coverage for L2, and as
> > evidenced by the KVM-Unit-Test failure, running L2 with EPT Violation #VEs enabled
> > provides unique coverage. Doing so definitely provides coverage beyond what is
> > strictly needed for TDX, but it's just as easy to set the VE info page in vmcs02
> > as it is to clear EPT_VIOLATION_VE, so why not.
> >
> > > Your next patch says:
> > >
> > > "
> > > Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
> > > as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
> > > are KVM's responsibility.
> > > "
> >
> > I don't see how that's relevant to whether or not KVM enables EPT Violation #VEs
> > while L2 is running. That patch simply routes all #VEs to L0, it doesn't affect
> > whether or not it's safe to enable EPT Violation #VEs for L2.
>
> My logic is, if #VE exit cannot possibly happen for L2, then we don't need
> to deal with whether to route #VE exits to L1. :-)
>
> Well, actually I think conceptually, it kinda makes sense to route #VE exits
> to L1:
>
> L1 should never enable #VE related bits so L1 is certainly not expecting to

Not "should never", "can never". If L1 attempts to enable EPT_VIOLATION_VE, then
VM-Enter will VM-Fail.

> see #VE from L2. But how to act should depend on L1's logic? E.g., it
> can choose to ignore, or just kill the L2 etc?

No. Architecturally, from L1's perspective, a #VE VM-Exit _cannot_ occur in L2.
L1 can inject a #VE into L2, but a #VE cannot be generated by the CPU and thus
cannot cause a VM-Exit.

> Unconditionally disabling #VE in vmcs02 can avoid such an issue because it's just
> not possible for L2 to have the #VE exit.

Sure, but by that argument we could just avoid all nested VMX issues by never
enabling anything for L2.

If there's an argument to be made for disabling EPT_VIOLATION_VE in vmcs02, it's
that the potential maintenance cost of keeping nEPT, nVMX, and the shadow MMU
healthy outweighs the benefits. I.e. we don't have a use case for enabling
EPT_VIOLATION_VE while L2 is running, so why validate it?

If whatever bug the KUT EPT found ends up being a KVM bug that specifically only
affects nVMX, then it'd be worth revisiting whether or not it's worth enabling
EPT_VIOLATION_VE in vmcs02. But that's a rather big "if" at this point.

2024-05-21 00:43:15

by Kai Huang

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support



On 21/05/2024 12:21 pm, Sean Christopherson wrote:
> On Tue, May 21, 2024, Kai Huang wrote:
>> On 21/05/2024 11:22 am, Sean Christopherson wrote:
>>> On Tue, May 21, 2024, Kai Huang wrote:
>>>> On 18/05/2024 12:04 pm, Sean Christopherson wrote:
>>>>> Point vmcs02.VE_INFORMATION_ADDRESS at the vCPU's #VE info page when
>>>>> initializing vmcs02, otherwise KVM will run L2 with EPT Violation #VE
>>>>> enabled and a VE info address pointing at pfn 0.
>>>>
>>>> How about we just clear the EPT_VIOLATION_VE bit in 2nd_exec_control
>>>> unconditionally for vmcs02?
>>>
>>> Because then KVM wouldn't get any EPT Violation #VE coverage for L2, and as
>>> evidenced by the KVM-Unit-Test failure, running L2 with EPT Violation #VEs enabled
>>> provides unique coverage. Doing so definitely provides coverage beyond what is
>>> strictly needed for TDX, but it's just as easy to set the VE info page in vmcs02
>>> as it is to clear EPT_VIOLATION_VE, so why not.
>>>
>>>> Your next patch says:
>>>>
>>>> "
>>>> Always handle #VEs, e.g. those due to "prove EPT Violation #VE" failures, in L0,
>>>> as KVM does not expose any #VE capabilities to L1, i.e. any and all #VEs
>>>> are KVM's responsibility.
>>>> "
>>>
>>> I don't see how that's relevant to whether or not KVM enables EPT Violation #VEs
>>> while L2 is running. That patch simply routes all #VEs to L0, it doesn't affect
>>> whether or not it's safe to enable EPT Violation #VEs for L2.
>>
>> My logic is, if #VE exit cannot possibly happen for L2, then we don't need
>> to deal with whether to route #VE exits to L1. :-)
>>
>> Well, actually I think conceptually, it kinda makes sense to route #VE exits
>> to L1:
>>
>> L1 should never enable #VE related bits so L1 is certainly not expecting to
>
> Not "should never", "can never". If L1 attempts to enable EPT_VIOLATION_VE, then
> VM-Enter will VM-Fail.
>
>> see #VE from L2. But how to act should depend on L1's logic? E.g., it
>> can choose to ignore, or just kill the L2 etc?
>
> No. Architecturally, from L1's perspective, a #VE VM-Exit _cannot_ occur in L2.
> L1 can inject a #VE into L2, but a #VE cannot be generated by the CPU and thus
> cannot cause a VM-Exit.

OK. The point is not to argue about how L1 should handle it, but whether we
should inject to L1 -- L1 can do whatever it believes is legal/sane.

But I understand the purpose is to test/validate, so it's fine for L0 to
handle it, and by "handle" I mean we eventually just want to dump that #VE
exit.

But now L0 always handles #VE exits from L2, and AFAICT L0 will just
kill the L1, until the patch:

KVM: VMX: Don't kill the VM on an unexpected #VE

lands.

So it looks like that patch at least should be done first. Otherwise it doesn't
make a lot of sense to kill L1 for #VE exits from L2.

>
>> Unconditionally disabling #VE in vmcs02 can avoid such an issue because it's just
>> not possible for L2 to have the #VE exit.
>
> Sure, but by that argument we could just avoid all nested VMX issues by never
> enabling anything for L2.
>
> If there's an argument to be made for disabling EPT_VIOLATION_VE in vmcs02, it's
> that the potential maintenance cost of keeping nEPT, nVMX, and the shadow MMU
> healthy outweighs the benefits. I.e. we don't have a use case for enabling
> EPT_VIOLATION_VE while L2 is running, so why validate it?

Yeah. I am not sure about the purpose of validating #VE exits from L2.

>
> If whatever bug the KUT EPT found ends up being a KVM bug that specifically only
> affects nVMX, then it'd be worth revisiting whether or not it's worth enabling
> EPT_VIOLATION_VE in vmcs02. But that's a rather big "if" at this point.

OK.

2024-05-21 01:02:46

by Sean Christopherson

Subject: Re: [PATCH 2/9] KVM: nVMX: Initialize #VE info page for vmcs02 when proving #VE support

On Tue, May 21, 2024, Kai Huang wrote:
> But now L0 always handles #VE exits from L2, and AFAICT L0 will just kill
> the L1, until the patch:
>
> KVM: VMX: Don't kill the VM on an unexpected #VE
>
> lands.
>
> So it looks like that patch at least should be done first. Otherwise it doesn't
> make a lot of sense to kill L1 for #VE exits from L2.

I have no objection to changing the order.

2024-05-21 07:21:23

by Isaku Yamahata

Subject: Re: [PATCH 1/9] KVM: x86/mmu: Use SHADOW_NONPRESENT_VALUE for atomic zap in TDP MMU

On Fri, May 17, 2024 at 05:04:22PM -0700,
Sean Christopherson <[email protected]> wrote:

> From: Isaku Yamahata <[email protected]>
>
> Use SHADOW_NONPRESENT_VALUE when zapping TDP MMU SPTEs with mmu_lock held
> for read; tdp_mmu_zap_spte_atomic() was simply missed during the initial
> development.
>
> Fixes: 7f01cab84928 ("KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE")
> Not-yet-signed-off-by: Isaku Yamahata <[email protected]>
> [sean: write changelog]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1259dd63defc..36539c1b36cd 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -626,7 +626,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> * SPTEs.
> */
> handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> - 0, iter->level, true);
> + SHADOW_NONPRESENT_VALUE, iter->level, true);
>
> return 0;
> }
> --
> 2.45.0.215.g3402c0e53f-goog

Signed-off-by: Isaku Yamahata <[email protected]>
--
Isaku Yamahata <[email protected]>

2024-05-21 17:37:03

by Paolo Bonzini

Subject: Re: [PATCH 9/9] KVM: x86: Disable KVM_INTEL_PROVE_VE by default

On Sat, May 18, 2024 at 2:04 AM Sean Christopherson <[email protected]> wrote:
> Disable KVM's "prove #VE" support by default, as it provides no functional
> value, and even its sanity checking benefits are relatively limited. I.e.
> it should be fully opt-in even on debug kernels, especially since EPT
> Violation #VE suppression appears to be buggy on some CPUs.

More #VE trapping than #VE suppression.

I wouldn't go so far as making it *depend* on DEBUG_KERNEL. EXPERT
plus the scary help message is good enough.

What about this:

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index b6831e17ec31..2864608c7016 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -97,14 +97,15 @@ config KVM_INTEL

config KVM_INTEL_PROVE_VE
bool "Check that guests do not receive #VE exceptions"
- depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
+ depends on KVM_INTEL && EXPERT
help
Checks that KVM's page table management code will not incorrectly
let guests receive a virtualization exception. Virtualization
exceptions will be trapped by the hypervisor rather than injected
in the guest.

- This should never be enabled in a production environment.
+ Note that #VE trapping appears to be buggy on some CPUs.
+ This should never be enabled in a production environment!

If unsure, say N.


Paolo

> Opportunistically add a line in the help text to make it abundantly clear
> that KVM_INTEL_PROVE_VE should never be enabled in a production
> environment.
>
> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/Kconfig | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 2a7f69abcac3..3468efc4be55 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -97,15 +97,15 @@ config KVM_INTEL
>
> config KVM_INTEL_PROVE_VE
> bool "Check that guests do not receive #VE exceptions"
> - default KVM_PROVE_MMU || DEBUG_KERNEL
> - depends on KVM_INTEL
> + depends on KVM_INTEL && DEBUG_KERNEL && EXPERT
> help
> -
> Checks that KVM's page table management code will not incorrectly
> let guests receive a virtualization exception. Virtualization
> exceptions will be trapped by the hypervisor rather than injected
> in the guest.
>
> + This should never be enabled in a production environment.
> +
> If unsure, say N.
>
> config X86_SGX_KVM
> --
> 2.45.0.215.g3402c0e53f-goog
>


2024-05-21 20:26:18

by Paolo Bonzini

Subject: Re: [PATCH 9/9] KVM: x86: Disable KVM_INTEL_PROVE_VE by default

On Tue, May 21, 2024 at 8:18 PM Sean Christopherson <[email protected]> wrote:
> > - This should never be enabled in a production environment.
> > + Note that #VE trapping appears to be buggy on some CPUs.
>
> I see where you're coming from, but I don't think "trapping" is much better,
> e.g. it suggests there's something broken with the interception of #VEs. Ah,
> the entire help text is weird.

Yeah, I didn't want to say #VE is broken altogether - interception is
where we saw issues, and #VE is used in production as far as I know
(not just by TDX; at least Xen and maybe Hyper-V use it for
anti-malware purposes?).

Maybe "Note: there appear to be bugs in some CPUs that will trigger
the WARN, in particular with eptad=0 and/or nested virtualization"
covers all bases.

Paolo

>
> This?
>
> config KVM_INTEL_PROVE_VE
> bool "Verify guests do not receive unexpected EPT Violation #VEs"
> depends on KVM_INTEL && EXPERT
> help
> Enable EPT Violation #VEs (when supported) for all VMs, to verify
> that KVM's EPT management code will not incorrectly result in a #VE
> (KVM is supposed to suppress #VEs by default). Unexpected #VEs will
> be intercepted by KVM and will trigger a WARN, but are otherwise
> transparent to the guest.
>
> Note, EPT Violation #VE support appears to be buggy on some CPUs.
>
> This should never be enabled in a production environment!
>
> If unsure, say N.
>


2024-05-23 16:47:02

by Paolo Bonzini

Subject: Re: [PATCH 0/9] KVM: x86: Fixes for KVM_INTEL_PROVE_VE

Queued, thanks.

I moved "KVM: VMX: Don't kill the VM on an unexpected #VE" to be the second patch
in the series.

Paolo