Fix two bugs I introduced when adding the enable_mmio_caching module param.
Bug #1 is that KVM unintentionally makes disabling caching due to a config
incompatibility "sticky", e.g. disabling caching because there are no
reserved PA bits prevents KVM from re-enabling caching when "switching" to an EPT
config (doesn't rely on PA bits) or when SVM adjusts the MMIO masks to
account for C-bit shenanigans (even if MAXPHYADDR=52 and C-bit=51, there
can be reserved PA bits due to the "real" MAXPHYADDR being reduced).
Bug #2 is that KVM doesn't explicitly check that MMIO caching is enabled
when doing SEV-ES setup. Prior to the module param, MMIO caching was
guaranteed when SEV-ES could be enabled as SEV-ES-capable CPUs effectively
guarantee there will be at least one reserved PA bit (see above). With
the module param, userspace can explicitly disable MMIO caching, thus
silently breaking SEV-ES.
I believe I tested all combinations of things getting disabled and enabled
by hacking kvm_mmu_reset_all_pte_masks() to disable MMIO caching on a much
lower MAXPHYADDR, e.g. 43 instead of 52. That said, definitely wait for a
thumbs up from the AMD folks before queueing.
Sean Christopherson (4):
KVM: x86: Tag kvm_mmu_x86_module_init() with __init
KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change
KVM: SVM: Adjust MMIO masks (for caching) before doing SEV(-ES) setup
KVM: SVM: Disable SEV-ES support if MMIO caching is disabled
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.h | 2 ++
arch/x86/kvm/mmu/mmu.c | 6 +++++-
arch/x86/kvm/mmu/spte.c | 20 ++++++++++++++++++++
arch/x86/kvm/mmu/spte.h | 3 +--
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/kvm/svm/svm.c | 9 ++++++---
7 files changed, 45 insertions(+), 7 deletions(-)
base-commit: 1a4d88a361af4f2e91861d632c6a1fe87a9665c2
--
2.37.1.455.g008518b4e5-goog
Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
generating #NPF(RSVD), which are reflected by the CPU into the guest as
a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
to the guest instruction stream or register state and so can't directly
emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
and those flavors of #NPF cause automatic VM-Exits, not #VC.
Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu.h | 2 ++
arch/x86/kvm/mmu/spte.c | 1 +
arch/x86/kvm/mmu/spte.h | 2 --
arch/x86/kvm/svm/sev.c | 10 ++++++++++
4 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a99acec925eb..6bdaacb6faa0 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -6,6 +6,8 @@
#include "kvm_cache_regs.h"
#include "cpuid.h"
+extern bool __read_mostly enable_mmio_caching;
+
#define PT_WRITABLE_SHIFT 1
#define PT_USER_SHIFT 2
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 66f76f5a15bd..03ca740bf721 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -22,6 +22,7 @@
bool __read_mostly enable_mmio_caching = true;
static bool __ro_after_init allow_mmio_caching;
module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
+EXPORT_SYMBOL_GPL(enable_mmio_caching);
u64 __read_mostly shadow_host_writable_mask;
u64 __read_mostly shadow_mmu_writable_mask;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 26b144ffd146..9a9414b8d1d6 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -5,8 +5,6 @@
#include "mmu_internal.h"
-extern bool __read_mostly enable_mmio_caching;
-
/*
* A MMU present SPTE is backed by actual memory and may or may not be present
* in hardware. E.g. MMIO SPTEs are not considered present. Use bit 11, as it
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 309bcdb2f929..05bf6301acac 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -22,6 +22,7 @@
#include <asm/trapnr.h>
#include <asm/fpu/xcr.h>
+#include "mmu.h"
#include "x86.h"
#include "svm.h"
#include "svm_ops.h"
@@ -2205,6 +2206,15 @@ void __init sev_hardware_setup(void)
if (!sev_es_enabled)
goto out;
+ /*
+ * SEV-ES requires MMIO caching as KVM doesn't have access to the guest
+ * instruction stream, i.e. can't emulate in response to a #NPF and
+ * instead relies on #NPF(RSVD) being reflected into the guest as #VC
+ * (the guest can then do a #VMGEXIT to request MMIO emulation).
+ */
+ if (!enable_mmio_caching)
+ goto out;
+
/* Does the CPU support SEV-ES? */
if (!boot_cpu_has(X86_FEATURE_SEV_ES))
goto out;
--
2.37.1.455.g008518b4e5-goog
On Thu, Jul 28, 2022 at 10:17:55PM +0000, Sean Christopherson wrote:
> Fix two bugs I introduced when adding the enable_mmio_caching module param.
>
> Bug #1 is that KVM unintentionally makes disabling caching due to a config
> incompatibility "sticky", e.g. disabling caching because there are no
> reserved PA bits prevents KVM from re-enabling caching when "switching" to an EPT
> config (doesn't rely on PA bits) or when SVM adjusts the MMIO masks to
> account for C-bit shenanigans (even if MAXPHYADDR=52 and C-bit=51, there
> can be reserved PA bits due to the "real" MAXPHYADDR being reduced).
>
> Bug #2 is that KVM doesn't explicitly check that MMIO caching is enabled
> when doing SEV-ES setup. Prior to the module param, MMIO caching was
> guaranteed when SEV-ES could be enabled as SEV-ES-capable CPUs effectively
> guarantee there will be at least one reserved PA bit (see above). With
> the module param, userspace can explicitly disable MMIO caching, thus
> silently breaking SEV-ES.
>
> I believe I tested all combinations of things getting disabled and enabled
> by hacking kvm_mmu_reset_all_pte_masks() to disable MMIO caching on a much
> lower MAXPHYADDR, e.g. 43 instead of 52. That said, definitely wait for a
> thumbs up from the AMD folks before queueing.
I tested the below systems/configurations and everything looks good
to me. Thanks for the quick fix!
AMD Milan, MAXPHYADDR = 48 bits, kvm.mmio_caching=on (on by default)
normal: pass
SEV: pass
SEV-ES: pass
AMD Milan, MAXPHYADDR = 48 bits, kvm.mmio_caching=off
normal: pass
SEV: pass
SEV-ES: fail (as expected, since kvm_amd.sev_es gets forced to off)
AMD unreleased, MAXPHYADDR = 52 bits, kvm.mmio_caching=on (on by default)
normal: pass
SEV: pass
SEV-ES: pass
AMD unreleased, MAXPHYADDR = 52 bits, kvm.mmio_caching=off
normal: pass
SEV: pass
SEV-ES: fail (as expected, since kvm_amd.sev_es gets forced to off)
>
> Sean Christopherson (4):
> KVM: x86: Tag kvm_mmu_x86_module_init() with __init
> KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change
> KVM: SVM: Adjust MMIO masks (for caching) before doing SEV(-ES) setup
> KVM: SVM: Disable SEV-ES support if MMIO caching is disabled
Series:
Tested-by: Michael Roth <[email protected]>
-Mike
On Thu, 2022-07-28 at 22:17 +0000, Sean Christopherson wrote:
> Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
> generating #NPF(RSVD), which are reflected by the CPU into the guest as
> a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
> to the guest instruction stream or register state and so can't directly
> emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
> caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
> and those flavors of #NPF cause automatic VM-Exits, not #VC.
>
> Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
> Reported-by: Michael Roth <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu.h | 2 ++
> arch/x86/kvm/mmu/spte.c | 1 +
> arch/x86/kvm/mmu/spte.h | 2 --
> arch/x86/kvm/svm/sev.c | 10 ++++++++++
> 4 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index a99acec925eb..6bdaacb6faa0 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -6,6 +6,8 @@
> #include "kvm_cache_regs.h"
> #include "cpuid.h"
>
> +extern bool __read_mostly enable_mmio_caching;
> +
> #define PT_WRITABLE_SHIFT 1
> #define PT_USER_SHIFT 2
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 66f76f5a15bd..03ca740bf721 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -22,6 +22,7 @@
> bool __read_mostly enable_mmio_caching = true;
> static bool __ro_after_init allow_mmio_caching;
> module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
> +EXPORT_SYMBOL_GPL(enable_mmio_caching);
>
> u64 __read_mostly shadow_host_writable_mask;
> u64 __read_mostly shadow_mmu_writable_mask;
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 26b144ffd146..9a9414b8d1d6 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -5,8 +5,6 @@
>
> #include "mmu_internal.h"
>
> -extern bool __read_mostly enable_mmio_caching;
> -
> /*
> * A MMU present SPTE is backed by actual memory and may or may not be present
> * in hardware. E.g. MMIO SPTEs are not considered present. Use bit 11, as it
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 309bcdb2f929..05bf6301acac 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -22,6 +22,7 @@
> #include <asm/trapnr.h>
> #include <asm/fpu/xcr.h>
>
> +#include "mmu.h"
> #include "x86.h"
> #include "svm.h"
> #include "svm_ops.h"
> @@ -2205,6 +2206,15 @@ void __init sev_hardware_setup(void)
> if (!sev_es_enabled)
> goto out;
>
> + /*
> + * SEV-ES requires MMIO caching as KVM doesn't have access to the guest
> + * instruction stream, i.e. can't emulate in response to a #NPF and
> + * instead relies on #NPF(RSVD) being reflected into the guest as #VC
> + * (the guest can then do a #VMGEXIT to request MMIO emulation).
> + */
> + if (!enable_mmio_caching)
> + goto out;
> +
>
I am not familiar with SEV, but it looks similar to TDX -- they both cause a
#VE in the guest instead of faulting into KVM, and they both require an explicit
call from the guest to do MMIO.
In this case, does existing MMIO caching logic still apply to them? Should we
still treat SEV and TDX's MMIO handling as MMIO caching being enabled? Or
perhaps another variable?
--
Thanks,
-Kai
On Fri, Jul 29, 2022, Kai Huang wrote:
> On Thu, 2022-07-28 at 22:17 +0000, Sean Christopherson wrote:
> > Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
> > generating #NPF(RSVD), which are reflected by the CPU into the guest as
> > a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
> > to the guest instruction stream or register state and so can't directly
> > emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
> > caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
> > and those flavors of #NPF cause automatic VM-Exits, not #VC.
> >
> > Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
> > Reported-by: Michael Roth <[email protected]>
> > Cc: Tom Lendacky <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
...
> > + /*
> > + * SEV-ES requires MMIO caching as KVM doesn't have access to the guest
> > + * instruction stream, i.e. can't emulate in response to a #NPF and
> > + * instead relies on #NPF(RSVD) being reflected into the guest as #VC
> > + * (the guest can then do a #VMGEXIT to request MMIO emulation).
> > + */
> > + if (!enable_mmio_caching)
> > + goto out;
> > +
> >
>
> I am not familiar with SEV, but it looks similar to TDX -- they both cause a
> #VE in the guest instead of faulting into KVM, and they both require an explicit
> call from the guest to do MMIO.
>
> In this case, does existing MMIO caching logic still apply to them?
Yes, because TDX/SEV-ES+ need to generate #VE/#VC on emulated MMIO so that legacy
(or intentionally unenlightened) software in the guest doesn't simply hang/die on
memory accesses to emulated MMIO (as opposed to direct TDVMCALL/#VMGEXIT).
> Should we still treat SEV and TDX's MMIO handling as MMIO caching being
> enabled? Or perhaps another variable?
I don't think a separate variable is necessary. At its core, KVM is still caching
MMIO GPAs via magic SPTE values. The fact that it's required for functionality
doesn't make the name wrong.
SEV-ES+ in particular doesn't have a strong guarantee that inducing #VC via #NPF(RSVD)
is always possible. Theoretically, an SEV-ES+ capable CPU could ship with an effective
MAXPHYADDR=51 (after reducing the raw MAXPHYADDR) and C-bit=51, in which case there are
no reserved PA bits and thus no reserved PTE bits at all. That's obviously unlikely to
happen, but if it does come up, then disabling SEV-ES+ due to MMIO caching not being
possible is the desired behavior, e.g. either the CPU configuration is bad or KVM is
lacking support for a newfangled way to support emulated MMIO (in a future theoretical
SEV-* product).
On Fri, 2022-07-29 at 15:21 +0000, Sean Christopherson wrote:
> On Fri, Jul 29, 2022, Kai Huang wrote:
> > On Thu, 2022-07-28 at 22:17 +0000, Sean Christopherson wrote:
> > > Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
> > > generating #NPF(RSVD), which are reflected by the CPU into the guest as
> > > a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
> > > to the guest instruction stream or register state and so can't directly
> > > emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
> > > caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
> > > and those flavors of #NPF cause automatic VM-Exits, not #VC.
> > >
> > > Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
> > > Reported-by: Michael Roth <[email protected]>
> > > Cc: Tom Lendacky <[email protected]>
> > > Cc: [email protected]
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
>
> ...
>
> > > + /*
> > > + * SEV-ES requires MMIO caching as KVM doesn't have access to the guest
> > > + * instruction stream, i.e. can't emulate in response to a #NPF and
> > > + * instead relies on #NPF(RSVD) being reflected into the guest as #VC
> > > + * (the guest can then do a #VMGEXIT to request MMIO emulation).
> > > + */
> > > + if (!enable_mmio_caching)
> > > + goto out;
> > > +
> > >
> >
> > I am not familiar with SEV, but it looks similar to TDX -- they both cause a
> > #VE in the guest instead of faulting into KVM, and they both require an explicit
> > call from the guest to do MMIO.
> >
> > In this case, does existing MMIO caching logic still apply to them?
>
> Yes, because TDX/SEV-ES+ need to generate #VE/#VC on emulated MMIO so that legacy
> (or intentionally unenlightened) software in the guest doesn't simply hang/die on
> memory accesses to emulated MMIO (as opposed to direct TDVMCALL/#VMGEXIT).
>
> > Should we still treat SEV and TDX's MMIO handling as MMIO caching being
> > enabled? Or perhaps another variable?
>
> I don't think a separate variable is necessary. At its core, KVM is still caching
> MMIO GPAs via magic SPTE values. The fact that it's required for functionality
> doesn't make the name wrong.
OK.
>
> SEV-ES+ in particular doesn't have a strong guarantee that inducing #VC via #NPF(RSVD)
> is always possible. Theoretically, an SEV-ES+ capable CPU could ship with an effective
> MAXPHYADDR=51 (after reducing the raw MAXPHYADDR) and C-bit=51, in which case there are
> no reserved PA bits and thus no reserved PTE bits at all. That's obviously unlikely to
> happen, but if it does come up, then disabling SEV-ES+ due to MMIO caching not being
> possible is the desired behavior, e.g. either the CPU configuration is bad or KVM is
> lacking support for a newfangled way to support emulated MMIO (in a future theoretical
> SEV-* product).
I bet AMD will see this (your) response and never ship such chips :)
--
Thanks,
-Kai