If CR0.WP may be guest-owned, i.e. TDP is enabled, refresh the MMU's
snapshot of the guest's CR0.WP prior to checking for permission faults
when emulating a guest memory access. If the guest toggles only CR0.WP
and triggers emulation of a supervisor write, e.g. when KVM is emulating
UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits
metadata.
Reported-by: Mathias Krause <[email protected]>
Link: https://lkml.kernel.org/r/677169b4-051f-fcae-756b-9a3e1bb9f8fe%40grsecurity.net
Fixes: fb509f76acc8 ("KVM: VMX: Make CR0.WP a guest owned bit")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu.h | 26 +++++++++++++++++++++++++-
arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 89f532516a45..92d5a1924fc1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -113,6 +113,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
+void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *mmu);
int kvm_mmu_load(struct kvm_vcpu *vcpu);
void kvm_mmu_unload(struct kvm_vcpu *vcpu);
@@ -153,6 +155,24 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
vcpu->arch.mmu->root_role.level);
}
+static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *mmu)
+{
+ /*
+ * When EPT is enabled, KVM may passthrough CR0.WP to the guest, i.e.
+ * @mmu's snapshot of CR0.WP and thus all related paging metadata may
+ * be stale. Refresh CR0.WP and the metadata on-demand when checking
+ * for permission faults. Exempt nested MMUs, i.e. MMUs for shadowing
+ * nEPT and nNPT, as CR0.WP is ignored in both cases. Note, KVM does
+ * need to refresh nested_mmu, a.k.a. the walker used to translate L2
+ * GVAs to GPAs, as that "MMU" needs to honor L2's CR0.WP.
+ */
+ if (!tdp_enabled || mmu == &vcpu->arch.guest_mmu)
+ return;
+
+ __kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
+}
+
/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
* page fault error code pfec) causes a permission fault with the given PTE
@@ -184,8 +204,12 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
- bool fault = (mmu->permissions[index] >> pte_access) & 1;
u32 errcode = PFERR_PRESENT_MASK;
+ bool fault;
+
+ kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
+
+ fault = (mmu->permissions[index] >> pte_access) & 1;
WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
if (unlikely(mmu->pkru_mask)) {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4c874d4ec68f..47269d50e98d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5186,6 +5186,21 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
return role;
}
+void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *mmu)
+{
+ const bool cr0_wp = kvm_is_cr0_bit_set(vcpu, X86_CR0_WP);
+
+ BUILD_BUG_ON((KVM_MMU_CR0_ROLE_BITS & KVM_POSSIBLE_CR0_GUEST_BITS) != X86_CR0_WP);
+ BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
+
+ if (is_cr0_wp(mmu) == cr0_wp)
+ return;
+
+ mmu->cpu_role.base.cr0_wp = cr0_wp;
+ reset_guest_paging_metadata(vcpu, mmu);
+}
+
static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
{
/* tdp_root_level is architecture forced level, use it if nonzero */
base-commit: 27d6845d258b67f4eb3debe062b7dacc67e0c393
--
2.40.0.348.gf938b09366-goog
On 05.04.23 02:26, Sean Christopherson wrote:
> If CR0.WP may be guest-owned, i.e. TDP is enabled, refresh the MMU's
> snapshot of the guest's CR0.WP prior to checking for permission faults
> when emulating a guest memory access. If the guest toggles only CR0.WP
> and triggers emulation of a supervisor write, e.g. when KVM is emulating
> UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits
> metadata.
This reads a little awkward for a non-native speaker. Maybe something
like this:
As CR0.WP may be guest-owned if TDP is enabled, refresh...
[in between as is]
...consume a stale CR0.WP and use stale protection bits metadata.
>
> Reported-by: Mathias Krause <[email protected]>
> Link: https://lkml.kernel.org/r/677169b4-051f-fcae-756b-9a3e1bb9f8fe%40grsecurity.net
> Fixes: fb509f76acc8 ("KVM: VMX: Make CR0.WP a guest owned bit")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu.h | 26 +++++++++++++++++++++++++-
> arch/x86/kvm/mmu/mmu.c | 15 +++++++++++++++
> 2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 89f532516a45..92d5a1924fc1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -113,6 +113,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
> bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
> int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> u64 fault_address, char *insn, int insn_len);
> +void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> + struct kvm_mmu *mmu);
>
> int kvm_mmu_load(struct kvm_vcpu *vcpu);
> void kvm_mmu_unload(struct kvm_vcpu *vcpu);
> @@ -153,6 +155,24 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
> vcpu->arch.mmu->root_role.level);
> }
>
> +static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> + struct kvm_mmu *mmu)
> +{
> + /*
> + * When EPT is enabled, KVM may passthrough CR0.WP to the guest, i.e.
> + * @mmu's snapshot of CR0.WP and thus all related paging metadata may
> + * be stale. Refresh CR0.WP and the metadata on-demand when checking
> + * for permission faults. Exempt nested MMUs, i.e. MMUs for shadowing
> + * nEPT and nNPT, as CR0.WP is ignored in both cases. Note, KVM does
> + * need to refresh nested_mmu, a.k.a. the walker used to translate L2
> + * GVAs to GPAs, as that "MMU" needs to honor L2's CR0.WP.
> + */
> + if (!tdp_enabled || mmu == &vcpu->arch.guest_mmu)
> + return;
> +
> + __kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
> +}
> +
> /*
> * Check if a given access (described through the I/D, W/R and U/S bits of a
> * page fault error code pfec) causes a permission fault with the given PTE
> @@ -184,8 +204,12 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
> bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> - bool fault = (mmu->permissions[index] >> pte_access) & 1;
> u32 errcode = PFERR_PRESENT_MASK;
> + bool fault;
> +
> + kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
> +
> + fault = (mmu->permissions[index] >> pte_access) & 1;
>
> WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
> if (unlikely(mmu->pkru_mask)) {
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4c874d4ec68f..47269d50e98d 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5186,6 +5186,21 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
> return role;
> }
>
> +void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> + struct kvm_mmu *mmu)
> +{
> + const bool cr0_wp = kvm_is_cr0_bit_set(vcpu, X86_CR0_WP);
> +
> + BUILD_BUG_ON((KVM_MMU_CR0_ROLE_BITS & KVM_POSSIBLE_CR0_GUEST_BITS) != X86_CR0_WP);
> + BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
Just curious, this should assert that we don't run into similar issues
if we make more bits of CR4 guest owned?
> +
> + if (is_cr0_wp(mmu) == cr0_wp)
> + return;
> +
> + mmu->cpu_role.base.cr0_wp = cr0_wp;
> + reset_guest_paging_metadata(vcpu, mmu);
> +}
> +
> static inline int kvm_mmu_get_tdp_level(struct kvm_vcpu *vcpu)
> {
> /* tdp_root_level is architecture forced level, use it if nonzero */
>
> base-commit: 27d6845d258b67f4eb3debe062b7dacc67e0c393
Tested-by: Mathias Krause <[email protected]>
Thanks again, Sean!
On Wed, Apr 05, 2023, Mathias Krause wrote:
> On 05.04.23 02:26, Sean Christopherson wrote:
> > +void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> > + struct kvm_mmu *mmu)
> > +{
> > + const bool cr0_wp = kvm_is_cr0_bit_set(vcpu, X86_CR0_WP);
> > +
> > + BUILD_BUG_ON((KVM_MMU_CR0_ROLE_BITS & KVM_POSSIBLE_CR0_GUEST_BITS) != X86_CR0_WP);
>
> > + BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
>
> Just curious, this should assert that we don't run into similar issues
> if we make more bits of CR4 guest owned?
Yes? I'm not sure what you're asking. BUILD_BUG_ON() is a just more flexible
version of stiatic_assert(); it only requires that the inputs be compile-time
constants, not purely "static".
he above throws an error at compile-time if there is new overlap between the
CR{0,4} MMU role bits and the possible guest-owned bits. E.g. adding SMEP to the
possible guest-owned CR4 bits yields:
arch/x86/kvm/mmu/mmu.c: In function ‘__kvm_mmu_refresh_passthrough_bits’:
include/linux/compiler_types.h:397:45: error: call to ‘__compiletime_assert_1564’
declared with attribute error: BUILD_BUG_ON failed: (KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS)
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:378:25: note: in definition of macro ‘__compiletime_assert’
378 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:397:9: note: in expansion of macro ‘_compiletime_assert’
397 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
arch/x86/kvm/mmu/mmu.c:5191:9: note: in expansion of macro ‘BUILD_BUG_ON’
5191 | BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
| ^~~~~~~~~~~~
scripts/Makefile.build:252: arch/x86/kvm/mmu/mmu.o] Error 1
On 05.04.23 16:36, Sean Christopherson wrote:
> On Wed, Apr 05, 2023, Mathias Krause wrote:
>> On 05.04.23 02:26, Sean Christopherson wrote:
>>> +void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
>>> + struct kvm_mmu *mmu)
>>> +{
>>> + const bool cr0_wp = kvm_is_cr0_bit_set(vcpu, X86_CR0_WP);
>>> +
>>> + BUILD_BUG_ON((KVM_MMU_CR0_ROLE_BITS & KVM_POSSIBLE_CR0_GUEST_BITS) != X86_CR0_WP);
>>
>>> + BUILD_BUG_ON((KVM_MMU_CR4_ROLE_BITS & KVM_POSSIBLE_CR4_GUEST_BITS));
>>
>> Just curious, this should assert that we don't run into similar issues
>> if we make more bits of CR4 guest owned?
>
> Yes? I'm not sure what you're asking. BUILD_BUG_ON() is a just more flexible
> version of stiatic_assert(); it only requires that the inputs be compile-time
> constants, not purely "static>
> he above throws an error at compile-time if there is new overlap between the
> CR{0,4} MMU role bits and the possible guest-owned bits. E.g. adding SMEP to the
> possible guest-owned CR4 bits yields:
Yes, I was just asking about the reasoning behind it as it, obviously,
isn't a problem with the current code. But I ran into it while doing
backports, so thanks for adding it :D
>
> [...]
On Tue, 04 Apr 2023 17:26:08 -0700, Sean Christopherson wrote:
> If CR0.WP may be guest-owned, i.e. TDP is enabled, refresh the MMU's
> snapshot of the guest's CR0.WP prior to checking for permission faults
> when emulating a guest memory access. If the guest toggles only CR0.WP
> and triggers emulation of a supervisor write, e.g. when KVM is emulating
> UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits
> metadata.
>
> [...]
Applied to kvm-x86 misc, with a reworked changelog.
[1/1] KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission faults
https://github.com/kvm-x86/linux/commit/cf9f4c0eb169
--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes
On Wed, Apr 05, 2023, Mathias Krause wrote:
> On 05.04.23 02:26, Sean Christopherson wrote:
> > If CR0.WP may be guest-owned, i.e. TDP is enabled, refresh the MMU's
> > snapshot of the guest's CR0.WP prior to checking for permission faults
> > when emulating a guest memory access. If the guest toggles only CR0.WP
> > and triggers emulation of a supervisor write, e.g. when KVM is emulating
> > UMIP, KVM may consume a stale CR0.WP, i.e. use stale protection bits
> > metadata.
>
> This reads a little awkward for a non-native speaker.
Heh, I don't think being a non-native English speaker has anything to do with it
being awkward, I also found it confusing when I reread it :-)
I rewrote the changelog to the below when applying. Holler if it's still weird,
I can easily fixup and force push the changelog.
Thanks!
Refresh the MMU's snapshot of the vCPU's CR0.WP prior to checking for
permission faults when emulating a guest memory access and CR0.WP may be
guest owned. If the guest toggles only CR0.WP and triggers emulation of
a supervisor write, e.g. when KVM is emulating UMIP, KVM may consume a
stale CR0.WP, i.e. use stale protection bits metadata.
Note, KVM passes through CR0.WP if and only if EPT is enabled as CR0.WP
is part of the MMU role for legacy shadow paging, and SVM (NPT) doesn't
support per-bit interception controls for CR0. Don't bother checking for
EPT vs. NPT as the "old == new" check will always be true under NPT, i.e.
the only cost is the read of vcpu->arch.cr4 (SVM unconditionally grabs CR0
from the VMCB on VM-Exit).