2021-03-09 02:24:23

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 0/2] Fixups to hide our goofs

Please squash away our mistakes and hide them from the world. :-)

Stuffed the MMIO generation to start at 0x3ffffffffffffff0 (bits 61:4 set)
and role over into bit 62. Bit 63 is used for the "update in-progress" so
I'm fairly confident there are no more collisions with other SPTE bits.

For the PCID thing, note that there are two patches with the same changelog.
Not sure what's intended there...

Also, I forgot about adding the PAE root helpers until I tried testing and
PAE didn't work with SME. I'll get those to you tomorrow.

Sean Christopherson (2):
KVM: x86: Fixup "Get active PCID only when writing a CR3 value"
KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

arch/x86/kvm/mmu/spte.h | 12 +++++++-----
arch/x86/kvm/svm/svm.c | 9 +++++++--
2 files changed, 14 insertions(+), 7 deletions(-)

--
2.30.1.766.gb4fecdf3b7-goog


2021-03-09 02:24:52

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
store the generation number in MMIO SPTEs. MMIO SPTEs with bit 11 set,
which occurs when userspace creates 128+ memslots in an address space,
get false positives for is_shadow_present_spte(), which lead to a variety
of fireworks, crashes KVM, and likely hangs the host kernel.

Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs")
Reported-by: Tom Lendacky <[email protected]>
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/spte.h | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index b53036d9ddf3..bca0ba11cccf 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
#undef SHADOW_ACC_TRACK_SAVED_MASK

/*
- * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
+ * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
* the memslots generation and is derived as follows:
*
- * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
- * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
+ * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
+ * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
*
* The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
* the MMIO generation number, as doing so would require stealing a bit from
@@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
*/

#define MMIO_SPTE_GEN_LOW_START 3
-#define MMIO_SPTE_GEN_LOW_END 11
+#define MMIO_SPTE_GEN_LOW_END 10

#define MMIO_SPTE_GEN_HIGH_START 52
#define MMIO_SPTE_GEN_HIGH_END 62
@@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
MMIO_SPTE_GEN_LOW_START)
#define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
MMIO_SPTE_GEN_HIGH_START)
+static_assert(!(SPTE_MMU_PRESENT_MASK &
+ (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));

#define MMIO_SPTE_GEN_LOW_BITS (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)
#define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)

/* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);

#define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
#define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
--
2.30.1.766.gb4fecdf3b7-goog

2021-03-09 02:24:58

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 1/2] KVM: x86: Fixup "Get active PCID only when writing a CR3 value"

From: Sean Christopherson <[email protected]>

Fix SME and PCID, which got horribly mangled on application.

Fixes: a16241ae56fa ("KVM: x86: Get active PCID only when writing a CR3 value")
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7876ddf896b8..271196400495 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3907,15 +3907,20 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
struct vcpu_svm *svm = to_svm(vcpu);
unsigned long cr3;

- cr3 = __sme_set(root_hpa);
if (npt_enabled) {
- svm->vmcb->control.nested_cr3 = root_hpa;
+ svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
vmcb_mark_dirty(svm->vmcb, VMCB_NPT);

/* Loading L2's CR3 is handled by enter_svm_guest_mode. */
if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
return;
cr3 = vcpu->arch.cr3;
+ } else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
+ cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
+ } else {
+ /* PCID in the guest should be impossible with a 32-bit MMU. */
+ WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
+ cr3 = root_hpa;
}

svm->vmcb->save.cr3 = cr3;
--
2.30.1.766.gb4fecdf3b7-goog

2021-03-09 10:13:14

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

On Mon, 2021-03-08 at 18:19 -0800, Sean Christopherson wrote:
> Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
> store the generation number in MMIO SPTEs. MMIO SPTEs with bit 11 set,
> which occurs when userspace creates 128+ memslots in an address space,
> get false positives for is_shadow_present_spte(), which lead to a variety
> of fireworks, crashes KVM, and likely hangs the host kernel.
>
> Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs")
> Reported-by: Tom Lendacky <[email protected]>
> Reported-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/spte.h | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b53036d9ddf3..bca0ba11cccf 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> #undef SHADOW_ACC_TRACK_SAVED_MASK
>
> /*
> - * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
> * the memslots generation and is derived as follows:
> *
> - * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
> - * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
> *
> * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
> * the MMIO generation number, as doing so would require stealing a bit from
> @@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> */
>
> #define MMIO_SPTE_GEN_LOW_START 3
> -#define MMIO_SPTE_GEN_LOW_END 11
> +#define MMIO_SPTE_GEN_LOW_END 10
>
> #define MMIO_SPTE_GEN_HIGH_START 52
> #define MMIO_SPTE_GEN_HIGH_END 62
> @@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> MMIO_SPTE_GEN_LOW_START)
> #define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
> MMIO_SPTE_GEN_HIGH_START)
> +static_assert(!(SPTE_MMU_PRESENT_MASK &
> + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>
> #define MMIO_SPTE_GEN_LOW_BITS (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)
> #define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)
>
> /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>
> #define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
> #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
I bisected this and I reached the same conclusion that bit 11 has to be removed from mmio generation mask.

Reviewed-by: Maxim Levitsky <[email protected]>

I do wonder, why do we need 19 (and now 18 bits) for the mmio generation:

What happens if mmio generation overflows (e.g if userspace keeps on updating the memslots)?
In theory if we have a SPTE with a stale generation, it can became valid, no?

I think that we should in the case of the overflow zap all mmio sptes.
What do you think?

Best regards,
Maxim Levitsky

2021-03-09 13:14:58

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

On 09/03/21 11:09, Maxim Levitsky wrote:
> What happens if mmio generation overflows (e.g if userspace keeps on updating the memslots)?
> In theory if we have a SPTE with a stale generation, it can became valid, no?
>
> I think that we should in the case of the overflow zap all mmio sptes.
> What do you think?

Zapping all MMIO SPTEs is done by updating the generation count. When
it overflows, all SPs are zapped:

/*
* The very rare case: if the MMIO generation number has wrapped,
* zap all shadow pages.
*/
if (unlikely(gen == 0)) {
kvm_debug_ratelimited("kvm: zapping shadow pages for
mmio generation wraparound\n");
kvm_mmu_zap_all_fast(kvm);
}

So giving it more bits make this more rare, at the same time having to
remove one or two bits is not the end of the world.

Paolo

2021-03-09 13:33:03

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

On Tue, 2021-03-09 at 14:12 +0100, Paolo Bonzini wrote:
> On 09/03/21 11:09, Maxim Levitsky wrote:
> > What happens if mmio generation overflows (e.g if userspace keeps on updating the memslots)?
> > In theory if we have a SPTE with a stale generation, it can became valid, no?
> >
> > I think that we should in the case of the overflow zap all mmio sptes.
> > What do you think?
>
> Zapping all MMIO SPTEs is done by updating the generation count. When
> it overflows, all SPs are zapped:
>
> /*
> * The very rare case: if the MMIO generation number has wrapped,
> * zap all shadow pages.
> */
> if (unlikely(gen == 0)) {
> kvm_debug_ratelimited("kvm: zapping shadow pages for
> mmio generation wraparound\n");
> kvm_mmu_zap_all_fast(kvm);
> }
>
> So giving it more bits make this more rare, at the same time having to
> remove one or two bits is not the end of the world.

This is exactly what I expected to happen, I just didn't find that code.
Thanks for explanation, and it shows that I didn't study the mmio spte
code much.

Best regards,
Maxim Levitsky

>
> Paolo
>


2021-03-09 16:41:35

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

On 3/8/21 8:19 PM, Sean Christopherson wrote:
> Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
> store the generation number in MMIO SPTEs. MMIO SPTEs with bit 11 set,
> which occurs when userspace creates 128+ memslots in an address space,
> get false positives for is_shadow_present_spte(), which lead to a variety
> of fireworks, crashes KVM, and likely hangs the host kernel.
>
> Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs")
> Reported-by: Tom Lendacky <[email protected]>

Fixes the issue for me. Thanks, Sean.

Tested-by: Tom Lendacky <[email protected]>

> Reported-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/spte.h | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b53036d9ddf3..bca0ba11cccf 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> #undef SHADOW_ACC_TRACK_SAVED_MASK
>
> /*
> - * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
> * the memslots generation and is derived as follows:
> *
> - * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
> - * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
> *
> * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
> * the MMIO generation number, as doing so would require stealing a bit from
> @@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> */
>
> #define MMIO_SPTE_GEN_LOW_START 3
> -#define MMIO_SPTE_GEN_LOW_END 11
> +#define MMIO_SPTE_GEN_LOW_END 10
>
> #define MMIO_SPTE_GEN_HIGH_START 52
> #define MMIO_SPTE_GEN_HIGH_END 62
> @@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> MMIO_SPTE_GEN_LOW_START)
> #define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
> MMIO_SPTE_GEN_HIGH_START)
> +static_assert(!(SPTE_MMU_PRESENT_MASK &
> + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>
> #define MMIO_SPTE_GEN_LOW_BITS (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)
> #define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)
>
> /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>
> #define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
> #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
>

2021-03-09 17:30:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 1/2] KVM: x86: Fixup "Get active PCID only when writing a CR3 value"

On Mon, Mar 08, 2021, Sean Christopherson wrote:
> From: Sean Christopherson <[email protected]>
>
> Fix SME and PCID, which got horribly mangled on application.

Gah, the SME changes are supposed to be in "KVM: x86/mmu: Mark the PAE roots as
decrypted for shadow paging", which has not yet been merged. Stuffing them
here doesn't make things work, but it does break git blame.

I'll send you a v2 with more appropriate fixup, and the PAE changes on top.

> Fixes: a16241ae56fa ("KVM: x86: Get active PCID only when writing a CR3 value")
> Signed-off-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7876ddf896b8..271196400495 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3907,15 +3907,20 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> struct vcpu_svm *svm = to_svm(vcpu);
> unsigned long cr3;
>
> - cr3 = __sme_set(root_hpa);
> if (npt_enabled) {
> - svm->vmcb->control.nested_cr3 = root_hpa;
> + svm->vmcb->control.nested_cr3 = __sme_set(root_hpa);
> vmcb_mark_dirty(svm->vmcb, VMCB_NPT);
>
> /* Loading L2's CR3 is handled by enter_svm_guest_mode. */
> if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail))
> return;
> cr3 = vcpu->arch.cr3;
> + } else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
> + cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu);
> + } else {
> + /* PCID in the guest should be impossible with a 32-bit MMU. */
> + WARN_ON_ONCE(kvm_get_active_pcid(vcpu));
> + cr3 = root_hpa;
> }
>
> svm->vmcb->save.cr3 = cr3;
> --
> 2.30.1.766.gb4fecdf3b7-goog
>