2023-12-05 23:50:39

by Michael Roth

[permalink] [raw]
Subject: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled

In general, activating long mode involves setting the EFER_LME bit in
the EFER register and then enabling the X86_CR0_PG bit in the CR0
register. At this point, the EFER_LMA bit will be set automatically by
hardware.

In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
necessary for the host to set EFER_LMA on behalf of the guest since
hardware does not see the actual CR0 write.

In the case of SEV-ES guests where writes to CR0 are trapped instead of
intercepted, the hardware *does* see/record the write to CR0 before
exiting and passing the value on to the host, so as part of enabling
SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to
support intercepts under SEV-ES") dropped special handling of the
EFER_LMA bit with the understanding that it would be set automatically.

However, since the guest never explicitly sets the EFER_LMA bit, the
host never becomes aware that it has been set. This becomes problematic
when userspace tries to get/set the EFER values via
KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
will be missing the EFER_LMA bit, and when userspace attempts to pass
the EFER value back via KVM_SET_SREGS it will fail a sanity check that
asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
are set.

Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
and EFER_LME, regardless of whether or not SEV-ES is enabled.

Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
Suggested-by: Tom Lendacky <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Michael Roth <[email protected]>
---
arch/x86/kvm/svm/svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5d75a1732da4..b31d4f2deb66 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
bool old_paging = is_paging(vcpu);

#ifdef CONFIG_X86_64
- if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
+ if (vcpu->arch.efer & EFER_LME) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
vcpu->arch.efer |= EFER_LMA;
svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
--
2.25.1


2023-12-06 15:28:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled

On Tue, Dec 05, 2023, Michael Roth wrote:
> In general, activating long mode involves setting the EFER_LME bit in
> the EFER register and then enabling the X86_CR0_PG bit in the CR0
> register. At this point, the EFER_LMA bit will be set automatically by
> hardware.
>
> In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
> necessary for the host to set EFER_LMA on behalf of the guest since
> hardware does not see the actual CR0 write.
>
> In the case of SEV-ES guests where writes to CR0 are trapped instead of
> intercepted, the hardware *does* see/record the write to CR0 before
> exiting and passing the value on to the host, so as part of enabling
> SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to
> support intercepts under SEV-ES") dropped special handling of the
> EFER_LMA bit with the understanding that it would be set automatically.
>
> However, since the guest never explicitly sets the EFER_LMA bit, the
> host never becomes aware that it has been set. This becomes problematic
> when userspace tries to get/set the EFER values via
> KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
> will be missing the EFER_LMA bit, and when userspace attempts to pass
> the EFER value back via KVM_SET_SREGS it will fail a sanity check that
> asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
> are set.
>
> Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
> and EFER_LME, regardless of whether or not SEV-ES is enabled.

Blech. This is a hack to fix even worse hacks. KVM ignores CR0/CR4/EFER values
that are set via KVM_SET_SREGS, i.e. KVM is rejecting an EFER value that it will
never consume, which is ridiculous. And the fact that you're not trying to have
KVM actually set state further strengthens my assertion that tracking CR0/CR4/EFER
in KVM is pointless necessary for SEV-ES+ guests[1].

This also fixes the _source_, which makes it far less useful, e.g. live migrating
to fixed version of KVM won't work.

So my very strong preference is to first skip the kvm_is_valid_sregs() check, and
then in a follow-up series disable the CR0/CR4/EFER traps and probably use maximal
(according to the guest's capabilities) "defaults" for CR0/CR4/EFER (and I suppose
XCR0 and XSS too), i.e. assume the vCPU is in 64-bit mode with everything enabled.

My understanding is that SVM_VMGEXIT_AP_CREATION is going to force KVM to assume
maximal state anyways since KVM will have no way of verifying what state is actually
shoved into the VMSA, i.e. emulating INIT is wildly broken[2].

Side topic, Peter suspected that KVM _does_ need to let userspace set CR8 since
that's not captured in the VMSA[3].

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/all/[email protected]
[3] https://lore.kernel.org/all/CAMkAt6oL9tfF5rvP0htbQNDPr50Zk41Q4KP-dM0N+SJ7xmsWvw@mail.gmail.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..6fb2b913009e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11620,7 +11620,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
int idx;
struct desc_ptr dt;

- if (!kvm_is_valid_sregs(vcpu, sregs))
+ if (!vcpu->arch.guest_state_protected &&
+ !kvm_is_valid_sregs(vcpu, sregs))
return -EINVAL;

apic_base_msr.data = sregs->apic_base;

2023-12-06 15:33:34

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled


On 12/5/23 17:49, Michael Roth wrote:
> In general, activating long mode involves setting the EFER_LME bit in
> the EFER register and then enabling the X86_CR0_PG bit in the CR0
> register. At this point, the EFER_LMA bit will be set automatically by
> hardware.
>
> In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
> necessary for the host to set EFER_LMA on behalf of the guest since
> hardware does not see the actual CR0 write.
>
> In the case of SEV-ES guests where writes to CR0 are trapped instead of
> intercepted, the hardware *does* see/record the write to CR0 before
> exiting and passing the value on to the host, so as part of enabling
> SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to
> support intercepts under SEV-ES") dropped special handling of the
> EFER_LMA bit with the understanding that it would be set automatically.

Maybe add here that it was thought that the change to EFER would generate
a trap. However, a trap isn't generated for hardware changes to EFER and
only explicit writes to EFER generate the trap.

Then the "However" can be dropped from the start of the next sentence.

>
> However, since the guest never explicitly sets the EFER_LMA bit, the
> host never becomes aware that it has been set. This becomes problematic
> when userspace tries to get/set the EFER values via
> KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
> will be missing the EFER_LMA bit, and when userspace attempts to pass
> the EFER value back via KVM_SET_SREGS it will fail a sanity check that
> asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
> are set.
>
> Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
> and EFER_LME, regardless of whether or not SEV-ES is enabled.
>
> Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
> Suggested-by: Tom Lendacky <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Michael Roth <[email protected]>

Acked-by: Tom Lendacky <[email protected]>

> ---
> arch/x86/kvm/svm/svm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5d75a1732da4..b31d4f2deb66 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> bool old_paging = is_paging(vcpu);
>
> #ifdef CONFIG_X86_64
> - if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
> + if (vcpu->arch.efer & EFER_LME) {
> if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
> vcpu->arch.efer |= EFER_LMA;
> svm->vmcb->save.efer |= EFER_LMA | EFER_LME;

2023-12-06 16:21:50

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled

On Tue, 2023-12-05 at 17:49 -0600, Michael Roth wrote:
> In general, activating long mode involves setting the EFER_LME bit in
> the EFER register and then enabling the X86_CR0_PG bit in the CR0
> register. At this point, the EFER_LMA bit will be set automatically by
> hardware.
>
> In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
> necessary for the host to set EFER_LMA on behalf of the guest since
> hardware does not see the actual CR0 write.

Could you explain in which case the writes to CR0 will be still intercepted?
It's for CPUs that only support SEV-ES and nothing beyond it?


>
> In the case of SEV-ES guests where writes to CR0 are trapped instead of
> intercepted, the hardware *does* see/record the write to CR0 before
> exiting and passing the value on to the host, so as part of enabling
> SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to
> support intercepts under SEV-ES") dropped special handling of the
> EFER_LMA bit with the understanding that it would be set automatically.
>
> However, since the guest never explicitly sets the EFER_LMA bit, the
> host never becomes aware that it has been set. This becomes problematic
> when userspace tries to get/set the EFER values via
> KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
> will be missing the EFER_LMA bit, and when userspace attempts to pass
> the EFER value back via KVM_SET_SREGS it will fail a sanity check that
> asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
> are set.
>
> Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
> and EFER_LME, regardless of whether or not SEV-ES is enabled.
>
> Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
> Suggested-by: Tom Lendacky <[email protected]>
> Cc: Sean Christopherson <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Michael Roth <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5d75a1732da4..b31d4f2deb66 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1869,7 +1869,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
> bool old_paging = is_paging(vcpu);
>
> #ifdef CONFIG_X86_64
> - if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
> + if (vcpu->arch.efer & EFER_LME) {
> if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
> vcpu->arch.efer |= EFER_LMA;
> svm->vmcb->save.efer |= EFER_LMA | EFER_LME;

Purely from the point of view of not confusing future readers of this code:
Due to encrypted guest state, if I understand correctly, the 'svm->vmcb->save.efer'
is only given for the hypervisor to see but not to modify.

While the modification of save.efer is a nop, can we still guard it with
'vcpu->arch.guest_state_protected'?

Besides these nitpicks:

Reviewed-by: Maxim Levitsky <[email protected]>


Best regards,
Maxim Levitsky

2023-12-08 18:39:46

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled

On Wed, Dec 6, 2023 at 4:28 PM Sean Christopherson <[email protected]> wrote:
> Blech. This is a hack to fix even worse hacks. KVM ignores CR0/CR4/EFER values
> that are set via KVM_SET_SREGS, i.e. KVM is rejecting an EFER value that it will
> never consume, which is ridiculous. And the fact that you're not trying to have
> KVM actually set state further strengthens my assertion that tracking CR0/CR4/EFER
> in KVM is pointless necessary for SEV-ES+ guests[1].

I agree that KVM is not going to consume CR0/CR4/EFER. I disagree that
it's a good idea to have a value of vcpu->arch.efer that is
architecturally impossible (so much so that it would fail vmentry in a
non-SEV-ES guest).

I also agree that changing the source is not particularly useful, but
then changing the destination can be easily done in userspace.

In other words, bugfix or not this can and should be merged as a code
cleanup (though your older "[PATCH 1/2] KVM: SVM: Update EFER software
model on CR0 trap for SEV-ES" is nicer in that it clarifies that
svm->vmcb->save.efer is not used, and that's what I would like to
apply).

> So my very strong preference is to first skip the kvm_is_valid_sregs() check

No, please don't. If you want to add a quirk that, when disabled,
causes all guest state get/set ioctls to fail, go ahead. But invalid
processor state remains invalid, and should be rejected, even when KVM
won't consume it.

> My understanding is that SVM_VMGEXIT_AP_CREATION is going to force KVM to assume
> maximal state anyways since KVM will have no way of verifying what state is actually
> shoved into the VMSA, i.e. emulating INIT is wildly broken[2].

Yes, or alternatively a way to pass CR0/CR4/EFER from the guest should
be included in the VMGEXIT spec.

> Side topic, Peter suspected that KVM _does_ need to let userspace set CR8 since
> that's not captured in the VMSA[3].

Makes sense, and then we would have to apply the 2/2 patch from 2021
as well. But for now I'll leave that aside.

Paolo

> [1] https://lore.kernel.org/all/[email protected]
> [2] https://lore.kernel.org/all/[email protected]
> [3] https://lore.kernel.org/all/CAMkAt6oL9tfF5rvP0htbQNDPr50Zk41Q4KP-dM0N+SJ7xmsWvw@mail.gmail.com
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2c924075f6f1..6fb2b913009e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11620,7 +11620,8 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs,
> int idx;
> struct desc_ptr dt;
>
> - if (!kvm_is_valid_sregs(vcpu, sregs))
> + if (!vcpu->arch.guest_state_protected &&
> + !kvm_is_valid_sregs(vcpu, sregs))
> return -EINVAL;
>
> apic_base_msr.data = sregs->apic_base;
>

2023-12-08 20:36:30

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: SEV: Fix handling of EFER_LMA bit when SEV-ES is enabled

On Fri, Dec 08, 2023, Paolo Bonzini wrote:
> On Wed, Dec 6, 2023 at 4:28 PM Sean Christopherson <[email protected]> wrote:
> > So my very strong preference is to first skip the kvm_is_valid_sregs() check
>
> No, please don't. If you want to add a quirk that, when disabled,
> causes all guest state get/set ioctls to fail, go ahead. But invalid
> processor state remains invalid, and should be rejected, even when KVM
> won't consume it.

Ugh, true, KVM should still reject garbage.