2022-09-16 05:52:19

by Jim Mattson

[permalink] [raw]
Subject: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

KVM has never properly virtualized EFER.LMSLE. When the "nested"
module parameter is true, it allows an SVM guest to set EFER.LMSLE,
and it passes the bit through in the VMCB, but the KVM emulator
doesn't perform the required data segment limit checks in 64-bit mode.

With Zen3, AMD has dropped support for EFER.LMSLE. Hence, if a Zen3
guest sets EFER.LMSLE, the next VMRUN will fail with "invalid VMCB."

When the host reports X86_FEATURE_NO_LMSLE, treat EFER.LMSLE as a
reserved bit in the guest. Now, if a guest tries to set EFER.LMSLE on
a host without support for EFER.LMSLE, the WRMSR will raise a #GP.

At the moment, the #GP may come as a surprise, but it's an improvement
over the failed VMRUN. The #GP will be vindicated anon.

Fixes: eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm")
Signed-off-by: Jim Mattson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..7c4fd594166c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5012,7 +5012,9 @@ static __init int svm_hardware_setup(void)

if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
- kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
+ kvm_enable_efer_bits(EFER_SVME);
+ if (!boot_cpu_has(X86_FEATURE_NO_LMSLE))
+ kvm_enable_efer_bits(EFER_LMSLE);
}

/*
--
2.37.3.968.ga6b4b080e4-goog


2022-09-16 20:24:47

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Thu, Sep 15, 2022, Jim Mattson wrote:
> KVM has never properly virtualized EFER.LMSLE. When the "nested"
> module parameter is true, it allows an SVM guest to set EFER.LMSLE,
> and it passes the bit through in the VMCB, but the KVM emulator
> doesn't perform the required data segment limit checks in 64-bit mode.
>
> With Zen3, AMD has dropped support for EFER.LMSLE. Hence, if a Zen3
> guest sets EFER.LMSLE, the next VMRUN will fail with "invalid VMCB."
>
> When the host reports X86_FEATURE_NO_LMSLE, treat EFER.LMSLE as a
> reserved bit in the guest. Now, if a guest tries to set EFER.LMSLE on
> a host without support for EFER.LMSLE, the WRMSR will raise a #GP.
>
> At the moment, the #GP may come as a surprise, but it's an improvement
> over the failed VMRUN. The #GP will be vindicated anon.
>
> Fixes: eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm")
> Signed-off-by: Jim Mattson <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f3813dbacb9f..7c4fd594166c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5012,7 +5012,9 @@ static __init int svm_hardware_setup(void)
>
> if (nested) {
> printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
> - kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
> + kvm_enable_efer_bits(EFER_SVME);
> + if (!boot_cpu_has(X86_FEATURE_NO_LMSLE))
> + kvm_enable_efer_bits(EFER_LMSLE);

Since KVM doesn't correctly virtualize EFER.LMSLE, I wonder if we can get away with
dropping support entirely. I.e. delete the reference to EFER_LMSLE and unconditionally
set F(NO_LMSLE) in KVM_GET_SUPPORTED_CPUID.

2022-09-16 21:05:18

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Fri, Sep 16, 2022 at 1:14 PM Sean Christopherson <[email protected]> wrote:
>
> On Thu, Sep 15, 2022, Jim Mattson wrote:
> > KVM has never properly virtualized EFER.LMSLE. When the "nested"
> > module parameter is true, it allows an SVM guest to set EFER.LMSLE,
> > and it passes the bit through in the VMCB, but the KVM emulator
> > doesn't perform the required data segment limit checks in 64-bit mode.
> >
> > With Zen3, AMD has dropped support for EFER.LMSLE. Hence, if a Zen3
> > guest sets EFER.LMSLE, the next VMRUN will fail with "invalid VMCB."
> >
> > When the host reports X86_FEATURE_NO_LMSLE, treat EFER.LMSLE as a
> > reserved bit in the guest. Now, if a guest tries to set EFER.LMSLE on
> > a host without support for EFER.LMSLE, the WRMSR will raise a #GP.
> >
> > At the moment, the #GP may come as a surprise, but it's an improvement
> > over the failed VMRUN. The #GP will be vindicated anon.
> >
> > Fixes: eec4b140c924 ("KVM: SVM: Allow EFER.LMSLE to be set with nested svm")
> > Signed-off-by: Jim Mattson <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f3813dbacb9f..7c4fd594166c 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -5012,7 +5012,9 @@ static __init int svm_hardware_setup(void)
> >
> > if (nested) {
> > printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
> > - kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
> > + kvm_enable_efer_bits(EFER_SVME);
> > + if (!boot_cpu_has(X86_FEATURE_NO_LMSLE))
> > + kvm_enable_efer_bits(EFER_LMSLE);
>
> Since KVM doesn't correctly virtualize EFER.LMSLE, I wonder if we can get away with
> dropping support entirely. I.e. delete the reference to EFER_LMSLE and unconditionally
> set F(NO_LMSLE) in KVM_GET_SUPPORTED_CPUID.

It's possible that SLES11 Xen 4.0 sets the bit, but never actually
uses truncated segments in 64-bit mode. In any case, according to the
original commit, it won't boot if setting EFER.LMSLE is not allowed.

2022-09-16 22:51:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Fri, Sep 16, 2022 at 02:00:26PM -0700, Jim Mattson wrote:
> It's possible that SLES11 Xen 4.0 sets the bit, but never actually
> uses truncated segments in 64-bit mode. In any case, according to the
> original commit, it won't boot if setting EFER.LMSLE is not allowed.

How is SLE11 at all relevant to the upstream kernel?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-09-16 22:54:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Sat, Sep 17, 2022, Borislav Petkov wrote:
> On Fri, Sep 16, 2022 at 02:00:26PM -0700, Jim Mattson wrote:
> > It's possible that SLES11 Xen 4.0 sets the bit, but never actually
> > uses truncated segments in 64-bit mode. In any case, according to the
> > original commit, it won't boot if setting EFER.LMSLE is not allowed.
>
> How is SLE11 at all relevant to the upstream kernel?

Yeah, I'm inclined to revert the original commit and advertise NO_LSMLE unconditionally.
I don't like the idea of knowingly ignoring the fact that KVM doesn't correctly
virtualize LMSLE.

Xen itself already does exactly this:

commit 23ccf530431561268b0190f0f1b740b618771b7b
Author: Andrew Cooper <[email protected]>
Date: Fri Apr 2 14:10:25 2021 +0100

x86/cpuid: Advertise no-lmsl unilaterally to hvm guests

While part of the original AMD64 spec, Long Mode Segment Limit was a feature
not picked up by Intel, and therefore didn't see much adoption in software.
AMD have finally dropped the feature from hardware, and allocated a CPUID bit
to indicate its absence.

Xen has never supported the feature for guests, even when running on capable
hardware, so advertise the feature's absence unilaterally.

There is nothing specifically wrong with exposing this bit to PV guests, but
the PV ABI doesn't include a working concept of MSR_EFER in the first place,
so exposing it to PV guests would be out-of-place.

Signed-off-by: Andrew Cooper <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>

because as noted in commit f2c6084442 ("x86/SVM: correct boot time cpu_data[] handling"),
Xen broke detection of LMSLE support shortly after it was added in 2010, presumably
before any official release. "Support" was added for HVM guests in by commit

727bc17d20 ("svm: support EFER.LMSLE for guests")

and then broken a few weeks later by commit

566ddbe833 ("x86: Fail CPU bringup cleanly if it cannot initialise HVM.")

Note that Xen did a "safe" WRMSR+RDMSR to detect LMSLE, so either someone added
extra out-of-tree code that caused Xen to fail to boot, or "necessary ... to boot
with nested svm" only meant being able to expose SVM to L2.

Either way, KVM appears to be carrying a half-baked "fix" for a buggy guest that's
long since gone. So like we did in commit 8805875aa473 ("Revert "KVM: nVMX: Do not
expose MPX VMX controls when guest MPX disabled""), I think we should just revert
the "fix".

2022-09-18 19:47:43

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Fri, Sep 16, 2022 at 10:33:29PM +0000, Sean Christopherson wrote:
> ...
> Either way, KVM appears to be carrying a half-baked "fix" for a buggy guest that's
> long since gone. So like we did in commit 8805875aa473 ("Revert "KVM: nVMX: Do not
> expose MPX VMX controls when guest MPX disabled""), I think we should just revert
> the "fix".

If, as message 0/5 says, setting this bit so that SLE11 Xen 4.0 boots as
a nested hypervisor is the use case, then sure, unconditional NO_LSMLE
and we all should go on with our lives.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2022-09-19 18:32:49

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: svm: Disallow EFER.LMSLE on hardware that doesn't support it

On Sun, Sep 18, 2022 at 12:04 PM Borislav Petkov <[email protected]> wrote:
>
> On Fri, Sep 16, 2022 at 10:33:29PM +0000, Sean Christopherson wrote:
> > ...
> > Either way, KVM appears to be carrying a half-baked "fix" for a buggy guest that's
> > long since gone. So like we did in commit 8805875aa473 ("Revert "KVM: nVMX: Do not
> > expose MPX VMX controls when guest MPX disabled""), I think we should just revert
> > the "fix".
>
> If, as message 0/5 says, setting this bit so that SLE11 Xen 4.0 boots as
> a nested hypervisor is the use case, then sure, unconditional NO_LSMLE
> and we all should go on with our lives.

Fantastic! That's what I'll do in V2.