2021-06-01 14:41:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> Thanks for your suggestion, I'll try to set up early #GP handler to fix
> the problem.

Why? AFAICT, you only need to return early in sme_enable() if CPUID is
not "AuthenticAMD". Just do that please.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2021-06-01 16:16:44

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021, Borislav Petkov wrote:
> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> > Thanks for your suggestion, I'll try to set up early #GP handler to fix
> > the problem.
>
> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
> not "AuthenticAMD". Just do that please.

I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
AMD CPUs either. E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
from 2015[*].

I also don't see the point in checking the vendor string. A malicious hypervisor
can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
non-SEV CPUs. If we go with "trust the hypervisor", then the original patch of
hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.


[*] https://www.amd.com/system/files/TechDocs/48751_16h_bkdg.pdf

2021-06-01 16:38:33

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first



On 6/1/21 11:14 AM, Sean Christopherson wrote:
> On Tue, Jun 01, 2021, Borislav Petkov wrote:
>> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
>>> Thanks for your suggestion, I'll try to set up early #GP handler to fix
>>> the problem.
>>
>> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
>> not "AuthenticAMD". Just do that please.
>
> I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
> AMD CPUs either. E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
> from 2015[*].

That is the reason for checking the maximum supported leaf being at least
0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
valid. The problem is that the Hygon ucode does not support the MSR in
question. I'm not sure what it would take for that to be added to their
ucode and just always return 0.

>
> I also don't see the point in checking the vendor string. A malicious hypervisor
> can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
> for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
> non-SEV CPUs. If we go with "trust the hypervisor", then the original patch of
> hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.

Because a hypervisor can put anything it wants in the CPUID 0x0 /
0x80000000 fields, I don't think we can just check for "AuthenticAMD".

If we want the read of CPUID 0x8000001f done before reading the SEV status
MSR, then the original patch is close, but slightly flawed, e.g. only SME
can be indicated but then MSR_AMD64_SEV can say SEV active.

If we want to introduce support for handling/detecting #GP, this might
become overly complicated because of the very early, identity mapped state
the code is in - especially for backport to stable.

Thanks,
Tom

>
>
> [*] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F48751_16h_bkdg.pdf&data=04%7C01%7Cthomas.lendacky%40amd.com%7C6025fb08694e4e6b74bb08d92518549a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637581608717458896%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=CeJXPNxAOPiXQYYXxDKqSrcVBbiY%2FrkQ%2FmPzbRXbSHQ%3D&reserved=0
>

2021-06-01 17:01:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021 at 11:36:31AM -0500, Tom Lendacky wrote:
> That is the reason for checking the maximum supported leaf being at least
> 0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
> valid. The problem is that the Hygon ucode does not support the MSR in
> question. I'm not sure what it would take for that to be added to their
> ucode and just always return 0.

Yap, that sounds good too.

> Because a hypervisor can put anything it wants in the CPUID 0x0 /
> 0x80000000 fields, I don't think we can just check for "AuthenticAMD".

By that logic you can forget even checking CPUID at all in that case.
The only reliable check you can do is MSR_AMD64_SEV which is guest-only.

> If we want the read of CPUID 0x8000001f done before reading the SEV status
> MSR, then the original patch is close, but slightly flawed, e.g. only SME
> can be indicated but then MSR_AMD64_SEV can say SEV active.
>
> If we want to introduce support for handling/detecting #GP, this might
> become overly complicated because of the very early, identity mapped state
> the code is in - especially for backport to stable.

Yah, ain't gonna happen. I'm not taking some #GP handler to the early
code just because some hardware is operating out of spec.

If some hypervisor running on Hygon hardware is lying and says it is an
AMD which supports the 0x8000001f leaf, then that hypervisor gets to
keep both pieces.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-01 17:10:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021, Tom Lendacky wrote:
>
> On 6/1/21 11:14 AM, Sean Christopherson wrote:
> > On Tue, Jun 01, 2021, Borislav Petkov wrote:
> >> On Mon, May 31, 2021 at 10:56:50PM +0800, Pu Wen wrote:
> >>> Thanks for your suggestion, I'll try to set up early #GP handler to fix
> >>> the problem.
> >>
> >> Why? AFAICT, you only need to return early in sme_enable() if CPUID is
> >> not "AuthenticAMD". Just do that please.
> >
> > I don't think that would suffice, presumably MSR_AMD64_SEV doesn't exist on older
> > AMD CPUs either. E.g. there's no mention of MSR 0xC001_0131 in the dev's guide
> > from 2015[*].
>
> That is the reason for checking the maximum supported leaf being at least
> 0x8000001f. If that leaf is supported, we expect the SEV status MSR to be
> valid. The problem is that the Hygon ucode does not support the MSR in
> question. I'm not sure what it would take for that to be added to their
> ucode and just always return 0.

Ah. But it's also legal/possible for the max extended leaf to be greater than
0x8000001f, e.g. 0x80000020, without 0x8000001f itself being supported. Even
if AMD can guarantee no such processor will exist, I don't think it would be
illegal for a hypervisor to emulate a feature (on an "AuthenticAMD" virtual CPU)
enumerated by a higher leaf on an older physical AMD CPU (or non-AMD CPU!) that
doesn't support MSR_AMD64_SEV.

> > I also don't see the point in checking the vendor string. A malicious hypervisor
> > can lie about CPUID.0x0 just as easily as it can lie about CPUID.0x8000001f, so
> > for SEV the options are to either trust the hypervisor or eat #GPs on RDMSR for
> > non-SEV CPUs. If we go with "trust the hypervisor", then the original patch of
> > hoisting the CPUID.0x8000001f check up is simpler than checking the vendor string.
>
> Because a hypervisor can put anything it wants in the CPUID 0x0 /
> 0x80000000 fields, I don't think we can just check for "AuthenticAMD".
>
> If we want the read of CPUID 0x8000001f done before reading the SEV status
> MSR, then the original patch is close, but slightly flawed, e.g. only SME
> can be indicated but then MSR_AMD64_SEV can say SEV active.

I didn't follow this. Bare metal CPUs should never report only SME in CPUID and
then report SEV as being active in the MSR.

In the SEV case, relying on CPUID in any way, shape, or form requires trusting
the hypervisor. The code could assert that CPUID and the MSR are consistent, but
I don't see any value in doing so as a much more effective attack would be to
report neither SME nor SEV as supported.

2021-06-01 17:18:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021, Borislav Petkov wrote:
> Yah, ain't gonna happen. I'm not taking some #GP handler to the early
> code just because some hardware is operating out of spec.

The bug isn't limited to out-of-spec hardware. At the point of #GP, sme_enable()
has only verified the max leaf is greater than 0x8000001f, it has not verified
that 0x8000001f is actually supported. The APM itself declares several leafs
between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
0x8000001f must be supported if the max leaf is greater than 0x8000001f.

The only way to verify that 0x8000001f is supported is to find a non-zero bit,
which is what Pu Wen's patch does.

2021-06-01 17:50:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021 at 05:16:12PM +0000, Sean Christopherson wrote:
> The bug isn't limited to out-of-spec hardware. At the point of #GP, sme_enable()
> has only verified the max leaf is greater than 0x8000001f, it has not verified
> that 0x8000001f is actually supported. The APM itself declares several leafs
> between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
> 0x8000001f must be supported if the max leaf is greater than 0x8000001f.

If a hypervisor says that 0x8000001f is supported but then we explode
when reading MSR_AMD64_SEV, then hypervisor gets to keep both pieces.

We're not going to workaround all possible insane hardware/hypervisor
configurations just because they dropped the ball.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-06-01 18:09:25

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021, Borislav Petkov wrote:
> On Tue, Jun 01, 2021 at 05:16:12PM +0000, Sean Christopherson wrote:
> > The bug isn't limited to out-of-spec hardware. At the point of #GP, sme_enable()
> > has only verified the max leaf is greater than 0x8000001f, it has not verified
> > that 0x8000001f is actually supported. The APM itself declares several leafs
> > between 0x80000000 and 0x8000001f as reserved/unsupported, so we can't argue that
> > 0x8000001f must be supported if the max leaf is greater than 0x8000001f.
>
> If a hypervisor says that 0x8000001f is supported but then we explode
> when reading MSR_AMD64_SEV, then hypervisor gets to keep both pieces.

But in my scenario, the hypervisor has not said that 0x8000001f is valid, it has
only said that at least one leaf > 0x8000001f is valid.

E.g. if a (virtual) CPU supports CPUID ranges:

0x80000000 - 0x8000000A
0x80000020 - 0x80000021

then the below check will pass as eax will be 0x80000021.

/* Check for the SME/SEV support leaf */
eax = 0x80000000;
ecx = 0;
native_cpuid(&eax, &ebx, &ecx, &edx);
if (eax < 0x8000001f)
return;

But we have not yet verified that 0x8000001f is supported, only that the result
of CPUID.0x8000001f can be trusted (to handle Intel CPUs which return data from
the highest supported leaf if the provided leaf function is greater than the max
supported leaf). Verifying that 0x8000001f is supported doesn't happen until
0x8000001f is actually read, which is currently done after the RDMSR that #GPs
and explodes.

> We're not going to workaround all possible insane hardware/hypervisor
> configurations just because they dropped the ball.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2021-06-01 18:25:21

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/sev: Check whether SEV or SME is supported first

On Tue, Jun 01, 2021 at 06:08:19PM +0000, Sean Christopherson wrote:
> But we have not yet verified that 0x8000001f is supported, only that the result
> of CPUID.0x8000001f can be trusted (to handle Intel CPUs which return data from
> the highest supported leaf if the provided leaf function is greater than the max
> supported leaf). Verifying that 0x8000001f is supported doesn't happen until
> 0x8000001f is actually read, which is currently done after the RDMSR that #GPs
> and explodes.

Yeah yeah, Tom just convinced me on IRC that the patch is ok after
all... so let's do that. And again, we cannot stop hypervisors from
doing shady things here so I don't even wanna try to. People should run
SNP/TDX guests only anyway if they care about this stuff.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette