2023-11-28 00:01:24

by Dave Hansen

Subject: AMD Memory encryption vs. kexec

... actually cc'd the mailing lists and x86@ exploder on this one.
Please reply here.

---

There are two kexec-related wbinvd's:

One for the kexec boot CPU in relocate_kernel() which is driven by
CC_ATTR_HOST_MEM_ENCRYPT:

> image->start = relocate_kernel((unsigned long)image->head,
>                                (unsigned long)page_list,
>                                image->start,
>                                image->preserve_context,
>                                cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));

the other is for non-boot CPUs in stop_this_cpu():

> if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>         native_wbinvd();

By my reading, the CC_ATTR_HOST_MEM_ENCRYPT is basically a check for
whether the current kernel has enabled SME but not SEV while the
stop_this_cpu() site is driven purely by whether the hardware *supports*
SME.

The whole supposed reason stop_this_cpu() checks CPUID directly is that
the current kernel SME/SEV enabling might not match the _next_ kernel's
enabling choices.

So, why is a _current_ kernel check OK for relocate_kernel(), but not OK
for stop_this_cpu()?

It seems to me like both sites might need to use the
stop_this_cpu()-style "raw" hardware support checks.

Why do I care? TDX potentially needs wbinvd at the same two spots. It
would be nice to have a common cc_attr for both sites, but I need to
reconcile the apparently disparate AMD uses first.
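
For reference, the two checks can be modeled side by side in plain C.
This is only a sketch under assumptions: struct mem_enc_state and both
function names are made up for illustration, and treating
CC_ATTR_HOST_MEM_ENCRYPT as "SME on, SEV off" follows the reading above
rather than the actual cc_platform_has() implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one machine plus its running kernel.
 * This is not a kernel structure. */
struct mem_enc_state {
	uint32_t max_ext_cpuid_leaf; /* stands in for c->extended_cpuid_level */
	uint32_t cpuid_8000001f_eax; /* stands in for cpuid_eax(0x8000001f) */
	bool sme_active;             /* current kernel enabled SME */
	bool sev_active;             /* current kernel runs with SEV */
};

/* stop_this_cpu()-style check: does the hardware *support* SME?
 * Ignores what the current kernel actually enabled. */
bool hw_supports_sme(const struct mem_enc_state *s)
{
	return s->max_ext_cpuid_leaf >= 0x8000001fu &&
	       (s->cpuid_8000001f_eax & 1u);
}

/* relocate_kernel()-style check, modeled (assumption) as
 * "the current kernel has SME enabled but not SEV". */
bool cur_kernel_host_mem_encrypt(const struct mem_enc_state *s)
{
	return s->sme_active && !s->sev_active;
}
```

On SME-capable hardware booted with mem_encrypt=off, hw_supports_sme()
is true while cur_kernel_host_mem_encrypt() is false, which is the case
where the two sites disagree.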


2023-11-28 14:03:23

by Tom Lendacky

Subject: Re: AMD Memory encryption vs. kexec

On 11/27/23 18:00, Dave Hansen wrote:
> ... actually cc'd the mailing lists and x86@ exploder on this one.
> Please reply here.
>
> ---
>
> There are two kexec-related wbinvd's:
>
> One for the kexec boot CPU in relocate_kernel() which is driven by
> CC_ATTR_HOST_MEM_ENCRYPT:
>
>> image->start = relocate_kernel((unsigned long)image->head,
>>                                (unsigned long)page_list,
>>                                image->start,
>>                                image->preserve_context,
>>                                cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
>
> the other is for non-boot CPUs in stop_this_cpu():
>
>> if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
>>         native_wbinvd();
>
> By my reading, the CC_ATTR_HOST_MEM_ENCRYPT is basically a check for
> whether the current kernel has enabled SME but not SEV while the
> stop_this_cpu() site is driven purely by whether the hardware *supports*
> SME.
>
> The whole supposed reason stop_this_cpu() checks CPUID directly is that
> the current kernel SME/SEV enabling might not match the _next_ kernel's
> enabling choices.

Correct.

>
> So, why is a _current_ kernel check OK for relocate_kernel(), but not OK
> for stop_this_cpu()?

The relocate_kernel() check provides an indication of whether SME is
actually active. The kexec kernel is placed in unencrypted memory to match
how the system was booted - where the kernel is loaded into unencrypted
memory and then encrypted in-place if SME is desired (mem_encrypt=on).
Since the kexec kernel will be unencrypted, the cc_platform_has() call is
used to indicate whether to perform a wbinvd to remove encrypted cache
line entries. If SME is not active, then there is no need to flush caches
prior to booting the kexec kernel.

With SEV, the kernel is loaded encrypted from the start and so the kexec
kernel can remain in encrypted memory and no wbinvd is required.

Thanks,
Tom

>
> It seems to me like both sites might need to use the
> stop_this_cpu()-style "raw" hardware support checks.
>
> Why do I care? TDX potentially needs wbinvd at the same two spots. It
> would be nice to have a common cc_attr for both sites, but I need to
> reconcile the apparently disparate AMD uses first.

2023-11-29 20:02:28

by Dave Hansen

Subject: Re: AMD Memory encryption vs. kexec

On 11/28/23 06:03, Tom Lendacky wrote:
...
>> By my reading, the CC_ATTR_HOST_MEM_ENCRYPT is basically a check for
>> whether the current kernel has enabled SME but not SEV while the
>> stop_this_cpu() site is driven purely by whether the hardware *supports*
>> SME.
>>
>> The whole supposed reason stop_this_cpu() checks CPUID directly is that
>> the current kernel SME/SEV enabling might not match the _next_ kernel's
>> enabling choices.
>
> Correct.
>
>> So, why is a _current_ kernel check OK for relocate_kernel(), but not OK
>> for stop_this_cpu()?
>
> The relocate_kernel() check provides an indication of whether SME is
> actually active. The kexec kernel is placed in unencrypted memory to
> match how the system was booted - where the kernel is loaded into
> unencrypted memory and then encrypted in-place if SME is desired
> (mem_encrypt=on). Since the kexec kernel will be unencrypted, the
> cc_platform_has() call is used to indicate whether to perform a wbinvd
> to remove encrypted cache line entries. If SME is not active, then there
> is no need to flush caches prior to booting the kexec kernel.

Ahh, so that wbinvd is truly specific to kexec. It protects the
always-unencrypted kexec area from being zapped by encrypted lines. It
isn't necessary when the old kexec kernel is mem_encrypt=off because the
unencrypted old kernel matches the always unencrypted kexec area.

What I was worried about was the _larger_ case. Not the kexec area, the
*rest* of memory. But I think that's irrelevant because there's yet
*another* wbinvd in __enc_copy() that will flush the rest of memory
when going from mem_encrypt=off=>on.

I'd like to propose a simplification. Let's add a
CC_ATTR_HOST_MEM_INCOHERENT. That bit gets set on all hardware that
needs WBINVDs at kexec. On AMD, it can use the stop_this_cpu() logic.
This will cause an additional wbinvd in the case where a mem_encrypt=off
kernel is kexec'ing.

We can also set it on any TDX-enabled Intel hardware.

That leads to very simple logic at kexec:

Could the old kernel leave incoherent caches
around? If so, do WBINVD.

That logic gets applied to all CPUs, both boot and secondary. It
applies to all the SME-only systems (currently CC_ATTR_HOST_MEM_ENCRYPT)
and also all TDX systems. It would not depend on the current kernel's
SME enabling and it would allow both kexec-related sites to share the
same logic.

I don't really like the idea of yet another CC_ATTR_HOST_MEM_INCOHERENT
bit, but I do think it's better than adding some TDX-specific paths.
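
Roughly, the proposed logic could look like the sketch below. It is
plain C rather than kernel code, and it rests on assumptions: struct
hw_caps, host_mem_incoherent() and kexec_needs_wbinvd() are hypothetical
names, and the real attribute would be wired into cc_platform_has() in
whatever way fits the existing code.

```c
#include <stdbool.h>

/* Hypothetical hardware description; field names are illustrative. */
struct hw_caps {
	bool amd_sme_supported; /* raw CPUID check, as in stop_this_cpu() */
	bool intel_tdx_enabled; /* platform has TDX enabled */
};

/* Sketch of what CC_ATTR_HOST_MEM_INCOHERENT would report:
 * could the old kernel leave incoherent caches around? */
bool host_mem_incoherent(const struct hw_caps *hw)
{
	return hw->amd_sme_supported || hw->intel_tdx_enabled;
}

/* One answer for every CPU at kexec. The result does not depend on
 * whether the CPU is the boot CPU or on the current kernel's
 * mem_encrypt= setting. */
bool kexec_needs_wbinvd(const struct hw_caps *hw, bool is_boot_cpu)
{
	(void)is_boot_cpu; /* deliberately unused: same logic on all CPUs */
	return host_mem_incoherent(hw);
}
```

Note that an SME-capable machine running mem_encrypt=off gets "true"
here, which is the one additional wbinvd mentioned above.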

2023-11-29 20:54:48

by Tom Lendacky

Subject: Re: AMD Memory encryption vs. kexec

On 11/29/23 14:01, Dave Hansen wrote:
> On 11/28/23 06:03, Tom Lendacky wrote:
> ...
>>> By my reading, the CC_ATTR_HOST_MEM_ENCRYPT is basically a check for
>>> whether the current kernel has enabled SME but not SEV while the
>>> stop_this_cpu() site is driven purely by whether the hardware *supports*
>>> SME.
>>>
>>> The whole supposed reason stop_this_cpu() checks CPUID directly is that
>>> the current kernel SME/SEV enabling might not match the _next_ kernel's
>>> enabling choices.
>>
>> Correct.
>>
>>> So, why is a _current_ kernel check OK for relocate_kernel(), but not OK
>>> for stop_this_cpu()?
>>
>> The relocate_kernel() check provides an indication of whether SME is
>> actually active. The kexec kernel is placed in unencrypted memory to
>> match how the system was booted - where the kernel is loaded into
>> unencrypted memory and then encrypted in-place if SME is desired
>> (mem_encrypt=on). Since the kexec kernel will be unencrypted, the
>> cc_platform_has() call is used to indicate whether to perform a wbinvd
>> to remove encrypted cache line entries. If SME is not active, then there
>> is no need to flush caches prior to booting the kexec kernel.
>
> Ahh, so that wbinvd is truly specific to kexec. It protects the
> always-unencrypted kexec area from being zapped by encrypted lines. It
> isn't necessary when the old kexec kernel is mem_encrypt=off because the
> unencrypted old kernel matches the always unencrypted kexec area.
>
> What I was worried about was the _larger_ case. Not the kexec area, the
> *rest* of memory. But I think that's irrelevant because there's yet
> *another* wbinvd in __enc_copy() that will flush the rest of memory
> when going from mem_encrypt=off=>on.

Correct (I was actually sitting here before I got your email wondering if
I should reply to my previous email with just that info).

>
> I'd like to propose a simplification. Let's add a
> CC_ATTR_HOST_MEM_INCOHERENT. That bit gets set on all hardware that
> needs WBINVDs at kexec. On AMD, it can use the stop_this_cpu() logic.
> This will cause an additional wbinvd in the case where a mem_encrypt=off
> kernel is kexec'ing.
>
> We can also set it on any TDX-enabled Intel hardware.
>
> That leads to very simple logic at kexec:
>
> Could the old kernel leave incoherent caches
> around? If so, do WBINVD.
>
> That logic gets applied to all CPUs, both boot and secondary. It
> applies to all the SME-only systems (currently CC_ATTR_HOST_MEM_ENCRYPT)
> and also all TDX systems. It would not depend on the current kernel's
> SME enabling and it would allow both kexec-related sites to share the
> same logic.
>
> I don't really like the idea of yet another CC_ATTR_HOST_MEM_INCOHERENT
> bit, but I do think it's better than adding some TDX-specific paths.

I'm good with that change. I think an additional WBINVD during kexec is
acceptable to make everything less complicated in the code.

Thanks,
Tom