Subject: Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()



On 1/5/2024 10:30 AM, [email protected] wrote:
> From: Michael Kelley <[email protected]>
>
> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> is responsible for ensuring the memory isn't in use and isn't referenced
> while the transition is in progress. The transition has multiple steps,
> and the memory is in an inconsistent state until all steps are complete.
> A reference while the state is inconsistent could result in an exception
> that can't be cleanly fixed up.
>
> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> reference that can't be prevented by the caller of set_memory_encrypted()
> or set_memory_decrypted(), so there's specific code to handle this case.
> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> with the #VC or #VE exception routed to the paravisor. There's no
> architectural way to forward the exceptions back to the guest kernel, and
> in such a case, the load_unaligned_zeropad() specific code doesn't work.
>
> To avoid this problem, mark pages as "not present" while a transition
> is in progress. If load_unaligned_zeropad() causes a stray reference, a
> normal page fault is generated instead of #VC or #VE, and the
> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> reference. When the encrypted/decrypted transition is complete, mark the
> pages as "present" again.

Change looks good to me. But I am wondering why are adding it part of prepare
and finish callbacks instead of directly in set_memory_encrypted() function.

Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>


>
> Signed-off-by: Michael Kelley <[email protected]>
> ---
> arch/x86/hyperv/ivm.c | 49 ++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 46 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 8ba18635e338..5ad39256a5d2 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -15,6 +15,7 @@
> #include <asm/io.h>
> #include <asm/coco.h>
> #include <asm/mem_encrypt.h>
> +#include <asm/set_memory.h>
> #include <asm/mshyperv.h>
> #include <asm/hypervisor.h>
> #include <asm/mtrr.h>
> @@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> return -EFAULT;
> }
>
> +/*
> + * When transitioning memory between encrypted and decrypted, the caller
> + * of set_memory_encrypted() or set_memory_decrypted() is responsible for
> + * ensuring that the memory isn't in use and isn't referenced while the
> + * transition is in progress. The transition has multiple steps, and the
> + * memory is in an inconsistent state until all steps are complete. A
> + * reference while the state is inconsistent could result in an exception
> + * that can't be cleanly fixed up.
> + *
> + * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
> + * stray reference that can't be prevented by the caller, so Linux has
> + * specific code to handle this case. But when the #VC and #VE exceptions
> + * routed to a paravisor, the specific code doesn't work. To avoid this
> + * problem, mark the pages as "not present" while the transition is in
> + * progress. If load_unaligned_zeropad() causes a stray reference, a normal
> + * page fault is generated instead of #VC or #VE, and the page-fault-based
> + * handlers for load_unaligned_zeropad() resolve the reference. When the
> + * transition is complete, hv_vtom_set_host_visibility() marks the pages
> + * as "present" again.
> + */
> +static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
> +{
> + return !set_memory_np(kbuffer, pagecount);
> +}
> +
> /*
> * hv_vtom_set_host_visibility - Set specified memory visible to host.
> *
> @@ -521,7 +547,7 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
>
> pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
> if (!pfn_array)
> - return false;
> + goto err_set_memory_p;
>
> for (i = 0, pfn = 0; i < pagecount; i++) {
> /*
> @@ -545,14 +571,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
> }
> }
>
> - err_free_pfn_array:
> +err_free_pfn_array:
> kfree(pfn_array);
> +
> +err_set_memory_p:
> + /*
> + * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
> + * did. Do this even if there is an error earlier in this function in
> + * order to avoid leaving the memory range in a "broken" state. Setting
> + * the PRESENT bits shouldn't fail, but return an error if it does.
> + */
> + if (set_memory_p(kbuffer, pagecount))
> + result = false;
> +
> return result;
> }
>
> static bool hv_vtom_tlb_flush_required(bool private)
> {
> - return true;
> + /*
> + * Since hv_vtom_clear_present() marks the PTEs as "not present"
> + * and flushes the TLB, they can't be in the TLB. That makes the
> + * flush controlled by this function redundant, so return "false".
> + */
> + return false;
> }
>
> static bool hv_vtom_cache_flush_required(void)
> @@ -615,6 +657,7 @@ void __init hv_vtom_init(void)
> x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
> x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
> x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
> + x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
> x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
>
> /* Set WB as the default cache mode. */

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


2024-01-08 19:13:45

by Michael Kelley

[permalink] [raw]
Subject: RE: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()

From: Kuppuswamy Sathyanarayanan <[email protected]>
Sent: Monday, January 8, 2024 10:37 AM
>
> On 1/5/2024 10:30 AM, [email protected] wrote:
> > From: Michael Kelley <[email protected]>
> >
> > In a CoCo VM, when transitioning memory from encrypted to decrypted, or
> > vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
> > is responsible for ensuring the memory isn't in use and isn't referenced
> > while the transition is in progress. The transition has multiple steps,
> > and the memory is in an inconsistent state until all steps are complete.
> > A reference while the state is inconsistent could result in an exception
> > that can't be cleanly fixed up.
> >
> > However, the kernel load_unaligned_zeropad() mechanism could cause a stray
> > reference that can't be prevented by the caller of set_memory_encrypted()
> > or set_memory_decrypted(), so there's specific code to handle this case.
> > But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
> > with the #VC or #VE exception routed to the paravisor. There's no
> > architectural way to forward the exceptions back to the guest kernel, and
> > in such a case, the load_unaligned_zeropad() specific code doesn't work.
> >
> > To avoid this problem, mark pages as "not present" while a transition
> > is in progress. If load_unaligned_zeropad() causes a stray reference, a
> > normal page fault is generated instead of #VC or #VE, and the
> > page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
> > reference. When the encrypted/decrypted transition is complete, mark the
> > pages as "present" again.
>
> Change looks good to me. But I am wondering why are adding it part of
> prepare and finish callbacks instead of directly in set_memory_encrypted() function.
>

The prepare/finish callbacks are different for TDX, SEV-SNP, and
Hyper-V CoCo guests running with a paravisor -- so there are three sets
of callbacks. As described in the cover letter, I've given up on using this
scheme for the TDX and SEV-SNP cases, because of the difficulty with
the SEV-SNP callbacks needing a valid virtual address (whereas TDX and
Hyper-V paravisor need only a physical address). So it seems like the
callbacks specific to the Hyper-V paravisor are the natural place for the
code. That leaves the TDX and SEV-SNP code paths unchanged, which
was my intent.

Or maybe I'm not understanding your comment? If that's the case,
please elaborate.

Michael

> Reviewed-by: Kuppuswamy Sathyanarayanan
> <[email protected]>
>

Subject: Re: [PATCH v3 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()



On 1/8/2024 11:13 AM, Michael Kelley wrote:
> From: Kuppuswamy Sathyanarayanan <[email protected]>
> Sent: Monday, January 8, 2024 10:37 AM
>>
>> On 1/5/2024 10:30 AM, [email protected] wrote:
>>> From: Michael Kelley <[email protected]>
>>>
>>> In a CoCo VM, when transitioning memory from encrypted to decrypted, or
>>> vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
>>> is responsible for ensuring the memory isn't in use and isn't referenced
>>> while the transition is in progress. The transition has multiple steps,
>>> and the memory is in an inconsistent state until all steps are complete.
>>> A reference while the state is inconsistent could result in an exception
>>> that can't be cleanly fixed up.
>>>
>>> However, the kernel load_unaligned_zeropad() mechanism could cause a stray
>>> reference that can't be prevented by the caller of set_memory_encrypted()
>>> or set_memory_decrypted(), so there's specific code to handle this case.
>>> But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
>>> with the #VC or #VE exception routed to the paravisor. There's no
>>> architectural way to forward the exceptions back to the guest kernel, and
>>> in such a case, the load_unaligned_zeropad() specific code doesn't work.
>>>
>>> To avoid this problem, mark pages as "not present" while a transition
>>> is in progress. If load_unaligned_zeropad() causes a stray reference, a
>>> normal page fault is generated instead of #VC or #VE, and the
>>> page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
>>> reference. When the encrypted/decrypted transition is complete, mark the
>>> pages as "present" again.
>>
>> Change looks good to me. But I am wondering why are adding it part of
>> prepare and finish callbacks instead of directly in set_memory_encrypted() function.
>>
>
> The prepare/finish callbacks are different for TDX, SEV-SNP, and
> Hyper-V CoCo guests running with a paravisor -- so there are three sets
> of callbacks. As described in the cover letter, I've given up on using this
> scheme for the TDX and SEV-SNP cases, because of the difficulty with
> the SEV-SNP callbacks needing a valid virtual address (whereas TDX and
> Hyper-V paravisor need only a physical address). So it seems like the
> callbacks specific to the Hyper-V paravisor are the natural place for the
> code. That leaves the TDX and SEV-SNP code paths unchanged, which
> was my intent.
>

Got it. Thanks for clarifying it.

> Or maybe I'm not understanding your comment? If that's the case,
> please elaborate.
>
> Michael
>
>> Reviewed-by: Kuppuswamy Sathyanarayanan
>> <[email protected]>
>>

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer