2024-01-16 02:20:38

by Michael Kelley

Subject: [PATCH v4 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

From: Michael Kelley <[email protected]>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress. The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

This version of the patch series marks transitioning pages "not present"
only when running as a Hyper-V guest with a paravisor. Previous
versions[1] marked transitioning pages "not present" regardless of the
hypervisor and regardless of whether a paravisor is in use. That more
general use had the benefit of decoupling the load_unaligned_zeropad()
fixup from CoCo VM #VE and #VC exception handling. But the implementation
was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks
require a valid virtual address, rather than a physical address as with
TDX and the Hyper-V paravisor. Marking the transitioning pages "not
present"
causes the virtual address to not be valid, and the PVALIDATE
instruction in the SEV-SNP callback fails. Constructing a temporary
virtual address for this purpose is slower and adds complexity that
negates the benefits of the more general use. So this version narrows
the applicability of the approach to just where it is required
because of the #VC and #VE exceptions being routed to a paravisor.

The previous version minimized the TLB flushing done during page
transitions between encrypted and decrypted. Because this version
marks the pages "not present" in hypervisor-specific callbacks and
not in __set_memory_enc_pgtable(), coordinating such an optimization
is more difficult. But the page transitions are not a hot path, so
this version eschews TLB flushing optimizations in favor of
simplicity.

Since this version no longer touches __set_memory_enc_pgtable(),
I've also removed patches that add comments about error handling
in that function. Rick Edgecombe has proposed patches to improve
that error handling, and I'll leave those comments to Rick's
patches.

Patch 1 handles implications of the hypervisor callbacks needing
to do virt-to-phys translations on pages that are temporarily
marked not present.

Patch 2 makes the existing set_memory_p() function available for
use in the hypervisor callbacks.

Patch 3 is the core change that marks the transitioning pages
as not present.

This patch set is based on the linux-next20240103 code tree.

Changes in v4:
* Patch 1: Updated comment in slow_virt_to_phys() to reduce the
likelihood of the comment becoming stale. The new comment
describes the requirement to work with leaf PTE not present,
but doesn't directly reference the CoCo hypervisor callbacks.
[Rick Edgecombe]
* Patch 1: Decomposed a complex line-wrapped statement into
multiple statements for ease of understanding. No functional
change compared with v3. [Kirill Shutemov]
* Patch 3: Fixed handling of memory allocation errors. [Rick
Edgecombe]

Changes in v3:
* Major rework and simplification per discussion above.

Changes in v2:
* Added Patches 3 and 4 to deal with the failure on SEV-SNP
[Tom Lendacky]
* Split the main change into two separate patches (Patch 5 and
Patch 6) to improve reviewability and to offer the option of
retaining both hypervisor callbacks.
* Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
so that the code builds correctly for 32-bit, even though it
is never executed for 32-bit [reported by kernel test robot]

[1] https://lore.kernel.org/lkml/[email protected]/

Michael Kelley (3):
x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
callback
x86/mm: Regularize set_memory_p() parameters and make non-static
x86/hyperv: Make encrypted/decrypted changes safe for
load_unaligned_zeropad()

arch/x86/hyperv/ivm.c | 65 ++++++++++++++++++++++++++++---
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 24 +++++++-----
3 files changed, 75 insertions(+), 15 deletions(-)

--
2.25.1



2024-01-16 02:21:03

by Michael Kelley

Subject: [PATCH v4 3/3] x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()

From: Michael Kelley <[email protected]>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress. The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
---
arch/x86/hyperv/ivm.c | 53 +++++++++++++++++++++++++++++++++++++++----
1 file changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 851107c77f4d..95036feb95e7 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,7 @@
#include <asm/io.h>
#include <asm/coco.h>
#include <asm/mem_encrypt.h>
+#include <asm/set_memory.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
#include <asm/mtrr.h>
@@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
return -EFAULT;
}

+/*
+ * When transitioning memory between encrypted and decrypted, the caller
+ * of set_memory_encrypted() or set_memory_decrypted() is responsible for
+ * ensuring that the memory isn't in use and isn't referenced while the
+ * transition is in progress. The transition has multiple steps, and the
+ * memory is in an inconsistent state until all steps are complete. A
+ * reference while the state is inconsistent could result in an exception
+ * that can't be cleanly fixed up.
+ *
+ * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
+ * stray reference that can't be prevented by the caller, so Linux has
+ * specific code to handle this case. But when the #VC and #VE exceptions
+ * are routed to a paravisor, the specific code doesn't work. To avoid this
+ * problem, mark the pages as "not present" while the transition is in
+ * progress. If load_unaligned_zeropad() causes a stray reference, a normal
+ * page fault is generated instead of #VC or #VE, and the page-fault-based
+ * handlers for load_unaligned_zeropad() resolve the reference. When the
+ * transition is complete, hv_vtom_set_host_visibility() marks the pages
+ * as "present" again.
+ */
+static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+{
+ return !set_memory_np(kbuffer, pagecount);
+}
+
/*
* hv_vtom_set_host_visibility - Set specified memory visible to host.
*
@@ -522,8 +548,10 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
int i, pfn;

pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
- if (!pfn_array)
- return false;
+ if (!pfn_array) {
+ result = false;
+ goto err_set_memory_p;
+ }

for (i = 0, pfn = 0; i < pagecount; i++) {
/*
@@ -548,14 +576,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
}
}

- err_free_pfn_array:
+err_free_pfn_array:
kfree(pfn_array);
+
+err_set_memory_p:
+ /*
+ * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
+ * did. Do this even if there is an error earlier in this function in
+ * order to avoid leaving the memory range in a "broken" state. Setting
+ * the PRESENT bits shouldn't fail, but return an error if it does.
+ */
+ if (set_memory_p(kbuffer, pagecount))
+ result = false;
+
return result;
}

static bool hv_vtom_tlb_flush_required(bool private)
{
- return true;
+ /*
+ * Since hv_vtom_clear_present() marks the PTEs as "not present"
+ * and flushes the TLB, they can't be in the TLB. That makes the
+ * flush controlled by this function redundant, so return "false".
+ */
+ return false;
}

static bool hv_vtom_cache_flush_required(void)
@@ -618,6 +662,7 @@ void __init hv_vtom_init(void)
x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+ x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;

/* Set WB as the default cache mode. */
--
2.25.1


2024-01-16 02:21:10

by Michael Kelley

Subject: [PATCH v4 2/3] x86/mm: Regularize set_memory_p() parameters and make non-static

From: Michael Kelley <[email protected]>

set_memory_p() is currently static. It has parameters that don't
match set_memory_p() under arch/powerpc and that aren't congruent
with the other set_memory_* functions. There's no good reason for
the difference.

Fix this by making the parameters consistent, and update the one
existing call site. Make the function non-static and add it to
include/asm/set_memory.h so that it is completely parallel to
set_memory_np() and is usable in other modules.

No functional change.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 12 ++++++------
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a5e89641bd2d..9aee31862b4a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -47,6 +47,7 @@ int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wc(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index e76ac64b516e..164d32029424 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2045,17 +2045,12 @@ int set_mce_nospec(unsigned long pfn)
return rc;
}

-static int set_memory_p(unsigned long *addr, int numpages)
-{
- return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
{
unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);

- return set_memory_p(&addr, 1);
+ return set_memory_p(addr, 1);
}
EXPORT_SYMBOL_GPL(clear_mce_nospec);
#endif /* CONFIG_X86_64 */
@@ -2108,6 +2103,11 @@ int set_memory_np_noalias(unsigned long addr, int numpages)
CPA_NO_CHECK_ALIAS, NULL);
}

+int set_memory_p(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
int set_memory_4k(unsigned long addr, int numpages)
{
return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
--
2.25.1


2024-01-16 02:21:20

by Michael Kelley

Subject: [PATCH v4 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

From: Michael Kelley <[email protected]>

In preparation for temporarily marking pages not present during a
transition between encrypted and decrypted, use slow_virt_to_phys()
in the hypervisor callback. As long as the PFN is correct,
slow_virt_to_phys() works even if the leaf PTE is not present.
The existing functions that depend on vmalloc_to_page() all
require that the leaf PTE be marked present, so they don't work.

Update the comments for slow_virt_to_phys() to note this broader usage
and the requirement to work even if the PTE is not marked present.

Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/hyperv/ivm.c | 12 +++++++++++-
arch/x86/mm/pat/set_memory.c | 12 ++++++++----
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 02e55237d919..851107c77f4d 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -515,6 +515,8 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
enum hv_mem_host_visibility visibility = enc ?
VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
u64 *pfn_array;
+ phys_addr_t paddr;
+ void *vaddr;
int ret = 0;
bool result = true;
int i, pfn;
@@ -524,7 +526,15 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
return false;

for (i = 0, pfn = 0; i < pagecount; i++) {
- pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+ /*
+ * Use slow_virt_to_phys() because the PRESENT bit has been
+ * temporarily cleared in the PTEs. slow_virt_to_phys() works
+ * without the PRESENT bit while virt_to_hvpfn() or similar
+ * does not.
+ */
+ vaddr = (void *)kbuffer + (i * HV_HYP_PAGE_SIZE);
+ paddr = slow_virt_to_phys(vaddr);
+ pfn_array[pfn] = paddr >> HV_HYP_PAGE_SHIFT;
pfn++;

if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f129835e..e76ac64b516e 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -755,10 +755,14 @@ pmd_t *lookup_pmd_address(unsigned long address)
* areas on 32-bit NUMA systems. The percpu areas can
* end up in this kind of memory, for instance.
*
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * Note that as long as the PTEs are well-formed with correct PFNs, this
+ * works without checking the PRESENT bit in the leaf PTE. This is unlike
+ * the similar vmalloc_to_page() and derivatives. Callers may depend on
+ * this behavior.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
*/
phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
--
2.25.1


2024-01-16 17:02:49

by Rick Edgecombe

Subject: Re: [PATCH v4 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Mon, 2024-01-15 at 18:20 -0800, [email protected] wrote:

Reviewed-by: Rick Edgecombe <[email protected]>

2024-01-16 17:16:18

by Rick Edgecombe

Subject: Re: [PATCH v4 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

On Mon, 2024-01-15 at 18:20 -0800, [email protected] wrote:
>   x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
>     callback
>   x86/mm: Regularize set_memory_p() parameters and make non-static
>   x86/hyperv: Make encrypted/decrypted changes safe for
>     load_unaligned_zeropad()

I'm not clear on the Hyper-V specifics, but everything else looked good
to me. Thanks.

2024-01-17 18:26:02

by Kirill A. Shutemov

Subject: Re: [PATCH v4 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Mon, Jan 15, 2024 at 06:20:06PM -0800, [email protected] wrote:

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-02-09 16:01:41

by Michael Kelley

Subject: RE: [PATCH v4 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

From: [email protected] <[email protected]> Sent: Monday, January 15, 2024 6:20 PM

Wei --

Can this series go through the Hyper-V tree? It's mostly Hyper-V specific
code, plus some comments and a minor tweak to a utility function in 'mm'.

All comments on earlier versions have been addressed, and it would be
good to get some mileage in linux-next before the 6.9 merge window.

Michael



2024-03-01 09:26:35

by Wei Liu

Subject: Re: [PATCH v4 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

On Mon, Jan 15, 2024 at 06:20:05PM -0800, [email protected] wrote:
> From: Michael Kelley <[email protected]>
>
> Michael Kelley (3):
> x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
> callback
> x86/mm: Regularize set_memory_p() parameters and make non-static
> x86/hyperv: Make encrypted/decrypted changes safe for
> load_unaligned_zeropad()

Applied. Thanks.

The changes to mm code are mostly cosmetic. The changes had been pending
for a while without any objections from the maintainers, so I picked
them up as well.

If this becomes problematic, please let me know.
