LinuxLists.cc - [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

2024-01-05 18:31:04

Subject: [PATCH v3 0/3] x86/hyperv: Mark CoCo VM pages not present when changing encrypted state

From: Michael Kelley <[email protected]>

In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress. The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

This version of the patch series marks transitioning pages "not present"
only when running as a Hyper-V guest with a paravisor. Previous
versions[1] marked transitioning pages "not present" regardless of the
hypervisor and regardless of whether a paravisor is in use. That more
general use had the benefit of decoupling the load_unaligned_zeropad()
fixup from CoCo VM #VE and #VC exception handling. But the implementation
was problematic for SEV-SNP because the SEV-SNP hypervisor callbacks
require a valid virtual address, not a physical address like with TDX and
the Hyper-V paravisor. Marking the transitioning pages "not present"
causes the virtual address to not be valid, and the PVALIDATE
instruction in the SEV-SNP callback fails. Constructing a temporary
virtual address for this purpose is slower and adds complexity that
negates the benefits of the more general use. So this version narrows
the applicability of the approach to just where it is required
because of the #VC and #VE exceptions being routed to a paravisor.

The previous version minimized the TLB flushing done during page
transitions between encrypted and decrypted. Because this version
marks the pages "not present" in hypervisor specific callbacks and
not in __set_memory_enc_pgtable(), doing such optimization is more
difficult to coordinate. But the page transitions are not a hot path,
so this version eschews optimization of TLB flushing in favor of
simplicity.

Since this version no longer touches __set_memory_enc_pgtable(),
I've also removed patches that add comments about error handling
in that function. Rick Edgecombe has proposed patches to improve
that error handling, and I'll leave those comments to Rick's
patches.

Patch 1 handles implications of the hypervisor callbacks needing
to do virt-to-phys translations on pages that are temporarily
marked not present.

Patch 2 makes the existing set_memory_p() function available for
use in the hypervisor callbacks.

Patch 3 is the core change that marks the transitioning pages
as not present.

This patch set is based on the linux-next20240103 code tree.

Changes in v3:
* Major rework and simplification per discussion above.

Changes in v2:
* Added Patches 3 and 4 to deal with the failure on SEV-SNP
[Tom Lendacky]
* Split the main change into two separate patches (Patch 5 and
Patch 6) to improve reviewability and to offer the option of
retaining both hypervisor callbacks.
* Patch 5 moves set_memory_p() out of an #ifdef CONFIG_X86_64
so that the code builds correctly for 32-bit, even though it
is never executed for 32-bit [reported by kernel test robot]

[1] https://lore.kernel.org/lkml/[email protected]/

Michael Kelley (3):
x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor
callback
x86/mm: Regularize set_memory_p() parameters and make non-static
x86/hyperv: Make encrypted/decrypted changes safe for
load_unaligned_zeropad()

arch/x86/hyperv/ivm.c | 58 ++++++++++++++++++++++++++++---
arch/x86/include/asm/set_memory.h | 1 +
arch/x86/mm/pat/set_memory.c | 25 +++++++------
3 files changed, 70 insertions(+), 14 deletions(-)

--
2.25.1

2024-01-05 18:31:14

by Michael Kelley

[permalink] [raw]

Subject: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

From: Michael Kelley <[email protected]>

In preparation for temporarily marking pages not present during a
transition between encrypted and decrypted, use slow_virt_to_phys()
in the hypervisor callback. As long as the PFN is correct,
slow_virt_to_phys() works even if the leaf PTE is not present.
The existing functions that depend on vmalloc_to_page() all
require that the leaf PTE be marked present, so they don't work.

Update the comments for slow_virt_to_phys() to note this broader usage
and the requirement to work even if the PTE is not marked present.

Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/hyperv/ivm.c | 9 ++++++++-
arch/x86/mm/pat/set_memory.c | 13 +++++++++----
2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 02e55237d919..8ba18635e338 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
return false;

for (i = 0, pfn = 0; i < pagecount; i++) {
- pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+ /*
+ * Use slow_virt_to_phys() because the PRESENT bit has been
+ * temporarily cleared in the PTEs. slow_virt_to_phys() works
+ * without the PRESENT bit while virt_to_hvpfn() or similar
+ * does not.
+ */
+ pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
+ i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;
pfn++;

if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f129835e..8e19796e7ce5 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -755,10 +755,15 @@ pmd_t *lookup_pmd_address(unsigned long address)
* areas on 32-bit NUMA systems. The percpu areas can
* end up in this kind of memory, for instance.
*
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * It is also used in callbacks for CoCo VM page transitions between private
+ * and shared because it works when the PRESENT bit is not set in the leaf
+ * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
+ * known to be valid, so the returned physical address is correct. The similar
+ * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
*/
phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
--
2.25.1

2024-01-08 13:08:31

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Fri, Jan 05, 2024 at 10:30:23AM -0800, [email protected] wrote:
> From: Michael Kelley <[email protected]>
>
> In preparation for temporarily marking pages not present during a
> transition between encrypted and decrypted, use slow_virt_to_phys()
> in the hypervisor callback. As long as the PFN is correct,
> slow_virt_to_phys() works even if the leaf PTE is not present.
> The existing functions that depend on vmalloc_to_page() all
> require that the leaf PTE be marked present, so they don't work.
>
> Update the comments for slow_virt_to_phys() to note this broader usage
> and the requirement to work even if the PTE is not marked present.
>
> Signed-off-by: Michael Kelley <[email protected]>
> ---
> arch/x86/hyperv/ivm.c | 9 ++++++++-
> arch/x86/mm/pat/set_memory.c | 13 +++++++++----
> 2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 02e55237d919..8ba18635e338 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
> return false;
>
> for (i = 0, pfn = 0; i < pagecount; i++) {
> - pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
> + /*
> + * Use slow_virt_to_phys() because the PRESENT bit has been
> + * temporarily cleared in the PTEs. slow_virt_to_phys() works
> + * without the PRESENT bit while virt_to_hvpfn() or similar
> + * does not.
> + */
> + pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
> + i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;

I think you can make it much more readable by introducing few variables:

virt = (void *)kbuffer + i * HV_HYPPAGE_SIZE;
phys = slow_virt_to_phys(virt);
pfn_array[pfn] = phys >> HV_HYP_PAGE_SHIFT;

> pfn++;
>
> if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index bda9f129835e..8e19796e7ce5 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -755,10 +755,15 @@ pmd_t *lookup_pmd_address(unsigned long address)
> * areas on 32-bit NUMA systems. The percpu areas can
> * end up in this kind of memory, for instance.
> *
> - * This could be optimized, but it is only intended to be
> - * used at initialization time, and keeping it
> - * unoptimized should increase the testing coverage for
> - * the more obscure platforms.
> + * It is also used in callbacks for CoCo VM page transitions between private
> + * and shared because it works when the PRESENT bit is not set in the leaf
> + * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
> + * known to be valid, so the returned physical address is correct. The similar
> + * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
> + *
> + * This could be optimized, but it is only used in paths that are not perf
> + * sensitive, and keeping it unoptimized should increase the testing coverage
> + * for the more obscure platforms.
> */
> phys_addr_t slow_virt_to_phys(void *__virt_addr)
> {
> --
> 2.25.1
>

--
Kiryl Shutsemau / Kirill A. Shutemov

2024-01-08 16:00:53

by Michael Kelley

[permalink] [raw]

Subject: RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

From: [email protected] <[email protected]> Sent: Monday, January 8, 2024 5:08 AM
>
> On Fri, Jan 05, 2024 at 10:30:23AM -0800, [email protected] wrote:
> > From: Michael Kelley <[email protected]>
> >
> > In preparation for temporarily marking pages not present during a
> > transition between encrypted and decrypted, use slow_virt_to_phys()
> > in the hypervisor callback. As long as the PFN is correct,
> > slow_virt_to_phys() works even if the leaf PTE is not present.
> > The existing functions that depend on vmalloc_to_page() all
> > require that the leaf PTE be marked present, so they don't work.
> >
> > Update the comments for slow_virt_to_phys() to note this broader usage
> > and the requirement to work even if the PTE is not marked present.
> >
> > Signed-off-by: Michael Kelley <[email protected]>
> > ---
> > arch/x86/hyperv/ivm.c | 9 ++++++++-
> > arch/x86/mm/pat/set_memory.c | 13 +++++++++----
> > 2 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> > index 02e55237d919..8ba18635e338 100644
> > --- a/arch/x86/hyperv/ivm.c
> > +++ b/arch/x86/hyperv/ivm.c
> > @@ -524,7 +524,14 @@ static bool hv_vtom_set_host_visibility(unsigned
> long kbuffer, int pagecount, bo
> > return false;
> >
> > for (i = 0, pfn = 0; i < pagecount; i++) {
> > - pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
> > + /*
> > + * Use slow_virt_to_phys() because the PRESENT bit has been
> > + * temporarily cleared in the PTEs. slow_virt_to_phys() works
> > + * without the PRESENT bit while virt_to_hvpfn() or similar
> > + * does not.
> > + */
> > + pfn_array[pfn] = slow_virt_to_phys((void *)kbuffer +
> > + i * HV_HYP_PAGE_SIZE) >> HV_HYP_PAGE_SHIFT;
>
> I think you can make it much more readable by introducing few variables:
>
> virt = (void *)kbuffer + i * HV_HYPPAGE_SIZE;
> phys = slow_virt_to_phys(virt);
> pfn_array[pfn] = phys >> HV_HYP_PAGE_SHIFT;
>

Agreed. I'll do this in the next version.

Michael

2024-01-12 01:20:22

by Edgecombe, Rick P

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Fri, 2024-01-05 at 10:30 -0800, [email protected] wrote:
> + * It is also used in callbacks for CoCo VM page transitions between
> private
> + * and shared because it works when the PRESENT bit is not set in
> the leaf
> + * PTE. In such cases, the state of the PTEs, including the PFN, is
> otherwise
> + * known to be valid, so the returned physical address is correct.
> The similar
> + * function vmalloc_to_pfn() can't be used because it requires the
> PRESENT bit.

I'm not sure about this comment. It is mostly about callers far away
and other functions in vmalloc. Probably a decent chance to get stale.
It also kind of begs the question of why vmalloc_to_pfn() requires the
present bit in the leaf.

It seems the first part of the comment is about why this is needed when
__pa() exists. One reason given is that __pa() doesn't work with
vmalloc memory. Then the next bit talks about another similar function
that works with vmalloc memory.

So the comment is a risk to get stale, and leaves me a little confused
why this function exists.

I think the reason is because vmalloc_to_pfn() *only* works with
vmalloc memory and this is needed to work on other alias mappings.

2024-01-12 15:08:07

by Michael Kelley

[permalink] [raw]

Subject: RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

From: Edgecombe, Rick P <[email protected]> Sent: Thursday, January 11, 2024 5:20 PM
>
> On Fri, 2024-01-05 at 10:30 -0800, [email protected] wrote:
> > + * It is also used in callbacks for CoCo VM page transitions between private
> > + * and shared because it works when the PRESENT bit is not set in the leaf
> > + * PTE. In such cases, the state of the PTEs, including the PFN, is otherwise
> > + * known to be valid, so the returned physical address is correct. The similar
> > + * function vmalloc_to_pfn() can't be used because it requires the PRESENT bit.
>
> I'm not sure about this comment. It is mostly about callers far away
> and other functions in vmalloc. Probably a decent chance to get stale.
> It also kind of begs the question of why vmalloc_to_pfn() requires the
> present bit in the leaf.

The comment is Kirill Shutemov's suggestion based on comments in
an earlier version of the patch series. See [1]. The intent is to prevent
some later revision to slow_virt_to_phys() from adding a check for the
present bit and breaking the CoCo VM hypervisor callback. Yes, the
comment could get stale, but I'm not sure how else to call out the
implicit dependency. The idea of creating a private version of
slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
is also discussed in the thread, but that seems worse overall.

As for why vmalloc_to_page() checks the present bit, I don't know.
But vmalloc_to_page() is very widely used, so trying to change it
doesn't seem viable.

>
> It seems the first part of the comment is about why this is needed when
> __pa() exists. One reason given is that __pa() doesn't work with
> vmalloc memory. Then the next bit talks about another similar function
> that works with vmalloc memory.
>
> So the comment is a risk to get stale, and leaves me a little confused
> why this function exists.
>
> I think the reason is because vmalloc_to_pfn() *only* works with
> vmalloc memory and this is needed to work on other alias mappings.

Presumably so. The first paragraph of the existing comment also
calls out "alloc_remap() areas on 32-bit NUMA systems" as
needing slow_virt_to_phys().

Michael

[1] https://lore.kernel.org/lkml/[email protected]/

2024-01-12 17:17:51

by Edgecombe, Rick P

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> The comment is Kirill Shutemov's suggestion based on comments in
> an earlier version of the patch series. See [1]. The intent is to
> prevent
> some later revision to slow_virt_to_phys() from adding a check for
> the
> present bit and breaking the CoCo VM hypervisor callback. Yes, the
> comment could get stale, but I'm not sure how else to call out the
> implicit dependency. The idea of creating a private version of
> slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> is also discussed in the thread, but that seems worse overall.

Well, it's not a huge deal, but I would have just put a comment at the
caller like:

/*
* Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
* returns the PFN even for NP PTEs.
*/

If someone is changing slow_virt_to_phys() they should check the
callers to make sure they won't break anything. They can see the
comment then and have an easy time.

An optional comment at slow_virt_to_phys() could explain how the
function works in regards to the present bit, but not include details
about CoCoO VM page transition's usage of the present bit. The proposed
comment text looks like something more appropriate for a commit log.

2024-01-12 19:24:54

by Michael Kelley

[permalink] [raw]

Subject: RE: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

From: Edgecombe, Rick P <[email protected]> Sent: Friday, January 12, 2024 9:17 AM
>
> On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> > The comment is Kirill Shutemov's suggestion based on comments in
> > an earlier version of the patch series. See [1]. The intent is to
> > prevent
> > some later revision to slow_virt_to_phys() from adding a check for
> > the
> > present bit and breaking the CoCo VM hypervisor callback. Yes, the
> > comment could get stale, but I'm not sure how else to call out the
> > implicit dependency. The idea of creating a private version of
> > slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> > is also discussed in the thread, but that seems worse overall.
>
> Well, it's not a huge deal, but I would have just put a comment at the
> caller like:
>
> /*
> * Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
> * returns the PFN even for NP PTEs.
> */

Yes, that comment is added in this patch.

>
> If someone is changing slow_virt_to_phys() they should check the
> callers to make sure they won't break anything. They can see the
> comment then and have an easy time.
>
> An optional comment at slow_virt_to_phys() could explain how the
> function works in regards to the present bit, but not include details
> about CoCoO VM page transition's usage of the present bit. The proposed
> comment text looks like something more appropriate for a commit log.

Kirill -- you originally asked for a comment in slow_virt_to_phys(). [1]
Are you OK with Rick's proposal?

Michael

[1] https://lore.kernel.org/lkml/[email protected]/

2024-01-15 10:00:30

by Kirill A. Shutemov

[permalink] [raw]

Subject: Re: [PATCH v3 1/3] x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback

On Fri, Jan 12, 2024 at 07:24:35PM +0000, Michael Kelley wrote:
> From: Edgecombe, Rick P <[email protected]> Sent: Friday, January 12, 2024 9:17 AM
> >
> > On Fri, 2024-01-12 at 15:07 +0000, Michael Kelley wrote:
> > > The comment is Kirill Shutemov's suggestion based on comments in
> > > an earlier version of the patch series.? See [1].?? The intent is to
> > > prevent
> > > some later revision to slow_virt_to_phys() from adding a check for
> > > the
> > > present bit and breaking the CoCo VM hypervisor callback.? Yes, the
> > > comment could get stale, but I'm not sure how else to call out the
> > > implicit dependency.? The idea of creating a private version of
> > > slow_virt_to_phys() for use only in the CoCo VM hypervisor callback
> > > is also discussed in the thread, but that seems worse overall.
> >
> > Well, it's not a huge deal, but I would have just put a comment at the
> > caller like:
> >
> > /*
> > * Use slow_virt_to_phys() instead of vmalloc_to_page(), because it
> > * returns the PFN even for NP PTEs.
> > */
>
> Yes, that comment is added in this patch.
>
> >
> > If someone is changing slow_virt_to_phys() they should check the
> > callers to make sure they won't break anything. They can see the
> > comment then and have an easy time.
> >
> > An optional comment at slow_virt_to_phys() could explain how the
> > function works in regards to the present bit, but not include details
> > about CoCoO VM page transition's usage of the present bit. The proposed
> > comment text looks like something more appropriate for a commit log.
>
> Kirill -- you originally asked for a comment in slow_virt_to_phys(). [1]
> Are you OK with Rick's proposal?

Yes, sounds sensible.

--
Kiryl Shutsemau / Kirill A. Shutemov