2023-01-12 22:02:59

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 00/14] Add PCI pass-thru support to Hyper-V Confidential VMs

This patch series adds support for PCI pass-thru devices to Hyper-V
Confidential VMs (also called "Isolation VMs"). But in preparation, it
first changes how private (encrypted) vs. shared (decrypted) memory is
handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
confidential computing (coco) mechanisms introduced in the 5.19 kernel
for TDX support and significantly reduces the amount of Hyper-V specific
code. Furthermore, with this new approach a proposed RFC patch set for
generic DMA layer functionality[1] is no longer necessary.

Background
==========
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves. vTOM is the dividing line where the uppermost bit of the
physical address space is set; e.g., with 47 bits of guest physical
address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical
memory is accessible at two parallel physical addresses -- one below
vTOM and one above vTOM. Accesses below vTOM are private (encrypted)
while accesses above vTOM are shared (decrypted). In this sense, vTOM
is like the GPA.SHARED bit in Intel TDX.

In Hyper-V's use of vTOM, the normal guest OS runs at VMPL2, while
a Hyper-V provided "paravisor" runs at VMPL0 in the guest VM. (VMPL is
Virtual Machine Privilege Level. See AMD's SEV-SNP spec for more
details.) The paravisor provides emulation for various system devices
like the IO-APIC and virtual TPM as part of the guest VM. Accesses to
such devices made by the normal guest OS trap to the paravisor and are
emulated in the guest VM context instead of in the Hyper-V host. This
emulation is invisible to the normal guest OS, but with the quirk that
memory mapped I/O accesses to these devices must be treated as private,
not shared as would be the case for other device accesses.

Support for Hyper-V guests using vTOM was added to the Linux kernel
in two patch sets[2][3]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, the core
approach is to create a second kernel virtual mapping that maps to
parallel physical addresses above vTOM, while leaving the original
mapping unchanged. Most of the code for creating that second virtual
mapping is confined to Hyper-V specific areas, but there are are also
changes to generic swiotlb code.

Changes in this patch set
=========================
In preparation for supporting PCI pass-thru devices, this patch set
changes the core approach for handling vTOM. In the new approach,
the vTOM bit is treated as a protection flag, and not as part of
the physical address. This new approach is like the approach for
the GPA.SHARED bit in Intel TDX. Furthermore, there's no need to
create a second kernel virtual mapping. When memory is changed
between private and shared using set_memory_decrypted() and
set_memory_encrypted(), the PTEs for the existing kernel mapping
are changed to add or remove the vTOM bit just as with TDX. The
hypercalls to change the memory status on the host side are made
using the existing callback mechanism. Everything just works, with
tweaks to map the IO-APIC and vTPM to use private accesses as
mentioned above.

With the new handling of vTOM in place, existing Hyper-V code that
creates the second kernel virtual mapping still works, but it is now
redundant as the original kernel virtual mapping (as updated) maps
to the same physical address. To simplify things going forward, this
patch set removes the code that creates the second kernel virtual
mapping. And since a second kernel virtual mapping is no longer
needed, changes to the DMA layer proposed as an RFC[1] are no
longer needed.

Finally, to support PCI pass-thru in a Confidential VM, Hyper-V
requires that all accesses to PCI config space be emulated using
a hypercall. This patch set adds functions to invoke those
hypercalls and uses them in the config space access functions
in the Hyper-V PCI driver. Lastly, the Hyper-V PCI driver is
marked as allowed to be used in a Confidential VM. The Hyper-V
PCI driver has been hardened against a malicious Hyper-V in a
previous patch set.[4]

Patch Organization
==================
Patches 1 thru 6 are prepatory patches that account for
slightly different assumptions when running in a Hyper-V VM
with vTOM, fix some minor bugs, and make temporary tweaks
to avoid needing a single large patch to make the transition
from the old approach to the new approach.

Patch 7 enables the new approach to handling vTOM for Hyper-V
guest VMs. This is the core patch after which the new approach
is in effect.

Patches 8 thru 11 remove existing code for creating the second
kernel virtual mapping that is no longer necessary with the
new approach.

Patch 12 updates existing code so that it no longer assumes that
the vTOM bit is part of the physical address.

Patches 13 and 14 add new hypercalls for accessing MMIO space
and use those hypercalls for PCI config space. They also enable
the Hyper-V vPCI driver to be used in a Confidential VM.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/

---

Changes in v5:
* Add new Patch 6 and update hv_vtom_init() in Patch 7 so that
the virtual TPM in the guest is mapped as encrypted

* Update commit messages for Patches 1 thru 4, and 13 [Boris Petkov]

* Remove the Fixes tag in Patch 4 after discussion upstream
[Boris Petkov, Tom Lendacky, others]

* Use PAGE_KERNEL instead of PAGE_KERNEL_NOENC in arch independent
Hyper-V code. PAGE_KERNEL_NOENC doesn't exist for ARM64, so it
causes compile errors. Using PAGE_KERNEL means or'ing in
sme_me_mask when on x86, but it will be zero for vTOM VMs.

* In Patch 7, break out amd_cc_platform_has() handling of vTOM
into a separate helper function [Boris Petkov]

Changes in v4:
* Remove previous Patch 1 from this series and submit separately
[Dave Hansen & Boris Petkov]

* Patch 1: Change the name of the new CC_ATTR that controls
whether the IO-APIC is mapped decrypted [Boris Petkov]

* Patch 4: Use sme_me_mask directly instead of calling the
getter function. Add Fixes: tag. [Tom Lendacky]

* Patch 6: Remove CC_VENDOR_HYPERV and merge associated
vTOM functionality under CC_VENDOR_AMD. [Boris Petkov]

* Patch 8: Use bitwise OR to pick up the vTOM bit in
shared_gpa_boundary rather than adding it

Changes in v3:
* Patch 1: Tweak the code fix to cleanly separate the page
alignment and physical address masking [Dave Hansen]

* Patch 2: Change the name of the new CC_ATTR that controls
whether the IO-APIC is mapped decrypted [Dave Hansen]

* Patch 5 (now patch 7): Add CC_ATTR_MEM_ENCRYPT to what
Hyper-V vTOM reports as 'true'. With the addition, Patches
5 and 6 are new to accomodate working correctly with Hyper-V
VMs using vTOM. [Tom Lendacky]

Changes in v2:
* Patch 11: Include more detail in the error message if an MMIO
hypercall fails. [Bjorn Helgaas]

* Patch 12: Restore removed memory barriers. It seems like these
barriers should not be needed because of the spin_unlock() calls,
but commit bdd74440d9e8 indicates that they are. This patch series
will leave the barriers unchanged; whether they are really needed
can be sorted out separately. [Boqun Feng]

Michael Kelley (14):
x86/ioapic: Gate decrypted mapping on cc_platform_has() attribute
x86/hyperv: Reorder code to facilitate future work
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/ioremap: Support hypervisor specified range to map as encrypted
x86/hyperv: Change vTOM handling to use standard coco mechanisms
swiotlb: Remove bounce buffer remapping for Hyper-V
Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
Drivers: hv: vmbus: Remove second way of mapping ring buffers
hv_netvsc: Remove second mapping of send and recv buffers
Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
PCI: hv: Add hypercalls to read/write MMIO space
PCI: hv: Enable PCI pass-thru devices in Confidential VMs

arch/x86/coco/core.c | 43 +++++--
arch/x86/hyperv/hv_init.c | 18 +--
arch/x86/hyperv/ivm.c | 134 +++++++++++----------
arch/x86/include/asm/coco.h | 1 -
arch/x86/include/asm/hyperv-tlfs.h | 3 +
arch/x86/include/asm/io.h | 2 +
arch/x86/include/asm/mshyperv.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/apic/io_apic.c | 3 +-
arch/x86/kernel/cpu/mshyperv.c | 22 ++--
arch/x86/mm/ioremap.c | 27 ++++-
arch/x86/mm/mem_encrypt_amd.c | 10 +-
arch/x86/mm/pat/set_memory.c | 3 -
drivers/hv/Kconfig | 1 -
drivers/hv/channel_mgmt.c | 2 +-
drivers/hv/connection.c | 113 +++++-------------
drivers/hv/hv.c | 23 ++--
drivers/hv/hv_common.c | 11 --
drivers/hv/hyperv_vmbus.h | 2 -
drivers/hv/ring_buffer.c | 62 ++++------
drivers/hv/vmbus_drv.c | 1 -
drivers/net/hyperv/hyperv_net.h | 2 -
drivers/net/hyperv/netvsc.c | 48 +-------
drivers/pci/controller/pci-hyperv.c | 232 ++++++++++++++++++++++++++----------
include/asm-generic/hyperv-tlfs.h | 22 ++++
include/asm-generic/mshyperv.h | 4 +-
include/linux/cc_platform.h | 12 ++
include/linux/swiotlb.h | 2 -
init/main.c | 19 +--
kernel/dma/swiotlb.c | 45 +------
30 files changed, 440 insertions(+), 436 deletions(-)

--
1.8.3.1


2023-01-12 22:05:55

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 12/14] Drivers: hv: Don't remap addresses that are above shared_gpa_boundary

With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.

While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.

While here, fix typos in two error messages.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 7 +++++--
drivers/hv/hv.c | 23 +++++++++++++----------
2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index edbc67e..a5f9474 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -63,7 +63,10 @@ static int hyperv_init_ghcb(void)
* memory boundary and map it here.
*/
rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
- ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ ghcb_gpa &= ~ms_hyperv.shared_gpa_boundary;
+ ghcb_va = (void *)ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
if (!ghcb_va)
return -ENOMEM;

@@ -217,7 +220,7 @@ static int hv_cpu_die(unsigned int cpu)
if (hv_ghcb_pg) {
ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
if (*ghcb_va)
- memunmap(*ghcb_va);
+ iounmap(*ghcb_va);
*ghcb_va = NULL;
}

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 4d6480d..410e6c4 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -217,11 +217,13 @@ void hv_synic_enable_regs(unsigned int cpu)
simp.simp_enabled = 1;

if (hv_isolation_type_snp()) {
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
+ ~ms_hyperv.shared_gpa_boundary;
hv_cpu->synic_message_page
- = memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
- HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+ = (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
if (!hv_cpu->synic_message_page)
- pr_err("Fail to map syinc message page.\n");
+ pr_err("Fail to map synic message page.\n");
} else {
simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> HV_HYP_PAGE_SHIFT;
@@ -234,12 +236,13 @@ void hv_synic_enable_regs(unsigned int cpu)
siefp.siefp_enabled = 1;

if (hv_isolation_type_snp()) {
- hv_cpu->synic_event_page =
- memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
- HV_HYP_PAGE_SIZE, MEMREMAP_WB);
-
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
+ ~ms_hyperv.shared_gpa_boundary;
+ hv_cpu->synic_event_page
+ = (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
if (!hv_cpu->synic_event_page)
- pr_err("Fail to map syinc event page.\n");
+ pr_err("Fail to map synic event page.\n");
} else {
siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> HV_HYP_PAGE_SHIFT;
@@ -316,7 +319,7 @@ void hv_synic_disable_regs(unsigned int cpu)
*/
simp.simp_enabled = 0;
if (hv_isolation_type_snp())
- memunmap(hv_cpu->synic_message_page);
+ iounmap(hv_cpu->synic_message_page);
else
simp.base_simp_gpa = 0;

@@ -326,7 +329,7 @@ void hv_synic_disable_regs(unsigned int cpu)
siefp.siefp_enabled = 0;

if (hv_isolation_type_snp())
- memunmap(hv_cpu->synic_event_page);
+ iounmap(hv_cpu->synic_event_page);
else
siefp.base_siefp_gpa = 0;

--
1.8.3.1

2023-01-12 22:05:58

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 03/14] Drivers: hv: Explicitly request decrypted in vmap_pfn() calls

Update vmap_pfn() calls to explicitly request that the mapping
be for decrypted access to the memory. There's no change in
functionality since the PFNs passed to vmap_pfn() are above the
shared_gpa_boundary, implicitly producing a decrypted mapping.
But explicitly requesting "decrypted" allows the code to work
before and after changes that cause vmap_pfn() to mask the
PFNs to being below the shared_gpa_boundary.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 2 +-
drivers/hv/ring_buffer.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index f33c67e..5648efb 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -343,7 +343,7 @@ void *hv_map_memory(void *addr, unsigned long size)
pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);

- vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+ vaddr = vmap_pfn(pfns, size / PAGE_SIZE, pgprot_decrypted(PAGE_KERNEL));
kfree(pfns);

return vaddr;
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index c6692fd..2111e97 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -211,7 +211,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,

ring_info->ring_buffer = (struct hv_ring_buffer *)
vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
- PAGE_KERNEL);
+ pgprot_decrypted(PAGE_KERNEL));
kfree(pfns_wraparound);

if (!ring_info->ring_buffer)
--
1.8.3.1

2023-01-12 22:23:56

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 09/14] Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages

With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().

As such, remove the code to create and manage the second
mapping for VMBus monitor pages. Because set_memory_decrypted()
and set_memory_encrypted() are no-ops in normal VMs, it's
not even necessary to test for being in a Confidential VM
(a.k.a., "Isolation VM").

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
drivers/hv/connection.c | 113 ++++++++++++----------------------------------
drivers/hv/hyperv_vmbus.h | 2 -
2 files changed, 28 insertions(+), 87 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5..f670cfd 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -104,8 +104,14 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
}

- msg->monitor_page1 = vmbus_connection.monitor_pages_pa[0];
- msg->monitor_page2 = vmbus_connection.monitor_pages_pa[1];
+ /*
+ * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
+ * bitwise OR it
+ */
+ msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]) |
+ ms_hyperv.shared_gpa_boundary;
+ msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]) |
+ ms_hyperv.shared_gpa_boundary;

msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);

@@ -219,72 +225,27 @@ int vmbus_connect(void)
* Setup the monitor notification facility. The 1st page for
* parent->child and the 2nd page for child->parent
*/
- vmbus_connection.monitor_pages[0] = (void *)hv_alloc_hyperv_zeroed_page();
- vmbus_connection.monitor_pages[1] = (void *)hv_alloc_hyperv_zeroed_page();
+ vmbus_connection.monitor_pages[0] = (void *)hv_alloc_hyperv_page();
+ vmbus_connection.monitor_pages[1] = (void *)hv_alloc_hyperv_page();
if ((vmbus_connection.monitor_pages[0] == NULL) ||
(vmbus_connection.monitor_pages[1] == NULL)) {
ret = -ENOMEM;
goto cleanup;
}

- vmbus_connection.monitor_pages_original[0]
- = vmbus_connection.monitor_pages[0];
- vmbus_connection.monitor_pages_original[1]
- = vmbus_connection.monitor_pages[1];
- vmbus_connection.monitor_pages_pa[0]
- = virt_to_phys(vmbus_connection.monitor_pages[0]);
- vmbus_connection.monitor_pages_pa[1]
- = virt_to_phys(vmbus_connection.monitor_pages[1]);
-
- if (hv_is_isolation_supported()) {
- ret = set_memory_decrypted((unsigned long)
- vmbus_connection.monitor_pages[0],
- 1);
- ret |= set_memory_decrypted((unsigned long)
- vmbus_connection.monitor_pages[1],
- 1);
- if (ret)
- goto cleanup;
-
- /*
- * Isolation VM with AMD SNP needs to access monitor page via
- * address space above shared gpa boundary.
- */
- if (hv_isolation_type_snp()) {
- vmbus_connection.monitor_pages_pa[0] +=
- ms_hyperv.shared_gpa_boundary;
- vmbus_connection.monitor_pages_pa[1] +=
- ms_hyperv.shared_gpa_boundary;
-
- vmbus_connection.monitor_pages[0]
- = memremap(vmbus_connection.monitor_pages_pa[0],
- HV_HYP_PAGE_SIZE,
- MEMREMAP_WB);
- if (!vmbus_connection.monitor_pages[0]) {
- ret = -ENOMEM;
- goto cleanup;
- }
-
- vmbus_connection.monitor_pages[1]
- = memremap(vmbus_connection.monitor_pages_pa[1],
- HV_HYP_PAGE_SIZE,
- MEMREMAP_WB);
- if (!vmbus_connection.monitor_pages[1]) {
- ret = -ENOMEM;
- goto cleanup;
- }
- }
-
- /*
- * Set memory host visibility hvcall smears memory
- * and so zero monitor pages here.
- */
- memset(vmbus_connection.monitor_pages[0], 0x00,
- HV_HYP_PAGE_SIZE);
- memset(vmbus_connection.monitor_pages[1], 0x00,
- HV_HYP_PAGE_SIZE);
+ ret = set_memory_decrypted((unsigned long)
+ vmbus_connection.monitor_pages[0], 1);
+ ret |= set_memory_decrypted((unsigned long)
+ vmbus_connection.monitor_pages[1], 1);
+ if (ret)
+ goto cleanup;

- }
+ /*
+ * Set_memory_decrypted() will change the memory contents if
+ * decryption occurs, so zero monitor pages here.
+ */
+ memset(vmbus_connection.monitor_pages[0], 0x00, HV_HYP_PAGE_SIZE);
+ memset(vmbus_connection.monitor_pages[1], 0x00, HV_HYP_PAGE_SIZE);

msginfo = kzalloc(sizeof(*msginfo) +
sizeof(struct vmbus_channel_initiate_contact),
@@ -376,31 +337,13 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}

- if (hv_is_isolation_supported()) {
- /*
- * memunmap() checks input address is ioremap address or not
- * inside. It doesn't unmap any thing in the non-SNP CVM and
- * so not check CVM type here.
- */
- memunmap(vmbus_connection.monitor_pages[0]);
- memunmap(vmbus_connection.monitor_pages[1]);
-
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[0],
- 1);
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[1],
- 1);
- }
+ set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[0], 1);
+ set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[1], 1);

- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[0]);
- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[1]);
- vmbus_connection.monitor_pages_original[0] =
- vmbus_connection.monitor_pages[0] = NULL;
- vmbus_connection.monitor_pages_original[1] =
- vmbus_connection.monitor_pages[1] = NULL;
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
+ vmbus_connection.monitor_pages[0] = NULL;
+ vmbus_connection.monitor_pages[1] = NULL;
}

/*
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index dc673ed..167ac51 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -241,8 +241,6 @@ struct vmbus_connection {
* is child->parent notification
*/
struct hv_monitor_page *monitor_pages[2];
- void *monitor_pages_original[2];
- phys_addr_t monitor_pages_pa[2];
struct list_head chn_msg_list;
spinlock_t channelmsg_lock;

--
1.8.3.1

2023-01-12 22:23:58

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 07/14] x86/hyperv: Change vTOM handling to use standard coco mechanisms

Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves. vTOM is the dividing line where the uppermost bit of the
physical address space is set; e.g., with 47 bits of guest physical
address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical
memory is accessible at two parallel physical addresses -- one below
vTOM and one above vTOM. Accesses below vTOM are private (encrypted)
while accesses above vTOM are shared (decrypted). In this sense, vTOM
is like the GPA.SHARED bit in Intel TDX.

Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.

A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms. When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit
in the guest physical address, just as with TDX. The hypercalls to
change the memory status on the host side are made using the existing
callback mechanism. Everything just works, with a minor tweak to map
the IO-APIC to use private accesses.

To accomplish the switch in approach, the following must be done:

* Update Hyper-V initialization to set the cc_mask based on vTOM
and do other coco initialization.

* Update physical_mask so the vTOM bit is no longer treated as part
of the physical address

* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
the vTOM bit as a protection flag.

* Code already exists to make hypercalls to inform Hyper-V about pages
changing between shared and private. Update this code to run as a
callback from __set_memory_enc_pgtable().

* Remove the Hyper-V special case from __set_memory_enc_dec()

* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
since mem_encrypt_init() will now do it.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/coco/core.c | 43 ++++++++++++++++++++++-------
arch/x86/hyperv/hv_init.c | 11 --------
arch/x86/hyperv/ivm.c | 58 ++++++++++++++++++++++++++++++++--------
arch/x86/include/asm/coco.h | 1 -
arch/x86/include/asm/mshyperv.h | 8 ++----
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/mshyperv.c | 15 +++++------
arch/x86/mm/pat/set_memory.c | 3 ---
drivers/hv/vmbus_drv.c | 1 -
include/asm-generic/mshyperv.h | 2 ++
10 files changed, 92 insertions(+), 51 deletions(-)

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 49b44f8..0670961 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -29,6 +29,19 @@ static bool intel_cc_platform_has(enum cc_attr attr)
}
}

+/* Helper function for AMD SEV-SNP vTOM case */
+static __maybe_unused bool amd_cc_platform_vtom(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_GUEST_MEM_ENCRYPT:
+ case CC_ATTR_MEM_ENCRYPT:
+ case CC_ATTR_ACCESS_IOAPIC_ENCRYPTED:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* SME and SEV are very similar but they are not the same, so there are
* times that the kernel will need to distinguish between SME and SEV. The
@@ -41,9 +54,20 @@ static bool intel_cc_platform_has(enum cc_attr attr)
* up under SME the trampoline area cannot be encrypted, whereas under SEV
* the trampoline area must be encrypted.
*/
+
static bool amd_cc_platform_has(enum cc_attr attr)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+ /*
+ * Handle the SEV-SNP vTOM case where sme_me_mask is zero, and
+ * the other levels of SME/SEV functionality, including C-bit
+ * based SEV-SNP, are not enabled.
+ */
+ if (sev_status & MSR_AMD64_SNP_VTOM_ENABLED)
+ return amd_cc_platform_vtom(attr);
+
+ /* Handle the C-bit case */
switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
return sme_me_mask;
@@ -76,11 +100,6 @@ static bool amd_cc_platform_has(enum cc_attr attr)
#endif
}

-static bool hyperv_cc_platform_has(enum cc_attr attr)
-{
- return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
-}
-
bool cc_platform_has(enum cc_attr attr)
{
switch (vendor) {
@@ -88,8 +107,6 @@ bool cc_platform_has(enum cc_attr attr)
return amd_cc_platform_has(attr);
case CC_VENDOR_INTEL:
return intel_cc_platform_has(attr);
- case CC_VENDOR_HYPERV:
- return hyperv_cc_platform_has(attr);
default:
return false;
}
@@ -103,11 +120,14 @@ u64 cc_mkenc(u64 val)
* encryption status of the page.
*
* - for AMD, bit *set* means the page is encrypted
- * - for Intel *clear* means encrypted.
+ * - for AMD with vTOM and for Intel, *clear* means encrypted
*/
switch (vendor) {
case CC_VENDOR_AMD:
- return val | cc_mask;
+ if (sev_status & MSR_AMD64_SNP_VTOM_ENABLED)
+ return val & ~cc_mask;
+ else
+ return val | cc_mask;
case CC_VENDOR_INTEL:
return val & ~cc_mask;
default:
@@ -120,7 +140,10 @@ u64 cc_mkdec(u64 val)
/* See comment in cc_mkenc() */
switch (vendor) {
case CC_VENDOR_AMD:
- return val & ~cc_mask;
+ if (sev_status & MSR_AMD64_SNP_VTOM_ENABLED)
+ return val | cc_mask;
+ else
+ return val & ~cc_mask;
case CC_VENDOR_INTEL:
return val | cc_mask;
default:
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 41ef036..edbc67e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -29,7 +29,6 @@
#include <linux/syscore_ops.h>
#include <clocksource/hyperv_timer.h>
#include <linux/highmem.h>
-#include <linux/swiotlb.h>

int hyperv_init_cpuhp;
u64 hv_current_partition_id = ~0ull;
@@ -504,16 +503,6 @@ void __init hyperv_init(void)
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);

-#ifdef CONFIG_SWIOTLB
- /*
- * Swiotlb bounce buffer needs to be mapped in extra address
- * space. Map function doesn't work in the early place and so
- * call swiotlb_update_mem_attributes() here.
- */
- if (hv_is_isolation_supported())
- swiotlb_update_mem_attributes();
-#endif
-
return;

clean_guest_os_id:
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 5648efb..43bc193 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -13,6 +13,8 @@
#include <asm/svm.h>
#include <asm/sev.h>
#include <asm/io.h>
+#include <asm/coco.h>
+#include <asm/mem_encrypt.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>

@@ -233,7 +235,6 @@ void hv_ghcb_msr_read(u64 msr, u64 *value)
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
-#endif

/*
* hv_mark_gpa_visibility - Set pages visible to host via hvcall.
@@ -286,27 +287,25 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
}

/*
- * hv_set_mem_host_visibility - Set specified memory visible to host.
+ * hv_vtom_set_host_visibility - Set specified memory visible to host.
*
* In Isolation VM, all guest memory is encrypted from host and guest
* needs to set memory visible to host via hvcall before sharing memory
* with host. This function works as wrap of hv_mark_gpa_visibility()
* with memory base and size.
*/
-int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visible)
+static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc)
{
- enum hv_mem_host_visibility visibility = visible ?
- VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
+ enum hv_mem_host_visibility visibility = enc ?
+ VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
u64 *pfn_array;
int ret = 0;
+ bool result = true;
int i, pfn;

- if (!hv_is_isolation_supported() || !hv_hypercall_pg)
- return 0;
-
pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
if (!pfn_array)
- return -ENOMEM;
+ return false;

for (i = 0, pfn = 0; i < pagecount; i++) {
pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
@@ -315,17 +314,54 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl
if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
ret = hv_mark_gpa_visibility(pfn, pfn_array,
visibility);
- if (ret)
+ if (ret) {
+ result = false;
goto err_free_pfn_array;
+ }
pfn = 0;
}
}

err_free_pfn_array:
kfree(pfn_array);
- return ret;
+ return result;
+}
+
+static bool hv_vtom_tlb_flush_required(bool private)
+{
+ return true;
}

+static bool hv_vtom_cache_flush_required(void)
+{
+ return false;
+}
+
+void __init hv_vtom_init(void)
+{
+ /*
+ * By design, a VM using vTOM doesn't see the SEV setting,
+ * so SEV initialization is bypassed and sev_status isn't set.
+ * Set it here to indicate a vTOM VM.
+ */
+ sev_status = MSR_AMD64_SNP_VTOM_ENABLED;
+ cc_set_vendor(CC_VENDOR_AMD);
+ cc_set_mask(ms_hyperv.shared_gpa_boundary);
+ physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
+
+ /*
+ * Mark the vTPM address range to be encrypted when it is mapped
+ * by the vTPM driver since it is provided by the paravisor.
+ */
+ ioremap_set_encrypted_range(VTPM_BASE_ADDRESS, PAGE_SIZE);
+
+ x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
+ x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+ x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
/*
* hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
*/
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 3d98c3a..d2c6a2e 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -7,7 +7,6 @@
enum cc_vendor {
CC_VENDOR_NONE,
CC_VENDOR_AMD,
- CC_VENDOR_HYPERV,
CC_VENDOR_INTEL,
};

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6d502f3..010768d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -172,18 +172,19 @@ static inline void hv_apic_init(void) {}
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);

#ifdef CONFIG_AMD_MEM_ENCRYPT
void hv_ghcb_msr_write(u64 msr, u64 value);
void hv_ghcb_msr_read(u64 msr, u64 *value);
bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
+void hv_vtom_init(void);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
+static inline void hv_vtom_init(void) {}
#endif

extern bool hv_isolation_type_snp(void);
@@ -239,11 +240,6 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
}
static inline void hv_set_register(unsigned int reg, u64 value) { }
static inline u64 hv_get_register(unsigned int reg) { return 0; }
-static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages,
- bool visible)
-{
- return -1;
-}
#endif /* CONFIG_HYPERV */


diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff475..6a6e70e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -565,6 +565,7 @@
#define MSR_AMD64_SEV_ENABLED BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
#define MSR_AMD64_SEV_ES_ENABLED BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
#define MSR_AMD64_SEV_SNP_ENABLED BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_SNP_VTOM_ENABLED BIT_ULL(3)

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 46668e2..cd7f480 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -33,7 +33,6 @@
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
-#include <asm/coco.h>

/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -325,8 +324,10 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
- ms_hyperv.shared_gpa_boundary =
- BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
+
+ if (ms_hyperv.shared_gpa_boundary_active)
+ ms_hyperv.shared_gpa_boundary =
+ BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);

pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
@@ -337,11 +338,6 @@ static void __init ms_hyperv_init_platform(void)
swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
#endif
}
- /* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
- cc_set_vendor(CC_VENDOR_HYPERV);
- }
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
@@ -410,6 +406,9 @@ static void __init ms_hyperv_init_platform(void)
i8253_clear_counter_on_shutdown = false;

#if IS_ENABLED(CONFIG_HYPERV)
+ if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
+ (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP))
+ hv_vtom_init();
/*
* Setup the hook to get control post apic initialization.
*/
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 356758b..b037954 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2175,9 +2175,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)

static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
- if (hv_is_isolation_supported())
- return hv_set_mem_host_visibility(addr, numpages, !enc);
-
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return __set_memory_enc_pgtable(addr, numpages, enc);

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 3146710..08bdbe5 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2156,7 +2156,6 @@ void vmbus_device_unregister(struct hv_device *device_obj)
* VMBUS is an acpi enumerated device. Get the information we
* need from DSDT.
*/
-#define VTPM_BASE_ADDRESS 0xfed40000
static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
{
resource_size_t start = 0;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index d55d283..2bb2234 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -26,6 +26,8 @@
#include <asm/ptrace.h>
#include <asm/hyperv-tlfs.h>

+#define VTPM_BASE_ADDRESS 0xfed40000
+
struct ms_hyperv_info {
u32 features;
u32 priv_high;
--
1.8.3.1

2023-01-12 22:24:59

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

In a AMD SEV-SNP VM using vTOM, devices in MMIO space may be provided by
the paravisor and need to be mapped as encrypted. Provide a function
for the hypervisor to specify the address range for such devices.
In __ioremap_caller(), map addresses in this range as encrypted.

Only a single range is supported. If multiple devices need to be
mapped encrypted, the paravisor must place them within the single
contiguous range.

Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/include/asm/io.h | 2 ++
arch/x86/mm/ioremap.c | 27 ++++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e902564..72eb366 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -169,6 +169,8 @@ static inline unsigned int isa_virt_to_bus(volatile void *address)
}
#define isa_bus_to_virt phys_to_virt

+extern void ioremap_set_encrypted_range(resource_size_t addr, unsigned long size);
+
/*
* The default ioremap() behavior is non-cached; if you need something
* else, you probably want one of the following.
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 6453fba..8db5846 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -37,6 +37,10 @@ struct ioremap_desc {
unsigned int flags;
};

+/* Range of "other" addresses to treat as encrypted when remapping */
+resource_size_t other_encrypted_start;
+resource_size_t other_encrypted_end;
+
/*
* Fix up the linear direct mapping of the kernel to avoid cache attribute
* conflicts.
@@ -108,14 +112,35 @@ static unsigned int __ioremap_check_encrypted(struct resource *res)
}

/*
+ * Allow a hypervisor to specify an additional range of addresses to
+ * treat as encrypted when remapping.
+ */
+void ioremap_set_encrypted_range(resource_size_t addr, unsigned long size)
+{
+ other_encrypted_start = addr;
+ other_encrypted_end = addr + size - 1;
+}
+
+/*
* The EFI runtime services data area is not covered by walk_mem_res(), but must
- * be mapped encrypted when SEV is active.
+ * be mapped encrypted when SEV is active. Also check the hypervisor specified
+ * "other" address range to treat as encrypted.
*/
static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *desc)
{
if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;

+ /*
+ * Check for an address within the "other" encrypted address range. If such
+ * a range is set, it must include the entire space used by the device,
+ * so we don't need to deal with a partial fit.
+ */
+ if ((addr >= other_encrypted_start) && (addr <= other_encrypted_end)) {
+ desc->flags |= IORES_MAP_ENCRYPTED;
+ return;
+ }
+
if (!IS_ENABLED(CONFIG_EFI))
return;

--
1.8.3.1

2023-01-12 22:25:17

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 14/14] PCI: hv: Enable PCI pass-thru devices in Confidential VMs

For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.

Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].

[1] https://lore.kernel.org/all/[email protected]/

Co-developed-by: Dexuan Cui <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
---
drivers/hv/channel_mgmt.c | 2 +-
drivers/pci/controller/pci-hyperv.c | 168 ++++++++++++++++++++++--------------
2 files changed, 105 insertions(+), 65 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index cc23b90..007f26d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -67,7 +67,7 @@
{ .dev_type = HV_PCIE,
HV_PCIE_GUID,
.perf_device = false,
- .allowed_in_isolated = false,
+ .allowed_in_isolated = true,
},

/* Synthetic Frame Buffer */
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index bbe6e36..f874f89 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -514,6 +514,7 @@ struct hv_pcibus_device {

/* Highest slot of child device with resources allocated */
int wslot_res_allocated;
+ bool use_calls; /* Use hypercalls to access mmio cfg space */

/* hypercall arg, must not cross page boundary */
struct hv_retarget_device_interrupt retarget_msi_interrupt_params;
@@ -1123,8 +1124,10 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
int size, u32 *val)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ int offset = where + CFG_PAGE_OFFSET;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + where;

/*
* If the attempt is to read the IDs or the ROM BAR, simulate that.
@@ -1152,56 +1155,79 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
*/
*val = 0;
} else if (where + size <= CFG_PAGE_SIZE) {
- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
- /* Choose the function to be read. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start reading. */
- mb();
- /* Read from that function's config space. */
- switch (size) {
- case 1:
- *val = readb(addr);
- break;
- case 2:
- *val = readw(addr);
- break;
- default:
- *val = readl(addr);
- break;
+
+ spin_lock_irqsave(&hbus->config_lock, flags);
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start + offset;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_read_mmio(dev, addr, size, val);
+ } else {
+ void __iomem *addr = hbus->cfg_addr + offset;
+
+ /* Choose the function to be read. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before reading. */
+ mb();
+ /* Read from that function's config space. */
+ switch (size) {
+ case 1:
+ *val = readb(addr);
+ break;
+ case 2:
+ *val = readw(addr);
+ break;
+ default:
+ *val = readl(addr);
+ break;
+ }
+ /*
+ * Make sure the read was done before we release the
+ * spinlock allowing consecutive reads/writes.
+ */
+ mb();
}
- /*
- * Make sure the read was done before we release the spinlock
- * allowing consecutive reads/writes.
- */
- mb();
- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);
} else {
- dev_err(&hpdev->hbus->hdev->device,
- "Attempt to read beyond a function's config space.\n");
+ dev_err(dev, "Attempt to read beyond a function's config space.\n");
}
}

static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ u32 val;
u16 ret;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET +
- PCI_VENDOR_ID;

- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
+ spin_lock_irqsave(&hbus->config_lock, flags);

- /* Choose the function to be read. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start reading. */
- mb();
- /* Read from that function's config space. */
- ret = readw(addr);
- /*
- * mb() is not required here, because the spin_unlock_irqrestore()
- * is a barrier.
- */
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start +
+ CFG_PAGE_OFFSET + PCI_VENDOR_ID;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_read_mmio(dev, addr, 2, &val);
+ ret = val; /* Truncates to 16 bits */
+ } else {
+ void __iomem *addr = hbus->cfg_addr + CFG_PAGE_OFFSET +
+ PCI_VENDOR_ID;
+ /* Choose the function to be read. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before we start reading. */
+ mb();
+ /* Read from that function's config space. */
+ ret = readw(addr);
+ /*
+ * mb() is not required here, because the
+ * spin_unlock_irqrestore() is a barrier.
+ */
+ }

- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);

return ret;
}
@@ -1216,39 +1242,51 @@ static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
static void _hv_pcifront_write_config(struct hv_pci_dev *hpdev, int where,
int size, u32 val)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ int offset = where + CFG_PAGE_OFFSET;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + where;

if (where >= PCI_SUBSYSTEM_VENDOR_ID &&
where + size <= PCI_CAPABILITY_LIST) {
/* SSIDs and ROM BARs are read-only */
} else if (where >= PCI_COMMAND && where + size <= CFG_PAGE_SIZE) {
- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
- /* Choose the function to be written. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start writing. */
- wmb();
- /* Write to that function's config space. */
- switch (size) {
- case 1:
- writeb(val, addr);
- break;
- case 2:
- writew(val, addr);
- break;
- default:
- writel(val, addr);
- break;
+ spin_lock_irqsave(&hbus->config_lock, flags);
+
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start + offset;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_write_mmio(dev, addr, size, val);
+ } else {
+ void __iomem *addr = hbus->cfg_addr + offset;
+
+ /* Choose the function to write. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before writing. */
+ wmb();
+ /* Write to that function's config space. */
+ switch (size) {
+ case 1:
+ writeb(val, addr);
+ break;
+ case 2:
+ writew(val, addr);
+ break;
+ default:
+ writel(val, addr);
+ break;
+ }
+ /*
+ * Make sure the write was done before we release the
+ * spinlock allowing consecutive reads/writes.
+ */
+ mb();
}
- /*
- * Make sure the write was done before we release the spinlock
- * allowing consecutive reads/writes.
- */
- mb();
- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);
} else {
- dev_err(&hpdev->hbus->hdev->device,
- "Attempt to write beyond a function's config space.\n");
+ dev_err(dev, "Attempt to write beyond a function's config space.\n");
}
}

@@ -3627,6 +3665,7 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->bridge->domain_nr = dom;
#ifdef CONFIG_X86
hbus->sysdata.domain = dom;
+ hbus->use_calls = !!(ms_hyperv.hints & HV_X64_USE_MMIO_HYPERCALLS);
#elif defined(CONFIG_ARM64)
/*
* Set the PCI bus parent to be the corresponding VMbus
@@ -3636,6 +3675,7 @@ static int hv_pci_probe(struct hv_device *hdev,
* information to devices created on the bus.
*/
hbus->sysdata.parent = hdev->device.parent;
+ hbus->use_calls = false;
#endif

hbus->hdev = hdev;
--
1.8.3.1

2023-01-12 22:25:29

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v5 04/14] x86/mm: Handle decryption/re-encryption of bss_decrypted consistently

sme_postprocess_startup() decrypts the bss_decrypted section when
sme_me_mask is non-zero.

mem_encrypt_free_decrypted_mem() re-encrypts the unused portion based
on CC_ATTR_MEM_ENCRYPT.

In a Hyper-V guest VM using vTOM, these conditions are not equivalent
as sme_me_mask is always zero when using vTOM. Consequently,
mem_encrypt_free_decrypted_mem() attempts to re-encrypt memory that was
never decrypted.

So check sme_me_mask in mem_encrypt_free_decrypted_mem() too.

Hyper-V guests using vTOM don't need the bss_decrypted section to be
decrypted, so skipping the decryption/re-encryption doesn't cause a
problem.

Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
arch/x86/mm/mem_encrypt_amd.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 9c4d8db..e0b51c0 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -513,10 +513,14 @@ void __init mem_encrypt_free_decrypted_mem(void)
npages = (vaddr_end - vaddr) >> PAGE_SHIFT;

/*
- * The unused memory range was mapped decrypted, change the encryption
- * attribute from decrypted to encrypted before freeing it.
+ * If the unused memory range was mapped decrypted, change the encryption
+ * attribute from decrypted to encrypted before freeing it. Base the
+ * re-encryption on the same condition used for the decryption in
+ * sme_postprocess_startup(). Higher level abstractions, such as
+ * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
+ * using vTOM, where sme_me_mask is always zero.
*/
- if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+ if (sme_me_mask) {
r = set_memory_encrypted(vaddr, npages);
if (r) {
pr_warn("failed to free unused decrypted pages\n");
--
1.8.3.1

2023-01-20 20:28:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Jan 12, 2023 at 01:42:25PM -0800, Michael Kelley wrote:
> In a AMD SEV-SNP VM using vTOM, devices in MMIO space may be provided by
> the paravisor and need to be mapped as encrypted. Provide a function
> for the hypervisor to specify the address range for such devices.
> In __ioremap_caller(), map addresses in this range as encrypted.
>
> Only a single range is supported. If multiple devices need to be
> mapped encrypted, the paravisor must place them within the single
> contiguous range.

This already is starting to sound insufficient and hacky. And it also makes
CC_ATTR_ACCESS_IOAPIC_ENCRYPTED insufficient either.

So, the situation we have is, we're a SEV-SNP VM using vTOM. Which means,
MSR_AMD64_SEV[3] = 1. Or SEV_FEATURES[1], alternatively - same thing.

That MSR cannot be intercepted by the HV and we use it extensively in Linux when
it runs as a SEV-* guest. And I had asked this before, during review, but why
aren't you checking this bit above when you wanna do vTOM-specific work?

Because then you can do that check and

1. map the IO-APIC encrypted
2. map MMIO space of devices from the driver encrypted too
3. ...

and so on.

And you won't need those other, not as nice things...

Hmmm.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-01-21 04:44:46

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Friday, January 20, 2023 12:15 PM
>
> On Thu, Jan 12, 2023 at 01:42:25PM -0800, Michael Kelley wrote:
> > In a AMD SEV-SNP VM using vTOM, devices in MMIO space may be provided by
> > the paravisor and need to be mapped as encrypted. Provide a function
> > for the hypervisor to specify the address range for such devices.
> > In __ioremap_caller(), map addresses in this range as encrypted.
> >
> > Only a single range is supported. If multiple devices need to be
> > mapped encrypted, the paravisor must place them within the single
> > contiguous range.
>
> This already is starting to sound insufficient and hacky. And it also makes
> CC_ATTR_ACCESS_IOAPIC_ENCRYPTED insufficient either.
>
> So, the situation we have is, we're a SEV-SNP VM using vTOM. Which means,
> MSR_AMD64_SEV[3] = 1. Or SEV_FEATURES[1], alternatively - same thing.
>
> That MSR cannot be intercepted by the HV and we use it extensively in Linux when
> it runs as a SEV-* guest. And I had asked this before, during review, but why
> aren't you checking this bit above when you wanna do vTOM-specific work?

Per the commit message for 009767dbf42a, it's not safe to read MSR_AMD64_SEV
on all implementations of AMD processors. CC_ATTR_ACCESS_IOAPIC_ENCRYPTED
is used in io_apic_set_fixmap(), which is called on all Intel/AMD systems without
any qualifications. Even if rdmsrl_safe() works at this point in boot, I'm a little
leery of reading a new MSR in a path that essentially every x86 bare-metal system
or VM takes during boot. Or am I being overly paranoid about old processor
models or hypervisor versions potentially doing something weird?

But in any case, the whole point of cc_platform_has() is to provide a level of
abstraction from the hardware registers, and it's fully safe to use on every x86
bare-metal system or VM. And while I don't anticipate it now, maybe there's
some future scheme where a paravisor-like entity could be used with Intel
TDX. It seems like using a cc_platform_has() abstraction is better than directly
accessing the MSR.

>
> Because then you can do that check and
>
> 1. map the IO-APIC encrypted
> 2. map MMIO space of devices from the driver encrypted too
> 3. ...

My resolution of the TPM driver issue is admittedly a work-around. I think
of it as temporary in anticipation of future implementations of PCIe TDISP
hardware, which allows PCI devices to DMA directly into guest encrypted
memory. TDISP also places the device's BAR values in an encrypted portion
of the GPA space. Assuming TDISP hardware comes along in the next couple
of years, Linux will need a robust way to deal with a mix of PCI devices
being in unencrypted and encrypted GPA space. I don't know how a
specific device will be mapped correctly, but I hope it can happen in the
generic PCI code, and not by modifying each device driver. It's probably
premature to build that robust mechanism now, but when it comes, my
work-around would be replaced.

With all that in mind, I don't want to modify the TPM driver to special-case
its MMIO space being encrypted. FWIW, the TPM driver today uses
devm_ioremap_resource() to do the mapping, which defaults to mapping
decrypted except for the exceptions implemented in __ioremap_caller().
There's no devm_* option for specifying encrypted. Handling decrypted
vs. encrypted in the driver would require extending the driver API to
provide an "encrypted" option, and that seems like going in the wrong
long-term direction.

Finally, I would have liked to handle the IO-APIC and the vTPM the same
way. But the IO-APIC doesn't use the normal ioremap_* functions because
it's done early in boot via the fixmap. I didn't see a reasonable way to
unify them.

Anyway, that's my thoughts. I'm certainly open if someone has a better
way to do it.

Michael

>
> and so on.
>
> And you won't need those other, not as nice things...
>
> Hmmm.

2023-01-25 14:56:07

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Sat, Jan 21, 2023 at 04:10:23AM +0000, Michael Kelley (LINUX) wrote:
> Per the commit message for 009767dbf42a, it's not safe to read MSR_AMD64_SEV
> on all implementations of AMD processors.

1. Why does that matter to you? This is Hygon.

2. The MSR access is behind CPUID check.

> CC_ATTR_ACCESS_IOAPIC_ENCRYPTED is used in io_apic_set_fixmap(), which is
> called on all Intel/AMD systems without any qualifications. Even if
> rdmsrl_safe() works at this point in boot, I'm a little leery of reading a new
> MSR in a path that essentially every x86 bare-metal system or VM takes during
> boot.

You read the MSR once and cache it. sme_enable() already does that. I don't see
what the problem is.

> Or am I being overly paranoid about old processor models or hypervisor
> versions potentially doing something weird?

Yes, you are. :)

If they're doing something weird which is out of spec, then we'll deal with them
later.

> But in any case, the whole point of cc_platform_has() is to provide a level of
> abstraction from the hardware registers, and it's fully safe to use on every x86
> bare-metal system or VM. And while I don't anticipate it now, maybe there's
> some future scheme where a paravisor-like entity could be used with Intel
> TDX. It seems like using a cc_platform_has() abstraction is better than directly
> accessing the MSR.

That's fine but we're talking about this particular implementation and that is
vTOM-like with the address space split. If TDX does address space split later,
we can accomodate it too. (Although I think they are not interested in this).

And if you really want to use cc_platform_has(), we could do

cc_platform_has(CC_ADDRESS_SPACE_SPLIT_ON_A_PARAVISOR)

or something with a better name.

The point is, you want to do things which are specific for this particular
implementation. If so, then check for it. Do not do hacky things which get the
work done for your case but can very easily be misused by others.

> My resolution of the TPM driver issue is admittedly a work-around. I think
> of it as temporary in anticipation of future implementations of PCIe TDISP
> hardware, which allows PCI devices to DMA directly into guest encrypted
> memory.

Yap, that sounds real nice.

> TDISP also places the device's BAR values in an encrypted portion
> of the GPA space. Assuming TDISP hardware comes along in the next couple
> of years, Linux will need a robust way to deal with a mix of PCI devices
> being in unencrypted and encrypted GPA space. I don't know how a
> specific device will be mapped correctly, but I hope it can happen in the
> generic PCI code, and not by modifying each device driver.

I guess those devices would advertize that capability somehow so that code can
query it and act accordingly.

> It's probably premature to build that robust mechanism now, but when it comes,
> my work-around would be replaced.

It would be replaced if it doesn't have any users. By the looks of it, it'll
soon grow others and then good luck removing it.

> With all that in mind, I don't want to modify the TPM driver to special-case
> its MMIO space being encrypted. FWIW, the TPM driver today uses
> devm_ioremap_resource() to do the mapping, which defaults to mapping
> decrypted except for the exceptions implemented in __ioremap_caller().
> There's no devm_* option for specifying encrypted.

You mean, it is hard to add a DEVM_IOREMAP_ENCRYPTED type which will have
__devm_ioremap() call ioremap_encrypted()?

Or define a IORESOURCE_ENCRYPTED and pass it through the ioresource flags?

Why is that TPM driver so precious that it can be touched and the arch code
would have to accept hacks?

> Handling decrypted vs. encrypted in the driver would require extending the
> driver API to provide an "encrypted" option, and that seems like going in the
> wrong long-term direction.

Sorry, I can't follow here.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-02 05:51:30

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Wednesday, January 25, 2023 6:56 AM
>
> On Sat, Jan 21, 2023 at 04:10:23AM +0000, Michael Kelley (LINUX) wrote:

[snip]

>
> > But in any case, the whole point of cc_platform_has() is to provide a level of
> > abstraction from the hardware registers, and it's fully safe to use on every x86
> > bare-metal system or VM. And while I don't anticipate it now, maybe there's
> > some future scheme where a paravisor-like entity could be used with Intel
> > TDX. It seems like using a cc_platform_has() abstraction is better than directly
> > accessing the MSR.
>
> That's fine but we're talking about this particular implementation and that is
> vTOM-like with the address space split. If TDX does address space split later,
> we can accomodate it too. (Although I think they are not interested in this).
>
> And if you really want to use cc_platform_has(), we could do
>
> cc_platform_has(CC_ADDRESS_SPACE_SPLIT_ON_A_PARAVISOR)
>
> or something with a better name.

I do think it makes sense to use the cc_platform_has() abstraction. It's
then a question of agreeing on how to name the attribute. We've
discussed various approaches in different versions of this patch series:

v1 & v2: CC_ATTR_HAS_PARAVISOR
v3: CC_ATTR_EMULATED_IOAPIC
v4 & v5: CC_ATTR_ACCESS_IOAPIC_ENCRYPTED

I could do:
1. CC_ATTR_PARAVISOR_SPLIT_ADDRESS_SPACE, which is similar to
what I had for v1 & v2. At the time, somebody commented that
this might be a bit too general.
2. Keep CC_ATTR_ACCESS_IOAPIC_ENCRYPTED and add
CC_ATTR_ACCESS_TPM_ENCRYPTED, which would decouple them
3. CC_ATTR_ACCESS_IOAPIC_AND_TPM_ENCRYPTED, which is very
narrow and specific.

I have weak preference for #1 above, but I could go with any of them.
What's your preference?

> > My resolution of the TPM driver issue is admittedly a work-around. I think
> > of it as temporary in anticipation of future implementations of PCIe TDISP
> > hardware, which allows PCI devices to DMA directly into guest encrypted
> > memory.
>
> Yap, that sounds real nice.
>
> > TDISP also places the device's BAR values in an encrypted portion
> > of the GPA space. Assuming TDISP hardware comes along in the next couple
> > of years, Linux will need a robust way to deal with a mix of PCI devices
> > being in unencrypted and encrypted GPA space. I don't know how a
> > specific device will be mapped correctly, but I hope it can happen in the
> > generic PCI code, and not by modifying each device driver.
>
> I guess those devices would advertize that capability somehow so that code can
> query it and act accordingly.
>
> > It's probably premature to build that robust mechanism now, but when it comes,
> > my work-around would be replaced.
>
> It would be replaced if it doesn't have any users. By the looks of it, it'll
> soon grow others and then good luck removing it.
>
> > With all that in mind, I don't want to modify the TPM driver to special-case
> > its MMIO space being encrypted. FWIW, the TPM driver today uses
> > devm_ioremap_resource() to do the mapping, which defaults to mapping
> > decrypted except for the exceptions implemented in __ioremap_caller().
> > There's no devm_* option for specifying encrypted.
>
> You mean, it is hard to add a DEVM_IOREMAP_ENCRYPTED type which will have
> __devm_ioremap() call ioremap_encrypted()?
>
> Or define a IORESOURCE_ENCRYPTED and pass it through the ioresource flags?
>
> Why is that TPM driver so precious that it can be touched and the arch code
> would have to accept hacks?
>
> > Handling decrypted vs. encrypted in the driver would require extending the
> > driver API to provide an "encrypted" option, and that seems like going in the
> > wrong long-term direction.
>
> Sorry, I can't follow here.
>

For v6 of the patch series, I've coded devm_ioremap_resource_enc() to call
__devm_ioremap(), which then calls ioremap_encrypted(). I've updated the
TPM driver to use cc_platform_has() with whatever attribute name we agree
on to decide between devm_ioremap_resource_enc() and
devm_ioremap_resource().

If this approach is OK with the TPM driver maintainers, I'm good with it.
More robust handling of a mix of encrypted and decrypted devices can get
sorted out later.

Michael

2023-02-07 12:41:38

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Feb 02, 2023 at 05:49:44AM +0000, Michael Kelley (LINUX) wrote:
> I could do:
> 1. CC_ATTR_PARAVISOR_SPLIT_ADDRESS_SPACE, which is similar to
> what I had for v1 & v2. At the time, somebody commented that
> this might be a bit too general.
> 2. Keep CC_ATTR_ACCESS_IOAPIC_ENCRYPTED and add
> CC_ATTR_ACCESS_TPM_ENCRYPTED, which would decouple them
> 3. CC_ATTR_ACCESS_IOAPIC_AND_TPM_ENCRYPTED, which is very
> narrow and specific.
>
> I have weak preference for #1 above, but I could go with any of them.
> What's your preference?

Either 1. but a shorter name or something which works with the TDX side
too.

Or are there no similar TDX solutions planned where the guest runs
unmodified and under a paravisor?

> For v6 of the patch series, I've coded devm_ioremap_resource_enc() to call
> __devm_ioremap(), which then calls ioremap_encrypted(). I've updated the
> TPM driver to use cc_platform_has() with whatever attribute name we agree
> on to decide between devm_ioremap_resource_enc() and
> devm_ioremap_resource().
>
> If this approach is OK with the TPM driver maintainers, I'm good with it.
> More robust handling of a mix of encrypted and decrypted devices can get
> sorted out later.

Makes sense to me...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-07 19:01:36

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 4:41 AM
>
> On Thu, Feb 02, 2023 at 05:49:44AM +0000, Michael Kelley (LINUX) wrote:
> > I could do:
> > 1. CC_ATTR_PARAVISOR_SPLIT_ADDRESS_SPACE, which is similar to
> > what I had for v1 & v2. At the time, somebody commented that
> > this might be a bit too general.
> > 2. Keep CC_ATTR_ACCESS_IOAPIC_ENCRYPTED and add
> > CC_ATTR_ACCESS_TPM_ENCRYPTED, which would decouple them
> > 3. CC_ATTR_ACCESS_IOAPIC_AND_TPM_ENCRYPTED, which is very
> > narrow and specific.
> >
> > I have weak preference for #1 above, but I could go with any of them.
> > What's your preference?
>
> Either 1. but a shorter name or something which works with the TDX side
> too.

Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,
which is shorter. The full details of the meaning will be in a comment
where this is defined with all the other CC_ATTR_* values.

>
> Or are there no similar TDX solutions planned where the guest runs
> unmodified and under a paravisor?

The TDX plans are still being sorted out. But if we end up with such
an approach, CC_ATTR_PARAVISOR_DEVICES will be correct for TDX
also.

Michael

>
> > For v6 of the patch series, I've coded devm_ioremap_resource_enc() to call
> > __devm_ioremap(), which then calls ioremap_encrypted(). I've updated the
> > TPM driver to use cc_platform_has() with whatever attribute name we agree
> > on to decide between devm_ioremap_resource_enc() and
> > devm_ioremap_resource().
> >
> > If this approach is OK with the TPM driver maintainers, I'm good with it.
> > More robust handling of a mix of encrypted and decrypted devices can get
> > sorted out later.
>
> Makes sense to me...
>
> Thx.

2023-02-07 19:33:49

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Tue, Feb 07, 2023 at 07:01:25PM +0000, Michael Kelley (LINUX) wrote:
> Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,

What does "DEVICES" mean in this context?

You need to think about !virt people too who are already confused by the
word "paravisor". :-)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-07 19:48:18

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:33 AM
>
> On Tue, Feb 07, 2023 at 07:01:25PM +0000, Michael Kelley (LINUX) wrote:
> > Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,
>
> What does "DEVICES" mean in this context?
>
> You need to think about !virt people too who are already confused by the
> word "paravisor". :-)
>

Maybe I misunderstood your previous comment about "Either 1". We can
avoid "PARAVISOR" entirely by going with two attributes:

CC_ATTR_ACCESS_IOAPIC_ENCRYPTED
CC_ATTR_ACCESS_TPM_ENCRYPTED

These are much more specific, and relatively short, and having two allows
decoupling the handling of the IO-APIC and TPM. Combining into the single

CC_ATTR_ACCESS_IOAPIC_AND_TPM_ENCRYPTED

also works but is longer.

Capturing the full meaning in the string names is probably impossible.
Referring to the comment for the definition will be required for a full
understanding.

Michael




2023-02-07 19:54:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Tue, Feb 07, 2023 at 07:48:06PM +0000, Michael Kelley (LINUX) wrote:
> From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:33 AM
> >
> > On Tue, Feb 07, 2023 at 07:01:25PM +0000, Michael Kelley (LINUX) wrote:
> > > Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,
> >
> > What does "DEVICES" mean in this context?
> >
> > You need to think about !virt people too who are already confused by the
> > word "paravisor". :-)
> >
>
> Maybe I misunderstood your previous comment about "Either 1". We can
> avoid "PARAVISOR" entirely by going with two attributes:

No, I'm fine with CC_ATTR_PARAVISOR. Why would you have to have
CC_ATTR_PARAVISOR_DEVICES? I.e., the string "_DEVICES" appended after
"PARAVISOR". Isn't CC_ATTR_PARAVISOR enough?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-07 19:57:55

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:55 AM
>
> On Tue, Feb 07, 2023 at 07:48:06PM +0000, Michael Kelley (LINUX) wrote:
> > From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:33 AM
> > >
> > > On Tue, Feb 07, 2023 at 07:01:25PM +0000, Michael Kelley (LINUX) wrote:
> > > > Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,
> > >
> > > What does "DEVICES" mean in this context?
> > >
> > > You need to think about !virt people too who are already confused by the
> > > word "paravisor". :-)
> > >
> >
> > Maybe I misunderstood your previous comment about "Either 1". We can
> > avoid "PARAVISOR" entirely by going with two attributes:
>
> No, I'm fine with CC_ATTR_PARAVISOR. Why would you have to have
> CC_ATTR_PARAVISOR_DEVICES? I.e., the string "_DEVICES" appended after
> "PARAVISOR". Isn't CC_ATTR_PARAVISOR enough?
>

Works for me. :-)

Michael

2023-02-08 00:18:54

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:55 AM
>
> On Tue, Feb 07, 2023 at 07:48:06PM +0000, Michael Kelley (LINUX) wrote:
> > From: Borislav Petkov <[email protected]> Sent: Tuesday, February 7, 2023 11:33 AM
> > >
> > > On Tue, Feb 07, 2023 at 07:01:25PM +0000, Michael Kelley (LINUX) wrote:
> > > > Unless there are objections, I'll go with CC_ATTR_PARAVISOR_DEVICES,
> > >
> > > What does "DEVICES" mean in this context?
> > >
> > > You need to think about !virt people too who are already confused by the
> > > word "paravisor". :-)
> > >
> >
> > Maybe I misunderstood your previous comment about "Either 1". We can
> > avoid "PARAVISOR" entirely by going with two attributes:
>
> No, I'm fine with CC_ATTR_PARAVISOR. Why would you have to have
> CC_ATTR_PARAVISOR_DEVICES? I.e., the string "_DEVICES" appended after
> "PARAVISOR". Isn't CC_ATTR_PARAVISOR enough?
>

Dave --

In v2 of this patch series, you had concerns about CC_ATTR_PARAVISOR being too
generic. [1] After some back-and-forth discussion in this thread, Boris is back to
preferring it. Can you live with CC_ATTR_PARAVISOR? Just trying to reach
consensus ...

Michael

[1] https://lore.kernel.org/linux-hyperv/Y258BO8ohVtVZvSH@liuwe-devbox-debian-v2/T/#m593853d8094453ad3f1a5552dad995ccc6c019b2

2023-02-08 15:13:50

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/7/23 16:18, Michael Kelley (LINUX) wrote:
> In v2 of this patch series, you had concerns about CC_ATTR_PARAVISOR being too
> generic. [1] After some back-and-forth discussion in this thread, Boris is back to
> preferring it. Can you live with CC_ATTR_PARAVISOR? Just trying to reach
> consensus ...

I still think it's too generic. Even the comment was trying to be too
generic:

> + /**
> + * @CC_ATTR_HAS_PARAVISOR: Guest VM is running with a paravisor
> + *
> + * The platform/OS is running as a guest/virtual machine with
> + * a paravisor in VMPL0. Having a paravisor affects things
> + * like whether the I/O APIC is emulated and operates in the
> + * encrypted or decrypted portion of the guest physical address
> + * space.
> + *
> + * Examples include Hyper-V SEV-SNP guests using vTOM.
> + */
> + CC_ATTR_HAS_PARAVISOR,

This doesn't help me figure out when I should use CC_ATTR_HAS_PARAVISOR
really at all. It "operates in the encrypted or decrypted portion..."
Which one is it? Should I be adding or removing encryption on the
mappings for paravisors?

That's opposed to:

> + /**
> + * @CC_ATTR_ACCESS_IOAPIC_ENCRYPTED: Guest VM IO-APIC is encrypted
> + *
> + * The platform/OS is running as a guest/virtual machine with
> + * an IO-APIC that is emulated by a paravisor running in the
> + * guest VM context. As such, the IO-APIC is accessed in the
> + * encrypted portion of the guest physical address space.
> + *
> + * Examples include Hyper-V SEV-SNP guests using vTOM.
> + */
> + CC_ATTR_ACCESS_IOAPIC_ENCRYPTED,

Which makes this code almost stupidly obvious:

> - flags = pgprot_decrypted(flags);
> + if (!cc_platform_has(CC_ATTR_ACCESS_IOAPIC_ENCRYPTED))
> + flags = pgprot_decrypted(flags);

"Oh, if it's access is not encrypted, then get the decrypted version of
the flags."

Compare that to:

if (!cc_platform_has(CC_ATTR_PARAVISOR))
flags = pgprot_decrypted(flags);

Which is a big fat WTF. Because a paravisor "operates in the encrypted
or decrypted portion..." So is this if() condition correct or inverted?
It's utterly impossible to tell because of how generic the option is.

The only way to make sense of the generic thing is to do:

/* Paravisors have a decrypted IO-APIC mapping: */
if (!cc_platform_has(CC_ATTR_PARAVISOR))
flags = pgprot_decrypted(flags);

at every site to state the assumption and make the connection between
paravisors and the behavior. If you want to go do _that_, then fine by
me. But, at that point, the naming is pretty worthless because you
could also have said "goldfish" instead of "paravisor" and it makes an
equal amount of sense:

/* Goldfish have a decrypted IO-APIC mapping: */
if (!cc_platform_has(CC_ATTR_GOLDFISH))
flags = pgprot_decrypted(flags);

I get it, naming is hard.

2023-02-08 17:25:28

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/7/23 04:41, Borislav Petkov wrote:
> Or are there no similar TDX solutions planned where the guest runs
> unmodified and under a paravisor?

I actually don't think paravisors make *ANY* sense for Linux. If you
have to modify the guest, then just modify it to talk to the hypervisor
directly. This code is... modifying the guest. What does putting a
paravisor in the middle do for you?

It might help with binary drivers, but we don't do upstream kernel work
to make silly binary Linux drivers happy.

So, no, there's no similar TDX solutions planned, at least for Linux
guests. Unless I missed the memo. Kirill?

2023-02-09 17:29:21

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Dave Hansen <[email protected]> Sent: Wednesday, February 8, 2023 7:10 AM
>
> On 2/7/23 16:18, Michael Kelley (LINUX) wrote:
> > In v2 of this patch series, you had concerns about CC_ATTR_PARAVISOR being too
> > generic. [1] After some back-and-forth discussion in this thread, Boris is back to
> > preferring it. Can you live with CC_ATTR_PARAVISOR? Just trying to reach
> > consensus ...
>
> I still think it's too generic. Even the comment was trying to be too
> generic:
>
> > + /**
> > + * @CC_ATTR_HAS_PARAVISOR: Guest VM is running with a paravisor
> > + *
> > + * The platform/OS is running as a guest/virtual machine with
> > + * a paravisor in VMPL0. Having a paravisor affects things
> > + * like whether the I/O APIC is emulated and operates in the
> > + * encrypted or decrypted portion of the guest physical address
> > + * space.
> > + *
> > + * Examples include Hyper-V SEV-SNP guests using vTOM.
> > + */
> > + CC_ATTR_HAS_PARAVISOR,
>
> This doesn't help me figure out when I should use CC_ATTR_HAS_PARAVISOR
> really at all. It "operates in the encrypted or decrypted portion..."
> Which one is it? Should I be adding or removing encryption on the
> mappings for paravisors?
>
> That's opposed to:
>
> > + /**
> > + * @CC_ATTR_ACCESS_IOAPIC_ENCRYPTED: Guest VM IO-APIC is encrypted
> > + *
> > + * The platform/OS is running as a guest/virtual machine with
> > + * an IO-APIC that is emulated by a paravisor running in the
> > + * guest VM context. As such, the IO-APIC is accessed in the
> > + * encrypted portion of the guest physical address space.
> > + *
> > + * Examples include Hyper-V SEV-SNP guests using vTOM.
> > + */
> > + CC_ATTR_ACCESS_IOAPIC_ENCRYPTED,
>
> Which makes this code almost stupidly obvious:
>
> > - flags = pgprot_decrypted(flags);
> > + if (!cc_platform_has(CC_ATTR_ACCESS_IOAPIC_ENCRYPTED))
> > + flags = pgprot_decrypted(flags);
>
> "Oh, if it's access is not encrypted, then get the decrypted version of
> the flags."
>
> Compare that to:
>
> if (!cc_platform_has(CC_ATTR_PARAVISOR))
> flags = pgprot_decrypted(flags);
>
> Which is a big fat WTF. Because a paravisor "operates in the encrypted
> or decrypted portion..." So is this if() condition correct or inverted?
> It's utterly impossible to tell because of how generic the option is.
>
> The only way to make sense of the generic thing is to do:
>
> /* Paravisors have a decrypted IO-APIC mapping: */
> if (!cc_platform_has(CC_ATTR_PARAVISOR))
> flags = pgprot_decrypted(flags);
>
> at every site to state the assumption and make the connection between
> paravisors and the behavior. If you want to go do _that_, then fine by
> me. But, at that point, the naming is pretty worthless because you
> could also have said "goldfish" instead of "paravisor" and it makes an
> equal amount of sense:
>
> /* Goldfish have a decrypted IO-APIC mapping: */
> if (!cc_platform_has(CC_ATTR_GOLDFISH))
> flags = pgprot_decrypted(flags);
>
> I get it, naming is hard.

Boris --

Any further comments? Trying to reach consensus. A
solution aligned with Dave's arguments would keep the current
CC_ATTR_ACCESS_IOAPIC_ENCRYPTED, and add
CC_ATTR_ACCESS_TPM_ENCRYPTED to cover the TPM case,
which decouples the two.

Yes, naming is hard. Reaching consensus on naming is even
harder. :-)

Michael

2023-02-09 17:47:23

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Dave Hansen <[email protected]> Sent: Wednesday, February 8, 2023 9:24 AM
>
> On 2/7/23 04:41, Borislav Petkov wrote:
> > Or are there no similar TDX solutions planned where the guest runs
> > unmodified and under a paravisor?
>
> I actually don't think paravisors make *ANY* sense for Linux. If you
> have to modify the guest, then just modify it to talk to the hypervisor
> directly. This code is... modifying the guest. What does putting a
> paravisor in the middle do for you?

One of the original goals of the paravisor was to make fewer
modifications to the guest, especially in areas that aren't directly related
to the hypervisor. It's arguable as to whether that goal played out in
reality.

But another significant goal is to be able to move some device emulation
from the hypervisor/VMM to the guest context. In a CoCo VM, this move
is from outside the TCB to inside the TCB. A great example is a virtual
TPM. Per the CoCo VM threat model, a guest can't rely on a vTPM
provided by the host. But a guest *can* rely on a vTPM implemented in
a more privileged layer of the guest context. With CoCo VMs in the
Azure public cloud, the paravisor also provides other device emulation, like
the IO-APIC to solve some of the ugly interrupt delivery issues. In a
complete solution, it should be possible for a customer to provide his
own paravisor, or at least to inspect/audit the vendor-provided paravisor
code so that it can be certified against whatever security standards the
customer requires. For Azure CoCo VMs, this part is a work-in-progress.

This could turn into an extended discussion, and I've given only a
fairly high-level answer. There are architects at Microsoft who could
probably give a better rendition of why we've pursued the paravisor
approach with SEV-SNP guests.

Michael

>
> It might help with binary drivers, but we don't do upstream kernel work
> to make silly binary Linux drivers happy.
>
> So, no, there's no similar TDX solutions planned, at least for Linux
> guests. Unless I missed the memo. Kirill?

2023-02-10 18:42:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

Wearing my KVM hat and not my Google hat...

On Thu, Feb 09, 2023, Michael Kelley (LINUX) wrote:
> From: Dave Hansen <[email protected]> Sent: Wednesday, February 8, 2023 9:24 AM
> >
> > On 2/7/23 04:41, Borislav Petkov wrote:
> > > Or are there no similar TDX solutions planned where the guest runs
> > > unmodified and under a paravisor?
> >
> > I actually don't think paravisors make *ANY* sense for Linux.

I 100% agree, but Intel made what I think almost entirely irrelevant by refusing
to allow third party code to run in SEAM.

> > If you have to modify the guest, then just modify it to talk to the
> > hypervisor directly. This code is... modifying the guest. What does
> > putting a paravisor in the middle do for you?
>
> One of the original goals of the paravisor was to make fewer
> modifications to the guest, especially in areas that aren't directly related
> to the hypervisor. It's arguable as to whether that goal played out in
> reality.
>
> But another significant goal is to be able to move some device emulation
> from the hypervisor/VMM to the guest context. In a CoCo VM, this move
> is from outside the TCB to inside the TCB. A great example is a virtual
> TPM. Per the CoCo VM threat model, a guest can't rely on a vTPM
> provided by the host.

I vehemently disagree with this assertion. It's kinda sorta true, but only
because Intel and AMD have gone down the road of not providing the mechanisms and
ability for the hypervisor to run and attest to the integrity, functionality, etc.
of (a subset of) the hypervisor's own code.

Taking SEAM/TDX as an example, if the code running in SEAM were an extension of
KVM instead of a hypervisor-agnostic nanny, then there would be no need for a
"paravisor" to provide a vTPM. It would be very feasible to teach the SEAM-protected
bits of KVM to forward vTPM accesses to a host-provided, signed, attested, and open
source software running in a helper TD.

I fully realize you meant "untrusted host", but statements like "the host can't
be trusted" subconciously reinforce the, IMO, flawed model of hardware vendors
and _only_ hardware vendors providing the trusted bits.

The idea that firmware/software written by hardware vendors is somehow more
trustworthy than fully open source software is simultaneously laughable and
infuriating.

Anyways, tying things back to the actual code being discussed, I vote against
CC_ATTR_PARAVISOR. Being able to trust device emulation is not unique to a
paravisor. A single flag also makes too many assumptions about what is trusted
and thus should be accessed encrypted.

2023-02-10 18:58:34

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/10/23 10:41, Sean Christopherson wrote:
> Anyways, tying things back to the actual code being discussed, I vote against
> CC_ATTR_PARAVISOR. Being able to trust device emulation is not unique to a
> paravisor. A single flag also makes too many assumptions about what is trusted
> and thus should be accessed encrypted.

Did you like the more wordy per-device flags better? Or did you have
something else in mind entirely?

2023-02-10 19:04:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023 at 06:41:54PM +0000, Sean Christopherson wrote:
> Anyways, tying things back to the actual code being discussed, I vote against
> CC_ATTR_PARAVISOR. Being able to trust device emulation is not unique to a
> paravisor. A single flag also makes too many assumptions about what is trusted
> and thus should be accessed encrypted.

I'm not crazy about the alternative either: one flag per access type:
IO APIC, vTPM, and soon.

Soon this will become an unmaintainable zoo of different guest types
people want the kernel to support. I don't think I want that madness in
kernel proper.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-10 19:15:50

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Friday, February 10, 2023 11:04 AM
>
> On Fri, Feb 10, 2023 at 06:41:54PM +0000, Sean Christopherson wrote:
> > Anyways, tying things back to the actual code being discussed, I vote against
> > CC_ATTR_PARAVISOR. Being able to trust device emulation is not unique to a
> > paravisor. A single flag also makes too many assumptions about what is trusted
> > and thus should be accessed encrypted.
>
> I'm not crazy about the alternative either: one flag per access type:
> IO APIC, vTPM, and soon.
>

FWIW, I don't think the list of devices to be accessed encrypted is likely
to grow significantly. Is one or two more possible? Perhaps. Does it
become a list of ten? I doubt it.

One approach is to go with the individual device attributes for now.
If the list does grow significantly, there will probably be patterns
or groupings that we can't discern now. We could restructure into
larger buckets at that point based on those patterns/groupings.

Michael

>
> Soon this will become an unmaintainable zoo of different guest types
> people want the kernel to support. I don't think I want that madness in
> kernel proper.
>

2023-02-10 19:37:00

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023 at 07:15:41PM +0000, Michael Kelley (LINUX) wrote:
> FWIW, I don't think the list of devices to be accessed encrypted is likely
> to grow significantly. Is one or two more possible? Perhaps. Does it
> become a list of ten? I doubt it.

What happens if the next silly HV guest scheme comes along and they do
need more and different ones?

Do I say, but but, Michael said that he doubted at the time that that
list would grow... ;-\

And then all our paths are sprinkled with

if (cc_platform_has())

and we can't change sh*t anymore out of fear that some weird guest type
will break.

> One approach is to go with the individual device attributes for now.
> If the list does grow significantly, there will probably be patterns
> or groupings that we can't discern now. We could restructure into
> larger buckets at that point based on those patterns/groupings.

There's a reason the word "platform" is in cc_platform_has(). Initially
we wanted to distinguish attributes of the different platforms. So even
if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
platform and it *is* one platform.

So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
call it like the platform, not to mean "I need this functionality".

And yes, we could do the regroupings later because, yeah, those things
are not exposed to userspace so it's not like they're cast in stone but
I fear that we will do regroupings and we will break guests.

Now if you had CC_ATTR_<PLATFORM_TYPE> then you break (or not) only that
platform.

Oh, and then there's the thing that this is kernel proper - that code
still runs on real hardware, for now, and is not only guests. And not
everything is a damn cloud.

So I don't want a zoo here and we'd have to agree to distinguish by
platform and not by different functionality required.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-10 19:58:44

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/10/23 11:36, Borislav Petkov wrote:
>> One approach is to go with the individual device attributes for now.>> If the list does grow significantly, there will probably be patterns
>> or groupings that we can't discern now. We could restructure into
>> larger buckets at that point based on those patterns/groupings.
> There's a reason the word "platform" is in cc_platform_has(). Initially
> we wanted to distinguish attributes of the different platforms. So even
> if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> platform and it *is* one platform.
>
> So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> call it like the platform, not to mean "I need this functionality".

I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
would at least not be too much of a break from what we already have.

2023-02-10 20:51:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023, Dave Hansen wrote:
> On 2/10/23 11:36, Borislav Petkov wrote:
> >> One approach is to go with the individual device attributes for now.>> If the list does grow significantly, there will probably be patterns
> >> or groupings that we can't discern now. We could restructure into
> >> larger buckets at that point based on those patterns/groupings.
> > There's a reason the word "platform" is in cc_platform_has(). Initially
> > we wanted to distinguish attributes of the different platforms. So even
> > if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> > platform and it *is* one platform.
> >
> > So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> > call it like the platform, not to mean "I need this functionality".
>
> I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
> would at least not be too much of a break from what we already have.

I'm fine with CC_ATTR_SEV_VTOM, assuming the proposal is to have something like:

static inline bool is_address_range_private(resource_size_t addr)
{
if (cc_platform_has(CC_ATTR_SEV_VTOM))
return is_address_below_vtom(addr);

return false;
}

i.e. not have SEV_VTOM mean "I/O APIC and vTPM are private". Though I don't see
the point in making it SEV vTOM specific or using a flag. Despite what any of us
think about TDX paravisors, it's completely doable within the confines of TDX to
have an emulated device reside in the private address space. E.g. why not
something like this?

static inline bool is_address_range_private(resource_size_t addr)
{
return addr < cc_platform_private_end;
}

where SEV fills in "cc_platform_private_end" when vTOM is enabled, and TDX does
the same. Or wrap cc_platform_private_end in a helper, etc.

2023-02-10 20:58:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023, Sean Christopherson wrote:
> On Fri, Feb 10, 2023, Dave Hansen wrote:
> > On 2/10/23 11:36, Borislav Petkov wrote:
> > >> One approach is to go with the individual device attributes for now.>> If the list does grow significantly, there will probably be patterns
> > >> or groupings that we can't discern now. We could restructure into
> > >> larger buckets at that point based on those patterns/groupings.
> > > There's a reason the word "platform" is in cc_platform_has(). Initially
> > > we wanted to distinguish attributes of the different platforms. So even
> > > if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> > > platform and it *is* one platform.
> > >
> > > So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> > > call it like the platform, not to mean "I need this functionality".
> >
> > I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
> > would at least not be too much of a break from what we already have.
>
> I'm fine with CC_ATTR_SEV_VTOM, assuming the proposal is to have something like:
>
> static inline bool is_address_range_private(resource_size_t addr)
> {
> if (cc_platform_has(CC_ATTR_SEV_VTOM))
> return is_address_below_vtom(addr);
>
> return false;
> }
>
> i.e. not have SEV_VTOM mean "I/O APIC and vTPM are private". Though I don't see
> the point in making it SEV vTOM specific or using a flag. Despite what any of us
> think about TDX paravisors, it's completely doable within the confines of TDX to
> have an emulated device reside in the private address space. E.g. why not
> something like this?
>
> static inline bool is_address_range_private(resource_size_t addr)
> {
> return addr < cc_platform_private_end;
> }
>
> where SEV fills in "cc_platform_private_end" when vTOM is enabled, and TDX does
> the same. Or wrap cc_platform_private_end in a helper, etc.

Gah, forgot that the intent with TDX is to enumerate devices in their legacy
address spaces. So a TDX guest couldn't do this by default, but if/when Hyper-V
or some other hypervisor moves I/O APIC, vTPM, etc... into the TCB, the common
code would just work and only the hypervisor-specific paravirt code would need
to change.

Probably need a more specific name than is_address_range_private() though, e.g.
is_mmio_address_range_private()?

2023-02-10 21:27:49

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Sean Christopherson <[email protected]> Sent: Friday, February 10, 2023 12:58 PM
>
> On Fri, Feb 10, 2023, Sean Christopherson wrote:
> > On Fri, Feb 10, 2023, Dave Hansen wrote:
> > > On 2/10/23 11:36, Borislav Petkov wrote:
> > > >> One approach is to go with the individual device attributes for now.>> If the list
> does grow significantly, there will probably be patterns
> > > >> or groupings that we can't discern now. We could restructure into
> > > >> larger buckets at that point based on those patterns/groupings.
> > > > There's a reason the word "platform" is in cc_platform_has(). Initially
> > > > we wanted to distinguish attributes of the different platforms. So even
> > > > if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> > > > platform and it *is* one platform.
> > > >
> > > > So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> > > > call it like the platform, not to mean "I need this functionality".
> > >
> > > I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
> > > would at least not be too much of a break from what we already have.
> >
> > I'm fine with CC_ATTR_SEV_VTOM, assuming the proposal is to have something like:
> >
> > static inline bool is_address_range_private(resource_size_t addr)
> > {
> > if (cc_platform_has(CC_ATTR_SEV_VTOM))
> > return is_address_below_vtom(addr);
> >
> > return false;
> > }
> >
> > i.e. not have SEV_VTOM mean "I/O APIC and vTPM are private". Though I don't see
> > the point in making it SEV vTOM specific or using a flag. Despite what any of us
> > think about TDX paravisors, it's completely doable within the confines of TDX to
> > have an emulated device reside in the private address space. E.g. why not
> > something like this?
> >
> > static inline bool is_address_range_private(resource_size_t addr)
> > {
> > return addr < cc_platform_private_end;
> > }
> >
> > where SEV fills in "cc_platform_private_end" when vTOM is enabled, and TDX does
> > the same. Or wrap cc_platform_private_end in a helper, etc.
>
> Gah, forgot that the intent with TDX is to enumerate devices in their legacy
> address spaces. So a TDX guest couldn't do this by default, but if/when Hyper-V
> or some other hypervisor moves I/O APIC, vTPM, etc... into the TCB, the common
> code would just work and only the hypervisor-specific paravirt code would need
> to change.
>
> Probably need a more specific name than is_address_range_private() though, e.g.
> is_mmio_address_range_private()?

Maybe I'm not understanding what you are proposing, but in an SEV-SNP
VM using vTOM, devices like the IO-APIC and TPM live at their usual guest
physical addresses. The question is whether the kernel virtual mapping
should be set up with encryption enabled or disabled. That question can't
be answered by looking at the device's address. Whether to map a particular
device with encryption enabled really is a property of the "platform" because
it depends on whether the paravisor is emulating the device. Having the
paravisor emulate the device does not change its guest physical address.

While there's a duality, it's better to think of the vTOM bit as the
"encryption" flag in the PTE rather than part of the guest physical address.
A key part of this patch series is about making that shift in how the vTOM
bit is treated. With the change, the vTOM bit is treated pretty much the
same as the TDX SHARED flag.

Michael



2023-02-10 23:47:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023, Michael Kelley (LINUX) wrote:
> From: Sean Christopherson <[email protected]> Sent: Friday, February 10, 2023 12:58 PM
> >
> > On Fri, Feb 10, 2023, Sean Christopherson wrote:
> > > On Fri, Feb 10, 2023, Dave Hansen wrote:
> > > > On 2/10/23 11:36, Borislav Petkov wrote:
> > > > >> One approach is to go with the individual device attributes for now.>> If the list
> > does grow significantly, there will probably be patterns
> > > > >> or groupings that we can't discern now. We could restructure into
> > > > >> larger buckets at that point based on those patterns/groupings.
> > > > > There's a reason the word "platform" is in cc_platform_has(). Initially
> > > > > we wanted to distinguish attributes of the different platforms. So even
> > > > > if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> > > > > platform and it *is* one platform.
> > > > >
> > > > > So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> > > > > call it like the platform, not to mean "I need this functionality".
> > > >
> > > > I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
> > > > would at least not be too much of a break from what we already have.
> > >
> > > I'm fine with CC_ATTR_SEV_VTOM, assuming the proposal is to have something like:
> > >
> > > static inline bool is_address_range_private(resource_size_t addr)
> > > {
> > > if (cc_platform_has(CC_ATTR_SEV_VTOM))
> > > return is_address_below_vtom(addr);
> > >
> > > return false;
> > > }
> > >
> > > i.e. not have SEV_VTOM mean "I/O APIC and vTPM are private". Though I don't see
> > > the point in making it SEV vTOM specific or using a flag. Despite what any of us
> > > think about TDX paravisors, it's completely doable within the confines of TDX to
> > > have an emulated device reside in the private address space. E.g. why not
> > > something like this?
> > >
> > > static inline bool is_address_range_private(resource_size_t addr)
> > > {
> > > return addr < cc_platform_private_end;
> > > }
> > >
> > > where SEV fills in "cc_platform_private_end" when vTOM is enabled, and TDX does
> > > the same. Or wrap cc_platform_private_end in a helper, etc.
> >
> > Gah, forgot that the intent with TDX is to enumerate devices in their legacy
> > address spaces. So a TDX guest couldn't do this by default, but if/when Hyper-V
> > or some other hypervisor moves I/O APIC, vTPM, etc... into the TCB, the common
> > code would just work and only the hypervisor-specific paravirt code would need
> > to change.
> >
> > Probably need a more specific name than is_address_range_private() though, e.g.
> > is_mmio_address_range_private()?
>
> Maybe I'm not understanding what you are proposing, but in an SEV-SNP
> VM using vTOM, devices like the IO-APIC and TPM live at their usual guest
> physical addresses.

Ah, so as the cover letter says, the intent really is to treat vTOM as an
attribute bit. Sorry, I got confused by Boris's comment:

: What happens if the next silly HV guest scheme comes along and they do
: need more and different ones?

Based on that comment, I assumed the proposal to use CC_ATTR_SEV_VTOM was intended
to be a generic range-based thing, but it sounds like that's not the case.

IMO, using CC_ATTR_SEV_VTOM to infer anything about the state of I/O APIC or vTPM
is wrong. vTOM as a platform feature effectively says "physical address bit X
controls private vs. shared" (ignoring weird usage of vTOM). vTOM does not mean
I/O APIC and vTPM are private, that's very much a property of Hyper-V's current
generation of vTOM-based VMs.

Hardcoding this in the guest feels wrong. Ideally, we would have a way to enumerate
that a device is private/trusted, e.g. through ACPI. I'm guessing we already
missed the boat on that, so the next best thing is to do something like Michael
originally proposed in this patch and shove the "which devices are private" logic
into hypervisor-specific code, i.e. let Hyper-V figure out how to enumerate to its
guests which devices are shared.

I agree with Boris' comment that a one-off "other encrypted range" is a hack, but
that's just an API problem. The kernel already has hypervisor specific hooks (and
for SEV-ES even), why not expand that? That way figuring out which devices are
private is wholly contained in Hyper-V code, at least until there's a generic
solution for enumerating private devices, though that seems unlikely to happen
and will be a happy problem to solve if it does come about.

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a868b76cd3d4..08f65ed439d9 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2682,11 +2682,16 @@ static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys)
{
pgprot_t flags = FIXMAP_PAGE_NOCACHE;

- /*
- * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
- * bits, just like normal ioremap():
- */
- flags = pgprot_decrypted(flags);
+ if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
+ /*
+ * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
+ * bits, just like normal ioremap():
+ */
+ if (x86_platform.hyper.is_private_mmio(phys))
+ flags = pgprot_encrypted(flags);
+ else
+ flags = pgprot_decrypted(flags);
+ }

__set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 6453fbaedb08..0baec766b921 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -116,6 +116,9 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des
if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;

+ if (x86_platform.hyper.is_private_mmio(addr))
+ desc->flags |= IORES_MAP_ENCRYPTED;
+
if (!IS_ENABLED(CONFIG_EFI))
return;



2023-02-14 07:45:57

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Sean Christopherson <[email protected]> Sent: Friday, February 10, 2023 3:47 PM
>
> On Fri, Feb 10, 2023, Michael Kelley (LINUX) wrote:
> > From: Sean Christopherson <[email protected]> Sent: Friday, February 10, 2023
> 12:58 PM
> > >
> > > On Fri, Feb 10, 2023, Sean Christopherson wrote:
> > > > On Fri, Feb 10, 2023, Dave Hansen wrote:
> > > > > On 2/10/23 11:36, Borislav Petkov wrote:
> > > > > >> One approach is to go with the individual device attributes for now.>> If the list
> > > does grow significantly, there will probably be patterns
> > > > > >> or groupings that we can't discern now. We could restructure into
> > > > > >> larger buckets at that point based on those patterns/groupings.
> > > > > > There's a reason the word "platform" is in cc_platform_has(). Initially
> > > > > > we wanted to distinguish attributes of the different platforms. So even
> > > > > > if y'all don't like CC_ATTR_PARAVISOR, that is what distinguishes this
> > > > > > platform and it *is* one platform.
> > > > > >
> > > > > > So call it CC_ATTR_SEV_VTOM as it uses that technology or whatever. But
> > > > > > call it like the platform, not to mean "I need this functionality".
> > > > >
> > > > > I can live with that. There's already a CC_ATTR_GUEST_SEV_SNP, so it
> > > > > would at least not be too much of a break from what we already have.
> > > >
> > > > I'm fine with CC_ATTR_SEV_VTOM, assuming the proposal is to have something
> like:
> > > >
> > > > static inline bool is_address_range_private(resource_size_t addr)
> > > > {
> > > > if (cc_platform_has(CC_ATTR_SEV_VTOM))
> > > > return is_address_below_vtom(addr);
> > > >
> > > > return false;
> > > > }
> > > >
> > > > i.e. not have SEV_VTOM mean "I/O APIC and vTPM are private". Though I don't
> see
> > > > the point in making it SEV vTOM specific or using a flag. Despite what any of us
> > > > think about TDX paravisors, it's completely doable within the confines of TDX to
> > > > have an emulated device reside in the private address space. E.g. why not
> > > > something like this?
> > > >
> > > > static inline bool is_address_range_private(resource_size_t addr)
> > > > {
> > > > return addr < cc_platform_private_end;
> > > > }
> > > >
> > > > where SEV fills in "cc_platform_private_end" when vTOM is enabled, and TDX does
> > > > the same. Or wrap cc_platform_private_end in a helper, etc.
> > >
> > > Gah, forgot that the intent with TDX is to enumerate devices in their legacy
> > > address spaces. So a TDX guest couldn't do this by default, but if/when Hyper-V
> > > or some other hypervisor moves I/O APIC, vTPM, etc... into the TCB, the common
> > > code would just work and only the hypervisor-specific paravirt code would need
> > > to change.
> > >
> > > Probably need a more specific name than is_address_range_private() though, e.g.
> > > is_mmio_address_range_private()?
> >
> > Maybe I'm not understanding what you are proposing, but in an SEV-SNP
> > VM using vTOM, devices like the IO-APIC and TPM live at their usual guest
> > physical addresses.
>
> Ah, so as the cover letter says, the intent really is to treat vTOM as an
> attribute bit. Sorry, I got confused by Boris's comment:
>
> : What happens if the next silly HV guest scheme comes along and they do
> : need more and different ones?
>
> Based on that comment, I assumed the proposal to use CC_ATTR_SEV_VTOM was
> intended
> to be a generic range-based thing, but it sounds like that's not the case.
>
> IMO, using CC_ATTR_SEV_VTOM to infer anything about the state of I/O APIC or vTPM
> is wrong. vTOM as a platform feature effectively says "physical address bit X
> controls private vs. shared" (ignoring weird usage of vTOM). vTOM does not mean
> I/O APIC and vTPM are private, that's very much a property of Hyper-V's current
> generation of vTOM-based VMs.
>
> Hardcoding this in the guest feels wrong. Ideally, we would have a way to enumerate
> that a device is private/trusted, e.g. through ACPI. I'm guessing we already
> missed the boat on that, so the next best thing is to do something like Michael
> originally proposed in this patch and shove the "which devices are private" logic
> into hypervisor-specific code, i.e. let Hyper-V figure out how to enumerate to its
> guests which devices are shared.
>
> I agree with Boris' comment that a one-off "other encrypted range" is a hack, but
> that's just an API problem. The kernel already has hypervisor specific hooks (and
> for SEV-ES even), why not expand that? That way figuring out which devices are
> private is wholly contained in Hyper-V code, at least until there's a generic
> solution for enumerating private devices, though that seems unlikely to happen
> and will be a happy problem to solve if it does come about.

Yes, this is definitely a cleaner way to implement what I was originally proposing.
I can make it work if there's agreement to take this approach.

Michael

>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index a868b76cd3d4..08f65ed439d9 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2682,11 +2682,16 @@ static void io_apic_set_fixmap(enum fixed_addresses idx,
> phys_addr_t phys)
> {
> pgprot_t flags = FIXMAP_PAGE_NOCACHE;
>
> - /*
> - * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> - * bits, just like normal ioremap():
> - */
> - flags = pgprot_decrypted(flags);
> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> + /*
> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> + * bits, just like normal ioremap():
> + */
> + if (x86_platform.hyper.is_private_mmio(phys))
> + flags = pgprot_encrypted(flags);
> + else
> + flags = pgprot_decrypted(flags);
> + }
>
> __set_fixmap(idx, phys, flags);
> }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 6453fbaedb08..0baec766b921 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -116,6 +116,9 @@ static void __ioremap_check_other(resource_size_t addr, struct
> ioremap_desc *des
> if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> return;
>
> + if (x86_platform.hyper.is_private_mmio(addr))
> + desc->flags |= IORES_MAP_ENCRYPTED;
> +
> if (!IS_ENABLED(CONFIG_EFI))
> return;
>


2023-02-16 13:33:09

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 10, 2023 at 11:47:27PM +0000, Sean Christopherson wrote:
> I agree with Boris' comment that a one-off "other encrypted range" is a hack, but
> that's just an API problem. The kernel already has hypervisor specific hooks (and
> for SEV-ES even), why not expand that? That way figuring out which devices are
> private is wholly contained in Hyper-V code, at least until there's a generic
> solution for enumerating private devices, though that seems unlikely to happen
> and will be a happy problem to solve if it does come about.

I feel ya and this all makes sense and your proposals look clean enough
to me but we still need some way of determining whether this is a vTOM
on hyperv because there's the next crapola with

https://lore.kernel.org/r/[email protected]

because apparently hyperv does PAT but disables MTRRs for such vTOM
SEV-SNP guests and ... madness.

But that's not the only example - Xen has been doing this thing too.

And Jürgen has been trying to address this in a clean way but it is
a pain.

What I don't want to have is a gazillion ways to check what needs to
happen for which guest type. Because people who change the kernel to run
on baremetal, will break them. And I can't blame them. We try to support
all kinds of guests in the x86 code but this support should be plain and
simple.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-16 16:16:26

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Thursday, February 16, 2023 5:33 AM
>
> On Fri, Feb 10, 2023 at 11:47:27PM +0000, Sean Christopherson wrote:
> > I agree with Boris' comment that a one-off "other encrypted range" is a hack, but
> > that's just an API problem. The kernel already has hypervisor specific hooks (and
> > for SEV-ES even), why not expand that? That way figuring out which devices are
> > private is wholly contained in Hyper-V code, at least until there's a generic
> > solution for enumerating private devices, though that seems unlikely to happen
> > and will be a happy problem to solve if it does come about.
>
> I feel ya and this all makes sense and your proposals look clean enough
> to me but we still need some way of determining whether this is a vTOM
> on hyperv

Historically, callbacks like Sean proposed default to NULL and do nothing
unless they are explicitly set. The Hyper-V vTOM code would set the callback.
Is that not sufficient? Or in the two places where the callback would
be made, do you want to bracket with a test for being in a Hyper-V vTOM
VM? If so, then we're back to needing something like CC_ATTR_PARAVISOR
on which to gate the callbacks.

Or do you mean something else entirely?

Michael

> because there's the next crapola with
>
> https://lore.kernel.org/all/[email protected]/
>
> because apparently hyperv does PAT but disables MTRRs for such vTOM
> SEV-SNP guests and ... madness.
>
> But that's not the only example - Xen has been doing this thing too.
>
> And J?rgen has been trying to address this in a clean way but it is
> a pain.
>
> What I don't want to have is a gazillion ways to check what needs to
> happen for which guest type. Because people who change the kernel to run
> on baremetal, will break them. And I can't blame them. We try to support
> all kinds of guests in the x86 code but this support should be plain and
> simple.
>

2023-02-16 17:06:42

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Feb 16, 2023 at 04:16:16PM +0000, Michael Kelley (LINUX) wrote:
> Historically, callbacks like Sean proposed default to NULL and do nothing
> unless they are explicitly set. The Hyper-V vTOM code would set the callback.
> Is that not sufficient? Or in the two places where the callback would
> be made, do you want to bracket with a test for being in a Hyper-V vTOM
> VM? If so, then we're back to needing something like CC_ATTR_PARAVISOR
> on which to gate the callbacks.
>
> Or do you mean something else entirely?

See the second part of my reply.

This thing...

> > because there's the next crapola with
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > because apparently hyperv does PAT but disables MTRRs for such vTOM
> > SEV-SNP guests and ... madness.
> >
> > But that's not the only example - Xen has been doing this thing too.
> >
> > And Jürgen has been trying to address this in a clean way but it is
> > a pain.
> >
> > What I don't want to have is a gazillion ways to check what needs to
> > happen for which guest type. Because people who change the kernel to run
> > on baremetal, will break them. And I can't blame them. We try to support
> > all kinds of guests in the x86 code but this support should be plain and
> > simple.

... here.

We need a single way to test for this guest type and stick with it.

I'd like for all guest types we support to be queried in a plain and
simple way.

Not:

* CC_ATTR_GUEST_MEM_ENCRYPT

* x86_platform.hyper.is_private_mmio(addr)

* CC_ATTR_PARAVISOR

to mean three different aspects of SEV-SNP guests using vTOM on Hyper-V.

This is going to be a major mess which we won't support.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-17 06:17:06

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted



From: Borislav Petkov <[email protected]> Sent: Thursday, February 16, 2023 9:07 AM
>
> ... here.
>
> We need a single way to test for this guest type and stick with it.
>
> I'd like for all guest types we support to be queried in a plain and
> simple way.
>
> Not:
>
> * CC_ATTR_GUEST_MEM_ENCRYPT
>
> * x86_platform.hyper.is_private_mmio(addr)
>
> * CC_ATTR_PARAVISOR
>
> to mean three different aspects of SEV-SNP guests using vTOM on Hyper-V.
>
> This is going to be a major mess which we won't support.
>

OK, I'm trying to figure out where to go next. I've been following the pattern
set by the SEV/SEV-ES/SEV-SNP and TDX platforms in the cc_platform_has()
function. Each platform returns "true" for multiple CC_ATTR_* values,
and those CC_ATTR_* values are tested in multiple places throughout
kernel code. Some CC_ATTR_* values are shared by multiple platforms
(like CC_ATTR_GUEST_MEM_ENCRYPT) and some are unique to a particular
platform (like CC_ATTR_HOTPLUG_DISABLED). For the CC_ATTR_* values
that are shared, the logic of which platforms they apply to occurs once in
cc_platform_has() rather than scattered and duplicated throughout the
kernel, which makes sense. Any given platform is not represented by a
single CC_ATTR_* value, but by multiple ones. Each CC_ATTR_* value
essentially represents a chunk of kernel functionality that one or more
platforms need, and the platform indicates its need by cc_platform_has()
returning "true" for that value.

So I've been trying to apply the same pattern to the SNP vTOM sub-case
of SEV-SNP. Is that consistent with your thinking, or is the whole
cc_platform_has() approach problematic, including for the existing
SEV flavors and for TDX?

Michael

2023-02-17 14:55:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 17, 2023 at 06:16:56AM +0000, Michael Kelley (LINUX) wrote:
> Is that consistent with your thinking, or is the whole
> cc_platform_has() approach problematic, including for the existing SEV
> flavors and for TDX?

The confidential computing attributes are, yes, features. I've been
preaching since the very beginning that vTOM *is* *also* one such
feature. It is a feature bit in sev_features, for chrissakes. So by that
logic, those SEV-SNP HyperV guests should return true when

cc_platform_has(CC_ATTR_GUEST_SEV_SNP_VTOM);

is tested.

But Sean doesn't like that.

If the access method to the IO-APIC and vTPM are specific to the
HyperV's vTOM implementation, then I don't mind if this were called

cc_platform_has(CC_ATTR_GUEST_HYPERV_VTOM);

Frankly, I don't see any other enlightened guest using vTOM except
HyperV's but virt folks have managed to surprise me in the past too.

In any case, a single flag which is specific to that guest type is fine
too.

It feels like we're running in circles by now... ;-\

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 22:14:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Fri, Feb 17, 2023, Borislav Petkov wrote:
> On Fri, Feb 17, 2023 at 06:16:56AM +0000, Michael Kelley (LINUX) wrote:
> > Is that consistent with your thinking, or is the whole
> > cc_platform_has() approach problematic, including for the existing SEV
> > flavors and for TDX?
>
> The confidential computing attributes are, yes, features. I've been
> preaching since the very beginning that vTOM *is* *also* one such
> feature. It is a feature bit in sev_features, for chrissakes. So by that
> logic, those SEV-SNP HyperV guests should return true when
>
> cc_platform_has(CC_ATTR_GUEST_SEV_SNP_VTOM);
>
> is tested.
>
> But Sean doesn't like that.

Because vTOM is a hardware feature, whereas the IO-APIC and vTPM being accessible
via private memory are software features. It's very possible to emulate the
IO-APIC in trusted code without vTOM.

> If the access method to the IO-APIC and vTPM are specific to the
> HyperV's vTOM implementation, then I don't mind if this were called
>
> cc_platform_has(CC_ATTR_GUEST_HYPERV_VTOM);

I still think that's likely to caused problems in the future, e.g. if Hyper-V
moves more stuff into the paravisor or if Hyper-V ends up with similar functionality
for TDX. But it's not a sticking point, the only thing I'm fiercely resistant to
is conflating hardware features with software features.

2023-02-22 22:33:47

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Wed, Feb 22, 2023 at 02:13:44PM -0800, Sean Christopherson wrote:
> Because vTOM is a hardware feature, whereas the IO-APIC and vTPM being accessible
> via private memory are software features. It's very possible to emulate the
> IO-APIC in trusted code without vTOM.

I know, but their use case is dictated by the fact that they're using
a SNP guest *with* vTOM as a SEV feature. And so their guest does
IO-APIC and vTPM *with* the vTOM SEV feature. That's what I'm trying to
model.

> > If the access method to the IO-APIC and vTPM are specific to the
> > HyperV's vTOM implementation, then I don't mind if this were called
> >
> > cc_platform_has(CC_ATTR_GUEST_HYPERV_VTOM);
>
> I still think that's likely to caused problems in the future, e.g. if Hyper-V
> moves more stuff into the paravisor or if Hyper-V ends up with similar functionality
> for TDX.

Yah, reportedly, TDX folks are not very interested in this case.

> But it's not a sticking point, the only thing I'm fiercely resistant to
> is conflating hardware features with software features.

So you and I need to find a common ground...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-22 22:54:58

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Wed, Feb 22, 2023, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 02:13:44PM -0800, Sean Christopherson wrote:
> > Because vTOM is a hardware feature, whereas the IO-APIC and vTPM being accessible
> > via private memory are software features. It's very possible to emulate the
> > IO-APIC in trusted code without vTOM.
>
> I know, but their use case is dictated by the fact that they're using
> a SNP guest *with* vTOM as a SEV feature. And so their guest does
> IO-APIC and vTPM *with* the vTOM SEV feature. That's what I'm trying to
> model.

Why? I genuinely don't understand the motivation for bundling all of this stuff
under a single "feature". To me, that's like saying Haswell or Zen2 is a "feature",
but outside of a very few cases where the exact uarch truly matters, nothing pivots
on FMS because the CPU type is not a single feature.

2023-02-22 23:35:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Wed, Feb 22, 2023 at 02:54:47PM -0800, Sean Christopherson wrote:
> Why? I genuinely don't understand the motivation for bundling all of this stuff
> under a single "feature".

It is called "sanity".

See here:

https://lore.kernel.org/r/Y%[email protected]

We support SEV, SEV-ES, SEV-SNP, TDX, HyperV... guests and whatever's
coming down the pipe. And all that goes into arch/x86/ kernel proper
code.

The CC_ATTR stuff is clean-ish in the sense that we have separation by
confidential computing platform - AMD's and Intel's. Hyper-V comes along
and wants to define a different subset of that. And that's only the
SEV-SNP side - there's a TDX patchset too.

And then some other hypervisor will come along and say, but but, I wanna
have X and Y and a pink pony too.

Oh, and there's this other fun with MTRRs where each HV decides to do
whatever it wants.

So, we have a zoo brewing on the horizon already!

If there's no clean definition of what each guest is and requires and
that stuff isn't documented properly and if depending on which "feature"
I need to check, I need to call a different function or query
a different variable, then it won't go anywhere as far as guest support
goes.

The cc_platform_has() thing gives us a relatively clean way to abstract
all those differences away and keep the code sane-ish.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-23 01:21:37

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Feb 23, 2023, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 02:54:47PM -0800, Sean Christopherson wrote:
> > Why? I genuinely don't understand the motivation for bundling all of this stuff
> > under a single "feature".
>
> It is called "sanity".
>
> See here:
>
> https://lore.kernel.org/r/Y%[email protected]
>
> We support SEV, SEV-ES, SEV-SNP, TDX, HyperV... guests and whatever's
> coming down the pipe. And all that goes into arch/x86/ kernel proper
> code.
>
> The CC_ATTR stuff is clean-ish in the sense that we have separation by
> confidential computing platform - AMD's and Intel's. Hyper-V comes along
> and wants to define a different subset of that. And that's only the
> SEV-SNP side - there's a TDX patchset too.
>
> And then some other hypervisor will come along and say, but but, I wanna
> have X and Y and a pink pony too.
>
> Oh, and there's this other fun with MTRRs where each HV decides to do
> whatever it wants.

The MTRR mess isn't unique to coco guests, e.g. KVM explicitly "supports" VMMs
hiding MTTRs from the guest by defaulting to WB if MTTRs aren't exposed to the
guest. Why on earth Hyper-V suddenly needs to enlighten the guest is beyond me,
but whatever the reason, it's not unique to coco VMs.

> So, we have a zoo brewing on the horizon already!
>
> If there's no clean definition of what each guest is and requires and
> that stuff isn't documented properly and if depending on which "feature"
> I need to check, I need to call a different function or query
> a different variable, then it won't go anywhere as far as guest support
> goes.
>
> The cc_platform_has() thing gives us a relatively clean way to abstract
> all those differences away and keep the code sane-ish.

For features that are inherent to the platform, I agree, or at least I don't hate
the interface. But defining a platform to have specific devices runs counter to
pretty much the entire x86 ecosystem. At some point, there _will_ be more devices
in private memory than just IO-APIC and TPM, and conversely there will be "platforms"
that support a trusted TPM but not a trusted IO-APIC, and probably even vice versa.

All I'm advocating is that for determining whether or not a device should be mapped
private vs. shared, provide an API so that the hypervisor-specific enlightened code
can manage that insanity without polluting common code. If we are ever fortunate
enough to have common enumeration, e.g. through ACPI or something, the enlightened
code can simply reroute to the common code. This is a well established pattern for
many paravirt features, I don't see why it wouldn't work here.

2023-02-23 10:45:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Wed, Feb 22, 2023 at 05:21:27PM -0800, Sean Christopherson wrote:
> The MTRR mess isn't unique to coco guests, e.g. KVM explicitly "supports" VMMs
> hiding MTTRs from the guest by defaulting to WB if MTTRs aren't exposed to the
> guest. Why on earth Hyper-V suddenly needs to enlighten the guest is beyond me,
> but whatever the reason, it's not unique to coco VMs.

Well, TDX can't stomach MTRRs either, reportedly, and I hear we should
try to avoid #VEs for them too.

And this is the problem: all those guest "enlightening" efforts come up
with the weirdest stuff they need to sprinkle around arch/x86/. And if
we let that without paying attention to the big picture, that will
become an unmaintanable mess.

And I'm not proud of some of the stuff we did in arch/x86/ already and
some day they'll get on my nerves just enough...

> All I'm advocating is that for determining whether or not a device should be mapped
> private vs. shared, provide an API so that the hypervisor-specific enlightened code
> can manage that insanity without polluting common code. If we are ever fortunate
> enough to have common enumeration, e.g. through ACPI or something, the enlightened
> code can simply reroute to the common code. This is a well established pattern for
> many paravirt features, I don't see why it wouldn't work here.

Yah, that would be good. If the device can know upfront how it needs to
ioremap its address range, then that is fine - we already have
ioremap_encrypted() for example.

What I don't like is hooking conditionals into the common code to figure
out what to do depending on what we're running on.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-02-23 20:01:40

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Borislav Petkov <[email protected]> Sent: Thursday, February 23, 2023 2:45 AM
>
> On Wed, Feb 22, 2023 at 05:21:27PM -0800, Sean Christopherson wrote:
>
> > All I'm advocating is that for determining whether or not a device should be mapped
> > private vs. shared, provide an API so that the hypervisor-specific enlightened code
> > can manage that insanity without polluting common code. If we are ever fortunate
> > enough to have common enumeration, e.g. through ACPI or something, the enlightened
> > code can simply reroute to the common code. This is a well established pattern for
> > many paravirt features, I don't see why it wouldn't work here.
>
> Yah, that would be good.

Just so I'm clear, are you saying you are good with the proposal that Sean
sketched out with code here? [1] With his proposal, the device driver
is not involved in deciding whether to map encrypted or decrypted. That
decision is made in the hypervisor-specific callback function based on
the physical address. The callback has to be made in two places in common
code. But then as Sean said, " the hypervisor-specific enlightened code
can manage that insanity without polluting common code". :-)

I like Sean's proposal, so if you are good with it, I'll do v6 of the patch
set with that approach.

Dave Hansen: Are you also OK with Sean's proposal? Looking for consensus
here ....

> If the device can know upfront how it needs to
> ioremap its address range, then that is fine - we already have
> ioremap_encrypted() for example.
>

Again, the device driver would not be involved in Sean's proposal.

Michael

[1] https://lore.kernel.org/linux-hyperv/[email protected]/

2023-02-23 20:26:54

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/10/23 15:47, Sean Christopherson wrote:
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index a868b76cd3d4..08f65ed439d9 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2682,11 +2682,16 @@ static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys)
> {
> pgprot_t flags = FIXMAP_PAGE_NOCACHE;
>
> - /*
> - * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> - * bits, just like normal ioremap():
> - */
> - flags = pgprot_decrypted(flags);
> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> + /*
> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> + * bits, just like normal ioremap():
> + */
> + if (x86_platform.hyper.is_private_mmio(phys))
> + flags = pgprot_encrypted(flags);
> + else
> + flags = pgprot_decrypted(flags);
> + }

I don't completely hate this. Thinking to the future, I'd hope that
future platforms will include information about which physical addresses
are shared or private. This might even vary per device, but this
interface would still work.

I _think_ it would be nicer to wrap both checks up in a helper where the
comments can be more detailed, like:

if (cc_private_mmio(phys))
flags = pgprot_encrypted(flags);
else
flags = pgprot_decrypted(flags);

but I honestly don't feel that strongly about it.

It does seem a bit odd that there's a new CC_ATTR_GUEST_MEM_ENCRYPT
check wrapping this whole thing. I guess the trip through
pgprot_decrypted() is harmless on normal platforms, though.

> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 6453fbaedb08..0baec766b921 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -116,6 +116,9 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des
> if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> return;
>
> + if (x86_platform.hyper.is_private_mmio(addr))
> + desc->flags |= IORES_MAP_ENCRYPTED;
> +
> if (!IS_ENABLED(CONFIG_EFI))
> return;
>


2023-02-23 20:28:24

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/23/23 12:01, Michael Kelley (LINUX) wrote:
> Dave Hansen: Are you also OK with Sean's proposal? Looking for consensus
> here ....

Yeah, I'm generally OK with it as long as Borislav is.

2023-02-23 20:42:04

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/23/23 12:26, Dave Hansen wrote:
>> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
>> + /*
>> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
>> + * bits, just like normal ioremap():
>> + */
>> + if (x86_platform.hyper.is_private_mmio(phys))
>> + flags = pgprot_encrypted(flags);
>> + else
>> + flags = pgprot_decrypted(flags);
>> + }
...
> It does seem a bit odd that there's a new CC_ATTR_GUEST_MEM_ENCRYPT
> check wrapping this whole thing. I guess the trip through
> pgprot_decrypted() is harmless on normal platforms, though.

Yeah, that's _really_ odd. Sean, were you trying to optimize away the
indirect call or something?

I would just expect the Hyper-V/vTOM code to leave
x86_platform.hyper.is_private_mmio alone unless
it *knows* the platform has private MMIO *and* CC_ATTR_GUEST_MEM_ENCRYPT.

Is there ever a case where CC_ATTR_GUEST_MEM_ENCRYPT==0 and he
Hyper-V/vTOM code would need to set x86_platform.hyper.is_private_mmio?

2023-02-23 20:52:04

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Dave Hansen <[email protected]> Sent: Thursday, February 23, 2023 12:42 PM
>
> On 2/23/23 12:26, Dave Hansen wrote:
> >> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> >> + /*
> >> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> >> + * bits, just like normal ioremap():
> >> + */
> >> + if (x86_platform.hyper.is_private_mmio(phys))
> >> + flags = pgprot_encrypted(flags);
> >> + else
> >> + flags = pgprot_decrypted(flags);
> >> + }
> ...
> > It does seem a bit odd that there's a new CC_ATTR_GUEST_MEM_ENCRYPT
> > check wrapping this whole thing. I guess the trip through
> > pgprot_decrypted() is harmless on normal platforms, though.
>
> Yeah, that's _really_ odd. Sean, were you trying to optimize away the
> indirect call or something?
>
> I would just expect the Hyper-V/vTOM code to leave
> x86_platform.hyper.is_private_mmio alone unless
> it *knows* the platform has private MMIO *and* CC_ATTR_GUEST_MEM_ENCRYPT.

Agreed.

>
> Is there ever a case where CC_ATTR_GUEST_MEM_ENCRYPT==0 and he
> Hyper-V/vTOM code would need to set x86_platform.hyper.is_private_mmio?

There's no such case.

I agree that gating with CC_ATTR_GUEST_MEM_ENCRYPT isn't really necessary.
Current upstream code always does the pgprot_decrypted(), and as you said,
that's a no-op on platforms with no memory encryption.

Michael

2023-02-23 21:07:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Feb 23, 2023, Michael Kelley (LINUX) wrote:
> From: Dave Hansen <[email protected]> Sent: Thursday, February 23, 2023 12:42 PM
> >
> > On 2/23/23 12:26, Dave Hansen wrote:
> > >> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> > >> + /*
> > >> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> > >> + * bits, just like normal ioremap():
> > >> + */
> > >> + if (x86_platform.hyper.is_private_mmio(phys))
> > >> + flags = pgprot_encrypted(flags);
> > >> + else
> > >> + flags = pgprot_decrypted(flags);
> > >> + }
> > ...
> > > It does seem a bit odd that there's a new CC_ATTR_GUEST_MEM_ENCRYPT
> > > check wrapping this whole thing. I guess the trip through
> > > pgprot_decrypted() is harmless on normal platforms, though.
> >
> > Yeah, that's _really_ odd. Sean, were you trying to optimize away the
> > indirect call or something?

No, my thought was simply to require platforms that support GUEST_MEM_ENCRYPT to
implement x86_platform.hyper.is_private_mmio, e.g. to avoid having to check if
is_private_mmio is NULL, to explicit document that non-Hyper-V encrypted guests
don't (yet) support private MMIO, and to add a bit of documentation around the
{de,en}crypted logic.

> > I would just expect the Hyper-V/vTOM code to leave
> > x86_platform.hyper.is_private_mmio alone unless it *knows* the platform has
> > private MMIO *and* CC_ATTR_GUEST_MEM_ENCRYPT.
>
> Agreed.
>
> >
> > Is there ever a case where CC_ATTR_GUEST_MEM_ENCRYPT==0 and he
> > Hyper-V/vTOM code would need to set x86_platform.hyper.is_private_mmio?
>
> There's no such case.
>
> I agree that gating with CC_ATTR_GUEST_MEM_ENCRYPT isn't really necessary.
> Current upstream code always does the pgprot_decrypted(), and as you said,
> that's a no-op on platforms with no memory encryption.

Right, but since is_private_mmio can be NULL, unless I'm missing something we'll
need an extra check no matter what, i.e. the alternative would be

if (x86_platform.hyper.is_private_mmio &&
x86_platform.hyper.is_private_mmio(phys))
flags = pgprot_encrypted(flags);
else
flags = pgprot_decrypted(flags);

I have no objection to that approach. It does have the advantage of not needing
an indirect call for encrypted guests that don't support private MMIO, though
I can't imagine this code is performance sensitive.

2023-02-23 21:16:07

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

From: Sean Christopherson <[email protected]> Sent: Thursday, February 23, 2023 1:08 PM
>
> On Thu, Feb 23, 2023, Michael Kelley (LINUX) wrote:
> > From: Dave Hansen <[email protected]> Sent: Thursday, February 23, 2023
> 12:42 PM
> > >
> > > On 2/23/23 12:26, Dave Hansen wrote:
> > > >> + if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
> > > >> + /*
> > > >> + * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
> > > >> + * bits, just like normal ioremap():
> > > >> + */
> > > >> + if (x86_platform.hyper.is_private_mmio(phys))
> > > >> + flags = pgprot_encrypted(flags);
> > > >> + else
> > > >> + flags = pgprot_decrypted(flags);
> > > >> + }
> > > ...
> > > > It does seem a bit odd that there's a new CC_ATTR_GUEST_MEM_ENCRYPT
> > > > check wrapping this whole thing. I guess the trip through
> > > > pgprot_decrypted() is harmless on normal platforms, though.
> > >
> > > Yeah, that's _really_ odd. Sean, were you trying to optimize away the
> > > indirect call or something?
>
> No, my thought was simply to require platforms that support GUEST_MEM_ENCRYPT
> to
> implement x86_platform.hyper.is_private_mmio, e.g. to avoid having to check if
> is_private_mmio is NULL, to explicit document that non-Hyper-V encrypted guests
> don't (yet) support private MMIO, and to add a bit of documentation around the
> {de,en}crypted logic.
>
> > > I would just expect the Hyper-V/vTOM code to leave
> > > x86_platform.hyper.is_private_mmio alone unless it *knows* the platform has
> > > private MMIO *and* CC_ATTR_GUEST_MEM_ENCRYPT.
> >
> > Agreed.
> >
> > >
> > > Is there ever a case where CC_ATTR_GUEST_MEM_ENCRYPT==0 and he
> > > Hyper-V/vTOM code would need to set x86_platform.hyper.is_private_mmio?
> >
> > There's no such case.
> >
> > I agree that gating with CC_ATTR_GUEST_MEM_ENCRYPT isn't really necessary.
> > Current upstream code always does the pgprot_decrypted(), and as you said,
> > that's a no-op on platforms with no memory encryption.
>
> Right, but since is_private_mmio can be NULL, unless I'm missing something we'll
> need an extra check no matter what, i.e. the alternative would be
>
> if (x86_platform.hyper.is_private_mmio &&
> x86_platform.hyper.is_private_mmio(phys))
> flags = pgprot_encrypted(flags);
> else
> flags = pgprot_decrypted(flags);
>
> I have no objection to that approach. It does have the advantage of not needing
> an indirect call for encrypted guests that don't support private MMIO, though
> I can't imagine this code is performance sensitive.

Or statically set a default stub function for is_private_mmio() that returns "false".
Then there's no need to check for NULL, and only platforms that want to use it
have to code anything. Several other entries in x86_platform have such defaults.

Michael

2023-02-23 21:25:54

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 2/23/23 13:15, Michael Kelley (LINUX) wrote:
> Or statically set a default stub function for is_private_mmio() that returns "false".
> Then there's no need to check for NULL, and only platforms that want to use it
> have to code anything. Several other entries in x86_platform have such defaults.

Yeah, that's what I was thinking too, like 'x86_op_int_noop':

> struct x86_platform_ops x86_platform __ro_after_init = {
> .calibrate_cpu = native_calibrate_cpu_early,
> .calibrate_tsc = native_calibrate_tsc,
...
> .hyper.pin_vcpu = x86_op_int_noop,

It's kinda silly to do an indirect call to a two-instruction function,
but this is a pretty slow path.

2023-03-06 21:51:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Feb 23, 2023 at 12:27:50PM -0800, Dave Hansen wrote:
> On 2/23/23 12:01, Michael Kelley (LINUX) wrote:
> > Dave Hansen: Are you also OK with Sean's proposal? Looking for consensus
> > here ....
>
> Yeah, I'm generally OK with it as long as Borislav is.

Right, I think we're ok with the following basic rules:

- pure arch/x86/ code should use the x86_platform function pointers to
query hypervisor capabilities/peculiarities

- cc_platform_has() should be used in generic/driver code as it
abstracts away the underlying platform better. IOW, querying
x86_platform.... in generic, platform-agnostic driver code looks weird to
say the least

The hope is that those two should be enough to support most guest types
and not let the zoo get too much out of hand...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-09 11:17:28

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Mon, 2023-03-06 at 22:51 +0100, Borislav Petkov wrote:
> On Thu, Feb 23, 2023 at 12:27:50PM -0800, Dave Hansen wrote:
> > On 2/23/23 12:01, Michael Kelley (LINUX) wrote:
> > > Dave Hansen:  Are you also OK with Sean's proposal?  Looking for consensus
> > > here ....
> >
> > Yeah, I'm generally OK with it as long as Borislav is.
>
> Right, I think we're ok with the following basic rules:
>
> - pure arch/x86/ code should use the x86_platform function pointers to
>   query hypervisor capabilities/peculiarities
>
> - cc_platform_has() should be used in generic/driver code as it
>   abstracts away the underlying platform better. IOW, querying
>   x86_platform.... in generic, platform-agnostic driver code looks weird to
>   say the least
>
> The hope is that those two should be enough to support most guest types
> and not let the zoo get too much out of hand...
>
> Thx.

In
https://lore.kernel.org/all/[email protected]/
I added an sev_es_active() helper for x86 code.

Is that consistent with the vision here, or should I do something different?


Attachments:
smime.p7s (5.83 kB)

2023-03-09 12:00:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

First of all,

thanks for proactively pointing that out instead of simply using what's
there and we get to find out later, only by chance.

Much appreciated. :-)

On Thu, Mar 09, 2023 at 11:12:10AM +0000, David Woodhouse wrote:
> > Right, I think we're ok with the following basic rules:
> >
> > - pure arch/x86/ code should use the x86_platform function pointers to
> >   query hypervisor capabilities/peculiarities
> >
> > - cc_platform_has() should be used in generic/driver code as it
> >   abstracts away the underlying platform better. IOW, querying
> >   x86_platform.... in generic, platform-agnostic driver code looks weird to
> >   say the least
> >
> > The hope is that those two should be enough to support most guest types
> > and not let the zoo get too much out of hand...
> >
> > Thx.
>
> In
> https://lore.kernel.org/all/[email protected]/
> I added an sev_es_active() helper for x86 code.
>
> Is that consistent with the vision here, or should I do something different?

So looking at sev_es_init_vc_handling() where we set that key, I'm
*thinking* that key can be removed now and the code should check

cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)

instead.

Because if some of the checks in that function below fail, the guest
will terminate anyway.

Jörg, Tom?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-09 13:02:59

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, 2023-03-09 at 12:59 +0100, Borislav Petkov wrote:
>
> So looking at sev_es_init_vc_handling() where we set that key, I'm
> *thinking* that key can be removed now and the code should check
>
>   cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)
>
> instead.
>
> Because if some of the checks in that function below fail, the guest
> will terminate anyway.
>
> Jörg, Tom?

Hrm... the implication of that is that I should do something like this
in my own code. Is this really your intent?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index b4265c5b46da..7ac4ec6415de 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1519,24 +1519,17 @@ void __init smp_prepare_cpus_common(void)
*/
static bool prepare_parallel_bringup(void)
{
- bool has_sev_es = sev_es_active();
+ /*
+ * The "generic" CC_ATTR_GUEST_STATE_ENCRYPT actually means specifically
+ * SEV-ES, and only SEV-ES, and always shall mean that. If it's present,
+ * that means the AP startup code should use the hard-coded SEV-ES GHCB
+ * call to find its APIC ID (STARTUP_APICID_SEV_ES).
+ */
+ bool has_sev_es = cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT);

if (IS_ENABLED(CONFIG_X86_32))
return false;

- /*
- * Encrypted guests other than SEV-ES (in the future) will need to
- * implement an early way of finding the APIC ID, since they will
- * presumably block direct CPUID too. Be kind to our future selves
- * by warning here instead of just letting them break. Parallel
- * startup doesn't have to be in the first round of enabling patches
- * for any such technology.
- */
- if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) && !has_sev_es) {
- pr_info("Disabling parallel bringup due to guest memory encryption\n");
- return false;
- }
-
if (x2apic_mode || has_sev_es) {
if (boot_cpu_data.cpuid_level < 0x0b)
return false;


Attachments:
smime.p7s (5.83 kB)

2023-03-09 14:20:27

by Tom Lendacky

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On 3/9/23 05:59, Borislav Petkov wrote:
> First of all,
>
> thanks for proactively pointing that out instead of simply using what's
> there and we get to find out later, only by chance.
>
> Much appreciated. :-)
>
> On Thu, Mar 09, 2023 at 11:12:10AM +0000, David Woodhouse wrote:
>>> Right, I think we're ok with the following basic rules:
>>>
>>> - pure arch/x86/ code should use the x86_platform function pointers to
>>>   query hypervisor capabilities/peculiarities
>>>
>>> - cc_platform_has() should be used in generic/driver code as it
>>>   abstracts away the underlying platform better. IOW, querying
>>>   x86_platform.... in generic, platform-agnostic driver code looks weird to
>>>   say the least
>>>
>>> The hope is that those two should be enough to support most guest types
>>> and not let the zoo get too much out of hand...
>>>
>>> Thx.
>>
>> In
>> https://lore.kernel.org/all/[email protected]/
>> I added an sev_es_active() helper for x86 code.
>>
>> Is that consistent with the vision here, or should I do something different?
>
> So looking at sev_es_init_vc_handling() where we set that key, I'm
> *thinking* that key can be removed now and the code should check
>
> cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)
>
> instead.
>
> Because if some of the checks in that function below fail, the guest
> will terminate anyway.
>
> Jörg, Tom?

I believe Joerg added that key for performance reasons, since it is used
on the exception path and can avoid all the calls to cc_platform_has(). I
think that key should stay.

Maybe David can introduce an CC_ATTR_GUEST_SEV_ES attribute that returns
true if the guest is an ES or SNP guest. Or do we introduce a
CC_ATTR_PARALLEL_BOOT attribute that returns true for any SEV guest.

Then the "if cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) && !has_sev_es"
check in arch/x86/kernel/smpboot.c can be removed and the following check
can become if (x2apic_mode || cc_platform_has(CC_ATTR_PARALLEL_BOOT))

Not sure how that affects a TDX guest, though.

Thanks,
Tom

>

2023-03-09 14:37:45

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Mar 09, 2023 at 08:19:58AM -0600, Tom Lendacky wrote:
> I believe Joerg added that key for performance reasons, since it is used on
> the exception path and can avoid all the calls to cc_platform_has(). I think
> that key should stay.

Yes, that is right. The key is mainly for the NMI entry path which can
be performance relevant in some situations. For SEV-ES some special
handling is needed there to re-enable NMIs and adjust the #VC stack in
case it was raised on the VC-handlers entry path.

Regards,

Joerg

2023-03-09 14:52:11

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Mar 09, 2023 at 03:36:45PM +0100, Jörg Rödel wrote:
> Yes, that is right. The key is mainly for the NMI entry path which can
> be performance relevant in some situations. For SEV-ES some special
> handling is needed there to re-enable NMIs and adjust the #VC stack in
> case it was raised on the VC-handlers entry path.

So the performance argument is meh. That key will be replaced by

if (cc_vendor == CC_VENDOR_AMD &&
cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)

which is something like 4 insns or so. Tops.

Haven't looked yet but it should be cheap.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-09 15:46:52

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, 2023-03-09 at 15:45 +0100, Borislav Petkov wrote:
> On Thu, Mar 09, 2023 at 03:36:45PM +0100, Jörg Rödel wrote:
> > Yes, that is right. The key is mainly for the NMI entry path which can
> > be performance relevant in some situations. For SEV-ES some special
> > handling is needed there to re-enable NMIs and adjust the #VC stack in
> > case it was raised on the VC-handlers entry path.
>
> So the performance argument is meh. That key will be replaced by
>
>         if (cc_vendor == CC_VENDOR_AMD &&
>             cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)
>
> which is something like 4 insns or so. Tops.
>
> Haven't looked yet but it should be cheap.


cc_vendor isn't yet exposed. As we discussed this in IRC, I've been
updating the parallel bringup support for SEV-ES, including adding a
cc_get_vendor() function, in the top of my tree at
https://git.infradead.org/users/dwmw2/linux.git/commitdiff/parallel-6.2-v15
and it now looks like this:

/*
* Encrypted guests other than SEV-ES (in the future) will need to
* implement an early way of finding the APIC ID, since they will
* presumably block direct CPUID too. Be kind to our future selves
* by warning here instead of just letting them break. Parallel
* startup doesn't have to be in the first round of enabling patches
* for any such technology.
*/
if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
switch (cc_get_vendor()) {
case CC_VENDOR_AMD:
has_sev_es = true;
break;

default:
pr_info("Disabling parallel bringup due to guest state encryption\n");
return false;
}
}

Using an explicit CC_ATTR_NO_EARLY_CPUID flag instead of
CC_ATTR_GUEST_STATE_ENCRYPT which is merely an approximation of that,
might be interesting.


Attachments:
smime.p7s (5.83 kB)

2023-03-09 16:46:31

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, Mar 09, 2023 at 03:45:35PM +0000, David Woodhouse wrote:
> cc_vendor isn't yet exposed. As we discussed this in IRC, I've been

Right, and we might just as well expose it because having
a setter/getter is kinda silly for a __ro_after_init variable, see
below.

So here's what I was able to scratch up quickly to remove the static
key.

The asm looks like this:

# ./arch/x86/include/asm/sev.h:156: if (cc_vendor == CC_VENDOR_AMD &&
cmpl $1, cc_vendor(%rip) #, cc_vendor
je .L204 #,

...

.L204:
# ./arch/x86/include/asm/sev.h:157: cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
movl $3, %edi #,
call cc_platform_has #
# ./arch/x86/include/asm/sev.h:156: if (cc_vendor == CC_VENDOR_AMD &&
testb %al, %al # tmp134
je .L158 #,

and so I doubt that this is at all measureable comparing that to the
rest of the code that gets executed in the NMI handler. And it lets us
get rid of yet another static key and use only the CC APIs.

Oh, and it bitches because cc_platform_has() is being called in noinstr
region but we can mark it noinstr or add the drop-noinstr markers around
it, if needed. Not that important as that function and what it calls
don't do anything magical.

Thx.

---
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 49b44f881484..34446383e68b 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -13,7 +13,7 @@
#include <asm/coco.h>
#include <asm/processor.h>

-static enum cc_vendor vendor __ro_after_init;
+enum cc_vendor cc_vendor __ro_after_init;
static u64 cc_mask __ro_after_init;

static bool intel_cc_platform_has(enum cc_attr attr)
@@ -83,7 +83,7 @@ static bool hyperv_cc_platform_has(enum cc_attr attr)

bool cc_platform_has(enum cc_attr attr)
{
- switch (vendor) {
+ switch (cc_vendor) {
case CC_VENDOR_AMD:
return amd_cc_platform_has(attr);
case CC_VENDOR_INTEL:
@@ -105,7 +105,7 @@ u64 cc_mkenc(u64 val)
* - for AMD, bit *set* means the page is encrypted
* - for Intel *clear* means encrypted.
*/
- switch (vendor) {
+ switch (cc_vendor) {
case CC_VENDOR_AMD:
return val | cc_mask;
case CC_VENDOR_INTEL:
@@ -118,7 +118,7 @@ u64 cc_mkenc(u64 val)
u64 cc_mkdec(u64 val)
{
/* See comment in cc_mkenc() */
- switch (vendor) {
+ switch (cc_vendor) {
case CC_VENDOR_AMD:
return val & ~cc_mask;
case CC_VENDOR_INTEL:
@@ -131,7 +131,7 @@ EXPORT_SYMBOL_GPL(cc_mkdec);

__init void cc_set_vendor(enum cc_vendor v)
{
- vendor = v;
+ cc_vendor = v;
}

__init void cc_set_mask(u64 mask)
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 3d98c3a60d34..0563e07a1002 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -11,6 +11,8 @@ enum cc_vendor {
CC_VENDOR_INTEL,
};

+extern enum cc_vendor cc_vendor;
+
void cc_set_vendor(enum cc_vendor v);
void cc_set_mask(u64 mask);

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ebc271bb6d8e..1335781e4976 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -12,6 +12,7 @@
#include <asm/insn.h>
#include <asm/sev-common.h>
#include <asm/bootparam.h>
+#include <asm/coco.h>

#define GHCB_PROTOCOL_MIN 1ULL
#define GHCB_PROTOCOL_MAX 2ULL
@@ -134,24 +135,26 @@ struct snp_secrets_page_layout {
} __packed;

#ifdef CONFIG_AMD_MEM_ENCRYPT
-extern struct static_key_false sev_es_enable_key;
extern void __sev_es_ist_enter(struct pt_regs *regs);
extern void __sev_es_ist_exit(void);
static __always_inline void sev_es_ist_enter(struct pt_regs *regs)
{
- if (static_branch_unlikely(&sev_es_enable_key))
+ if (cc_vendor == CC_VENDOR_AMD &&
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
__sev_es_ist_enter(regs);
}
static __always_inline void sev_es_ist_exit(void)
{
- if (static_branch_unlikely(&sev_es_enable_key))
+ if (cc_vendor == CC_VENDOR_AMD &&
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
__sev_es_ist_exit();
}
extern int sev_es_setup_ap_jump_table(struct real_mode_header *rmh);
extern void __sev_es_nmi_complete(void);
static __always_inline void sev_es_nmi_complete(void)
{
- if (static_branch_unlikely(&sev_es_enable_key))
+ if (cc_vendor == CC_VENDOR_AMD &&
+ cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 679026a640ef..7d873bffbd8e 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -111,7 +111,6 @@ struct ghcb_state {
};

static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
-DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);

static DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);

@@ -1393,9 +1392,6 @@ void __init sev_es_init_vc_handling(void)
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
}

- /* Enable SEV-ES special handling */
- static_branch_enable(&sev_es_enable_key);
-
/* Initialize per-cpu GHCB pages */
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2023-03-10 10:06:54

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v5 06/14] x86/ioremap: Support hypervisor specified range to map as encrypted

On Thu, 2023-03-09 at 17:34 +0100, Borislav Petkov wrote:
> On Thu, Mar 09, 2023 at 03:45:35PM +0000, David Woodhouse wrote:
> > cc_vendor isn't yet exposed. As we discussed this in IRC, I've been
>
> Right, and we might just as well expose it because having
> a setter/getter is kinda silly for a __ro_after_init variable, see
> below.
>
> So here's what I was able to scratch up quickly to remove the static
> key.
>
> The asm looks like this:

Seems reasonable. I might just let you iterate on that, drop the SNP-ES
part from the parallel boot series, then let Tom or someone send it in
later once the dust settles.


Attachments:
smime.p7s (5.83 kB)