This patch series adds support for PCI pass-thru devices to Hyper-V
Confidential VMs (also called "Isolation VMs"). But in preparation, it
first changes how private (encrypted) vs. shared (decrypted) memory is
handled in Hyper-V SEV-SNP guest VMs. The new approach builds on the
confidential computing (coco) mechanisms introduced in the 5.19 kernel
for TDX support and significantly reduces the amount of Hyper-V specific
code. Furthermore, with this new approach a proposed RFC patch set for
generic DMA layer functionality[1] is no longer necessary.
Background
==========
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves. vTOM is the dividing line where the uppermost bit of the
physical address space is set; e.g., with 47 bits of guest physical
address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical
memory is accessible at two parallel physical addresses -- one below
vTOM and one above vTOM. Accesses below vTOM are private (encrypted)
while accesses above vTOM are shared (decrypted). In this sense, vTOM
is like the GPA.SHARED bit in Intel TDX.
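To make the address arithmetic concrete, here is a small illustrative
snippet (not code from this series; names are hypothetical) showing the
two parallel aliases of a guest physical address, assuming a 47-bit
guest physical address space:

#include <linux/types.h>

/*
 * Illustrative only: the private and shared aliases of a guest
 * physical address differ only in the vTOM bit (bit 46 here).
 */
#define VTOM_BIT	(1ULL << 46)	/* 0x400000000000 */

static u64 vtom_private_alias(u64 paddr)
{
	return paddr & ~VTOM_BIT;	/* below vTOM: accesses are encrypted */
}

static u64 vtom_shared_alias(u64 paddr)
{
	return paddr | VTOM_BIT;	/* above vTOM: accesses are decrypted */
}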
In Hyper-V's use of vTOM, the normal guest OS runs at VMPL2, while
a Hyper-V provided "paravisor" runs at VMPL0 in the guest VM. (VMPL is
Virtual Machine Privilege Level. See AMD's SEV-SNP spec for more
details.) The paravisor provides emulation for various system devices
like the IO-APIC and TPM as part of the guest VM. Accesses to such
devices made by the normal guest OS trap to the paravisor and are emulated
in the guest VM context instead of in the Hyper-V host. This emulation is
invisible to the normal guest OS, but with the quirk that memory mapped
I/O accesses to these devices must be treated as private, not shared as
would be the case for other device accesses.
Support for Hyper-V guests using vTOM was added to the Linux kernel
in two patch sets[2][3]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, the core
approach is to create a second kernel virtual mapping that maps to
parallel physical addresses above vTOM, while leaving the original
mapping unchanged. Most of the code for creating that second virtual
mapping is confined to Hyper-V specific areas, but there are also
changes to generic swiotlb code.
Changes in this patch set
=========================
In preparation for supporting PCI pass-thru devices, this patch set
changes the core approach for handling vTOM. In the new approach,
the vTOM bit is treated as a protection flag, and not as part of
the physical address. This new approach is like the approach for
the GPA.SHARED bit in Intel TDX. Furthermore, there's no need to
create a second kernel virtual mapping. When memory is changed
between private and shared using set_memory_decrypted() and
set_memory_encrypted(), the PTEs for the existing kernel mapping
are changed to add or remove the vTOM bit just as with TDX. The
hypercalls to change the memory status on the host side are made
using the existing callback mechanism. Everything just works, with
a minor tweak to map the IO-APIC and TPM to use private accesses
as mentioned above.
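Concretely, once cc_mask is set to the vTOM bit, encoding and decoding
the "encryption" state of a protection value works inversely to the SEV
C-bit case. A minimal sketch with hypothetical helper names (Patch 6
below contains the real cc_mkenc()/cc_mkdec() changes):

#include <linux/types.h>

/*
 * Sketch only: with cc_mask == ms_hyperv.shared_gpa_boundary (the vTOM
 * bit), "encrypted" means the bit is clear and "decrypted" means the
 * bit is set -- the inverse of the SEV C-bit convention.
 */
static u64 vtom_mkenc(u64 prot_val, u64 cc_mask)
{
	return prot_val & ~cc_mask;	/* private: clear the vTOM bit */
}

static u64 vtom_mkdec(u64 prot_val, u64 cc_mask)
{
	return prot_val | cc_mask;	/* shared: set the vTOM bit */
}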
With the new handling of vTOM in place, existing Hyper-V code that
creates the second kernel virtual mapping still works, but it is now
redundant as the original kernel virtual mapping (as updated) maps
to the same physical address. To simplify things going forward, this
patch set removes the code that creates the second kernel virtual
mapping. And since a second kernel virtual mapping is no longer
needed, changes to the DMA layer proposed as an RFC[1] are no
longer needed.
Finally, to support PCI pass-thru in a Confidential VM, Hyper-V
requires that all accesses to PCI config space be emulated using
a hypercall. This patch set adds functions to invoke those
hypercalls and uses them in the config space access functions
in the Hyper-V PCI driver. Lastly, the Hyper-V PCI driver is
marked as allowed to be used in a Confidential VM; the driver
was hardened against a malicious Hyper-V host in a previous
patch set.[4]
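For a sense of the shape of those config space accesses, here is a
sketch of the hypercall-based read path as it appears in Patch 13 (the
wrapper name cfg_read_via_hypercall() is illustrative; hv_pci_read_mmio()
and hv_pci_write_mmio() are the functions added in Patch 12):

/*
 * Sketch based on the Patch 13 changes: select the function by writing
 * its slot number to the start of the config window, then read the
 * register at the given offset -- both via hypercalls instead of
 * direct MMIO accesses.
 */
static void cfg_read_via_hypercall(struct hv_pcibus_device *hbus,
				   struct hv_pci_dev *hpdev,
				   int where, int size, u32 *val)
{
	struct device *dev = &hbus->hdev->device;
	phys_addr_t addr = hbus->mem_config->start + CFG_PAGE_OFFSET + where;

	hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
			  hpdev->desc.win_slot.slot);
	hv_pci_read_mmio(dev, addr, size, val);
}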
Patch Organization
==================
Patches 1 thru 5 are preparatory patches to account for
slightly different assumptions when running in a Hyper-V VM
with vTOM, and to make temporary tweaks to avoid needing a single
large patch to make the transition from the old approach to
the new approach.
Patch 6 enables the new approach to handling vTOM for Hyper-V
guest VMs. This is the core patch after which the new approach
is in effect.
Patches 7 thru 10 remove existing code for creating the second
kernel virtual mapping that is no longer necessary with the
new approach.
Patch 11 updates existing code so that it no longer assumes that
the vTOM bit is part of the physical address.
Patches 12 and 13 add new hypercalls for accessing MMIO space
and use those hypercalls for PCI config space. They also enable
the Hyper-V vPCI driver to be used in a Confidential VM.
These patches are built against the linux-next20230307 tree.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
---
Changes in v6:
* Redo Patch 1 to use a hypervisor callback to determine if an
MMIO mapping should be private. [Sean Christopherson]
* Remove Patch 6 since it is replaced by the new hypervisor
callback in Patch 1
* Update the previous Patch 7 (now Patch 6) to set up the
new hypervisor callback and return "true" when mapping the
IO-APIC or TPM.
* Minor tweaks to rebase to linux-next20230307
Changes in v5:
* Add new Patch 6 and update hv_vtom_init() in Patch 7 so that
the virtual TPM in the guest is mapped as encrypted
* Update commit messages for Patches 1 thru 4, and 12 [Boris Petkov]
* Remove the Fixes tag in Patch 4 after discussion upstream
[Boris Petkov, Tom Lendacky, others]
* Use PAGE_KERNEL instead of PAGE_KERNEL_NOENC in arch independent
Hyper-V code. PAGE_KERNEL_NOENC doesn't exist for ARM64, so it
causes compile errors. Using PAGE_KERNEL means or'ing in
sme_me_mask when on x86, but it will be zero for vTOM VMs.
* In patch 6, break out amd_cc_platform_has() handling of vTOM
into a separate helper function [Boris Petkov]
Changes in v4:
* Remove previous Patch 1 from this series and submit separately
[Dave Hansen & Boris Petkov]
* Patch 1: Change the name of the new CC_ATTR that controls
whether the IO-APIC is mapped decrypted [Boris Petkov]
* Patch 4: Use sme_me_mask directly instead of calling the
getter function. Add Fixes: tag. [Tom Lendacky]
* Patch 6: Remove CC_VENDOR_HYPERV and merge associated
vTOM functionality under CC_VENDOR_AMD. [Boris Petkov]
* Patch 8: Use bitwise OR to pick up the vTOM bit in
shared_gpa_boundary rather than adding it
Changes in v3:
* Patch 1: Tweak the code fix to cleanly separate the page
alignment and physical address masking [Dave Hansen]
* Patch 2: Change the name of the new CC_ATTR that controls
whether the IO-APIC is mapped decrypted [Dave Hansen]
* Patch 5 (now patch 7): Add CC_ATTR_MEM_ENCRYPT to what
Hyper-V vTOM reports as 'true'. With the addition, Patches
5 and 6 are new to accommodate working correctly with Hyper-V
VMs using vTOM. [Tom Lendacky]
Changes in v2:
* Patch 11: Include more detail in the error message if an MMIO
hypercall fails. [Bjorn Helgaas]
* Patch 12: Restore removed memory barriers. It seems like these
barriers should not be needed because of the spin_unlock() calls,
but commit bdd74440d9e8 indicates that they are. This patch series
will leave the barriers unchanged; whether they are really needed
can be sorted out separately. [Boqun Feng]
Michael Kelley (13):
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
x86/hyperv: Reorder code to facilitate future work
Drivers: hv: Explicitly request decrypted in vmap_pfn() calls
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
x86/hyperv: Change vTOM handling to use standard coco mechanisms
swiotlb: Remove bounce buffer remapping for Hyper-V
Drivers: hv: vmbus: Remove second mapping of VMBus monitor pages
Drivers: hv: vmbus: Remove second way of mapping ring buffers
hv_netvsc: Remove second mapping of send and recv buffers
Drivers: hv: Don't remap addresses that are above shared_gpa_boundary
PCI: hv: Add hypercalls to read/write MMIO space
PCI: hv: Enable PCI pass-thru devices in Confidential VMs
arch/x86/coco/core.c | 42 +++++--
arch/x86/hyperv/hv_init.c | 18 +--
arch/x86/hyperv/ivm.c | 148 +++++++++++++----------
arch/x86/include/asm/coco.h | 1 -
arch/x86/include/asm/hyperv-tlfs.h | 3 +
arch/x86/include/asm/mshyperv.h | 16 ++-
arch/x86/include/asm/x86_init.h | 4 +
arch/x86/kernel/apic/io_apic.c | 16 ++-
arch/x86/kernel/cpu/mshyperv.c | 22 ++--
arch/x86/kernel/x86_init.c | 2 +
arch/x86/mm/ioremap.c | 5 +
arch/x86/mm/mem_encrypt_amd.c | 10 +-
arch/x86/mm/pat/set_memory.c | 3 -
drivers/hv/Kconfig | 1 -
drivers/hv/channel_mgmt.c | 2 +-
drivers/hv/connection.c | 113 +++++-------------
drivers/hv/hv.c | 23 ++--
drivers/hv/hv_common.c | 11 --
drivers/hv/hyperv_vmbus.h | 2 -
drivers/hv/ring_buffer.c | 62 ++++------
drivers/hv/vmbus_drv.c | 1 -
drivers/net/hyperv/hyperv_net.h | 2 -
drivers/net/hyperv/netvsc.c | 48 +-------
drivers/pci/controller/pci-hyperv.c | 232 ++++++++++++++++++++++++++----------
include/asm-generic/hyperv-tlfs.h | 22 ++++
include/asm-generic/mshyperv.h | 4 +-
include/linux/swiotlb.h | 2 -
init/main.c | 19 +--
kernel/dma/swiotlb.c | 45 +------
29 files changed, 440 insertions(+), 439 deletions(-)
--
1.8.3.1
Current code always maps MMIO devices as shared (decrypted) in a
confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM
use a paravisor running in VMPL0 to emulate some devices, such as the
IO-APIC and TPM. In such a case, the device must be accessed as private
(encrypted) because the paravisor emulates the device at an address below
vTOM, where all accesses are encrypted.
Add a new hypervisor callback to determine if an MMIO address should
be mapped private. The callback allows hypervisor-specific code to handle
any quirks, the use of a paravisor, etc. in determining whether a mapping
must be private. If the callback is not used by a hypervisor, default
to returning "false", which is consistent with normal coco VM behavior.
Use this callback as another special case to check for when doing ioremap.
Just checking the starting address is sufficient as an ioremap range must
be all private or all shared.
Also invoke the callback in the early boot IO-APIC mapping code
that uses the fixmap.
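As an example of how a hypervisor would use the hook (a minimal sketch
with a hypothetical device range; the real Hyper-V implementation
appears later in this series):

/*
 * Minimal sketch: mark one paravisor-emulated device's MMIO page as
 * private. EXAMPLE_DEV_BASE is hypothetical; the real Hyper-V callback
 * checks the IO-APIC and vTPM address ranges.
 */
#define EXAMPLE_DEV_BASE	0xfe000000UL

static bool example_is_private_mmio(u64 addr)
{
	return addr >= EXAMPLE_DEV_BASE &&
	       addr < EXAMPLE_DEV_BASE + PAGE_SIZE;
}

static void __init example_hyper_setup(void)
{
	x86_platform.hyper.is_private_mmio = example_is_private_mmio;
}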
Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/include/asm/x86_init.h | 4 ++++
arch/x86/kernel/apic/io_apic.c | 16 +++++++++++-----
arch/x86/kernel/x86_init.c | 2 ++
arch/x86/mm/ioremap.c | 5 +++++
4 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index c1c8c58..6f873c6 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -259,11 +259,15 @@ struct x86_legacy_features {
* VMMCALL under SEV-ES. Needs to return 'false'
* if the checks fail. Called from the #VC
* exception handler.
+ * @is_private_mmio: For Coco VM, must map MMIO address as private.
+ * Used when device is emulated by a paravisor
+ * layer in the VM context.
*/
struct x86_hyper_runtime {
void (*pin_vcpu)(int cpu);
void (*sev_es_hcall_prepare)(struct ghcb *ghcb, struct pt_regs *regs);
bool (*sev_es_hcall_finish)(struct ghcb *ghcb, struct pt_regs *regs);
+ bool (*is_private_mmio)(u64 addr);
};
/**
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1f83b05..88cb8a6 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,6 +66,7 @@
#include <asm/hw_irq.h>
#include <asm/apic.h>
#include <asm/pgtable.h>
+#include <asm/x86_init.h>
#define for_each_ioapic(idx) \
for ((idx) = 0; (idx) < nr_ioapics; (idx)++)
@@ -2679,11 +2680,16 @@ static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys)
{
pgprot_t flags = FIXMAP_PAGE_NOCACHE;
- /*
- * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
- * bits, just like normal ioremap():
- */
- flags = pgprot_decrypted(flags);
+ if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
+ /*
+ * Ensure fixmaps for IOAPIC MMIO respect memory encryption
+ * pgprot bits, just like normal ioremap():
+ */
+ if (x86_platform.hyper.is_private_mmio(phys))
+ flags = pgprot_encrypted(flags);
+ else
+ flags = pgprot_decrypted(flags);
+ }
__set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index ef80d36..95be383 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -134,6 +134,7 @@ static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool
static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
static bool enc_tlb_flush_required_noop(bool enc) { return false; }
static bool enc_cache_flush_required_noop(void) { return false; }
+static bool is_private_mmio_noop(u64 addr) {return false; }
struct x86_platform_ops x86_platform __ro_after_init = {
.calibrate_cpu = native_calibrate_cpu_early,
@@ -149,6 +150,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
.realmode_reserve = reserve_real_mode,
.realmode_init = init_real_mode,
.hyper.pin_vcpu = x86_op_int_noop,
+ .hyper.is_private_mmio = is_private_mmio_noop,
.guest = {
.enc_status_change_prepare = enc_status_change_prepare_noop,
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 6453fba..aa7d279 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -116,6 +116,11 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des
if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
+ if (x86_platform.hyper.is_private_mmio(addr)) {
+ desc->flags |= IORES_MAP_ENCRYPTED;
+ return;
+ }
+
if (!IS_ENABLED(CONFIG_EFI))
return;
--
1.8.3.1
Reorder some code to facilitate future work. No functional
change.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 68 +++++++++++++++++++++++++--------------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 1dbcbd9..f33c67e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -235,40 +235,6 @@ void hv_ghcb_msr_read(u64 msr, u64 *value)
EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
#endif
-enum hv_isolation_type hv_get_isolation_type(void)
-{
- if (!(ms_hyperv.priv_high & HV_ISOLATION))
- return HV_ISOLATION_TYPE_NONE;
- return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-/*
- * hv_is_isolation_supported - Check system runs in the Hyper-V
- * isolation VM.
- */
-bool hv_is_isolation_supported(void)
-{
- if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
- return false;
-
- if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
- return false;
-
- return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-/*
- * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
- * isolation VM.
- */
-bool hv_isolation_type_snp(void)
-{
- return static_branch_unlikely(&isolation_type_snp);
-}
-
/*
* hv_mark_gpa_visibility - Set pages visible to host via hvcall.
*
@@ -387,3 +353,37 @@ void hv_unmap_memory(void *addr)
{
vunmap(addr);
}
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+ if (!(ms_hyperv.priv_high & HV_ISOLATION))
+ return HV_ISOLATION_TYPE_NONE;
+ return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
+}
+EXPORT_SYMBOL_GPL(hv_get_isolation_type);
+
+/*
+ * hv_is_isolation_supported - Check system runs in the Hyper-V
+ * isolation VM.
+ */
+bool hv_is_isolation_supported(void)
+{
+ if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+ return false;
+
+ if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+ return false;
+
+ return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
+}
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+/*
+ * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
+ * isolation VM.
+ */
+bool hv_isolation_type_snp(void)
+{
+ return static_branch_unlikely(&isolation_type_snp);
+}
--
1.8.3.1
Update vmap_pfn() calls to explicitly request that the mapping
be for decrypted access to the memory. There's no change in
functionality since the PFNs passed to vmap_pfn() are above the
shared_gpa_boundary, implicitly producing a decrypted mapping.
But explicitly requesting "decrypted" allows the code to work
both before and after upcoming changes that cause vmap_pfn()
to mask the PFNs so that they fall below the shared_gpa_boundary.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 2 +-
drivers/hv/ring_buffer.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index f33c67e..5648efb 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -343,7 +343,7 @@ void *hv_map_memory(void *addr, unsigned long size)
pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
- vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+ vaddr = vmap_pfn(pfns, size / PAGE_SIZE, pgprot_decrypted(PAGE_KERNEL));
kfree(pfns);
return vaddr;
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index c6692fd..2111e97 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -211,7 +211,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
ring_info->ring_buffer = (struct hv_ring_buffer *)
vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
- PAGE_KERNEL);
+ pgprot_decrypted(PAGE_KERNEL));
kfree(pfns_wraparound);
if (!ring_info->ring_buffer)
--
1.8.3.1
sme_postprocess_startup() decrypts the bss_decrypted section when
sme_me_mask is non-zero.
mem_encrypt_free_decrypted_mem() re-encrypts the unused portion based
on CC_ATTR_MEM_ENCRYPT.
In a Hyper-V guest VM using vTOM, these conditions are not equivalent
as sme_me_mask is always zero when using vTOM. Consequently,
mem_encrypt_free_decrypted_mem() attempts to re-encrypt memory that was
never decrypted.
So check sme_me_mask in mem_encrypt_free_decrypted_mem() too.
Hyper-V guests using vTOM don't need the bss_decrypted section to be
decrypted, so skipping the decryption/re-encryption doesn't cause a
problem.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
arch/x86/mm/mem_encrypt_amd.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 9c4d8db..e0b51c0 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -513,10 +513,14 @@ void __init mem_encrypt_free_decrypted_mem(void)
npages = (vaddr_end - vaddr) >> PAGE_SHIFT;
/*
- * The unused memory range was mapped decrypted, change the encryption
- * attribute from decrypted to encrypted before freeing it.
+ * If the unused memory range was mapped decrypted, change the encryption
+ * attribute from decrypted to encrypted before freeing it. Base the
+ * re-encryption on the same condition used for the decryption in
+ * sme_postprocess_startup(). Higher level abstractions, such as
+ * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
+ * using vTOM, where sme_me_mask is always zero.
*/
- if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+ if (sme_me_mask) {
r = set_memory_encrypted(vaddr, npages);
if (r) {
pr_warn("failed to free unused decrypted pages\n");
--
1.8.3.1
Full Hyper-V initialization, including support for hypercalls, is done
as an apic_post_init callback via late_time_init(). mem_encrypt_init()
needs to make hypercalls when it marks swiotlb memory as decrypted.
But mem_encrypt_init() is currently called a few lines before
late_time_init(), so the hypercalls don't work.
Fix this by moving mem_encrypt_init() after late_time_init() and
related clock initializations. The intervening initializations don't
do any I/O that requires the swiotlb, so moving mem_encrypt_init()
slightly later has no impact.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
init/main.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/init/main.c b/init/main.c
index 4425d17..7e9c0ca 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1088,14 +1088,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
*/
locking_selftest();
- /*
- * This needs to be called before any devices perform DMA
- * operations that might use the SWIOTLB bounce buffers. It will
- * mark the bounce buffers as decrypted so that their usage will
- * not cause "plain-text" data to be decrypted when accessed.
- */
- mem_encrypt_init();
-
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
@@ -1112,6 +1104,17 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
late_time_init();
sched_clock_init();
calibrate_delay();
+
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the SWIOTLB bounce buffers. It will
+ * mark the bounce buffers as decrypted so that their usage will
+ * not cause "plain-text" data to be decrypted when accessed. It
+ * must be called after late_time_init() so that Hyper-V x86/x64
+ * hypercalls work when the SWIOTLB bounce buffers are decrypted.
+ */
+ mem_encrypt_init();
+
pid_idr_init();
anon_vma_init();
#ifdef CONFIG_X86
--
1.8.3.1
Hyper-V guests on AMD SEV-SNP hardware have the option of using the
"virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP
architecture. With vTOM, shared vs. private memory accesses are
controlled by splitting the guest physical address space into two
halves. vTOM is the dividing line where the uppermost bit of the
physical address space is set; e.g., with 47 bits of guest physical
address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical
memory is accessible at two parallel physical addresses -- one below
vTOM and one above vTOM. Accesses below vTOM are private (encrypted)
while accesses above vTOM are shared (decrypted). In this sense, vTOM
is like the GPA.SHARED bit in Intel TDX.
Support for Hyper-V guests using vTOM was added to the Linux kernel in
two patch sets[1][2]. This support treats the vTOM bit as part of
the physical address. For accessing shared (decrypted) memory, these
patch sets create a second kernel virtual mapping that maps to physical
addresses above vTOM.
A better approach is to treat the vTOM bit as a protection flag, not
as part of the physical address. This new approach is like the approach
for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel
virtual mapping, the existing mapping is updated using recently added
coco mechanisms. When memory is changed between private and shared using
set_memory_decrypted() and set_memory_encrypted(), the PTEs for the
existing kernel mapping are changed to add or remove the vTOM bit
in the guest physical address, just as with TDX. The hypercalls to
change the memory status on the host side are made using the existing
callback mechanism. Everything just works, with a minor tweak to map
the IO-APIC to use private accesses.
To accomplish the switch in approach, the following must be done:
* Update Hyper-V initialization to set the cc_mask based on vTOM
and do other coco initialization.
* Update physical_mask so the vTOM bit is no longer treated as part
of the physical address
* Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality
under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear
the vTOM bit as a protection flag.
* Code already exists to make hypercalls to inform Hyper-V about pages
changing between shared and private. Update this code to run as a
callback from __set_memory_enc_pgtable().
* Remove the Hyper-V special case from __set_memory_enc_dec()
* Remove the Hyper-V specific call to swiotlb_update_mem_attributes()
since mem_encrypt_init() will now do it.
* Add a Hyper-V specific implementation of the is_private_mmio()
callback that returns true for the IO-APIC and vTPM MMIO addresses
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
Signed-off-by: Michael Kelley <[email protected]>
---
arch/x86/coco/core.c | 42 ++++++++++++++++++------
arch/x86/hyperv/hv_init.c | 11 -------
arch/x86/hyperv/ivm.c | 72 ++++++++++++++++++++++++++++++++++-------
arch/x86/include/asm/coco.h | 1 -
arch/x86/include/asm/mshyperv.h | 16 +++++----
arch/x86/kernel/cpu/mshyperv.c | 15 ++++-----
arch/x86/mm/pat/set_memory.c | 3 --
drivers/hv/vmbus_drv.c | 1 -
include/asm-generic/mshyperv.h | 2 ++
9 files changed, 112 insertions(+), 51 deletions(-)
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 49b44f8..d1c3306 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -29,6 +29,18 @@ static bool intel_cc_platform_has(enum cc_attr attr)
}
}
+/* Helper function for AMD SEV-SNP vTOM case */
+static __maybe_unused bool amd_cc_platform_vtom(enum cc_attr attr)
+{
+ switch (attr) {
+ case CC_ATTR_GUEST_MEM_ENCRYPT:
+ case CC_ATTR_MEM_ENCRYPT:
+ return true;
+ default:
+ return false;
+ }
+}
+
/*
* SME and SEV are very similar but they are not the same, so there are
* times that the kernel will need to distinguish between SME and SEV. The
@@ -41,9 +53,20 @@ static bool intel_cc_platform_has(enum cc_attr attr)
* up under SME the trampoline area cannot be encrypted, whereas under SEV
* the trampoline area must be encrypted.
*/
+
static bool amd_cc_platform_has(enum cc_attr attr)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+ /*
+ * Handle the SEV-SNP vTOM case where sme_me_mask is zero, and
+ * the other levels of SME/SEV functionality, including C-bit
+ * based SEV-SNP, are not enabled.
+ */
+ if (sev_status & MSR_AMD64_SNP_VTOM)
+ return amd_cc_platform_vtom(attr);
+
+ /* Handle the C-bit case */
switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
return sme_me_mask;
@@ -76,11 +99,6 @@ static bool amd_cc_platform_has(enum cc_attr attr)
#endif
}
-static bool hyperv_cc_platform_has(enum cc_attr attr)
-{
- return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
-}
-
bool cc_platform_has(enum cc_attr attr)
{
switch (vendor) {
@@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr)
return amd_cc_platform_has(attr);
case CC_VENDOR_INTEL:
return intel_cc_platform_has(attr);
- case CC_VENDOR_HYPERV:
- return hyperv_cc_platform_has(attr);
default:
return false;
}
@@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val)
* encryption status of the page.
*
* - for AMD, bit *set* means the page is encrypted
- * - for Intel *clear* means encrypted.
+ * - for AMD with vTOM and for Intel, *clear* means encrypted
*/
switch (vendor) {
case CC_VENDOR_AMD:
- return val | cc_mask;
+ if (sev_status & MSR_AMD64_SNP_VTOM)
+ return val & ~cc_mask;
+ else
+ return val | cc_mask;
case CC_VENDOR_INTEL:
return val & ~cc_mask;
default:
@@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val)
/* See comment in cc_mkenc() */
switch (vendor) {
case CC_VENDOR_AMD:
- return val & ~cc_mask;
+ if (sev_status & MSR_AMD64_SNP_VTOM)
+ return val | cc_mask;
+ else
+ return val & ~cc_mask;
case CC_VENDOR_INTEL:
return val | cc_mask;
default:
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 41ef036..edbc67e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -29,7 +29,6 @@
#include <linux/syscore_ops.h>
#include <clocksource/hyperv_timer.h>
#include <linux/highmem.h>
-#include <linux/swiotlb.h>
int hyperv_init_cpuhp;
u64 hv_current_partition_id = ~0ull;
@@ -504,16 +503,6 @@ void __init hyperv_init(void)
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
-#ifdef CONFIG_SWIOTLB
- /*
- * Swiotlb bounce buffer needs to be mapped in extra address
- * space. Map function doesn't work in the early place and so
- * call swiotlb_update_mem_attributes() here.
- */
- if (hv_is_isolation_supported())
- swiotlb_update_mem_attributes();
-#endif
-
return;
clean_guest_os_id:
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 5648efb..f6a020c 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -13,6 +13,8 @@
#include <asm/svm.h>
#include <asm/sev.h>
#include <asm/io.h>
+#include <asm/coco.h>
+#include <asm/mem_encrypt.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
@@ -233,7 +235,6 @@ void hv_ghcb_msr_read(u64 msr, u64 *value)
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(hv_ghcb_msr_read);
-#endif
/*
* hv_mark_gpa_visibility - Set pages visible to host via hvcall.
@@ -286,27 +287,25 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
}
/*
- * hv_set_mem_host_visibility - Set specified memory visible to host.
+ * hv_vtom_set_host_visibility - Set specified memory visible to host.
*
* In Isolation VM, all guest memory is encrypted from host and guest
* needs to set memory visible to host via hvcall before sharing memory
* with host. This function works as wrap of hv_mark_gpa_visibility()
* with memory base and size.
*/
-int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visible)
+static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc)
{
- enum hv_mem_host_visibility visibility = visible ?
- VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE;
+ enum hv_mem_host_visibility visibility = enc ?
+ VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
u64 *pfn_array;
int ret = 0;
+ bool result = true;
int i, pfn;
- if (!hv_is_isolation_supported() || !hv_hypercall_pg)
- return 0;
-
pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
if (!pfn_array)
- return -ENOMEM;
+ return false;
for (i = 0, pfn = 0; i < pagecount; i++) {
pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
@@ -315,17 +314,68 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl
if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
ret = hv_mark_gpa_visibility(pfn, pfn_array,
visibility);
- if (ret)
+ if (ret) {
+ result = false;
goto err_free_pfn_array;
+ }
pfn = 0;
}
}
err_free_pfn_array:
kfree(pfn_array);
- return ret;
+ return result;
}
+static bool hv_vtom_tlb_flush_required(bool private)
+{
+ return true;
+}
+
+static bool hv_vtom_cache_flush_required(void)
+{
+ return false;
+}
+
+static bool hv_is_private_mmio(u64 addr)
+{
+ /*
+ * Hyper-V always provides a single IO-APIC in a guest VM.
+ * When a paravisor is used, it is emulated by the paravisor
+ * in the guest context and must be mapped private.
+ */
+ if (addr >= HV_IOAPIC_BASE_ADDRESS &&
+ addr < (HV_IOAPIC_BASE_ADDRESS + PAGE_SIZE))
+ return true;
+
+ /* Same with a vTPM */
+ if (addr >= VTPM_BASE_ADDRESS &&
+ addr < (VTPM_BASE_ADDRESS + PAGE_SIZE))
+ return true;
+
+ return false;
+}
+
+void __init hv_vtom_init(void)
+{
+ /*
+ * By design, a VM using vTOM doesn't see the SEV setting,
+ * so SEV initialization is bypassed and sev_status isn't set.
+ * Set it here to indicate a vTOM VM.
+ */
+ sev_status = MSR_AMD64_SNP_VTOM;
+ cc_set_vendor(CC_VENDOR_AMD);
+ cc_set_mask(ms_hyperv.shared_gpa_boundary);
+ physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
+
+ x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
+ x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
+ x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+ x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
/*
* hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
*/
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 3d98c3a..d2c6a2e 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -7,7 +7,6 @@
enum cc_vendor {
CC_VENDOR_NONE,
CC_VENDOR_AMD,
- CC_VENDOR_HYPERV,
CC_VENDOR_INTEL,
};
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 4c4c0ec..e3cef98 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,14 @@
#include <asm/paravirt.h>
#include <asm/mshyperv.h>
+/*
+ * Hyper-V always provides a single IO-APIC at this MMIO address.
+ * Ideally, the value should be looked up in ACPI tables, but it
+ * is needed for mapping the IO-APIC early in boot on Confidential
+ * VMs, before ACPI functions can be used.
+ */
+#define HV_IOAPIC_BASE_ADDRESS 0xfec00000
+
union hv_ghcb;
DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
@@ -206,18 +214,19 @@ static inline void hv_apic_init(void) {}
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
#ifdef CONFIG_AMD_MEM_ENCRYPT
void hv_ghcb_msr_write(u64 msr, u64 value);
void hv_ghcb_msr_read(u64 msr, u64 *value);
bool hv_ghcb_negotiate_protocol(void);
void hv_ghcb_terminate(unsigned int set, unsigned int reason);
+void hv_vtom_init(void);
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
static inline bool hv_ghcb_negotiate_protocol(void) { return false; }
static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {}
+static inline void hv_vtom_init(void) {}
#endif
extern bool hv_isolation_type_snp(void);
@@ -259,11 +268,6 @@ static inline void hv_set_register(unsigned int reg, u64 value) { }
static inline u64 hv_get_register(unsigned int reg) { return 0; }
static inline void hv_set_non_nested_register(unsigned int reg, u64 value) { }
static inline u64 hv_get_non_nested_register(unsigned int reg) { return 0; }
-static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages,
- bool visible)
-{
- return -1;
-}
#endif /* CONFIG_HYPERV */
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index f36dc2f..ded7506 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -33,7 +33,6 @@
#include <asm/nmi.h>
#include <clocksource/hyperv_timer.h>
#include <asm/numa.h>
-#include <asm/coco.h>
/* Is Linux running as the root partition? */
bool hv_root_partition;
@@ -397,8 +396,10 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
- ms_hyperv.shared_gpa_boundary =
- BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
+
+ if (ms_hyperv.shared_gpa_boundary_active)
+ ms_hyperv.shared_gpa_boundary =
+ BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
@@ -409,11 +410,6 @@ static void __init ms_hyperv_init_platform(void)
swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
#endif
}
- /* Isolation VMs are unenlightened SEV-based VMs, thus this check: */
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE)
- cc_set_vendor(CC_VENDOR_HYPERV);
- }
}
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
@@ -482,6 +478,9 @@ static void __init ms_hyperv_init_platform(void)
i8253_clear_counter_on_shutdown = false;
#if IS_ENABLED(CONFIG_HYPERV)
+ if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) ||
+ (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP))
+ hv_vtom_init();
/*
* Setup the hook to get control post apic initialization.
*/
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 356758b..b037954 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2175,9 +2175,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
- if (hv_is_isolation_supported())
- return hv_set_mem_host_visibility(addr, numpages, !enc);
-
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return __set_memory_enc_pgtable(addr, numpages, enc);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index d24dd65..e9e1c41 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2156,7 +2156,6 @@ void vmbus_device_unregister(struct hv_device *device_obj)
* VMBUS is an acpi enumerated device. Get the information we
* need from DSDT.
*/
-#define VTPM_BASE_ADDRESS 0xfed40000
static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
{
resource_size_t start = 0;
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 8845a2e..90d7f68 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -26,6 +26,8 @@
#include <asm/ptrace.h>
#include <asm/hyperv-tlfs.h>
+#define VTPM_BASE_ADDRESS 0xfed40000
+
struct ms_hyperv_info {
u32 features;
u32 priv_high;
--
1.8.3.1
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove swiotlb_unencrypted_base and the associated
code.
Signed-off-by: Michael Kelley <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 7 +------
include/linux/swiotlb.h | 2 --
kernel/dma/swiotlb.c | 45 +-----------------------------------------
3 files changed, 2 insertions(+), 52 deletions(-)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index ded7506..ff348eb 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,7 +18,6 @@
#include <linux/kexec.h>
#include <linux/i8253.h>
#include <linux/random.h>
-#include <linux/swiotlb.h>
#include <asm/processor.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
@@ -404,12 +403,8 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
- if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
+ if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
static_branch_enable(&isolation_type_snp);
-#ifdef CONFIG_SWIOTLB
- swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
-#endif
- }
}
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index bcef10e..2ef25e6 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -180,6 +180,4 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
}
#endif /* CONFIG_DMA_RESTRICTED_POOL */
-extern phys_addr_t swiotlb_unencrypted_base;
-
#endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 03e3251..9f7ba60 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,8 +73,6 @@ struct io_tlb_slot {
struct io_tlb_mem io_tlb_default_mem;
-phys_addr_t swiotlb_unencrypted_base;
-
static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
static unsigned long default_nareas;
@@ -202,34 +200,6 @@ static inline unsigned long nr_slots(u64 val)
}
/*
- * Remap swioltb memory in the unencrypted physical address space
- * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
- * Isolation VMs).
- */
-#ifdef CONFIG_HAS_IOMEM
-static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
-{
- void *vaddr = NULL;
-
- if (swiotlb_unencrypted_base) {
- phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
-
- vaddr = memremap(paddr, bytes, MEMREMAP_WB);
- if (!vaddr)
- pr_err("Failed to map the unencrypted memory %pa size %lx.\n",
- &paddr, bytes);
- }
-
- return vaddr;
-}
-#else
-static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
-{
- return NULL;
-}
-#endif
-
-/*
* Early SWIOTLB allocation may be too early to allow an architecture to
* perform the desired operations. This function allows the architecture to
* call SWIOTLB when the operations are possible. It needs to be called
@@ -238,18 +208,12 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
void __init swiotlb_update_mem_attributes(void)
{
struct io_tlb_mem *mem = &io_tlb_default_mem;
- void *vaddr;
unsigned long bytes;
if (!mem->nslabs || mem->late_alloc)
return;
- vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
- set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-
- mem->vaddr = swiotlb_mem_remap(mem, bytes);
- if (!mem->vaddr)
- mem->vaddr = vaddr;
+ set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);
}
static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -280,13 +244,6 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}
- /*
- * If swiotlb_unencrypted_base is set, the bounce buffer memory will
- * be remapped and cleared in swiotlb_update_mem_attributes.
- */
- if (swiotlb_unencrypted_base)
- return;
-
memset(vaddr, 0, bytes);
mem->vaddr = vaddr;
return;
--
1.8.3.1
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), it's no longer necessary to
have separate code paths for mapping VMBus ring buffers for
normal VMs and for Confidential VMs.
As such, remove the code path that uses vmap_pfn(), and set
the protection flags argument to vmap() to account for the
difference between normal and Confidential VMs.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
drivers/hv/ring_buffer.c | 62 ++++++++++++++++--------------------------------
1 file changed, 20 insertions(+), 42 deletions(-)
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2111e97..3c9b024 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -186,8 +186,6 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
struct page *pages, u32 page_cnt, u32 max_pkt_size)
{
struct page **pages_wraparound;
- unsigned long *pfns_wraparound;
- u64 pfn;
int i;
BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
@@ -196,50 +194,30 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
* First page holds struct hv_ring_buffer, do wraparound mapping for
* the rest.
*/
- if (hv_isolation_type_snp()) {
- pfn = page_to_pfn(pages) +
- PFN_DOWN(ms_hyperv.shared_gpa_boundary);
+ pages_wraparound = kcalloc(page_cnt * 2 - 1,
+ sizeof(struct page *),
+ GFP_KERNEL);
+ if (!pages_wraparound)
+ return -ENOMEM;
- pfns_wraparound = kcalloc(page_cnt * 2 - 1,
- sizeof(unsigned long), GFP_KERNEL);
- if (!pfns_wraparound)
- return -ENOMEM;
-
- pfns_wraparound[0] = pfn;
- for (i = 0; i < 2 * (page_cnt - 1); i++)
- pfns_wraparound[i + 1] = pfn + i % (page_cnt - 1) + 1;
-
- ring_info->ring_buffer = (struct hv_ring_buffer *)
- vmap_pfn(pfns_wraparound, page_cnt * 2 - 1,
- pgprot_decrypted(PAGE_KERNEL));
- kfree(pfns_wraparound);
-
- if (!ring_info->ring_buffer)
- return -ENOMEM;
-
- /* Zero ring buffer after setting memory host visibility. */
- memset(ring_info->ring_buffer, 0x00, PAGE_SIZE * page_cnt);
- } else {
- pages_wraparound = kcalloc(page_cnt * 2 - 1,
- sizeof(struct page *),
- GFP_KERNEL);
- if (!pages_wraparound)
- return -ENOMEM;
-
- pages_wraparound[0] = pages;
- for (i = 0; i < 2 * (page_cnt - 1); i++)
- pages_wraparound[i + 1] =
- &pages[i % (page_cnt - 1) + 1];
+ pages_wraparound[0] = pages;
+ for (i = 0; i < 2 * (page_cnt - 1); i++)
+ pages_wraparound[i + 1] =
+ &pages[i % (page_cnt - 1) + 1];
- ring_info->ring_buffer = (struct hv_ring_buffer *)
- vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
- PAGE_KERNEL);
+ ring_info->ring_buffer = (struct hv_ring_buffer *)
+ vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP,
+ pgprot_decrypted(PAGE_KERNEL));
- kfree(pages_wraparound);
- if (!ring_info->ring_buffer)
- return -ENOMEM;
- }
+ kfree(pages_wraparound);
+ if (!ring_info->ring_buffer)
+ return -ENOMEM;
+ /*
+ * Ensure the header page is zero'ed since
+ * encryption status may have changed.
+ */
+ memset(ring_info->ring_buffer, 0, HV_HYP_PAGE_SIZE);
ring_info->ring_buffer->read_index =
ring_info->ring_buffer->write_index = 0;
--
1.8.3.1
For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].
[1] https://lore.kernel.org/all/[email protected]/
Co-developed-by: Dexuan Cui <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
---
drivers/hv/channel_mgmt.c | 2 +-
drivers/pci/controller/pci-hyperv.c | 168 ++++++++++++++++++++++--------------
2 files changed, 105 insertions(+), 65 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index cc23b90..007f26d 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -67,7 +67,7 @@
{ .dev_type = HV_PCIE,
HV_PCIE_GUID,
.perf_device = false,
- .allowed_in_isolated = false,
+ .allowed_in_isolated = true,
},
/* Synthetic Frame Buffer */
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index d78a419..337f3b4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -514,6 +514,7 @@ struct hv_pcibus_device {
/* Highest slot of child device with resources allocated */
int wslot_res_allocated;
+ bool use_calls; /* Use hypercalls to access mmio cfg space */
/* hypercall arg, must not cross page boundary */
struct hv_retarget_device_interrupt retarget_msi_interrupt_params;
@@ -1123,8 +1124,10 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
int size, u32 *val)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ int offset = where + CFG_PAGE_OFFSET;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + where;
/*
* If the attempt is to read the IDs or the ROM BAR, simulate that.
@@ -1152,56 +1155,79 @@ static void _hv_pcifront_read_config(struct hv_pci_dev *hpdev, int where,
*/
*val = 0;
} else if (where + size <= CFG_PAGE_SIZE) {
- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
- /* Choose the function to be read. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start reading. */
- mb();
- /* Read from that function's config space. */
- switch (size) {
- case 1:
- *val = readb(addr);
- break;
- case 2:
- *val = readw(addr);
- break;
- default:
- *val = readl(addr);
- break;
+
+ spin_lock_irqsave(&hbus->config_lock, flags);
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start + offset;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_read_mmio(dev, addr, size, val);
+ } else {
+ void __iomem *addr = hbus->cfg_addr + offset;
+
+ /* Choose the function to be read. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before reading. */
+ mb();
+ /* Read from that function's config space. */
+ switch (size) {
+ case 1:
+ *val = readb(addr);
+ break;
+ case 2:
+ *val = readw(addr);
+ break;
+ default:
+ *val = readl(addr);
+ break;
+ }
+ /*
+ * Make sure the read was done before we release the
+ * spinlock allowing consecutive reads/writes.
+ */
+ mb();
}
- /*
- * Make sure the read was done before we release the spinlock
- * allowing consecutive reads/writes.
- */
- mb();
- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);
} else {
- dev_err(&hpdev->hbus->hdev->device,
- "Attempt to read beyond a function's config space.\n");
+ dev_err(dev, "Attempt to read beyond a function's config space.\n");
}
}
static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ u32 val;
u16 ret;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET +
- PCI_VENDOR_ID;
- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
+ spin_lock_irqsave(&hbus->config_lock, flags);
- /* Choose the function to be read. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start reading. */
- mb();
- /* Read from that function's config space. */
- ret = readw(addr);
- /*
- * mb() is not required here, because the spin_unlock_irqrestore()
- * is a barrier.
- */
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start +
+ CFG_PAGE_OFFSET + PCI_VENDOR_ID;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_read_mmio(dev, addr, 2, &val);
+ ret = val; /* Truncates to 16 bits */
+ } else {
+ void __iomem *addr = hbus->cfg_addr + CFG_PAGE_OFFSET +
+ PCI_VENDOR_ID;
+ /* Choose the function to be read. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before we start reading. */
+ mb();
+ /* Read from that function's config space. */
+ ret = readw(addr);
+ /*
+ * mb() is not required here, because the
+ * spin_unlock_irqrestore() is a barrier.
+ */
+ }
- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);
return ret;
}
@@ -1216,39 +1242,51 @@ static u16 hv_pcifront_get_vendor_id(struct hv_pci_dev *hpdev)
static void _hv_pcifront_write_config(struct hv_pci_dev *hpdev, int where,
int size, u32 val)
{
+ struct hv_pcibus_device *hbus = hpdev->hbus;
+ struct device *dev = &hbus->hdev->device;
+ int offset = where + CFG_PAGE_OFFSET;
unsigned long flags;
- void __iomem *addr = hpdev->hbus->cfg_addr + CFG_PAGE_OFFSET + where;
if (where >= PCI_SUBSYSTEM_VENDOR_ID &&
where + size <= PCI_CAPABILITY_LIST) {
/* SSIDs and ROM BARs are read-only */
} else if (where >= PCI_COMMAND && where + size <= CFG_PAGE_SIZE) {
- spin_lock_irqsave(&hpdev->hbus->config_lock, flags);
- /* Choose the function to be written. (See comment above) */
- writel(hpdev->desc.win_slot.slot, hpdev->hbus->cfg_addr);
- /* Make sure the function was chosen before we start writing. */
- wmb();
- /* Write to that function's config space. */
- switch (size) {
- case 1:
- writeb(val, addr);
- break;
- case 2:
- writew(val, addr);
- break;
- default:
- writel(val, addr);
- break;
+ spin_lock_irqsave(&hbus->config_lock, flags);
+
+ if (hbus->use_calls) {
+ phys_addr_t addr = hbus->mem_config->start + offset;
+
+ hv_pci_write_mmio(dev, hbus->mem_config->start, 4,
+ hpdev->desc.win_slot.slot);
+ hv_pci_write_mmio(dev, addr, size, val);
+ } else {
+ void __iomem *addr = hbus->cfg_addr + offset;
+
+ /* Choose the function to write. (See comment above) */
+ writel(hpdev->desc.win_slot.slot, hbus->cfg_addr);
+ /* Make sure the function was chosen before writing. */
+ wmb();
+ /* Write to that function's config space. */
+ switch (size) {
+ case 1:
+ writeb(val, addr);
+ break;
+ case 2:
+ writew(val, addr);
+ break;
+ default:
+ writel(val, addr);
+ break;
+ }
+ /*
+ * Make sure the write was done before we release the
+ * spinlock allowing consecutive reads/writes.
+ */
+ mb();
}
- /*
- * Make sure the write was done before we release the spinlock
- * allowing consecutive reads/writes.
- */
- mb();
- spin_unlock_irqrestore(&hpdev->hbus->config_lock, flags);
+ spin_unlock_irqrestore(&hbus->config_lock, flags);
} else {
- dev_err(&hpdev->hbus->hdev->device,
- "Attempt to write beyond a function's config space.\n");
+ dev_err(dev, "Attempt to write beyond a function's config space.\n");
}
}
@@ -3627,6 +3665,7 @@ static int hv_pci_probe(struct hv_device *hdev,
hbus->bridge->domain_nr = dom;
#ifdef CONFIG_X86
hbus->sysdata.domain = dom;
+ hbus->use_calls = !!(ms_hyperv.hints & HV_X64_USE_MMIO_HYPERCALLS);
#elif defined(CONFIG_ARM64)
/*
* Set the PCI bus parent to be the corresponding VMbus
@@ -3636,6 +3675,7 @@ static int hv_pci_probe(struct hv_device *hdev,
* information to devices created on the bus.
*/
hbus->sysdata.parent = hdev->device.parent;
+ hbus->use_calls = false;
#endif
hbus->hdev = hdev;
--
1.8.3.1
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove the code to create and manage the second
mapping for VMBus monitor pages. Because set_memory_decrypted()
and set_memory_encrypted() are no-ops in normal VMs, it's
not even necessary to test for being in a Confidential VM
(a.k.a., "Isolation VM").
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
drivers/hv/connection.c | 113 ++++++++++++----------------------------------
drivers/hv/hyperv_vmbus.h | 2 -
2 files changed, 28 insertions(+), 87 deletions(-)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index da51b50..5978e9d 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -104,8 +104,14 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
vmbus_connection.msg_conn_id = VMBUS_MESSAGE_CONNECTION_ID;
}
- msg->monitor_page1 = vmbus_connection.monitor_pages_pa[0];
- msg->monitor_page2 = vmbus_connection.monitor_pages_pa[1];
+ /*
+ * shared_gpa_boundary is zero in non-SNP VMs, so it's safe to always
+ * bitwise OR it
+ */
+ msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]) |
+ ms_hyperv.shared_gpa_boundary;
+ msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]) |
+ ms_hyperv.shared_gpa_boundary;
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
@@ -219,72 +225,27 @@ int vmbus_connect(void)
* Setup the monitor notification facility. The 1st page for
* parent->child and the 2nd page for child->parent
*/
- vmbus_connection.monitor_pages[0] = (void *)hv_alloc_hyperv_zeroed_page();
- vmbus_connection.monitor_pages[1] = (void *)hv_alloc_hyperv_zeroed_page();
+ vmbus_connection.monitor_pages[0] = (void *)hv_alloc_hyperv_page();
+ vmbus_connection.monitor_pages[1] = (void *)hv_alloc_hyperv_page();
if ((vmbus_connection.monitor_pages[0] == NULL) ||
(vmbus_connection.monitor_pages[1] == NULL)) {
ret = -ENOMEM;
goto cleanup;
}
- vmbus_connection.monitor_pages_original[0]
- = vmbus_connection.monitor_pages[0];
- vmbus_connection.monitor_pages_original[1]
- = vmbus_connection.monitor_pages[1];
- vmbus_connection.monitor_pages_pa[0]
- = virt_to_phys(vmbus_connection.monitor_pages[0]);
- vmbus_connection.monitor_pages_pa[1]
- = virt_to_phys(vmbus_connection.monitor_pages[1]);
-
- if (hv_is_isolation_supported()) {
- ret = set_memory_decrypted((unsigned long)
- vmbus_connection.monitor_pages[0],
- 1);
- ret |= set_memory_decrypted((unsigned long)
- vmbus_connection.monitor_pages[1],
- 1);
- if (ret)
- goto cleanup;
-
- /*
- * Isolation VM with AMD SNP needs to access monitor page via
- * address space above shared gpa boundary.
- */
- if (hv_isolation_type_snp()) {
- vmbus_connection.monitor_pages_pa[0] +=
- ms_hyperv.shared_gpa_boundary;
- vmbus_connection.monitor_pages_pa[1] +=
- ms_hyperv.shared_gpa_boundary;
-
- vmbus_connection.monitor_pages[0]
- = memremap(vmbus_connection.monitor_pages_pa[0],
- HV_HYP_PAGE_SIZE,
- MEMREMAP_WB);
- if (!vmbus_connection.monitor_pages[0]) {
- ret = -ENOMEM;
- goto cleanup;
- }
-
- vmbus_connection.monitor_pages[1]
- = memremap(vmbus_connection.monitor_pages_pa[1],
- HV_HYP_PAGE_SIZE,
- MEMREMAP_WB);
- if (!vmbus_connection.monitor_pages[1]) {
- ret = -ENOMEM;
- goto cleanup;
- }
- }
-
- /*
- * Set memory host visibility hvcall smears memory
- * and so zero monitor pages here.
- */
- memset(vmbus_connection.monitor_pages[0], 0x00,
- HV_HYP_PAGE_SIZE);
- memset(vmbus_connection.monitor_pages[1], 0x00,
- HV_HYP_PAGE_SIZE);
+ ret = set_memory_decrypted((unsigned long)
+ vmbus_connection.monitor_pages[0], 1);
+ ret |= set_memory_decrypted((unsigned long)
+ vmbus_connection.monitor_pages[1], 1);
+ if (ret)
+ goto cleanup;
- }
+ /*
+ * Set_memory_decrypted() will change the memory contents if
+ * decryption occurs, so zero monitor pages here.
+ */
+ memset(vmbus_connection.monitor_pages[0], 0x00, HV_HYP_PAGE_SIZE);
+ memset(vmbus_connection.monitor_pages[1], 0x00, HV_HYP_PAGE_SIZE);
msginfo = kzalloc(sizeof(*msginfo) +
sizeof(struct vmbus_channel_initiate_contact),
@@ -376,31 +337,13 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
- if (hv_is_isolation_supported()) {
- /*
- * memunmap() checks input address is ioremap address or not
- * inside. It doesn't unmap any thing in the non-SNP CVM and
- * so not check CVM type here.
- */
- memunmap(vmbus_connection.monitor_pages[0]);
- memunmap(vmbus_connection.monitor_pages[1]);
-
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[0],
- 1);
- set_memory_encrypted((unsigned long)
- vmbus_connection.monitor_pages_original[1],
- 1);
- }
+ set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[0], 1);
+ set_memory_encrypted((unsigned long)vmbus_connection.monitor_pages[1], 1);
- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[0]);
- hv_free_hyperv_page((unsigned long)
- vmbus_connection.monitor_pages_original[1]);
- vmbus_connection.monitor_pages_original[0] =
- vmbus_connection.monitor_pages[0] = NULL;
- vmbus_connection.monitor_pages_original[1] =
- vmbus_connection.monitor_pages[1] = NULL;
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
+ hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
+ vmbus_connection.monitor_pages[0] = NULL;
+ vmbus_connection.monitor_pages[1] = NULL;
}
/*
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index dc673ed..167ac51 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -241,8 +241,6 @@ struct vmbus_connection {
* is child->parent notification
*/
struct hv_monitor_page *monitor_pages[2];
- void *monitor_pages_original[2];
- phys_addr_t monitor_pages_pa[2];
struct list_head chn_msg_list;
spinlock_t channelmsg_lock;
--
1.8.3.1
With the vTOM bit now treated as a protection flag and not part of
the physical address, avoid remapping physical addresses with vTOM set
since technically such addresses aren't valid. Use ioremap_cache()
instead of memremap() to ensure that the mapping provides decrypted
access, which will correctly set the vTOM bit as a protection flag.
While this change is not required for correctness with the current
implementation of memremap(), for general code hygiene it's better to
not depend on the mapping functions doing something reasonable with
a physical address that is out-of-range.
While here, fix typos in two error messages.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 7 +++++--
drivers/hv/hv.c | 23 +++++++++++++----------
2 files changed, 18 insertions(+), 12 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index edbc67e..a5f9474 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -63,7 +63,10 @@ static int hyperv_init_ghcb(void)
* memory boundary and map it here.
*/
rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
- ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ ghcb_gpa &= ~ms_hyperv.shared_gpa_boundary;
+ ghcb_va = (void *)ioremap_cache(ghcb_gpa, HV_HYP_PAGE_SIZE);
if (!ghcb_va)
return -ENOMEM;
@@ -217,7 +220,7 @@ static int hv_cpu_die(unsigned int cpu)
if (hv_ghcb_pg) {
ghcb_va = (void **)this_cpu_ptr(hv_ghcb_pg);
if (*ghcb_va)
- memunmap(*ghcb_va);
+ iounmap(*ghcb_va);
*ghcb_va = NULL;
}
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 8b0dd8e..00823489 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -217,11 +217,13 @@ void hv_synic_enable_regs(unsigned int cpu)
simp.simp_enabled = 1;
if (hv_isolation_type_snp() || hv_root_partition) {
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ u64 base = (simp.base_simp_gpa << HV_HYP_PAGE_SHIFT) &
+ ~ms_hyperv.shared_gpa_boundary;
hv_cpu->synic_message_page
- = memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
- HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+ = (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
if (!hv_cpu->synic_message_page)
- pr_err("Fail to map syinc message page.\n");
+ pr_err("Fail to map synic message page.\n");
} else {
simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
>> HV_HYP_PAGE_SHIFT;
@@ -234,12 +236,13 @@ void hv_synic_enable_regs(unsigned int cpu)
siefp.siefp_enabled = 1;
if (hv_isolation_type_snp() || hv_root_partition) {
- hv_cpu->synic_event_page =
- memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
- HV_HYP_PAGE_SIZE, MEMREMAP_WB);
-
+ /* Mask out vTOM bit. ioremap_cache() maps decrypted */
+ u64 base = (siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT) &
+ ~ms_hyperv.shared_gpa_boundary;
+ hv_cpu->synic_event_page
+ = (void *)ioremap_cache(base, HV_HYP_PAGE_SIZE);
if (!hv_cpu->synic_event_page)
- pr_err("Fail to map syinc event page.\n");
+ pr_err("Fail to map synic event page.\n");
} else {
siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
>> HV_HYP_PAGE_SHIFT;
@@ -316,7 +319,7 @@ void hv_synic_disable_regs(unsigned int cpu)
*/
simp.simp_enabled = 0;
if (hv_isolation_type_snp() || hv_root_partition) {
- memunmap(hv_cpu->synic_message_page);
+ iounmap(hv_cpu->synic_message_page);
hv_cpu->synic_message_page = NULL;
} else {
simp.base_simp_gpa = 0;
@@ -328,7 +331,7 @@ void hv_synic_disable_regs(unsigned int cpu)
siefp.siefp_enabled = 0;
if (hv_isolation_type_snp() || hv_root_partition) {
- memunmap(hv_cpu->synic_event_page);
+ iounmap(hv_cpu->synic_event_page);
hv_cpu->synic_event_page = NULL;
} else {
siefp.base_siefp_gpa = 0;
--
1.8.3.1
To support PCI pass-thru devices in Confidential VMs, Hyper-V
has added hypercalls to read and write MMIO space. Add the
appropriate definitions to hyperv-tlfs.h and implement
functions to make the hypercalls.
Co-developed-by: Dexuan Cui <[email protected]>
Signed-off-by: Dexuan Cui <[email protected]>
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Haiyang Zhang <[email protected]>
---
arch/x86/include/asm/hyperv-tlfs.h | 3 ++
drivers/pci/controller/pci-hyperv.c | 64 +++++++++++++++++++++++++++++++++++++
include/asm-generic/hyperv-tlfs.h | 22 +++++++++++++
3 files changed, 89 insertions(+)
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 0b73a80..b4fb75b 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -122,6 +122,9 @@
/* Recommend using enlightened VMCS */
#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED BIT(14)
+/* Use hypercalls for MMIO config space access */
+#define HV_X64_USE_MMIO_HYPERCALLS BIT(21)
+
/*
* CPU management features identification.
* These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index f33370b..d78a419 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1041,6 +1041,70 @@ static int wslot_to_devfn(u32 wslot)
return PCI_DEVFN(slot_no.bits.dev, slot_no.bits.func);
}
+static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32 *val)
+{
+ struct hv_mmio_read_input *in;
+ struct hv_mmio_read_output *out;
+ u64 ret;
+
+ /*
+ * Must be called with interrupts disabled so it is safe
+ * to use the per-cpu input argument page. Use it for
+ * both input and output.
+ */
+ in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
+ in->gpa = gpa;
+ in->size = size;
+
+ ret = hv_do_hypercall(HVCALL_MMIO_READ, in, out);
+ if (hv_result_success(ret)) {
+ switch (size) {
+ case 1:
+ *val = *(u8 *)(out->data);
+ break;
+ case 2:
+ *val = *(u16 *)(out->data);
+ break;
+ default:
+ *val = *(u32 *)(out->data);
+ break;
+ }
+ } else
+ dev_err(dev, "MMIO read hypercall error %llx addr %llx size %d\n",
+ ret, gpa, size);
+}
+
+static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32 val)
+{
+ struct hv_mmio_write_input *in;
+ u64 ret;
+
+ /*
+ * Must be called with interrupts disabled so it is safe
+ * to use the per-cpu input argument memory.
+ */
+ in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ in->gpa = gpa;
+ in->size = size;
+ switch (size) {
+ case 1:
+ *(u8 *)(in->data) = val;
+ break;
+ case 2:
+ *(u16 *)(in->data) = val;
+ break;
+ default:
+ *(u32 *)(in->data) = val;
+ break;
+ }
+
+ ret = hv_do_hypercall(HVCALL_MMIO_WRITE, in, NULL);
+ if (!hv_result_success(ret))
+ dev_err(dev, "MMIO write hypercall error %llx addr %llx size %d\n",
+ ret, gpa, size);
+}
+
/*
* PCI Configuration Space for these root PCI buses is implemented as a pair
* of pages in memory-mapped I/O space. Writing to the first page chooses
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index b870983..ea406e9 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -168,6 +168,8 @@ struct ms_hyperv_tsc_page {
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
+#define HVCALL_MMIO_READ 0x0106
+#define HVCALL_MMIO_WRITE 0x0107
/* Extended hypercalls */
#define HV_EXT_CALL_QUERY_CAPABILITIES 0x8001
@@ -796,4 +798,24 @@ struct hv_memory_hint {
union hv_gpa_page_range ranges[];
} __packed;
+/* Data structures for HVCALL_MMIO_READ and HVCALL_MMIO_WRITE */
+#define HV_HYPERCALL_MMIO_MAX_DATA_LENGTH 64
+
+struct hv_mmio_read_input {
+ u64 gpa;
+ u32 size;
+ u32 reserved;
+} __packed;
+
+struct hv_mmio_read_output {
+ u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
+} __packed;
+
+struct hv_mmio_write_input {
+ u64 gpa;
+ u32 size;
+ u32 reserved;
+ u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
+} __packed;
+
#endif
--
1.8.3.1
With changes to how Hyper-V guest VMs flip memory between private
(encrypted) and shared (decrypted), creating a second kernel virtual
mapping for shared memory is no longer necessary. Everything needed
for the transition to shared is handled by set_memory_decrypted().
As such, remove the code to create and manage the second
mapping for the pre-allocated send and recv buffers. This mapping
is the last user of hv_map_memory()/hv_unmap_memory(), so delete
these functions as well. Finally, hv_map_memory() is the last
user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig
selection of VMAP_PFN.
Signed-off-by: Michael Kelley <[email protected]>
Reviewed-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 28 ------------------------
drivers/hv/Kconfig | 1 -
drivers/hv/hv_common.c | 11 ----------
drivers/net/hyperv/hyperv_net.h | 2 --
drivers/net/hyperv/netvsc.c | 48 ++---------------------------------------
include/asm-generic/mshyperv.h | 2 --
6 files changed, 2 insertions(+), 90 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index f6a020c..127d5b7 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -376,34 +376,6 @@ void __init hv_vtom_init(void)
#endif /* CONFIG_AMD_MEM_ENCRYPT */
-/*
- * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
- */
-void *hv_map_memory(void *addr, unsigned long size)
-{
- unsigned long *pfns = kcalloc(size / PAGE_SIZE,
- sizeof(unsigned long), GFP_KERNEL);
- void *vaddr;
- int i;
-
- if (!pfns)
- return NULL;
-
- for (i = 0; i < size / PAGE_SIZE; i++)
- pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
- (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
-
- vaddr = vmap_pfn(pfns, size / PAGE_SIZE, pgprot_decrypted(PAGE_KERNEL));
- kfree(pfns);
-
- return vaddr;
-}
-
-void hv_unmap_memory(void *addr)
-{
- vunmap(addr);
-}
-
enum hv_isolation_type hv_get_isolation_type(void)
{
if (!(ms_hyperv.priv_high & HV_ISOLATION))
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0747a8f..9a074cb 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -8,7 +8,6 @@ config HYPERV
|| (ARM64 && !CPU_BIG_ENDIAN))
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
- select VMAP_PFN
help
Select this option to run Linux as a Hyper-V client operating
system.
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 52a6f89..6d40b6c 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -311,14 +311,3 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
}
EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
-
-void __weak *hv_map_memory(void *addr, unsigned long size)
-{
- return NULL;
-}
-EXPORT_SYMBOL_GPL(hv_map_memory);
-
-void __weak hv_unmap_memory(void *addr)
-{
-}
-EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index dd5919e..33d51e3 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1139,7 +1139,6 @@ struct netvsc_device {
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
- void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1148,7 +1147,6 @@ struct netvsc_device {
/* Send buffer allocated by us */
void *send_buf;
- void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index da737d9..82e9796 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -154,17 +154,8 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
kfree(nvdev->extension);
-
- if (nvdev->recv_original_buf)
- vfree(nvdev->recv_original_buf);
- else
- vfree(nvdev->recv_buf);
-
- if (nvdev->send_original_buf)
- vfree(nvdev->send_original_buf);
- else
- vfree(nvdev->send_buf);
-
+ vfree(nvdev->recv_buf);
+ vfree(nvdev->send_buf);
bitmap_free(nvdev->send_section_map);
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -347,7 +338,6 @@ static int netvsc_init_buf(struct hv_device *device,
struct nvsp_message *init_packet;
unsigned int buf_size;
int i, ret = 0;
- void *vaddr;
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -383,17 +373,6 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
- if (hv_isolation_type_snp()) {
- vaddr = hv_map_memory(net_device->recv_buf, buf_size);
- if (!vaddr) {
- ret = -ENOMEM;
- goto cleanup;
- }
-
- net_device->recv_original_buf = net_device->recv_buf;
- net_device->recv_buf = vaddr;
- }
-
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -497,17 +476,6 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
- if (hv_isolation_type_snp()) {
- vaddr = hv_map_memory(net_device->send_buf, buf_size);
- if (!vaddr) {
- ret = -ENOMEM;
- goto cleanup;
- }
-
- net_device->send_original_buf = net_device->send_buf;
- net_device->send_buf = vaddr;
- }
-
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -762,12 +730,6 @@ void netvsc_device_remove(struct hv_device *device)
netvsc_teardown_send_gpadl(device, net_device, ndev);
}
- if (net_device->recv_original_buf)
- hv_unmap_memory(net_device->recv_buf);
-
- if (net_device->send_original_buf)
- hv_unmap_memory(net_device->send_buf);
-
/* Release all resources */
free_netvsc_device_rcu(net_device);
}
@@ -1844,12 +1806,6 @@ struct netvsc_device *netvsc_device_add(struct hv_device *device,
netif_napi_del(&net_device->chan_table[0].napi);
cleanup2:
- if (net_device->recv_original_buf)
- hv_unmap_memory(net_device->recv_buf);
-
- if (net_device->send_original_buf)
- hv_unmap_memory(net_device->send_buf);
-
free_netvsc_device(&net_device->rcu);
return ERR_PTR(ret);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 90d7f68..afcd9ae 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -271,8 +271,6 @@ static inline int cpumask_to_vpset_noself(struct hv_vpset *vpset,
void hyperv_cleanup(void);
bool hv_query_ext_cap(u64 cap_query);
void hv_setup_dma_ops(struct device *dev, bool coherent);
-void *hv_map_memory(void *addr, unsigned long size);
-void hv_unmap_memory(void *addr);
#else /* CONFIG_HYPERV */
static inline bool hv_is_hyperv_initialized(void) { return false; }
static inline bool hv_is_hibernation_supported(void) { return false; }
--
1.8.3.1
On Wed, Mar 08, 2023 at 06:40:07PM -0800, Michael Kelley wrote:
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index 49b44f8..d1c3306 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr)
> return amd_cc_platform_has(attr);
> case CC_VENDOR_INTEL:
> return intel_cc_platform_has(attr);
> - case CC_VENDOR_HYPERV:
> - return hyperv_cc_platform_has(attr);
> default:
> return false;
> }
> @@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val)
> * encryption status of the page.
> *
> * - for AMD, bit *set* means the page is encrypted
> - * - for Intel *clear* means encrypted.
> + * - for AMD with vTOM and for Intel, *clear* means encrypted
> */
> switch (vendor) {
> case CC_VENDOR_AMD:
> - return val | cc_mask;
> + if (sev_status & MSR_AMD64_SNP_VTOM)
> + return val & ~cc_mask;
This is silly. It should simply be:
if (sev_status & MSR_AMD64_SNP_VTOM)
return val;
> + else
> + return val | cc_mask;
> case CC_VENDOR_INTEL:
> return val & ~cc_mask;
> default:
> @@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val)
> /* See comment in cc_mkenc() */
> switch (vendor) {
> case CC_VENDOR_AMD:
> - return val & ~cc_mask;
> + if (sev_status & MSR_AMD64_SNP_VTOM)
> + return val | cc_mask;
So if you set the C-bit, that doesn't make it decrypted on AMD. cc_mask
on VTOM is 0 so why even bother?
Same as the above.
> + else
> + return val & ~cc_mask;
> case CC_VENDOR_INTEL:
> return val | cc_mask;
> default:
...
> +void __init hv_vtom_init(void)
> +{
> + /*
> + * By design, a VM using vTOM doesn't see the SEV setting,
> + * so SEV initialization is bypassed and sev_status isn't set.
> + * Set it here to indicate a vTOM VM.
> + */
This looks like a hack. The SEV status MSR cannot be intercepted so the
guest should see vTOM. How are you running vTOM without setting it even up?!
> + sev_status = MSR_AMD64_SNP_VTOM;
> + cc_set_vendor(CC_VENDOR_AMD);
> + cc_set_mask(ms_hyperv.shared_gpa_boundary);
> + physical_mask &= ms_hyperv.shared_gpa_boundary - 1;
> +
> + x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
> + x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
> + x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
> + x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
> +}
> +
> +#endif /* CONFIG_AMD_MEM_ENCRYPT */
> +
> /*
> * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
> */
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Mar 08, 2023 at 06:40:01PM -0800, Michael Kelley wrote:
> arch/x86/coco/core.c | 42 +++++--
> arch/x86/hyperv/hv_init.c | 18 +--
> arch/x86/hyperv/ivm.c | 148 +++++++++++++----------
> arch/x86/include/asm/coco.h | 1 -
> arch/x86/include/asm/hyperv-tlfs.h | 3 +
> arch/x86/include/asm/mshyperv.h | 16 ++-
> arch/x86/include/asm/x86_init.h | 4 +
> arch/x86/kernel/apic/io_apic.c | 16 ++-
> arch/x86/kernel/cpu/mshyperv.c | 22 ++--
> arch/x86/kernel/x86_init.c | 2 +
> arch/x86/mm/ioremap.c | 5 +
> arch/x86/mm/mem_encrypt_amd.c | 10 +-
> arch/x86/mm/pat/set_memory.c | 3 -
...
> 29 files changed, 440 insertions(+), 439 deletions(-)
The x86 bits look pretty much ready modulo some uncertainties in patch 6.
When this has all been clarified I'm thinking of taking patches
1-2,4-6
through tip and giving an immutable branch to the Hyper-V maintainers to base
the rest on top...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
From: Borislav Petkov <[email protected]> Sent: Monday, March 20, 2023 4:23 AM
>
> On Wed, Mar 08, 2023 at 06:40:07PM -0800, Michael Kelley wrote:
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index 49b44f8..d1c3306 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr)
> > return amd_cc_platform_has(attr);
> > case CC_VENDOR_INTEL:
> > return intel_cc_platform_has(attr);
> > - case CC_VENDOR_HYPERV:
> > - return hyperv_cc_platform_has(attr);
> > default:
> > return false;
> > }
> > @@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val)
> > * encryption status of the page.
> > *
> > * - for AMD, bit *set* means the page is encrypted
> > - * - for Intel *clear* means encrypted.
> > + * - for AMD with vTOM and for Intel, *clear* means encrypted
> > */
> > switch (vendor) {
> > case CC_VENDOR_AMD:
> > - return val | cc_mask;
> > + if (sev_status & MSR_AMD64_SNP_VTOM)
> > + return val & ~cc_mask;
>
> This is silly. It should simply be:
>
> if (sev_status & MSR_AMD64_SNP_VTOM)
> return val;
>
To be clear, cc_mask contains the vTOM bit; it's not zero. See the
call to cc_set_mask() further down. My code makes sure the
vTOM bit is *not* set for the encrypted case, just like the
CC_VENDOR_INTEL code below does for the TDX SHARED bit.
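A concrete example may help (illustrative values only: with 47 bits of
guest physical address space, ms_hyperv.shared_gpa_boundary is
1ULL << 46, so cc_set_mask() makes cc_mask == 0x400000000000):

	cc_mkdec(0x1000)         == 0x400000001000  /* vTOM bit set: shared   */
	cc_mkenc(0x400000001000) == 0x1000          /* vTOM bit clear: private */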
>
> > + else
> > + return val | cc_mask;
> > case CC_VENDOR_INTEL:
> > return val & ~cc_mask;
> > default:
> > @@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val)
> > /* See comment in cc_mkenc() */
> > switch (vendor) {
> > case CC_VENDOR_AMD:
> > - return val & ~cc_mask;
> > + if (sev_status & MSR_AMD64_SNP_VTOM)
> > + return val | cc_mask;
>
> So if you set the C-bit, that doesn't make it decrypted on AMD. cc_mask
> on VTOM is 0 so why even bother?
cc_mask is *not* zero in the vTOM case. It contains the vTOM bit.
The C-bit is not used or set in the vTOM case.
>
> Same as the above.
>
> > + else
> > + return val & ~cc_mask;
> > case CC_VENDOR_INTEL:
> > return val | cc_mask;
> > default:
>
> ...
>
> > +void __init hv_vtom_init(void)
> > +{
> > + /*
> > + * By design, a VM using vTOM doesn't see the SEV setting,
> > + * so SEV initialization is bypassed and sev_status isn't set.
> > + * Set it here to indicate a vTOM VM.
> > + */
>
> This looks like a hack. The SEV status MSR cannot be intercepted so the
> guest should see vTOM. How are you running vTOM without setting it even up?!
>
In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return
Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV
and does not populate sev_status. The Linux boot sequence proceeds as if
SEV-SNP (and any other memory encryption) is *not* enabled, which is the
whole point of vTOM mode. The tricky SME/SEV code for setting the C-bit,
getting the kernel encrypted, etc. is not needed or wanted because the
hardware already encrypts all memory by default. The bootloader runs
with memory encrypted, it loads the kernel into encrypted memory, and
so forth.
sev_status is used here only to communicate to cc_platform_has() and
cc_mkenc() and cc_mkdec() that we're in vTOM mode. If using sev_status
to communicate is confusing, this could just as easily be some new global
variable, and sev_status could be left as all zeros.
Michael
On Mon, Mar 20, 2023 at 01:30:54PM +0000, Michael Kelley (LINUX) wrote:
> In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return
> Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV
> and does not populate sev_status.
So how much of the hardware side of vTOM are you actually using besides
the actual encryption?
Virtual TOM MSR (C001_0135)? Anything else?
AFAICT, you're passing the vTOM value from CPUID from the hypervisor so
I'm guessing that happens underneath in the hypervisor?
I'd like to make sure there are no more "surprises" down the road...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
From: Borislav Petkov <[email protected]> Sent: Monday, March 20, 2023 11:17 AM
>
> On Mon, Mar 20, 2023 at 01:30:54PM +0000, Michael Kelley (LINUX) wrote:
> > In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return
> > Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV
> > and does not populate sev_status.
>
> So how much of the hardware side of vTOM are you actually using besides
> the actual encryption?
vTOM mode in Linux is just turning on/off the vTOM bit in the PTE
to create unencrypted or encrypted mappings, with encrypted being the
default. There's no other hardware dependency except CPUID leaf
0x8000001f reporting that SEV is not enabled, and the GHCB protocol
(if you want to call that "hardware") as mentioned below.
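For reference, the existing pgprot helpers already funnel through
cc_mkenc()/cc_mkdec(), so no new PTE-level mechanism is needed. Roughly
(a sketch of the current definitions in arch/x86/include/asm/pgtable.h):

	#define pgprot_encrypted(prot)	__pgprot(cc_mkenc(pgprot_val(prot)))
	#define pgprot_decrypted(prot)	__pgprot(cc_mkdec(pgprot_val(prot)))

With cc_mask set to shared_gpa_boundary, those reduce to setting or
clearing the vTOM bit in the protection bits.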
>
> Virtual TOM MSR (C001_0135)? Anything else?
>
> AFAICT, you're passing the vTOM value from CPUID from the hypervisor so
> I'm guessing that happens underneath in the hypervisor?
Correct. Linux in vTOM mode is not reading MSR 0xC0010135. The
PTE bit position of the vTOM bit is coming from Hyper-V (or the paravisor)
via a synthetic MSR. Presumably Hyper-V or the paravisor is reading
the vTOM MSR, but I haven't reviewed that code.
>
> I'd like to make sure there are no more "surprises" down the road...
>
The only other vTOM changes are for software protocols for communication
between the guest and Hyper-V (or the paravisor). Some hypercalls and
synthetic MSR accesses need to bypass the paravisor and are handled
with the GHCB protocol. The Hyper-V and VMbus specific code in Linux
handles those idiosyncrasies. That code went into the 5.15 kernel and
isn't modified by this patch set.
The vTOM case is down to the bare minimum in the use of the hardware
functionality, so it's unlikely anything else would turn up as being different.
Michael
On Mon, Mar 20, 2023 at 06:50:05PM +0000, Michael Kelley (LINUX) wrote:
> The vTOM case is down to the bare minimum in the use of the hardware
> functionality, so it's unlikely anything else would turn up as being
> different.
Ok, lemme queue 1-2,4-6 as previously mentioned.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Mar 08, 2023 at 06:40:13PM -0800, Michael Kelley wrote:
> To support PCI pass-thru devices in Confidential VMs, Hyper-V
> has added hypercalls to read and write MMIO space. Add the
> appropriate definitions to hyperv-tlfs.h and implement
> functions to make the hypercalls.
>
> Co-developed-by: Dexuan Cui <[email protected]>
> Signed-off-by: Dexuan Cui <[email protected]>
> Signed-off-by: Michael Kelley <[email protected]>
> Reviewed-by: Haiyang Zhang <[email protected]>
> ---
> arch/x86/include/asm/hyperv-tlfs.h | 3 ++
> drivers/pci/controller/pci-hyperv.c | 64 +++++++++++++++++++++++++++++++++++++
> include/asm-generic/hyperv-tlfs.h | 22 +++++++++++++
> 3 files changed, 89 insertions(+)
Nit: I'd squash this in with the patch where the calls are used,
don't think this patch is bisectable as it stands (maybe you
split them for review purposes, apologies if so).
Lorenzo
From: Lorenzo Pieralisi <[email protected]> Sent: Friday, March 24, 2023 7:48 AM
>
> On Wed, Mar 08, 2023 at 06:40:13PM -0800, Michael Kelley wrote:
> > To support PCI pass-thru devices in Confidential VMs, Hyper-V
> > has added hypercalls to read and write MMIO space. Add the
> > appropriate definitions to hyperv-tlfs.h and implement
> > functions to make the hypercalls.
> >
> > Co-developed-by: Dexuan Cui <[email protected]>
> > Signed-off-by: Dexuan Cui <[email protected]>
> > Signed-off-by: Michael Kelley <[email protected]>
> > Reviewed-by: Haiyang Zhang <[email protected]>
> > ---
> > arch/x86/include/asm/hyperv-tlfs.h | 3 ++
> > drivers/pci/controller/pci-hyperv.c | 64
> +++++++++++++++++++++++++++++++++++++
> > include/asm-generic/hyperv-tlfs.h | 22 +++++++++++++
> > 3 files changed, 89 insertions(+)
>
> Nit: I'd squash this in with the patch where the calls are used,
> don't think this patch is bisectable as it stands (maybe you
> split them for review purposes, apologies if so).
>
> Lorenzo
I did split the new code into two patches to make it more
consumable from a review standpoint. But I'm not understanding
what you mean by not being bisectable. After applying the first
of the two patches, everything should still compile and work
even though there are no users of the new hypercalls. Or maybe
your concern is that there would be "unused function" warnings?
In any case, squashing the two patches isn't a problem.
Michael
On Fri, Mar 24, 2023 at 03:13:02PM +0000, Michael Kelley (LINUX) wrote:
> From: Lorenzo Pieralisi <[email protected]> Sent: Friday, March 24, 2023 7:48 AM
> >
> > On Wed, Mar 08, 2023 at 06:40:13PM -0800, Michael Kelley wrote:
> > > To support PCI pass-thru devices in Confidential VMs, Hyper-V
> > > has added hypercalls to read and write MMIO space. Add the
> > > appropriate definitions to hyperv-tlfs.h and implement
> > > functions to make the hypercalls.
> > >
> > > Co-developed-by: Dexuan Cui <[email protected]>
> > > Signed-off-by: Dexuan Cui <[email protected]>
> > > Signed-off-by: Michael Kelley <[email protected]>
> > > Reviewed-by: Haiyang Zhang <[email protected]>
> > > ---
> > > arch/x86/include/asm/hyperv-tlfs.h | 3 ++
> > > drivers/pci/controller/pci-hyperv.c | 64
> > +++++++++++++++++++++++++++++++++++++
> > > include/asm-generic/hyperv-tlfs.h | 22 +++++++++++++
> > > 3 files changed, 89 insertions(+)
> >
> > Nit: I'd squash this in with the patch where the calls are used,
> > don't think this patch is bisectable as it stands (maybe you
> > split them for review purposes, apologies if so).
> >
> > Lorenzo
>
> I did split the new code into two patches to make it more
> consumable from a review standpoint. But I'm not understanding
> what you mean by not being bisectable. After applying the first
> of the two patches, everything should still compile and work
> even though there are no users of the new hypercalls. Or maybe
> your concern is that there would be "unused function" warnings?
That's what I meant - that's it.
> In any case, squashing the two patches isn't a problem.
Thanks,
Lorenzo
On Thu, Mar 23, 2023 at 02:43:06PM +0100, Borislav Petkov wrote:
> Ok, lemme queue 1-2,4-6 as previously mentioned.
With first six applied:
arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status'
if (sev_status & MSR_AMD64_SNP_VTOM)
^
arch/x86/coco/core.c:139:7: error: use of undeclared identifier 'sev_status'
if (sev_status & MSR_AMD64_SNP_VTOM)
^
2 errors generated.
make[3]: *** [scripts/Makefile.build:252: arch/x86/coco/core.o] Error 1
make[2]: *** [scripts/Makefile.build:494: arch/x86/coco] Error 2
make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2025: .] Error 2
compiler is:
Debian clang version 14.0.6-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
.config is attached.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
> From: Borislav Petkov <[email protected]>
> Sent: Friday, March 24, 2023 8:49 AM
> ...
> With first six applied:
>
> arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status'
> if (sev_status & MSR_AMD64_SNP_VTOM)
> ^
Your config doesn't define CONFIG_AMD_MEM_ENCRYPT:
# CONFIG_AMD_MEM_ENCRYPT is not set
Hi,
On 3/24/23 10:10 AM, Dexuan Cui wrote:
>> From: Borislav Petkov <[email protected]>
>> Sent: Friday, March 24, 2023 8:49 AM
>> ...
>> With first six applied:
>>
>> arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status'
>> if (sev_status & MSR_AMD64_SNP_VTOM)
>> ^
>
> Your config doesn't define CONFIG_AMD_MEM_ENCRYPT:
> # CONFIG_AMD_MEM_ENCRYPT is not set
If you have a config dependency, I think you should fix it in the code or
add a Kconfig dependency.
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
On Fri, Mar 24, 2023 at 05:10:26PM +0000, Dexuan Cui wrote:
> Your config doesn't define CONFIG_AMD_MEM_ENCRYPT:
> # CONFIG_AMD_MEM_ENCRYPT is not set
That's why they're called randconfig builds. That doesn't mean that they
should not build properly.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
From: Borislav Petkov <[email protected]> Sent: Friday, March 24, 2023 8:49 AM
>
> On Thu, Mar 23, 2023 at 02:43:06PM +0100, Borislav Petkov wrote:
> > Ok, lemme queue 1-2,4-6 as previously mentioned.
>
> With first six applied:
>
> arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status'
> if (sev_status & MSR_AMD64_SNP_VTOM)
> ^
> arch/x86/coco/core.c:139:7: error: use of undeclared identifier 'sev_status'
> if (sev_status & MSR_AMD64_SNP_VTOM)
> ^
> 2 errors generated.
> make[3]: *** [scripts/Makefile.build:252: arch/x86/coco/core.o] Error 1
> make[2]: *** [scripts/Makefile.build:494: arch/x86/coco] Error 2
> make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2
> make[1]: *** Waiting for unfinished jobs....
> make: *** [Makefile:2025: .] Error 2
>
> compiler is:
>
> Debian clang version 14.0.6-2
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
>
> .config is attached.
>
OK, I see what went wrong. I had tested with CONFIG_AMD_MEM_ENCRYPT=n
and didn't see any compile problems. It turns out that in my test,
arch/x86/coco/core.c wasn't built at all because I did not also have
TDX configured. But with CONFIG_INTEL_TDX_GUEST=y, coco/core.c gets
built, and the error with the undefined sev_status pops out.
The straightforward fix is somewhat ugly. That's to put #ifdef
CONFIG_AMD_MEM_ENCRYPT around the entire CC_VENDOR_AMD
case in cc_mkenc() and in cc_mkdec(). Or put it just around the test of
sev_status.
Perhaps a cleaner way would be to have a "vendor_subtype" variable
declared in arch/x86/coco/core.c and tested instead of sev_status.
That subtype variable would be set from hv_vtom_init(), maybe via
a separate accessor function. But didn't I recently see a patch that
makes the existing "vendor" variable no longer static? In that case
just setting vendor_subtype without the accessor function may be
OK.
What's your preference, Boris? I can spin a v7 of the patch series
that fixes this and squashes the last two patches of the series
per Lorenzo Pieralisi's comments.
Michael
From: Michael Kelley (LINUX) <[email protected]>
Actually, a pretty clean approach is to #define sev_status 0ULL in
the #else /* !CONFIG_AMD_MEM_ENCRYPT */ half of
arch/x86/include/asm/mem_encrypt.h. That's where the existing
extern statement is, and sme_me_mask is already handled that way.
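A minimal sketch of that approach (hypothetical until the v7 respin;
it just follows the existing sme_me_mask pattern in that header):

	/* arch/x86/include/asm/mem_encrypt.h */
	#ifdef CONFIG_AMD_MEM_ENCRYPT
	extern u64 sme_me_mask;
	extern u64 sev_status;
	#else	/* !CONFIG_AMD_MEM_ENCRYPT */
	#define sme_me_mask	0ULL
	#define sev_status	0ULL
	#endif

With sev_status defined as 0ULL, the MSR_AMD64_SNP_VTOM test in
cc_mkenc()/cc_mkdec() compiles to a constant false when AMD memory
encryption support isn't configured.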
I'll respin the patch set with that approach.
Michael
The following commit has been merged into the x86/sev branch of tip:
Commit-ID: e45e761b77bc0739e7e23258c4394013bbb919c7
Gitweb: https://git.kernel.org/tip/e45e761b77bc0739e7e23258c4394013bbb919c7
Author: Michael Kelley <[email protected]>
AuthorDate: Wed, 08 Mar 2023 18:40:06 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 27 Mar 2023 09:24:01 +02:00
init: Call mem_encrypt_init() after Hyper-V hypercall init is done
Full Hyper-V initialization, including support for hypercalls, is done
as an apic_post_init callback via late_time_init(). mem_encrypt_init()
needs to make hypercalls when it marks swiotlb memory as decrypted.
But mem_encrypt_init() is currently called a few lines before
late_time_init(), so the hypercalls don't work.
Fix this by moving mem_encrypt_init() after late_time_init() and
related clock initializations. The intervening initializations don't
do any I/O that requires the swiotlb, so moving mem_encrypt_init()
slightly later has no impact.
Signed-off-by: Michael Kelley <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
init/main.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/init/main.c b/init/main.c
index 4425d17..7e9c0ca 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1088,14 +1088,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
*/
locking_selftest();
- /*
- * This needs to be called before any devices perform DMA
- * operations that might use the SWIOTLB bounce buffers. It will
- * mark the bounce buffers as decrypted so that their usage will
- * not cause "plain-text" data to be decrypted when accessed.
- */
- mem_encrypt_init();
-
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
@@ -1112,6 +1104,17 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
late_time_init();
sched_clock_init();
calibrate_delay();
+
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the SWIOTLB bounce buffers. It will
+ * mark the bounce buffers as decrypted so that their usage will
+ * not cause "plain-text" data to be decrypted when accessed. It
+ * must be called after late_time_init() so that Hyper-V x86/x64
+ * hypercalls work when the SWIOTLB bounce buffers are decrypted.
+ */
+ mem_encrypt_init();
+
pid_idr_init();
anon_vma_init();
#ifdef CONFIG_X86
The following commit has been merged into the x86/sev branch of tip:
Commit-ID: c7b5254bd802ee3868f1c59333545272dc700d6d
Gitweb: https://git.kernel.org/tip/c7b5254bd802ee3868f1c59333545272dc700d6d
Author: Michael Kelley <[email protected]>
AuthorDate: Wed, 08 Mar 2023 18:40:05 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 27 Mar 2023 09:23:21 +02:00
x86/mm: Handle decryption/re-encryption of bss_decrypted consistently
sme_postprocess_startup() decrypts the bss_decrypted section when
sme_me_mask is non-zero.
mem_encrypt_free_decrypted_mem() re-encrypts the unused portion based
on CC_ATTR_MEM_ENCRYPT.
In a Hyper-V guest VM using vTOM, these conditions are not equivalent
as sme_me_mask is always zero when using vTOM. Consequently,
mem_encrypt_free_decrypted_mem() attempts to re-encrypt memory that was
never decrypted.
So check sme_me_mask in mem_encrypt_free_decrypted_mem() too.
Hyper-V guests using vTOM don't need the bss_decrypted section to be
decrypted, so skipping the decryption/re-encryption doesn't cause a
problem.
Signed-off-by: Michael Kelley <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/mm/mem_encrypt_amd.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index 9c4d8db..e0b51c0 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -513,10 +513,14 @@ void __init mem_encrypt_free_decrypted_mem(void)
npages = (vaddr_end - vaddr) >> PAGE_SHIFT;
/*
- * The unused memory range was mapped decrypted, change the encryption
- * attribute from decrypted to encrypted before freeing it.
+ * If the unused memory range was mapped decrypted, change the encryption
+ * attribute from decrypted to encrypted before freeing it. Base the
+ * re-encryption on the same condition used for the decryption in
+ * sme_postprocess_startup(). Higher level abstractions, such as
+ * CC_ATTR_MEM_ENCRYPT, aren't necessarily equivalent in a Hyper-V VM
+ * using vTOM, where sme_me_mask is always zero.
*/
- if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) {
+ if (sme_me_mask) {
r = set_memory_encrypted(vaddr, npages);
if (r) {
pr_warn("failed to free unused decrypted pages\n");
The following commit has been merged into the x86/sev branch of tip:
Commit-ID: 88e378d400fa0544d51cf62037e7774d8a4b4379
Gitweb: https://git.kernel.org/tip/88e378d400fa0544d51cf62037e7774d8a4b4379
Author: Michael Kelley <[email protected]>
AuthorDate: Wed, 08 Mar 2023 18:40:02 -08:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Sun, 26 Mar 2023 23:42:40 +02:00
x86/ioremap: Add hypervisor callback for private MMIO mapping in coco VM
Current code always maps MMIO devices as shared (decrypted) in a
confidential computing VM. But Hyper-V guest VMs on AMD SEV-SNP with vTOM
use a paravisor running in VMPL0 to emulate some devices, such as the
IO-APIC and TPM. In such a case, the device must be accessed as private
(encrypted) because the paravisor emulates the device at an address below
vTOM, where all accesses are encrypted.
Add a new hypervisor callback to determine if an MMIO address should
be mapped private. The callback allows hypervisor-specific code to handle
any quirks, the use of a paravisor, etc. in determining whether a mapping
must be private. If the callback is not used by a hypervisor, default
to returning "false", which is consistent with normal coco VM behavior.
Use this callback as another special case to check for when doing
ioremap(). Just checking the starting address is sufficient as an
ioremap range must be all private or all shared.
Also make the callback in early boot IO-APIC mapping code that uses the
fixmap.
[ bp: Touchups. ]
Signed-off-by: Michael Kelley <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/x86_init.h | 4 ++++
arch/x86/kernel/apic/io_apic.c | 10 ++++++++--
arch/x86/kernel/x86_init.c | 2 ++
arch/x86/mm/ioremap.c | 5 +++++
4 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index c1c8c58..acc20ae 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -259,11 +259,15 @@ struct x86_legacy_features {
* VMMCALL under SEV-ES. Needs to return 'false'
* if the checks fail. Called from the #VC
* exception handler.
+ * @is_private_mmio: For CoCo VMs, must map MMIO address as private.
+ * Used when device is emulated by a paravisor
+ * layer in the VM context.
*/
struct x86_hyper_runtime {
void (*pin_vcpu)(int cpu);
void (*sev_es_hcall_prepare)(struct ghcb *ghcb, struct pt_regs *regs);
bool (*sev_es_hcall_finish)(struct ghcb *ghcb, struct pt_regs *regs);
+ bool (*is_private_mmio)(u64 addr);
};
/**
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1f83b05..146671d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -66,6 +66,7 @@
#include <asm/hw_irq.h>
#include <asm/apic.h>
#include <asm/pgtable.h>
+#include <asm/x86_init.h>
#define for_each_ioapic(idx) \
for ((idx) = 0; (idx) < nr_ioapics; (idx)++)
@@ -2680,10 +2681,15 @@ static void io_apic_set_fixmap(enum fixed_addresses idx, phys_addr_t phys)
pgprot_t flags = FIXMAP_PAGE_NOCACHE;
/*
- * Ensure fixmaps for IOAPIC MMIO respect memory encryption pgprot
+ * Ensure fixmaps for IO-APIC MMIO respect memory encryption pgprot
* bits, just like normal ioremap():
*/
- flags = pgprot_decrypted(flags);
+ if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
+ if (x86_platform.hyper.is_private_mmio(phys))
+ flags = pgprot_encrypted(flags);
+ else
+ flags = pgprot_decrypted(flags);
+ }
__set_fixmap(idx, phys, flags);
}
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index ef80d36..95be383 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -134,6 +134,7 @@ static void enc_status_change_prepare_noop(unsigned long vaddr, int npages, bool
static bool enc_status_change_finish_noop(unsigned long vaddr, int npages, bool enc) { return false; }
static bool enc_tlb_flush_required_noop(bool enc) { return false; }
static bool enc_cache_flush_required_noop(void) { return false; }
+static bool is_private_mmio_noop(u64 addr) {return false; }
struct x86_platform_ops x86_platform __ro_after_init = {
.calibrate_cpu = native_calibrate_cpu_early,
@@ -149,6 +150,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
.realmode_reserve = reserve_real_mode,
.realmode_init = init_real_mode,
.hyper.pin_vcpu = x86_op_int_noop,
+ .hyper.is_private_mmio = is_private_mmio_noop,
.guest = {
.enc_status_change_prepare = enc_status_change_prepare_noop,
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 6453fba..aa7d279 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -116,6 +116,11 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des
if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
+ if (x86_platform.hyper.is_private_mmio(addr)) {
+ desc->flags |= IORES_MAP_ENCRYPTED;
+ return;
+ }
+
if (!IS_ENABLED(CONFIG_EFI))
return;