2021-07-28 14:55:01

by Tianyu Lan

Subject: [PATCH 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

From: Tianyu Lan <[email protected]>


Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call it to mark memory visible to the host
before sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce buffers
are required.
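
As a rough illustration of how a range of pages can be handed to the
visibility hvcall in batches, here is a minimal userspace sketch. It is
not the code from this series: the callback type, the stats struct and
all sketch_* names are hypothetical, and the page size is assumed to be
4096 bytes. Only the batch limit (one page of u64 slots minus two slots
for the header) mirrors HV_MAX_MODIFY_GPA_REP_COUNT from patch 03.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096u
/* One page of u64 slots minus two header slots, as in patch 03. */
#define SKETCH_MAX_REP_COUNT ((SKETCH_PAGE_SIZE / sizeof(uint64_t)) - 2)

/* Hypothetical stand-in for the host visibility hvcall. */
typedef int (*sketch_mark_fn)(uint16_t count, const uint64_t *pfns,
			      uint32_t visibility, void *ctx);

/* Hypothetical bookkeeping used by the counting callback below. */
struct sketch_stats {
	size_t calls;
	size_t pages;
};

/* Counts batches and pages instead of calling into a hypervisor. */
static int sketch_count(uint16_t count, const uint64_t *pfns,
			uint32_t visibility, void *ctx)
{
	struct sketch_stats *s = ctx;

	(void)pfns;
	(void)visibility;
	s->calls++;
	s->pages += count;
	return 0;
}

/*
 * Walk pagecount consecutive PFNs starting at first_pfn and pass them
 * to mark() in batches of at most SKETCH_MAX_REP_COUNT entries.
 * Stops at the first failing batch and returns its error code.
 */
static int sketch_set_visibility(uint64_t first_pfn, size_t pagecount,
				 uint32_t visibility,
				 sketch_mark_fn mark, void *ctx)
{
	uint64_t batch[SKETCH_MAX_REP_COUNT];
	size_t i, n = 0;
	int ret;

	assert(mark != NULL);

	for (i = 0; i < pagecount; i++) {
		batch[n++] = first_pfn + i;

		/* Flush when the batch is full or the range is done. */
		if (n == SKETCH_MAX_REP_COUNT || i == pagecount - 1) {
			ret = mark((uint16_t)n, batch, visibility, ctx);
			if (ret)
				return ret;
			n = 0;
		}
	}

	return 0;
}
```

With the assumed 4096-byte page, a batch holds at most 510 PFNs, so a
range of 1021 pages would need three calls.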

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses and the host will access that memory. So add bounce buffer
allocation support in vmbus for these packets.

For SNP Isolation VMs, the guest needs to access the shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
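
The address arithmetic can be sketched in plain C as below. This is only
an illustration under assumptions: the sketch_* helpers are hypothetical,
the bit positions follow the isolation_config_b bitfield added in patch
02 (cvm_type in bits 0-3, one reserved bit, shared_gpa_boundary_active
in bit 5, shared_gpa_boundary_bits in bits 6-11), and returning the GPA
unchanged when the boundary is not active is an assumption of the sketch,
not something this series states.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 5 of isolation_config_b: shared_gpa_boundary_active. */
static unsigned int sketch_boundary_active(uint32_t config_b)
{
	return (config_b >> 5) & 0x1;
}

/* Bits 6-11 of isolation_config_b: shared_gpa_boundary_bits. */
static unsigned int sketch_boundary_bits(uint32_t config_b)
{
	return (config_b >> 6) & 0x3f;
}

/*
 * Return the address used to access a shared page: the GPA plus
 * shared_gpa_boundary, where the boundary is 1 << boundary_bits.
 * Assumption: with no active boundary the GPA is used as-is.
 */
static uint64_t sketch_shared_gpa(uint64_t gpa, uint32_t config_b)
{
	uint64_t boundary;

	if (!sketch_boundary_active(config_b))
		return gpa;

	boundary = (uint64_t)1 << sketch_boundary_bits(config_b);
	/* The GPA itself is expected to sit below the boundary. */
	assert(gpa < boundary);

	return gpa + boundary;
}
```

For example, with shared_gpa_boundary_bits equal to 39, a bounce buffer
page at GPA 0x1000 would be accessed at 0x1000 + (1 << 39).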

Change since RFC v4:
- Introduce dma map decrypted function to remap bounce buffer
and provide dma map decrypted ops for platform to hook callback.
- Split swiotlb and dma map decrypted change into two patches
- Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
- Add interface set_memory_decrypted_map() to decrypt memory and
map bounce buffer in extra address space
- Remove swiotlb remap function and store the remap address
returned by set_memory_decrypted_map() in swiotlb mem data structure.
- Introduce hv_set_mem_enc() to make code more readable in the __set_memory_enc_dec().

Change since RFC v2:
- Remove the "not UIO driver in Isolation VM" patch
- Use vmap_pfn() to replace ioremap_page_range function in
order to avoid exposing symbol ioremap_page_range() and
ioremap_page_range()
- Call hv set mem host visibility hvcall in set_memory_encrypted/decrypted()
- Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
- Fix code style


Tianyu Lan (13):
x86/HV: Initialize GHCB page in Isolation VM
x86/HV: Initialize shared memory boundary in the Isolation VM.
x86/HV: Add new hvcall guest address host visibility support
HV: Mark vmbus ring buffer visible to host in Isolation VM
HV: Add Write/Read MSR registers via ghcb page
HV: Add ghcb hvcall support for SNP VM
HV/Vmbus: Add SNP support for VMbus channel initiate message
HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
HV/Netvsc: Add Isolation VM support for netvsc driver
HV/Storvsc: Add Isolation VM support for storvsc driver

arch/x86/hyperv/Makefile | 2 +-
arch/x86/hyperv/hv_init.c | 87 +++++++--
arch/x86/hyperv/ivm.c | 296 +++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 18 ++
arch/x86/include/asm/mshyperv.h | 86 ++++++++-
arch/x86/include/asm/sev.h | 4 +
arch/x86/kernel/cpu/mshyperv.c | 5 +
arch/x86/kernel/sev-shared.c | 21 +-
arch/x86/mm/pat/set_memory.c | 6 +-
arch/x86/xen/pci-swiotlb-xen.c | 3 +-
drivers/hv/Kconfig | 1 +
drivers/hv/channel.c | 48 ++++-
drivers/hv/connection.c | 71 ++++++-
drivers/hv/hv.c | 129 +++++++++----
drivers/hv/hyperv_vmbus.h | 3 +
drivers/hv/ring_buffer.c | 84 ++++++--
drivers/hv/vmbus_drv.c | 3 +
drivers/iommu/hyperv-iommu.c | 65 +++++++
drivers/net/hyperv/hyperv_net.h | 6 +
drivers/net/hyperv/netvsc.c | 144 +++++++++++++-
drivers/net/hyperv/rndis_filter.c | 2 +
drivers/scsi/storvsc_drv.c | 68 ++++++-
include/asm-generic/hyperv-tlfs.h | 1 +
include/asm-generic/mshyperv.h | 53 +++++-
include/linux/dma-map-ops.h | 9 +
include/linux/hyperv.h | 16 ++
include/linux/swiotlb.h | 4 +
kernel/dma/mapping.c | 22 +++
kernel/dma/swiotlb.c | 11 +-
29 files changed, 1166 insertions(+), 102 deletions(-)
create mode 100644 arch/x86/hyperv/ivm.c

--
2.25.1



2021-07-28 14:55:02

by Tianyu Lan

Subject: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

From: Tianyu Lan <[email protected]>

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR for an SNP guest
to communicate with the hypervisor. Map the GHCB page on all
CPUs to read/write MSR registers and submit hvcall requests
via GHCB.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 73 +++++++++++++++++++++++++++++++--
arch/x86/include/asm/mshyperv.h | 2 +
include/asm-generic/mshyperv.h | 2 +
3 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4a643a85d570..ee449c076ef4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
#include <linux/kexec.h>
#include <linux/version.h>
#include <linux/vmalloc.h>
+#include <linux/io.h>
#include <linux/mm.h>
#include <linux/hyperv.h>
#include <linux/slab.h>
@@ -42,6 +43,26 @@ static void *hv_hypercall_pg_saved;
struct hv_vp_assist_page **hv_vp_assist_page;
EXPORT_SYMBOL_GPL(hv_vp_assist_page);

+static int hyperv_init_ghcb(void)
+{
+ u64 ghcb_gpa;
+ void *ghcb_va;
+ void **ghcb_base;
+
+ if (!ms_hyperv.ghcb_base)
+ return -EINVAL;
+
+ rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+ ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+ if (!ghcb_va)
+ return -ENOMEM;
+
+ ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+ *ghcb_base = ghcb_va;
+
+ return 0;
+}
+
static int hv_cpu_init(unsigned int cpu)
{
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
@@ -75,6 +96,8 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}

+ hyperv_init_ghcb();
+
return 0;
}

@@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
{
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+ unsigned long flags;
+ void **input_arg;
+ void *pg;
+ void **ghcb_va = NULL;
+
+ local_irq_save(flags);
+ input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+ pg = *input_arg;
+ *input_arg = NULL;
+
+ if (hv_root_partition) {
+ void **output_arg;
+
+ output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
+ *output_arg = NULL;
+ }
+
+ if (ms_hyperv.ghcb_base) {
+ ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+ if (*ghcb_va)
+ memunmap(*ghcb_va);
+ *ghcb_va = NULL;
+ }
+
+ local_irq_restore(flags);

hv_common_cpu_die(cpu);

@@ -340,9 +388,22 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
- if (hv_hypercall_pg == NULL) {
- wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
- goto remove_cpuhp_state;
+ if (hv_hypercall_pg == NULL)
+ goto clean_guest_os_id;
+
+ if (hv_isolation_type_snp()) {
+ ms_hyperv.ghcb_base = alloc_percpu(void *);
+ if (!ms_hyperv.ghcb_base)
+ goto clean_guest_os_id;
+
+ if (hyperv_init_ghcb()) {
+ free_percpu(ms_hyperv.ghcb_base);
+ ms_hyperv.ghcb_base = NULL;
+ goto clean_guest_os_id;
+ }
+
+ /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+ hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}

rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -403,7 +464,8 @@ void __init hyperv_init(void)
hv_query_ext_cap(0);
return;

-remove_cpuhp_state:
+clean_guest_os_id:
+ wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -431,6 +493,9 @@ void hyperv_cleanup(void)
*/
hv_hypercall_pg = NULL;

+ if (ms_hyperv.ghcb_base)
+ free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
#include <asm/paravirt.h>
#include <asm/mshyperv.h>

+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+ void __percpu **ghcb_base;
};
extern struct ms_hyperv_info ms_hyperv;

@@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
bool hv_is_hibernation_supported(void);
enum hv_isolation_type hv_get_isolation_type(void);
bool hv_is_isolation_supported(void);
+bool hv_isolation_type_snp(void);
void hyperv_cleanup(void);
bool hv_query_ext_cap(u64 cap_query);
#else /* CONFIG_HYPERV */
--
2.25.1


2021-07-28 14:55:23

by Tianyu Lan

Subject: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

From: Tianyu Lan <[email protected]>

Add support for the new guest address host visibility hvcall, which
marks memory visible to the host. Call it inside
set_memory_decrypted()/set_memory_encrypted().

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/Makefile | 2 +-
arch/x86/hyperv/ivm.c | 112 +++++++++++++++++++++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 18 +++++
arch/x86/include/asm/mshyperv.h | 3 +-
arch/x86/mm/pat/set_memory.c | 6 +-
include/asm-generic/hyperv-tlfs.h | 1 +
6 files changed, 139 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y := hv_init.o mmu.o nested.o irqdomain.o
+obj-y := hv_init.o mmu.o nested.o irqdomain.o ivm.o
obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o

ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index 000000000000..24a58795abd8
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ * Tianyu Lan <[email protected]>
+ */
+
+#include <linux/hyperv.h>
+#include <linux/types.h>
+#include <linux/bitfield.h>
+#include <linux/slab.h>
+#include <asm/io.h>
+#include <asm/mshyperv.h>
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)
+{
+ struct hv_gpa_range_for_visibility **input_pcpu, *input;
+ u16 pages_processed;
+ u64 hv_status;
+ unsigned long flags;
+
+ /* no-op if partition isolation is not enabled */
+ if (!hv_is_isolation_supported())
+ return 0;
+
+ if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+ pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+ HV_MAX_MODIFY_GPA_REP_COUNT);
+ return -EINVAL;
+ }
+
+ local_irq_save(flags);
+ input_pcpu = (struct hv_gpa_range_for_visibility **)
+ this_cpu_ptr(hyperv_pcpu_input_arg);
+ input = *input_pcpu;
+ if (unlikely(!input)) {
+ local_irq_restore(flags);
+ return -EINVAL;
+ }
+
+ input->partition_id = HV_PARTITION_ID_SELF;
+ input->host_visibility = visibility;
+ input->reserved0 = 0;
+ input->reserved1 = 0;
+ memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+ hv_status = hv_do_rep_hypercall(
+ HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+ 0, input, &pages_processed);
+ local_irq_restore(flags);
+
+ if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+ return 0;
+
+ return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host. This function works as wrap of hv_mark_gpa_visibility()
+ * with memory base and size.
+ */
+static int hv_set_mem_host_visibility(void *kbuffer, size_t size, u32 visibility)
+{
+ int pagecount = size >> HV_HYP_PAGE_SHIFT;
+ u64 *pfn_array;
+ int ret = 0;
+ int i, pfn;
+
+ if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+ return 0;
+
+ pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+ if (!pfn_array)
+ return -ENOMEM;
+
+ for (i = 0, pfn = 0; i < pagecount; i++) {
+ pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+ pfn++;
+
+ if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+ ret |= hv_mark_gpa_visibility(pfn, pfn_array, visibility);
+ pfn = 0;
+
+ if (ret)
+ goto err_free_pfn_array;
+ }
+ }
+
+ err_free_pfn_array:
+ kfree(pfn_array);
+ return ret;
+}
+
+int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
+{
+ return hv_set_mem_host_visibility((void *)addr,
+ numpages * HV_HYP_PAGE_SIZE,
+ enc ? VMBUS_PAGE_NOT_VISIBLE
+ : VMBUS_PAGE_VISIBLE_READ_WRITE);
+}
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index f1366ce609e3..f027b5bf6076 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -276,6 +276,11 @@ enum hv_isolation_type {
#define HV_X64_MSR_TIME_REF_COUNT HV_REGISTER_TIME_REF_COUNT
#define HV_X64_MSR_REFERENCE_TSC HV_REGISTER_REFERENCE_TSC

+/* Hyper-V GPA map flags */
+#define VMBUS_PAGE_NOT_VISIBLE 0
+#define VMBUS_PAGE_VISIBLE_READ_ONLY 1
+#define VMBUS_PAGE_VISIBLE_READ_WRITE 3
+
/*
* Declare the MSR used to setup pages used to communicate with the hypervisor.
*/
@@ -578,4 +583,17 @@ enum hv_interrupt_type {

#include <asm-generic/hyperv-tlfs.h>

+/* All input parameters should be in single page. */
+#define HV_MAX_MODIFY_GPA_REP_COUNT \
+ ((PAGE_SIZE / sizeof(u64)) - 2)
+
+/* HvCallModifySparseGpaPageHostVisibility hypercall */
+struct hv_gpa_range_for_visibility {
+ u64 partition_id;
+ u32 host_visibility:2;
+ u32 reserved0:30;
+ u32 reserved1;
+ u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+} __packed;
+
#endif
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 6627cfd2bfba..68dd207c2603 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -190,7 +190,8 @@ struct irq_domain *hv_create_pci_msi_domain(void);
int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
struct hv_interrupt_entry *entry);
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
-
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
+int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
#else /* CONFIG_HYPERV */
static inline void hyperv_init(void) {}
static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..ba2a22886976 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -29,6 +29,8 @@
#include <asm/proto.h>
#include <asm/memtype.h>
#include <asm/set_memory.h>
+#include <asm/hyperv-tlfs.h>
+#include <asm/mshyperv.h>

#include "../mm_internal.h"

@@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
int ret;

/* Nothing to do if memory encryption is not active */
- if (!mem_encrypt_active())
+ if (hv_is_isolation_supported())
+ return hv_set_mem_enc(addr, numpages, enc);
+ else if (!mem_encrypt_active())
return 0;

/* Should not be working on unaligned addresses */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 56348a541c50..8ed6733d5146 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -158,6 +158,7 @@ struct ms_hyperv_tsc_page {
#define HVCALL_RETARGET_INTERRUPT 0x007e
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
+#define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db

/* Extended hypercalls */
#define HV_EXT_CALL_QUERY_CAPABILITIES 0x8001
--
2.25.1


2021-07-28 14:55:37

by Tianyu Lan

Subject: [PATCH 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

From: Tianyu Lan <[email protected]>

Hyper-V exposes the shared memory boundary via CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. Store it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host in an SNP guest.

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/kernel/cpu/mshyperv.c | 2 ++
include/asm-generic/mshyperv.h | 12 +++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dcfbd2770d7f..773e84e134b3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -327,6 +327,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+ ms_hyperv.shared_gpa_boundary =
+ (u64)1 << ms_hyperv.shared_gpa_boundary_bits;

pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
- u32 isolation_config_b;
+ union {
+ u32 isolation_config_b;
+ struct {
+ u32 cvm_type : 4;
+ u32 Reserved11 : 1;
+ u32 shared_gpa_boundary_active : 1;
+ u32 shared_gpa_boundary_bits : 6;
+ u32 Reserved12 : 20;
+ };
+ };
void __percpu **ghcb_base;
+ u64 shared_gpa_boundary;
};
extern struct ms_hyperv_info ms_hyperv;

--
2.25.1


2021-07-28 14:55:46

by Tianyu Lan

Subject: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

From: Tianyu Lan <[email protected]>

Mark the vmbus ring buffer visible with set_memory_decrypted() when
establishing the gpadl handle.

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/channel.c | 38 ++++++++++++++++++++++++++++++++++++--
include/linux/hyperv.h | 10 ++++++++++
2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..01048bb07082 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
#include <linux/hyperv.h>
#include <linux/uio.h>
#include <linux/interrupt.h>
+#include <linux/set_memory.h>
#include <asm/page.h>
#include <asm/mshyperv.h>

@@ -465,7 +466,7 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
struct list_head *curr;
u32 next_gpadl_handle;
unsigned long flags;
- int ret = 0;
+ int ret = 0, index;

next_gpadl_handle =
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +475,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
if (ret)
return ret;

+ ret = set_memory_decrypted((unsigned long)kbuffer,
+ HVPFN_UP(size));
+ if (ret) {
+ pr_warn("Failed to set host visibility.\n");
+ return ret;
+ }
+
init_completion(&msginfo->waitevent);
msginfo->waiting_channel = channel;

@@ -539,6 +547,15 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
/* At this point, we received the gpadl created msg */
*gpadl_handle = gpadlmsg->gpadl;

+ if (type == HV_GPADL_BUFFER)
+ index = 0;
+ else
+ index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
+
+ channel->gpadl_range[index].size = size;
+ channel->gpadl_range[index].buffer = kbuffer;
+ channel->gpadl_range[index].gpadlhandle = *gpadl_handle;
+
cleanup:
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
list_del(&msginfo->msglistentry);
@@ -549,6 +566,11 @@ static int __vmbus_establish_gpadl(struct vmbus_channel *channel,
}

kfree(msginfo);
+
+ if (ret)
+ set_memory_encrypted((unsigned long)kbuffer,
+ HVPFN_UP(size));
+
return ret;
}

@@ -811,7 +833,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
- int ret;
+ int ret, i;

info = kzalloc(sizeof(*info) +
sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +881,18 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 gpadl_handle)
spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);

kfree(info);
+
+ /* Find gpadl buffer virtual address and size. */
+ for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+ if (channel->gpadl_range[i].gpadlhandle == gpadl_handle)
+ break;
+
+ if (set_memory_encrypted((unsigned long)channel->gpadl_range[i].buffer,
+ HVPFN_UP(channel->gpadl_range[i].size)))
+ pr_warn("Fail to set mem host visibility.\n");
+
+ channel->gpadl_range[i].gpadlhandle = 0;
+
return ret;
}
EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2e859d2f9609..06eccaba10c5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -809,6 +809,14 @@ struct vmbus_device {

#define VMBUS_DEFAULT_MAX_PKT_SIZE 4096

+struct vmbus_gpadl_range {
+ u32 gpadlhandle;
+ u32 size;
+ void *buffer;
+};
+
+#define VMBUS_GPADL_RANGE_COUNT 3
+
struct vmbus_channel {
struct list_head listentry;

@@ -829,6 +837,8 @@ struct vmbus_channel {
struct completion rescind_event;

u32 ringbuffer_gpadlhandle;
+ /* GPADL_RING and Send/Receive GPADL_BUFFER. */
+ struct vmbus_gpadl_range gpadl_range[VMBUS_GPADL_RANGE_COUNT];

/* Allocated memory for ring buffer */
struct page *ringbuffer_page;
--
2.25.1


2021-07-28 14:56:04

by Tianyu Lan

Subject: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

From: Tianyu Lan <[email protected]>

Hyper-V provides a GHCB protocol to write Synthetic Interrupt
Controller MSR registers in an Isolation VM with AMD SEV-SNP;
these registers are emulated by the hypervisor directly.
Hyper-V requires SINTx MSR registers to be written twice: first
via the GHCB page to communicate with the hypervisor,
and then with the wrmsr instruction to talk to the paravisor,
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via GHCB.
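
The register packing used for MSR access through the GHCB can be
sketched in plain C as follows. This is an illustration only, not the
kernel code: struct sketch_msr_regs and the sketch_* helpers are
hypothetical names, and a plain struct stands in for the GHCB save
area. What it mirrors from the patch is the split of the 64-bit value
into low/high 32-bit halves in rax/rdx, the reassembly on read, and the
proxy bit (bit 20) that is ORed into the value for the second SINTx
write done with wrmsr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rcx/rax/rdx slots of the GHCB. */
struct sketch_msr_regs {
	uint64_t rcx;	/* MSR index */
	uint64_t rax;	/* low 32 bits of the value */
	uint64_t rdx;	/* high 32 bits of the value */
};

/* Proxy bit ORed into SINTx values for the wrmsr write, per the patch. */
#define SKETCH_SINT_PROXY ((uint64_t)1 << 20)

/* Split a 64-bit MSR value into the register layout used for a write. */
static struct sketch_msr_regs sketch_pack_wrmsr(uint32_t msr, uint64_t value)
{
	struct sketch_msr_regs r;

	r.rcx = msr;
	r.rax = (uint32_t)value;
	r.rdx = value >> 32;
	return r;
}

/* Reassemble a 64-bit MSR value from the registers after a read. */
static uint64_t sketch_unpack_rdmsr(const struct sketch_msr_regs *r)
{
	assert(r != NULL);
	return (uint64_t)(uint32_t)r->rax |
	       ((uint64_t)(uint32_t)r->rdx << 32);
}

/* Value for the second, wrmsr-based SINTx write (proxy bit set). */
static uint64_t sketch_sint_wrmsr_value(uint64_t value)
{
	return value | SKETCH_SINT_PROXY;
}
```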

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/hv_init.c | 16 +----
arch/x86/hyperv/ivm.c | 114 ++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 78 +++++++++++++++++++-
arch/x86/include/asm/sev.h | 4 ++
arch/x86/kernel/cpu/mshyperv.c | 3 +
arch/x86/kernel/sev-shared.c | 21 ++++--
drivers/hv/hv.c | 121 ++++++++++++++++++++++----------
include/asm-generic/mshyperv.h | 12 +++-
8 files changed, 307 insertions(+), 62 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ee449c076ef4..b99f6b3930b7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -392,7 +392,7 @@ void __init hyperv_init(void)
goto clean_guest_os_id;

if (hv_isolation_type_snp()) {
- ms_hyperv.ghcb_base = alloc_percpu(void *);
+ ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
if (!ms_hyperv.ghcb_base)
goto clean_guest_os_id;

@@ -485,6 +485,7 @@ void hyperv_cleanup(void)

/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+ hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);

/*
* Reset hypercall page reference before reset the page,
@@ -558,16 +559,3 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
}
EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
- if (!(ms_hyperv.priv_high & HV_ISOLATION))
- return HV_ISOLATION_TYPE_NONE;
- return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
- return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 24a58795abd8..9c30d5bb7b64 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
* Tianyu Lan <[email protected]>
*/

+#include <linux/types.h>
+#include <linux/bitfield.h>
#include <linux/hyperv.h>
#include <linux/types.h>
#include <linux/bitfield.h>
@@ -13,6 +15,118 @@
#include <asm/io.h>
#include <asm/mshyperv.h>

+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+ union hv_ghcb *hv_ghcb;
+ void **ghcb_base;
+ unsigned long flags;
+
+ if (!ms_hyperv.ghcb_base)
+ return;
+
+ WARN_ON(in_nmi());
+
+ local_irq_save(flags);
+ ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+ hv_ghcb = (union hv_ghcb *)*ghcb_base;
+ if (!hv_ghcb) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+ ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+ ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+ ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+ if (sev_es_ghcb_hv_call(&hv_ghcb->ghcb, NULL, SVM_EXIT_MSR, 1, 0))
+ pr_warn("Fail to write msr via ghcb %llx.\n", msr);
+
+ local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+ union hv_ghcb *hv_ghcb;
+ void **ghcb_base;
+ unsigned long flags;
+
+ if (!ms_hyperv.ghcb_base)
+ return;
+
+ WARN_ON(in_nmi());
+
+ local_irq_save(flags);
+ ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+ hv_ghcb = (union hv_ghcb *)*ghcb_base;
+ if (!hv_ghcb) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+
+ ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+ if (sev_es_ghcb_hv_call(&hv_ghcb->ghcb, NULL, SVM_EXIT_MSR, 0, 0))
+ pr_warn("Fail to read msr via ghcb %llx.\n", msr);
+ else
+ *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+ | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+ local_irq_restore(flags);
+}
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+ hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+ hv_ghcb_msr_write(msr, value);
+
+ /* Write proxy bit via wrmsrl instruction. */
+ if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+ wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+void hv_signal_eom_ghcb(void)
+{
+ hv_sint_wrmsrl_ghcb(HV_X64_MSR_EOM, 0);
+}
+EXPORT_SYMBOL_GPL(hv_signal_eom_ghcb);
+
+enum hv_isolation_type hv_get_isolation_type(void)
+{
+ if (!(ms_hyperv.priv_high & HV_ISOLATION))
+ return HV_ISOLATION_TYPE_NONE;
+ return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
+}
+EXPORT_SYMBOL_GPL(hv_get_isolation_type);
+
+/*
+ * hv_is_isolation_supported - Check system runs in the Hyper-V
+ * isolation VM.
+ */
+bool hv_is_isolation_supported(void)
+{
+ return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
+}
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+/*
+ * hv_isolation_type_snp - Check system runs in the AMD SEV-SNP based
+ * isolation VM.
+ */
+bool hv_isolation_type_snp(void)
+{
+ return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
+
/*
* hv_mark_gpa_visibility - Set pages visible to host via hvcall.
*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 68dd207c2603..3c0cafdf7309 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -30,6 +30,63 @@ static inline u64 hv_get_register(unsigned int reg)
return value;
}

+#define hv_get_sint_reg(val, reg) { \
+ if (hv_isolation_type_snp()) \
+ hv_get_##reg##_ghcb(&val); \
+ else \
+ rdmsrl(HV_X64_MSR_##reg, val); \
+ }
+
+#define hv_set_sint_reg(val, reg) { \
+ if (hv_isolation_type_snp()) \
+ hv_set_##reg##_ghcb(val); \
+ else \
+ wrmsrl(HV_X64_MSR_##reg, val); \
+ }
+
+
+#define hv_get_simp(val) hv_get_sint_reg(val, SIMP)
+#define hv_get_siefp(val) hv_get_sint_reg(val, SIEFP)
+
+#define hv_set_simp(val) hv_set_sint_reg(val, SIMP)
+#define hv_set_siefp(val) hv_set_sint_reg(val, SIEFP)
+
+#define hv_get_synic_state(val) { \
+ if (hv_isolation_type_snp()) \
+ hv_get_synic_state_ghcb(&val); \
+ else \
+ rdmsrl(HV_X64_MSR_SCONTROL, val); \
+ }
+#define hv_set_synic_state(val) { \
+ if (hv_isolation_type_snp()) \
+ hv_set_synic_state_ghcb(val); \
+ else \
+ wrmsrl(HV_X64_MSR_SCONTROL, val); \
+ }
+
+#define hv_get_vp_index(index) rdmsrl(HV_X64_MSR_VP_INDEX, index)
+
+#define hv_signal_eom() { \
+ if (hv_isolation_type_snp() && \
+ old_msg_type != HVMSG_TIMER_EXPIRED) \
+ hv_signal_eom_ghcb(); \
+ else \
+ wrmsrl(HV_X64_MSR_EOM, 0); \
+ }
+
+#define hv_get_synint_state(int_num, val) { \
+ if (hv_isolation_type_snp()) \
+ hv_get_synint_state_ghcb(int_num, &val);\
+ else \
+ rdmsrl(HV_X64_MSR_SINT0 + int_num, val);\
+ }
+#define hv_set_synint_state(int_num, val) { \
+ if (hv_isolation_type_snp()) \
+ hv_set_synint_state_ghcb(int_num, val); \
+ else \
+ wrmsrl(HV_X64_MSR_SINT0 + int_num, val);\
+ }
+
#define hv_get_raw_timer() rdtsc_ordered()

void hyperv_vector_handler(struct pt_regs *regs);
@@ -192,6 +249,25 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
+void hv_signal_eom_ghcb(void);
+void hv_ghcb_msr_write(u64 msr, u64 value);
+void hv_ghcb_msr_read(u64 msr, u64 *value);
+
+#define hv_get_synint_state_ghcb(int_num, val) \
+ hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+#define hv_set_synint_state_ghcb(int_num, val) \
+ hv_sint_wrmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
+
+#define hv_get_SIMP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIMP, val)
+#define hv_set_SIMP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIMP, val)
+
+#define hv_get_SIEFP_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+#define hv_set_SIEFP_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SIEFP, val)
+
+#define hv_get_synic_state_ghcb(val) hv_sint_rdmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
+#define hv_set_synic_state_ghcb(val) hv_sint_wrmsrl_ghcb(HV_X64_MSR_SCONTROL, val)
#else /* CONFIG_HYPERV */
static inline void hyperv_init(void) {}
static inline void hyperv_setup_mmu_ops(void) {}
@@ -208,9 +284,9 @@ static inline int hyperv_flush_guest_mapping_range(u64 as,
{
return -1;
}
+static inline void hv_signal_eom_ghcb(void) { };
#endif /* CONFIG_HYPERV */

-
#include <asm-generic/mshyperv.h>

#endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index fa5cd05d3b5b..4249fde0a30e 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -81,6 +81,10 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1,
+ u64 exit_info_2);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 773e84e134b3..46a09cdfa77a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -332,6 +332,9 @@ static void __init ms_hyperv_init_platform(void)

pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
+
+ if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+ static_branch_enable(&isolation_type_snp);
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 9f90f460a28c..e039e55b9c72 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -94,10 +94,10 @@ static void vc_finish_insn(struct es_em_ctxt *ctxt)
ctxt->regs->ip += ctxt->insn.length;
}

-static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt,
- u64 exit_code, u64 exit_info_1,
- u64 exit_info_2)
+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1,
+ u64 exit_info_2)
{
enum es_result ret;

@@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
ghcb_set_sw_exit_info_2(ghcb, exit_info_2);

- sev_es_wr_ghcb_msr(__pa(ghcb));
+ /*
+ * Hyper-V runs a paravisor in SEV guests. The GHCB page is
+ * allocated by the paravisor and does not need to be set by the
+ * Linux guest. Also, the GHCB page's PA reported by the paravisor
+ * is above VTOM. Hyper-V calls this function with a NULL ctxt
+ * pointer, and the GHCB MSR write is skipped in that case.
+ */
+ if (ctxt)
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+
VMGEXIT();

if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
@@ -120,7 +129,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
v = info & SVM_EVTINJ_VEC_MASK;

/* Check if exception information from hypervisor is sane. */
- if ((info & SVM_EVTINJ_VALID) &&
+ if (ctxt && (info & SVM_EVTINJ_VALID) &&
((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
ctxt->fi.vector = v;
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index e83507f49676..59f7173c4d9f 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -8,6 +8,7 @@
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/slab.h>
@@ -136,17 +137,24 @@ int hv_synic_alloc(void)
tasklet_init(&hv_cpu->msg_dpc,
vmbus_on_msg_dpc, (unsigned long) hv_cpu);

- hv_cpu->synic_message_page =
- (void *)get_zeroed_page(GFP_ATOMIC);
- if (hv_cpu->synic_message_page == NULL) {
- pr_err("Unable to allocate SYNIC message page\n");
- goto err;
- }
+ /*
+ * Synic message and event pages are allocated by the paravisor.
+ * Skip allocating these pages here.
+ */
+ if (!hv_isolation_type_snp()) {
+ hv_cpu->synic_message_page =
+ (void *)get_zeroed_page(GFP_ATOMIC);
+ if (hv_cpu->synic_message_page == NULL) {
+ pr_err("Unable to allocate SYNIC message page\n");
+ goto err;
+ }

- hv_cpu->synic_event_page = (void *)get_zeroed_page(GFP_ATOMIC);
- if (hv_cpu->synic_event_page == NULL) {
- pr_err("Unable to allocate SYNIC event page\n");
- goto err;
+ hv_cpu->synic_event_page =
+ (void *)get_zeroed_page(GFP_ATOMIC);
+ if (hv_cpu->synic_event_page == NULL) {
+ pr_err("Unable to allocate SYNIC event page\n");
+ goto err;
+ }
}

hv_cpu->post_msg_page = (void *)get_zeroed_page(GFP_ATOMIC);
@@ -173,10 +181,17 @@ void hv_synic_free(void)
for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
+ free_page((unsigned long)hv_cpu->post_msg_page);
+
+ /*
+ * Synic message and event pages are allocated by the paravisor.
+ * Skip freeing these pages here.
+ */
+ if (hv_isolation_type_snp())
+ continue;

free_page((unsigned long)hv_cpu->synic_event_page);
free_page((unsigned long)hv_cpu->synic_message_page);
- free_page((unsigned long)hv_cpu->post_msg_page);
}

kfree(hv_context.hv_numa_map);
@@ -199,26 +214,43 @@ void hv_synic_enable_regs(unsigned int cpu)
union hv_synic_scontrol sctrl;

/* Setup the Synic's message page */
- simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+ hv_get_simp(simp.as_uint64);
simp.simp_enabled = 1;
- simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
- >> HV_HYP_PAGE_SHIFT;

- hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+ if (hv_isolation_type_snp()) {
+ hv_cpu->synic_message_page
+ = memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
+ HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+ if (!hv_cpu->synic_message_page)
+ pr_err("Failed to map synic message page.\n");
+ } else {
+ simp.base_simp_gpa = virt_to_phys(hv_cpu->synic_message_page)
+ >> HV_HYP_PAGE_SHIFT;
+ }
+
+ hv_set_simp(simp.as_uint64);

/* Setup the Synic's event page */
- siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+ hv_get_siefp(siefp.as_uint64);
siefp.siefp_enabled = 1;
- siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
- >> HV_HYP_PAGE_SHIFT;

- hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+ if (hv_isolation_type_snp()) {
+ hv_cpu->synic_event_page =
+ memremap(siefp.base_siefp_gpa << HV_HYP_PAGE_SHIFT,
+ HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+
+ if (!hv_cpu->synic_event_page)
+ pr_err("Failed to map synic event page.\n");
+ } else {
+ siefp.base_siefp_gpa = virt_to_phys(hv_cpu->synic_event_page)
+ >> HV_HYP_PAGE_SHIFT;
+ }
+ hv_set_siefp(siefp.as_uint64);

/* Setup the shared SINT. */
if (vmbus_irq != -1)
enable_percpu_irq(vmbus_irq, 0);
- shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
- VMBUS_MESSAGE_SINT);
+ hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);

shared_sint.vector = vmbus_interrupt;
shared_sint.masked = false;
@@ -233,14 +265,12 @@ void hv_synic_enable_regs(unsigned int cpu)
#else
shared_sint.auto_eoi = 0;
#endif
- hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
- shared_sint.as_uint64);
+ hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);

/* Enable the global synic bit */
- sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+ hv_get_synic_state(sctrl.as_uint64);
sctrl.enable = 1;
-
- hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+ hv_set_synic_state(sctrl.as_uint64);
}

int hv_synic_init(unsigned int cpu)
@@ -257,37 +287,50 @@ int hv_synic_init(unsigned int cpu)
*/
void hv_synic_disable_regs(unsigned int cpu)
{
+ struct hv_per_cpu_context *hv_cpu
+ = per_cpu_ptr(hv_context.cpu_context, cpu);
union hv_synic_sint shared_sint;
union hv_synic_simp simp;
union hv_synic_siefp siefp;
union hv_synic_scontrol sctrl;

- shared_sint.as_uint64 = hv_get_register(HV_REGISTER_SINT0 +
- VMBUS_MESSAGE_SINT);
-
+ hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
shared_sint.masked = 1;
+ hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+

/* Need to correctly cleanup in the case of SMP!!! */
/* Disable the interrupt */
- hv_set_register(HV_REGISTER_SINT0 + VMBUS_MESSAGE_SINT,
- shared_sint.as_uint64);
+ hv_get_simp(simp.as_uint64);

- simp.as_uint64 = hv_get_register(HV_REGISTER_SIMP);
+ /*
+ * In an Isolation VM, the SIM and SIEF pages are allocated by
+ * the paravisor. These pages will also be used by the kdump
+ * kernel, so just clear the enable bit here and keep the page
+ * addresses.
+ */
simp.simp_enabled = 0;
- simp.base_simp_gpa = 0;
+ if (hv_isolation_type_snp())
+ memunmap(hv_cpu->synic_message_page);
+ else
+ simp.base_simp_gpa = 0;

- hv_set_register(HV_REGISTER_SIMP, simp.as_uint64);
+ hv_set_simp(simp.as_uint64);

- siefp.as_uint64 = hv_get_register(HV_REGISTER_SIEFP);
+ hv_get_siefp(siefp.as_uint64);
siefp.siefp_enabled = 0;
- siefp.base_siefp_gpa = 0;

- hv_set_register(HV_REGISTER_SIEFP, siefp.as_uint64);
+ if (hv_isolation_type_snp())
+ memunmap(hv_cpu->synic_event_page);
+ else
+ siefp.base_siefp_gpa = 0;
+
+ hv_set_siefp(siefp.as_uint64);

/* Disable the global synic bit */
- sctrl.as_uint64 = hv_get_register(HV_REGISTER_SCONTROL);
+ hv_get_synic_state(sctrl.as_uint64);
sctrl.enable = 0;
- hv_set_register(HV_REGISTER_SCONTROL, sctrl.as_uint64);
+ hv_set_synic_state(sctrl.as_uint64);

if (vmbus_irq != -1)
disable_percpu_irq(vmbus_irq);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index aa26d24a5ca9..b0cfc25dffaa 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -23,9 +23,16 @@
#include <linux/bitops.h>
#include <linux/cpumask.h>
#include <linux/nmi.h>
+#include <asm/svm.h>
+#include <asm/sev.h>
#include <asm/ptrace.h>
+#include <asm/mshyperv.h>
#include <asm/hyperv-tlfs.h>

+union hv_ghcb {
+ struct ghcb ghcb;
+} __packed __aligned(PAGE_SIZE);
+
struct ms_hyperv_info {
u32 features;
u32 priv_high;
@@ -45,7 +52,7 @@ struct ms_hyperv_info {
u32 Reserved12 : 20;
};
};
- void __percpu **ghcb_base;
+ union hv_ghcb __percpu **ghcb_base;
u64 shared_gpa_boundary;
};
extern struct ms_hyperv_info ms_hyperv;
@@ -55,6 +62,7 @@ extern void __percpu **hyperv_pcpu_output_arg;

extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
+extern bool hv_isolation_type_snp(void);

/* Helper functions that provide a consistent pattern for checking Hyper-V hypercall status. */
static inline int hv_result(u64 status)
@@ -149,7 +157,7 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
* possibly deliver another msg from the
* hypervisor
*/
- hv_set_register(HV_REGISTER_EOM, 0);
+ hv_signal_eom();
}
}

--
2.25.1


2021-07-28 14:56:10

by Tianyu Lan

Subject: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

From: Tianyu Lan <[email protected]>

Hyper-V provides a GHCB-based hypercall interface for SNP Isolation
VMs. Use it to issue the VMBus HVCALL_SIGNAL_EVENT and
HVCALL_POST_MESSAGE hypercalls in such VMs.
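
The page layout that the GHCB hypercall relies on can be sketched in
user space as follows. This is only an illustration under stated
assumptions: every name prefixed with sketch_ is hypothetical, the
4096-byte page size is assumed, and the bounds check on input_size is
an addition of the sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096u  /* assumed value of HV_HYP_PAGE_SIZE */
#define SKETCH_DATA_WORDS 509u   /* mirrors hypercalldata[509] */

/* Hypothetical flat model of the "hypercall" view of union hv_ghcb. */
struct sketch_hypercall_page {
	uint64_t data[SKETCH_DATA_WORDS]; /* hypercall input payload */
	uint64_t output_gpa;              /* address for hypercall output */
	uint64_t input_or_output;         /* call code on entry, status on exit */
	uint64_t reserved;
};

/* 509 + 1 + 1 + 1 = 512 words of 8 bytes each = 4096 bytes = one page. */
_Static_assert(sizeof(struct sketch_hypercall_page) == SKETCH_PAGE_SIZE,
	       "hypercall view must cover exactly one page");

/*
 * Fill the page the way the patch does before VMGEXIT: zero it, store
 * the call code in the low 16 bits of the input word, copy the payload.
 * Returns -1 if the payload would not fit (sketch-only check).
 */
static int sketch_fill_page(struct sketch_hypercall_page *pg,
			    uint16_t callcode,
			    const void *input, size_t input_size)
{
	assert(pg != NULL);

	if (input_size > sizeof(pg->data))
		return -1;

	memset(pg, 0, sizeof(*pg));
	pg->input_or_output = callcode;
	if (input_size)
		memcpy(pg->data, input, input_size);

	return 0;
}
```

The payload area is 509 * 8 = 4072 bytes, so output_gpa sits at byte
offset 4072 and the input/output word at 4080.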

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 42 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/mshyperv.h | 1 +
drivers/hv/connection.c | 6 ++++-
drivers/hv/hv.c | 8 ++++++-
include/asm-generic/mshyperv.h | 29 +++++++++++++++++++++++
5 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 9c30d5bb7b64..13bab7f07085 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,48 @@
#include <asm/io.h>
#include <asm/mshyperv.h>

+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+ union hv_ghcb *hv_ghcb;
+ void **ghcb_base;
+ unsigned long flags;
+
+ if (!ms_hyperv.ghcb_base)
+ return -EFAULT;
+
+ WARN_ON(in_nmi());
+
+ local_irq_save(flags);
+ ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+ hv_ghcb = (union hv_ghcb *)*ghcb_base;
+ if (!hv_ghcb) {
+ local_irq_restore(flags);
+ return -EFAULT;
+ }
+
+ memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
+ hv_ghcb->ghcb.protocol_version = 1;
+ hv_ghcb->ghcb.ghcb_usage = 1;
+
+ hv_ghcb->hypercall.outputgpa = (u64)output;
+ hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+ hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+ if (input_size)
+ memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+ VMGEXIT();
+
+ hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+ memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+ sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+ local_irq_restore(flags);
+
+ return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
void hv_ghcb_msr_write(u64 msr, u64 value)
{
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 3c0cafdf7309..8bf26e6e7055 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -254,6 +254,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
void hv_signal_eom_ghcb(void);
void hv_ghcb_msr_write(u64 msr, u64 value);
void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);

#define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)

++channel->sig_events;

- hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+ if (hv_isolation_type_snp())
+ hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+ NULL, sizeof(u64));
+ else
+ hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
}
EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);

- status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+ if (hv_isolation_type_snp())
+ status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+ (void *)aligned_msg, NULL,
+ sizeof(struct hv_input_post_message));
+ else
+ status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+ aligned_msg, NULL);

/* Preemption must remain disabled until after the hypercall
* so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index b0cfc25dffaa..317d2a8d9700 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@

union hv_ghcb {
struct ghcb ghcb;
+ struct {
+ u64 hypercalldata[509];
+ u64 outputgpa;
+ union {
+ union {
+ struct {
+ u32 callcode : 16;
+ u32 isfast : 1;
+ u32 reserved1 : 14;
+ u32 isnested : 1;
+ u32 countofelements : 12;
+ u32 reserved2 : 4;
+ u32 repstartindex : 12;
+ u32 reserved3 : 4;
+ };
+ u64 asuint64;
+ } hypercallinput;
+ union {
+ struct {
+ u16 callstatus;
+ u16 reserved1;
+ u32 elementsprocessed : 12;
+ u32 reserved2 : 20;
+ };
+ u64 asuint64;
+ } hypercalloutput;
+ };
+ u64 reserved2;
+ } hypercall;
} __packed __aligned(PAGE_SIZE);

struct ms_hyperv_info {
--
2.25.1


2021-07-28 14:56:14

by Tianyu Lan

Subject: [PATCH 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

From: Tianyu Lan <[email protected]>

The monitor pages in the CHANNELMSG_INITIATE_CONTACT message are
shared with the host in an Isolation VM, so they must be marked
visible to the host via hvcall. In an Isolation VM with AMD SEV-SNP,
shared memory must be accessed through the extra address space above
the shared GPA boundary, so remap these pages at the extra address
(pa + shared_gpa_boundary).
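
The address arithmetic described above can be sketched as below. This
is a minimal user-space illustration, not kernel code: the sketch_
names are hypothetical, a 4 KiB Hyper-V page is assumed, and the
boundary is assumed to be page aligned so that the byte-address and
page-frame forms agree.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HV_PAGE_SHIFT 12 /* assumed value of HV_HYP_PAGE_SHIFT */

/* Alias of a physical address in the extra (shared) address space. */
static uint64_t sketch_shared_alias_pa(uint64_t pa,
				       uint64_t shared_gpa_boundary)
{
	return pa + shared_gpa_boundary;
}

/* Same computation expressed on page frame numbers. */
static uint64_t sketch_shared_alias_pfn(uint64_t pfn,
					uint64_t shared_gpa_boundary)
{
	/* The sketch assumes a page-aligned boundary. */
	assert((shared_gpa_boundary &
		((1ULL << SKETCH_HV_PAGE_SHIFT) - 1)) == 0);

	return pfn + (shared_gpa_boundary >> SKETCH_HV_PAGE_SHIFT);
}
```

With a made-up boundary of 1ULL << 39, the page at physical address
0x1000 would be reached at 0x8000001000.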

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/connection.c | 65 +++++++++++++++++++++++++++++++++++++++
drivers/hv/hyperv_vmbus.h | 1 +
2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..e6a7bae036a8 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
#include <linux/vmalloc.h>
#include <linux/hyperv.h>
#include <linux/export.h>
+#include <linux/io.h>
#include <asm/mshyperv.h>

#include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)

msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+ if (hv_is_isolation_supported()) {
+ msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+ msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+ }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);

/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
return -ECONNREFUSED;
}

+ if (hv_is_isolation_supported()) {
+ vmbus_connection.monitor_pages_va[0]
+ = vmbus_connection.monitor_pages[0];
+ vmbus_connection.monitor_pages[0]
+ = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+ MEMREMAP_WB);
+ if (!vmbus_connection.monitor_pages[0])
+ return -ENOMEM;
+
+ vmbus_connection.monitor_pages_va[1]
+ = vmbus_connection.monitor_pages[1];
+ vmbus_connection.monitor_pages[1]
+ = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+ MEMREMAP_WB);
+ if (!vmbus_connection.monitor_pages[1]) {
+ memunmap(vmbus_connection.monitor_pages[0]);
+ return -ENOMEM;
+ }
+
+ memset(vmbus_connection.monitor_pages[0], 0x00,
+ HV_HYP_PAGE_SIZE);
+ memset(vmbus_connection.monitor_pages[1], 0x00,
+ HV_HYP_PAGE_SIZE);
+ }
+
return ret;
}

@@ -159,6 +191,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+ u64 pfn[2];

/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
goto cleanup;
}

+ if (hv_is_isolation_supported()) {
+ pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+ pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+ if (hv_mark_gpa_visibility(2, pfn,
+ VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+ ret = -EFAULT;
+ goto cleanup;
+ }
+ }
+
msginfo = kzalloc(sizeof(*msginfo) +
sizeof(struct vmbus_channel_initiate_contact),
GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)

void vmbus_disconnect(void)
{
+ u64 pfn[2];
+
/*
* First send the unload request to the host.
*/
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}

+ if (hv_is_isolation_supported()) {
+ if (vmbus_connection.monitor_pages_va[0]) {
+ memunmap(vmbus_connection.monitor_pages[0]);
+ vmbus_connection.monitor_pages[0]
+ = vmbus_connection.monitor_pages_va[0];
+ vmbus_connection.monitor_pages_va[0] = NULL;
+ }
+
+ if (vmbus_connection.monitor_pages_va[1]) {
+ memunmap(vmbus_connection.monitor_pages[1]);
+ vmbus_connection.monitor_pages[1]
+ = vmbus_connection.monitor_pages_va[1];
+ vmbus_connection.monitor_pages_va[1] = NULL;
+ }
+
+ pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+ pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+ hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+ }
+
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[0] = NULL;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 42f3d9d123a1..40bc0eff6665 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -240,6 +240,7 @@ struct vmbus_connection {
* is child->parent notification
*/
struct hv_monitor_page *monitor_pages[2];
+ void *monitor_pages_va[2];
struct list_head chn_msg_list;
spinlock_t channelmsg_lock;

--
2.25.1


2021-07-28 14:57:13

by Tianyu Lan

Subject: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

From: Tianyu Lan <[email protected]>

In an Isolation VM, all memory shared with the host must be marked
visible to the host via hvcall. vmbus_establish_gpadl() already does
this for the storvsc rx/tx ring buffers, but the page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
DMA ops callback will use swiotlb functions to allocate a bounce
buffer and copy data from/to it.
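
The per-page mapping this implies can be illustrated with a small
user-space sketch that splits one (offset, length) range into chunks
that never cross a Hyper-V page boundary, one chunk per DMA mapping.
The sketch_ names are hypothetical and a 4 KiB page is assumed; it
models only the size arithmetic, not the DMA API calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HV_PAGE_SIZE 4096u /* assumed value of HV_HYP_PAGE_SIZE */

/*
 * Split a buffer that starts offset_in_page bytes into its first page
 * into per-page chunk sizes. Only the first chunk can start at a
 * non-zero offset. Returns the number of chunks written to sizes[].
 */
static size_t sketch_split_range(uint32_t offset_in_page, uint32_t length,
				 uint32_t *sizes, size_t max_chunks)
{
	size_t n = 0;

	assert(offset_in_page < SKETCH_HV_PAGE_SIZE);

	while (length && n < max_chunks) {
		uint32_t size = SKETCH_HV_PAGE_SIZE - offset_in_page;

		if (size > length)
			size = length;

		sizes[n++] = size;
		length -= size;
		offset_in_page = 0; /* later chunks are page aligned */
	}

	return n;
}
```

For example, an 8192-byte buffer starting at offset 512 yields three
chunks of 3584, 4096 and 512 bytes, which agrees with rounding
(512 + 8192) up to whole pages.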

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/scsi/storvsc_drv.c | 68 +++++++++++++++++++++++++++++++++++---
1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
#include <linux/device.h>
#include <linux/hyperv.h>
#include <linux/blkdev.h>
+#include <linux/io.h>
+#include <linux/dma-mapping.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
u32 payload_sz;

struct vstor_packet vstor_packet;
+ u32 hvpg_count;
+ struct hv_dma_range *dma_range;
};


@@ -509,6 +513,14 @@ struct storvsc_scan_work {
u8 tgt_id;
};

+#define storvsc_dma_map(dev, page, offset, size, dir) \
+ dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir) \
+ dma_unmap_page(dev, dma_range.dma, \
+ dma_range.mapping_size, \
+ dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
static void storvsc_device_scan(struct work_struct *work)
{
struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
struct hv_device *device;
struct storvsc_device *stor_device;
struct Scsi_Host *shost;
+ int i;

if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
}

+ if (request->dma_range) {
+ for (i = 0; i < request->hvpg_count; i++)
+ storvsc_dma_unmap(&device->device,
+ request->dma_range[i],
+ request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+ kfree(request->dma_range);
+ }
+
storvsc_on_receive(stor_device, packet, request);
continue;
}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+ dma_addr_t dma;
u64 hvpfn;
+ u32 size;

if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {

@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;

+ cmd_request->dma_range = kcalloc(hvpg_count,
+ sizeof(*cmd_request->dma_range),
+ GFP_ATOMIC);
+ if (!cmd_request->dma_range) {
+ ret = -ENOMEM;
+ goto free_payload;
+ }

for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
* last sgl should be reached at the same time that
* the PFN array is filled.
*/
- while (hvpfns_to_add--)
- payload->range.pfn_array[i++] = hvpfn++;
+ while (hvpfns_to_add--) {
+ size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+ (unsigned long)length);
+ dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
+ offset_in_hvpg, size,
+ scmnd->sc_data_direction);
+ if (dma_mapping_error(&dev->device, dma)) {
+ ret = -ENOMEM;
+ goto free_dma_range;
+ }
+
+ if (offset_in_hvpg) {
+ payload->range.offset = dma & ~HV_HYP_PAGE_MASK;
+ offset_in_hvpg = 0;
+ }
+
+ cmd_request->dma_range[i].dma = dma;
+ cmd_request->dma_range[i].mapping_size = size;
+ payload->range.pfn_array[i++] = dma >> HV_HYP_PAGE_SHIFT;
+ length -= size;
+ }
}
+ cmd_request->hvpg_count = hvpg_count;
}

cmd_request->payload = payload;
@@ -1860,13 +1911,20 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
put_cpu();

if (ret == -EAGAIN) {
- if (payload_sz > sizeof(cmd_request->mpb))
- kfree(payload);
/* no more space */
- return SCSI_MLQUEUE_DEVICE_BUSY;
+ ret = SCSI_MLQUEUE_DEVICE_BUSY;
+ goto free_dma_range;
}

return 0;
+
+free_dma_range:
+ kfree(cmd_request->dma_range);
+
+free_payload:
+ if (payload_sz > sizeof(cmd_request->mpb))
+ kfree(payload);
+ return ret;
}

static struct scsi_host_template scsi_driver = {
--
2.25.1


2021-07-28 14:57:36

by Tianyu Lan

Subject: [PATCH 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

From: Tianyu Lan <[email protected]>

The VMbus ring buffers are shared with the host and, in an Isolation
VM with SNP support, must be accessed via the extra address space.
This patch maps the ring buffer address in the extra address space
via vmap_pfn(). The HV host visibility hvcall smears data in the ring
buffer, so reset the ring buffer memory to zero after calling the
visibility hvcall.
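
The wraparound mapping used for the ring buffer can be sketched in
user space as below: the first page (the ring header) appears once,
and the data pages appear twice in a row so that a read or write
crossing the end of the ring stays virtually contiguous. The sketch_
names are hypothetical; base_pfn stands for the frame number of the
first page after the shared GPA boundary offset has been applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the PFN list for a ring of page_cnt pages: one header page
 * followed by the (page_cnt - 1) data pages repeated twice.
 * Returns the number of entries written, or 0 if the arguments do
 * not describe a valid ring or the output array is too small.
 */
static size_t sketch_wraparound_pfns(uint64_t base_pfn, uint32_t page_cnt,
				     uint64_t *pfns, size_t max_entries)
{
	size_t total, i;

	assert(pfns != NULL);

	if (page_cnt < 2)
		return 0; /* need a header page and at least one data page */

	total = (size_t)page_cnt * 2 - 1;
	if (total > max_entries)
		return 0;

	pfns[0] = base_pfn;
	for (i = 0; i < 2 * (size_t)(page_cnt - 1); i++)
		pfns[i + 1] = base_pfn + i % (page_cnt - 1) + 1;

	return total;
}
```

With base_pfn 100 and page_cnt 3 the list is 100, 101, 102, 101, 102.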

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/Kconfig | 1 +
drivers/hv/channel.c | 10 +++++
drivers/hv/hyperv_vmbus.h | 2 +
drivers/hv/ring_buffer.c | 84 ++++++++++++++++++++++++++++++---------
4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 66c794d92391..a8386998be40 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,7 @@ config HYPERV
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
select X86_HV_CALLBACK_VECTOR
+ select VMAP_PFN
help
Select this option to run Linux as a Hyper-V client operating
system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 01048bb07082..7350da9dbe97 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -707,6 +707,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;

+ err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+ if (err)
+ goto error_free_gpadl;
+
+ err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+ if (err)
+ goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
/* Interface */

void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+ struct page *pages, u32 page_cnt);

int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/prefetch.h>
+#include <linux/io.h>
+#include <asm/mshyperv.h>

#include "hyperv_vmbus.h"

@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(&channel->outbound.ring_buffer_mutex);
}

-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
- struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+ struct page *pages, u32 page_cnt)
{
+ u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+ unsigned long *pfns_wraparound;
+ void *vaddr;
int i;
- struct page **pages_wraparound;

- BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+ if (!hv_isolation_type_snp())
+ return 0;
+
+ physic_addr += ms_hyperv.shared_gpa_boundary;

/*
* First page holds struct hv_ring_buffer, do wraparound mapping for
* the rest.
*/
- pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+ pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
GFP_KERNEL);
- if (!pages_wraparound)
+ if (!pfns_wraparound)
return -ENOMEM;

- pages_wraparound[0] = pages;
+ pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
for (i = 0; i < 2 * (page_cnt - 1); i++)
- pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
- ring_info->ring_buffer = (struct hv_ring_buffer *)
- vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
- kfree(pages_wraparound);
+ pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+ i % (page_cnt - 1) + 1;

-
- if (!ring_info->ring_buffer)
+ vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+ kfree(pfns_wraparound);
+ if (!vaddr)
return -ENOMEM;

- ring_info->ring_buffer->read_index =
- ring_info->ring_buffer->write_index = 0;
+ /* Clean memory after setting host visibility. */
+ memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+ ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+ ring_info->ring_buffer->read_index = 0;
+ ring_info->ring_buffer->write_index = 0;

/* Set the feature bit for enabling flow control. */
ring_info->ring_buffer->feature_bits.value = 1;

+ return 0;
+}
+
+/* Initialize the ring buffer. */
+int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
+ struct page *pages, u32 page_cnt, u32 max_pkt_size)
+{
+ int i;
+ struct page **pages_wraparound;
+
+ BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+
+ if (!hv_isolation_type_snp()) {
+ /*
+ * First page holds struct hv_ring_buffer, do wraparound mapping for
+ * the rest.
+ */
+ pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+ GFP_KERNEL);
+ if (!pages_wraparound)
+ return -ENOMEM;
+
+ pages_wraparound[0] = pages;
+ for (i = 0; i < 2 * (page_cnt - 1); i++)
+ pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
+
+ ring_info->ring_buffer = (struct hv_ring_buffer *)
+ vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
+
+ kfree(pages_wraparound);
+
+ if (!ring_info->ring_buffer)
+ return -ENOMEM;
+
+ ring_info->ring_buffer->read_index =
+ ring_info->ring_buffer->write_index = 0;
+
+ /* Set the feature bit for enabling flow control. */
+ ring_info->ring_buffer->feature_bits.value = 1;
+ }
+
ring_info->ring_size = page_cnt << PAGE_SHIFT;
ring_info->ring_size_div10_reciprocal =
reciprocal_value(ring_info->ring_size / 10);
--
2.25.1


2021-07-28 14:58:42

by Tianyu Lan

Subject: [PATCH 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver

From: Tianyu Lan <[email protected]>

In an Isolation VM, all memory shared with the host must be marked
visible to the host via hvcall. vmbus_establish_gpadl() already does
this for the netvsc rx/tx ring buffers, but the page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V DMA ops callback will use swiotlb functions to allocate a
bounce buffer and copy data from/to it.
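
The bounce-buffer idea behind this can be sketched in user space as
below. This is not swiotlb and does not use the kernel DMA API: the
sketch_ names are hypothetical, a single fixed 4096-byte array stands
in for host-visible memory, and only one mapping at a time is
supported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_dir { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE };

/* Hypothetical single-slot bounce buffer. */
struct sketch_bounce {
	unsigned char shared[4096]; /* stands in for host-visible memory */
	void *orig;                 /* private buffer being bounced */
	size_t len;
	enum sketch_dir dir;
};

/*
 * "Map": remember the private buffer and, for data going to the
 * device, copy it into the shared area. Returns the shared address
 * that would be handed to the host, or NULL if the data does not fit.
 */
static void *sketch_map(struct sketch_bounce *b, void *buf, size_t len,
			enum sketch_dir dir)
{
	assert(b != NULL && buf != NULL);

	if (len > sizeof(b->shared))
		return NULL;

	b->orig = buf;
	b->len = len;
	b->dir = dir;
	if (dir == SKETCH_TO_DEVICE)
		memcpy(b->shared, buf, len);

	return b->shared;
}

/* "Unmap": for data coming from the device, copy it back out. */
static void sketch_unmap(struct sketch_bounce *b)
{
	assert(b != NULL);

	if (b->orig && b->dir == SKETCH_FROM_DEVICE)
		memcpy(b->orig, b->shared, b->len);

	b->orig = NULL;
	b->len = 0;
}
```

The private buffer is never exposed directly: the host only ever sees
the shared area, and the copy direction decides whether data moves on
map or on unmap.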

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/net/hyperv/hyperv_net.h | 6 ++
drivers/net/hyperv/netvsc.c | 144 +++++++++++++++++++++++++++++-
drivers/net/hyperv/rndis_filter.c | 2 +
include/linux/hyperv.h | 5 ++
4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+ struct hv_dma_range *dma_range;
};

#define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {

/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+ void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {

/* Send buffer allocated by us */
void *send_buf;
+ void *send_original_buf;
+ u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
#define RETRY_US_HI 10000
#define RETRY_MAX 2000 /* >10 sec */

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
#endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;

kfree(nvdev->extension);
- vfree(nvdev->recv_buf);
- vfree(nvdev->send_buf);
+
+ if (nvdev->recv_original_buf) {
+ vunmap(nvdev->recv_buf);
+ vfree(nvdev->recv_original_buf);
+ } else {
+ vfree(nvdev->recv_buf);
+ }
+
+ if (nvdev->send_original_buf) {
+ vunmap(nvdev->send_buf);
+ vfree(nvdev->send_original_buf);
+ } else {
+ vfree(nvdev->send_buf);
+ }
+
kfree(nvdev->send_section_map);

for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
return nvchan->mrc.slots ? 0 : -ENOMEM;
}

+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+ unsigned long *pfns;
+ void *vaddr;
+ int i;
+
+ pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+ GFP_KERNEL);
+ if (!pfns)
+ return NULL;
+
+ for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+ pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+ + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+ vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+ kfree(pfns);
+
+ return vaddr;
+}
+
static int netvsc_init_buf(struct hv_device *device,
struct netvsc_device *net_device,
const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+ void *vaddr;

/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}

+ if (hv_isolation_type_snp()) {
+ vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+ if (!vaddr)
+ goto cleanup;
+
+ net_device->recv_original_buf = net_device->recv_buf;
+ net_device->recv_buf = vaddr;
+ }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}

+ if (hv_isolation_type_snp()) {
+ vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+ if (!vaddr)
+ goto cleanup;
+
+ net_device->send_original_buf = net_device->send_buf;
+ net_device->send_buf = vaddr;
+ }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -767,7 +820,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,

/* Notify the layer above us */
if (likely(skb)) {
- const struct hv_netvsc_packet *packet
+ struct hv_netvsc_packet *packet
= (struct hv_netvsc_packet *)skb->cb;
u32 send_index = packet->send_buf_index;
struct netvsc_stats *tx_stats;
@@ -783,6 +836,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
tx_stats->bytes += packet->total_bytes;
u64_stats_update_end(&tx_stats->syncp);

+ netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
napi_consume_skb(skb, budget);
}

@@ -947,6 +1001,82 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
memset(dest, 0, padding);
}

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet)
+{
+ u32 page_count = packet->cp_partial ?
+ packet->page_buf_cnt - packet->rmsg_pgcnt :
+ packet->page_buf_cnt;
+ int i;
+
+ if (!hv_is_isolation_supported())
+ return;
+
+ if (!packet->dma_range)
+ return;
+
+ for (i = 0; i < page_count; i++)
+ dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
+ packet->dma_range[i].mapping_size,
+ DMA_TO_DEVICE);
+
+ kfree(packet->dma_range);
+}
+
+/* netvsc_dma_map - Map the data pages of a packet sent by
+ * vmbus_sendpacket_pagebuffer() to swiotlb bounce buffers in an
+ * Isolation VM.
+ *
+ * In an Isolation VM, the netvsc send buffer has already been marked
+ * visible to the host, so data copied into it needs no bounce buffer.
+ * Data pages passed to vmbus_sendpacket_pagebuffer() may not have been
+ * copied into the send buffer, so they must be mapped to swiotlb
+ * bounce buffers instead, which netvsc_dma_map() does. Each pfn in
+ * struct hv_page_buffer has to be rewritten to the pfn of its bounce
+ * buffer, which requires a per-entry loop; that is why dma_map_sg()
+ * is not used here.
+ */
+int netvsc_dma_map(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet,
+ struct hv_page_buffer *pb)
+{
+ u32 page_count = packet->cp_partial ?
+ packet->page_buf_cnt - packet->rmsg_pgcnt :
+ packet->page_buf_cnt;
+ dma_addr_t dma;
+ int i;
+
+ if (!hv_is_isolation_supported())
+ return 0;
+
+ packet->dma_range = kcalloc(page_count,
+ sizeof(*packet->dma_range),
+ GFP_KERNEL);
+ if (!packet->dma_range)
+ return -ENOMEM;
+
+ for (i = 0; i < page_count; i++) {
+ char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
+ + pb[i].offset);
+ u32 len = pb[i].len;
+
+ dma = dma_map_single(&hv_dev->device, src, len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(&hv_dev->device, dma)) {
+ kfree(packet->dma_range);
+ return -ENOMEM;
+ }
+
+ packet->dma_range[i].dma = dma;
+ packet->dma_range[i].mapping_size = len;
+ pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
+ pb[i].offset = offset_in_hvpage(dma);
+ pb[i].len = len;
+ }
+
+ return 0;
+}
+
static inline int netvsc_send_pkt(
struct hv_device *device,
struct hv_netvsc_packet *packet,
@@ -987,14 +1117,22 @@ static inline int netvsc_send_pkt(

trace_nvsp_send_pkt(ndev, out_channel, rpkt);

+ packet->dma_range = NULL;
if (packet->page_buf_cnt) {
if (packet->cp_partial)
pb += packet->rmsg_pgcnt;

+ ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
+ if (ret)
+ return ret;
+
ret = vmbus_sendpacket_pagebuffer(out_channel,
pb, packet->page_buf_cnt,
&nvmsg, sizeof(nvmsg),
req_id);
+
+ if (ret)
+ netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
} else {
ret = vmbus_sendpacket(out_channel,
&nvmsg, sizeof(nvmsg),
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index f6c9c2a670f9..448fcc325ed7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
}
}

+ netvsc_dma_unmap(((struct net_device_context *)
+ netdev_priv(ndev))->device_ctx, &request->pkt);
complete(&request->wait_event);
} else {
netdev_err(ndev,
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index babbe19f57e2..90abff664495 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1616,6 +1616,11 @@ struct hyperv_service_callback {
void (*callback)(void *context);
};

+struct hv_dma_range {
+ dma_addr_t dma;
+ u32 mapping_size;
+};
+
#define MAX_SRV_VER 0x7ffffff
extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
const int *fw_version, int fw_vercnt,
--
2.25.1
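
For readers following the netvsc changes above, the two pieces of arithmetic in netvsc_dma_map() can be tried out in user space. This is only a sketch: struct page_buf_model and the two helper names are hypothetical stand-ins, and the 4 KiB hypervisor page size is an assumption about HV_HYP_PAGE_SHIFT rather than something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: Hyper-V hypervisor pages are 4 KiB (HV_HYP_PAGE_SHIFT == 12). */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)

/* Hypothetical stand-in for the pfn/offset/len fields of struct hv_page_buffer. */
struct page_buf_model {
	uint64_t pfn;
	uint32_t offset;
	uint32_t len;
};

/* Re-point one entry at a bounce-buffer DMA address, as the mapping loop does. */
void point_at_bounce(struct page_buf_model *pb, uint64_t dma, uint32_t len)
{
	pb->pfn = dma >> HV_HYP_PAGE_SHIFT;                     /* page frame number */
	pb->offset = (uint32_t)(dma & (HV_HYP_PAGE_SIZE - 1)); /* offset inside the page */
	pb->len = len;
}

/*
 * Number of entries to map. With cp_partial set, the first rmsg_pgcnt
 * entries are skipped by the caller, so only the remainder is mapped.
 */
uint32_t pages_to_map(int cp_partial, uint32_t page_buf_cnt, uint32_t rmsg_pgcnt)
{
	assert(!cp_partial || rmsg_pgcnt <= page_buf_cnt);
	return cp_partial ? page_buf_cnt - rmsg_pgcnt : page_buf_cnt;
}
```

The same count has to be used on the unmap side, which is why netvsc_dma_unmap() recomputes it from the packet instead of storing it.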


2021-07-28 14:58:52

by Tianyu Lan

Subject: [PATCH 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function

From: Tianyu Lan <[email protected]>

In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so
introduce dma_map_decrypted/dma_unmap_encrypted() to map/unmap
bounce buffer memory. The platform can populate map/unmap callbacks
in the dma memory decrypted ops.

Signed-off-by: Tianyu Lan <[email protected]>
---
include/linux/dma-map-ops.h | 9 +++++++++
kernel/dma/mapping.c | 22 ++++++++++++++++++++++
2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
unsigned long (*get_merge_boundary)(struct device *dev);
};

+struct dma_memory_decrypted_ops {
+ void *(*map)(void *addr, unsigned long size);
+ void (*unmap)(void *addr);
+};
+
#ifdef CONFIG_DMA_OPS
#include <asm/dma-mapping.h>

@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
}
#endif /* CONFIG_DMA_API_DEBUG */

+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_encrypted(void *addr, unsigned long size);
+
extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;

#endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
#include <linux/of_device.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
+#include <asm/set_memory.h>
#include "debug.h"
#include "direct.h"

bool dma_default_coherent;

+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
/*
* Managed DMA API
*/
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
}
EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+ if (set_memory_decrypted((unsigned long)addr,
+ size / PAGE_SIZE))
+ return NULL;
+
+ if (dma_memory_generic_decrypted_ops.map)
+ return dma_memory_generic_decrypted_ops.map(addr, size);
+ else
+ return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+ if (dma_memory_generic_decrypted_ops.unmap)
+ dma_memory_generic_decrypted_ops.unmap(addr);
+
+ return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
--
2.25.1
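
The optional-callback pattern used by dma_map_decrypted() in the patch above can be modelled outside the kernel. The following is a minimal sketch under stated assumptions: every name here (decrypted_ops_model, ops_model, model_set_decrypted, model_platform_map, alias_window) is hypothetical, and the stand-in for set_memory_decrypted() simply succeeds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of struct dma_memory_decrypted_ops. */
struct decrypted_ops_model {
	void *(*map)(void *addr, unsigned long size);
	void (*unmap)(void *addr);
};

/* Zero-initialized: no platform has registered callbacks yet. */
struct decrypted_ops_model ops_model;

/* Stand-in for set_memory_decrypted(); always succeeds in this model. */
int model_set_decrypted(void *addr, unsigned long size)
{
	(void)addr;
	(void)size;
	return 0;
}

/* Fake "extra address space" that a platform hook could hand back. */
char alias_window[64];

/* Example platform hook: returns the alias if the request fits. */
void *model_platform_map(void *addr, unsigned long size)
{
	(void)addr;
	return size <= sizeof(alias_window) ? alias_window : NULL;
}

void *model_map_decrypted(void *addr, unsigned long size)
{
	assert(addr != NULL);

	if (model_set_decrypted(addr, size))
		return NULL;

	/* Use the platform remap if one is registered; otherwise the
	 * original address is usable as-is. */
	return ops_model.map ? ops_model.map(addr, size) : addr;
}
```

With no hook registered the caller gets its own pointer back, which matches the patch's behaviour on platforms that only need the decrypt step.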


2021-07-28 14:58:56

by Tianyu Lan

Subject: [PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

From: Tianyu Lan <[email protected]>

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line), which is reported by Hyper-V CPUID ISOLATION_CONFIG.
The physical address used for access is the original physical address +
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called virtual top of memory (vTOM). Memory addresses below
vTOM are automatically treated as private, while memory above
vTOM is treated as shared.

Use dma_map_decrypted() in the swiotlb code, store the remap address it
returns, and use that address to copy data from/to the swiotlb bounce buffer.

Signed-off-by: Tianyu Lan <[email protected]>
---
include/linux/swiotlb.h | 4 ++++
kernel/dma/swiotlb.c | 11 ++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
* @end: The end address of the swiotlb memory pool. Used to do a quick
* range check to see if the memory was in fact allocated by this
* API.
+ * @vaddr: The virtual address of the swiotlb memory pool. The pool may
+ * be remapped when memory encryption is in use; this field stores the
+ * virtual address used for bounce buffer operations.
* @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
* @end. For default swiotlb, this is command line adjustable via
* setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+ void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..6866e5784b53 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -194,8 +194,13 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}

- set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
- memset(vaddr, 0, bytes);
+ mem->vaddr = dma_map_decrypted(vaddr, bytes);
+ if (!mem->vaddr) {
+ pr_err("Failed to decrypt memory.\n");
+ return;
+ }
+
+ memset(mem->vaddr, 0, bytes);
}

int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -360,7 +365,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
- unsigned char *vaddr = phys_to_virt(tlb_addr);
+ unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
unsigned int tlb_offset;

if (orig_addr == INVALID_PHYS_ADDR)
--
2.25.1
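
The address arithmetic described in this patch can be checked with a small user-space model. It is a sketch only: io_tlb_mem_model is a hypothetical, trimmed-down stand-in for struct io_tlb_mem, the helper names are made up for illustration, and the 39-bit boundary in the usage note is just the example value the commit message mentions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shared alias of a guest physical address: the same page, accessed
 * above the shared GPA boundary (vTOM).
 */
uint64_t shared_alias_gpa(uint64_t gpa, uint64_t shared_gpa_boundary)
{
	/* Assumption of this model: ordinary guest memory lies below vTOM. */
	assert(gpa < shared_gpa_boundary);
	return gpa + shared_gpa_boundary;
}

/* Hypothetical, trimmed-down model of struct io_tlb_mem. */
struct io_tlb_mem_model {
	uint64_t start;        /* physical start of the pool */
	unsigned char *vaddr;  /* mapping used for the bounce copies */
};

/*
 * Equivalent of "mem->vaddr + tlb_addr - mem->start" in swiotlb_bounce():
 * the slot is located by its offset into the pool, not via phys_to_virt().
 */
unsigned char *slot_vaddr(const struct io_tlb_mem_model *mem, uint64_t tlb_addr)
{
	assert(tlb_addr >= mem->start);
	return mem->vaddr + (tlb_addr - mem->start);
}
```

For example, with a boundary of 1ULL << 39, guest physical address 0x1000 is accessed at 0x8000001000. Going through the pool offset is what lets the copy use the remapped alias rather than the direct-map address.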


2021-07-28 14:59:08

by Tianyu Lan

Subject: [PATCH 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

From: Tianyu Lan <[email protected]>

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space above shared_gpa_boundary
(e.g. a 39-bit address line), which is reported by Hyper-V CPUID
ISOLATION_CONFIG. The physical address used for access is the
original physical address + shared_gpa_boundary. In the AMD SEV-SNP
spec, shared_gpa_boundary is called virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private,
while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls dma_map_decrypted()
to mark the bounce buffer visible to the host and map it in the
extra address space. Populate the dma memory decrypted ops with
the hv map/unmap functions.

Hyper-V initializes its own swiotlb bounce buffer, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
that setting, hyperv_swiotlb_detect() needs to run before
these detect functions, which depend on pci_xen_swiotlb_init().
Make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect()
to keep the order.

The map function vmap_pfn() does not work as early as
hyperv_iommu_swiotlb_init() runs, so initialize the swiotlb bounce
buffer in hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan <[email protected]>
---
arch/x86/hyperv/ivm.c | 28 ++++++++++++++
arch/x86/include/asm/mshyperv.h | 2 +
arch/x86/xen/pci-swiotlb-xen.c | 3 +-
drivers/hv/vmbus_drv.c | 3 ++
drivers/iommu/hyperv-iommu.c | 65 +++++++++++++++++++++++++++++++++
include/linux/hyperv.h | 1 +
6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 13bab7f07085..9fbb5cbf3321 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -266,3 +266,31 @@ int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
enc ? VMBUS_PAGE_NOT_VISIBLE
: VMBUS_PAGE_VISIBLE_READ_WRITE);
}
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+ unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+ void *vaddr;
+ int i;
+
+ if (!pfns)
+ return NULL;
+
+ for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+ pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+ (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+ vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+ kfree(pfns);
+
+ return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+ vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8bf26e6e7055..b815ec0bc36d 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -249,6 +249,8 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility);
int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@

#include <linux/dma-map-ops.h>
#include <linux/pci.h>
+#include <linux/hyperv.h>
#include <xen/swiotlb-xen.h>

#include <asm/xen/hypervisor.h>
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);

IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
pci_xen_swiotlb_init,
NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
#include <linux/cpu.h>
#include <linux/sched/task_stack.h>

+#include <linux/dma-map-ops.h>
#include <linux/delay.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
}

+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
/*
* vmbus_device_register - Register the child device
*/
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);

+ child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;

err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..089617085a69 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,22 @@
#include <linux/irq.h>
#include <linux/iommu.h>
#include <linux/module.h>
+#include <linux/hyperv.h>
+#include <linux/io.h>

#include <asm/apic.h>
#include <asm/cpu.h>
#include <asm/hw_irq.h>
#include <asm/io_apic.h>
+#include <asm/iommu.h>
+#include <asm/iommu_table.h>
#include <asm/irq_remapping.h>
#include <asm/hypervisor.h>
#include <asm/mshyperv.h>
+#include <asm/swiotlb.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-direct.h>
+#include <linux/set_memory.h>

#include "irq_remapping.h"

@@ -36,6 +44,8 @@
static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
static struct irq_domain *ioapic_ir_domain;

+static unsigned long hyperv_io_tlb_start, hyperv_io_tlb_size;
+
static int hyperv_ir_set_affinity(struct irq_data *data,
const struct cpumask *mask, bool force)
{
@@ -337,4 +347,59 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
};

+void __init hyperv_iommu_swiotlb_init(void)
+{
+ unsigned long bytes;
+
+ /*
+ * Allocate Hyper-V swiotlb bounce buffer at early place
+ * to reserve large contiguous memory.
+ */
+ hyperv_io_tlb_size = 256 * 1024 * 1024;
+ hyperv_io_tlb_start =
+ (unsigned long)memblock_alloc_low(
+ PAGE_ALIGN(hyperv_io_tlb_size),
+ HV_HYP_PAGE_SIZE);
+
+ if (!hyperv_io_tlb_start) {
+ pr_warn("Failed to allocate Hyper-V swiotlb buffer.\n");
+ return;
+ }
+}
+
+int __init hyperv_swiotlb_detect(void)
+{
+ if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
+ && hv_is_isolation_supported()) {
+ /*
+ * Enable swiotlb force mode in Isolation VM to
+ * use swiotlb bounce buffer for dma transaction.
+ */
+ swiotlb_force = SWIOTLB_FORCE;
+
+ dma_memory_generic_decrypted_ops.map = hv_map_memory;
+ dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
+ return 1;
+ }
+
+ return 0;
+}
+
+void __init hyperv_iommu_swiotlb_later_init(void)
+{
+ int ret;
+
+ /*
+ * Swiotlb bounce buffer needs to be mapped in extra address
+ * space. Map function doesn't work in the early place and so
+ * call swiotlb_late_init_with_tbl() here.
+ */
+ swiotlb_late_init_with_tbl(hyperv_io_tlb_start,
+ hyperv_io_tlb_size >> IO_TLB_SHIFT);
+}
+
+IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
+ NULL, hyperv_iommu_swiotlb_init,
+ hyperv_iommu_swiotlb_later_init);
+
#endif
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 06eccaba10c5..babbe19f57e2 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1759,6 +1759,7 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
u64 block_mask));
+int __init hyperv_swiotlb_detect(void);

struct hyperv_pci_block_ops {
int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
--
2.25.1
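
The pfn list that hv_map_memory() hands to vmap_pfn() in the patch above follows a simple rule, sketched here in user space. Assumptions to note: the buffer is treated as physically contiguous and identified by a base guest physical address (the kernel code instead translates each page with virt_to_hvpfn()), hypervisor pages are taken to be 4 KiB, and alias_pfns() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KiB hypervisor pages. */
#define HV_HYP_PAGE_SHIFT 12
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)

/*
 * Build the list of page frame numbers for the shared alias of a buffer.
 * Returns a malloc'ed array of *count entries, or NULL on allocation
 * failure; the caller frees it.
 */
uint64_t *alias_pfns(uint64_t base_gpa, unsigned long size,
		     uint64_t shared_gpa_boundary, unsigned long *count)
{
	unsigned long n = size / HV_HYP_PAGE_SIZE;
	uint64_t *pfns;
	unsigned long i;

	/* The model only handles page-aligned buffers. */
	assert((base_gpa & (HV_HYP_PAGE_SIZE - 1)) == 0);

	pfns = calloc(n, sizeof(*pfns));
	if (!pfns)
		return NULL;

	/* Each page's pfn is shifted up by the boundary expressed in pages. */
	for (i = 0; i < n; i++)
		pfns[i] = (base_gpa >> HV_HYP_PAGE_SHIFT) + i +
			  (shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);

	*count = n;
	return pfns;
}
```

Note that a size that is not a multiple of the page size is rounded down by the division, in this model as in the patch, so callers should pass page-multiple sizes.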


2021-07-28 15:31:16

by Dave Hansen

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> int ret;
>
> /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
> return 0;

__set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
Hyper-V are messing around in here.

It doesn't help that these additions are totally uncommented. Even
worse is that hv_set_mem_enc() was intentionally named "enc" when it
presumably has nothing to do with encryption.

This needs to be refactored. The current __set_memory_enc_dec() can
become __set_memory_enc_pgtable(). It gets used for the hypervisors
that get informed about "encryption" status via page tables: SEV and TDX.

Then, rename hv_set_mem_enc() to hv_set_visible_hcall(). You'll end up
with:

int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
if (hv_is_isolation_supported())
return hv_set_visible_hcall(...);

if (mem_encrypt_active() || ...)
return __set_memory_enc_pgtable();

/* Nothing to do */
return 0;
}

That tells the story pretty effectively, in code.

> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
> +{
> + return hv_set_mem_host_visibility((void *)addr,
> + numpages * HV_HYP_PAGE_SIZE,
> + enc ? VMBUS_PAGE_NOT_VISIBLE
> + : VMBUS_PAGE_VISIBLE_READ_WRITE);
> +}

I know this is off in Hyper-V code, but this just makes my eyes bleed.
I'd much rather see something which is less compact but readable.

> +/* Hyper-V GPA map flags */
> +#define VMBUS_PAGE_NOT_VISIBLE 0
> +#define VMBUS_PAGE_VISIBLE_READ_ONLY 1
> +#define VMBUS_PAGE_VISIBLE_READ_WRITE 3

That looks suspiciously like an enum.
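
As an illustration of the two review comments above (an enum instead of defines, and a less compact helper), one possible shape is sketched below in user-space C. This is not the code that was eventually merged: the MODEL_ names and both helper functions are hypothetical, only the numeric values are taken from the quoted defines, and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB hypervisor pages. */
#define HV_HYP_PAGE_SIZE 4096UL

/* Values copied from the VMBUS_PAGE_* defines quoted above. */
enum hv_mem_host_visibility_model {
	MODEL_PAGE_NOT_VISIBLE        = 0,
	MODEL_PAGE_VISIBLE_READ_ONLY  = 1,
	MODEL_PAGE_VISIBLE_READ_WRITE = 3,
};

/* "Encrypted" in the caller's terms means hidden from the host. */
enum hv_mem_host_visibility_model visibility_for(bool enc)
{
	if (enc)
		return MODEL_PAGE_NOT_VISIBLE;

	return MODEL_PAGE_VISIBLE_READ_WRITE;
}

/* Size in bytes of the range whose visibility is being changed. */
unsigned long visibility_range_bytes(int numpages)
{
	assert(numpages >= 0);
	return (unsigned long)numpages * HV_HYP_PAGE_SIZE;
}
```

Splitting the decision and the size computation into named steps leaves the final call as a plain three-argument invocation, which is the readability point being made.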

2021-07-28 17:09:22

by Dave Hansen

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/28/21 7:52 AM, Tianyu Lan wrote:
> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> int ret;
>
> /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported())
> + return hv_set_mem_enc(addr, numpages, enc);
> + else if (!mem_encrypt_active())
> return 0;

One more thing. If you're going to be patching generic code, please
start using feature checks that can get optimized away at runtime.
hv_is_isolation_supported() doesn't look like the world's cheapest
check. It can't be inlined and costs at least a function call.

These checks could, with basically no effort be wrapped in a header like
this:

static inline bool hv_is_isolation_supported(void)
{
if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
return 0;

// out of line function call:
return __hv_is_isolation_supported();
}

I don't think it would be the end of the world to add an
X86_FEATURE_HYPERV_GUEST, either. There are plenty of bits allocated
for Xen and VMWare.

2021-07-29 12:56:42

by Tianyu Lan

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

Hi Dave:
Thanks for your review.

On 7/28/2021 11:29 PM, Dave Hansen wrote:
> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>> int ret;
>>
>> /* Nothing to do if memory encryption is not active */
>> - if (!mem_encrypt_active())
>> + if (hv_is_isolation_supported())
>> + return hv_set_mem_enc(addr, numpages, enc);
>> + else if (!mem_encrypt_active())
>> return 0;
>
> __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
> Hyper-V are messing around in here.
>
> It doesn't help that these additions are totally uncommented. Even
> worse is that hv_set_mem_enc() was intentionally named "enc" when it
> presumably has nothing to do with encryption.
>
> This needs to be refactored. The current __set_memory_enc_dec() can
> become __set_memory_enc_pgtable(). It gets used for the hypervisors
> that get informed about "encryption" status via page tables: SEV and TDX.
>
> Then, rename hv_set_mem_enc() to hv_set_visible_hcall(). You'll end up
> with:
>
> int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> {
> if (hv_is_isolation_supported())
> return hv_set_visible_hcall(...);
>
> if (mem_encrypt_active() || ...)
> return __set_memory_enc_pgtable();
>
> /* Nothing to do */
> return 0;
> }
>
> That tells the story pretty effectively, in code.

Yes, this is a good idea. Thanks for your suggestion.

>
>> +int hv_set_mem_enc(unsigned long addr, int numpages, bool enc)
>> +{
>> + return hv_set_mem_host_visibility((void *)addr,
>> + numpages * HV_HYP_PAGE_SIZE,
>> + enc ? VMBUS_PAGE_NOT_VISIBLE
>> + : VMBUS_PAGE_VISIBLE_READ_WRITE);
>> +}
>
> I know this is off in Hyper-V code, but this just makes my eyes bleed.
> I'd much rather see something which is less compact but readable.

OK. Will update.

>
>> +/* Hyper-V GPA map flags */
>> +#define VMBUS_PAGE_NOT_VISIBLE 0
>> +#define VMBUS_PAGE_VISIBLE_READ_ONLY 1
>> +#define VMBUS_PAGE_VISIBLE_READ_WRITE 3
>
> That looks suspiciously like an enum.
>

OK. Will update.


2021-07-29 13:04:09

by Tianyu Lan

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/29/2021 1:06 AM, Dave Hansen wrote:
> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>> int ret;
>>
>> /* Nothing to do if memory encryption is not active */
>> - if (!mem_encrypt_active())
>> + if (hv_is_isolation_supported())
>> + return hv_set_mem_enc(addr, numpages, enc);
>> + else if (!mem_encrypt_active())
>> return 0;
>
> One more thing. If you're going to be patching generic code, please
> start using feature checks that can get optimized away at runtime.
> hv_is_isolation_supported() doesn't look like the world's cheapest
> check. It can't be inlined and costs at least a function call.

Yes, you are right. How about adding a static branch key for the check
of isolation VM? This may reduce the check cost.



2021-07-29 14:14:14

by Dave Hansen

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/29/21 6:01 AM, Tianyu Lan wrote:
> On 7/29/2021 1:06 AM, Dave Hansen wrote:
>> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long
>>> addr, int numpages, bool enc)
>>>       int ret;
>>>         /* Nothing to do if memory encryption is not active */
>>> -    if (!mem_encrypt_active())
>>> +    if (hv_is_isolation_supported())
>>> +        return hv_set_mem_enc(addr, numpages, enc);
>>> +    else if (!mem_encrypt_active())
>>>           return 0;
>>
>> One more thing.  If you're going to be patching generic code, please
>> start using feature checks that can get optimized away at runtime.
>> hv_is_isolation_supported() doesn't look like the world's cheapest
>> check.  It can't be inlined and costs at least a function call.
>
> Yes, you are right. How about adding a static branch key for the check
> of isolation VM? This may reduce the check cost.

I don't think you need a static key.

There are basically three choices:
1. Use an existing X86_FEATURE bit. I think there's already one for
when you are running under a hypervisor. It's not super precise,
but it's better than what you have.
2. Define a new X86_FEATURE bit for when you are running under
Hyper-V.
3. Define a new X86_FEATURE bit specifically for Hyper-V isolation VM
support. This particular feature might be a little uncommon to
deserve its own bit.

I'd probably just do #2.

2021-07-29 15:07:21

by Tianyu Lan

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/29/2021 10:09 PM, Dave Hansen wrote:
> On 7/29/21 6:01 AM, Tianyu Lan wrote:
>> On 7/29/2021 1:06 AM, Dave Hansen wrote:
>>> On 7/28/21 7:52 AM, Tianyu Lan wrote:
>>>> @@ -1986,7 +1988,9 @@ static int __set_memory_enc_dec(unsigned long
>>>> addr, int numpages, bool enc)
>>>>       int ret;
>>>>         /* Nothing to do if memory encryption is not active */
>>>> -    if (!mem_encrypt_active())
>>>> +    if (hv_is_isolation_supported())
>>>> +        return hv_set_mem_enc(addr, numpages, enc);
>>>> +    else if (!mem_encrypt_active())
>>>>           return 0;
>>>
>>> One more thing.  If you're going to be patching generic code, please
>>> start using feature checks that can get optimized away at runtime.
>>> hv_is_isolation_supported() doesn't look like the world's cheapest
>>> check.  It can't be inlined and costs at least a function call.
>>
>> Yes, you are right. How about adding a static branch key for the check
>> of isolation VM? This may reduce the check cost.
>
> I don't think you need a static key.
>
> There are basically three choices:
> 1. Use an existing X86_FEATURE bit. I think there's already one for
> when you are running under a hypervisor. It's not super precise,
> but it's better than what you have.
> 2. Define a new X86_FEATURE bit for when you are running under
> Hyper-V.
> 3. Define a new X86_FEATURE bit specifically for Hyper-V isolation VM
> support. This particular feature might be a little uncommon to
> deserve its own bit.
>
> I'd probably just do #2.
>

There is x86_hyper_type to identify hypervisor type and we may check
this variable after checking X86_FEATURE_HYPERVISOR.

static inline bool hv_is_isolation_supported(void)
{
if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
return 0;

if (x86_hyper_type != X86_HYPER_MS_HYPERV)
return 0;

// out of line function call:
return __hv_is_isolation_supported();
}

2021-07-29 15:14:39

by Tianyu Lan

Subject: Re: [PATCH 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function


Hi Christoph:
Could you have a look at this patch and the following patch,
"[PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function
for HV IVM"? These two patches follow your previous comments and add
dma_map_decrypted/dma_unmap_encrypted(). I didn't add an arch prefix because
each platform may populate its own callbacks into the dma memory decrypted ops.

Thanks.

On 7/28/2021 10:52 PM, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
> needs to be mapped into the address space above vTOM, so
> introduce dma_map_decrypted/dma_unmap_encrypted() to map/unmap
> bounce buffer memory. The platform can populate map/unmap callbacks
> in the dma memory decrypted ops.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> include/linux/dma-map-ops.h | 9 +++++++++
> kernel/dma/mapping.c | 22 ++++++++++++++++++++++
> 2 files changed, 31 insertions(+)
>
> diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
> index 0d53a96a3d64..01d60a024e45 100644
> --- a/include/linux/dma-map-ops.h
> +++ b/include/linux/dma-map-ops.h
> @@ -71,6 +71,11 @@ struct dma_map_ops {
> unsigned long (*get_merge_boundary)(struct device *dev);
> };
>
> +struct dma_memory_decrypted_ops {
> + void *(*map)(void *addr, unsigned long size);
> + void (*unmap)(void *addr);
> +};
> +
> #ifdef CONFIG_DMA_OPS
> #include <asm/dma-mapping.h>
>
> @@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device *dev)
> }
> #endif /* CONFIG_DMA_API_DEBUG */
>
> +void *dma_map_decrypted(void *addr, unsigned long size);
> +int dma_unmap_encrypted(void *addr, unsigned long size);
> +
> extern const struct dma_map_ops dma_dummy_ops;
> +extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
>
> #endif /* _LINUX_DMA_MAP_OPS_H */
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..6fb150dc1750 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -13,11 +13,13 @@
> #include <linux/of_device.h>
> #include <linux/slab.h>
> #include <linux/vmalloc.h>
> +#include <asm/set_memory.h>
> #include "debug.h"
> #include "direct.h"
>
> bool dma_default_coherent;
>
> +struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
> /*
> * Managed DMA API
> */
> @@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
> return ops->get_merge_boundary(dev);
> }
> EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
> +
> +void *dma_map_decrypted(void *addr, unsigned long size)
> +{
> + if (set_memory_decrypted((unsigned long)addr,
> + size / PAGE_SIZE))
> + return NULL;
> +
> + if (dma_memory_generic_decrypted_ops.map)
> + return dma_memory_generic_decrypted_ops.map(addr, size);
> + else
> + return addr;
> +}
> +
> +int dma_unmap_encrypted(void *addr, unsigned long size)
> +{
> + if (dma_memory_generic_decrypted_ops.unmap)
> + dma_memory_generic_decrypted_ops.unmap(addr);
> +
> + return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
> +}
>

2021-07-29 16:12:33

by Dave Hansen

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/29/21 8:02 AM, Tianyu Lan wrote:
>>
>
> There is x86_hyper_type to identify hypervisor type and we may check
> this variable after checking X86_FEATURE_HYPERVISOR.
>
> static inline bool hv_is_isolation_supported(void)
> {
>     if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>         return 0;
>
>     if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>         return 0;
>
>     // out of line function call:
>     return __hv_is_isolation_supported();
> }

Looks fine. You just might want to use this existing helper:

static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
{
	return x86_hyper_type == type;
}
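
As an illustration, the check could be built on that helper roughly as follows. This is a userspace sketch, not kernel code: the enum values, the `cpu_has_hypervisor` flag (standing in for the X86_FEATURE_HYPERVISOR test) and the stubbed `__hv_is_isolation_supported()` are assumptions made so the example compiles on its own.

```c
#include <stdbool.h>

/* Stand-ins for kernel definitions so the sketch compiles on its own. */
enum x86_hypervisor_type {
	X86_HYPER_NATIVE = 0,
	X86_HYPER_VMWARE,
	X86_HYPER_MS_HYPERV,
};

static enum x86_hypervisor_type x86_hyper_type = X86_HYPER_NATIVE;
static bool cpu_has_hypervisor;	/* models the X86_FEATURE_HYPERVISOR check */
static bool isolation_reported;	/* models what the out-of-line check sees */

static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
{
	return x86_hyper_type == type;
}

/* Out-of-line part; in the kernel this would query Hyper-V state. */
static bool __hv_is_isolation_supported(void)
{
	return isolation_reported;
}

static inline bool hv_is_isolation_supported(void)
{
	if (!cpu_has_hypervisor)
		return false;

	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
		return false;

	return __hv_is_isolation_supported();
}
```

The two cheap checks stay inline, so non-Hyper-V guests never reach the out-of-line call.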


2021-07-29 16:32:16

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

On Wed, Jul 28, 2021 at 10:52:25AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
>
> Use dma_map_decrypted() in the swiotlb code, store remap address returned
> and use the remap address to copy data from/to swiotlb bounce buffer.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> include/linux/swiotlb.h | 4 ++++
> kernel/dma/swiotlb.c | 11 ++++++++---
> 2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index f507e3eacbea..584560ecaa8e 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
> * @end: The end address of the swiotlb memory pool. Used to do a quick
> * range check to see if the memory was in fact allocated by this
> * API.
> + * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb
> + * memory pool may be remapped in the memory encrypted case and store
> + * virtual address for bounce buffer operation.
> * @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
> * @end. For default swiotlb, this is command line adjustable via
> * setup_io_tlb_npages.
> @@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
> struct io_tlb_mem {
> phys_addr_t start;
> phys_addr_t end;
> + void *vaddr;
> unsigned long nslabs;
> unsigned long used;
> unsigned int index;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 1fa81c096c1d..6866e5784b53 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -194,8 +194,13 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> mem->slots[i].alloc_size = 0;
> }
>
> - set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> - memset(vaddr, 0, bytes);
> + mem->vaddr = dma_map_decrypted(vaddr, bytes);
> + if (!mem->vaddr) {
> + pr_err("Failed to decrypt memory.\n");

I am wondering if it would be worth returning an error code in this
function instead of just printing an error?

For this patch I think it is OK, but going forward it might be better to
report the failure. Is there some global guest->Hyper-V reporting
mechanism, so that if this fails it gets bubbled up to something like
the Hyper-V console?

And ditto for other hypervisors?
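
A minimal sketch of the "return an error code" variant, written as standalone userspace C. The reduced `struct io_tlb_mem`, the `map_decrypted` hook (standing in for dma_map_decrypted()), the two test doubles and the function name are all assumptions for illustration; the real function takes more arguments and the choice of -ENOMEM is a guess.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for struct io_tlb_mem; only the field used here. */
struct io_tlb_mem {
	void *vaddr;
};

/* Hypothetical hook modelling dma_map_decrypted(); NULL means failure. */
static void *(*map_decrypted)(void *addr, unsigned long size);

/* Test doubles: one mapping that succeeds, one that fails. */
static void *map_identity(void *addr, unsigned long size)
{
	(void)size;
	return addr;
}

static void *map_fail(void *addr, unsigned long size)
{
	(void)addr;
	(void)size;
	return NULL;
}

/* Report the failure to the caller instead of only logging it. */
static int swiotlb_init_io_tlb_mem_checked(struct io_tlb_mem *mem,
					   void *vaddr, unsigned long bytes)
{
	mem->vaddr = map_decrypted ? map_decrypted(vaddr, bytes) : vaddr;
	if (!mem->vaddr)
		return -ENOMEM;

	memset(mem->vaddr, 0, bytes);
	return 0;
}
```

Callers such as swiotlb_init_with_tbl() would then have to check the return value and unwind, which is the larger part of the change.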


> + return;
> + }
> +
> + memset(mem->vaddr, 0, bytes);
> }
>
> int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> @@ -360,7 +365,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
> phys_addr_t orig_addr = mem->slots[index].orig_addr;
> size_t alloc_size = mem->slots[index].alloc_size;
> unsigned long pfn = PFN_DOWN(orig_addr);
> - unsigned char *vaddr = phys_to_virt(tlb_addr);
> + unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
> unsigned int tlb_offset;
>
> if (orig_addr == INVALID_PHYS_ADDR)
> --
> 2.25.1
>

2021-07-30 02:56:43

by Tianyu Lan

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 7/30/2021 12:05 AM, Dave Hansen wrote:
> On 7/29/21 8:02 AM, Tianyu Lan wrote:
>>>
>>
>> There is x86_hyper_type to identify hypervisor type and we may check
>> this variable after checking X86_FEATURE_HYPERVISOR.
>>
>> static inline bool hv_is_isolation_supported(void)
>> {
>>     if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
>>         return 0;
>>
>>     if (x86_hyper_type != X86_HYPER_MS_HYPERV)
>>         return 0;
>>
>>     // out of line function call:
>>     return __hv_is_isolation_supported();
>> }
>
> Looks fine. You just might want to use this existing helper:
>
> static inline bool hypervisor_is_type(enum x86_hypervisor_type type)
> {
> 	return x86_hyper_type == type;
> }
>

Yes, thanks for the suggestion. I will update this in the next version.

2021-07-30 04:15:54

by Tianyu Lan

Subject: Re: [PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

Hi Konrad:
Thanks for your review.

On 7/30/2021 12:29 AM, Konrad Rzeszutek Wilk wrote:
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 1fa81c096c1d..6866e5784b53 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -194,8 +194,13 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
>> mem->slots[i].alloc_size = 0;
>> }
>>
>> - set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
>> - memset(vaddr, 0, bytes);
>> + mem->vaddr = dma_map_decrypted(vaddr, bytes);
>> + if (!mem->vaddr) {
>> + pr_err("Failed to decrypt memory.\n");
> I am wondering if it would be worth returning an error code in this
> function instead of just printing an error?

Yes, this is a good idea. I will update this in the next version.

>
> For this patch I think it is OK, but going forward it might be better to
> report the failure. Is there some global guest->Hyper-V reporting
> mechanism, so that if this fails it gets bubbled up to something like
> the Hyper-V console?

Hyper-V has such a panic page report mechanism. The guest can pass one
page of log data to the host during a crash.


2021-08-02 11:55:15

by Joerg Roedel

Subject: Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
> +static int hyperv_init_ghcb(void)
> +{
> + u64 ghcb_gpa;
> + void *ghcb_va;
> + void **ghcb_base;
> +
> + if (!ms_hyperv.ghcb_base)
> + return -EINVAL;
> +
> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);

This deserves a comment. As I understand it, the GHCB pa is set by
Hyper-V or the paravisor, so the page does not need to be allocated by
Linux.
And it is not mapped unencrypted because the GHCB page is allocated
above the VTOM boundary?

> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
> {
> struct hv_reenlightenment_control re_ctrl;
> unsigned int new_cpu;
> + unsigned long flags;
> + void **input_arg;
> + void *pg;
> + void **ghcb_va = NULL;
> +
> + local_irq_save(flags);
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> + pg = *input_arg;

pg is never used later on, so why is it set?


2021-08-02 12:04:17

by Joerg Roedel

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
> __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
> Hyper-V are messing around in here.

I was going to suggest a PV_OPS call where the fitting implementation
for the guest environment can be plugged in at boot. There is TDX and an
SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
cloud/hypervisor vendors. Hiding all these behind feature checks is not
going to make things cleaner.

Regards,

Joerg

2021-08-02 12:09:12

by Joerg Roedel

Subject: Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
> + if (type == HV_GPADL_BUFFER)
> + index = 0;
> + else
> + index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;

Hmm... This doesn't look very robust. Can you set fixed indexes for
different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
as it is implemented here you risk that index 2 gets overwritten by
subsequent calls.


2021-08-02 12:32:39

by Joerg Roedel

Subject: Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + WARN_ON(in_nmi());
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);

Do you really need to zero out the whole 4k? The validation bitmap
should be enough, there are no secrets on the page anyway.
Same in hv_ghcb_msr_read().
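
To make the suggestion concrete, here is a hedged userspace sketch of clearing only the validity information rather than the whole page. `struct ghcb_sketch` is a heavily simplified stand-in, not the real `struct ghcb` layout; the field names and the 16-byte bitmap size are assumptions for the example.

```c
#include <stdint.h>
#include <string.h>

#define HV_HYP_PAGE_SIZE 4096

/*
 * Heavily simplified stand-in for struct ghcb. The real layout has many
 * more fields; only the pieces needed to show the idea are modelled.
 */
struct ghcb_sketch {
	uint64_t sw_exit_code;
	uint64_t sw_exit_info_1;
	uint64_t sw_exit_info_2;
	uint8_t  valid_bitmap[16];
	uint8_t  rest[HV_HYP_PAGE_SIZE - 3 * sizeof(uint64_t) - 16];
};

/* Invalidate earlier contents by clearing the bitmap, not the whole page. */
static void ghcb_sketch_invalidate(struct ghcb_sketch *ghcb)
{
	ghcb->sw_exit_code = 0;
	memset(ghcb->valid_bitmap, 0, sizeof(ghcb->valid_bitmap));
}
```

Stale bytes elsewhere on the page are harmless as long as the consumer only reads fields whose valid bit is set.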

> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> + struct es_em_ctxt *ctxt,
> + u64 exit_code, u64 exit_info_1,
> + u64 exit_info_2)
> {
> enum es_result ret;
>
> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
> ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
> ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>
> - sev_es_wr_ghcb_msr(__pa(ghcb));
> + /*
> + * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
> + * paravisor and not needs to be updated in the Linux guest.
> + * Otherwise, the ghcb page's PA reported by paravisor is above
> + * VTOM. Hyper-V use this function with NULL for ctxt point and
> + * skip setting ghcb page in such case.
> + */
> + if (ctxt)
> + sev_es_wr_ghcb_msr(__pa(ghcb));

No, do not make this function work with ctxt==NULL. Instead, factor out
a helper function which contains what Hyper-V needs and use that in
sev_es_ghcb_hv_call() and Hyper-V code.
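
The shape of that refactoring could look like the following userspace sketch. All names here (`ghcb_hv_call_common`, the `_sketch` wrappers, the counters) are hypothetical, and the GHCB MSR write and VMGEXIT are modelled by counters so the example can run outside the kernel.

```c
#include <stdint.h>

/* Reduced stand-in for struct ghcb with only the software exit fields. */
struct ghcb_sketch {
	uint64_t sw_exit_code;
	uint64_t sw_exit_info_1;
	uint64_t sw_exit_info_2;
};

static int ghcb_msr_writes;	/* counts modelled sev_es_wr_ghcb_msr() calls */
static int vmgexits;		/* counts modelled VMGEXIT instructions */

/* Shared part: fill in the exit fields and trap to the hypervisor. */
static int ghcb_hv_call_common(struct ghcb_sketch *ghcb, uint64_t exit_code,
			       uint64_t exit_info_1, uint64_t exit_info_2)
{
	ghcb->sw_exit_code = exit_code;
	ghcb->sw_exit_info_1 = exit_info_1;
	ghcb->sw_exit_info_2 = exit_info_2;
	vmgexits++;
	return 0;
}

/* Generic SEV-ES path: the guest owns the GHCB, so publish its address. */
static int sev_es_hv_call_sketch(struct ghcb_sketch *ghcb, uint64_t exit_code,
				 uint64_t exit_info_1, uint64_t exit_info_2)
{
	ghcb_msr_writes++;
	return ghcb_hv_call_common(ghcb, exit_code, exit_info_1, exit_info_2);
}

/* Hyper-V path: the paravisor already set the GHCB MSR, so skip that step. */
static int hv_ghcb_hv_call_sketch(struct ghcb_sketch *ghcb, uint64_t exit_code,
				  uint64_t exit_info_1, uint64_t exit_info_2)
{
	return ghcb_hv_call_common(ghcb, exit_code, exit_info_1, exit_info_2);
}
```

Neither caller needs a NULL ctxt to signal which behaviour it wants; the difference lives in which wrapper is called.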

> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);

I am curious what this will end up being good for.


2021-08-02 12:38:30

by Tianyu Lan

Subject: Re: [PATCH 01/13] x86/HV: Initialize GHCB page in Isolation VM

Hi Joerg:
Thanks for your review.


On 8/2/2021 7:53 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:16AM -0400, Tianyu Lan wrote:
>> +static int hyperv_init_ghcb(void)
>> +{
>> + u64 ghcb_gpa;
>> + void *ghcb_va;
>> + void **ghcb_base;
>> +
>> + if (!ms_hyperv.ghcb_base)
>> + return -EINVAL;
>> +
>> + rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
>> + ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
>
> This deserves a comment. As I understand it, the GHCB pa is set by
> Hyper-V or the paravisor, so the page does not need to be allocated by
> Linux.
> And it is not mapped unencrypted because the GHCB page is allocated
> above the VTOM boundary?

You are right. The GHCB page is allocated by the paravisor and its
physical address is above the VTOM boundary. I will add a comment to
describe this. Thanks for the suggestion.

>
>> @@ -167,6 +190,31 @@ static int hv_cpu_die(unsigned int cpu)
>> {
>> struct hv_reenlightenment_control re_ctrl;
>> unsigned int new_cpu;
>> + unsigned long flags;
>> + void **input_arg;
>> + void *pg;
>> + void **ghcb_va = NULL;
>> +
>> + local_irq_save(flags);
>> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> + pg = *input_arg;
>
> pg is never used later on, so why is it set?

Sorry for the noise. This should have been removed during the rebase;
I will fix it in the next version.

2021-08-02 12:43:49

by Joerg Roedel

Subject: Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote:
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 1;

The values set to ghcb_usage deserve some defines (here and below).

> +
> + hv_ghcb->hypercall.outputgpa = (u64)output;
> + hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
> + hv_ghcb->hypercall.hypercallinput.callcode = control;
> +
> + if (input_size)
> + memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
> +
> + VMGEXIT();
> +
> + hv_ghcb->ghcb.ghcb_usage = 0xffffffff;

...

> union hv_ghcb {
> struct ghcb ghcb;
> + struct {
> + u64 hypercalldata[509];
> + u64 outputgpa;
> + union {
> + union {
> + struct {
> + u32 callcode : 16;
> + u32 isfast : 1;
> + u32 reserved1 : 14;
> + u32 isnested : 1;
> + u32 countofelements : 12;
> + u32 reserved2 : 4;
> + u32 repstartindex : 12;
> + u32 reserved3 : 4;
> + };
> + u64 asuint64;
> + } hypercallinput;
> + union {
> + struct {
> + u16 callstatus;
> + u16 reserved1;
> + u32 elementsprocessed : 12;
> + u32 reserved2 : 20;
> + };
> + u64 asunit64;
> + } hypercalloutput;
> + };
> + u64 reserved2;
> + } hypercall;

Okay, this answers my previous question :)
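
As a sanity check on the layout quoted above, the hypercall view adds up to exactly one 4 KiB page: 509 * 8 bytes of data, plus three 8-byte trailers. The sketch below re-declares the structure in standalone C with fixed-width types and asserts that at compile time. The `_sketch` type name is an assumption, and bitfield packing is compiler-dependent, so the result is only claimed for GCC/Clang-style layout.

```c
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SIZE 4096

/* Standalone copy of the hypercall view of union hv_ghcb. */
struct hv_ghcb_hypercall_sketch {
	uint64_t hypercalldata[509];
	uint64_t outputgpa;
	union {
		union {
			struct {
				uint32_t callcode : 16;
				uint32_t isfast : 1;
				uint32_t reserved1 : 14;
				uint32_t isnested : 1;
				uint32_t countofelements : 12;
				uint32_t reserved2 : 4;
				uint32_t repstartindex : 12;
				uint32_t reserved3 : 4;
			};
			uint64_t asuint64;
		} hypercallinput;
		union {
			struct {
				uint16_t callstatus;
				uint16_t reserved1;
				uint32_t elementsprocessed : 12;
				uint32_t reserved2 : 20;
			};
			uint64_t asuint64;
		} hypercalloutput;
	};
	uint64_t reserved2;
};

/* 509 * 8 + 8 + 8 + 8 = 4096: the view must cover the page exactly. */
_Static_assert(sizeof(struct hv_ghcb_hypercall_sketch) == HV_HYP_PAGE_SIZE,
	       "hypercall view must be exactly one page");
```

A build-time assertion like this in the real header would catch an accidental change to the data array length or the bitfield widths.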


2021-08-02 12:59:00

by Tianyu Lan

Subject: Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM



On 8/2/2021 8:07 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:19AM -0400, Tianyu Lan wrote:
>> + if (type == HV_GPADL_BUFFER)
>> + index = 0;
>> + else
>> + index = channel->gpadl_range[1].gpadlhandle ? 2 : 1;
>
> Hmm... This doesn't look very robust. Can you set fixed indexes for
> different buffer types? HV_GPADL_BUFFER already has fixed index 0. But
> as it is implemented here you risk that index 2 gets overwritten by
> subsequent calls.

Both the second and third are of type HV_GPADL_RING. One is the send
ring and the other is the receive ring. The driver keeps a fixed order
when allocating the rx and tx buffers. You are right that this is not
robust; I will add a mutex to keep the order.

2021-08-02 13:01:37

by Joerg Roedel

Subject: Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote:
> Both the second and third are of type HV_GPADL_RING. One is the send
> ring and the other is the receive ring. The driver keeps a fixed order
> when allocating the rx and tx buffers. You are right that this is not
> robust; I will add a mutex to keep the order.

Or you introduce fixed indexes for the RX and TX buffers?
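
One way to read the fixed-index idea, sketched in standalone C. The slot enum, the reduced structures and `channel_record_gpadl()` are all hypothetical; the quoted patch only distinguishes HV_GPADL_BUFFER from HV_GPADL_RING, so the caller would have to say which ring it is setting up.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical fixed slots; one per buffer the channel can register. */
enum hv_gpadl_slot {
	HV_GPADL_SLOT_BUFFER = 0,
	HV_GPADL_SLOT_RX_RING,
	HV_GPADL_SLOT_TX_RING,
	HV_GPADL_SLOT_MAX,
};

/* Reduced stand-ins for the per-channel bookkeeping. */
struct gpadl_range_sketch {
	uint32_t gpadlhandle;
};

struct channel_sketch {
	struct gpadl_range_sketch gpadl_range[HV_GPADL_SLOT_MAX];
};

/* Record a GPADL in its dedicated slot; refuse to overwrite a live entry. */
static int channel_record_gpadl(struct channel_sketch *ch,
				enum hv_gpadl_slot slot, uint32_t handle)
{
	if ((unsigned int)slot >= HV_GPADL_SLOT_MAX || handle == 0)
		return -EINVAL;

	if (ch->gpadl_range[slot].gpadlhandle)
		return -EBUSY;

	ch->gpadl_range[slot].gpadlhandle = handle;
	return 0;
}
```

With dedicated slots the result no longer depends on call order, so a repeated call fails loudly instead of silently overwriting index 2.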

2021-08-02 13:02:38

by Joerg Roedel

Subject: Re: [PATCH 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

On Wed, Jul 28, 2021 at 10:52:22AM -0400, Tianyu Lan wrote:
> + if (hv_is_isolation_supported()) {
> + vmbus_connection.monitor_pages_va[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages[0]
> + = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
> + MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages_va[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages[1]
> + = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
> + MEMREMAP_WB);
> + if (!vmbus_connection.monitor_pages[1]) {
> + memunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> +
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> + HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> + HV_HYP_PAGE_SIZE);
> + }

Okay, let me see if I got this right. In Hyper-V Isolation VMs, when the
guest wants to make memory shared, it does:

- Call to the Hypervisor to mark the pages shared. The
Hypervisor will do the RMP update and remap the pages at
(VTOM + pa)

- The guest maps the memory again into its page-table, so that
the entries point to the correct GPA (which is above VTOM
now).

Or in other words, Hyper-V implements a hardware-independent and
configurable c-bit position, as the VTOM value is always power-of-two
aligned. Is that correct?
This would at least explain why there is no separate
allocation/deallocation of memory for the shared range.

Thanks,

Joerg
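
The address arithmetic described above can be written down as a small standalone C sketch. It assumes, as the question does, that the boundary is a power of two and that private addresses lie below it; the function names are hypothetical and returning 0 on invalid input is a convention chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v && !(v & (v - 1));
}

/*
 * Shared alias of a private guest-physical address: pa + boundary.
 * Returns 0 if the boundary is not a power of two or pa is not below it.
 */
static uint64_t shared_gpa(uint64_t pa, uint64_t shared_gpa_boundary)
{
	if (!is_pow2(shared_gpa_boundary) || pa >= shared_gpa_boundary)
		return 0;

	return pa + shared_gpa_boundary;
}
```

Under those two assumptions the addition never carries into the boundary bit, so `pa + boundary` equals `pa | boundary`; with a 39-bit boundary, bit 39 then behaves like a software-defined c-bit.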

2021-08-02 13:03:50

by Tianyu Lan

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support



On 8/2/2021 8:01 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
>> __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
>> Hyper-V are messing around in here.
>
> I was going to suggest a PV_OPS call where the fitting implementation
> for the guest environment can be plugged in at boot. There is TDX and an
> SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
> cloud/hypervisor vendors. Hiding all these behind feature checks is not
> going to make things cleaner.
>

Yes, that makes sense. I will do this in the next version.

2021-08-02 13:10:18

by Tianyu Lan

Subject: Re: [PATCH 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

On 8/2/2021 8:59 PM, Joerg Roedel wrote:
> On Mon, Aug 02, 2021 at 08:56:29PM +0800, Tianyu Lan wrote:
>> Both the second and third are of type HV_GPADL_RING. One is the send
>> ring and the other is the receive ring. The driver keeps a fixed order
>> when allocating the rx and tx buffers. You are right that this is not
>> robust; I will add a mutex to keep the order.
>
> Or you introduce fixed indexes for the RX and TX buffers?
>

The interface just allocates a buffer; the driver then configures the
buffer as rx or tx after the call returns.


2021-08-02 13:12:51

by Jürgen Groß

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On 02.08.21 14:01, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 08:29:41AM -0700, Dave Hansen wrote:
>> __set_memory_enc_dec() is turning into a real mess. SEV, TDX and now
>> Hyper-V are messing around in here.
>
> I was going to suggest a PV_OPS call where the fitting implementation
> for the guest environment can be plugged in at boot. There is TDX and an
> SEV(-SNP) case, a Hyper-V case, and likely more coming up from other
> cloud/hypervisor vendors. Hiding all these behind feature checks is not
> going to make things cleaner.

As those cases are all mutually exclusive, wouldn't a static_call() be
the appropriate solution?


Juergen



2021-08-02 13:19:46

by Tianyu Lan

Subject: Re: [PATCH 05/13] HV: Add Write/Read MSR registers via ghcb page

On 8/2/2021 8:28 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:20AM -0400, Tianyu Lan wrote:
>> +void hv_ghcb_msr_write(u64 msr, u64 value)
>> +{
>> + union hv_ghcb *hv_ghcb;
>> + void **ghcb_base;
>> + unsigned long flags;
>> +
>> + if (!ms_hyperv.ghcb_base)
>> + return;
>> +
>> + WARN_ON(in_nmi());
>> +
>> + local_irq_save(flags);
>> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
>> + if (!hv_ghcb) {
>> + local_irq_restore(flags);
>> + return;
>> + }
>> +
>> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
>
> Do you really need to zero out the whole 4k? The validation bitmap
> should be enough, there are no secrets on the page anyway.
> Same in hv_ghcb_msr_read().

OK. Thanks for the suggestion. I will give it a try.

>
>> +enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>> + struct es_em_ctxt *ctxt,
>> + u64 exit_code, u64 exit_info_1,
>> + u64 exit_info_2)
>> {
>> enum es_result ret;
>>
>> @@ -109,7 +109,16 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
>> ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
>> ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
>>
>> - sev_es_wr_ghcb_msr(__pa(ghcb));
>> + /*
>> + * Hyper-V runs paravisor with SEV. Ghcb page is allocated by
>> + * paravisor and not needs to be updated in the Linux guest.
>> + * Otherwise, the ghcb page's PA reported by paravisor is above
>> + * VTOM. Hyper-V use this function with NULL for ctxt point and
>> + * skip setting ghcb page in such case.
>> + */
>> + if (ctxt)
>> + sev_es_wr_ghcb_msr(__pa(ghcb));
>
> No, do not make this function work with ctxt==NULL. Instead, factor out
> a helper function which contains what Hyper-V needs and use that in
> sev_es_ghcb_hv_call() and Hyper-V code.
>

OK. Will update.

>> +union hv_ghcb {
>> + struct ghcb ghcb;
>> +} __packed __aligned(PAGE_SIZE);
>
> I am curious what this will end up being good for.
>

Hyper-V introduces a specific hypercall request in the GHCB page. The
Linux Hyper-V code uses the same union to read/write MSRs and to issue
the new hypercall request.

2021-08-02 13:23:44

by Joerg Roedel

Subject: Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:
> In Isolation VM, all shared memory with host needs to mark visible
> to host via hvcall. vmbus_establish_gpadl() has already done it for
> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
> mpb_desc() still need to handle. Use DMA API to map/umap these
> memory during sending/receiving packet and Hyper-V DMA ops callback
> will use swiotlb function to allocate bounce buffer and copy data
> from/to bounce buffer.

I am wondering why you don't use DMA-API unconditionally? It provides
enough abstraction to do the right thing for isolated and legacy VMs.

Regards,

Joerg

2021-08-02 13:33:12

by Joerg Roedel

Subject: Re: [PATCH 03/13] x86/HV: Add new hvcall guest address host visibility support

On Mon, Aug 02, 2021 at 03:11:40PM +0200, Juergen Gross wrote:
> As those cases are all mutually exclusive, wouldn't a static_call() be
> the appropriate solution?

Right, static_call() is even better, thanks.
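
static_call() is a kernel-only facility that patches call sites, so it cannot be shown in a runnable standalone example. The userspace sketch below uses a plain function pointer purely to illustrate the dispatch shape being discussed: one implementation is selected once at boot and the common path carries no feature checks. The handler names, their signature and the `last_backend` marker are assumptions.

```c
#include <stddef.h>

/* Hypothetical per-platform handler type; returns 0 on success. */
typedef int (*set_mem_enc_fn)(unsigned long addr, int numpages, int enc);

static int last_backend;	/* records which handler ran, for illustration */

static int noop_set_mem_enc(unsigned long addr, int numpages, int enc)
{
	(void)addr; (void)numpages; (void)enc;
	last_backend = 0;
	return 0;
}

static int sev_set_mem_enc(unsigned long addr, int numpages, int enc)
{
	(void)addr; (void)numpages; (void)enc;
	last_backend = 1;
	return 0;
}

static int hv_set_mem_enc(unsigned long addr, int numpages, int enc)
{
	(void)addr; (void)numpages; (void)enc;
	last_backend = 2;
	return 0;
}

/* Selected once; the cases are mutually exclusive on a given boot. */
static set_mem_enc_fn set_mem_enc_impl = noop_set_mem_enc;

/* Called during early boot, once the guest environment is known. */
static void set_mem_enc_select(set_mem_enc_fn fn)
{
	if (fn)
		set_mem_enc_impl = fn;
}

/* Common entry point: no feature checks, just dispatch. */
static int set_memory_enc_dec_sketch(unsigned long addr, int numpages, int enc)
{
	return set_mem_enc_impl(addr, numpages, enc);
}
```

In the kernel the pointer would be replaced by a static call that is updated once during platform setup, which avoids the indirect branch on every invocation.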

2021-08-02 13:35:14

by Tianyu Lan

Subject: Re: [PATCH 06/13] HV: Add ghcb hvcall support for SNP VM

On 8/2/2021 8:39 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:21AM -0400, Tianyu Lan wrote:
>> + hv_ghcb->ghcb.protocol_version = 1;
>> + hv_ghcb->ghcb.ghcb_usage = 1;
>
> The values set to ghcb_usage deserve some defines (here and below).
>

OK. Will update in the next version.


2021-08-02 14:25:08

by Tianyu Lan

Subject: Re: [PATCH 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

On 8/2/2021 9:20 PM, Joerg Roedel wrote:
> On Wed, Jul 28, 2021 at 10:52:28AM -0400, Tianyu Lan wrote:
>> In Isolation VM, all shared memory with host needs to mark visible
>> to host via hvcall. vmbus_establish_gpadl() has already done it for
>> storvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
>> mpb_desc() still need to handle. Use DMA API to map/umap these
>> memory during sending/receiving packet and Hyper-V DMA ops callback
>> will use swiotlb function to allocate bounce buffer and copy data
>> from/to bounce buffer.
>
> I am wondering why you don't use DMA-API unconditionally? It provides
> enough abstraction to do the right thing for isolated and legacy VMs.
>

VMbus already has a similar bounce buffer design, so there is no need
to call the DMA API for such buffers. The DMA API is called only to use
the swiotlb bounce buffer for those buffers that are not already
covered. This is why the DMA API needs to be called conditionally.