2021-12-07 07:56:08

by Tianyu Lan

Subject: [PATCH V6 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support (second part)

From: Tianyu Lan <[email protected]>

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) VMs and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted, and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest needs to call it to mark memory visible to the host before sharing
that memory with the host. For security, network/storage stack memory
should not be shared with the host, so bounce buffers are required.

The VMBus channel ring buffer already plays the bounce-buffer role,
because all data to/from the host is copied between the ring buffer and
IO stack memory. So just mark the VMBus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The accessed physical address of the
shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
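
For illustration only (a sketch, not code from this series;
bounce_buffer_gpa is a placeholder name and the boundary value is just
an example), the address arithmetic amounts to:

	/* shared_gpa_boundary as reported by ISOLATION_CONFIG, e.g. 1UL << 39 */
	phys_addr_t shared_gpa = bounce_buffer_gpa + ms_hyperv.shared_gpa_boundary;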

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs.

This version follows Michael Kelley's suggestion in the following link:
https://lkml.org/lkml/2021/11/24/2044

Change since v5:
* Modify "Swiotlb" to "swiotlb" in commit log.
* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
Move the calling of set_memory_decrypted() back from
swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
hyperv_cc_platform_has().
* Add a comment in pci-swiotlb-xen.c to explain why a dependency
is added between hyperv_swiotlb_detect() and
pci_xen_swiotlb_detect().
* Return directly when allocating the Hyper-V swiotlb buffer
fails in hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
hv_map/unmap_memory() to map/unmap the netvsc rx/tx rings into the
extra address space.
* Leave mem->vaddr in the swiotlb code as phys_to_virt(mem->start)
when remapping of the swiotlb memory fails.

Change since v1:
* Add a Hyper-V Isolation support check in cc_platform_has()
and return true for the guest memory encryption attribute.
* Remove the hv isolation check in sev_setup_arch().

Tianyu Lan (5):
swiotlb: Add swiotlb bounce buffer remap function for HV IVM
x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
hyper-v: Enable swiotlb bounce buffer for Isolation VM
scsi: storvsc: Add Isolation VM support for storvsc driver
net: netvsc: Add Isolation VM support for netvsc driver

arch/x86/hyperv/hv_init.c | 10 +++
arch/x86/hyperv/ivm.c | 28 ++++++
arch/x86/kernel/cc_platform.c | 8 ++
arch/x86/kernel/cpu/mshyperv.c | 11 ++-
drivers/hv/hv_common.c | 11 +++
drivers/hv/vmbus_drv.c | 4 +
drivers/net/hyperv/hyperv_net.h | 5 ++
drivers/net/hyperv/netvsc.c | 136 +++++++++++++++++++++++++++++-
drivers/net/hyperv/netvsc_drv.c | 1 +
drivers/net/hyperv/rndis_filter.c | 2 +
drivers/scsi/storvsc_drv.c | 37 ++++----
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 14 +++
include/linux/swiotlb.h | 6 ++
kernel/dma/swiotlb.c | 43 +++++++++-
15 files changed, 296 insertions(+), 22 deletions(-)

--
2.25.1



2021-12-07 07:56:17

by Tianyu Lan

Subject: [PATCH V6 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

From: Tianyu Lan <[email protected]>

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space that is above shared_gpa_boundary (e.g. a
39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG
leaf. The accessed physical address is the original physical address
plus shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary
is called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private, while memory above vTOM is
treated as shared.

Expose swiotlb_unencrypted_base so that a platform can set the
unencrypted memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory to the
unencrypted address space. memremap() cannot be called at the early
boot stage, so put the remapping code into
swiotlb_update_mem_attributes(). Store the remapped address and use it
to copy data from/to the swiotlb bounce buffer.

Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Tianyu Lan <[email protected]>
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
Move the calling of set_memory_decrypted() back from
swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr as phys_to_virt(mem->start) when remapping
of the swiotlb memory fails.

Change since v1:
* Rework the comment in swiotlb_init_io_tlb_mem().
* Make swiotlb_init_io_tlb_mem() return void again.
---
include/linux/swiotlb.h | 6 ++++++
kernel/dma/swiotlb.c | 43 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
* @end: The end address of the swiotlb memory pool. Used to do a quick
* range check to see if the memory was in fact allocated by this
* API.
+ * @vaddr: The virtual address of the swiotlb memory pool. The swiotlb
+ * memory pool may be remapped in the memory encrypted case, and this
+ * field stores the virtual address used for bounce buffer operations.
* @nslabs: The number of IO TLB blocks (in groups of 64) between @start and
* @end. For default swiotlb, this is command line adjustable via
* setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+ void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
}
#endif /* CONFIG_DMA_RESTRICTED_POOL */

+extern phys_addr_t swiotlb_unencrypted_base;
+
#endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
#include <asm/io.h>
#include <asm/dma.h>

+#include <linux/io.h>
#include <linux/init.h>
#include <linux/memblock.h>
#include <linux/iommu-helper.h>
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;

struct io_tlb_mem io_tlb_default_mem;

+phys_addr_t swiotlb_unencrypted_base;
+
/*
* Max segment that we can provide which (if pages are contiguous) will
* not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
}

+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+ void *vaddr = NULL;
+
+ if (swiotlb_unencrypted_base) {
+ phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+ vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+ if (!vaddr)
+ pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+ paddr, bytes);
+ }
+
+ return vaddr;
+}
+
/*
* Early SWIOTLB allocation may be too early to allow an architecture to
* perform the desired operations. This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
- memset(vaddr, 0, bytes);
+
+ mem->vaddr = swiotlb_mem_remap(mem, bytes);
+ if (!mem->vaddr)
+ mem->vaddr = vaddr;
+
+ memset(mem->vaddr, 0, bytes);
}

static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
+
+ /*
+ * If swiotlb_unencrypted_base is set, the bounce buffer memory will
+ * be remapped and cleared in swiotlb_update_mem_attributes.
+ */
+ if (swiotlb_unencrypted_base)
+ return;
+
memset(vaddr, 0, bytes);
+ mem->vaddr = vaddr;
+ return;
}

int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -371,7 +410,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
unsigned long pfn = PFN_DOWN(orig_addr);
- unsigned char *vaddr = phys_to_virt(tlb_addr);
+ unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
unsigned int tlb_offset, orig_addr_offset;

if (orig_addr == INVALID_PHYS_ADDR)
--
2.25.1


2021-12-07 07:56:26

by Tianyu Lan

Subject: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

From: Tianyu Lan <[email protected]>

Hyper-V provides Isolation VMs, which have memory encryption support.
Add hyperv_cc_platform_has() and return true for the check of the
GUEST_MEM_ENCRYPT attribute.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
hyperv_cc_platform_has().
---
arch/x86/kernel/cc_platform.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..47db88c275d5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
#include <linux/cc_platform.h>
#include <linux/mem_encrypt.h>

+#include <asm/mshyperv.h>
#include <asm/processor.h>

static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
#endif
}

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+ return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}

bool cc_platform_has(enum cc_attr attr)
{
+ if (hv_is_isolation_supported())
+ return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);

--
2.25.1


2021-12-07 07:56:31

by Tianyu Lan

Subject: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

From: Tianyu Lan <[email protected]>

A Hyper-V Isolation VM requires bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space that is above shared_gpa_boundary
(e.g. a 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address is the original
physical address plus shared_gpa_boundary. In the AMD SEV-SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM). Memory
addresses below vTOM are automatically treated as private, while
memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it in the extra address
space via memremap(). Populate the shared_gpa_boundary (vTOM) via the
swiotlb_unencrypted_base variable.

The mapping function memremap() can't work at the early stage (e.g. in
ms_hyperv_init_platform()), so call swiotlb_update_mem_attributes() in
hyperv_init().

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
in the hyperv_init().

Change since v3:
* Add a comment in pci-swiotlb-xen.c to explain why a dependency
is added between hyperv_swiotlb_detect() and
pci_xen_swiotlb_detect().
* Return directly when allocating the Hyper-V swiotlb buffer
fails in hyperv_iommu_swiotlb_init().
---
arch/x86/hyperv/hv_init.c | 10 ++++++++++
arch/x86/kernel/cpu/mshyperv.c | 11 ++++++++++-
include/linux/hyperv.h | 8 ++++++++
3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..9e18a280f89d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
#include <linux/syscore_ops.h>
#include <clocksource/hyperv_timer.h>
#include <linux/highmem.h>
+#include <linux/swiotlb.h>

int hyperv_init_cpuhp;
u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,15 @@ void __init hyperv_init(void)

/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+ /*
+ * The swiotlb bounce buffer needs to be mapped in the extra address
+ * space. The mapping function doesn't work early in boot, so call
+ * swiotlb_update_mem_attributes() here.
+ */
+ if (hv_is_isolation_supported())
+ swiotlb_update_mem_attributes();
+
return;

clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..baf3a0873552 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
#include <linux/kexec.h>
#include <linux/i8253.h>
#include <linux/random.h>
+#include <linux/swiotlb.h>
#include <asm/processor.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);

- if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+ if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+ swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+ }
+
+ /*
+ * Enable swiotlb force mode in an Isolation VM to use the
+ * swiotlb bounce buffer for DMA transactions.
+ */
+ swiotlb_force = SWIOTLB_FORCE;
}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+ return 0;
+}
+#endif

struct hyperv_pci_block_ops {
int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
--
2.25.1


2021-12-07 07:56:34

by Tianyu Lan

Subject: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

From: Tianyu Lan <[email protected]>

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via an hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
(scsi_dma_map/unmap) to map this memory when sending/receiving packets;
it returns the swiotlb bounce buffer DMA address. In an Isolation VM,
the swiotlb bounce buffer is marked visible to the host and swiotlb
force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.
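
As an aside, a sketch of what the min align mask guarantees (not code
from this patch; orig_addr/tlb_addr are placeholder names):

	/*
	 * With dma_set_min_align_mask(dev, HV_HYP_PAGE_SIZE - 1), swiotlb
	 * picks a bounce slot that preserves the low address bits:
	 *
	 *   (tlb_addr & (HV_HYP_PAGE_SIZE - 1)) ==
	 *   (orig_addr & (HV_HYP_PAGE_SIZE - 1))
	 *
	 * so offset_in_hvpg computed from the original sgl stays valid
	 * for the PFN array built from sg_dma_address(sg).
	 */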

Signed-off-by: Tianyu Lan <[email protected]>
---
drivers/hv/vmbus_drv.c | 4 ++++
drivers/scsi/storvsc_drv.c | 37 +++++++++++++++++++++----------------
include/linux/hyperv.h | 1 +
3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
#include <linux/random.h>
#include <linux/kernel.h>
#include <linux/syscore_ops.h>
+#include <linux/dma-map-ops.h>
#include <clocksource/hyperv_timer.h>
#include "hyperv_vmbus.h"

@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
}

+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
/*
* vmbus_device_register - Register the child device
*/
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);

+ child_device_obj->device.dma_mask = &vmbus_dma_mask;
+ child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;

err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
#include <linux/device.h>
#include <linux/hyperv.h>
#include <linux/blkdev.h>
+#include <linux/dma-mapping.h>
+
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+ scsi_dma_unmap(scmnd);
}

storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
- int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);

if (sg_count) {
- unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
- u64 hvpfn;
+ struct scatterlist *sg;
+ unsigned long hvpfn, hvpfns_to_add;
+ int j, i = 0;

if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {

@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;

+ sg_count = scsi_dma_map(scmnd);
+ if (sg_count < 0)
+ return SCSI_MLQUEUE_DEVICE_BUSY;

- for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+ for_each_sg(sgl, sg, sg_count, j) {
/*
- * Init values for the current sgl entry. hvpgoff
- * and hvpfns_to_add are in units of Hyper-V size
- * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
- * case also handles values of sgl->offset that are
- * larger than PAGE_SIZE. Such offsets are handled
- * even on other than the first sgl entry, provided
- * they are a multiple of PAGE_SIZE.
+ * Init values for the current sgl entry. hvpfns_to_add
+ * is in units of Hyper-V size pages. Handling the
+ * PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+ * values of sgl->offset that are larger than PAGE_SIZE.
+ * Such offsets are handled even on other than the first
+ * sgl entry, provided they are a multiple of PAGE_SIZE.
*/
- hvpgoff = HVPFN_DOWN(sgl->offset);
- hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
- hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
- hvpgoff;
+ hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+ hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+ sg_dma_len(sg)) - hvpfn;

/*
* Fill the next portion of the PFN array with
@@ -1872,7 +1876,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
* the PFN array is filled.
*/
while (hvpfns_to_add--)
- payload->range.pfn_array[i++] = hvpfn++;
+ payload->range.pfn_array[i++] = hvpfn++;
}
}

@@ -2016,6 +2020,7 @@ static int storvsc_probe(struct hv_device *device,
stor_device->vmscsi_size_delta = sizeof(struct vmscsi_win8_extension);
spin_lock_init(&stor_device->lock);
hv_set_drvdata(device, stor_device);
+ dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);

stor_device->port_number = host->host_no;
ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 1f037e114dc8..74f5e92f91a0 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1261,6 +1261,7 @@ struct hv_device {

struct vmbus_channel *channel;
struct kset *channels_kset;
+ struct device_dma_parameters dma_parms;

/* place holder to keep track of the dir for hv device in debugfs */
struct dentry *debug_dir;
--
2.25.1


2021-12-07 07:56:36

by Tianyu Lan

Subject: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

From: Tianyu Lan <[email protected]>

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via an hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets, and the
Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has been marked visible to the host during boot.

The rx/tx ring buffers are allocated via vzalloc(), and they need to
be mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since v3:
* Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
* Add hv_map/unmap_memory() to map/unmap the rx/tx ring buffers.
---
arch/x86/hyperv/ivm.c | 28 ++++++
drivers/hv/hv_common.c | 11 +++
drivers/net/hyperv/hyperv_net.h | 5 ++
drivers/net/hyperv/netvsc.c | 136 +++++++++++++++++++++++++++++-
drivers/net/hyperv/netvsc_drv.c | 1 +
drivers/net/hyperv/rndis_filter.c | 2 +
include/asm-generic/mshyperv.h | 2 +
include/linux/hyperv.h | 5 ++
8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl
kfree(pfn_array);
return ret;
}
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+ unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+ void *vaddr;
+ int i;
+
+ if (!pfns)
+ return NULL;
+
+ for (i = 0; i < size / PAGE_SIZE; i++)
+ pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+ (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+ vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+ kfree(pfns);
+
+ return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+ vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
}
EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+ struct hv_dma_range *dma_range;
};

#define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {

/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+ void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {

/* Send buffer allocated by us */
void *send_buf;
+ void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
#define RETRY_US_HI 10000
#define RETRY_MAX 2000 /* >10 sec */

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
#endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;

kfree(nvdev->extension);
- vfree(nvdev->recv_buf);
- vfree(nvdev->send_buf);
+
+ if (nvdev->recv_original_buf) {
+ hv_unmap_memory(nvdev->recv_buf);
+ vfree(nvdev->recv_original_buf);
+ } else {
+ vfree(nvdev->recv_buf);
+ }
+
+ if (nvdev->send_original_buf) {
+ hv_unmap_memory(nvdev->send_buf);
+ vfree(nvdev->send_original_buf);
+ } else {
+ vfree(nvdev->send_buf);
+ }
+
kfree(nvdev->send_section_map);

for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -338,6 +351,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+ void *vaddr;

/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -373,6 +387,17 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}

+ if (hv_isolation_type_snp()) {
+ vaddr = hv_map_memory(net_device->recv_buf, buf_size);
+ if (!vaddr) {
+ ret = -ENOMEM;
+ goto cleanup;
+ }
+
+ net_device->recv_original_buf = net_device->recv_buf;
+ net_device->recv_buf = vaddr;
+ }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -476,6 +501,17 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}

+ if (hv_isolation_type_snp()) {
+ vaddr = hv_map_memory(net_device->send_buf, buf_size);
+ if (!vaddr) {
+ ret = -ENOMEM;
+ goto cleanup;
+ }
+
+ net_device->send_original_buf = net_device->send_buf;
+ net_device->send_buf = vaddr;
+ }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -766,7 +802,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,

/* Notify the layer above us */
if (likely(skb)) {
- const struct hv_netvsc_packet *packet
+ struct hv_netvsc_packet *packet
= (struct hv_netvsc_packet *)skb->cb;
u32 send_index = packet->send_buf_index;
struct netvsc_stats *tx_stats;
@@ -782,6 +818,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
tx_stats->bytes += packet->total_bytes;
u64_stats_update_end(&tx_stats->syncp);

+ netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
napi_consume_skb(skb, budget);
}

@@ -946,6 +983,88 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
memset(dest, 0, padding);
}

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet)
+{
+ u32 page_count = packet->cp_partial ?
+ packet->page_buf_cnt - packet->rmsg_pgcnt :
+ packet->page_buf_cnt;
+ int i;
+
+ if (!hv_is_isolation_supported())
+ return;
+
+ if (!packet->dma_range)
+ return;
+
+ for (i = 0; i < page_count; i++)
+ dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
+ packet->dma_range[i].mapping_size,
+ DMA_TO_DEVICE);
+
+ kfree(packet->dma_range);
+}
+
+/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
+ * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
+ * VM.
+ *
+ * In isolation VM, netvsc send buffer has been marked visible to
+ * host and so the data copied to send buffer doesn't need to use
+ * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
+ * may not be copied to send buffer and so these pages need to be
+ * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
+ * that. The pfns in the struct hv_page_buffer need to be converted
+ * to bounce buffer's pfn. The loop here is necessary because the
+ * entries in the page buffer array are not necessarily full
+ * pages of data. Each entry in the array has a separate offset and
+ * len that may be non-zero, even for entries in the middle of the
+ * array. And the entries are not physically contiguous. So each
+ * entry must be individually mapped rather than as a contiguous unit.
+ * Hence dma_map_sg() is not used here.
+ */
+int netvsc_dma_map(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet,
+ struct hv_page_buffer *pb)
+{
+ u32 page_count = packet->cp_partial ?
+ packet->page_buf_cnt - packet->rmsg_pgcnt :
+ packet->page_buf_cnt;
+ dma_addr_t dma;
+ int i;
+
+ if (!hv_is_isolation_supported())
+ return 0;
+
+ packet->dma_range = kcalloc(page_count,
+ sizeof(*packet->dma_range),
+ GFP_KERNEL);
+ if (!packet->dma_range)
+ return -ENOMEM;
+
+ for (i = 0; i < page_count; i++) {
+ char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
+ + pb[i].offset);
+ u32 len = pb[i].len;
+
+ dma = dma_map_single(&hv_dev->device, src, len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(&hv_dev->device, dma)) {
+ kfree(packet->dma_range);
+ return -ENOMEM;
+ }
+
+ /* pb[].offset and pb[].len are not changed during dma mapping
+ * and so are not reassigned.
+ */
+ packet->dma_range[i].dma = dma;
+ packet->dma_range[i].mapping_size = len;
+ pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
+ }
+
+ return 0;
+}
+
static inline int netvsc_send_pkt(
struct hv_device *device,
struct hv_netvsc_packet *packet,
@@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(

trace_nvsp_send_pkt(ndev, out_channel, rpkt);

+ packet->dma_range = NULL;
if (packet->page_buf_cnt) {
if (packet->cp_partial)
pb += packet->rmsg_pgcnt;

+ ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
+ if (ret) {
+ ret = -EAGAIN;
+ goto exit;
+ }
+
ret = vmbus_sendpacket_pagebuffer(out_channel,
pb, packet->page_buf_cnt,
&nvmsg, sizeof(nvmsg),
req_id);
+
+ if (ret)
+ netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
} else {
ret = vmbus_sendpacket(out_channel,
&nvmsg, sizeof(nvmsg),
@@ -1001,6 +1130,7 @@ static inline int netvsc_send_pkt(
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
}

+exit:
if (ret == 0) {
atomic_inc_return(&nvchan->queue_sends);

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 7e66ae1d2a59..17958533bf30 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2512,6 +2512,7 @@ static int netvsc_probe(struct hv_device *dev,
net->netdev_ops = &device_ops;
net->ethtool_ops = &ethtool_ops;
SET_NETDEV_DEV(net, &dev->device);
+ dma_set_min_align_mask(&dev->device, HV_HYP_PAGE_SIZE - 1);

/* We always need headroom for rndis header */
net->needed_headroom = RNDIS_AND_PPI_SIZE;
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index f6c9c2a670f9..448fcc325ed7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
}
}

+ netvsc_dma_unmap(((struct net_device_context *)
+ netdev_priv(ndev))->device_ctx, &request->pkt);
complete(&request->wait_event);
} else {
netdev_err(ndev,
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 3e2248ac328e..94e73ba129c5 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -269,6 +269,8 @@ bool hv_isolation_type_snp(void);
u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
void hyperv_cleanup(void);
bool hv_query_ext_cap(u64 cap_query);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
#else /* CONFIG_HYPERV */
static inline bool hv_is_hyperv_initialized(void) { return false; }
static inline bool hv_is_hibernation_supported(void) { return false; }
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 74f5e92f91a0..b53cfc4163af 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1584,6 +1584,11 @@ struct hyperv_service_callback {
void (*callback)(void *context);
};

+struct hv_dma_range {
+ dma_addr_t dma;
+ u32 mapping_size;
+};
+
#define MAX_SRV_VER 0x7ffffff
extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, u8 *buf, u32 buflen,
const int *fw_version, int fw_vercnt,
--
2.25.1


2021-12-07 09:47:08

by Borislav Petkov

Subject: Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

On Tue, Dec 07, 2021 at 02:55:58AM -0500, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Hyper-V provides Isolation VM which has memory encrypt support. Add
> hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT
> attribute.

You need to refresh on how to write commit messages - never say what the
patch is doing - that's visible in the diff itself. Rather, you should
talk about *why* it is doing what it is doing.

> bool cc_platform_has(enum cc_attr attr)
> {
> + if (hv_is_isolation_supported())
> + return hyperv_cc_platform_has(attr);

Is there any reason for the hv_is_.. check to come before...

> +
> if (sme_me_mask)
> return amd_cc_platform_has(attr);

... the sme_me_mask check?

What's in sme_me_mask on hyperv?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-12-07 11:18:32

by Tianyu Lan

Subject: Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

Hi Borislav:
Thanks for your review.

On 12/7/2021 5:47 PM, Borislav Petkov wrote:
> On Tue, Dec 07, 2021 at 02:55:58AM -0500, Tianyu Lan wrote:
>> From: Tianyu Lan <[email protected]>
>>
>> Hyper-V provides Isolation VM which has memory encrypt support. Add
>> hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT
>> attribute.
>
> You need to refresh on how to write commit messages - never say what the
> patch is doing - that's visible in the diff itself. Rather, you should
> talk about *why* it is doing what it is doing.

Sure. Will update.

>
>> bool cc_platform_has(enum cc_attr attr)
>> {
>> + if (hv_is_isolation_supported())
>> + return hyperv_cc_platform_has(attr);
>
> Is there any reason for the hv_is_.. check to come before...
>

Do you mean to check Hyper-V before SEV? If yes, no special reason.


>> +
>> if (sme_me_mask)
>> return amd_cc_platform_has(attr);
>
> ... the sme_me_mask check?
>
> What's in sme_me_mask on hyperv?

sme_me_mask is unset in this case.


2021-12-08 14:52:35

by Tianyu Lan

Subject: [PATCH V6.1] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

From: Tianyu Lan <[email protected]>

Hyper-V provides Isolation VMs, which encrypt guest memory. In an
Isolation VM, the swiotlb bounce buffer size needs to be adjusted
according to the memory size in sev_setup_arch(). Return true for the
GUEST_MEM_ENCRYPT attribute check in an Isolation VM.
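
For context, a sketch of the swiotlb sizing logic in sev_setup_arch()
that this check feeds (an approximation of the upstream code of that
era, not part of this patch):

	void __init sev_setup_arch(void)
	{
		phys_addr_t total_mem = memblock_phys_mem_size();
		unsigned long size;

		if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
			return;

		/*
		 * Scale the swiotlb bounce buffer with guest memory:
		 * roughly 6% of RAM, clamped between the 64MB default
		 * and 1GB.
		 */
		size = total_mem * 6 / 100;
		size = clamp_val(size, IO_TLB_DEFAULT_SIZE, SZ_1G);
		swiotlb_adjust_size(size);
	}

With hyperv_cc_platform_has() returning true for
CC_ATTR_GUEST_MEM_ENCRYPT, an Isolation VM gets the same enlarged
bounce buffer as an SEV guest.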

Signed-off-by: Tianyu Lan <[email protected]>
---
Change since v6:
* Change the order in the cc_platform_has() and check sev first.

Change since v3:
* Change code style of checking GUEST_MEM attribute in the
hyperv_cc_platform_has().
---
arch/x86/kernel/cc_platform.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..6cb3a675e686 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
#include <linux/cc_platform.h>
#include <linux/mem_encrypt.h>

+#include <asm/mshyperv.h>
#include <asm/processor.h>

static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,12 +59,19 @@ static bool amd_cc_platform_has(enum cc_attr attr)
#endif
}

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+ return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}

bool cc_platform_has(enum cc_attr attr)
{
if (sme_me_mask)
return amd_cc_platform_has(attr);

+ if (hv_is_isolation_supported())
+ return hyperv_cc_platform_has(attr);
+
return false;
}
EXPORT_SYMBOL_GPL(cc_platform_has);
--
2.25.1


2021-12-08 15:12:46

by Tianyu Lan

Subject: Re: [PATCH V6.1] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

On 12/8/2021 10:52 PM, Tianyu Lan wrote:
> From: Tianyu Lan <[email protected]>
>
> Hyper-V provides Isolation VMs, which encrypt guest memory. In an
> Isolation VM, the swiotlb bounce buffer size needs to be adjusted
> according to the memory size in sev_setup_arch(). Return true for the
> GUEST_MEM_ENCRYPT attribute check in an Isolation VM.
>
> Signed-off-by: Tianyu Lan <[email protected]>

Hi Boris:
Could you check whether this version is ok with you?

Thanks.


2021-12-08 20:14:21

by Haiyang Zhang

Subject: RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver



> -----Original Message-----
> From: Tianyu Lan <[email protected]>
> Sent: Tuesday, December 7, 2021 2:56 AM
> Subject: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
>
> From: Tianyu Lan <[email protected]>
>
> [snip]
> @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
>
> trace_nvsp_send_pkt(ndev, out_channel, rpkt);
>
> + packet->dma_range = NULL;
> if (packet->page_buf_cnt) {
> if (packet->cp_partial)
> pb += packet->rmsg_pgcnt;
>
> + ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> + if (ret) {
> + ret = -EAGAIN;
> + goto exit;
> + }

Returning EAGAIN will let the upper network layer busy-retry, which
may make things worse. I suggest returning ENOSPC here, as is done
elsewhere in this function; that will just drop the packet and let the
network protocol/app layer decide how to recover.
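
A minimal sketch of the suggested change against the hunk quoted above:

	ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
	if (ret) {
		/* Drop the packet instead of asking the stack to busy-retry. */
		ret = -ENOSPC;
		goto exit;
	}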

Thanks,
- Haiyang

2021-12-09 08:00:38

by Long Li

Subject: RE: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

> Subject: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
>
> From: Tianyu Lan <[email protected]>
>
> [snip]
> @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host
> *host, struct scsi_cmnd *scmnd)
> payload->range.len = length;
> payload->range.offset = offset_in_hvpg;
>
> + sg_count = scsi_dma_map(scmnd);
> + if (sg_count < 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;

Hi Tianyu,

This patch (and this patch series) unconditionally adds code for dealing with DMA addresses for all VMs, including non-isolation VMs.

Does this add a performance penalty for VMs that don't require isolation?

Long



2021-12-09 08:08:45

by Tianyu Lan

Subject: Re: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver



On 12/9/2021 4:14 AM, Haiyang Zhang wrote:
>> From: Tianyu Lan <[email protected]>
>> Sent: Tuesday, December 7, 2021 2:56 AM
>> Subject: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
>>
>> From: Tianyu Lan <[email protected]>
>>
>> In an Isolation VM, all memory shared with the host needs to be
>> marked visible to the host via hvcall. vmbus_establish_gpadl() has
>> already done this for the netvsc rx/tx ring buffers. The page buffers
>> used by vmbus_sendpacket_pagebuffer() still need to be handled. Use
>> the DMA API to map/unmap this memory when sending/receiving packets;
>> the Hyper-V swiotlb bounce buffer DMA address will be returned. The
>> swiotlb bounce buffer has been marked visible to the host during boot.
>>
>> The rx/tx ring buffers are allocated via vzalloc() and need to be
>> mapped into the unencrypted address space (above vTOM) before being
>> shared with the host and accessed. Add hv_map/unmap_memory() to
>> map/unmap the rx/tx ring buffers.
>>
>> Signed-off-by: Tianyu Lan <[email protected]>
>> ---
>> Change since v3:
>> * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
>> with vmalloc_to_pfn() in the hv_map_memory()
>>
>> Change since v2:
>> * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
>> ---
>> arch/x86/hyperv/ivm.c | 28 ++++++
>> drivers/hv/hv_common.c | 11 +++
>> drivers/net/hyperv/hyperv_net.h | 5 ++
>> drivers/net/hyperv/netvsc.c | 136 +++++++++++++++++++++++++++++-
>> drivers/net/hyperv/netvsc_drv.c | 1 +
>> drivers/net/hyperv/rndis_filter.c | 2 +
>> include/asm-generic/mshyperv.h | 2 +
>> include/linux/hyperv.h | 5 ++
>> 8 files changed, 187 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
>> index 69c7a57f3307..2b994117581e 100644
>> --- a/arch/x86/hyperv/ivm.c
>> +++ b/arch/x86/hyperv/ivm.c
>> @@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount,
>> bool visibl
>> kfree(pfn_array);
>> return ret;
>> }
>> +
>> +/*
>> + * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
>> + */
>> +void *hv_map_memory(void *addr, unsigned long size)
>> +{
>> + unsigned long *pfns = kcalloc(size / PAGE_SIZE,
>> + sizeof(unsigned long), GFP_KERNEL);
>> + void *vaddr;
>> + int i;
>> +
>> + if (!pfns)
>> + return NULL;
>> +
>> + for (i = 0; i < size / PAGE_SIZE; i++)
>> + pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
>> + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
>> +
>> + vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
>> + kfree(pfns);
>> +
>> + return vaddr;
>> +}
>> +
>> +void hv_unmap_memory(void *addr)
>> +{
>> + vunmap(addr);
>> +}
>> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> index 7be173a99f27..3c5cb1f70319 100644
>> --- a/drivers/hv/hv_common.c
>> +++ b/drivers/hv/hv_common.c
>> @@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output,
>> u32 input_s
>> return HV_STATUS_INVALID_PARAMETER;
>> }
>> EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
>> +
>> +void __weak *hv_map_memory(void *addr, unsigned long size)
>> +{
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(hv_map_memory);
>> +
>> +void __weak hv_unmap_memory(void *addr)
>> +{
>> +}
>> +EXPORT_SYMBOL_GPL(hv_unmap_memory);
>> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
>> index 315278a7cf88..cf69da0e296c 100644
>> --- a/drivers/net/hyperv/hyperv_net.h
>> +++ b/drivers/net/hyperv/hyperv_net.h
>> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>> u32 total_bytes;
>> u32 send_buf_index;
>> u32 total_data_buflen;
>> + struct hv_dma_range *dma_range;
>> };
>>
>> #define NETVSC_HASH_KEYLEN 40
>> @@ -1074,6 +1075,7 @@ struct netvsc_device {
>>
>> /* Receive buffer allocated by us but manages by NetVSP */
>> void *recv_buf;
>> + void *recv_original_buf;
>> u32 recv_buf_size; /* allocated bytes */
>> struct vmbus_gpadl recv_buf_gpadl_handle;
>> u32 recv_section_cnt;
>> @@ -1082,6 +1084,7 @@ struct netvsc_device {
>>
>> /* Send buffer allocated by us */
>> void *send_buf;
>> + void *send_original_buf;
>> u32 send_buf_size;
>> struct vmbus_gpadl send_buf_gpadl_handle;
>> u32 send_section_cnt;
>> @@ -1731,4 +1734,6 @@ struct rndis_message {
>> #define RETRY_US_HI 10000
>> #define RETRY_MAX 2000 /* >10 sec */
>>
>> +void netvsc_dma_unmap(struct hv_device *hv_dev,
>> + struct hv_netvsc_packet *packet);
>> #endif /* _HYPERV_NET_H */
>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>> index 396bc1c204e6..b7ade735a806 100644
>> --- a/drivers/net/hyperv/netvsc.c
>> +++ b/drivers/net/hyperv/netvsc.c
>> @@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
>> int i;
>>
>> kfree(nvdev->extension);
>> - vfree(nvdev->recv_buf);
>> - vfree(nvdev->send_buf);
>> +
>> + if (nvdev->recv_original_buf) {
>> + hv_unmap_memory(nvdev->recv_buf);
>> + vfree(nvdev->recv_original_buf);
>> + } else {
>> + vfree(nvdev->recv_buf);
>> + }
>> +
>> + if (nvdev->send_original_buf) {
>> + hv_unmap_memory(nvdev->send_buf);
>> + vfree(nvdev->send_original_buf);
>> + } else {
>> + vfree(nvdev->send_buf);
>> + }
>> +
>> kfree(nvdev->send_section_map);
>>
>> for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
>> @@ -338,6 +351,7 @@ static int netvsc_init_buf(struct hv_device *device,
>> unsigned int buf_size;
>> size_t map_words;
>> int i, ret = 0;
>> + void *vaddr;
>>
>> /* Get receive buffer area. */
>> buf_size = device_info->recv_sections * device_info->recv_section_size;
>> @@ -373,6 +387,17 @@ static int netvsc_init_buf(struct hv_device *device,
>> goto cleanup;
>> }
>>
>> + if (hv_isolation_type_snp()) {
>> + vaddr = hv_map_memory(net_device->recv_buf, buf_size);
>> + if (!vaddr) {
>> + ret = -ENOMEM;
>> + goto cleanup;
>> + }
>> +
>> + net_device->recv_original_buf = net_device->recv_buf;
>> + net_device->recv_buf = vaddr;
>> + }
>> +
>> /* Notify the NetVsp of the gpadl handle */
>> init_packet = &net_device->channel_init_pkt;
>> memset(init_packet, 0, sizeof(struct nvsp_message));
>> @@ -476,6 +501,17 @@ static int netvsc_init_buf(struct hv_device *device,
>> goto cleanup;
>> }
>>
>> + if (hv_isolation_type_snp()) {
>> + vaddr = hv_map_memory(net_device->send_buf, buf_size);
>> + if (!vaddr) {
>> + ret = -ENOMEM;
>> + goto cleanup;
>> + }
>> +
>> + net_device->send_original_buf = net_device->send_buf;
>> + net_device->send_buf = vaddr;
>> + }
>> +
>> /* Notify the NetVsp of the gpadl handle */
>> init_packet = &net_device->channel_init_pkt;
>> memset(init_packet, 0, sizeof(struct nvsp_message));
>> @@ -766,7 +802,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>>
>> /* Notify the layer above us */
>> if (likely(skb)) {
>> - const struct hv_netvsc_packet *packet
>> + struct hv_netvsc_packet *packet
>> = (struct hv_netvsc_packet *)skb->cb;
>> u32 send_index = packet->send_buf_index;
>> struct netvsc_stats *tx_stats;
>> @@ -782,6 +818,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>> tx_stats->bytes += packet->total_bytes;
>> u64_stats_update_end(&tx_stats->syncp);
>>
>> + netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>> napi_consume_skb(skb, budget);
>> }
>>
>> @@ -946,6 +983,88 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
>> memset(dest, 0, padding);
>> }
>>
>> +void netvsc_dma_unmap(struct hv_device *hv_dev,
>> + struct hv_netvsc_packet *packet)
>> +{
>> + u32 page_count = packet->cp_partial ?
>> + packet->page_buf_cnt - packet->rmsg_pgcnt :
>> + packet->page_buf_cnt;
>> + int i;
>> +
>> + if (!hv_is_isolation_supported())
>> + return;
>> +
>> + if (!packet->dma_range)
>> + return;
>> +
>> + for (i = 0; i < page_count; i++)
>> + dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
>> + packet->dma_range[i].mapping_size,
>> + DMA_TO_DEVICE);
>> +
>> + kfree(packet->dma_range);
>> +}
>> +
>> +/* netvsc_dma_map - Map swiotlb bounce buffer with data page of
>> + * packet sent by vmbus_sendpacket_pagebuffer() in the Isolation
>> + * VM.
>> + *
>> + * In isolation VM, netvsc send buffer has been marked visible to
>> + * host and so the data copied to send buffer doesn't need to use
>> + * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
>> + * may not be copied to send buffer and so these pages need to be
>> + * mapped with swiotlb bounce buffer. netvsc_dma_map() is to do
>> + * that. The pfns in the struct hv_page_buffer need to be converted
>> + * to bounce buffer's pfn. The loop here is necessary because the
>> + * entries in the page buffer array are not necessarily full
>> + * pages of data. Each entry in the array has a separate offset and
>> + * len that may be non-zero, even for entries in the middle of the
>> + * array. And the entries are not physically contiguous. So each
>> + * entry must be individually mapped rather than as a contiguous unit.
>> + * So dma_map_sg() is not used here.
>> + */
>> +int netvsc_dma_map(struct hv_device *hv_dev,
>> + struct hv_netvsc_packet *packet,
>> + struct hv_page_buffer *pb)
>> +{
>> + u32 page_count = packet->cp_partial ?
>> + packet->page_buf_cnt - packet->rmsg_pgcnt :
>> + packet->page_buf_cnt;
>> + dma_addr_t dma;
>> + int i;
>> +
>> + if (!hv_is_isolation_supported())
>> + return 0;
>> +
>> + packet->dma_range = kcalloc(page_count,
>> + sizeof(*packet->dma_range),
>> + GFP_KERNEL);
>> + if (!packet->dma_range)
>> + return -ENOMEM;
>> +
>> + for (i = 0; i < page_count; i++) {
>> + char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
>> + + pb[i].offset);
>> + u32 len = pb[i].len;
>> +
>> + dma = dma_map_single(&hv_dev->device, src, len,
>> + DMA_TO_DEVICE);
>> + if (dma_mapping_error(&hv_dev->device, dma)) {
>> + kfree(packet->dma_range);
>> + return -ENOMEM;
>> + }
>> +
>> + /* pb[].offset and pb[].len are not changed during dma mapping
>> + * and so are not reassigned.
>> + */
>> + packet->dma_range[i].dma = dma;
>> + packet->dma_range[i].mapping_size = len;
>> + pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static inline int netvsc_send_pkt(
>> struct hv_device *device,
>> struct hv_netvsc_packet *packet,
>> @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
>>
>> trace_nvsp_send_pkt(ndev, out_channel, rpkt);
>>
>> + packet->dma_range = NULL;
>> if (packet->page_buf_cnt) {
>> if (packet->cp_partial)
>> pb += packet->rmsg_pgcnt;
>>
>> + ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
>> + if (ret) {
>> + ret = -EAGAIN;
>> + goto exit;
>> + }
>
> Returning EAGAIN will let the upper network layer busy retry,
> which may make things worse.
> I suggest returning ENOSPC here, like another place in this
> function, which will just drop the packet and let the network
> protocol/app layer decide how to recover.
>

Yes, agree. Will update in the next version. Thanks.
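
That is, roughly (a sketch of the suggested change; the exact code
will be in the next version):

	ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
	if (ret) {
		/* Drop the packet and let the protocol/app layer
		 * decide how to recover, instead of busy retrying.
		 */
		ret = -ENOSPC;
		goto exit;
	}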


2021-12-09 11:17:20

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver



On 12/9/2021 4:00 PM, Long Li wrote:
>> @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host
>> *host, struct scsi_cmnd *scmnd)
>> payload->range.len = length;
>> payload->range.offset = offset_in_hvpg;
>>
>> + sg_count = scsi_dma_map(scmnd);
>> + if (sg_count < 0)
>> + return SCSI_MLQUEUE_DEVICE_BUSY;
> Hi Tianyu,
>
> This patch (and this patch series) unconditionally adds code for dealing with DMA addresses for all VMs, including non-isolation VMs.
>
> Does this add performance penalty for VMs that don't require isolation?
>

Hi Long:
scsi_dma_map() in a traditional VM just stores the physical address
(which includes sg->offset) in sg->dma_address, with no data copy,
because the swiotlb bounce buffer code doesn't run. The data copy
only takes place in an Isolation VM, where swiotlb_force is set. So
there is no additional overhead in a traditional VM.
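
To illustrate, with swiotlb_force unset the mapping step reduces to
roughly the following (a simplified sketch of the dma-direct behavior,
not the exact kernel code):

	/* Sketch: non-isolated VM, no bounce buffering. */
	static dma_addr_t sketch_direct_map(struct scatterlist *sg)
	{
		/*
		 * sg_phys() already includes sg->offset, so the DMA
		 * address is just the physical address -- no copy.
		 */
		return (dma_addr_t)sg_phys(sg);
	}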

Thanks.

2021-12-09 19:54:35

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

From: Haiyang Zhang <[email protected]> Sent: Wednesday, December 8, 2021 12:14 PM
> > From: Tianyu Lan <[email protected]>
> > Sent: Tuesday, December 7, 2021 2:56 AM

[snip]

> > static inline int netvsc_send_pkt(
> > struct hv_device *device,
> > struct hv_netvsc_packet *packet,
> > @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
> >
> > trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> >
> > + packet->dma_range = NULL;
> > if (packet->page_buf_cnt) {
> > if (packet->cp_partial)
> > pb += packet->rmsg_pgcnt;
> >
> > + ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> > + if (ret) {
> > + ret = -EAGAIN;
> > + goto exit;
> > + }
>
> Returning EAGAIN will let the upper network layer busy retry,
> which may make things worse.
> I suggest returning ENOSPC here, like another place in this
> function, which will just drop the packet and let the network
> protocol/app layer decide how to recover.
>
> Thanks,
> - Haiyang

I made the original suggestion to return -EAGAIN here. A
DMA mapping failure should occur only if swiotlb bounce
buffer space is unavailable, which is a transient condition.
The existing code already stops the queue and returns
-EAGAIN when the ring buffer is full, which is also a transient
condition. My sense is that the two conditions should be
handled the same way. Or is there a reason why a ring
buffer full condition should stop the queue and retry, while
a mapping failure should drop the packet?
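
For reference, the ring-full handling mentioned above looks roughly
like this in netvsc_send_pkt() (a paraphrased sketch, not an exact
quote of the driver):

	} else if (ret == -EAGAIN) {
		/* Transient condition: stop the queue and let the
		 * stack retry once ring space frees up.
		 */
		netif_tx_stop_queue(txq);
		ndev_ctx->eth_stats.stop_queue++;
	}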

Michael

2021-12-09 20:25:28

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

From: Tianyu Lan <[email protected]> Sent: Monday, December 6, 2021 11:56 PM
>
> Hyper-V Isolation VMs require bounce buffer support to copy
> data from/to encrypted memory, so enable swiotlb force mode
> to use the swiotlb bounce buffer for DMA transactions.
>
> In an Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via the extra address space above shared_gpa_boundary
> (e.g. the 39-bit address line) reported by the Hyper-V CPUID
> ISOLATION_CONFIG leaf. The accessed physical address is the
> original physical address plus shared_gpa_boundary. The
> shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
> top of memory (vTOM). Memory addresses below vTOM are
> automatically treated as private, while memory above vTOM is
> treated as shared.
>
> The swiotlb bounce buffer code calls set_memory_decrypted()
> to mark the bounce buffer visible to the host and maps it in
> the extra address space via memremap(). Populate the
> shared_gpa_boundary (vTOM) via the swiotlb_unencrypted_base
> variable.
>
> The map function memremap() can't be used early in boot
> (e.g. in ms_hyperv_init_platform()), so call
> swiotlb_update_mem_attributes() in hyperv_init().
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since v4:
> * Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
> and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
> ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
> in the hyperv_init().
>
> Change since v3:
> * Add comment in pci-swiotlb-xen.c to explain why add
> dependency between hyperv_swiotlb_detect() and pci_
> xen_swiotlb_detect().
> * Return directly when fails to allocate Hyper-V swiotlb
> buffer in the hyperv_iommu_swiotlb_init().
> ---
> arch/x86/hyperv/hv_init.c | 10 ++++++++++
> arch/x86/kernel/cpu/mshyperv.c | 11 ++++++++++-
> include/linux/hyperv.h | 8 ++++++++
> 3 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 24f4a06ac46a..9e18a280f89d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -28,6 +28,7 @@
> #include <linux/syscore_ops.h>
> #include <clocksource/hyperv_timer.h>
> #include <linux/highmem.h>
> +#include <linux/swiotlb.h>
>
> int hyperv_init_cpuhp;
> u64 hv_current_partition_id = ~0ull;
> @@ -502,6 +503,15 @@ void __init hyperv_init(void)
>
> /* Query the VMs extended capability once, so that it can be cached. */
> hv_query_ext_cap(0);
> +
> + /*
> + * Swiotlb bounce buffer needs to be mapped in extra address
> + * space. The map function doesn't work early in boot, so
> + * call swiotlb_update_mem_attributes() here.
> + */
> + if (hv_is_isolation_supported())
> + swiotlb_update_mem_attributes();
> +
> return;
>
> clean_guest_os_id:
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 4794b716ec79..baf3a0873552 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -18,6 +18,7 @@
> #include <linux/kexec.h>
> #include <linux/i8253.h>
> #include <linux/random.h>
> +#include <linux/swiotlb.h>
> #include <asm/processor.h>
> #include <asm/hypervisor.h>
> #include <asm/hyperv-tlfs.h>
> @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
> pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>
> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> static_branch_enable(&isolation_type_snp);
> + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
> + }
> +
> + /*
> + * Enable swiotlb force mode in Isolation VM to
> + * use swiotlb bounce buffer for dma transaction.
> + */
> + swiotlb_force = SWIOTLB_FORCE;

I'm good with this approach that directly updates the swiotlb settings here
rather than in IOMMU initialization code. It's a lot more straightforward.

However, there's an issue if building for X86_32 without PAE, in that the
swiotlb module may not be built, resulting in compile and link errors. The
swiotlb.h file needs to be updated to provide a stub function for
swiotlb_update_mem_attributes(). swiotlb_unencrypted_base probably
needs wrapper functions to get/set it, which can be stubs when
CONFIG_SWIOTLB is not set. swiotlb_force is a bit of a mess in that it already
has a stub definition that assumes it will only be read, and not set. A bit of
thinking will be needed to sort that out.
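
Something along these lines might work for the header (a rough,
untested sketch; swiotlb_set_unencrypted_base() is a made-up name
for illustration):

	/* include/linux/swiotlb.h */
	#ifdef CONFIG_SWIOTLB
	extern phys_addr_t swiotlb_unencrypted_base;

	static inline void swiotlb_set_unencrypted_base(phys_addr_t base)
	{
		swiotlb_unencrypted_base = base;
	}
	#else
	static inline void swiotlb_set_unencrypted_base(phys_addr_t base) { }
	static inline void swiotlb_update_mem_attributes(void) { }
	#endif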

> }
>
> if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index b823311eac79..1f037e114dc8 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
> int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
> void (*block_invalidate)(void *context,
> u64 block_mask));
> +#if IS_ENABLED(CONFIG_HYPERV)
> +int __init hyperv_swiotlb_detect(void);
> +#else
> +static inline int __init hyperv_swiotlb_detect(void)
> +{
> + return 0;
> +}
> +#endif

I don't think hyperv_swiotlb_detect() is used any longer, so this change
should be dropped.

>
> struct hyperv_pci_block_ops {
> int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> --
> 2.25.1


2021-12-09 20:38:28

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

From: Tianyu Lan <[email protected]> Sent: Monday, December 6, 2021 11:56 PM
>
> Hyper-V provides Isolation VMs which have memory encryption support.
> Add hyperv_cc_platform_has() and return true for checks of the
> GUEST_MEM_ENCRYPT attribute.
>
> Signed-off-by: Tianyu Lan <[email protected]>
> ---
> Change since v3:
> * Change code style of checking GUEST_MEM attribute in the
> hyperv_cc_platform_has().
> ---
> arch/x86/kernel/cc_platform.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> index 03bb2f343ddb..47db88c275d5 100644
> --- a/arch/x86/kernel/cc_platform.c
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -11,6 +11,7 @@
> #include <linux/cc_platform.h>
> #include <linux/mem_encrypt.h>
>
> +#include <asm/mshyperv.h>
> #include <asm/processor.h>
>
> static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
> @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
> #endif
> }
>
> +static bool hyperv_cc_platform_has(enum cc_attr attr)
> +{
> + return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
> +}
>
> bool cc_platform_has(enum cc_attr attr)
> {
> + if (hv_is_isolation_supported())
> + return hyperv_cc_platform_has(attr);
> +
> if (sme_me_mask)
> return amd_cc_platform_has(attr);
>

Throughout Linux kernel code, there are about 20 calls to cc_platform_has()
with CC_ATTR_GUEST_MEM_ENCRYPT as the argument. The original code
(from v1 of this patch set) only dealt with the call in sev_setup_arch(). But
with this patch, all the other calls that previously returned "false" will now
return "true" in a Hyper-V Isolated VM. I didn't try to analyze all these other
calls, so I think there's an open question about whether this is the behavior
we want.

Michael

2021-12-09 20:40:15

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver



> -----Original Message-----
> From: Michael Kelley (LINUX) <[email protected]>
> Sent: Thursday, December 9, 2021 2:54 PM
> To: Haiyang Zhang <[email protected]>; Tianyu Lan <[email protected]>; KY
> Srinivasan <[email protected]>; Stephen Hemminger <[email protected]>;
> [email protected]; Dexuan Cui <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected]; Tianyu
> Lan <[email protected]>; [email protected]
> Cc: [email protected]; [email protected]; linux-
> [email protected]; [email protected]; [email protected];
> [email protected]; vkuznets <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
>
> From: Haiyang Zhang <[email protected]> Sent: Wednesday, December 8, 2021 12:14 PM
> > > From: Tianyu Lan <[email protected]>
> > > Sent: Tuesday, December 7, 2021 2:56 AM
>
> [snip]
>
> > > static inline int netvsc_send_pkt(
> > > struct hv_device *device,
> > > struct hv_netvsc_packet *packet,
> > > @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
> > >
> > > trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> > >
> > > + packet->dma_range = NULL;
> > > if (packet->page_buf_cnt) {
> > > if (packet->cp_partial)
> > > pb += packet->rmsg_pgcnt;
> > >
> > > + ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> > > + if (ret) {
> > > + ret = -EAGAIN;
> > > + goto exit;
> > > + }
> >
> > Returning EAGAIN will let the upper network layer busy retry,
> > which may make things worse.
> > I suggest returning ENOSPC here, like another place in this
> > function, which will just drop the packet and let the network
> > protocol/app layer decide how to recover.
> >
> > Thanks,
> > - Haiyang
>
> I made the original suggestion to return -EAGAIN here. A
> DMA mapping failure should occur only if swiotlb bounce
> buffer space is unavailable, which is a transient condition.
> The existing code already stops the queue and returns
> -EAGAIN when the ring buffer is full, which is also a transient
> condition. My sense is that the two conditions should be
> handled the same way. Or is there a reason why a ring
> buffer full condition should stop the queue and retry, while
> a mapping failure should drop the packet?

netvsc_dma_map() can fail in these two places. The dma_map_single()
is doing the mapping with the swiotlb bounce buffer, correct? And it
will become successful after the previous packets are unmapped?

+ packet->dma_range = kcalloc(page_count,
+ sizeof(*packet->dma_range),
+ GFP_KERNEL);

+ dma = dma_map_single(&hv_dev->device, src, len,
+ DMA_TO_DEVICE);

I recall your previous suggestion now, and agree with you that we
can treat it the same way (return -EAGAIN) in this case. The
existing code will stop the queue temporarily.

Thanks,
- Haiyang

2021-12-10 11:26:56

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

On 12/10/2021 4:38 AM, Michael Kelley (LINUX) wrote:
> From: Tianyu Lan <[email protected]> Sent: Monday, December 6, 2021 11:56 PM
>>
>> Hyper-V provides Isolation VMs which have memory encryption support.
>> Add hyperv_cc_platform_has() and return true for checks of the
>> GUEST_MEM_ENCRYPT attribute.
>>
>> Signed-off-by: Tianyu Lan <[email protected]>
>> ---
>> Change since v3:
>> * Change code style of checking GUEST_MEM attribute in the
>> hyperv_cc_platform_has().
>> ---
>> arch/x86/kernel/cc_platform.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
>> index 03bb2f343ddb..47db88c275d5 100644
>> --- a/arch/x86/kernel/cc_platform.c
>> +++ b/arch/x86/kernel/cc_platform.c
>> @@ -11,6 +11,7 @@
>> #include <linux/cc_platform.h>
>> #include <linux/mem_encrypt.h>
>>
>> +#include <asm/mshyperv.h>
>> #include <asm/processor.h>
>>
>> static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
>> @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
>> #endif
>> }
>>
>> +static bool hyperv_cc_platform_has(enum cc_attr attr)
>> +{
>> + return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
>> +}
>>
>> bool cc_platform_has(enum cc_attr attr)
>> {
>> + if (hv_is_isolation_supported())
>> + return hyperv_cc_platform_has(attr);
>> +
>> if (sme_me_mask)
>> return amd_cc_platform_has(attr);
>>
>
> Throughout Linux kernel code, there are about 20 calls to cc_platform_has()
> with CC_ATTR_GUEST_MEM_ENCRYPT as the argument. The original code
> (from v1 of this patch set) only dealt with the call in sev_setup_arch(). But
> with this patch, all the other calls that previously returned "false" will now
> return "true" in a Hyper-V Isolated VM. I didn't try to analyze all these other
> calls, so I think there's an open question about whether this is the behavior
> we want.
>

CC_ATTR_GUEST_MEM_ENCRYPT is for SEV support so far, and Hyper-V
Isolation VMs are based on SEV or software memory encryption, so most
checks can be reused. The difference is that SEV code uses the encrypt
bit in the page table to encrypt and decrypt memory, while Hyper-V
uses vTOM. But the SEV memory encryption mask "sme_me_mask" is unset
in a Hyper-V Isolation VM, which claims SEV and SME are unsupported,
so the remaining checks of the memory encryption bit are still safe.
So reuse CC_ATTR_GUEST_MEM_ENCRYPT for Hyper-V.
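
Concretely, in a Hyper-V Isolation VM the resulting behavior is
(illustrative sketch):

	/* sme_me_mask == 0, hv_is_isolation_supported() == true */
	cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);	/* -> true  */
	cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT);	/* -> false */
	cc_platform_has(CC_ATTR_MEM_ENCRYPT);		/* -> false */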



2021-12-10 13:25:55

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

On 12/10/2021 4:09 AM, Michael Kelley (LINUX) wrote:
>> @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
>> pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>> ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>>
>> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>> + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>> static_branch_enable(&isolation_type_snp);
>> + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
>> + }
>> +
>> + /*
>> + * Enable swiotlb force mode in Isolation VM to
>> + * use swiotlb bounce buffer for dma transaction.
>> + */
>> + swiotlb_force = SWIOTLB_FORCE;
> I'm good with this approach that directly updates the swiotlb settings
> here rather than in IOMMU initialization code. It's a lot more
> straightforward.
>
> However, there's an issue if building for X86_32 without PAE, in that the
> swiotlb module may not be built, resulting in compile and link errors. The
> swiotlb.h file needs to be updated to provide a stub function for
> swiotlb_update_mem_attributes(). swiotlb_unencrypted_base probably
> needs wrapper functions to get/set it, which can be stubs when
> CONFIG_SWIOTLB is not set. swiotlb_force is a bit of a mess in that it already
> has a stub definition that assumes it will only be read, and not set. A bit of
> thinking will be needed to sort that out.

Is it OK to fix the issue by selecting swiotlb when CONFIG_HYPERV
is set?

>
>> }
>>
>> if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index b823311eac79..1f037e114dc8 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
>> int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
>> void (*block_invalidate)(void *context,
>> u64 block_mask));
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> +int __init hyperv_swiotlb_detect(void);
>> +#else
>> +static inline int __init hyperv_swiotlb_detect(void)
>> +{
>> + return 0;
>> +}
>> +#endif
> I don't think hyperv_swiotlb_detect() is used any longer, so this change
> should be dropped.
Yes, will update.

2021-12-10 14:01:32

by Tianyu Lan

[permalink] [raw]
Subject: Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM



On 12/10/2021 9:25 PM, Tianyu Lan wrote:
>>> @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
>>>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>>>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>>>
>>> -	if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>>> +	if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>>>  		static_branch_enable(&isolation_type_snp);
>>> +		swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Enable swiotlb force mode in Isolation VM to
>>> +	 * use swiotlb bounce buffer for dma transaction.
>>> +	 */
>>> +	swiotlb_force = SWIOTLB_FORCE;
>> I'm good with this approach that directly updates the swiotlb
>> settings here rather than in IOMMU initialization code.  It's a lot
>> more straightforward.
>>
>> However, there's an issue if building for X86_32 without PAE, in
>> that the swiotlb module may not be built, resulting in compile and
>> link errors.  The swiotlb.h file needs to be updated to provide a
>> stub function for swiotlb_update_mem_attributes().
>> swiotlb_unencrypted_base probably needs wrapper functions to get/set
>> it, which can be stubs when CONFIG_SWIOTLB is not set.  swiotlb_force
>> is a bit of a mess in that it already has a stub definition that
>> assumes it will only be read, and not set.  A bit of thinking will
>> be needed to sort that out.
>
> It's ok to fix the issue via selecting swiotlb when CONFIG_HYPERV is
> set?
>
Sorry, ignore the previous statement. This code doesn't depend on
CONFIG_HYPERV.

How about putting this code under #ifdef CONFIG_X86_64 or CONFIG_SWIOTLB?
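
For example (a sketch of the idea, untested):

	#ifdef CONFIG_SWIOTLB
	if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
		swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;

	/*
	 * Enable swiotlb force mode in Isolation VM to
	 * use swiotlb bounce buffer for dma transaction.
	 */
	swiotlb_force = SWIOTLB_FORCE;
	#endif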