2020-04-28 11:41:34

by Srivatsa Vaddagiri

Subject: [PATCH 0/5] virtio on Type-1 hypervisor

We ran into several problems in using virtio for IO paravirtualization on a
Type-1 hypervisor with these characteristics:

* By default, all of a guest's memory is private to it (no other guest can
access its memory).

* One of the VMs is designated as primary and has access to most IO devices.
This is similar to the dom0 VM in Xen. All other VMs are considered secondary
VMs.

* virtio backend drivers for all secondary VMs need to be hosted in the primary
VM.

* Since a secondary VM's memory is not accessible to the primary VM, the
hypervisor provisions an additional region of memory that is shared between the
primary and secondary VMs, so that the virtio backend drivers can work. This
shared memory can be used, for example, to host virtio-ring structures and to
bounce IO buffers of a secondary VM.

* The hypervisor provides message-queue and doorbell interfaces for inter-VM
communication. The message-queue API (send/recv) allows one VM to send short
messages to another VM. The doorbell interface allows a VM to inject an
interrupt into another VM.

* There is no support for an MMIO transport, i.e. the hypervisor does not
support trapping MMIO config-space accesses by the front-end driver and having
them handled in backend drivers.

A few problem statements arise out of this:

1) How can we make use of the shared memory region effectively to make virtio
work in this scenario?

This patch series proposes a virtio bounce driver that recognizes a memory
region shared between VMs and uses swiotlb driver interfaces to bounce IO
buffers between private and shared space. Some changes are proposed to the
swiotlb driver so that its functions can be reused for the shared memory pool:
the swiotlb driver can now recognize more than one pool of memory and extends
its operations (allocate/free/bounce memory chunks) to each pool.
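
For illustration, here is a minimal sketch (not part of the series itself) of
what the bounce path looks like once such a pool exists; 'shared_pool' is an
assumed handle for the hypervisor-provided shared region, registered via the
interface added in patch 4:

	/*
	 * Sketch only: bounce a guest-private buffer into the shared region
	 * so that the backend VM can read it. 'shared_pool' is an assumed
	 * handle returned by swiotlb_register_pool() (patch 4).
	 */
	static dma_addr_t example_bounce_out(struct swiotlb_pool *shared_pool,
					     struct device *dev, void *buf,
					     size_t len)
	{
		/* Copies 'buf' into the pool and returns the bounce address */
		return _swiotlb_map(shared_pool, dev, virt_to_phys(buf), len,
				    DMA_TO_DEVICE, 0);
	}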

2) What transport mechanism works best in this case?

I realize that the ivshmem2-virtio proposal has discussed the possibility of
using shared memory + doorbell as a virtio transport option. We can consider
using that as a transport if it is acceptable upstream. Another option we
considered was to modify virtio_mmio.c itself to use the message-queue
send/recv hypercall interface (in place of readl/writel). That could be
abstracted via an 'mmio_ops' structure providing a suitable implementation of
readl/writel. A further option, suggested by Christopher Dall, is to unmap the
config space from the kernel address space and, as part of the fault handler,
use hypervisor-specific APIs to perform the config-space IO.
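
To make the virtio_mmio.c option a bit more concrete, a rough sketch of what
such an 'mmio_ops' indirection could look like follows; nothing like this
exists in the current series, and the names and signatures are purely
illustrative:

	/*
	 * Illustrative only: an indirection that virtio_mmio.c could use in
	 * place of direct readl()/writel(), letting a hypervisor-specific
	 * backend implement config-space access over message-queue send/recv
	 * hypercalls.
	 */
	struct virtio_mmio_ops {
		u32  (*read)(struct virtio_mmio_device *vm_dev,
			     unsigned long offset);
		void (*write)(struct virtio_mmio_device *vm_dev,
			      unsigned long offset, u32 val);
	};

	/* Default implementation preserving today's behaviour */
	static u32 vm_mmio_read(struct virtio_mmio_device *vm_dev,
				unsigned long offset)
	{
		return readl(vm_dev->base + offset);
	}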

3) Which virtio backend drivers to leverage?

We realized there are multiple implementations of virtio backend drivers,
bundled as part of separate projects (QEMU, lkvm, etc). We think it would be
nice to have some consolidation in that regard, so that we can make use of
backend drivers that are not tightly coupled with a VMM. In our case, we need
to be able to run virtio backend drivers as standalone programs (not coupled
with any VMM).


Srivatsa Vaddagiri (5):
swiotlb: Introduce concept of swiotlb_pool
swiotlb: Allow for non-linear mapping between paddr and vaddr
swiotlb: Add alloc and free APIs
swiotlb: Add API to register new pool
virtio: Add bounce DMA ops

drivers/virtio/Makefile | 2 +-
drivers/virtio/virtio.c | 2 +
drivers/virtio/virtio_bounce.c | 150 +++++++++++
drivers/xen/swiotlb-xen.c | 4 +-
include/linux/swiotlb.h | 157 +++++++++++-
include/linux/virtio.h | 4 +
kernel/dma/swiotlb.c | 556 ++++++++++++++++++++++++-----------------
7 files changed, 638 insertions(+), 237 deletions(-)
create mode 100644 drivers/virtio/virtio_bounce.c

--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


2020-04-28 11:41:45

by Srivatsa Vaddagiri

Subject: [PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool

Currently the swiotlb driver manages a single global pool of memory
which acts as bounce buffers for memory that is not accessible to
some devices. The core functions provided by this driver to
allocate/free/bounce memory chunks become more useful if the driver
can manage more than one pool. An immediate application of such an
extension to the swiotlb driver is to bounce virtio buffers between
the private and shared space of a VM.

This patch introduces the concept of a swiotlb memory pool and
reorganizes the current driver to work with a default global pool.
There is no functional change introduced by this patch.
Subsequent patches allow the swiotlb driver to work with more
than one pool of memory.
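
As a quick illustration of the intent (not part of the patch), existing call
sites are unchanged and continue to operate on the global pool, which becomes
shorthand for the pool-qualified form:

	/* Sketch: the legacy API now simply forwards to the default pool. */
	static void example_sync(struct device *hwdev, phys_addr_t tlb_addr,
				 size_t size)
	{
		/* Existing callers still compile and behave as before ... */
		swiotlb_tbl_sync_single(hwdev, tlb_addr, size, DMA_TO_DEVICE,
					SYNC_FOR_DEVICE);

		/* ... and are equivalent to the new pool-qualified call. */
		_swiotlb_tbl_sync_single(&default_swiotlb_pool, hwdev,
					 tlb_addr, size, DMA_TO_DEVICE,
					 SYNC_FOR_DEVICE);
	}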

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---
drivers/xen/swiotlb-xen.c | 4 +-
include/linux/swiotlb.h | 129 ++++++++++++++++-
kernel/dma/swiotlb.c | 359 +++++++++++++++++++++++-----------------------
3 files changed, 307 insertions(+), 185 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b6d2776..c2dc9c8 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -190,8 +190,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
/*
* IO TLB memory already allocated. Just use it.
*/
- if (io_tlb_start != 0) {
- xen_io_tlb_start = phys_to_virt(io_tlb_start);
+ if (swiotlb_start()) {
+ xen_io_tlb_start = phys_to_virt(swiotlb_start());
goto end;
}

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 046bb94..8c7843f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -44,7 +44,59 @@ enum dma_sync_target {
SYNC_FOR_DEVICE = 1,
};

-extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+#define MAX_POOL_NAME_SIZE 16
+
+struct swiotlb_pool {
+ char name[MAX_POOL_NAME_SIZE];
+ bool no_iotlb_memory;
+ int late_alloc;
+
+ spinlock_t io_tlb_lock;
+
+ /*
+ * Used to do a quick range check in swiotlb_tbl_unmap_single and
+ * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated
+ * by this API.
+ */
+
+ phys_addr_t io_tlb_start, io_tlb_end;
+
+ /*
+ * The number of IO TLB blocks (in groups of 64) between io_tlb_start
+ * and io_tlb_end. This is command line adjustable via
+ * setup_io_tlb_npages.
+ */
+ unsigned long io_tlb_nslabs;
+
+ /*
+ * The number of used IO TLB block
+ */
+ unsigned long io_tlb_used;
+
+ /*
+ * This is a free list describing the number of free entries available
+ * from each index
+ */
+ unsigned int *io_tlb_list;
+ unsigned int io_tlb_index;
+
+ /*
+ * We need to save away the original address corresponding to a mapped
+ * entry for the sync operations.
+ */
+ phys_addr_t *io_tlb_orig_addr;
+
+ /*
+ * Max segment that we can provide which (if pages are contingous) will
+ * not be bounced (unless SWIOTLB_FORCE is set).
+ */
+ unsigned int max_segment;
+};
+
+extern struct swiotlb_pool default_swiotlb_pool;
+
+extern phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
dma_addr_t tbl_dma_addr,
phys_addr_t phys,
size_t mapping_size,
@@ -52,28 +104,80 @@ extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
enum dma_data_direction dir,
unsigned long attrs);

-extern void swiotlb_tbl_unmap_single(struct device *hwdev,
+extern void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
phys_addr_t tlb_addr,
size_t mapping_size,
size_t alloc_size,
enum dma_data_direction dir,
unsigned long attrs);

-extern void swiotlb_tbl_sync_single(struct device *hwdev,
+extern void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
enum dma_sync_target target);

-dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t _swiotlb_map(struct swiotlb_pool *pool, struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs);
+
+static inline phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+ dma_addr_t tbl_dma_addr,
+ phys_addr_t phys,
+ size_t mapping_size,
+ size_t alloc_size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return _swiotlb_tbl_map_single(&default_swiotlb_pool, hwdev,
+ tbl_dma_addr, phys, mapping_size,
+ alloc_size, dir, attrs);
+}
+
+static inline void swiotlb_tbl_unmap_single(struct device *hwdev,
+ phys_addr_t tlb_addr,
+ size_t mapping_size,
+ size_t alloc_size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ _swiotlb_tbl_unmap_single(&default_swiotlb_pool, hwdev, tlb_addr,
+ mapping_size, alloc_size, dir, attrs);
+}
+
+static inline void swiotlb_tbl_sync_single(struct device *hwdev,
+ phys_addr_t tlb_addr,
+ size_t size, enum dma_data_direction dir,
+ enum dma_sync_target target)
+{
+ return _swiotlb_tbl_sync_single(&default_swiotlb_pool, hwdev, tlb_addr,
+ size, dir, target);
+}
+
+static inline dma_addr_t swiotlb_map(struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return _swiotlb_map(&default_swiotlb_pool, dev, phys, size, dir, attrs);
+}

#ifdef CONFIG_SWIOTLB
extern enum swiotlb_force swiotlb_force;
-extern phys_addr_t io_tlb_start, io_tlb_end;
+
+static inline phys_addr_t swiotlb_start(void)
+{
+ return default_swiotlb_pool.io_tlb_start;
+}
+
+static inline phys_addr_t swiotlb_end(void)
+{
+ return default_swiotlb_pool.io_tlb_end;
+}

static inline bool is_swiotlb_buffer(phys_addr_t paddr)
{
- return paddr >= io_tlb_start && paddr < io_tlb_end;
+ return paddr >= swiotlb_start() && paddr < swiotlb_end();
}

void __init swiotlb_exit(void);
@@ -82,6 +186,17 @@ size_t swiotlb_max_mapping_size(struct device *dev);
bool is_swiotlb_active(void);
#else
#define swiotlb_force SWIOTLB_NO_FORCE
+
+static inline unsigned long swiotlb_start(void)
+{
+ return 0;
+}
+
+static inline phys_addr_t swiotlb_end(void)
+{
+ return 0;
+}
+
static inline bool is_swiotlb_buffer(phys_addr_t paddr)
{
return false;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fa..9c504d3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -64,58 +64,24 @@

enum swiotlb_force swiotlb_force;

-/*
- * Used to do a quick range check in swiotlb_tbl_unmap_single and
- * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
- * API.
- */
-phys_addr_t io_tlb_start, io_tlb_end;
-
-/*
- * The number of IO TLB blocks (in groups of 64) between io_tlb_start and
- * io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
- */
-static unsigned long io_tlb_nslabs;
-
-/*
- * The number of used IO TLB block
- */
-static unsigned long io_tlb_used;
-
-/*
- * This is a free list describing the number of free entries available from
- * each index
- */
-static unsigned int *io_tlb_list;
-static unsigned int io_tlb_index;
-
-/*
- * Max segment that we can provide which (if pages are contingous) will
- * not be bounced (unless SWIOTLB_FORCE is set).
- */
-unsigned int max_segment;
-
-/*
- * We need to save away the original address corresponding to a mapped entry
- * for the sync operations.
- */
#define INVALID_PHYS_ADDR (~(phys_addr_t)0)
-static phys_addr_t *io_tlb_orig_addr;
-
-/*
- * Protect the above data structures in the map and unmap calls
- */
-static DEFINE_SPINLOCK(io_tlb_lock);

-static int late_alloc;
+struct swiotlb_pool default_swiotlb_pool = {
+ .name = "default_pool",
+ .io_tlb_lock =
+ __SPIN_LOCK_UNLOCKED(default_swiotlb_pool.io_tlb_lock),
+};

static int __init
setup_io_tlb_npages(char *str)
{
+ unsigned long nslabs;
+
if (isdigit(*str)) {
- io_tlb_nslabs = simple_strtoul(str, &str, 0);
+ nslabs = simple_strtoul(str, &str, 0);
/* avoid tail segment of size < IO_TLB_SEGSIZE */
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ nslabs = ALIGN(nslabs, IO_TLB_SEGSIZE);
+ default_swiotlb_pool.io_tlb_nslabs = nslabs;
}
if (*str == ',')
++str;
@@ -123,33 +89,33 @@ setup_io_tlb_npages(char *str)
swiotlb_force = SWIOTLB_FORCE;
} else if (!strcmp(str, "noforce")) {
swiotlb_force = SWIOTLB_NO_FORCE;
- io_tlb_nslabs = 1;
+ default_swiotlb_pool.io_tlb_nslabs = 1;
}

return 0;
}
early_param("swiotlb", setup_io_tlb_npages);

-static bool no_iotlb_memory;
-
unsigned long swiotlb_nr_tbl(void)
{
- return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
+ return unlikely(default_swiotlb_pool.no_iotlb_memory) ?
+ 0 : default_swiotlb_pool.io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);

unsigned int swiotlb_max_segment(void)
{
- return unlikely(no_iotlb_memory) ? 0 : max_segment;
+ return unlikely(default_swiotlb_pool.no_iotlb_memory) ?
+ 0 : default_swiotlb_pool.max_segment;
}
EXPORT_SYMBOL_GPL(swiotlb_max_segment);

void swiotlb_set_max_segment(unsigned int val)
{
if (swiotlb_force == SWIOTLB_FORCE)
- max_segment = 1;
+ default_swiotlb_pool.max_segment = 1;
else
- max_segment = rounddown(val, PAGE_SIZE);
+ default_swiotlb_pool.max_segment = rounddown(val, PAGE_SIZE);
}

/* default to 64MB */
@@ -158,23 +124,25 @@ unsigned long swiotlb_size_or_default(void)
{
unsigned long size;

- size = io_tlb_nslabs << IO_TLB_SHIFT;
+ size = default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT;

return size ? size : (IO_TLB_DEFAULT_SIZE);
}

void swiotlb_print_info(void)
{
- unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ unsigned long bytes;
+
+ bytes = default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT;

- if (no_iotlb_memory) {
+ if (default_swiotlb_pool.no_iotlb_memory) {
pr_warn("No low mem\n");
return;
}

pr_info("mapped [mem %#010llx-%#010llx] (%luMB)\n",
- (unsigned long long)io_tlb_start,
- (unsigned long long)io_tlb_end,
+ (unsigned long long)default_swiotlb_pool.io_tlb_start,
+ (unsigned long long)default_swiotlb_pool.io_tlb_end,
bytes >> 20);
}

@@ -189,11 +157,12 @@ void __init swiotlb_update_mem_attributes(void)
void *vaddr;
unsigned long bytes;

- if (no_iotlb_memory || late_alloc)
+ if (default_swiotlb_pool.no_iotlb_memory ||
+ default_swiotlb_pool.late_alloc)
return;

- vaddr = phys_to_virt(io_tlb_start);
- bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ vaddr = phys_to_virt(default_swiotlb_pool.io_tlb_start);
+ bytes = PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
memset(vaddr, 0, bytes);
}
@@ -205,37 +174,44 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)

bytes = nslabs << IO_TLB_SHIFT;

- io_tlb_nslabs = nslabs;
- io_tlb_start = __pa(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ default_swiotlb_pool.io_tlb_nslabs = nslabs;
+ default_swiotlb_pool.io_tlb_start = __pa(tlb);
+ default_swiotlb_pool.io_tlb_end =
+ default_swiotlb_pool.io_tlb_start + bytes;

/*
* Allocate and initialize the free list array. This array is used
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
* between io_tlb_start and io_tlb_end.
*/
- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
- io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_list)
+ alloc_size = PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(int));
+ default_swiotlb_pool.io_tlb_list =
+ memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!default_swiotlb_pool.io_tlb_list)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);

- alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
- io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
- if (!io_tlb_orig_addr)
+ alloc_size = PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(phys_addr_t));
+ default_swiotlb_pool.io_tlb_orig_addr =
+ memblock_alloc(alloc_size, PAGE_SIZE);
+ if (!default_swiotlb_pool.io_tlb_orig_addr)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
__func__, alloc_size, PAGE_SIZE);

- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < default_swiotlb_pool.io_tlb_nslabs; i++) {
+ default_swiotlb_pool.io_tlb_list[i] = IO_TLB_SEGSIZE -
+ OFFSET(i, IO_TLB_SEGSIZE);
+ default_swiotlb_pool.io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ default_swiotlb_pool.io_tlb_index = 0;

if (verbose)
swiotlb_print_info();

- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(default_swiotlb_pool.io_tlb_nslabs <<
+ IO_TLB_SHIFT);
return 0;
}

@@ -249,24 +225,28 @@ swiotlb_init(int verbose)
size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;
+ unsigned long nslabs;

- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!default_swiotlb_pool.io_tlb_nslabs) {
+ nslabs = (default_size >> IO_TLB_SHIFT);
+ nslabs = ALIGN(nslabs, IO_TLB_SEGSIZE);
+ default_swiotlb_pool.io_tlb_nslabs = nslabs;
}

- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ bytes = default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT;

/* Get IO TLB memory from the low pages */
vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
- if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
+ if (vstart && !swiotlb_init_with_tbl(vstart,
+ default_swiotlb_pool.io_tlb_nslabs, verbose))
return;

- if (io_tlb_start)
- memblock_free_early(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ if (default_swiotlb_pool.io_tlb_start)
+ memblock_free_early(default_swiotlb_pool.io_tlb_start,
+ PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs <<
+ IO_TLB_SHIFT));
pr_warn("Cannot allocate buffer");
- no_iotlb_memory = true;
+ default_swiotlb_pool.no_iotlb_memory = true;
}

/*
@@ -277,22 +257,24 @@ swiotlb_init(int verbose)
int
swiotlb_late_init_with_default_size(size_t default_size)
{
- unsigned long bytes, req_nslabs = io_tlb_nslabs;
+ unsigned long bytes, req_nslabs = default_swiotlb_pool.io_tlb_nslabs;
unsigned char *vstart = NULL;
unsigned int order;
+ unsigned long nslabs;
int rc = 0;

- if (!io_tlb_nslabs) {
- io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ if (!default_swiotlb_pool.io_tlb_nslabs) {
+ nslabs = (default_size >> IO_TLB_SHIFT);
+ nslabs = ALIGN(nslabs, IO_TLB_SEGSIZE);
+ default_swiotlb_pool.io_tlb_nslabs = nslabs;
}

/*
* Get IO TLB memory from the low pages
*/
- order = get_order(io_tlb_nslabs << IO_TLB_SHIFT);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
- bytes = io_tlb_nslabs << IO_TLB_SHIFT;
+ order = get_order(default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT);
+ default_swiotlb_pool.io_tlb_nslabs = SLABS_PER_PAGE << order;
+ bytes = default_swiotlb_pool.io_tlb_nslabs << IO_TLB_SHIFT;

while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
@@ -303,15 +285,16 @@ swiotlb_late_init_with_default_size(size_t default_size)
}

if (!vstart) {
- io_tlb_nslabs = req_nslabs;
+ default_swiotlb_pool.io_tlb_nslabs = req_nslabs;
return -ENOMEM;
}
if (order != get_order(bytes)) {
pr_warn("only able to allocate %ld MB\n",
(PAGE_SIZE << order) >> 20);
- io_tlb_nslabs = SLABS_PER_PAGE << order;
+ default_swiotlb_pool.io_tlb_nslabs = SLABS_PER_PAGE << order;
}
- rc = swiotlb_late_init_with_tbl(vstart, io_tlb_nslabs);
+ rc = swiotlb_late_init_with_tbl(vstart,
+ default_swiotlb_pool.io_tlb_nslabs);
if (rc)
free_pages((unsigned long)vstart, order);

@@ -320,10 +303,10 @@ swiotlb_late_init_with_default_size(size_t default_size)

static void swiotlb_cleanup(void)
{
- io_tlb_end = 0;
- io_tlb_start = 0;
- io_tlb_nslabs = 0;
- max_segment = 0;
+ default_swiotlb_pool.io_tlb_end = 0;
+ default_swiotlb_pool.io_tlb_start = 0;
+ default_swiotlb_pool.io_tlb_nslabs = 0;
+ default_swiotlb_pool.max_segment = 0;
}

int
@@ -333,9 +316,10 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)

bytes = nslabs << IO_TLB_SHIFT;

- io_tlb_nslabs = nslabs;
- io_tlb_start = virt_to_phys(tlb);
- io_tlb_end = io_tlb_start + bytes;
+ default_swiotlb_pool.io_tlb_nslabs = nslabs;
+ default_swiotlb_pool.io_tlb_start = virt_to_phys(tlb);
+ default_swiotlb_pool.io_tlb_end =
+ default_swiotlb_pool.io_tlb_start + bytes;

set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
memset(tlb, 0, bytes);
@@ -345,36 +329,39 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
* between io_tlb_start and io_tlb_end.
*/
- io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs * sizeof(int)));
- if (!io_tlb_list)
+ default_swiotlb_pool.io_tlb_list =
+ (unsigned int *)__get_free_pages(GFP_KERNEL,
+ get_order(default_swiotlb_pool.io_tlb_nslabs * sizeof(int)));
+ if (!default_swiotlb_pool.io_tlb_list)
goto cleanup3;

- io_tlb_orig_addr = (phys_addr_t *)
+ default_swiotlb_pool.io_tlb_orig_addr = (phys_addr_t *)
__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs *
+ get_order(default_swiotlb_pool.io_tlb_nslabs *
sizeof(phys_addr_t)));
- if (!io_tlb_orig_addr)
+ if (!default_swiotlb_pool.io_tlb_orig_addr)
goto cleanup4;

- for (i = 0; i < io_tlb_nslabs; i++) {
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ for (i = 0; i < default_swiotlb_pool.io_tlb_nslabs; i++) {
+ default_swiotlb_pool.io_tlb_list[i] = IO_TLB_SEGSIZE -
+ OFFSET(i, IO_TLB_SEGSIZE);
+ default_swiotlb_pool.io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
- io_tlb_index = 0;
+ default_swiotlb_pool.io_tlb_index = 0;

swiotlb_print_info();

- late_alloc = 1;
+ default_swiotlb_pool.late_alloc = 1;

- swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_max_segment(default_swiotlb_pool.io_tlb_nslabs <<
+ IO_TLB_SHIFT);

return 0;

cleanup4:
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- io_tlb_list = NULL;
+ free_pages((unsigned long)default_swiotlb_pool.io_tlb_list,
+ get_order(default_swiotlb_pool.io_tlb_nslabs * sizeof(int)));
+ default_swiotlb_pool.io_tlb_list = NULL;
cleanup3:
swiotlb_cleanup();
return -ENOMEM;
@@ -382,23 +369,30 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)

void __init swiotlb_exit(void)
{
- if (!io_tlb_orig_addr)
+ if (!default_swiotlb_pool.io_tlb_orig_addr)
return;

- if (late_alloc) {
- free_pages((unsigned long)io_tlb_orig_addr,
- get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
- free_pages((unsigned long)phys_to_virt(io_tlb_start),
- get_order(io_tlb_nslabs << IO_TLB_SHIFT));
+ if (default_swiotlb_pool.late_alloc) {
+ free_pages((unsigned long)default_swiotlb_pool.io_tlb_orig_addr,
+ get_order(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(phys_addr_t)));
+ free_pages((unsigned long)default_swiotlb_pool.io_tlb_list,
+ get_order(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(int)));
+ free_pages((unsigned long)
+ phys_to_virt(default_swiotlb_pool.io_tlb_start),
+ get_order(default_swiotlb_pool.io_tlb_nslabs <<
+ IO_TLB_SHIFT));
} else {
- memblock_free_late(__pa(io_tlb_orig_addr),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
- memblock_free_late(__pa(io_tlb_list),
- PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
- memblock_free_late(io_tlb_start,
- PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+ memblock_free_late(__pa(default_swiotlb_pool.io_tlb_orig_addr),
+ PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(phys_addr_t)));
+ memblock_free_late(__pa(default_swiotlb_pool.io_tlb_list),
+ PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs *
+ sizeof(int)));
+ memblock_free_late(default_swiotlb_pool.io_tlb_start,
+ PAGE_ALIGN(default_swiotlb_pool.io_tlb_nslabs <<
+ IO_TLB_SHIFT));
}
swiotlb_cleanup();
}
@@ -443,7 +437,8 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}

-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
+phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
dma_addr_t tbl_dma_addr,
phys_addr_t orig_addr,
size_t mapping_size,
@@ -460,7 +455,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
unsigned long max_slots;
unsigned long tmp_io_tlb_used;

- if (no_iotlb_memory)
+ if (pool->no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");

if (mem_encrypt_active())
@@ -501,13 +496,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
* Find suitable number of IO TLB entries size that will fit this
* request and allocate a buffer from that IO TLB pool.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&pool->io_tlb_lock, flags);

- if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+ if (unlikely(nslots > pool->io_tlb_nslabs - pool->io_tlb_used))
goto not_found;

- index = ALIGN(io_tlb_index, stride);
- if (index >= io_tlb_nslabs)
+ index = ALIGN(pool->io_tlb_index, stride);
+ if (index >= pool->io_tlb_nslabs)
index = 0;
wrap = index;

@@ -515,7 +510,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
while (iommu_is_span_boundary(index, nslots, offset_slots,
max_slots)) {
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= pool->io_tlb_nslabs)
index = 0;
if (index == wrap)
goto not_found;
@@ -526,40 +521,43 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
* contiguous buffers, we allocate the buffers from that slot
* and mark the entries as '0' indicating unavailable.
*/
- if (io_tlb_list[index] >= nslots) {
+ if (pool->io_tlb_list[index] >= nslots) {
int count = 0;

for (i = index; i < (int) (index + nslots); i++)
- io_tlb_list[i] = 0;
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
- tlb_addr = io_tlb_start + (index << IO_TLB_SHIFT);
+ pool->io_tlb_list[i] = 0;
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) !=
+ IO_TLB_SEGSIZE - 1) &&
+ pool->io_tlb_list[i]; i--)
+ pool->io_tlb_list[i] = ++count;
+ tlb_addr = pool->io_tlb_start + (index << IO_TLB_SHIFT);

/*
* Update the indices to avoid searching in the next
* round.
*/
- io_tlb_index = ((index + nslots) < io_tlb_nslabs
- ? (index + nslots) : 0);
+ pool->io_tlb_index = ((index + nslots) <
+ pool->io_tlb_nslabs ?
+ (index + nslots) : 0);

goto found;
}
index += stride;
- if (index >= io_tlb_nslabs)
+ if (index >= pool->io_tlb_nslabs)
index = 0;
} while (index != wrap);

not_found:
- tmp_io_tlb_used = io_tlb_used;
+ tmp_io_tlb_used = pool->io_tlb_used;

- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&pool->io_tlb_lock, flags);
if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
- alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
+ alloc_size, pool->io_tlb_nslabs, tmp_io_tlb_used);
return (phys_addr_t)DMA_MAPPING_ERROR;
found:
- io_tlb_used += nslots;
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ pool->io_tlb_used += nslots;
+ spin_unlock_irqrestore(&pool->io_tlb_lock, flags);

/*
* Save away the mapping from the original address to the DMA address.
@@ -567,10 +565,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
* needed.
*/
for (i = 0; i < nslots; i++)
- io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ pool->io_tlb_orig_addr[index+i] = orig_addr +
+ (i << IO_TLB_SHIFT);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ swiotlb_bounce(orig_addr, tlb_addr,
+ mapping_size, DMA_TO_DEVICE);

return tlb_addr;
}
@@ -578,14 +578,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
/*
* tlb_addr is the physical address of the bounce buffer to unmap.
*/
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
+ struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
{
unsigned long flags;
int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ int index = (tlb_addr - pool->io_tlb_start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = pool->io_tlb_orig_addr[index];

/*
* First, sync the memory before unmapping the entry
@@ -601,36 +602,39 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
* While returning the entries to the free list, we merge the entries
* with slots below and above the pool being returned.
*/
- spin_lock_irqsave(&io_tlb_lock, flags);
+ spin_lock_irqsave(&pool->io_tlb_lock, flags);
{
count = ((index + nslots) < ALIGN(index + 1, IO_TLB_SEGSIZE) ?
- io_tlb_list[index + nslots] : 0);
+ pool->io_tlb_list[index + nslots] : 0);
/*
* Step 1: return the slots to the free list, merging the
* slots with superceeding slots
*/
for (i = index + nslots - 1; i >= index; i--) {
- io_tlb_list[i] = ++count;
- io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ pool->io_tlb_list[i] = ++count;
+ pool->io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
/*
* Step 2: merge the returned slots with the preceding slots,
* if available (non zero)
*/
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
- io_tlb_list[i] = ++count;
+ for (i = index - 1;
+ (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) &&
+ pool->io_tlb_list[i]; i--)
+ pool->io_tlb_list[i] = ++count;

- io_tlb_used -= nslots;
+ pool->io_tlb_used -= nslots;
}
- spin_unlock_irqrestore(&io_tlb_lock, flags);
+ spin_unlock_irqrestore(&pool->io_tlb_lock, flags);
}

-void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t size, enum dma_data_direction dir,
- enum dma_sync_target target)
+void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
+ struct device *hwdev, phys_addr_t tlb_addr,
+ size_t size, enum dma_data_direction dir,
+ enum dma_sync_target target)
{
- int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = io_tlb_orig_addr[index];
+ int index = (tlb_addr - pool->io_tlb_start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = pool->io_tlb_orig_addr[index];

if (orig_addr == INVALID_PHYS_ADDR)
return;
@@ -660,7 +664,8 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
* Create a swiotlb mapping for the buffer at @paddr, and in case of DMAing
* to the device copy the data into it as well.
*/
-dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
+dma_addr_t _swiotlb_map(struct swiotlb_pool *pool,
+ struct device *dev, phys_addr_t paddr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
phys_addr_t swiotlb_addr;
@@ -669,8 +674,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
swiotlb_force);

- swiotlb_addr = swiotlb_tbl_map_single(dev,
- __phys_to_dma(dev, io_tlb_start),
+ swiotlb_addr = _swiotlb_tbl_map_single(pool, dev,
+ __phys_to_dma(dev, pool->io_tlb_start),
paddr, size, size, dir, attrs);
if (swiotlb_addr == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
@@ -678,8 +683,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
/* Ensure that the address returned is DMA'ble */
dma_addr = __phys_to_dma(dev, swiotlb_addr);
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
- swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, size, dir,
- attrs | DMA_ATTR_SKIP_CPU_SYNC);
+ _swiotlb_tbl_unmap_single(pool, dev, swiotlb_addr, size, size,
+ dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
dev_WARN_ONCE(dev, 1,
"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
@@ -702,7 +707,7 @@ bool is_swiotlb_active(void)
* When SWIOTLB is initialized, even if io_tlb_start points to physical
* address zero, io_tlb_end surely doesn't.
*/
- return io_tlb_end != 0;
+ return default_swiotlb_pool.io_tlb_end != 0;
}

#ifdef CONFIG_DEBUG_FS
@@ -712,8 +717,10 @@ static int __init swiotlb_create_debugfs(void)
struct dentry *root;

root = debugfs_create_dir("swiotlb", NULL);
- debugfs_create_ulong("io_tlb_nslabs", 0400, root, &io_tlb_nslabs);
- debugfs_create_ulong("io_tlb_used", 0400, root, &io_tlb_used);
+ debugfs_create_ulong("io_tlb_nslabs", 0400, root,
+ &default_swiotlb_pool.io_tlb_nslabs);
+ debugfs_create_ulong("io_tlb_used", 0400, root,
+ &default_swiotlb_pool.io_tlb_used);
return 0;
}

--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 11:42:12

by Srivatsa Vaddagiri

Subject: [PATCH 2/5] swiotlb: Allow for non-linear mapping between paddr and vaddr

Some of the memory pools managed by the swiotlb driver could fall
outside the direct-mapped range and be made accessible via the
memremap() routine. To facilitate easy conversion between the
virtual and physical addresses of such memory, store the virtual
address of a memory pool in addition to its physical address.
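
As a small illustration (not part of the patch) of why the virtual address
must be remembered, consider a pool backed by memory outside the kernel's
direct map; the addresses here are placeholders and the memremap() usage
mirrors patch 5:

	/*
	 * Sketch only: for a pool that lives outside the direct map,
	 * phys_to_virt(tlb_addr) would give a bogus pointer, so the CPU
	 * address has to be derived from the remembered mapping instead.
	 */
	static void *example_bounce_vaddr(phys_addr_t pool_paddr,
					  size_t pool_size,
					  phys_addr_t tlb_addr)
	{
		void *vstart = memremap(pool_paddr, pool_size, MEMREMAP_WB);

		if (!vstart)
			return NULL;

		/* Equivalent of the tlb_vaddr() helper added in this patch */
		return vstart + (tlb_addr - pool_paddr);
	}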

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---
include/linux/swiotlb.h | 2 ++
kernel/dma/swiotlb.c | 20 ++++++++++++++------
2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8c7843f..c634b4d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -61,6 +61,8 @@ struct swiotlb_pool {

phys_addr_t io_tlb_start, io_tlb_end;

+ void *io_tlb_vstart;
+
/*
* The number of IO TLB blocks (in groups of 64) between io_tlb_start
* and io_tlb_end. This is command line adjustable via
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9c504d3..8cf0b57 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -178,6 +178,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
default_swiotlb_pool.io_tlb_start = __pa(tlb);
default_swiotlb_pool.io_tlb_end =
default_swiotlb_pool.io_tlb_start + bytes;
+ default_swiotlb_pool.io_tlb_vstart = tlb;

/*
* Allocate and initialize the free list array. This array is used
@@ -307,6 +308,7 @@ static void swiotlb_cleanup(void)
default_swiotlb_pool.io_tlb_start = 0;
default_swiotlb_pool.io_tlb_nslabs = 0;
default_swiotlb_pool.max_segment = 0;
+ default_swiotlb_pool.io_tlb_vstart = NULL;
}

int
@@ -320,6 +322,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
default_swiotlb_pool.io_tlb_start = virt_to_phys(tlb);
default_swiotlb_pool.io_tlb_end =
default_swiotlb_pool.io_tlb_start + bytes;
+ default_swiotlb_pool.io_tlb_vstart = tlb;

set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
memset(tlb, 0, bytes);
@@ -400,11 +403,10 @@ void __init swiotlb_exit(void)
/*
* Bounce: copy the swiotlb buffer from or back to the original dma location
*/
-static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
+static void swiotlb_bounce(phys_addr_t orig_addr, void *vaddr,
size_t size, enum dma_data_direction dir)
{
unsigned long pfn = PFN_DOWN(orig_addr);
- unsigned char *vaddr = phys_to_virt(tlb_addr);

if (PageHighMem(pfn_to_page(pfn))) {
/* The buffer does not have a mapping. Map it in and copy */
@@ -437,6 +439,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
}
}

+static inline void *tlb_vaddr(struct swiotlb_pool *pool, phys_addr_t tlb_addr)
+{
+ return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start);
+}
+
phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
struct device *hwdev,
dma_addr_t tbl_dma_addr,
@@ -569,7 +576,7 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
(i << IO_TLB_SHIFT);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr,
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
mapping_size, DMA_TO_DEVICE);

return tlb_addr;
@@ -594,7 +601,8 @@ void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
if (orig_addr != INVALID_PHYS_ADDR &&
!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
+ mapping_size, DMA_FROM_DEVICE);

/*
* Return the buffer to the free list by setting the corresponding
@@ -643,14 +651,14 @@ void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
switch (target) {
case SYNC_FOR_CPU:
if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr,
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
size, DMA_FROM_DEVICE);
else
BUG_ON(dir != DMA_TO_DEVICE);
break;
case SYNC_FOR_DEVICE:
if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_addr,
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
size, DMA_TO_DEVICE);
else
BUG_ON(dir != DMA_FROM_DEVICE);
--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 11:42:30

by Srivatsa Vaddagiri

Subject: [PATCH 4/5] swiotlb: Add API to register new pool

This patch adds an interface for registering a new memory pool
with the swiotlb driver. Upon successful initialization of the pool,
swiotlb returns a handle, which needs to be passed as an argument
for any future operations on the pool (map/unmap/alloc/free).
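
A minimal registration sketch follows (not part of the patch; the region
parameters are placeholders), mirroring what patch 5 does for its virtio
bounce pool:

	/* Sketch only: register a pool over an already-mapped shared region */
	static struct swiotlb_pool *example_pool;

	static int example_register(phys_addr_t region_paddr,
				    void *region_vaddr, size_t region_size)
	{
		example_pool = swiotlb_register_pool("example_pool",
						     region_paddr,
						     region_vaddr,
						     region_size);
		if (IS_ERR(example_pool)) {
			int ret = PTR_ERR(example_pool);

			example_pool = NULL;
			return ret;
		}

		/*
		 * The returned handle is then passed to _swiotlb_map(),
		 * _swiotlb_tbl_unmap_single(), swiotlb_alloc(), swiotlb_free()
		 * etc. for all subsequent operations on this pool.
		 */
		return 0;
	}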

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---
include/linux/swiotlb.h | 9 ++++++++
kernel/dma/swiotlb.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 957697e..97ac82a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -182,6 +182,9 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
return paddr >= swiotlb_start() && paddr < swiotlb_end();
}

+extern struct swiotlb_pool *swiotlb_register_pool(char *name,
+ phys_addr_t start, void *vstart, size_t size);
+
void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
@@ -236,6 +239,12 @@ static inline void swiotlb_free(struct swiotlb_pool *pool,
{
}

+static struct swiotlb_pool *swiotlb_register_pool(char *name,
+ phys_addr_t start, void *vstart, size_t size)
+{
+ return NULL;
+}
+
#endif /* CONFIG_SWIOTLB */

extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7411ce5..9883744 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -36,6 +36,7 @@
#include <linux/scatterlist.h>
#include <linux/mem_encrypt.h>
#include <linux/set_memory.h>
+#include <linux/slab.h>
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#endif
@@ -736,6 +737,65 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
}

+struct swiotlb_pool *swiotlb_register_pool(char *name, phys_addr_t start,
+ void *vstart, size_t size)
+{
+ struct swiotlb_pool *pool;
+ unsigned long i, bytes;
+ unsigned long nslabs;
+
+ nslabs = size >> IO_TLB_SHIFT;
+ if (!nslabs)
+ return ERR_PTR(-EINVAL);
+
+ pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+ if (!pool)
+ return ERR_PTR(-ENOMEM);
+
+ bytes = nslabs << IO_TLB_SHIFT;
+
+ strncpy(pool->name, name, sizeof(pool->name));
+ spin_lock_init(&pool->io_tlb_lock);
+ pool->late_alloc = 1;
+ pool->io_tlb_start = start;
+ pool->io_tlb_end = start + bytes;
+ pool->io_tlb_vstart = vstart;
+ pool->io_tlb_nslabs = nslabs;
+ pool->max_segment = rounddown(bytes, PAGE_SIZE);
+
+ /*
+ * Allocate and initialize the free list array. This array is used
+ * to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
+ * between io_tlb_start and io_tlb_end.
+ */
+ pool->io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
+ get_order(pool->io_tlb_nslabs * sizeof(int)));
+ if (!pool->io_tlb_list)
+ goto cleanup;
+
+ pool->io_tlb_orig_addr = (phys_addr_t *)
+ __get_free_pages(GFP_KERNEL,
+ get_order(pool->io_tlb_nslabs *
+ sizeof(phys_addr_t)));
+ if (!pool->io_tlb_orig_addr)
+ goto cleanup;
+
+ for (i = 0; i < pool->io_tlb_nslabs; i++) {
+ pool->io_tlb_list[i] = IO_TLB_SEGSIZE -
+ OFFSET(i, IO_TLB_SEGSIZE);
+ pool->io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
+ }
+
+ return pool;
+
+cleanup:
+ kfree(pool->io_tlb_list);
+ kfree(pool->io_tlb_orig_addr);
+ kfree(pool);
+
+ return ERR_PTR(-ENOMEM);
+}
+
bool is_swiotlb_active(void)
{
/*
--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 11:42:46

by Srivatsa Vaddagiri

Subject: [PATCH 5/5] virtio: Add bounce DMA ops

For better security, it is desirable that a guest VM's memory is
not accessible to any entity that executes outside the context of
the guest VM. In the case of virtio, backend drivers execute outside
the context of the guest VM and in general will need access to the
complete guest VM memory. One option to restrict the access provided
to the backend driver is to make use of a bounce buffer. The bounce
buffer is accessible to both backend and frontend drivers. All IO
buffers that are in the private space of the guest VM are bounced to
be accessible to the backend.

This patch proposes a new memory pool to be used for this bounce
purpose, rather than the default swiotlb memory pool. That will
avoid any conflicts that may arise in situations where a VM needs
to use the swiotlb pool for driving pass-through devices (in
which case swiotlb memory need not be shared with another VM) as
well as virtio devices (which will require swiotlb memory to be
shared with the backend VM). As a possible extension to this patch,
we can provide an option for virtio to make use of the default
swiotlb memory pool itself, where no such conflicts exist in
a given deployment.
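
The bounce region itself is picked up from a reserved-memory node whose
compatible string is "virtio_bounce_pool" and which carries the no-map
property (see virtio_bounce_setup() below). As an alternative, a hedged
sketch of how platform glue could hand the region to virtio directly; the
base address and size are placeholders:

	/*
	 * Sketch only: platform/hypervisor glue could also register the
	 * shared region directly via virtio_register_bounce_buffer(). It has
	 * to run before virtio_init() maps the buffer and registers the
	 * swiotlb pool. The base address and size below are placeholders.
	 */
	static int example_register_bounce_region(void)
	{
		phys_addr_t shared_base = 0x80000000;	/* hypothetical */
		size_t shared_size = SZ_4M;		/* hypothetical */

		return virtio_register_bounce_buffer(shared_base, shared_size);
	}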

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---
drivers/virtio/Makefile | 2 +-
drivers/virtio/virtio.c | 2 +
drivers/virtio/virtio_bounce.c | 150 +++++++++++++++++++++++++++++++++++++++++
include/linux/virtio.h | 4 ++
4 files changed, 157 insertions(+), 1 deletion(-)
create mode 100644 drivers/virtio/virtio_bounce.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 29a1386e..3fd3515 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
+obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index a977e32..bc2f779 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)

dev->index = err;
dev_set_name(&dev->dev, "virtio%u", dev->index);
+ virtio_bounce_set_dma_ops(dev);

spin_lock_init(&dev->config_lock);
dev->config_enabled = false;
@@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);

static int virtio_init(void)
{
+ virtio_map_bounce_buffer();
if (bus_register(&virtio_bus) != 0)
panic("virtio bus registration failed");
return 0;
diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
new file mode 100644
index 0000000..3de8e0e
--- /dev/null
+++ b/drivers/virtio/virtio_bounce.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio DMA ops to bounce buffers
+ *
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.
+ *
+ * This module allows bouncing of IO buffers to a region which will be
+ * accessible to backend drivers.
+ */
+
+#include <linux/virtio.h>
+#include <linux/io.h>
+#include <linux/swiotlb.h>
+#include <linux/dma-mapping.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+static phys_addr_t bounce_buf_paddr;
+static void *bounce_buf_vaddr;
+static size_t bounce_buf_size;
+struct swiotlb_pool *virtio_pool;
+
+#define VIRTIO_MAX_BOUNCE_SIZE (16*4096)
+
+static void *virtio_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
+{
+ phys_addr_t addr;
+
+ if (!virtio_pool)
+ return NULL;
+
+ addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
+ if (addr == DMA_MAPPING_ERROR)
+ return NULL;
+
+ *dma_handle = (addr - bounce_buf_paddr);
+
+ return bounce_buf_vaddr + (addr - bounce_buf_paddr);
+}
+
+static void virtio_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle, unsigned long attrs)
+{
+ phys_addr_t addr = (dma_handle + bounce_buf_paddr);
+
+ swiotlb_free(virtio_pool, addr, size);
+}
+
+static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ void *ptr = page_address(page) + offset;
+ phys_addr_t paddr = virt_to_phys(ptr);
+ dma_addr_t handle;
+
+ if (!virtio_pool)
+ return DMA_MAPPING_ERROR;
+
+ handle = _swiotlb_tbl_map_single(virtio_pool, dev, bounce_buf_paddr,
+ paddr, size, size, dir, attrs);
+ if (handle == (phys_addr_t)DMA_MAPPING_ERROR)
+ return DMA_MAPPING_ERROR;
+
+ return handle - bounce_buf_paddr;
+}
+
+static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ phys_addr_t addr = dev_addr + bounce_buf_paddr;
+
+ _swiotlb_tbl_unmap_single(virtio_pool, dev, addr, size,
+ size, dir, attrs);
+}
+
+size_t virtio_max_mapping_size(struct device *dev)
+{
+ return VIRTIO_MAX_BOUNCE_SIZE;
+}
+
+static const struct dma_map_ops virtio_dma_ops = {
+ .alloc = virtio_alloc_coherent,
+ .free = virtio_free_coherent,
+ .map_page = virtio_map_page,
+ .unmap_page = virtio_unmap_page,
+ .max_mapping_size = virtio_max_mapping_size,
+};
+
+void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
+{
+ if (!bounce_buf_paddr)
+ return;
+
+ set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
+}
+
+int virtio_map_bounce_buffer(void)
+{
+ int ret;
+
+ if (!bounce_buf_paddr)
+ return 0;
+
+ /*
+ * Map region as 'cacheable' memory. This will reduce access latency for
+ * backend.
+ */
+ bounce_buf_vaddr = memremap(bounce_buf_paddr,
+ bounce_buf_size, MEMREMAP_WB);
+ if (!bounce_buf_vaddr)
+ return -ENOMEM;
+
+ memset(bounce_buf_vaddr, 0, bounce_buf_size);
+ virtio_pool = swiotlb_register_pool("virtio_swiotlb", bounce_buf_paddr,
+ bounce_buf_vaddr, bounce_buf_size);
+ if (IS_ERR(virtio_pool)) {
+ ret = PTR_ERR(virtio_pool);
+ virtio_pool = NULL;
+ memunmap(bounce_buf_vaddr);
+ return ret;
+ }
+
+ return 0;
+}
+
+int virtio_register_bounce_buffer(phys_addr_t base, size_t size)
+{
+ if (bounce_buf_paddr || !base || size < PAGE_SIZE)
+ return -EINVAL;
+
+ bounce_buf_paddr = base;
+ bounce_buf_size = size;
+
+ return 0;
+}
+
+static int __init virtio_bounce_setup(struct reserved_mem *rmem)
+{
+ unsigned long node = rmem->fdt_node;
+
+ if (!of_get_flat_dt_prop(node, "no-map", NULL))
+ return -EINVAL;
+
+ return virtio_register_bounce_buffer(rmem->base, rmem->size);
+}
+
+RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool", virtio_bounce_setup);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a493eac..c4970c5 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -134,12 +134,16 @@ void virtio_config_changed(struct virtio_device *dev);
void virtio_config_disable(struct virtio_device *dev);
void virtio_config_enable(struct virtio_device *dev);
int virtio_finalize_features(struct virtio_device *dev);
+int virtio_register_bounce_buffer(phys_addr_t base, size_t size);
+
#ifdef CONFIG_PM_SLEEP
int virtio_device_freeze(struct virtio_device *dev);
int virtio_device_restore(struct virtio_device *dev);
#endif

size_t virtio_max_dma_size(struct virtio_device *vdev);
+extern int virtio_map_bounce_buffer(void);
+extern void virtio_bounce_set_dma_ops(struct virtio_device *dev);

#define virtio_device_for_each_vq(vdev, vq) \
list_for_each_entry(vq, &vdev->vqs, list)
--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 11:44:23

by Srivatsa Vaddagiri

Subject: [PATCH 3/5] swiotlb: Add alloc and free APIs

Move the memory allocation and free portions of the swiotlb driver
into independent routines. They will be useful for drivers that
need the swiotlb driver to just allocate/free memory chunks and not
additionally bounce memory.
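
A minimal usage sketch (not part of the patch) of the split-out routines as a
plain allocator over a registered pool; the mask value mirrors the ULONG_MAX
usage in patch 5:

	/* Sketch only: carve a chunk out of a pool without bouncing. */
	static phys_addr_t example_alloc(struct swiotlb_pool *pool,
					 size_t size)
	{
		/* No device segment-boundary constraint, hence ULONG_MAX */
		return swiotlb_alloc(pool, size, pool->io_tlb_start,
				     ULONG_MAX);
	}

	static void example_free(struct swiotlb_pool *pool, phys_addr_t addr,
				 size_t size)
	{
		swiotlb_free(pool, addr, size);
	}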

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
---
include/linux/swiotlb.h | 17 ++++++
kernel/dma/swiotlb.c | 151 ++++++++++++++++++++++++++++--------------------
2 files changed, 106 insertions(+), 62 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index c634b4d..957697e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -186,6 +186,10 @@ void __init swiotlb_exit(void);
unsigned int swiotlb_max_segment(void);
size_t swiotlb_max_mapping_size(struct device *dev);
bool is_swiotlb_active(void);
+extern phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, size_t alloc_size,
+ unsigned long tbl_dma_addr, unsigned long mask);
+extern void swiotlb_free(struct swiotlb_pool *pool,
+ phys_addr_t tlb_addr, size_t alloc_size);
#else
#define swiotlb_force SWIOTLB_NO_FORCE

@@ -219,6 +223,19 @@ static inline bool is_swiotlb_active(void)
{
return false;
}
+
+static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
+ size_t alloc_size, unsigned long tbl_dma_addr,
+ unsigned long mask)
+{
+ return DMA_MAPPING_ERROR;
+}
+
+static inline void swiotlb_free(struct swiotlb_pool *pool,
+ phys_addr_t tlb_addr, size_t alloc_size)
+{
+}
+
#endif /* CONFIG_SWIOTLB */

extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8cf0b57..7411ce5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -444,37 +444,14 @@ static inline void *tlb_vaddr(struct swiotlb_pool *pool, phys_addr_t tlb_addr)
return pool->io_tlb_vstart + (tlb_addr - pool->io_tlb_start);
}

-phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
- struct device *hwdev,
- dma_addr_t tbl_dma_addr,
- phys_addr_t orig_addr,
- size_t mapping_size,
- size_t alloc_size,
- enum dma_data_direction dir,
- unsigned long attrs)
+phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool, size_t alloc_size,
+ unsigned long tbl_dma_addr, unsigned long mask)
{
unsigned long flags;
phys_addr_t tlb_addr;
- unsigned int nslots, stride, index, wrap;
- int i;
- unsigned long mask;
+ unsigned int i, nslots, stride, index, wrap;
unsigned long offset_slots;
unsigned long max_slots;
- unsigned long tmp_io_tlb_used;
-
- if (pool->no_iotlb_memory)
- panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
-
- if (mem_encrypt_active())
- pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
-
- if (mapping_size > alloc_size) {
- dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
- mapping_size, alloc_size);
- return (phys_addr_t)DMA_MAPPING_ERROR;
- }
-
- mask = dma_get_seg_boundary(hwdev);

tbl_dma_addr &= mask;

@@ -555,54 +532,23 @@ phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
} while (index != wrap);

not_found:
- tmp_io_tlb_used = pool->io_tlb_used;
-
spin_unlock_irqrestore(&pool->io_tlb_lock, flags);
- if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
- dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
- alloc_size, pool->io_tlb_nslabs, tmp_io_tlb_used);
return (phys_addr_t)DMA_MAPPING_ERROR;
+
found:
pool->io_tlb_used += nslots;
spin_unlock_irqrestore(&pool->io_tlb_lock, flags);

- /*
- * Save away the mapping from the original address to the DMA address.
- * This is needed when we sync the memory. Then we sync the buffer if
- * needed.
- */
- for (i = 0; i < nslots; i++)
- pool->io_tlb_orig_addr[index+i] = orig_addr +
- (i << IO_TLB_SHIFT);
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
- mapping_size, DMA_TO_DEVICE);
-
return tlb_addr;
}

-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
- struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, size_t alloc_size,
- enum dma_data_direction dir, unsigned long attrs)
+void swiotlb_free(struct swiotlb_pool *pool,
+ phys_addr_t tlb_addr, size_t alloc_size)
{
unsigned long flags;
- int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ int i, count;
+ int nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - pool->io_tlb_start) >> IO_TLB_SHIFT;
- phys_addr_t orig_addr = pool->io_tlb_orig_addr[index];
-
- /*
- * First, sync the memory before unmapping the entry
- */
- if (orig_addr != INVALID_PHYS_ADDR &&
- !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
- swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
- mapping_size, DMA_FROM_DEVICE);

/*
* Return the buffer to the free list by setting the corresponding
@@ -636,6 +582,87 @@ void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
spin_unlock_irqrestore(&pool->io_tlb_lock, flags);
}

+phys_addr_t _swiotlb_tbl_map_single(struct swiotlb_pool *pool,
+ struct device *hwdev,
+ dma_addr_t tbl_dma_addr,
+ phys_addr_t orig_addr,
+ size_t mapping_size,
+ size_t alloc_size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ phys_addr_t tlb_addr;
+ unsigned int nslots, index;
+ int i;
+ unsigned long mask;
+
+ if (pool->no_iotlb_memory)
+ panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
+
+ if (mem_encrypt_active())
+ pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
+
+ if (mapping_size > alloc_size) {
+ dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
+ mapping_size, alloc_size);
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+ }
+
+ mask = dma_get_seg_boundary(hwdev);
+
+ tlb_addr = swiotlb_alloc(pool, alloc_size, tbl_dma_addr, mask);
+
+ if (tlb_addr == DMA_MAPPING_ERROR) {
+ if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
+ dev_warn(hwdev, "swiotlb buffer is full (sz: %zd "
+ "bytes), total %lu (slots), used %lu (slots)\n",
+ alloc_size, pool->io_tlb_nslabs,
+ pool->io_tlb_used);
+ return (phys_addr_t)DMA_MAPPING_ERROR;
+ }
+
+ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ index = (tlb_addr - pool->io_tlb_start) >> IO_TLB_SHIFT;
+
+ /*
+ * Save away the mapping from the original address to the DMA address.
+ * This is needed when we sync the memory. Then we sync the buffer if
+ * needed.
+ */
+ for (i = 0; i < nslots; i++)
+ pool->io_tlb_orig_addr[index+i] = orig_addr +
+ (i << IO_TLB_SHIFT);
+ if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
+ mapping_size, DMA_TO_DEVICE);
+
+ return tlb_addr;
+}
+
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void _swiotlb_tbl_unmap_single(struct swiotlb_pool *pool,
+ struct device *hwdev, phys_addr_t tlb_addr,
+ size_t mapping_size, size_t alloc_size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ int index = (tlb_addr - pool->io_tlb_start) >> IO_TLB_SHIFT;
+ phys_addr_t orig_addr = pool->io_tlb_orig_addr[index];
+
+ /*
+ * First, sync the memory before unmapping the entry
+ */
+ if (orig_addr != INVALID_PHYS_ADDR &&
+ !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+ swiotlb_bounce(orig_addr, tlb_vaddr(pool, tlb_addr),
+ mapping_size, DMA_FROM_DEVICE);
+
+ swiotlb_free(pool, tlb_addr, alloc_size);
+}
+
void _swiotlb_tbl_sync_single(struct swiotlb_pool *pool,
struct device *hwdev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir,
--
2.7.4

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 16:20:18

by Michael S. Tsirkin

Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Tue, Apr 28, 2020 at 05:09:18PM +0530, Srivatsa Vaddagiri wrote:
> For better security, its desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory. One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.
>
> This patch proposes a new memory pool to be used for this bounce
> purpose, rather than the default swiotlb memory pool. That will
> avoid any conflicts that may arise in situations where a VM needs
> to use swiotlb pool for driving any pass-through devices (in
> which case swiotlb memory need not be shared with another VM) as
> well as virtio devices (which will require swiotlb memory to be
> shared with backend VM). As a possible extension to this patch,
> we can provide an option for virtio to make use of default
> swiotlb memory pool itself, where no such conflicts may exist in
> a given deployment.
>
> Signed-off-by: Srivatsa Vaddagiri <[email protected]>


Okay, but how is all this virtio specific? For example, why not allow
separate swiotlbs for any type of device?
For example, this might make sense if a given device is from a
different, less trusted vendor.
All this can then maybe be hidden behind the DMA API.



> ---
> drivers/virtio/Makefile | 2 +-
> drivers/virtio/virtio.c | 2 +
> drivers/virtio/virtio_bounce.c | 150 +++++++++++++++++++++++++++++++++++++++++
> include/linux/virtio.h | 4 ++
> 4 files changed, 157 insertions(+), 1 deletion(-)
> create mode 100644 drivers/virtio/virtio_bounce.c
>
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 29a1386e..3fd3515 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -1,5 +1,5 @@
> # SPDX-License-Identifier: GPL-2.0
> -obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o
> +obj-$(CONFIG_VIRTIO) += virtio.o virtio_ring.o virtio_bounce.o
> obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
> obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index a977e32..bc2f779 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -329,6 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>
> dev->index = err;
> dev_set_name(&dev->dev, "virtio%u", dev->index);
> + virtio_bounce_set_dma_ops(dev);
>
> spin_lock_init(&dev->config_lock);
> dev->config_enabled = false;
> @@ -431,6 +432,7 @@ EXPORT_SYMBOL_GPL(virtio_device_restore);
>
> static int virtio_init(void)
> {
> + virtio_map_bounce_buffer();
> if (bus_register(&virtio_bus) != 0)
> panic("virtio bus registration failed");
> return 0;
> diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
> new file mode 100644
> index 0000000..3de8e0e
> --- /dev/null
> +++ b/drivers/virtio/virtio_bounce.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Virtio DMA ops to bounce buffers
> + *
> + * Copyright (c) 2020, The Linux Foundation. All rights reserved.
> + *
> + * This module allows bouncing of IO buffers to a region which will be
> + * accessible to backend drivers.
> + */
> +
> +#include <linux/virtio.h>
> +#include <linux/io.h>
> +#include <linux/swiotlb.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/of_reserved_mem.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +
> +static phys_addr_t bounce_buf_paddr;
> +static void *bounce_buf_vaddr;
> +static size_t bounce_buf_size;
> +struct swiotlb_pool *virtio_pool;
> +
> +#define VIRTIO_MAX_BOUNCE_SIZE (16*4096)
> +
> +static void *virtio_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp_flags, unsigned long attrs)
> +{
> + phys_addr_t addr;
> +
> + if (!virtio_pool)
> + return NULL;
> +
> + addr = swiotlb_alloc(virtio_pool, size, bounce_buf_paddr, ULONG_MAX);
> + if (addr == DMA_MAPPING_ERROR)
> + return NULL;
> +
> + *dma_handle = (addr - bounce_buf_paddr);
> +
> + return bounce_buf_vaddr + (addr - bounce_buf_paddr);
> +}
> +
> +static void virtio_free_coherent(struct device *dev, size_t size, void *vaddr,
> + dma_addr_t dma_handle, unsigned long attrs)
> +{
> + phys_addr_t addr = (dma_handle + bounce_buf_paddr);
> +
> + swiotlb_free(virtio_pool, addr, size);
> +}
> +
> +static dma_addr_t virtio_map_page(struct device *dev, struct page *page,
> + unsigned long offset, size_t size,
> + enum dma_data_direction dir, unsigned long attrs)
> +{
> + void *ptr = page_address(page) + offset;
> + phys_addr_t paddr = virt_to_phys(ptr);
> + dma_addr_t handle;
> +
> + if (!virtio_pool)
> + return DMA_MAPPING_ERROR;
> +
> + handle = _swiotlb_tbl_map_single(virtio_pool, dev, bounce_buf_paddr,
> + paddr, size, size, dir, attrs);
> + if (handle == (phys_addr_t)DMA_MAPPING_ERROR)
> + return DMA_MAPPING_ERROR;
> +
> + return handle - bounce_buf_paddr;
> +}
> +
> +static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr,
> + size_t size, enum dma_data_direction dir, unsigned long attrs)
> +{
> + phys_addr_t addr = dev_addr + bounce_buf_paddr;
> +
> + _swiotlb_tbl_unmap_single(virtio_pool, dev, addr, size,
> + size, dir, attrs);
> +}
> +
> +size_t virtio_max_mapping_size(struct device *dev)
> +{
> + return VIRTIO_MAX_BOUNCE_SIZE;
> +}
> +
> +static const struct dma_map_ops virtio_dma_ops = {
> + .alloc = virtio_alloc_coherent,
> + .free = virtio_free_coherent,
> + .map_page = virtio_map_page,
> + .unmap_page = virtio_unmap_page,
> + .max_mapping_size = virtio_max_mapping_size,
> +};
> +
> +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> +{
> + if (!bounce_buf_paddr)
> + return;
> +
> + set_dma_ops(vdev->dev.parent, &virtio_dma_ops);


I don't think DMA API maintainers will be happy with new users
of set_dma_ops.

> +}
> +
> +int virtio_map_bounce_buffer(void)
> +{
> + int ret;
> +
> + if (!bounce_buf_paddr)
> + return 0;
> +
> + /*
> + * Map region as 'cacheable' memory. This will reduce access latency for
> + * backend.
> + */
> + bounce_buf_vaddr = memremap(bounce_buf_paddr,
> + bounce_buf_size, MEMREMAP_WB);
> + if (!bounce_buf_vaddr)
> + return -ENOMEM;
> +
> + memset(bounce_buf_vaddr, 0, bounce_buf_size);
> + virtio_pool = swiotlb_register_pool("virtio_swiotlb", bounce_buf_paddr,
> + bounce_buf_vaddr, bounce_buf_size);
> + if (IS_ERR(virtio_pool)) {
> + ret = PTR_ERR(virtio_pool);
> + virtio_pool = NULL;
> + memunmap(bounce_buf_vaddr);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +int virtio_register_bounce_buffer(phys_addr_t base, size_t size)
> +{
> + if (bounce_buf_paddr || !base || size < PAGE_SIZE)
> + return -EINVAL;
> +
> + bounce_buf_paddr = base;
> + bounce_buf_size = size;
> +
> + return 0;
> +}
> +
> +static int __init virtio_bounce_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (!of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + return virtio_register_bounce_buffer(rmem->base, rmem->size);
> +}
> +
> +RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool", virtio_bounce_setup);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index a493eac..c4970c5 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -134,12 +134,16 @@ void virtio_config_changed(struct virtio_device *dev);
> void virtio_config_disable(struct virtio_device *dev);
> void virtio_config_enable(struct virtio_device *dev);
> int virtio_finalize_features(struct virtio_device *dev);
> +int virtio_register_bounce_buffer(phys_addr_t base, size_t size);
> +
> #ifdef CONFIG_PM_SLEEP
> int virtio_device_freeze(struct virtio_device *dev);
> int virtio_device_restore(struct virtio_device *dev);
> #endif
>
> size_t virtio_max_dma_size(struct virtio_device *vdev);
> +extern int virtio_map_bounce_buffer(void);
> +extern void virtio_bounce_set_dma_ops(struct virtio_device *dev);
>
> #define virtio_device_for_each_vq(vdev, vq) \
> list_for_each_entry(vq, &vdev->vqs, list)
> --
> 2.7.4
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 17:53:08

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Michael S. Tsirkin <[email protected]> [2020-04-28 12:17:57]:

> Okay, but how is all this virtio specific? For example, why not allow
> separate swiotlbs for any type of device?
> For example, this might make sense if a given device is from a
> different, less trusted vendor.

Is swiotlb commonly used for multiple devices that may be on different trust
boundaries (and not behind a hardware iommu)? If so, then yes it sounds like a
good application of multiple swiotlb pools.

> All this can then maybe be hidden behind the DMA API.

Won't we still need some changes to virtio to make use of its own pool (to
bounce buffers)? Something similar to its own DMA ops proposed in this patch?

> > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > +{
> > + if (!bounce_buf_paddr)
> > + return;
> > +
> > + set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
>
>
> I don't think DMA API maintainers will be happy with new users
> of set_dma_ops.

Is there an alternative API that is preferred?

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 20:45:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin <[email protected]> [2020-04-28 12:17:57]:
>
> > Okay, but how is all this virtio specific? For example, why not allow
> > separate swiotlbs for any type of device?
> > For example, this might make sense if a given device is from a
> > different, less trusted vendor.
>
> Is swiotlb commonly used for multiple devices that may be on different trust
> boundaries (and not behind a hardware iommu)?

Even a hardware iommu does not imply a 100% security from malicious
hardware. First lots of people use iommu=pt for performance reasons.
Second even without pt, unmaps are often batched, and sub-page buffers
might be used for DMA, so we are not 100% protected at all times.


> If so, then yes it sounds like a
> good application of multiple swiotlb pools.
>
> > All this can then maybe be hidden behind the DMA API.
>
> Won't we still need some changes to virtio to make use of its own pool (to
> bounce buffers)? Something similar to its own DMA ops proposed in this patch?

If you are doing this for all devices, you need to either find a way
to do this without changing DMA ops, or do some automatic change
to all drivers.


> > > +void virtio_bounce_set_dma_ops(struct virtio_device *vdev)
> > > +{
> > > + if (!bounce_buf_paddr)
> > > + return;
> > > +
> > > + set_dma_ops(vdev->dev.parent, &virtio_dma_ops);
> >
> >
> > I don't think DMA API maintainers will be happy with new users
> > of set_dma_ops.
>
> Is there an alternative API that is preferred?

All this is supposed to be part of the DMA API itself. New drivers aren't
supposed to have custom DMA ops.

> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

2020-04-28 21:38:34

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

Hi Srivatsa,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: sh-randconfig-a001-20200428 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=sh

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from drivers/virtio/virtio_bounce.c:13:
include/linux/swiotlb.h: In function 'swiotlb_alloc':
include/linux/swiotlb.h:234:9: error: 'DMA_MAPPING_ERROR' undeclared (first use in this function)
234 | return DMA_MAPPING_ERROR;
| ^~~~~~~~~~~~~~~~~
include/linux/swiotlb.h:234:9: note: each undeclared identifier is reported only once for each function it appears in
>> include/linux/swiotlb.h:235:1: warning: control reaches end of non-void function [-Wreturn-type]
235 | }
| ^

vim +235 include/linux/swiotlb.h

9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 229
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 230 static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 231 size_t alloc_size, unsigned long tbl_dma_addr,
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 232 unsigned long mask)
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 233 {
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 234 return DMA_MAPPING_ERROR;
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 @235 }
9ab4c39d9f9298 Srivatsa Vaddagiri 2020-04-28 236

:::::: The code at line 235 was first introduced by commit
:::::: 9ab4c39d9f929840cb884e589f6112770dbc2f63 swiotlb: Add alloc and free APIs

:::::: TO: Srivatsa Vaddagiri <[email protected]>
:::::: CC: 0day robot <[email protected]>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-04-28 22:21:35

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: s390-randconfig-a001-20200428 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=s390

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

drivers/virtio/virtio_bounce.c: In function 'virtio_bounce_setup':
>> drivers/virtio/virtio_bounce.c:144:7: error: implicit declaration of function 'of_get_flat_dt_prop' [-Werror=implicit-function-declaration]
144 | if (!of_get_flat_dt_prop(node, "no-map", NULL))
| ^~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +/of_get_flat_dt_prop +144 drivers/virtio/virtio_bounce.c

139
140 static int __init virtio_bounce_setup(struct reserved_mem *rmem)
141 {
142 unsigned long node = rmem->fdt_node;
143
> 144 if (!of_get_flat_dt_prop(node, "no-map", NULL))
145 return -EINVAL;
146
147 return virtio_register_bounce_buffer(rmem->base, rmem->size);
148 }
149

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-04-28 22:58:57

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Tue, 28 Apr 2020, Srivatsa Vaddagiri wrote:
> For better security, it's desirable that a guest VM's memory is
> not accessible to any entity that executes outside the context of
> guest VM. In case of virtio, backend drivers execute outside the
> context of guest VM and in general will need access to complete
> guest VM memory. One option to restrict the access provided to
> backend driver is to make use of a bounce buffer. The bounce
> buffer is accessible to both backend and frontend drivers. All IO
> buffers that are in private space of guest VM are bounced to be
> accessible to backend.

[...]

> +static int __init virtio_bounce_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (!of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + return virtio_register_bounce_buffer(rmem->base, rmem->size);
> +}
> +
> +RESERVEDMEM_OF_DECLARE(virtio, "virtio_bounce_pool", virtio_bounce_setup);

Is this special reserved-memory region documented somewhere?

2020-04-28 23:07:15

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Tue, 28 Apr 2020, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > * Michael S. Tsirkin <[email protected]> [2020-04-28 12:17:57]:
> >
> > > Okay, but how is all this virtio specific? For example, why not allow
> > > separate swiotlbs for any type of device?
> > > For example, this might make sense if a given device is from a
> > > different, less trusted vendor.
> >
> > Is swiotlb commonly used for multiple devices that may be on different trust
> > boundaries (and not behind a hardware iommu)?

The trust boundary is not a good way of describing the scenario and I
think it leads to miscommunication.

A better way to describe the scenario would be that the device can only
DMA to/from a small reserved-memory region advertised in the device tree.

Do we have other instances of devices that can only DMA to/from very
specific and non-configurable address ranges? If so, this series could
follow their example.
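
For illustration only, here is a minimal sketch of one existing pattern in that
direction: a driver bound to a fixed reserved-memory region (a "shared-dma-pool"
referenced through a memory-region phandle) via of_reserved_mem_device_init(),
after which dma_alloc_coherent() for that device is satisfied only from that
region. The probe function and sizes are hypothetical:

#include <linux/platform_device.h>
#include <linux/of_reserved_mem.h>
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int example_probe(struct platform_device *pdev)
{
	dma_addr_t dma;
	void *buf;
	int ret;

	/* Attach the region named by this device's "memory-region" phandle */
	ret = of_reserved_mem_device_init(&pdev->dev);
	if (ret)
		return ret;

	/* Coherent allocations now come only from the reserved region */
	buf = dma_alloc_coherent(&pdev->dev, SZ_64K, &dma, GFP_KERNEL);
	if (!buf) {
		of_reserved_mem_device_release(&pdev->dev);
		return -ENOMEM;
	}

	/* ... program 'dma' into the device, use 'buf' from the CPU ... */
	return 0;
}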


> Even a hardware iommu does not imply a 100% security from malicious
> hardware. First lots of people use iommu=pt for performance reasons.
> Second even without pt, unmaps are often batched, and sub-page buffers
> might be used for DMA, so we are not 100% protected at all times.

2020-04-29 00:34:36

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/5] swiotlb: Introduce concept of swiotlb_pool

Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-defconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

drivers/iommu/intel-iommu.c: In function 'bounce_map_single':
>> drivers/iommu/intel-iommu.c:3990:24: error: 'io_tlb_start' undeclared (first use in this function); did you mean 'swiotlb_start'?
__phys_to_dma(dev, io_tlb_start),
^~~~~~~~~~~~
swiotlb_start
drivers/iommu/intel-iommu.c:3990:24: note: each undeclared identifier is reported only once for each function it appears in

vim +3990 drivers/iommu/intel-iommu.c

cfb94a372f2d4e Lu Baolu 2019-09-06 3941
cfb94a372f2d4e Lu Baolu 2019-09-06 3942 static dma_addr_t
cfb94a372f2d4e Lu Baolu 2019-09-06 3943 bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
cfb94a372f2d4e Lu Baolu 2019-09-06 3944 enum dma_data_direction dir, unsigned long attrs,
cfb94a372f2d4e Lu Baolu 2019-09-06 3945 u64 dma_mask)
cfb94a372f2d4e Lu Baolu 2019-09-06 3946 {
cfb94a372f2d4e Lu Baolu 2019-09-06 3947 size_t aligned_size = ALIGN(size, VTD_PAGE_SIZE);
cfb94a372f2d4e Lu Baolu 2019-09-06 3948 struct dmar_domain *domain;
cfb94a372f2d4e Lu Baolu 2019-09-06 3949 struct intel_iommu *iommu;
cfb94a372f2d4e Lu Baolu 2019-09-06 3950 unsigned long iova_pfn;
cfb94a372f2d4e Lu Baolu 2019-09-06 3951 unsigned long nrpages;
cfb94a372f2d4e Lu Baolu 2019-09-06 3952 phys_addr_t tlb_addr;
cfb94a372f2d4e Lu Baolu 2019-09-06 3953 int prot = 0;
cfb94a372f2d4e Lu Baolu 2019-09-06 3954 int ret;
cfb94a372f2d4e Lu Baolu 2019-09-06 3955
a11bfde9c77df1 Joerg Roedel 2020-02-17 3956 if (unlikely(attach_deferred(dev)))
a11bfde9c77df1 Joerg Roedel 2020-02-17 3957 do_deferred_attach(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17 3958
96d170f3b1a607 Joerg Roedel 2020-02-17 3959 domain = find_domain(dev);
a11bfde9c77df1 Joerg Roedel 2020-02-17 3960
cfb94a372f2d4e Lu Baolu 2019-09-06 3961 if (WARN_ON(dir == DMA_NONE || !domain))
cfb94a372f2d4e Lu Baolu 2019-09-06 3962 return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06 3963
cfb94a372f2d4e Lu Baolu 2019-09-06 3964 iommu = domain_get_iommu(domain);
cfb94a372f2d4e Lu Baolu 2019-09-06 3965 if (WARN_ON(!iommu))
cfb94a372f2d4e Lu Baolu 2019-09-06 3966 return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06 3967
cfb94a372f2d4e Lu Baolu 2019-09-06 3968 nrpages = aligned_nrpages(0, size);
cfb94a372f2d4e Lu Baolu 2019-09-06 3969 iova_pfn = intel_alloc_iova(dev, domain,
cfb94a372f2d4e Lu Baolu 2019-09-06 3970 dma_to_mm_pfn(nrpages), dma_mask);
cfb94a372f2d4e Lu Baolu 2019-09-06 3971 if (!iova_pfn)
cfb94a372f2d4e Lu Baolu 2019-09-06 3972 return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06 3973
cfb94a372f2d4e Lu Baolu 2019-09-06 3974 /*
cfb94a372f2d4e Lu Baolu 2019-09-06 3975 * Check if DMAR supports zero-length reads on write only
cfb94a372f2d4e Lu Baolu 2019-09-06 3976 * mappings..
cfb94a372f2d4e Lu Baolu 2019-09-06 3977 */
cfb94a372f2d4e Lu Baolu 2019-09-06 3978 if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL ||
cfb94a372f2d4e Lu Baolu 2019-09-06 3979 !cap_zlr(iommu->cap))
cfb94a372f2d4e Lu Baolu 2019-09-06 3980 prot |= DMA_PTE_READ;
cfb94a372f2d4e Lu Baolu 2019-09-06 3981 if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
cfb94a372f2d4e Lu Baolu 2019-09-06 3982 prot |= DMA_PTE_WRITE;
cfb94a372f2d4e Lu Baolu 2019-09-06 3983
cfb94a372f2d4e Lu Baolu 2019-09-06 3984 /*
cfb94a372f2d4e Lu Baolu 2019-09-06 3985 * If both the physical buffer start address and size are
cfb94a372f2d4e Lu Baolu 2019-09-06 3986 * page aligned, we don't need to use a bounce page.
cfb94a372f2d4e Lu Baolu 2019-09-06 3987 */
cfb94a372f2d4e Lu Baolu 2019-09-06 3988 if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
cfb94a372f2d4e Lu Baolu 2019-09-06 3989 tlb_addr = swiotlb_tbl_map_single(dev,
cfb94a372f2d4e Lu Baolu 2019-09-06 @3990 __phys_to_dma(dev, io_tlb_start),
cfb94a372f2d4e Lu Baolu 2019-09-06 3991 paddr, size, aligned_size, dir, attrs);
cfb94a372f2d4e Lu Baolu 2019-09-06 3992 if (tlb_addr == DMA_MAPPING_ERROR) {
cfb94a372f2d4e Lu Baolu 2019-09-06 3993 goto swiotlb_error;
cfb94a372f2d4e Lu Baolu 2019-09-06 3994 } else {
cfb94a372f2d4e Lu Baolu 2019-09-06 3995 /* Cleanup the padding area. */
cfb94a372f2d4e Lu Baolu 2019-09-06 3996 void *padding_start = phys_to_virt(tlb_addr);
cfb94a372f2d4e Lu Baolu 2019-09-06 3997 size_t padding_size = aligned_size;
cfb94a372f2d4e Lu Baolu 2019-09-06 3998
cfb94a372f2d4e Lu Baolu 2019-09-06 3999 if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
cfb94a372f2d4e Lu Baolu 2019-09-06 4000 (dir == DMA_TO_DEVICE ||
cfb94a372f2d4e Lu Baolu 2019-09-06 4001 dir == DMA_BIDIRECTIONAL)) {
cfb94a372f2d4e Lu Baolu 2019-09-06 4002 padding_start += size;
cfb94a372f2d4e Lu Baolu 2019-09-06 4003 padding_size -= size;
cfb94a372f2d4e Lu Baolu 2019-09-06 4004 }
cfb94a372f2d4e Lu Baolu 2019-09-06 4005
cfb94a372f2d4e Lu Baolu 2019-09-06 4006 memset(padding_start, 0, padding_size);
cfb94a372f2d4e Lu Baolu 2019-09-06 4007 }
cfb94a372f2d4e Lu Baolu 2019-09-06 4008 } else {
cfb94a372f2d4e Lu Baolu 2019-09-06 4009 tlb_addr = paddr;
cfb94a372f2d4e Lu Baolu 2019-09-06 4010 }
cfb94a372f2d4e Lu Baolu 2019-09-06 4011
cfb94a372f2d4e Lu Baolu 2019-09-06 4012 ret = domain_pfn_mapping(domain, mm_to_dma_pfn(iova_pfn),
cfb94a372f2d4e Lu Baolu 2019-09-06 4013 tlb_addr >> VTD_PAGE_SHIFT, nrpages, prot);
cfb94a372f2d4e Lu Baolu 2019-09-06 4014 if (ret)
cfb94a372f2d4e Lu Baolu 2019-09-06 4015 goto mapping_error;
cfb94a372f2d4e Lu Baolu 2019-09-06 4016
cfb94a372f2d4e Lu Baolu 2019-09-06 4017 trace_bounce_map_single(dev, iova_pfn << PAGE_SHIFT, paddr, size);
cfb94a372f2d4e Lu Baolu 2019-09-06 4018
cfb94a372f2d4e Lu Baolu 2019-09-06 4019 return (phys_addr_t)iova_pfn << PAGE_SHIFT;
cfb94a372f2d4e Lu Baolu 2019-09-06 4020
cfb94a372f2d4e Lu Baolu 2019-09-06 4021 mapping_error:
cfb94a372f2d4e Lu Baolu 2019-09-06 4022 if (is_swiotlb_buffer(tlb_addr))
cfb94a372f2d4e Lu Baolu 2019-09-06 4023 swiotlb_tbl_unmap_single(dev, tlb_addr, size,
cfb94a372f2d4e Lu Baolu 2019-09-06 4024 aligned_size, dir, attrs);
cfb94a372f2d4e Lu Baolu 2019-09-06 4025 swiotlb_error:
cfb94a372f2d4e Lu Baolu 2019-09-06 4026 free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
cfb94a372f2d4e Lu Baolu 2019-09-06 4027 dev_err(dev, "Device bounce map: %zx@%llx dir %d --- failed\n",
cfb94a372f2d4e Lu Baolu 2019-09-06 4028 size, (unsigned long long)paddr, dir);
cfb94a372f2d4e Lu Baolu 2019-09-06 4029
cfb94a372f2d4e Lu Baolu 2019-09-06 4030 return DMA_MAPPING_ERROR;
cfb94a372f2d4e Lu Baolu 2019-09-06 4031 }
cfb94a372f2d4e Lu Baolu 2019-09-06 4032

:::::: The code at line 3990 was first introduced by commit
:::::: cfb94a372f2d4ee226247447c863f8709863d170 iommu/vt-d: Use bounce buffer for untrusted devices

:::::: TO: Lu Baolu <[email protected]>
:::::: CC: Joerg Roedel <[email protected]>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-04-29 02:24:21

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
>> * Michael S. Tsirkin<[email protected]> [2020-04-28 12:17:57]:
>>
>>> Okay, but how is all this virtio specific? For example, why not allow
>>> separate swiotlbs for any type of device?
>>> For example, this might make sense if a given device is from a
>>> different, less trusted vendor.
>> Is swiotlb commonly used for multiple devices that may be on different trust
>> boundaries (and not behind a hardware iommu)?
> Even a hardware iommu does not imply a 100% security from malicious
> hardware. First lots of people use iommu=pt for performance reasons.
> Second even without pt, unmaps are often batched, and sub-page buffers
> might be used for DMA, so we are not 100% protected at all times.
>

For untrusted devices, the IOMMU is forced on even when iommu=pt is used; and
iotlb flush is in strict mode (no batched flushes); ATS is also not
allowed. Swiotlb is used to protect sub-page buffers, since the IOMMU can
only apply page-granularity protection. Swiotlb is already used today for devices
from a different trust zone.

Best regards,
baolu

2020-04-29 03:39:03

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Michael S. Tsirkin <[email protected]> [2020-04-28 16:41:04]:

> > Won't we still need some changes to virtio to make use of its own pool (to
> > bounce buffers)? Something similar to its own DMA ops proposed in this patch?
>
> If you are doing this for all devices, you need to either find a way
> to do this without changing DMA ops, or do some automatic change
> to all drivers.

OK, thanks for this input. I will see how we can hide this within the DMA API
itself.

Can you also comment on the virtio transport problem I cited? The hypervisor we
are dealing with does not support MMIO transport. It supports message queue
send/recv and also doorbell, which I think can be used if we can make some
change like this to virtio_mmio.c:

+static inline u32
+virtio_readl(struct virtio_mmio_device *vm_dev, u32 reg_offset)
+{
+ return vm_dev->mmio_ops->readl(vm_dev, reg_offset);
+}
+
+static inline void
+virtio_writel(struct virtio_mmio_device *vm_dev, u32 reg_offset, u32 data)
+{
+ vm_dev->mmio_ops->writel(vm_dev, reg_offset, data);
+}


/* Check magic value */
- magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
+ magic = virtio_readl(vm_dev, VIRTIO_MMIO_MAGIC_VALUE);

mmio_ops->readl on most platforms can default to readl itself, while on a
platform like ours, it can boil down to message_queue send/recv. Would such a
change be acceptable?
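
For illustration, a minimal sketch (as an addition to virtio_mmio.c) of the
'mmio_ops' indirection being suggested; the structure name, the mmio_ops member
of struct virtio_mmio_device and the default callbacks are assumptions, not
existing code:

struct virtio_mmio_ops {
	u32  (*readl)(struct virtio_mmio_device *vm_dev, u32 reg_offset);
	void (*writel)(struct virtio_mmio_device *vm_dev, u32 reg_offset, u32 data);
};

/* Default backend: plain MMIO, as virtio_mmio.c does today */
static u32 vm_default_readl(struct virtio_mmio_device *vm_dev, u32 reg_offset)
{
	return readl(vm_dev->base + reg_offset);
}

static void vm_default_writel(struct virtio_mmio_device *vm_dev, u32 reg_offset,
			      u32 data)
{
	writel(data, vm_dev->base + reg_offset);
}

static const struct virtio_mmio_ops default_mmio_ops = {
	.readl	= vm_default_readl,
	.writel	= vm_default_writel,
};

/*
 * A hypervisor-specific backend would instead marshal the register offset and
 * value through the message-queue send/recv hypercalls mentioned above; those
 * hypercall wrappers are platform specific and not shown here.
 */

vm_dev->mmio_ops would then be selected at probe time based on the platform.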

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 04:11:29

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Stefano Stabellini <[email protected]> [2020-04-28 16:04:34]:

> > > Is swiotlb commonly used for multiple devices that may be on different trust
> > > boundaries (and not behind a hardware iommu)?
>
> The trust boundary is not a good way of describing the scenario and I
> think it leads to miscommunication.
>
> A better way to describe the scenario would be that the device can only
> DMA to/from a small reserved-memory region advertised on device tree.
>
> Do we have other instances of devices that can only DMA to/from very
> specific and non-configurable address ranges? If so, this series could
> follow their example.

AFAICT there is no such notion in the current DMA API.

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
		bool is_ram)
{
	dma_addr_t end = addr + size - 1;

	return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
}

Only the maximum address a device can access is defined, not a range such as we
seem to need here. I think we need to set the bus_dma_limit to 0 for virtio
devices, which will force the use of the swiotlb_map API. We should also have a
per-device swiotlb pool defined, so that swiotlb can use the pool meant for the
given device.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 05:00:43

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > * Michael S. Tsirkin<[email protected]> [2020-04-28 12:17:57]:
> > >
> > > > Okay, but how is all this virtio specific? For example, why not allow
> > > > separate swiotlbs for any type of device?
> > > > For example, this might make sense if a given device is from a
> > > > different, less trusted vendor.
> > > Is swiotlb commonly used for multiple devices that may be on different trust
> > > boundaries (and not behind a hardware iommu)?
> > Even a hardware iommu does not imply a 100% security from malicious
> > hardware. First lots of people use iommu=pt for performance reasons.
> > Second even without pt, unmaps are often batched, and sub-page buffers
> > might be used for DMA, so we are not 100% protected at all times.
> >
>
> For untrusted devices, IOMMU is forced on even iommu=pt is used;

I think you are talking about untrusted *drivers* like with VFIO.

On the other hand, I am talking about things like thunderbolt
peripherals being less trusted than on-board ones.

Or possibly even using swiotlb for specific use-cases where
speed is less of an issue.

E.g. my wifi is pretty slow anyway, and that card is exposed to
malicious actors all the time, put just that behind swiotlb
for security, and leave my graphics card with pt since
I'm trusting it with secrets anyway.


> and
> iotlb flush is in strict mode (no batched flushes); ATS is also not
> allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> only apply page granularity protection. Swiotlb is now used for devices
> from different trust zone.
>
> Best regards,
> baolu

2020-04-29 05:44:11

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
>> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
>>> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
>>>> * Michael S. Tsirkin<[email protected]> [2020-04-28 12:17:57]:
>>>>
>>>>> Okay, but how is all this virtio specific? For example, why not allow
>>>>> separate swiotlbs for any type of device?
>>>>> For example, this might make sense if a given device is from a
>>>>> different, less trusted vendor.
>>>> Is swiotlb commonly used for multiple devices that may be on different trust
>>>> boundaries (and not behind a hardware iommu)?
>>> Even a hardware iommu does not imply a 100% security from malicious
>>> hardware. First lots of people use iommu=pt for performance reasons.
>>> Second even without pt, unmaps are often batched, and sub-page buffers
>>> might be used for DMA, so we are not 100% protected at all times.
>>>
>>
>> For untrusted devices, IOMMU is forced on even iommu=pt is used;
>
> I think you are talking about untrusted *drivers* like with VFIO.

No. I am talking about untrusted devices like thunderbolt peripherals.
We always trust drivers hosted in kernel and the DMA APIs are designed
for them, right?

Please refer to this series.

https://lkml.org/lkml/2019/9/6/39

Best regards,
baolu

>
> On the other hand, I am talking about things like thunderbolt
> peripherals being less trusted than on-board ones.



>
> Or possibly even using swiotlb for specific use-cases where
> speed is less of an issue.
>
> E.g. my wifi is pretty slow anyway, and that card is exposed to
> malicious actors all the time, put just that behind swiotlb
> for security, and leave my graphics card with pt since
> I'm trusting it with secrets anyway.
>
>
>> and
>> iotlb flush is in strict mode (no batched flushes); ATS is also not
>> allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
>> only apply page granularity protection. Swiotlb is now used for devices
>> from different trust zone.
>>
>> Best regards,
>> baolu
>

2020-04-29 06:54:37

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
> > > On 2020/4/29 4:41, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
> > > > > * Michael S. Tsirkin<[email protected]> [2020-04-28 12:17:57]:
> > > > >
> > > > > > Okay, but how is all this virtio specific? For example, why not allow
> > > > > > separate swiotlbs for any type of device?
> > > > > > For example, this might make sense if a given device is from a
> > > > > > different, less trusted vendor.
> > > > > Is swiotlb commonly used for multiple devices that may be on different trust
> > > > > boundaries (and not behind a hardware iommu)?
> > > > Even a hardware iommu does not imply a 100% security from malicious
> > > > hardware. First lots of people use iommu=pt for performance reasons.
> > > > Second even without pt, unmaps are often batched, and sub-page buffers
> > > > might be used for DMA, so we are not 100% protected at all times.
> > > >
> > >
> > > For untrusted devices, IOMMU is forced on even iommu=pt is used;
> >
> > I think you are talking about untrusted *drivers* like with VFIO.
>
> No. I am talking about untrusted devices like thunderbolt peripherals.
> We always trust drivers hosted in kernel and the DMA APIs are designed
> for them, right?
>
> Please refer to this series.
>
> https://lkml.org/lkml/2019/9/6/39
>
> Best regards,
> baolu

Oh, thanks for that! I didn't realize Linux is doing this.

So it seems that with modern Linux, all one needs
to do on x86 is mark the device as untrusted.
It's already possible to do this with ACPI and with OF - would that be
sufficient for achieving what this patchset is trying to do?

Adding more ways to mark a device as untrusted, and adding
support for more platforms to use bounce buffers
sounds like a reasonable thing to do.

> >
> > On the other hand, I am talking about things like thunderbolt
> > peripherals being less trusted than on-board ones.
>
>
>
> >
> > Or possibly even using swiotlb for specific use-cases where
> > speed is less of an issue.
> >
> > E.g. my wifi is pretty slow anyway, and that card is exposed to
> > malicious actors all the time, put just that behind swiotlb
> > for security, and leave my graphics card with pt since
> > I'm trusting it with secrets anyway.
> >
> >
> > > and
> > > iotlb flush is in strict mode (no batched flushes); ATS is also not
> > > allowed. Swiotlb is used to protect sub-page buffers since IOMMU can
> > > only apply page granularity protection. Swiotlb is now used for devices
> > > from different trust zone.
> > >
> > > Best regards,
> > > baolu
> >

2020-04-29 07:06:03

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On 2020/4/29 14:50, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 01:42:13PM +0800, Lu Baolu wrote:
>> On 2020/4/29 12:57, Michael S. Tsirkin wrote:
>>> On Wed, Apr 29, 2020 at 10:22:32AM +0800, Lu Baolu wrote:
>>>> On 2020/4/29 4:41, Michael S. Tsirkin wrote:
>>>>> On Tue, Apr 28, 2020 at 11:19:52PM +0530, Srivatsa Vaddagiri wrote:
>>>>>> * Michael S. Tsirkin<[email protected]> [2020-04-28 12:17:57]:
>>>>>>
>>>>>>> Okay, but how is all this virtio specific? For example, why not allow
>>>>>>> separate swiotlbs for any type of device?
>>>>>>> For example, this might make sense if a given device is from a
>>>>>>> different, less trusted vendor.
>>>>>> Is swiotlb commonly used for multiple devices that may be on different trust
>>>>>> boundaries (and not behind a hardware iommu)?
>>>>> Even a hardware iommu does not imply a 100% security from malicious
>>>>> hardware. First lots of people use iommu=pt for performance reasons.
>>>>> Second even without pt, unmaps are often batched, and sub-page buffers
>>>>> might be used for DMA, so we are not 100% protected at all times.
>>>>>
>>>> For untrusted devices, IOMMU is forced on even iommu=pt is used;
>>> I think you are talking about untrusted*drivers* like with VFIO.
>> No. I am talking about untrusted devices like thunderbolt peripherals.
>> We always trust drivers hosted in kernel and the DMA APIs are designed
>> for them, right?
>>
>> Please refer to this series.
>>
>> https://lkml.org/lkml/2019/9/6/39
>>
>> Best regards,
>> baolu
> Oh, thanks for that! I didn't realize Linux is doing this.
>
> So it seems that with modern Linux, all one needs
> to do on x86 is mark the device as untrusted.
> It's already possible to do this with ACPI and with OF - would that be
> sufficient for achieving what this patchset is trying to do?

Yes.

>
> Adding more ways to mark a device as untrusted, and adding
> support for more platforms to use bounce buffers
> sounds like a reasonable thing to do.
>

Agreed.

Best regards,
baolu

2020-04-29 09:46:22

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Michael S. Tsirkin <[email protected]> [2020-04-29 02:50:41]:

> So it seems that with modern Linux, all one needs
> to do on x86 is mark the device as untrusted.
> It's already possible to do this with ACPI and with OF - would that be
> sufficient for achieving what this patchset is trying to do?

In my case, it's not sufficient to just mark the virtio device untrusted and thus
activate the use of swiotlb. All of the secondary VM memory, including that
allocated by the swiotlb driver, is private to it. An additional piece of memory is
available to the secondary VM which is shared between VMs and which is where I need
the swiotlb driver to do its work.

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 09:53:57

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 03:14:10PM +0530, Srivatsa Vaddagiri wrote:
> * Michael S. Tsirkin <[email protected]> [2020-04-29 02:50:41]:
>
> > So it seems that with modern Linux, all one needs
> > to do on x86 is mark the device as untrusted.
> > It's already possible to do this with ACPI and with OF - would that be
> > sufficient for achieving what this patchset is trying to do?
>
> In my case, it's not sufficient to just mark the virtio device untrusted and thus
> activate the use of swiotlb. All of the secondary VM memory, including that
> allocated by the swiotlb driver, is private to it.

So why not make the bounce buffer memory shared then?

> An additional piece of memory is
> available to secondary VM which is shared between VMs and which is where I need
> swiotlb driver to do its work.
>
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 10:12:01

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Michael S. Tsirkin <[email protected]> [2020-04-29 05:52:05]:

> > > So it seems that with modern Linux, all one needs
> > > to do on x86 is mark the device as untrusted.
> > > It's already possible to do this with ACPI and with OF - would that be
> > > sufficient for achieving what this patchset is trying to do?
> >
> > In my case, it's not sufficient to just mark the virtio device untrusted and thus
> > activate the use of swiotlb. All of the secondary VM memory, including that
> > allocated by the swiotlb driver, is private to it.
>
> So why not make the bounce buffer memory shared then?

It's a limitation of our hypervisor. When a secondary VM is created, two
memory segments are allocated - one private and the other shared. There is no
provision for the secondary VM to make part of its private memory shared after
it boots. I can perhaps consider a change in the swiotlb driver to accept the
second, shared memory segment as its main working area (rather than allocating its own).

That would still not work I think where swiotlb is used for pass-thr devices
(when private memory is fine) as well as virtio devices (when shared memory is
required).

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 10:23:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> That would still not work I think where swiotlb is used for pass-thr devices
> (when private memory is fine) as well as virtio devices (when shared memory is
> required).

So that is a separate question. When there are multiple untrusted
devices, at the moment it looks like a single bounce buffer is used.

Which to me seems like a security problem, I think we should protect
untrusted devices from each other.





> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 10:31:28

by Jan Kiszka

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On 29.04.20 12:20, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
>> That would still not work I think where swiotlb is used for pass-thr devices
>> (when private memory is fine) as well as virtio devices (when shared memory is
>> required).
>
> So that is a separate question. When there are multiple untrusted
> devices, at the moment it looks like a single bounce buffer is used.
>
> Which to me seems like a security problem, I think we should protect
> untrusted devices from each other.
>

Definitely. That's the model we have for ivshmem-virtio as well.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

2020-04-29 10:36:51

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

* Michael S. Tsirkin <[email protected]> [2020-04-29 06:20:48]:

> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > That would still not work I think where swiotlb is used for pass-thr devices
> > (when private memory is fine) as well as virtio devices (when shared memory is
> > required).
>
> So that is a separate question. When there are multiple untrusted
> devices, at the moment it looks like a single bounce buffer is used.
>
> Which to me seems like a security problem, I think we should protect
> untrusted devices from each other.

As a first step, let me see if we can make the swiotlb driver accept a target
memory segment as its working area; that may suffice for our needs. A
subsequent step could be to make the swiotlb driver recognize multiple pools.
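
A minimal sketch of that first step, assuming the existing
swiotlb_late_init_with_tbl() hook can simply be pointed at the shared segment
(base/size below are placeholders; note that this path still assumes the buffer
is covered by the kernel linear mapping, i.e. virt_to_phys() works on it, which
is exactly the restriction the non-linear-mapping patch in this series relaxes):

#include <linux/swiotlb.h>
#include <linux/io.h>

/* base/size describe the shared segment handed to this VM (placeholders) */
static int use_shared_segment_for_swiotlb(phys_addr_t base, size_t size)
{
	void *vaddr = memremap(base, size, MEMREMAP_WB);

	if (!vaddr)
		return -ENOMEM;

	/* nslabs = number of IO_TLB_SHIFT-sized (2 KB) bounce slots */
	return swiotlb_late_init_with_tbl((char *)vaddr, size >> IO_TLB_SHIFT);
}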


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

2020-04-29 10:47:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
> On 29.04.20 12:20, Michael S. Tsirkin wrote:
> > On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > > That would still not work I think where swiotlb is used for pass-thr devices
> > > (when private memory is fine) as well as virtio devices (when shared memory is
> > > required).
> >
> > So that is a separate question. When there are multiple untrusted
> > devices, at the moment it looks like a single bounce buffer is used.
> >
> > Which to me seems like a security problem, I think we should protect
> > untrusted devices from each other.
> >
>
> Definitely. That's the model we have for ivshmem-virtio as well.
>
> Jan

Want to try implementing that?

> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux

2020-04-29 11:00:44

by Jan Kiszka

[permalink] [raw]
Subject: Re: [virtio-dev] Re: [PATCH 5/5] virtio: Add bounce DMA ops

On 29.04.20 12:45, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 12:26:43PM +0200, Jan Kiszka wrote:
>> On 29.04.20 12:20, Michael S. Tsirkin wrote:
>>> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
>>>> That would still not work I think where swiotlb is used for pass-thr devices
>>>> (when private memory is fine) as well as virtio devices (when shared memory is
>>>> required).
>>>
>>> So that is a separate question. When there are multiple untrusted
>>> devices, at the moment it looks like a single bounce buffer is used.
>>>
>>> Which to me seems like a security problem, I think we should protect
>>> untrusted devices from each other.
>>>
>>
>> Definitely. That's the model we have for ivshmem-virtio as well.
>>
>> Jan
>
> Want to try implementing that?
>

The desire is definitely there, currently "just" not the time.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

2020-04-29 21:10:38

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

Hi Srivatsa,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on xen-tip/linux-next linus/master v5.7-rc3 next-20200428]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-191-gc51a0382-dirty
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)

>> drivers/virtio/virtio_bounce.c:22:21: sparse: sparse: symbol 'virtio_pool' was not declared. Should it be static?
>> drivers/virtio/virtio_bounce.c:79:8: sparse: sparse: symbol 'virtio_max_mapping_size' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

2020-04-29 21:12:48

by kernel test robot

[permalink] [raw]
Subject: [RFC PATCH] virtio: virtio_pool can be static


Signed-off-by: kbuild test robot <[email protected]>
---
virtio_bounce.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_bounce.c b/drivers/virtio/virtio_bounce.c
index 3de8e0eb71e48..5a68d48667c42 100644
--- a/drivers/virtio/virtio_bounce.c
+++ b/drivers/virtio/virtio_bounce.c
@@ -19,7 +19,7 @@
static phys_addr_t bounce_buf_paddr;
static void *bounce_buf_vaddr;
static size_t bounce_buf_size;
-struct swiotlb_pool *virtio_pool;
+static struct swiotlb_pool *virtio_pool;

#define VIRTIO_MAX_BOUNCE_SIZE (16*4096)

@@ -76,7 +76,7 @@ static void virtio_unmap_page(struct device *dev, dma_addr_t dev_addr,
size, dir, attrs);
}

-size_t virtio_max_mapping_size(struct device *dev)
+static size_t virtio_max_mapping_size(struct device *dev)
{
return VIRTIO_MAX_BOUNCE_SIZE;
}

2020-04-30 04:20:49

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 3/5] swiotlb: Add alloc and free APIs

Hi Srivatsa,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on xen-tip/linux-next linus/master v5.7-rc3 next-20200429]
[cannot apply to swiotlb/linux-next]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Srivatsa-Vaddagiri/virtio-on-Type-1-hypervisor/20200429-032334
base: https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-b002-20200429 (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from drivers/gpu/drm/i915/i915_scatterlist.h:12:0,
from drivers/gpu/drm/i915/i915_scatterlist.c:7:
include/linux/swiotlb.h: In function 'swiotlb_alloc':
>> include/linux/swiotlb.h:231:9: error: 'DMA_MAPPING_ERROR' undeclared (first use in this function); did you mean 'APM_NO_ERROR'?
return DMA_MAPPING_ERROR;
^~~~~~~~~~~~~~~~~
APM_NO_ERROR
include/linux/swiotlb.h:231:9: note: each undeclared identifier is reported only once for each function it appears in

vim +231 include/linux/swiotlb.h

226
227 static inline phys_addr_t swiotlb_alloc(struct swiotlb_pool *pool,
228 size_t alloc_size, unsigned long tbl_dma_addr,
229 unsigned long mask)
230 {
> 231 return DMA_MAPPING_ERROR;
232 }
233

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-04-30 15:26:29

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 5/5] virtio: Add bounce DMA ops

On Wed, Apr 29, 2020 at 06:20:48AM -0400, Michael S. Tsirkin wrote:
> On Wed, Apr 29, 2020 at 03:39:53PM +0530, Srivatsa Vaddagiri wrote:
> > That would still not work I think where swiotlb is used for pass-thr devices
> > (when private memory is fine) as well as virtio devices (when shared memory is
> > required).
>
> So that is a separate question. When there are multiple untrusted
> devices, at the moment it looks like a single bounce buffer is used.
>
> Which to me seems like a security problem, I think we should protect
> untrusted devices from each other.

There are two DMA pool implementations in Linux already - the TTM one for graphics
and mm/dmapool.c - could those be used instead? Or at least augmented?
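
For reference, a minimal usage sketch of the mm/dmapool.c API referred to above
(the device, block size and alignment are illustrative); whether either pool
could be made to draw from a fixed shared-memory segment, as needed here, is
the open question:

#include <linux/device.h>
#include <linux/dmapool.h>

static int example_dmapool(struct device *dev)
{
	/* Pool of 256-byte, 32-byte-aligned DMA-coherent blocks for 'dev' */
	struct dma_pool *pool = dma_pool_create("example-pool", dev, 256, 32, 0);
	dma_addr_t dma;
	void *buf;

	if (!pool)
		return -ENOMEM;

	buf = dma_pool_alloc(pool, GFP_KERNEL, &dma);
	if (!buf) {
		dma_pool_destroy(pool);
		return -ENOMEM;
	}

	/* ... hand 'dma' to the device, touch 'buf' from the CPU ... */

	dma_pool_free(pool, buf, dma);
	dma_pool_destroy(pool);
	return 0;
}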