Hi all,
this patch series enables xen-swiotlb on arm and arm64.
Since all guests on Xen on ARM, including dom0, run with second stage
translation enabled, it follows that without an IOMMU no guest could
actually drive the hardware.
The solution for platforms without an IOMMU is to use swiotlb-xen,
adapted to autotranslate guests. swiotlb-xen provides a set of dma_ops
that can be used by Linux to set up a contiguous buffer in stage-2
addresses and use it for dma operations.
Basically Linux asks Xen to make a buffer contiguous and gets the
machine address for it. This buffer is going to be used by lib/swiotlb.c
to allocate bounce buffers.
The first 5 patches lay the groundwork on arm and arm64 to have
alternative dma_ops and swiotlb.
The sixth patch moves Xen initialization earlier so that we already know
whether we are running on Xen at the time of initializing dma_ops on the
platform.
The following patches adapt swiotlb-xen to autotranslate guests (guests
with second stage translation in hardware) and provide an arm
implementation of xen_create_contiguous_region.
Feedback is very welcome.
Cheers,
Stefano
Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange;
- code style fixes;
- add err_out label in xen_dma_add_entry;
- remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
- add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.
Changes in v3:
- add a patch to compile SWIOTLB without CONFIG_NEED_SG_DMA_LENGTH;
- add a patch to compile SWIOTLB_XEN without CONFIG_NEED_SG_DMA_LENGTH;
- arm/dma_capable: do not treat dma_mask as a limit;
- arm/dmabounce: keep using arm_dma_ops;
- add missing __init in xen_early_init declaration;
- many code style and name changes in swiotlb-xen.c;
- improve error checks in xen_dma_add_entry;
- warn on XENMEM_put_dma_buf failures.
Changes in v2:
- fixed a couple of errors in xen_bus_to_phys, xen_phys_to_bus and
xen_swiotlb_fixup.
EUNBONG SONG (1):
swiotlb: replace dma_length with sg_dma_len() macro
Stefano Stabellini (9):
swiotlb-xen: replace dma_length with sg_dma_len() macro
arm: make SWIOTLB available
arm: introduce a global dma_ops pointer
arm64: do not initialize arm64_swiotlb if dma_ops is already set
xen/arm,arm64: move Xen initialization earlier
xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
xen: make xen_create_contiguous_region return the dma address
swiotlb-xen: support autotranslate guests
xen/arm,arm64: enable SWIOTLB_XEN
arch/arm/Kconfig | 7 +
arch/arm/include/asm/dma-mapping.h | 33 ++++++-
arch/arm/include/asm/xen/hypervisor.h | 8 ++
arch/arm/include/asm/xen/page.h | 2 +
arch/arm/kernel/setup.c | 2 +
arch/arm/mm/dma-mapping.c | 3 +
arch/arm/xen/Makefile | 2 +-
arch/arm/xen/enlighten.c | 24 +++-
arch/arm/xen/mm.c | 117 +++++++++++++++++++
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/setup.c | 2 +
arch/arm64/mm/dma-mapping.c | 2 +
arch/arm64/xen/Makefile | 2 +-
arch/x86/xen/mmu.c | 4 +-
drivers/xen/Kconfig | 1 -
drivers/xen/swiotlb-xen.c | 198 +++++++++++++++++++++++++++++----
include/xen/interface/memory.h | 37 ++++++
include/xen/xen-ops.h | 3 +-
lib/swiotlb.c | 8 +-
19 files changed, 418 insertions(+), 38 deletions(-)
create mode 100644 arch/arm/xen/mm.c
git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git swiotlb-xen-4
swiotlb-xen has an implicit dependency on CONFIG_NEED_SG_DMA_LENGTH.
Remove it by replacing dma_length with sg_dma_len.
Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index aadffcf..1b2277c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -506,13 +506,13 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
to do proper error handling. */
xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
attrs);
- sgl[0].dma_length = 0;
+ sg_dma_len(sgl) = 0;
return DMA_ERROR_CODE;
}
sg->dma_address = xen_phys_to_bus(map);
} else
sg->dma_address = dev_addr;
- sg->dma_length = sg->length;
+ sg_dma_len(sg) = sg->length;
}
return nelems;
}
@@ -533,7 +533,7 @@ xen_swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
BUG_ON(dir == DMA_NONE);
for_each_sg(sgl, sg, nelems, i)
- xen_unmap_single(hwdev, sg->dma_address, sg->dma_length, dir);
+ xen_unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir);
}
EXPORT_SYMBOL_GPL(xen_swiotlb_unmap_sg_attrs);
@@ -555,7 +555,7 @@ xen_swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
for_each_sg(sgl, sg, nelems, i)
xen_swiotlb_sync_single(hwdev, sg->dma_address,
- sg->dma_length, dir, target);
+ sg_dma_len(sg), dir, target);
}
void
--
1.7.2.5
XENMEM_exchange can't be used by autotranslate guests because of two
severe limitations:
- it does not copy back the mfns into the out field for autotranslate
guests;
- it does not guarantee that the hypervisor won't change the p2m
mappings for the exchanged pages while the guest is using them. Xen
never promises to keep the p2m mapping stable for autotranslate guests
in general. In practice it won't happen unless one uses uncommon
features like memory sharing or paging.
To overcome these problems I am introducing two new hypercalls.
Signed-off-by: Stefano Stabellini <[email protected]>
Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.
---
include/xen/interface/memory.h | 37 +++++++++++++++++++++++++++++++++++++
1 files changed, 37 insertions(+), 0 deletions(-)
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 2ecfe4f..5708ff8 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -66,6 +66,7 @@ struct xen_memory_exchange {
/*
* [IN] Details of memory extents to be exchanged (GMFN bases).
* Note that @in.address_bits is ignored and unused.
+ * @out.address_bits should contain the address mask for the new pages.
*/
struct xen_memory_reservation in;
@@ -263,4 +264,40 @@ struct xen_remove_from_physmap {
};
DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
+#define XENMEM_exchange_and_pin 26
+/*
+ * This hypercall is similar to XENMEM_exchange: it takes the same
+ * struct as an argument and it exchanges the pages passed in with a new
+ * set of pages. The new pages are going to be "pinned": it's guaranteed
+ * that their p2m mapping won't be changed until explicitly "unpinned".
+ * The content of the exchanged pages is lost.
+ * Only normal guest r/w memory can be pinned: no granted pages or
+ * ballooned pages.
+ * If return code is zero then @out.extent_list provides the DMA frame
+ * numbers of the newly-allocated memory.
+ * Returns zero on complete success, otherwise a negative error code:
+ * -ENOSYS if not implemented
+ * -EINVAL if the page is already pinned
+ * -EFAULT if the physical to machine walk fails
+ * On complete success, @nr_exchanged == @in.nr_extents. On
+ * partial success @nr_exchanged indicates how much work was done and a
+ * negative error code is returned.
+ */
+
+#define XENMEM_unpin 27
+/*
+ * XENMEM_unpin unpins a set of pages, previously pinned by
+ * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
+ * be transparently changed by the hypervisor, as usual. The pages are
+ * still accessible from the guest.
+ */
+struct xen_unpin {
+ /*
+ * [IN] Details of memory extents to be unpinned (GMFN bases).
+ * Note that @in.address_bits is ignored and unused.
+ */
+ struct xen_memory_reservation in;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);
+
#endif /* __XEN_PUBLIC_MEMORY_H__ */
--
1.7.2.5
From: EUNBONG SONG <[email protected]>
This patch replaces dma_length in lib/swiotlb.c with the sg_dma_len()
macro, because a build error can occur if CONFIG_NEED_SG_DMA_LENGTH is
not set while CONFIG_SWIOTLB is set.
Signed-off-by: EunBong Song <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
---
lib/swiotlb.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index d23762e..4e8686c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -870,13 +870,13 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
swiotlb_full(hwdev, sg->length, dir, 0);
swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
attrs);
- sgl[0].dma_length = 0;
+ sg_dma_len(sgl) = 0;
return 0;
}
sg->dma_address = phys_to_dma(hwdev, map);
} else
sg->dma_address = dev_addr;
- sg->dma_length = sg->length;
+ sg_dma_len(sg) = sg->length;
}
return nelems;
}
@@ -904,7 +904,7 @@ swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
BUG_ON(dir == DMA_NONE);
for_each_sg(sgl, sg, nelems, i)
- unmap_single(hwdev, sg->dma_address, sg->dma_length, dir);
+ unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir);
}
EXPORT_SYMBOL(swiotlb_unmap_sg_attrs);
@@ -934,7 +934,7 @@ swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
for_each_sg(sgl, sg, nelems, i)
swiotlb_sync_single(hwdev, sg->dma_address,
- sg->dma_length, dir, target);
+ sg_dma_len(sg), dir, target);
}
void
--
1.7.2.5
Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/arm64/mm/dma-mapping.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bd7579..a006e84 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -63,6 +63,8 @@ static struct dma_map_ops arm64_swiotlb_dma_ops = {
void __init arm64_swiotlb_init(void)
{
+ if (dma_ops != NULL)
+ return;
dma_ops = &arm64_swiotlb_dma_ops;
swiotlb_init(1);
}
--
1.7.2.5
Modify xen_create_contiguous_region to return the dma address of the
newly created contiguous buffer.
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Changes in v4:
- use virt_to_machine instead of virt_to_bus.
---
arch/x86/xen/mmu.c | 4 +++-
drivers/xen/swiotlb-xen.c | 6 +++---
include/xen/xen-ops.h | 3 ++-
3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..6c34d7c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2329,7 +2329,8 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
}
int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
- unsigned int address_bits)
+ unsigned int address_bits,
+ dma_addr_t *dma_handle)
{
unsigned long *in_frames = discontig_frames, out_frame;
unsigned long flags;
@@ -2368,6 +2369,7 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
spin_unlock_irqrestore(&xen_reservation_lock, flags);
+ *dma_handle = virt_to_machine(vstart).maddr;
return success ? 0 : -ENOMEM;
}
EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 1b2277c..b72f31c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -126,6 +126,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
{
int i, rc;
int dma_bits;
+ dma_addr_t dma_handle;
dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;
@@ -137,7 +138,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
rc = xen_create_contiguous_region(
(unsigned long)buf + (i << IO_TLB_SHIFT),
get_order(slabs << IO_TLB_SHIFT),
- dma_bits);
+ dma_bits, &dma_handle);
} while (rc && dma_bits++ < max_dma_bits);
if (rc)
return rc;
@@ -294,11 +295,10 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
*dma_handle = dev_addr;
else {
if (xen_create_contiguous_region(vstart, order,
- fls64(dma_mask)) != 0) {
+ fls64(dma_mask), dma_handle) != 0) {
free_pages(vstart, order);
return NULL;
}
- *dma_handle = virt_to_machine(ret).maddr;
}
memset(ret, 0, size);
return ret;
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index d6fe062..9ef704d 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -20,7 +20,8 @@ int xen_setup_shutdown_event(void);
extern unsigned long *xen_contiguous_bitmap;
int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
- unsigned int address_bits);
+ unsigned int address_bits,
+ dma_addr_t *dma_handle);
void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order);
--
1.7.2.5
Xen on arm and arm64 needs SWIOTLB_XEN: when running on Xen we need to
program the hardware with mfns rather than pfns for dma addresses.
Remove SWIOTLB_XEN dependency on X86 and PCI and make XEN select
SWIOTLB_XEN on arm and arm64.
At the moment we always rely on swiotlb-xen, but once Xen starts
supporting hardware IOMMUs we will be able to avoid it, conditional on
the presence of an IOMMU on the platform.
Implement xen_create_contiguous_region on arm and arm64 by using
XENMEM_exchange_and_pin.
Initialize xen-swiotlb from xen_early_init (before the native dma_ops
are initialized) and set dma_ops to &xen_swiotlb_dma_ops if we are
running on Xen.
Signed-off-by: Stefano Stabellini <[email protected]>
Changes in v4:
- remove redefinition of DMA_ERROR_CODE;
- update the code to use XENMEM_exchange_and_pin and XENMEM_unpin;
- add a note about hardware IOMMU in the commit message.
Changes in v3:
- code style changes;
- warn on XENMEM_put_dma_buf failures.
---
arch/arm/Kconfig | 1 +
arch/arm/include/asm/xen/page.h | 2 +
arch/arm/xen/Makefile | 2 +-
arch/arm/xen/enlighten.c | 3 +
arch/arm/xen/mm.c | 117 +++++++++++++++++++++++++++++++++++++++
arch/arm64/Kconfig | 1 +
arch/arm64/xen/Makefile | 2 +-
drivers/xen/Kconfig | 1 -
drivers/xen/swiotlb-xen.c | 16 +++++
9 files changed, 142 insertions(+), 3 deletions(-)
create mode 100644 arch/arm/xen/mm.c
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c0bfb33..2c9d112 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1848,6 +1848,7 @@ config XEN
depends on CPU_V7 && !CPU_V6
depends on !GENERIC_ATOMIC64
select ARM_PSCI
+ select SWIOTLB_XEN
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 359a7b5..b0f7150 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -6,12 +6,14 @@
#include <linux/pfn.h>
#include <linux/types.h>
+#include <linux/dma-mapping.h>
#include <xen/interface/grant_table.h>
#define pfn_to_mfn(pfn) (pfn)
#define phys_to_machine_mapping_valid(pfn) (1)
#define mfn_to_pfn(mfn) (mfn)
+#define mfn_to_local_pfn(m) (mfn_to_pfn(m))
#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))
#define pte_mfn pte_pfn
diff --git a/arch/arm/xen/Makefile b/arch/arm/xen/Makefile
index 4384103..66fc35d 100644
--- a/arch/arm/xen/Makefile
+++ b/arch/arm/xen/Makefile
@@ -1 +1 @@
-obj-y := enlighten.o hypercall.o grant-table.o
+obj-y := enlighten.o hypercall.o grant-table.o mm.o
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 14d17ab..06a6953 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -195,6 +195,7 @@ static void xen_power_off(void)
* documentation of the Xen Device Tree format.
*/
#define GRANT_TABLE_PHYSADDR 0
+extern int xen_mm_init(void);
void __init xen_early_init(void)
{
struct resource res;
@@ -230,6 +231,8 @@ void __init xen_early_init(void)
xen_start_info->flags |= SIF_INITDOMAIN|SIF_PRIVILEGED;
else
xen_start_info->flags &= ~(SIF_INITDOMAIN|SIF_PRIVILEGED);
+
+ xen_mm_init();
}
static int __init xen_guest_init(void)
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
new file mode 100644
index 0000000..916deb2
--- /dev/null
+++ b/arch/arm/xen/mm.c
@@ -0,0 +1,117 @@
+#include <linux/bootmem.h>
+#include <linux/gfp.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+#include <linux/vmalloc.h>
+#include <linux/swiotlb.h>
+
+#include <xen/xen.h>
+#include <xen/interface/memory.h>
+#include <xen/swiotlb-xen.h>
+
+#include <asm/cacheflush.h>
+#include <asm/xen/page.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/interface.h>
+
+static int xen_exchange_memory(xen_ulong_t extents_in,
+ unsigned int order_in,
+ xen_pfn_t *pfns_in,
+ xen_ulong_t extents_out,
+ unsigned int order_out,
+ xen_pfn_t *mfns_out,
+ unsigned int address_bits)
+{
+ long rc;
+ int success;
+
+ struct xen_memory_exchange exchange = {
+ .in = {
+ .nr_extents = extents_in,
+ .extent_order = order_in,
+ .domid = DOMID_SELF
+ },
+ .out = {
+ .nr_extents = extents_out,
+ .extent_order = order_out,
+ .address_bits = address_bits,
+ .domid = DOMID_SELF
+ }
+ };
+ set_xen_guest_handle(exchange.in.extent_start, pfns_in);
+ set_xen_guest_handle(exchange.out.extent_start, mfns_out);
+
+ BUG_ON(extents_in << order_in != extents_out << order_out);
+
+
+ rc = HYPERVISOR_memory_op(XENMEM_exchange_and_pin, &exchange);
+ success = (exchange.nr_exchanged == extents_in);
+
+ BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
+ BUG_ON(success && (rc != 0));
+
+ return success;
+}
+
+int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
+ unsigned int address_bits,
+ dma_addr_t *dma_handle)
+{
+ phys_addr_t pstart = __pa(vstart);
+ xen_pfn_t in_frame, out_frame;
+ int success;
+
+ /* Get a new contiguous memory extent. */
+ in_frame = out_frame = pstart >> PAGE_SHIFT;
+ success = xen_exchange_memory(1, order, &in_frame,
+ 1, order, &out_frame,
+ address_bits);
+
+ if (!success)
+ return -ENOMEM;
+
+ *dma_handle = out_frame << PAGE_SHIFT;
+
+ return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
+
+void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
+{
+ xen_pfn_t in_frame = __pa(vstart) >> PAGE_SHIFT;
+ struct xen_unpin unpin = {
+ .in = {
+ .nr_extents = 1,
+ .extent_order = order,
+ .domid = DOMID_SELF
+ },
+ };
+ set_xen_guest_handle(unpin.in.extent_start, &in_frame);
+
+ WARN_ON(HYPERVISOR_memory_op(XENMEM_unpin, &unpin));
+}
+EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
+
+static struct dma_map_ops xen_swiotlb_dma_ops = {
+ .mapping_error = xen_swiotlb_dma_mapping_error,
+ .alloc = xen_swiotlb_alloc_coherent,
+ .free = xen_swiotlb_free_coherent,
+ .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu,
+ .sync_single_for_device = xen_swiotlb_sync_single_for_device,
+ .sync_sg_for_cpu = xen_swiotlb_sync_sg_for_cpu,
+ .sync_sg_for_device = xen_swiotlb_sync_sg_for_device,
+ .map_sg = xen_swiotlb_map_sg_attrs,
+ .unmap_sg = xen_swiotlb_unmap_sg_attrs,
+ .map_page = xen_swiotlb_map_page,
+ .unmap_page = xen_swiotlb_unmap_page,
+ .dma_supported = xen_swiotlb_dma_supported,
+};
+
+int __init xen_mm_init(void)
+{
+ xen_swiotlb_init(1, true);
+ dma_ops = &xen_swiotlb_dma_ops;
+ return 0;
+}
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9737e97..aa1f6fb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,6 +209,7 @@ config XEN_DOM0
config XEN
bool "Xen guest support on ARM64 (EXPERIMENTAL)"
depends on ARM64 && OF
+ select SWIOTLB_XEN
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
diff --git a/arch/arm64/xen/Makefile b/arch/arm64/xen/Makefile
index be24040..0ef9637 100644
--- a/arch/arm64/xen/Makefile
+++ b/arch/arm64/xen/Makefile
@@ -1,2 +1,2 @@
-xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o)
+xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o mm.o)
obj-y := xen-arm.o hypercall.o
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 9e02d60..7e83688 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -140,7 +140,6 @@ config XEN_GRANT_DEV_ALLOC
config SWIOTLB_XEN
def_bool y
- depends on PCI && X86
select SWIOTLB
config XEN_TMEM
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 8a403a0..7b25cf8 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -46,6 +46,8 @@
#include <xen/xen-ops.h>
#include <xen/hvc-console.h>
#include <xen/features.h>
+#include <asm/dma-mapping.h>
+
/*
* Used to do a quick range check in swiotlb_tbl_unmap_single and
* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
@@ -59,6 +61,20 @@ static unsigned long xen_io_tlb_nslabs;
* Quick lookup value of the bus address of the IOTLB.
*/
+#ifndef CONFIG_X86
+static unsigned long dma_alloc_coherent_mask(struct device *dev,
+ gfp_t gfp)
+{
+ unsigned long dma_mask = 0;
+
+ dma_mask = dev->coherent_dma_mask;
+ if (!dma_mask)
+ dma_mask = (gfp & GFP_DMA) ? DMA_BIT_MASK(24) : DMA_BIT_MASK(32);
+
+ return dma_mask;
+}
+#endif
+
struct xen_dma_info {
dma_addr_t dma_addr;
phys_addr_t phys_addr;
--
1.7.2.5
Signed-off-by: Stefano Stabellini <[email protected]>
Changes in v3:
- add missing __init in xen_early_init declaration.
---
arch/arm/include/asm/xen/hypervisor.h | 8 ++++++++
arch/arm/kernel/setup.c | 2 ++
arch/arm/xen/enlighten.c | 21 ++++++++++++++-------
arch/arm64/kernel/setup.c | 2 ++
4 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/xen/hypervisor.h b/arch/arm/include/asm/xen/hypervisor.h
index d7ab99a..2782591 100644
--- a/arch/arm/include/asm/xen/hypervisor.h
+++ b/arch/arm/include/asm/xen/hypervisor.h
@@ -1,6 +1,8 @@
#ifndef _ASM_ARM_XEN_HYPERVISOR_H
#define _ASM_ARM_XEN_HYPERVISOR_H
+#include <linux/init.h>
+
extern struct shared_info *HYPERVISOR_shared_info;
extern struct start_info *xen_start_info;
@@ -16,4 +18,10 @@ static inline enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
return PARAVIRT_LAZY_NONE;
}
+#ifdef CONFIG_XEN
+void __init xen_early_init(void);
+#else
+static inline void xen_early_init(void) { return; }
+#endif
+
#endif /* _ASM_ARM_XEN_HYPERVISOR_H */
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 63af9a7..cb7b8e2 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -45,6 +45,7 @@
#include <asm/cacheflush.h>
#include <asm/cachetype.h>
#include <asm/tlbflush.h>
+#include <asm/xen/hypervisor.h>
#include <asm/prom.h>
#include <asm/mach/arch.h>
@@ -889,6 +890,7 @@ void __init setup_arch(char **cmdline_p)
arm_dt_init_cpu_maps();
psci_init();
+ xen_early_init();
#ifdef CONFIG_SMP
if (is_smp()) {
if (!mdesc->smp_init || !mdesc->smp_init()) {
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index c9770ba..14d17ab 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -195,21 +195,19 @@ static void xen_power_off(void)
* documentation of the Xen Device Tree format.
*/
#define GRANT_TABLE_PHYSADDR 0
-static int __init xen_guest_init(void)
+void __init xen_early_init(void)
{
- struct xen_add_to_physmap xatp;
- static struct shared_info *shared_info_page = 0;
+ struct resource res;
struct device_node *node;
int len;
const char *s = NULL;
const char *version = NULL;
const char *xen_prefix = "xen,xen-";
- struct resource res;
node = of_find_compatible_node(NULL, NULL, "xen,xen");
if (!node) {
pr_debug("No Xen support\n");
- return 0;
+ return;
}
s = of_get_property(node, "compatible", &len);
if (strlen(xen_prefix) + 3 < len &&
@@ -217,10 +215,10 @@ static int __init xen_guest_init(void)
version = s + strlen(xen_prefix);
if (version == NULL) {
pr_debug("Xen version not found\n");
- return 0;
+ return;
}
if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res))
- return 0;
+ return;
xen_hvm_resume_frames = res.start >> PAGE_SHIFT;
xen_events_irq = irq_of_parse_and_map(node, 0);
pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n",
@@ -232,6 +230,15 @@ static int __init xen_guest_init(void)
xen_start_info->flags |= SIF_INITDOMAIN|SIF_PRIVILEGED;
else
xen_start_info->flags &= ~(SIF_INITDOMAIN|SIF_PRIVILEGED);
+}
+
+static int __init xen_guest_init(void)
+{
+ struct xen_add_to_physmap xatp;
+ static struct shared_info *shared_info_page = 0;
+
+ if (!xen_domain())
+ return 0;
if (!shared_info_page)
shared_info_page = (struct shared_info *)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index add6ea6..e0d438a 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -53,6 +53,7 @@
#include <asm/traps.h>
#include <asm/memblock.h>
#include <asm/psci.h>
+#include <asm/xen/hypervisor.h>
unsigned int processor_id;
EXPORT_SYMBOL(processor_id);
@@ -267,6 +268,7 @@ void __init setup_arch(char **cmdline_p)
unflatten_device_tree();
psci_init();
+ xen_early_init();
cpu_logical_map(0) = read_cpuid_mpidr() & MPIDR_HWID_BITMASK;
#ifdef CONFIG_SMP
--
1.7.2.5
IOMMU_HELPER is needed because SWIOTLB calls iommu_is_span_boundary,
provided by lib/iommu-helper.c.
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
CC: [email protected]
CC: [email protected]
Changes in v3:
- dma_capable: do not treat dma_mask as a limit;
- remove SWIOTLB dependency on NEED_SG_DMA_LENGTH.
---
arch/arm/Kconfig | 6 ++++++
arch/arm/include/asm/dma-mapping.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ba412e0..c0bfb33 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1832,6 +1832,12 @@ config CC_STACKPROTECTOR
neutralized via a kernel panic.
This feature requires gcc version 4.2 or above.
+config SWIOTLB
+ def_bool y
+
+config IOMMU_HELPER
+ def_bool SWIOTLB
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 5b579b9..5b8eef9 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -86,6 +86,36 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
}
#endif
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+ unsigned int offset = paddr & ~PAGE_MASK;
+ return pfn_to_dma(dev, paddr >> PAGE_SHIFT) + offset;
+}
+
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+ unsigned int offset = dev_addr & ~PAGE_MASK;
+ return (dma_to_pfn(dev, dev_addr) << PAGE_SHIFT) + offset;
+}
+
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
+{
+ u64 limit, mask = *dev->dma_mask;
+
+ limit = (mask + 1) & ~mask;
+ if (limit && size > limit)
+ return 0;
+
+ if ((addr | (addr + size - 1)) & ~mask)
+ return 0;
+
+ return 1;
+}
+
+static inline void dma_mark_clean(void *addr, size_t size)
+{
+}
+
/*
* DMA errors are defined by all-bits-set in the DMA address.
*/
--
1.7.2.5
Initially set dma_ops to arm_dma_ops.
Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
CC: [email protected]
CC: [email protected]
Changes in v3:
- keep using arm_dma_ops in dmabounce.
---
arch/arm/include/asm/dma-mapping.h | 3 ++-
arch/arm/mm/dma-mapping.c | 3 +++
2 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 5b8eef9..5c130bc 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
#include <asm/memory.h>
#define DMA_ERROR_CODE (~0)
+extern struct dma_map_ops *dma_ops;
extern struct dma_map_ops arm_dma_ops;
extern struct dma_map_ops arm_coherent_dma_ops;
@@ -19,7 +20,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
{
if (dev && dev->archdata.dma_ops)
return dev->archdata.dma_ops;
- return &arm_dma_ops;
+ return dma_ops;
}
static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7f9b179..870b12c 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -141,6 +141,9 @@ struct dma_map_ops arm_dma_ops = {
};
EXPORT_SYMBOL(arm_dma_ops);
+struct dma_map_ops *dma_ops = &arm_dma_ops;
+EXPORT_SYMBOL(dma_ops);
+
static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs);
static void arm_coherent_dma_free(struct device *dev, size_t size, void *cpu_addr,
--
1.7.2.5
Support autotranslate guests in swiotlb-xen by keeping track of the
phys-to-bus and bus-to-phys mappings of the swiotlb buffer
(xen_io_tlb_start-xen_io_tlb_end).
Use a simple direct access on a pre-allocated array for phys-to-bus
queries. Use a red-black tree for bus-to-phys queries.
Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Changes in v4:
- add err_out label in xen_dma_add_entry;
- remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
- code style fixes;
- add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.
Changes in v3:
- many code style and name changes;
- improve error checks in xen_dma_add_entry.
---
drivers/xen/swiotlb-xen.c | 172 ++++++++++++++++++++++++++++++++++++++++----
1 files changed, 156 insertions(+), 16 deletions(-)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b72f31c..8a403a0 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -38,32 +38,148 @@
#include <linux/bootmem.h>
#include <linux/dma-mapping.h>
#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/spinlock_types.h>
+#include <linux/rbtree.h>
#include <xen/swiotlb-xen.h>
#include <xen/page.h>
#include <xen/xen-ops.h>
#include <xen/hvc-console.h>
+#include <xen/features.h>
/*
* Used to do a quick range check in swiotlb_tbl_unmap_single and
* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
* API.
*/
+#define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
static char *xen_io_tlb_start, *xen_io_tlb_end;
static unsigned long xen_io_tlb_nslabs;
/*
* Quick lookup value of the bus address of the IOTLB.
*/
-static u64 start_dma_addr;
+struct xen_dma_info {
+ dma_addr_t dma_addr;
+ phys_addr_t phys_addr;
+ size_t size;
+ struct rb_node rbnode;
+};
+
+/*
+ * This array of struct xen_dma_info is indexed by physical addresses,
+ * starting from virt_to_phys(xen_io_tlb_start). Each entry maps
+ * (IO_TLB_SEGSIZE << IO_TLB_SHIFT) bytes, except the last one that is
+ * smaller. Getting the dma address corresponding to a given physical
+ * address can be done by direct access with the right index on the
+ * array.
+ */
+static struct xen_dma_info *xen_dma_seg;
+/*
+ * This tree keeps track of bus address to physical address
+ * mappings.
+ */
+static struct rb_root bus_to_phys = RB_ROOT;
+/* This lock protects operations on the bus_to_phys tree */
+static DEFINE_SPINLOCK(xen_bus_to_phys_lock);
+
+static int xen_dma_add_entry(struct xen_dma_info *new)
+{
+ struct rb_node **link = &bus_to_phys.rb_node;
+ struct rb_node *parent = NULL;
+ struct xen_dma_info *entry;
+ int rc = 0;
+
+ spin_lock(&xen_bus_to_phys_lock);
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct xen_dma_info, rbnode);
+
+ if (new->dma_addr == entry->dma_addr) {
+ spin_unlock(&xen_bus_to_phys_lock);
+ pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the dma address is already present, mapping to 0x%pa\n",
+ __func__, &new->phys_addr,
+ &new->dma_addr, &entry->phys_addr);
+ rc = -EINVAL;
+ goto err_out;
+ }
+ if (new->phys_addr == entry->phys_addr) {
+ spin_unlock(&xen_bus_to_phys_lock);
+ pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the phys address is already present, mapping to 0x%pa\n",
+ __func__, &new->phys_addr,
+ &new->dma_addr, &entry->dma_addr);
+ rc = -EINVAL;
+ goto err_out;
+ }
+
+ if (new->dma_addr < entry->dma_addr)
+ link = &(*link)->rb_left;
+ else
+ link = &(*link)->rb_right;
+ }
+ rb_link_node(&new->rbnode, parent, link);
+ rb_insert_color(&new->rbnode, &bus_to_phys);
+
+err_out:
+ spin_unlock(&xen_bus_to_phys_lock);
+ return rc;
+}
+
+static struct xen_dma_info *xen_get_dma_info(dma_addr_t dma_addr)
+{
+ struct rb_node *n = bus_to_phys.rb_node;
+ struct xen_dma_info *entry;
+
+ spin_lock(&xen_bus_to_phys_lock);
+
+ while (n) {
+ entry = rb_entry(n, struct xen_dma_info, rbnode);
+ if (entry->dma_addr <= dma_addr &&
+ entry->dma_addr + entry->size > dma_addr) {
+ spin_unlock(&xen_bus_to_phys_lock);
+ return entry;
+ }
+ if (dma_addr < entry->dma_addr)
+ n = n->rb_left;
+ else
+ n = n->rb_right;
+ }
+
+ spin_unlock(&xen_bus_to_phys_lock);
+ return NULL;
+}
static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
{
- return phys_to_machine(XPADDR(paddr)).maddr;
+ int nr_seg;
+ unsigned long offset;
+ char *vaddr;
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap))
+ return phys_to_machine(XPADDR(paddr)).maddr;
+
+ vaddr = (char *)phys_to_virt(paddr);
+ if (vaddr >= xen_io_tlb_end || vaddr < xen_io_tlb_start)
+ return DMA_ERROR_CODE;
+
+ offset = vaddr - xen_io_tlb_start;
+ nr_seg = offset / (IO_TLB_SEGSIZE << IO_TLB_SHIFT);
+
+ return xen_dma_seg[nr_seg].dma_addr +
+ (paddr - xen_dma_seg[nr_seg].phys_addr);
}
static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
{
- return machine_to_phys(XMADDR(baddr)).paddr;
+ if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ struct xen_dma_info *dma = xen_get_dma_info(baddr);
+ if (dma == NULL)
+ return DMA_ERROR_CODE;
+ else
+ return dma->phys_addr + (baddr - dma->dma_addr);
+ } else
+ return machine_to_phys(XMADDR(baddr)).paddr;
}
static dma_addr_t xen_virt_to_bus(void *address)
@@ -107,6 +223,9 @@ static int is_xen_swiotlb_buffer(dma_addr_t dma_addr)
unsigned long pfn = mfn_to_local_pfn(mfn);
phys_addr_t paddr;
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return 1;
+
/* If the address is outside our domain, it CAN
* have the same virtual address as another address
* in our domain. Therefore _only_ check address within our domain.
@@ -124,13 +243,12 @@ static int max_dma_bits = 32;
static int
xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
{
- int i, rc;
+ int i, j, rc;
int dma_bits;
- dma_addr_t dma_handle;
dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;
- i = 0;
+ i = j = 0;
do {
int slabs = min(nslabs - i, (unsigned long)IO_TLB_SEGSIZE);
@@ -138,12 +256,18 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
rc = xen_create_contiguous_region(
(unsigned long)buf + (i << IO_TLB_SHIFT),
get_order(slabs << IO_TLB_SHIFT),
- dma_bits, &dma_handle);
+ dma_bits, &xen_dma_seg[j].dma_addr);
} while (rc && dma_bits++ < max_dma_bits);
if (rc)
return rc;
+ xen_dma_seg[j].phys_addr = virt_to_phys(buf + (i << IO_TLB_SHIFT));
+ xen_dma_seg[j].size = slabs << IO_TLB_SHIFT;
+ rc = xen_dma_add_entry(&xen_dma_seg[j]);
+ if (rc != 0)
+ return rc;
i += slabs;
+ j++;
} while (i < nslabs);
return 0;
}
@@ -193,9 +317,10 @@ retry:
/*
* Get IO TLB memory from any location.
*/
- if (early)
+ if (early) {
xen_io_tlb_start = alloc_bootmem_pages(PAGE_ALIGN(bytes));
- else {
+ xen_dma_seg = alloc_bootmem(sizeof(struct xen_dma_info) * NR_DMA_SEGS);
+ } else {
#define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
@@ -210,6 +335,8 @@ retry:
xen_io_tlb_nslabs = SLABS_PER_PAGE << order;
bytes = xen_io_tlb_nslabs << IO_TLB_SHIFT;
}
+ xen_dma_seg = kzalloc(sizeof(struct xen_dma_info) * NR_DMA_SEGS,
+ GFP_KERNEL);
}
if (!xen_io_tlb_start) {
m_ret = XEN_SWIOTLB_ENOMEM;
@@ -232,7 +359,6 @@ retry:
m_ret = XEN_SWIOTLB_EFIXUP;
goto error;
}
- start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
if (early) {
if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
verbose))
@@ -290,7 +416,8 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
phys = virt_to_phys(ret);
dev_addr = xen_phys_to_bus(phys);
- if (((dev_addr + size - 1 <= dma_mask)) &&
+ if (!xen_feature(XENFEAT_auto_translated_physmap) &&
+ ((dev_addr + size - 1 <= dma_mask)) &&
!range_straddles_page_boundary(phys, size))
*dma_handle = dev_addr;
else {
@@ -321,8 +448,9 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
phys = virt_to_phys(vaddr);
- if (((dev_addr + size - 1 > dma_mask)) ||
- range_straddles_page_boundary(phys, size))
+ if (xen_feature(XENFEAT_auto_translated_physmap) ||
+ (((dev_addr + size - 1 > dma_mask)) ||
+ range_straddles_page_boundary(phys, size)))
xen_destroy_contiguous_region((unsigned long)vaddr, order);
free_pages((unsigned long)vaddr, order);
@@ -351,14 +479,19 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
* we can safely return the device addr and not worry about bounce
* buffering it.
*/
- if (dma_capable(dev, dev_addr, size) &&
+ if (!xen_feature(XENFEAT_auto_translated_physmap) &&
+ dma_capable(dev, dev_addr, size) &&
!range_straddles_page_boundary(phys, size) && !swiotlb_force)
return dev_addr;
/*
* Oh well, have to allocate and map a bounce buffer.
+ * Pass the dma_addr of the first slab in the iotlb buffer as
+ * argument so that swiotlb_tbl_map_single is free to allocate
+ * the bounce buffer anywhere appropriate in io_tlb_start -
+ * io_tlb_end.
*/
- map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir);
+ map = swiotlb_tbl_map_single(dev, xen_dma_seg[0].dma_addr, phys, size, dir);
if (map == SWIOTLB_MAP_ERROR)
return DMA_ERROR_CODE;
@@ -494,10 +627,17 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
dma_addr_t dev_addr = xen_phys_to_bus(paddr);
if (swiotlb_force ||
+ xen_feature(XENFEAT_auto_translated_physmap) ||
!dma_capable(hwdev, dev_addr, sg->length) ||
range_straddles_page_boundary(paddr, sg->length)) {
+ /*
+ * Pass the dma_addr of the first slab in the iotlb buffer as
+ * argument so that swiotlb_tbl_map_single is free to allocate
+ * the bounce buffer anywhere appropriate in io_tlb_start -
+ * io_tlb_end.
+ */
phys_addr_t map = swiotlb_tbl_map_single(hwdev,
- start_dma_addr,
+ xen_dma_seg[0].dma_addr,
sg_phys(sg),
sg->length,
dir);
--
1.7.2.5
On Thu, Aug 15, 2013 at 12:10:45PM +0100, Stefano Stabellini wrote:
> From: EUNBONG SONG <[email protected]>
>
> This patch replaces dma_length in "lib/swiotlb.c" with the sg_dma_len() macro,
> because a build error can occur if CONFIG_NEED_SG_DMA_LENGTH is not
> set, and CONFIG_SWIOTLB is set.
>
> Signed-off-by: EunBong Song <[email protected]>
> Signed-off-by: Stefano Stabellini <[email protected]>
I already took that patch, so you can drop it here.
> ---
> lib/swiotlb.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index d23762e..4e8686c 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -870,13 +870,13 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
> swiotlb_full(hwdev, sg->length, dir, 0);
> swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
> attrs);
> - sgl[0].dma_length = 0;
> + sg_dma_len(sgl) = 0;
> return 0;
> }
> sg->dma_address = phys_to_dma(hwdev, map);
> } else
> sg->dma_address = dev_addr;
> - sg->dma_length = sg->length;
> + sg_dma_len(sg) = sg->length;
> }
> return nelems;
> }
> @@ -904,7 +904,7 @@ swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> BUG_ON(dir == DMA_NONE);
>
> for_each_sg(sgl, sg, nelems, i)
> - unmap_single(hwdev, sg->dma_address, sg->dma_length, dir);
> + unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir);
>
> }
> EXPORT_SYMBOL(swiotlb_unmap_sg_attrs);
> @@ -934,7 +934,7 @@ swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
>
> for_each_sg(sgl, sg, nelems, i)
> swiotlb_sync_single(hwdev, sg->dma_address,
> - sg->dma_length, dir, target);
> + sg_dma_len(sg), dir, target);
> }
>
> void
> --
> 1.7.2.5
>
On Thu, Aug 15, 2013 at 12:10:46PM +0100, Stefano Stabellini wrote:
> swiotlb-xen has an implicit dependency on CONFIG_NEED_SG_DMA_LENGTH.
> Remove it by replacing dma_length with sg_dma_len.
>
Ditto. Already took that one.
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> drivers/xen/swiotlb-xen.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index aadffcf..1b2277c 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -506,13 +506,13 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> to do proper error handling. */
> xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
> attrs);
> - sgl[0].dma_length = 0;
> + sg_dma_len(sgl) = 0;
> return DMA_ERROR_CODE;
> }
> sg->dma_address = xen_phys_to_bus(map);
> } else
> sg->dma_address = dev_addr;
> - sg->dma_length = sg->length;
> + sg_dma_len(sg) = sg->length;
> }
> return nelems;
> }
> @@ -533,7 +533,7 @@ xen_swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> BUG_ON(dir == DMA_NONE);
>
> for_each_sg(sgl, sg, nelems, i)
> - xen_unmap_single(hwdev, sg->dma_address, sg->dma_length, dir);
> + xen_unmap_single(hwdev, sg->dma_address, sg_dma_len(sg), dir);
>
> }
> EXPORT_SYMBOL_GPL(xen_swiotlb_unmap_sg_attrs);
> @@ -555,7 +555,7 @@ xen_swiotlb_sync_sg(struct device *hwdev, struct scatterlist *sgl,
>
> for_each_sg(sgl, sg, nelems, i)
> xen_swiotlb_sync_single(hwdev, sg->dma_address,
> - sg->dma_length, dir, target);
> + sg_dma_len(sg), dir, target);
> }
>
> void
> --
> 1.7.2.5
>
On Thu, Aug 15, 2013 at 12:10:51PM +0100, Stefano Stabellini wrote:
> XENMEM_exchange can't be used by autotranslate guests because of two
> severe limitations:
>
> - it does not copy back the mfns into the out field for autotranslate
> guests;
>
> - it does not guarantee that the hypervisor won't change the p2m
> mappings for the exchanged pages while the guest is using them. Xen
> never promises to keep the p2m mapping stable for autotranslate guests
> in general. In practice it won't happen unless one uses uncommon
> features like memory sharing or paging.
>
> To overcome these problems I am introducing two new hypercalls.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
>
>
> Changes in v4:
> - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
> - rename XENMEM_put_dma_buf to XENMEM_unpin;
> - improve the documentation of the new hypercalls;
> - add a note about out.address_bits for XENMEM_exchange.
> ---
> include/xen/interface/memory.h | 37 +++++++++++++++++++++++++++++++++++++
> 1 files changed, 37 insertions(+), 0 deletions(-)
>
> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 2ecfe4f..5708ff8 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -66,6 +66,7 @@ struct xen_memory_exchange {
> /*
> * [IN] Details of memory extents to be exchanged (GMFN bases).
> * Note that @in.address_bits is ignored and unused.
> + * @out.address_bits should contain the address mask for the new pages.
> */
> struct xen_memory_reservation in;
>
> @@ -263,4 +264,40 @@ struct xen_remove_from_physmap {
> };
> DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>
> +#define XENMEM_exchange_and_pin 26
> +/*
> + * This hypercall is similar to XENMEM_exchange: it takes the same
> + * struct as an argument and it exchanges the pages passed in with a new
> + * set of pages. The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".
> + * The content of the exchanged pages is lost.
> + * Only normal guest r/w memory can be pinned: no granted pages or
> + * ballooned pages.
> + * If return code is zero then @out.extent_list provides the DMA frame
> + * numbers of the newly-allocated memory.
> + * Returns zero on complete success, otherwise a negative error code:
> + * -ENOSYS if not implemented
> + * -EINVAL if the page is already pinned
> + * -EFAULT if the physical to machine walk fails
I don't know what that means. Physical to machine walk? That sounds like
you are trying to do some form of mind meld between robots and humans
while walking around the office.
Could you expand this a bit please?
> + * On complete success then always @nr_exchanged == @in.nr_extents. On
> + * partial success @nr_exchanged indicates how much work was done and a
> + * negative error code is returned.
> + */
> +
> +#define XENMEM_unpin 27
> +/*
> + * XENMEM_unpin unpins a set of pages, previously pinned by
> + * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
> + * be transparently changed by the hypervisor, as usual. The pages are
> + * still accessible from the guest.
> + */
> +struct xen_unpin {
> + /*
> + * [IN] Details of memory extents to be unpinned (GMFN bases).
> + * Note that @in.address_bits is ignored and unused.
> + */
> + struct xen_memory_reservation in;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);
Otherwise looks good to me.
> +
> #endif /* __XEN_PUBLIC_MEMORY_H__ */
> --
> 1.7.2.5
>
On Thu, Aug 15, 2013 at 12:10:53PM +0100, Stefano Stabellini wrote:
> Support autotranslate guests in swiotlb-xen by keeping track of the
> phys-to-bus and bus-to-phys mappings of the swiotlb buffer
> (xen_io_tlb_start-xen_io_tlb_end).
>
> Use a simple direct access on a pre-allocated array for phys-to-bus
> queries. Use a red-black tree for bus-to-phys queries.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> Reviewed-by: David Vrabel <[email protected]>
>
>
> Changes in v4:
> - add err_out label in xen_dma_add_entry;
> - remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
> - code style fixes;
> - add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.
>
> Changes in v3:
> - many code style and name changes;
> - improve error checks in xen_dma_add_entry.
> ---
> drivers/xen/swiotlb-xen.c | 172 ++++++++++++++++++++++++++++++++++++++++----
> 1 files changed, 156 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index b72f31c..8a403a0 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -38,32 +38,148 @@
> #include <linux/bootmem.h>
> #include <linux/dma-mapping.h>
> #include <linux/export.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock_types.h>
> +#include <linux/rbtree.h>
> #include <xen/swiotlb-xen.h>
> #include <xen/page.h>
> #include <xen/xen-ops.h>
> #include <xen/hvc-console.h>
> +#include <xen/features.h>
> /*
> * Used to do a quick range check in swiotlb_tbl_unmap_single and
> * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
> * API.
> */
>
> +#define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
> static char *xen_io_tlb_start, *xen_io_tlb_end;
> static unsigned long xen_io_tlb_nslabs;
> /*
> * Quick lookup value of the bus address of the IOTLB.
> */
>
> -static u64 start_dma_addr;
> +struct xen_dma_info {
> + dma_addr_t dma_addr;
> + phys_addr_t phys_addr;
> + size_t size;
> + struct rb_node rbnode;
> +};
> +
> +/*
> + * This array of struct xen_dma_info is indexed by physical addresses,
> + * starting from virt_to_phys(xen_io_tlb_start). Each entry maps
> + * (IO_TLB_SEGSIZE << IO_TLB_SHIFT) bytes, except the last one that is
> + * smaller. Getting the dma address corresponding to a given physical
> + * address can be done by direct access with the right index on the
> + * array.
> + */
> +static struct xen_dma_info *xen_dma_seg;
> +/*
> + * This tree keeps track of bus address to physical address
> + * mappings.
> + */
> +static struct rb_root bus_to_phys = RB_ROOT;
> +/* This lock protects operations on the bus_to_phys tree */
> +static DEFINE_SPINLOCK(xen_bus_to_phys_lock);
> +
> +static int xen_dma_add_entry(struct xen_dma_info *new)
> +{
> + struct rb_node **link = &bus_to_phys.rb_node;
> + struct rb_node *parent = NULL;
> + struct xen_dma_info *entry;
> + int rc = 0;
> +
> + spin_lock(&xen_bus_to_phys_lock);
> +
> + while (*link) {
> + parent = *link;
> + entry = rb_entry(parent, struct xen_dma_info, rbnode);
> +
> + if (new->dma_addr == entry->dma_addr) {
> + spin_unlock(&xen_bus_to_phys_lock);
> + pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the dma address is already present, mapping to 0x%pa\n",
> + __func__, &new->phys_addr,
> + &new->dma_addr, &entry->phys_addr);
> + rc = -EINVAL;
> + goto err_out;
> + }
> + if (new->phys_addr == entry->phys_addr) {
> + spin_unlock(&xen_bus_to_phys_lock);
> + pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the phys address is already present, mapping to 0x%pa\n",
> + __func__, &new->phys_addr,
> + &new->dma_addr, &entry->dma_addr);
> + rc = -EINVAL;
You didn't test this logic path, did you :-)
See the double spin_unlock?
I was thinking you could have the pr_warn in the err_out
label code.
> + goto err_out;
> + }
> +
> + if (new->dma_addr < entry->dma_addr)
> + link = &(*link)->rb_left;
> + else
> + link = &(*link)->rb_right;
> + }
> + rb_link_node(&new->rbnode, parent, link);
> + rb_insert_color(&new->rbnode, &bus_to_phys);
And here we just do
goto out;
> +
> +err_out:
while this is:
pr_warn("%s...")
rc = -EINVAL;
out:
> + spin_unlock(&xen_bus_to_phys_lock);
> + return rc;
> +}
And that should make those checks above just be:
if ((some conditional))
goto err_out;
right?
> +
> +static struct xen_dma_info *xen_get_dma_info(dma_addr_t dma_addr)
> +{
> + struct rb_node *n = bus_to_phys.rb_node;
> + struct xen_dma_info *entry;
> +
> + spin_lock(&xen_bus_to_phys_lock);
> +
> + while (n) {
> + entry = rb_entry(n, struct xen_dma_info, rbnode);
> + if (entry->dma_addr <= dma_addr &&
> + entry->dma_addr + entry->size > dma_addr) {
> + spin_unlock(&xen_bus_to_phys_lock);
> + return entry;
> + }
> + if (dma_addr < entry->dma_addr)
> + n = n->rb_left;
> + else
> + n = n->rb_right;
> + }
> +
> + spin_unlock(&xen_bus_to_phys_lock);
> + return NULL;
> +}
>
> static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
> {
> - return phys_to_machine(XPADDR(paddr)).maddr;
> + int nr_seg;
> + unsigned long offset;
> + char *vaddr;
> +
> + if (!xen_feature(XENFEAT_auto_translated_physmap))
> + return phys_to_machine(XPADDR(paddr)).maddr;
> +
> + vaddr = (char *)phys_to_virt(paddr);
> + if (vaddr >= xen_io_tlb_end || vaddr < xen_io_tlb_start)
> + return DMA_ERROR_CODE;
> +
> + offset = vaddr - xen_io_tlb_start;
> + nr_seg = offset / (IO_TLB_SEGSIZE << IO_TLB_SHIFT);
> +
> + return xen_dma_seg[nr_seg].dma_addr +
> + (paddr - xen_dma_seg[nr_seg].phys_addr);
> }
>
> static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
> {
> - return machine_to_phys(XMADDR(baddr)).paddr;
> + if (xen_feature(XENFEAT_auto_translated_physmap)) {
> + struct xen_dma_info *dma = xen_get_dma_info(baddr);
> + if (dma == NULL)
> + return DMA_ERROR_CODE;
> + else
> + return dma->phys_addr + (baddr - dma->dma_addr);
> + } else
> + return machine_to_phys(XMADDR(baddr)).paddr;
> }
>
> static dma_addr_t xen_virt_to_bus(void *address)
> @@ -107,6 +223,9 @@ static int is_xen_swiotlb_buffer(dma_addr_t dma_addr)
> unsigned long pfn = mfn_to_local_pfn(mfn);
> phys_addr_t paddr;
>
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return 1;
> +
> /* If the address is outside our domain, it CAN
> * have the same virtual address as another address
> * in our domain. Therefore _only_ check address within our domain.
> @@ -124,13 +243,12 @@ static int max_dma_bits = 32;
> static int
> xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
> {
> - int i, rc;
> + int i, j, rc;
> int dma_bits;
> - dma_addr_t dma_handle;
>
> dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;
>
> - i = 0;
> + i = j = 0;
> do {
> int slabs = min(nslabs - i, (unsigned long)IO_TLB_SEGSIZE);
>
> @@ -138,12 +256,18 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
> rc = xen_create_contiguous_region(
> (unsigned long)buf + (i << IO_TLB_SHIFT),
> get_order(slabs << IO_TLB_SHIFT),
> - dma_bits, &dma_handle);
> + dma_bits, &xen_dma_seg[j].dma_addr);
> } while (rc && dma_bits++ < max_dma_bits);
> if (rc)
> return rc;
>
> + xen_dma_seg[j].phys_addr = virt_to_phys(buf + (i << IO_TLB_SHIFT));
> + xen_dma_seg[j].size = slabs << IO_TLB_SHIFT;
> + rc = xen_dma_add_entry(&xen_dma_seg[j]);
> + if (rc != 0)
> + return rc;
> i += slabs;
> + j++;
> } while (i < nslabs);
> return 0;
> }
> @@ -193,9 +317,10 @@ retry:
> /*
> * Get IO TLB memory from any location.
> */
> - if (early)
> + if (early) {
> xen_io_tlb_start = alloc_bootmem_pages(PAGE_ALIGN(bytes));
> - else {
> + xen_dma_seg = alloc_bootmem(sizeof(struct xen_dma_info) * NR_DMA_SEGS);
> + } else {
> #define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
> #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
> while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
> @@ -210,6 +335,8 @@ retry:
> xen_io_tlb_nslabs = SLABS_PER_PAGE << order;
> bytes = xen_io_tlb_nslabs << IO_TLB_SHIFT;
> }
> + xen_dma_seg = kzalloc(sizeof(struct xen_dma_info) * NR_DMA_SEGS,
> + GFP_KERNEL);
> }
> if (!xen_io_tlb_start) {
> m_ret = XEN_SWIOTLB_ENOMEM;
> @@ -232,7 +359,6 @@ retry:
> m_ret = XEN_SWIOTLB_EFIXUP;
> goto error;
> }
> - start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
> if (early) {
> if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
> verbose))
> @@ -290,7 +416,8 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
>
> phys = virt_to_phys(ret);
> dev_addr = xen_phys_to_bus(phys);
> - if (((dev_addr + size - 1 <= dma_mask)) &&
> + if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> + ((dev_addr + size - 1 <= dma_mask)) &&
> !range_straddles_page_boundary(phys, size))
> *dma_handle = dev_addr;
> else {
> @@ -321,8 +448,9 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
>
> phys = virt_to_phys(vaddr);
>
> - if (((dev_addr + size - 1 > dma_mask)) ||
> - range_straddles_page_boundary(phys, size))
> + if (xen_feature(XENFEAT_auto_translated_physmap) ||
> + (((dev_addr + size - 1 > dma_mask)) ||
> + range_straddles_page_boundary(phys, size)))
> xen_destroy_contiguous_region((unsigned long)vaddr, order);
>
> free_pages((unsigned long)vaddr, order);
> @@ -351,14 +479,19 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> * we can safely return the device addr and not worry about bounce
> * buffering it.
> */
> - if (dma_capable(dev, dev_addr, size) &&
> + if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> + dma_capable(dev, dev_addr, size) &&
> !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> return dev_addr;
>
> /*
> * Oh well, have to allocate and map a bounce buffer.
> + * Pass the dma_addr of the first slab in the iotlb buffer as
> + * argument so that swiotlb_tbl_map_single is free to allocate
> + * the bounce buffer anywhere appropriate in io_tlb_start -
> + * io_tlb_end.
> */
> - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir);
> + map = swiotlb_tbl_map_single(dev, xen_dma_seg[0].dma_addr, phys, size, dir);
> if (map == SWIOTLB_MAP_ERROR)
> return DMA_ERROR_CODE;
>
> @@ -494,10 +627,17 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> dma_addr_t dev_addr = xen_phys_to_bus(paddr);
>
> if (swiotlb_force ||
> + xen_feature(XENFEAT_auto_translated_physmap) ||
> !dma_capable(hwdev, dev_addr, sg->length) ||
> range_straddles_page_boundary(paddr, sg->length)) {
> + /*
> + * Pass the dma_addr of the first slab in the iotlb buffer as
> + * argument so that swiotlb_tbl_map_single is free to allocate
> + * the bounce buffer anywhere appropriate in io_tlb_start -
> + * io_tlb_end.
> + */
> phys_addr_t map = swiotlb_tbl_map_single(hwdev,
> - start_dma_addr,
> + xen_dma_seg[0].dma_addr,
> sg_phys(sg),
> sg->length,
> dir);
> --
Did you run any performance tests to see if adding the extra
spinlock (as the native SWIOTLB already has its own lock) and handling
of the tree are affecting it?
In the worst case when we do need to use the bounce buffer we end
up using two spinlocks.
Is there perhaps a better way? Could we eliminate the usage of the
spinlocks by doing some hashing, with the locks on the red-black
trees themselves? And moving that into the generic SWIOTLB code? Similar to
how we do M2P or how tmem does it? That would mean we could split up the
mega 64MB buffer into smaller chunks - as the code (swiotlb) already
assumes IO_TLB_SEGSIZE (128) slabs to allocate - which is 256kB of
contiguous memory (if memory serves right). That is, alter the underlying
code from using an array to using a hash, and from the hash
entries use the red-black trees. Or perhaps another array.
Obviously you still need to reference the dma to virtual address
lookup from the tree (as you have done here).
P.S.
I am also the SWIOTLB maintainer, so it is OK to modify the SWIOTLB
to be faster.
> 1.7.2.5
>
On Thu, 2013-08-15 at 12:10 +0100, Stefano Stabellini wrote:
> At the moment always rely on swiotlb-xen, but when Xen starts supporting
> hardware IOMMUs we'll be able to avoid it conditionally on the presence
> of an IOMMU on the platform.
Do we have any idea how we are going to do this?
It's extra complicated if you consider that on some systems only some of
the devices are behind an IOMMU :-/
I wonder if we can enumerate which devices have an IOMMU at boot time
and force a ludicrous dma mask (such as 0) if one isn't present in order
to force us to always take the exchange_and_pin path?
Ian.
On Thu, Aug 15, 2013 at 12:10:50PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
Some explanation on why and how early it needs to be moved?
--
Catalin
On Thu, Aug 15, 2013 at 12:10:49PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
> CC: [email protected]
> CC: [email protected]
> ---
> arch/arm64/mm/dma-mapping.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 4bd7579..a006e84 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -63,6 +63,8 @@ static struct dma_map_ops arm64_swiotlb_dma_ops = {
>
> void __init arm64_swiotlb_init(void)
> {
> + if (dma_ops != NULL)
> + return;
> dma_ops = &arm64_swiotlb_dma_ops;
> swiotlb_init(1);
Acked-by: Catalin Marinas <[email protected]>
On Mon, 19 Aug 2013, Catalin Marinas wrote:
> On Thu, Aug 15, 2013 at 12:10:50PM +0100, Stefano Stabellini wrote:
> > Signed-off-by: Stefano Stabellini <[email protected]>
>
> Some explanation on why and how early it needs to be moved?
It only needs to be moved before any dma requests are made.
I am not sure exactly where the right place for it would be but I was
pretty sure that right after psci_init would be early enough.
On Thu, 15 Aug 2013, Ian Campbell wrote:
> On Thu, 2013-08-15 at 12:10 +0100, Stefano Stabellini wrote:
> > At the moment always rely on swiotlb-xen, but when Xen starts supporting
> > hardware IOMMUs we'll be able to avoid it conditionally on the presence
> > of an IOMMU on the platform.
>
> Do we have any idea how we are going to do this?
>
> It's extra complicated if you consider that on some systems only some of
> the devices are behind an IOMMU :-/
>
> I wonder if we can enumerate which devices have an IOMMU at boot time
> and force a ludicrous dma mask (such as 0) if one isn't present in order
> to force us to always take the exchange_and_pin path?
We don't need to worry about how to specify which devices need to go via
the swiotlb internally, because we have our own arm specific
dma_map_ops. At the moment they are just implemented using the
swiotlb-xen functions, but we could easily provide wrappers that check
our own internal whitelist/blacklist and go via swiotlb-xen only in
those cases.
On Wed, 2013-08-28 at 21:07 +0100, Stefano Stabellini wrote:
> On Thu, 15 Aug 2013, Ian Campbell wrote:
> > On Thu, 2013-08-15 at 12:10 +0100, Stefano Stabellini wrote:
> > > At the moment always rely on swiotlb-xen, but when Xen starts supporting
> > > hardware IOMMUs we'll be able to avoid it conditionally on the presence
> > > of an IOMMU on the platform.
> >
> > Do we have any idea how we are going to do this?
> >
> > It's extra complicated if you consider that on some systems only some of
> > the devices are behind an IOMMU :-/
> >
> > I wonder if we can enumerate which devices have an IOMMU at boot time
> > and force a ludicrous dma mask (such as 0) if one isn't present in order
> > to force us to always take the exchange_and_pin path?
>
> We don't need to worry about how to specify which devices need to go via
> the swiotlb internally, because we have our own arm specific
> dma_map_ops. At the moment they are just implemented using the
> swiotlb-xen functions, but we could easily provide wrappers that check
> our own internal whitelist/blacklist and go via swiotlb-xen only in
> those cases.
OK, but how do we decide which devices go on those lists? We need some
sort of indication from the hypervisor, don't we? Only Xen knows if it
has an iommu it can use because the iommu must necessarily be hidden
from Linux.
Ian.
On Thu, 29 Aug 2013, Ian Campbell wrote:
> On Wed, 2013-08-28 at 21:07 +0100, Stefano Stabellini wrote:
> > On Thu, 15 Aug 2013, Ian Campbell wrote:
> > > On Thu, 2013-08-15 at 12:10 +0100, Stefano Stabellini wrote:
> > > > At the moment always rely on swiotlb-xen, but when Xen starts supporting
> > > > hardware IOMMUs we'll be able to avoid it conditionally on the presence
> > > > of an IOMMU on the platform.
> > >
> > > Do we have any idea how we are going to do this?
> > >
> > > It's extra complicated if you consider that on some systems only some of
> > > the devices are behind an IOMMU :-/
> > >
> > > I wonder if we can enumerate which devices have an IOMMU at boot time
> > > and force a ludicrous dma mask (such as 0) if one isn't present in order
> > > to force us to always take the exchange_and_pin path?
> >
> > We don't need to worry about how to specify which devices need to go via
> > the swiotlb internally, because we have our own arm specific
> > dma_map_ops. At the moment they are just implemented using the
> > swiotlb-xen functions, but we could easily provide wrappers that check
> > our own internal whitelist/blacklist and go via swiotlb-xen only in
> > those cases.
>
> OK, but how do we decide which devices go on those lists? We need some
> sort of indication from the hypervisor, don't we? Only Xen knows if it
> has an iommu it can use because the iommu must necessarily be hidden
> from Linux.
Right. Maybe Xen could mark the devices that are safe to use without
swiotlb-xen in the device tree? Adding a property to the node?
On Thu, 15 Aug 2013, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 15, 2013 at 12:10:51PM +0100, Stefano Stabellini wrote:
> > XENMEM_exchange can't be used by autotranslate guests because of two
> > severe limitations:
> >
> > - it does not copy back the mfns into the out field for autotranslate
> > guests;
> >
> > - it does not guarantee that the hypervisor won't change the p2m
> > mappings for the exchanged pages while the guest is using them. Xen
> > never promises to keep the p2m mapping stable for autotranslate guests
> > in general. In practice it won't happen unless one uses uncommon
> > features like memory sharing or paging.
> >
> > To overcome these problems I am introducing two new hypercalls.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> >
> >
> > Changes in v4:
> > - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
> > - rename XENMEM_put_dma_buf to XENMEM_unpin;
> > - improve the documentation of the new hypercalls;
> > - add a note about out.address_bits for XENMEM_exchange.
> > ---
> > include/xen/interface/memory.h | 37 +++++++++++++++++++++++++++++++++++++
> > 1 files changed, 37 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> > index 2ecfe4f..5708ff8 100644
> > --- a/include/xen/interface/memory.h
> > +++ b/include/xen/interface/memory.h
> > @@ -66,6 +66,7 @@ struct xen_memory_exchange {
> > /*
> > * [IN] Details of memory extents to be exchanged (GMFN bases).
> > * Note that @in.address_bits is ignored and unused.
> > + * @out.address_bits should contain the address mask for the new pages.
> > */
> > struct xen_memory_reservation in;
> >
> > @@ -263,4 +264,40 @@ struct xen_remove_from_physmap {
> > };
> > DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
> >
> > +#define XENMEM_exchange_and_pin 26
> > +/*
> > + * This hypercall is similar to XENMEM_exchange: it takes the same
> > + * struct as an argument and it exchanges the pages passed in with a new
> > + * set of pages. The new pages are going to be "pinned": it's guaranteed
> > + * that their p2m mapping won't be changed until explicitly "unpinned".
> > + * The content of the exchanged pages is lost.
> > + * Only normal guest r/w memory can be pinned: no granted pages or
> > + * ballooned pages.
> > + * If return code is zero then @out.extent_list provides the DMA frame
> > + * numbers of the newly-allocated memory.
> > + * Returns zero on complete success, otherwise a negative error code:
> > + * -ENOSYS if not implemented
> > + * -EINVAL if the page is already pinned
> > + * -EFAULT if the physical to machine walk fails
>
> I don't know what that means. Physical to machine walk? That sounds like
> you are trying to do some form of mind meld between robots and humans
> while walking around the office.
>
> Could you expand this a bit please?
Actually, pfn to mfn resolution is not the only possible cause of
-EFAULT; for example, all the copy_to/from_guest failures return
-EFAULT.
I am just going to go for the generic
"if an internal error occurs".
On Thu, 15 Aug 2013, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 15, 2013 at 12:10:53PM +0100, Stefano Stabellini wrote:
> > Support autotranslate guests in swiotlb-xen by keeping track of the
> > phys-to-bus and bus-to-phys mappings of the swiotlb buffer
> > (xen_io_tlb_start-xen_io_tlb_end).
> >
> > Use a simple direct access on a pre-allocated array for phys-to-bus
> > queries. Use a red-black tree for bus-to-phys queries.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > Reviewed-by: David Vrabel <[email protected]>
> >
> >
> > Changes in v4:
> > - add err_out label in xen_dma_add_entry;
> > - remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
> > - code style fixes;
> > - add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.
> >
> > Changes in v3:
> > - many code style and name changes;
> > - improve error checks in xen_dma_add_entry.
> > ---
> > drivers/xen/swiotlb-xen.c | 172 ++++++++++++++++++++++++++++++++++++++++----
> > 1 files changed, 156 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b72f31c..8a403a0 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -38,32 +38,148 @@
> > #include <linux/bootmem.h>
> > #include <linux/dma-mapping.h>
> > #include <linux/export.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock_types.h>
> > +#include <linux/rbtree.h>
> > #include <xen/swiotlb-xen.h>
> > #include <xen/page.h>
> > #include <xen/xen-ops.h>
> > #include <xen/hvc-console.h>
> > +#include <xen/features.h>
> > /*
> > * Used to do a quick range check in swiotlb_tbl_unmap_single and
> > * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
> > * API.
> > */
> >
> > +#define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
> > static char *xen_io_tlb_start, *xen_io_tlb_end;
> > static unsigned long xen_io_tlb_nslabs;
> > /*
> > * Quick lookup value of the bus address of the IOTLB.
> > */
> >
> > -static u64 start_dma_addr;
> > +struct xen_dma_info {
> > + dma_addr_t dma_addr;
> > + phys_addr_t phys_addr;
> > + size_t size;
> > + struct rb_node rbnode;
> > +};
> > +
> > +/*
> > + * This array of struct xen_dma_info is indexed by physical addresses,
> > + * starting from virt_to_phys(xen_io_tlb_start). Each entry maps
> > + * (IO_TLB_SEGSIZE << IO_TLB_SHIFT) bytes, except the last one that is
> > + * smaller. Getting the dma address corresponding to a given physical
> > + * address can be done by direct access with the right index on the
> > + * array.
> > + */
> > +static struct xen_dma_info *xen_dma_seg;
> > +/*
> > + * This tree keeps track of bus address to physical address
> > + * mappings.
> > + */
> > +static struct rb_root bus_to_phys = RB_ROOT;
> > +/* This lock protects operations on the bus_to_phys tree */
> > +static DEFINE_SPINLOCK(xen_bus_to_phys_lock);
> > +
> > +static int xen_dma_add_entry(struct xen_dma_info *new)
> > +{
> > + struct rb_node **link = &bus_to_phys.rb_node;
> > + struct rb_node *parent = NULL;
> > + struct xen_dma_info *entry;
> > + int rc = 0;
> > +
> > + spin_lock(&xen_bus_to_phys_lock);
> > +
> > + while (*link) {
> > + parent = *link;
> > + entry = rb_entry(parent, struct xen_dma_info, rbnode);
> > +
> > + if (new->dma_addr == entry->dma_addr) {
> > + spin_unlock(&xen_bus_to_phys_lock);
> > + pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the dma address is already present, mapping to 0x%pa\n",
> > + __func__, &new->phys_addr,
> > + &new->dma_addr, &entry->phys_addr);
> > + rc = -EINVAL;
> > + goto err_out;
> > + }
> > + if (new->phys_addr == entry->phys_addr) {
> > + spin_unlock(&xen_bus_to_phys_lock);
> > + pr_warn("%s: cannot add phys=0x%pa -> dma=0x%pa, the phys address is already present, mapping to 0x%pa\n",
> > + __func__, &new->phys_addr,
> > + &new->dma_addr, &entry->dma_addr);
> > + rc = -EINVAL;
>
> You didn't test this logic path, did you :-)
>
> See the double spin_unlock?
>
> I was thinking you could have the pr_warn in the err_out
> label code.
good suggestion, I'll do that
> Did you run any performance tests to see if adding the extra
> spinlock (as the native SWIOTLB already has its own lock) and handling
> of the tree is affecting it?
>
> In the worst case when we do need to use the bounce buffer we end
> up using two spinlocks.
>
> Is there perhaps a better way? Could we eliminate the usage of the
> spinlocks by doing some hashing and on the red-black trees having
> a lock? And moving that in the SWIOTLB generic code? Similar to how
> we do M2P or tmem does it? That would mean we could split up the
> mega 64MB buffer in smaller chunks - as the code (swiotlb) already
> assumes IO_TLB_SEGSIZE (128) slabs to allocate - which is 512kB
> contiguous memory (if memory serves right). Altering the underlying
> code from using an array to using a hash, and from the hash
> entries using the red-black trees. Or perhaps another array.
> Obviously you still need to reference the dma to virtual address
> lookup from the tree (as you have done here).
>
> P.S.
> I am also the SWIOTLB maintainer, so it is OK to modify the SWIOTLB
> to be faster.
I haven't done any measurements but consider that the spin_lock is
already only used to access the red-black tree that keeps track of
dma_addr -> phys_addr mappings.
So it's taken at setup time once, then every time we call
xen_bus_to_phys when the guest is an autotranslate guest.
If the guest is not autotranslate there are no additional locks.
That makes me realize that we don't need any spin_locks at all: there
are no risks of concurrent accesses and modifications of the tree
because there are no changes on the tree once it's setup at boot time.
We can get rid of the spin_lock entirely as concurrent read accesses on
the tree are obviously fine.
On Thu, Aug 29, 2013 at 04:38:17PM +0100, Stefano Stabellini wrote:
> On Thu, 29 Aug 2013, Ian Campbell wrote:
> > On Wed, 2013-08-28 at 21:07 +0100, Stefano Stabellini wrote:
> > > On Thu, 15 Aug 2013, Ian Campbell wrote:
> > > > On Thu, 2013-08-15 at 12:10 +0100, Stefano Stabellini wrote:
> > > > > At the moment always rely on swiotlb-xen, but when Xen starts supporting
> > > > > hardware IOMMUs we'll be able to avoid it conditionally on the presence
> > > > > of an IOMMU on the platform.
> > > >
> > > > Do we have any idea how we are going to do this?
> > > >
> > > > It's extra complicated if you consider that on some systems only some of
> > > > the devices are behind an IOMMU :-/
> > > >
> > > > I wonder if we can enumerate which devices have an IOMMU at boot time
> > > > and force a ludicrous dma mask (such as 0) if one isn't present in order
> > > > to force us to always take the exchange_and_pin path?
> > >
> > > We don't need to worry about how to specify which devices need to go via
> > > the swiotlb internally, because we have our own arm specific
> > > dma_map_ops. At the moment they are just implemented using the
> > > swiotlb-xen functions, but we could easily provide wrappers that check
> > > our own internal whitelist/blacklist and go via swiotlb-xen only in
> > > those cases.
> >
> > OK, but how do we decide which devices go on those lists? We need some
> > sort of indication from the hypervisor, don't we? Only Xen knows if it
> > has an iommu it can use because the iommu must necessarily be hidden
> > from Linux.
>
> Right. Maybe Xen could mark the devices that are safe to use without
> swiotlb-xen on device tree? Adding a property to the node?
We do a hypercall for adding devices (See drivers/xen/pci.c). Perhaps
you can piggyback on that one and on ARM return 0 for the devices that
are under Xen control (so IOMMU) and for non-Xen control (those
that will need SWIOTLB) return -EYOUNEEDTOHANDLEIT and use it?
> > Did you run any performance tests to see if adding the extra
> > spinlock (as the native SWIOTLB already has its own lock) and handling
> > of the tree is affecting it?
.. bla bla..
> I haven't done any measurements but consider that the spin_lock is
> already only used to access the red-black tree that keeps track of
> dma_addr -> phys_addr mappings.
> So it's taken at setup time once, then every time we call
> xen_bus_to_phys when the guest is an autotranslate guest.
> If the guest is not autotranslate there are no additional locks.
>
> That makes me realize that we don't need any spin_locks at all: there
> are no risks of concurrent accesses and modifications of the tree
> because there are no changes on the tree once it's setup at boot time.
> We can get rid of the spin_lock entirely as concurrent read accesses on
> the tree are obviously fine.
Nice :-) I think that (and the goto) were the only concerns I had.
Thanks!