2013-09-27 16:09:58

by Stefano Stabellini

Subject: [PATCH v6 0/19] enable swiotlb-xen on arm and arm64

Hi all,
this patch series enables xen-swiotlb on arm and arm64.

Considering that all guests, including dom0, run on Xen on ARM with
second-stage translation enabled, it follows that, without an IOMMU, no
guest could actually drive the hardware.

The solution for platforms without an IOMMU is to use swiotlb-xen,
adapted to autotranslated guests. swiotlb-xen provides a set of dma_ops
that Linux can use to set up a contiguous buffer in stage-2
addresses and use it for DMA operations.
Basically, Linux asks Xen to make a buffer contiguous and gets back its
machine address. This buffer is then used by lib/swiotlb.c
to allocate bounce buffers.
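
As a purely illustrative sketch (not part of the patches; it only summarizes
the call flow the series puts in place), once xen_dma_ops is installed a
driver's ordinary streaming mapping ends up routed through swiotlb-xen:

    /* hedged sketch, not actual patch code */
    dma_addr_t handle = dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
    /* get_dma_ops(dev) returns xen_dma_ops when xen_domain() is true,
     * so the call reaches xen_swiotlb_map_page(), which either returns
     * the machine address directly or bounces the data through the
     * contiguous buffer obtained via xen_create_contiguous_region(). */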

In a few cases we can actually avoid going through the swiotlb bounce
buffer: if a DMA request involves only a single page, we can try to pin
the page passed in as an argument and return its machine address.
In order to do this we need to keep track of the machine-to-physical
mappings of the pinned pages.
Since this approach is preferable to bouncing every time but only works
with single pages, we force biovec requests not to be merged.
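
A rough sketch of the resulting decision in the map_page path (simplified
from the later patches in this series; error handling and the
non-autotranslated case are omitted):

    /* hedged sketch: pin single pages, bounce everything else */
    if (size <= PAGE_SIZE && !range_straddles_page_boundary(phys, size) &&
        !xen_pin_dev_page(dev, phys, &dev_addr))
            return dev_addr;                      /* pinned, no bouncing */
    map = swiotlb_tbl_map_single(dev, xen_dma_seg[0].dma_addr,
                                 phys, size, dir); /* bounce buffer */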

Cheers,

Stefano


Changes in v6:
- check for dev->dma_mask being NULL in dma_capable;
- update the comments and the hypercalls structures;
- add a xen_dma_info entry to the rbtree in xen_swiotlb_alloc_coherent
to keep track of the new mapping. Free the entry in xen_swiotlb_free_coherent;
- rename xen_dma_seg to dma_info in xen_swiotlb_alloc/free_coherent to
avoid confusion;
- introduce and export xen_dma_ops;
- call xen_mm_init as an arch_initcall;
- call __get_dma_ops to get the native dma_ops pointer on arm;
- do not merge biovecs;
- add single page optimization: pin the page rather than bouncing.

Changes in v5:
- dropped the first two patches, already in the Xen tree;
- implement dma_mark_clean using dmac_flush_range on arm;
- add "arm64: define DMA_ERROR_CODE"
- better comment for XENMEM_exchange_and_pin return codes;
- fix xen_dma_add_entry error path;
- remove the spin_lock: the red-black tree is not modified at run time;
- add "swiotlb-xen: introduce xen_swiotlb_set_dma_mask";
- add "xen: introduce xen_alloc/free_coherent_pages";
- add "swiotlb-xen: use xen_alloc/free_coherent_pages";
- add "swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent".

Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange;
- code style fixes;
- add err_out label in xen_dma_add_entry;
- remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
- add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.

Changes in v3:
- add a patch to compile SWIOTLB without CONFIG_NEED_SG_DMA_LENGTH;
- add a patch to compile SWIOTLB_XEN without CONFIG_NEED_SG_DMA_LENGTH;
- arm/dma_capable: do not treat dma_mask as a limit;
- arm/dmabounce: keep using arm_dma_ops;
- add missing __init in xen_early_init declaration;
- many code style and name changes in swiotlb-xen.c;
- improve error checks in xen_dma_add_entry;
- warn on XENMEM_put_dma_buf failures.

Changes in v2:
- fixed a couple of errors in xen_bus_to_phys, xen_phys_to_bus and
xen_swiotlb_fixup.



Julien Grall (1):
ASoC: Samsung: Rename dma_ops by samsung_dma_ops

Stefano Stabellini (18):
arm: make SWIOTLB available
arm64: define DMA_ERROR_CODE
xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin
xen: make xen_create_contiguous_region return the dma address
swiotlb-xen: support autotranslate guests
xen/arm,arm64: enable SWIOTLB_XEN
swiotlb-xen: introduce xen_swiotlb_set_dma_mask
arm/xen: get_dma_ops: return xen_dma_ops if we are running on Xen
arm64/xen: get_dma_ops: return xen_dma_ops if we are running on Xen
xen: introduce xen_alloc/free_coherent_pages
swiotlb-xen: use xen_alloc/free_coherent_pages
swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent
swiotlb: print a warning when the swiotlb is full
swiotlb-xen: call dma_capable only if dev->dma_mask is allocated
arm,arm64: do not always merge biovec if we are running on Xen
xen: introduce XENMEM_pin
swiotlb-xen: introduce a rbtree to track phys to bus mappings
swiotlb-xen: instead of bouncing on the swiotlb, pin single pages

arch/arm/Kconfig | 7 +
arch/arm/include/asm/dma-mapping.h | 50 +++-
arch/arm/include/asm/io.h | 8 +
arch/arm/include/asm/xen/hypervisor.h | 2 +
arch/arm/include/asm/xen/page-coherent.h | 22 ++
arch/arm/include/asm/xen/page.h | 2 +
arch/arm/xen/Makefile | 2 +-
arch/arm/xen/mm.c | 137 ++++++++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/dma-mapping.h | 14 +-
arch/arm64/include/asm/io.h | 9 +
arch/arm64/include/asm/xen/page-coherent.h | 24 ++
arch/arm64/xen/Makefile | 2 +-
arch/ia64/include/asm/xen/page-coherent.h | 24 ++
arch/x86/include/asm/xen/page-coherent.h | 24 ++
arch/x86/xen/mmu.c | 18 +-
drivers/xen/Kconfig | 1 -
drivers/xen/biomerge.c | 4 +-
drivers/xen/swiotlb-xen.c | 464 +++++++++++++++++++++++++---
include/xen/interface/memory.h | 83 +++++
include/xen/swiotlb-xen.h | 2 +
include/xen/xen-ops.h | 8 +-
lib/swiotlb.c | 14 +-
sound/soc/samsung/dma.c | 4 +-
24 files changed, 870 insertions(+), 56 deletions(-)
create mode 100644 arch/arm/include/asm/xen/page-coherent.h
create mode 100644 arch/arm/xen/mm.c
create mode 100644 arch/arm64/include/asm/xen/page-coherent.h
create mode 100644 arch/ia64/include/asm/xen/page-coherent.h
create mode 100644 arch/x86/include/asm/xen/page-coherent.h

git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git swiotlb-xen-6


2013-09-27 16:11:01

by Stefano Stabellini

Subject: [PATCH v6 02/19] arm64: define DMA_ERROR_CODE

Signed-off-by: Stefano Stabellini <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/arm64/include/asm/dma-mapping.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 8d18100..c2cb8a0 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -25,6 +25,7 @@

#define ARCH_HAS_DMA_GET_REQUIRED_MASK

+#define DMA_ERROR_CODE (~0)
extern struct dma_map_ops *dma_ops;

static inline struct dma_map_ops *get_dma_ops(struct device *dev)
--
1.7.2.5

2013-09-27 16:11:03

by Stefano Stabellini

Subject: [PATCH v6 06/19] xen/arm,arm64: enable SWIOTLB_XEN

Xen on arm and arm64 needs SWIOTLB_XEN: when running on Xen we need to
program the hardware with mfns rather than pfns for dma addresses.
Remove SWIOTLB_XEN dependency on X86 and PCI and make XEN select
SWIOTLB_XEN on arm and arm64.

For the moment we always rely on swiotlb-xen, but once Xen starts
supporting hardware IOMMUs we'll be able to avoid it, conditional on the
presence of an IOMMU on the platform.

Implement xen_create_contiguous_region on arm and arm64 by using
XENMEM_exchange_and_pin.

Initialize swiotlb-xen from xen_mm_init (called as an arch_initcall) and
set xen_dma_ops to &xen_swiotlb_dma_ops.

Signed-off-by: Stefano Stabellini <[email protected]>


Changes in v6:
- introduce and export xen_dma_ops;
- call xen_mm_init as an arch_initcall.

Changes in v4:
- remove redefinition of DMA_ERROR_CODE;
- update the code to use XENMEM_exchange_and_pin and XENMEM_unpin;
- add a note about hardware IOMMU in the commit message.

Changes in v3:
- code style changes;
- warn on XENMEM_put_dma_buf failures.
---
arch/arm/Kconfig | 1 +
arch/arm/include/asm/xen/hypervisor.h | 2 +
arch/arm/include/asm/xen/page.h | 2 +
arch/arm/xen/Makefile | 2 +-
arch/arm/xen/mm.c | 121 +++++++++++++++++++++++++++++++++
arch/arm64/Kconfig | 1 +
arch/arm64/xen/Makefile | 2 +-
drivers/xen/Kconfig | 1 -
drivers/xen/swiotlb-xen.c | 16 +++++
9 files changed, 145 insertions(+), 3 deletions(-)
create mode 100644 arch/arm/xen/mm.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c0bfb33..2c9d112 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1848,6 +1848,7 @@ config XEN
depends on CPU_V7 && !CPU_V6
depends on !GENERIC_ATOMIC64
select ARM_PSCI
+ select SWIOTLB_XEN
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.

diff --git a/arch/arm/include/asm/xen/hypervisor.h b/arch/arm/include/asm/xen/hypervisor.h
index d7ab99a..1317ee4 100644
--- a/arch/arm/include/asm/xen/hypervisor.h
+++ b/arch/arm/include/asm/xen/hypervisor.h
@@ -16,4 +16,6 @@ static inline enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
return PARAVIRT_LAZY_NONE;
}

+extern struct dma_map_ops *xen_dma_ops;
+
#endif /* _ASM_ARM_XEN_HYPERVISOR_H */
diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
index 359a7b5..b0f7150 100644
--- a/arch/arm/include/asm/xen/page.h
+++ b/arch/arm/include/asm/xen/page.h
@@ -6,12 +6,14 @@

#include <linux/pfn.h>
#include <linux/types.h>
+#include <linux/dma-mapping.h>

#include <xen/interface/grant_table.h>

#define pfn_to_mfn(pfn) (pfn)
#define phys_to_machine_mapping_valid(pfn) (1)
#define mfn_to_pfn(mfn) (mfn)
+#define mfn_to_local_pfn(m) (mfn_to_pfn(m))
#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))

#define pte_mfn pte_pfn
diff --git a/arch/arm/xen/Makefile b/arch/arm/xen/Makefile
index 4384103..66fc35d 100644
--- a/arch/arm/xen/Makefile
+++ b/arch/arm/xen/Makefile
@@ -1 +1 @@
-obj-y := enlighten.o hypercall.o grant-table.o
+obj-y := enlighten.o hypercall.o grant-table.o mm.o
diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
new file mode 100644
index 0000000..b065c98
--- /dev/null
+++ b/arch/arm/xen/mm.c
@@ -0,0 +1,121 @@
+#include <linux/bootmem.h>
+#include <linux/gfp.h>
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+#include <linux/vmalloc.h>
+#include <linux/swiotlb.h>
+
+#include <xen/xen.h>
+#include <xen/interface/memory.h>
+#include <xen/swiotlb-xen.h>
+
+#include <asm/cacheflush.h>
+#include <asm/xen/page.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/interface.h>
+
+static int xen_exchange_memory(xen_ulong_t extents_in,
+ unsigned int order_in,
+ xen_pfn_t *pfns_in,
+ xen_ulong_t extents_out,
+ unsigned int order_out,
+ xen_pfn_t *mfns_out,
+ unsigned int address_bits)
+{
+ long rc;
+ int success;
+
+ struct xen_memory_exchange exchange = {
+ .in = {
+ .nr_extents = extents_in,
+ .extent_order = order_in,
+ .domid = DOMID_SELF
+ },
+ .out = {
+ .nr_extents = extents_out,
+ .extent_order = order_out,
+ .address_bits = address_bits,
+ .domid = DOMID_SELF
+ }
+ };
+ set_xen_guest_handle(exchange.in.extent_start, pfns_in);
+ set_xen_guest_handle(exchange.out.extent_start, mfns_out);
+
+ BUG_ON(extents_in << order_in != extents_out << order_out);
+
+
+ rc = HYPERVISOR_memory_op(XENMEM_exchange_and_pin, &exchange);
+ success = (exchange.nr_exchanged == extents_in);
+
+ BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));
+ BUG_ON(success && (rc != 0));
+
+ return success;
+}
+
+int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
+ unsigned int address_bits,
+ dma_addr_t *dma_handle)
+{
+ phys_addr_t pstart = __pa(vstart);
+ xen_pfn_t in_frame, out_frame;
+ int success;
+
+ /* Get a new contiguous memory extent. */
+ in_frame = out_frame = pstart >> PAGE_SHIFT;
+ success = xen_exchange_memory(1, order, &in_frame,
+ 1, order, &out_frame,
+ address_bits);
+
+ if (!success)
+ return -ENOMEM;
+
+ *dma_handle = out_frame << PAGE_SHIFT;
+
+ return success ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
+
+void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
+{
+ xen_pfn_t in_frame = __pa(vstart) >> PAGE_SHIFT;
+ struct xen_unpin unpin = {
+ .in = {
+ .nr_extents = 1,
+ .extent_order = order,
+ .domid = DOMID_SELF
+ },
+ };
+ set_xen_guest_handle(unpin.in.extent_start, &in_frame);
+
+ WARN_ON(HYPERVISOR_memory_op(XENMEM_unpin, &unpin));
+}
+EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
+
+struct dma_map_ops *xen_dma_ops;
+EXPORT_SYMBOL_GPL(xen_dma_ops);
+
+static struct dma_map_ops xen_swiotlb_dma_ops = {
+ .mapping_error = xen_swiotlb_dma_mapping_error,
+ .alloc = xen_swiotlb_alloc_coherent,
+ .free = xen_swiotlb_free_coherent,
+ .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu,
+ .sync_single_for_device = xen_swiotlb_sync_single_for_device,
+ .sync_sg_for_cpu = xen_swiotlb_sync_sg_for_cpu,
+ .sync_sg_for_device = xen_swiotlb_sync_sg_for_device,
+ .map_sg = xen_swiotlb_map_sg_attrs,
+ .unmap_sg = xen_swiotlb_unmap_sg_attrs,
+ .map_page = xen_swiotlb_map_page,
+ .unmap_page = xen_swiotlb_unmap_page,
+ .dma_supported = xen_swiotlb_dma_supported,
+};
+
+int __init xen_mm_init(void)
+{
+ xen_swiotlb_init(1, false);
+ xen_dma_ops = &xen_swiotlb_dma_ops;
+ return 0;
+}
+arch_initcall(xen_mm_init);
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9737e97..aa1f6fb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,6 +209,7 @@ config XEN_DOM0
config XEN
bool "Xen guest support on ARM64 (EXPERIMENTAL)"
depends on ARM64 && OF
+ select SWIOTLB_XEN
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.

diff --git a/arch/arm64/xen/Makefile b/arch/arm64/xen/Makefile
index be24040..0ef9637 100644
--- a/arch/arm64/xen/Makefile
+++ b/arch/arm64/xen/Makefile
@@ -1,2 +1,2 @@
-xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o)
+xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o mm.o)
obj-y := xen-arm.o hypercall.o
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 9e02d60..7e83688 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -140,7 +140,6 @@ config XEN_GRANT_DEV_ALLOC

config SWIOTLB_XEN
def_bool y
- depends on PCI && X86
select SWIOTLB

config XEN_TMEM
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 84aef43..a1cb0f4 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -45,6 +45,8 @@
#include <xen/xen-ops.h>
#include <xen/hvc-console.h>
#include <xen/features.h>
+#include <asm/dma-mapping.h>
+
/*
* Used to do a quick range check in swiotlb_tbl_unmap_single and
* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
@@ -58,6 +60,20 @@ static unsigned long xen_io_tlb_nslabs;
* Quick lookup value of the bus address of the IOTLB.
*/

+#ifndef CONFIG_X86
+static unsigned long dma_alloc_coherent_mask(struct device *dev,
+ gfp_t gfp)
+{
+ unsigned long dma_mask = 0;
+
+ dma_mask = dev->coherent_dma_mask;
+ if (!dma_mask)
+ dma_mask = (gfp & GFP_DMA) ? DMA_BIT_MASK(24) : DMA_BIT_MASK(32);
+
+ return dma_mask;
+}
+#endif
+
struct xen_dma_info {
dma_addr_t dma_addr;
phys_addr_t phys_addr;
--
1.7.2.5

2013-09-27 16:11:06

by Stefano Stabellini

Subject: [PATCH v6 05/19] swiotlb-xen: support autotranslate guests

Support autotranslate guests in swiotlb-xen by keeping track of the
phys-to-bus and bus-to-phys mappings of the swiotlb buffer
(xen_io_tlb_start-xen_io_tlb_end).

Use a simple direct access on a pre-allocated array for phys-to-bus
queries. Use a red-black tree for bus-to-phys queries.

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: David Vrabel <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>


Changes in v6:
- add a xen_dma_info entry to the rbtree in xen_swiotlb_alloc_coherent to keep
track of the new mapping. Free the entry in xen_swiotlb_free_coherent;
- rename xen_dma_seg to dma_info in xen_swiotlb_alloc/free_coherent to avoid
confusion.

Changes in v5:
- fix xen_dma_add_entry error path;
- remove the spin_lock: the red-black tree is not modified at run time.

Changes in v4:
- add err_out label in xen_dma_add_entry;
- remove INVALID_ADDRESS, use DMA_ERROR_CODE instead;
- code style fixes;
- add in-code comments regarding the usage of xen_dma_seg[0].dma_addr.

Changes in v3:
- many code style and name changes;
- improve error checks in xen_dma_add_entry.
---
drivers/xen/swiotlb-xen.c | 177 +++++++++++++++++++++++++++++++++++++++++----
1 files changed, 161 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index b72f31c..84aef43 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -38,32 +38,131 @@
#include <linux/bootmem.h>
#include <linux/dma-mapping.h>
#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/rbtree.h>
#include <xen/swiotlb-xen.h>
#include <xen/page.h>
#include <xen/xen-ops.h>
#include <xen/hvc-console.h>
+#include <xen/features.h>
/*
* Used to do a quick range check in swiotlb_tbl_unmap_single and
* swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
* API.
*/

+#define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
static char *xen_io_tlb_start, *xen_io_tlb_end;
static unsigned long xen_io_tlb_nslabs;
/*
* Quick lookup value of the bus address of the IOTLB.
*/

-static u64 start_dma_addr;
+struct xen_dma_info {
+ dma_addr_t dma_addr;
+ phys_addr_t phys_addr;
+ size_t size;
+ struct rb_node rbnode;
+};
+
+/*
+ * This array of struct xen_dma_info is indexed by physical addresses,
+ * starting from virt_to_phys(xen_io_tlb_start). Each entry maps
+ * (IO_TLB_SEGSIZE << IO_TLB_SHIFT) bytes, except the last one that is
+ * smaller. Getting the dma address corresponding to a given physical
+ * address can be done by direct access with the right index on the
+ * array.
+ */
+static struct xen_dma_info *xen_dma_seg;
+/*
+ * This tree keeps track of bus address to physical address
+ * mappings.
+ */
+static struct rb_root bus_to_phys = RB_ROOT;
+
+static int xen_dma_add_entry(struct xen_dma_info *new)
+{
+ struct rb_node **link = &bus_to_phys.rb_node;
+ struct rb_node *parent = NULL;
+ struct xen_dma_info *entry;
+ int rc = 0;
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct xen_dma_info, rbnode);
+
+ if (new->dma_addr == entry->dma_addr)
+ goto err_out;
+ if (new->phys_addr == entry->phys_addr)
+ goto err_out;
+
+ if (new->dma_addr < entry->dma_addr)
+ link = &(*link)->rb_left;
+ else
+ link = &(*link)->rb_right;
+ }
+ rb_link_node(&new->rbnode, parent, link);
+ rb_insert_color(&new->rbnode, &bus_to_phys);
+ goto out;
+
+err_out:
+ rc = -EINVAL;
+ pr_warn("%s: cannot add phys=%pa -> dma=%pa: phys=%pa -> dma=%pa already exists\n",
+ __func__, &new->phys_addr, &new->dma_addr, &entry->phys_addr, &entry->dma_addr);
+out:
+ return rc;
+}
+
+static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
+{
+ struct rb_node *n = bus_to_phys.rb_node;
+ struct xen_dma_info *entry;
+
+ while (n) {
+ entry = rb_entry(n, struct xen_dma_info, rbnode);
+ if (entry->dma_addr <= dma_addr &&
+ entry->dma_addr + entry->size > dma_addr) {
+ return entry;
+ }
+ if (dma_addr < entry->dma_addr)
+ n = n->rb_left;
+ else
+ n = n->rb_right;
+ }
+
+ return NULL;
+}

static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
{
- return phys_to_machine(XPADDR(paddr)).maddr;
+ int nr_seg;
+ unsigned long offset;
+ char *vaddr;
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap))
+ return phys_to_machine(XPADDR(paddr)).maddr;
+
+ vaddr = (char *)phys_to_virt(paddr);
+ if (vaddr >= xen_io_tlb_end || vaddr < xen_io_tlb_start)
+ return DMA_ERROR_CODE;
+
+ offset = vaddr - xen_io_tlb_start;
+ nr_seg = offset / (IO_TLB_SEGSIZE << IO_TLB_SHIFT);
+
+ return xen_dma_seg[nr_seg].dma_addr +
+ (paddr - xen_dma_seg[nr_seg].phys_addr);
}

static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
{
- return machine_to_phys(XMADDR(baddr)).paddr;
+ if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ struct xen_dma_info *dma = xen_get_dma_info_from_dma(baddr);
+ if (dma == NULL)
+ return DMA_ERROR_CODE;
+ else
+ return dma->phys_addr + (baddr - dma->dma_addr);
+ } else
+ return machine_to_phys(XMADDR(baddr)).paddr;
}

static dma_addr_t xen_virt_to_bus(void *address)
@@ -107,6 +206,9 @@ static int is_xen_swiotlb_buffer(dma_addr_t dma_addr)
unsigned long pfn = mfn_to_local_pfn(mfn);
phys_addr_t paddr;

+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return 1;
+
/* If the address is outside our domain, it CAN
* have the same virtual address as another address
* in our domain. Therefore _only_ check address within our domain.
@@ -124,13 +226,12 @@ static int max_dma_bits = 32;
static int
xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
{
- int i, rc;
+ int i, j, rc;
int dma_bits;
- dma_addr_t dma_handle;

dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;

- i = 0;
+ i = j = 0;
do {
int slabs = min(nslabs - i, (unsigned long)IO_TLB_SEGSIZE);

@@ -138,12 +239,18 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
rc = xen_create_contiguous_region(
(unsigned long)buf + (i << IO_TLB_SHIFT),
get_order(slabs << IO_TLB_SHIFT),
- dma_bits, &dma_handle);
+ dma_bits, &xen_dma_seg[j].dma_addr);
} while (rc && dma_bits++ < max_dma_bits);
if (rc)
return rc;

+ xen_dma_seg[j].phys_addr = virt_to_phys(buf + (i << IO_TLB_SHIFT));
+ xen_dma_seg[j].size = slabs << IO_TLB_SHIFT;
+ rc = xen_dma_add_entry(&xen_dma_seg[j]);
+ if (rc != 0)
+ return rc;
i += slabs;
+ j++;
} while (i < nslabs);
return 0;
}
@@ -193,9 +300,10 @@ retry:
/*
* Get IO TLB memory from any location.
*/
- if (early)
+ if (early) {
xen_io_tlb_start = alloc_bootmem_pages(PAGE_ALIGN(bytes));
- else {
+ xen_dma_seg = alloc_bootmem(sizeof(struct xen_dma_info) * NR_DMA_SEGS);
+ } else {
#define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
@@ -210,6 +318,8 @@ retry:
xen_io_tlb_nslabs = SLABS_PER_PAGE << order;
bytes = xen_io_tlb_nslabs << IO_TLB_SHIFT;
}
+ xen_dma_seg = kzalloc(sizeof(struct xen_dma_info) * NR_DMA_SEGS,
+ GFP_KERNEL);
}
if (!xen_io_tlb_start) {
m_ret = XEN_SWIOTLB_ENOMEM;
@@ -232,7 +342,6 @@ retry:
m_ret = XEN_SWIOTLB_EFIXUP;
goto error;
}
- start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
if (early) {
if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
verbose))
@@ -267,6 +376,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
unsigned long vstart;
phys_addr_t phys;
dma_addr_t dev_addr;
+ struct xen_dma_info *dma_info = NULL;

/*
* Ignore region specifiers - the kernel's ideas of
@@ -290,7 +400,8 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,

phys = virt_to_phys(ret);
dev_addr = xen_phys_to_bus(phys);
- if (((dev_addr + size - 1 <= dma_mask)) &&
+ if (!xen_feature(XENFEAT_auto_translated_physmap) &&
+ ((dev_addr + size - 1 <= dma_mask)) &&
!range_straddles_page_boundary(phys, size))
*dma_handle = dev_addr;
else {
@@ -299,6 +410,22 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
free_pages(vstart, order);
return NULL;
}
+
+ dma_info = kzalloc(sizeof(struct xen_dma_info), GFP_KERNEL);
+ if (!dma_info) {
+ pr_warn("cannot allocate xen_dma_info\n");
+ xen_destroy_contiguous_region(phys, order);
+ return NULL;
+ }
+ dma_info->phys_addr = phys;
+ dma_info->size = size;
+ dma_info->dma_addr = *dma_handle;
+ if (xen_dma_add_entry(dma_info)) {
+ pr_warn("cannot add new entry to bus_to_phys\n");
+ xen_destroy_contiguous_region(phys, order);
+ kfree(dma_info);
+ return NULL;
+ }
}
memset(ret, 0, size);
return ret;
@@ -312,6 +439,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
int order = get_order(size);
phys_addr_t phys;
u64 dma_mask = DMA_BIT_MASK(32);
+ struct xen_dma_info *dma_info = NULL;

if (dma_release_from_coherent(hwdev, order, vaddr))
return;
@@ -321,9 +449,14 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,

phys = virt_to_phys(vaddr);

- if (((dev_addr + size - 1 > dma_mask)) ||
- range_straddles_page_boundary(phys, size))
+ if (xen_feature(XENFEAT_auto_translated_physmap) ||
+ (((dev_addr + size - 1 > dma_mask)) ||
+ range_straddles_page_boundary(phys, size))) {
xen_destroy_contiguous_region((unsigned long)vaddr, order);
+ dma_info = xen_get_dma_info_from_dma(dev_addr);
+ rb_erase(&dma_info->rbnode, &bus_to_phys);
+ kfree(dma_info);
+ }

free_pages((unsigned long)vaddr, order);
}
@@ -351,14 +484,19 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
* we can safely return the device addr and not worry about bounce
* buffering it.
*/
- if (dma_capable(dev, dev_addr, size) &&
+ if (!xen_feature(XENFEAT_auto_translated_physmap) &&
+ dma_capable(dev, dev_addr, size) &&
!range_straddles_page_boundary(phys, size) && !swiotlb_force)
return dev_addr;

/*
* Oh well, have to allocate and map a bounce buffer.
+ * Pass the dma_addr of the first slab in the iotlb buffer as
+ * argument so that swiotlb_tbl_map_single is free to allocate
+ * the bounce buffer anywhere appropriate in io_tlb_start -
+ * io_tlb_end.
*/
- map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir);
+ map = swiotlb_tbl_map_single(dev, xen_dma_seg[0].dma_addr, phys, size, dir);
if (map == SWIOTLB_MAP_ERROR)
return DMA_ERROR_CODE;

@@ -494,10 +632,17 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
dma_addr_t dev_addr = xen_phys_to_bus(paddr);

if (swiotlb_force ||
+ xen_feature(XENFEAT_auto_translated_physmap) ||
!dma_capable(hwdev, dev_addr, sg->length) ||
range_straddles_page_boundary(paddr, sg->length)) {
+ /*
+ * Pass the dma_addr of the first slab in the iotlb buffer as
+ * argument so that swiotlb_tbl_map_single is free to allocate
+ * the bounce buffer anywhere appropriate in io_tlb_start -
+ * io_tlb_end.
+ */
phys_addr_t map = swiotlb_tbl_map_single(hwdev,
- start_dma_addr,
+ xen_dma_seg[0].dma_addr,
sg_phys(sg),
sg->length,
dir);
--
1.7.2.5

2013-09-27 16:10:57

by Stefano Stabellini

Subject: [PATCH v6 04/19] xen: make xen_create_contiguous_region return the dma address

Modify xen_create_contiguous_region to return the dma address of the
newly contiguous buffer.

Signed-off-by: Stefano Stabellini <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
Reviewed-by: David Vrabel <[email protected]>


Changes in v4:
- use virt_to_machine instead of virt_to_bus.
---
arch/x86/xen/mmu.c | 4 +++-
drivers/xen/swiotlb-xen.c | 6 +++---
include/xen/xen-ops.h | 3 ++-
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fdc3ba2..6c34d7c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2329,7 +2329,8 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
}

int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
- unsigned int address_bits)
+ unsigned int address_bits,
+ dma_addr_t *dma_handle)
{
unsigned long *in_frames = discontig_frames, out_frame;
unsigned long flags;
@@ -2368,6 +2369,7 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,

spin_unlock_irqrestore(&xen_reservation_lock, flags);

+ *dma_handle = virt_to_machine(vstart).maddr;
return success ? 0 : -ENOMEM;
}
EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 1b2277c..b72f31c 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -126,6 +126,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
{
int i, rc;
int dma_bits;
+ dma_addr_t dma_handle;

dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;

@@ -137,7 +138,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
rc = xen_create_contiguous_region(
(unsigned long)buf + (i << IO_TLB_SHIFT),
get_order(slabs << IO_TLB_SHIFT),
- dma_bits);
+ dma_bits, &dma_handle);
} while (rc && dma_bits++ < max_dma_bits);
if (rc)
return rc;
@@ -294,11 +295,10 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
*dma_handle = dev_addr;
else {
if (xen_create_contiguous_region(vstart, order,
- fls64(dma_mask)) != 0) {
+ fls64(dma_mask), dma_handle) != 0) {
free_pages(vstart, order);
return NULL;
}
- *dma_handle = virt_to_machine(ret).maddr;
}
memset(ret, 0, size);
return ret;
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index d6fe062..9ef704d 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -20,7 +20,8 @@ int xen_setup_shutdown_event(void);

extern unsigned long *xen_contiguous_bitmap;
int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
- unsigned int address_bits);
+ unsigned int address_bits,
+ dma_addr_t *dma_handle);

void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order);

--
1.7.2.5

2013-09-27 16:10:54

by Stefano Stabellini

Subject: [PATCH v6 01/19] arm: make SWIOTLB available

IOMMU_HELPER is needed because SWIOTLB calls iommu_is_span_boundary,
provided by lib/iommu_helper.c.

Signed-off-by: Stefano Stabellini <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
CC: [email protected]
CC: [email protected]


Changes in v6:
- check for dev->dma_mask being NULL in dma_capable.

Changes in v5:
- implement dma_mark_clean using dmac_flush_range.

Changes in v3:
- dma_capable: do not treat dma_mask as a limit;
- remove SWIOTLB dependency on NEED_SG_DMA_LENGTH.
---
arch/arm/Kconfig | 6 +++++
arch/arm/include/asm/dma-mapping.h | 37 ++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ba412e0..c0bfb33 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1832,6 +1832,12 @@ config CC_STACKPROTECTOR
neutralized via a kernel panic.
This feature requires gcc version 4.2 or above.

+config SWIOTLB
+ def_bool y
+
+config IOMMU_HELPER
+ def_bool SWIOTLB
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 5b579b9..8807124 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -10,6 +10,7 @@

#include <asm-generic/dma-coherent.h>
#include <asm/memory.h>
+#include <asm/cacheflush.h>

#define DMA_ERROR_CODE (~0)
extern struct dma_map_ops arm_dma_ops;
@@ -86,6 +87,42 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
}
#endif

+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+ unsigned int offset = paddr & ~PAGE_MASK;
+ return pfn_to_dma(dev, paddr >> PAGE_SHIFT) + offset;
+}
+
+static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+ unsigned int offset = dev_addr & ~PAGE_MASK;
+ return (dma_to_pfn(dev, dev_addr) << PAGE_SHIFT) + offset;
+}
+
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
+{
+ u64 limit, mask;
+
+ if (!dev->dma_mask)
+ return 0;
+
+ mask = *dev->dma_mask;
+
+ limit = (mask + 1) & ~mask;
+ if (limit && size > limit)
+ return 0;
+
+ if ((addr | (addr + size - 1)) & ~mask)
+ return 0;
+
+ return 1;
+}
+
+static inline void dma_mark_clean(void *addr, size_t size)
+{
+ dmac_flush_range(addr, addr + size);
+}
+
/*
* DMA errors are defined by all-bits-set in the DMA address.
*/
--
1.7.2.5

2013-09-27 16:11:57

by Stefano Stabellini

Subject: [PATCH v6 07/19] swiotlb-xen: introduce xen_swiotlb_set_dma_mask

Implement xen_swiotlb_set_dma_mask, use it for set_dma_mask on arm.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/arm/xen/mm.c | 1 +
drivers/xen/swiotlb-xen.c | 12 ++++++++++++
include/xen/swiotlb-xen.h | 2 ++
3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index b065c98..4330b15 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -110,6 +110,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
+ .set_dma_mask = xen_swiotlb_set_dma_mask,
};

int __init xen_mm_init(void)
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index a1cb0f4..deb9131 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -754,3 +754,15 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
}
EXPORT_SYMBOL_GPL(xen_swiotlb_dma_supported);
+
+int
+xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
+{
+ if (!dev->dma_mask || !xen_swiotlb_dma_supported(dev, dma_mask))
+ return -EIO;
+
+ *dev->dma_mask = dma_mask;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask);
diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
index de8bcc6..7b64465 100644
--- a/include/xen/swiotlb-xen.h
+++ b/include/xen/swiotlb-xen.h
@@ -55,4 +55,6 @@ xen_swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
extern int
xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);

+extern int
+xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
#endif /* __LINUX_SWIOTLB_XEN_H */
--
1.7.2.5

2013-09-27 16:11:55

by Stefano Stabellini

Subject: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for the CPU
and devices. On native x86 and ARMv8 it is sufficient to call
__get_free_pages in order to get a coherent buffer, while on ARM we need
to call the native dma_ops->alloc implementation.

When arm64 stops using the swiotlb by default and starts having multiple
dma_ops implementations, we'll use __get_dma_ops there too.

Introduce xen_alloc_coherent_pages to abstract the arch specific buffer
allocation.

Similarly, introduce xen_free_coherent_pages to free a coherent buffer:
on x86 and arm64 it is simply a call to free_pages, while on ARM it is
arm_dma_ops.free.

Signed-off-by: Stefano Stabellini <[email protected]>


Changes in v6:
- call __get_dma_ops to get the native dma_ops pointer on arm.
---
arch/arm/include/asm/xen/page-coherent.h | 22 ++++++++++++++++++++++
arch/arm64/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
arch/ia64/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
arch/x86/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
4 files changed, 94 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/xen/page-coherent.h
create mode 100644 arch/arm64/include/asm/xen/page-coherent.h
create mode 100644 arch/ia64/include/asm/xen/page-coherent.h
create mode 100644 arch/x86/include/asm/xen/page-coherent.h

diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h
new file mode 100644
index 0000000..92b7a8a
--- /dev/null
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -0,0 +1,22 @@
+#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
+#define _ASM_ARM_XEN_PAGE_COHERENT_H
+
+#include <asm/page.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-mapping.h>
+
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags,
+ struct dma_attrs *attrs)
+{
+ return __get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, attrs);
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ struct dma_attrs *attrs)
+{
+ __get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+#endif /* _ASM_ARM_XEN_PAGE_COHERENT_H */
diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
new file mode 100644
index 0000000..0d6ad25
--- /dev/null
+++ b/arch/arm64/include/asm/xen/page-coherent.h
@@ -0,0 +1,24 @@
+#ifndef _ASM_ARM64_XEN_PAGE_COHERENT_H
+#define _ASM_ARM64_XEN_PAGE_COHERENT_H
+
+#include <asm/page.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-mapping.h>
+
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags,
+ struct dma_attrs *attrs)
+{
+ void *vstart = (void*)__get_free_pages(flags, get_order(size));
+ *dma_handle = virt_to_phys(vstart);
+ return vstart;
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ struct dma_attrs *attrs)
+{
+ free_pages((unsigned long) cpu_addr, get_order(size));
+}
+
+#endif /* _ASM_ARM64_XEN_PAGE_COHERENT_H */
diff --git a/arch/ia64/include/asm/xen/page-coherent.h b/arch/ia64/include/asm/xen/page-coherent.h
new file mode 100644
index 0000000..37b929c
--- /dev/null
+++ b/arch/ia64/include/asm/xen/page-coherent.h
@@ -0,0 +1,24 @@
+#ifndef _ASM_IA64_XEN_PAGE_COHERENT_H
+#define _ASM_IA64_XEN_PAGE_COHERENT_H
+
+#include <asm/page.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-mapping.h>
+
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags,
+ struct dma_attrs *attrs)
+{
+ void *vstart = (void*)__get_free_pages(flags, get_order(size));
+ *dma_handle = virt_to_phys(vstart);
+ return vstart;
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ struct dma_attrs *attrs)
+{
+ free_pages((unsigned long) cpu_addr, get_order(size));
+}
+
+#endif /* _ASM_IA64_XEN_PAGE_COHERENT_H */
diff --git a/arch/x86/include/asm/xen/page-coherent.h b/arch/x86/include/asm/xen/page-coherent.h
new file mode 100644
index 0000000..31de2e0
--- /dev/null
+++ b/arch/x86/include/asm/xen/page-coherent.h
@@ -0,0 +1,24 @@
+#ifndef _ASM_X86_XEN_PAGE_COHERENT_H
+#define _ASM_X86_XEN_PAGE_COHERENT_H
+
+#include <asm/page.h>
+#include <linux/dma-attrs.h>
+#include <linux/dma-mapping.h>
+
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flags,
+ struct dma_attrs *attrs)
+{
+ void *vstart = (void*)__get_free_pages(flags, get_order(size));
+ *dma_handle = virt_to_phys(vstart);
+ return vstart;
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ struct dma_attrs *attrs)
+{
+ free_pages((unsigned long) cpu_addr, get_order(size));
+}
+
+#endif /* _ASM_X86_XEN_PAGE_COHERENT_H */
--
1.7.2.5

2013-09-27 16:13:33

by Stefano Stabellini

Subject: [PATCH v6 09/19] arm64/xen: get_dma_ops: return xen_dma_ops if we are running on Xen

Signed-off-by: Stefano Stabellini <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/arm64/include/asm/dma-mapping.h | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index c2cb8a0..1dbd1d5 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -23,12 +23,15 @@

#include <asm-generic/dma-coherent.h>

+#include <xen/xen.h>
+#include <asm/xen/hypervisor.h>
+
#define ARCH_HAS_DMA_GET_REQUIRED_MASK

#define DMA_ERROR_CODE (~0)
extern struct dma_map_ops *dma_ops;

-static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+static inline struct dma_map_ops *__get_dma_ops(struct device *dev)
{
if (unlikely(!dev) || !dev->archdata.dma_ops)
return dma_ops;
@@ -36,6 +39,14 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
return dev->archdata.dma_ops;
}

+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+ if (xen_domain())
+ return xen_dma_ops;
+ else
+ return __get_dma_ops(dev);
+}
+
#include <asm-generic/dma-mapping-common.h>

static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
--
1.7.2.5

2013-09-27 16:13:52

by Stefano Stabellini

Subject: [PATCH v6 08/19] arm/xen: get_dma_ops: return xen_dma_ops if we are running on Xen

Signed-off-by: Stefano Stabellini <[email protected]>
Suggested-by: Catalin Marinas <[email protected]>
CC: [email protected]
CC: [email protected]
---
arch/arm/include/asm/dma-mapping.h | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 8807124..b638d6d 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -12,17 +12,28 @@
#include <asm/memory.h>
#include <asm/cacheflush.h>

+#include <xen/xen.h>
+#include <asm/xen/hypervisor.h>
+
#define DMA_ERROR_CODE (~0)
extern struct dma_map_ops arm_dma_ops;
extern struct dma_map_ops arm_coherent_dma_ops;

-static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+static inline struct dma_map_ops *__get_dma_ops(struct device *dev)
{
if (dev && dev->archdata.dma_ops)
return dev->archdata.dma_ops;
return &arm_dma_ops;
}

+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+ if (xen_domain())
+ return xen_dma_ops;
+ else
+ return __get_dma_ops(dev);
+}
+
static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
{
BUG_ON(!dev);
--
1.7.2.5

2013-09-27 16:13:51

by Stefano Stabellini

Subject: [PATCH v6 03/19] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

XENMEM_exchange can't be used by autotranslate guests because of two
severe limitations:

- it does not copy back the mfns into the out field for autotranslate
guests;

- it does not guarantee that the hypervisor won't change the p2m
mappings for the exchanged pages while the guest is using them. Xen
never promises to keep the p2m mapping stable for autotranslate guests
in general. In practice it won't happen unless one uses uncommon
features like memory sharing or paging.

To overcome these problems I am introducing two new hypercalls.
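
For reference, a condensed sketch of how a caller is expected to use the new
hypercall (mirroring the arm implementation later in this series):

    struct xen_memory_exchange exchange = {
        .in  = { .nr_extents = 1, .extent_order = order, .domid = DOMID_SELF },
        .out = { .nr_extents = 1, .extent_order = order,
                 .address_bits = address_bits, .domid = DOMID_SELF },
    };
    set_xen_guest_handle(exchange.in.extent_start, &in_frame);
    set_xen_guest_handle(exchange.out.extent_start, &out_frame);
    rc = HYPERVISOR_memory_op(XENMEM_exchange_and_pin, &exchange);
    /* rc == 0: out_frame now holds the mfn of the new, pinned page */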

Signed-off-by: Stefano Stabellini <[email protected]>


Changes in v6:
- update the comments and the hypercalls structures.

Changes in v5:
- better comment for XENMEM_exchange_and_pin return codes;

Changes in v4:
- rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
- rename XENMEM_put_dma_buf to XENMEM_unpin;
- improve the documentation of the new hypercalls;
- add a note about out.address_bits for XENMEM_exchange.
---
include/xen/interface/memory.h | 51 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 2ecfe4f..49db252 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -66,6 +66,9 @@ struct xen_memory_exchange {
/*
* [IN] Details of memory extents to be exchanged (GMFN bases).
* Note that @in.address_bits is ignored and unused.
+ * @out.address_bits contains the maximum number of bits addressable
+ * by the caller. The addresses of the newly allocated pages have to
+ * meet this restriction.
*/
struct xen_memory_reservation in;

@@ -263,4 +266,52 @@ struct xen_remove_from_physmap {
};
DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);

+/*
+ * This hypercall is similar to XENMEM_exchange: it takes the same
+ * struct as an argument and it exchanges the pages passed in with a new
+ * set of pages. The new pages are going to be "pinned": it's guaranteed
+ * that their p2m mapping won't be changed until explicitly "unpinned".
+ * Only normal guest r/w memory can be pinned: no granted pages or
+ * ballooned pages.
+ * If return code is zero then @out.extent_list provides the frame
+ * numbers of the newly-allocated memory.
+ * On X86 the frame numbers are machine frame numbers (mfns).
+ * On ARMv7 and ARMv8 the frame numbers are machine frame numbers (mfns).
+ * Returns zero on complete success, otherwise a negative error code.
+ * The most common error codes are:
+ * -ENOSYS if not implemented
+ * -EPERM if the domain is not privileged for this operation
+ * -EBUSY if the page is already pinned
+ * -EFAULT if an internal error occurs
+ * On complete success then always @nr_exchanged == @in.nr_extents. On
+ * partial success @nr_exchanged indicates how much work was done and a
+ * negative error code is returned.
+ */
+#define XENMEM_exchange_and_pin 26
+
+/*
+ * XENMEM_unpin unpins a set of pages, previously pinned by
+ * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
+ * be transparently changed by the hypervisor, as usual. The pages are
+ * still accessible from the guest.
+ */
+#define XENMEM_unpin 27
+struct xen_unpin {
+ /*
+ * [IN] Details of memory extents to be unpinned (GMFN bases).
+ * Note that @in.address_bits is ignored and unused.
+ */
+ struct xen_memory_reservation in;
+ /*
+ * [OUT] Number of input extents that were successfully unpinned.
+ * 1. The first @nr_unpinned input extents were successfully
+ * unpinned.
+ * 2. All other input extents are untouched.
+ * 3. If not all input extents are unpinned then the return code of this
+ * command will be non-zero.
+ */
+ xen_ulong_t nr_unpinned;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);
+
#endif /* __XEN_PUBLIC_MEMORY_H__ */
--
1.7.2.5

2013-09-27 16:35:21

by Stefano Stabellini

Subject: [PATCH v6 13/19] ASoC: Samsung: Rename dma_ops by samsung_dma_ops

From: Julien Grall <[email protected]>

The commit "arm: introduce a global dma_ops pointer" introduce compilation issue
when CONFIG_SND_SOC_SAMSUNG is enabled.

sound/soc/samsung/dma.c:345:27: error: conflicting types for 'dma_ops'
/local/home/julien/works/arndale/linux/arch/arm/include/asm/dma-mapping.h:16:28:
note: previous declaration of 'dma_ops' was here

Signed-off-by: Julien Grall <[email protected]>
Signed-off-by: Stefano Stabellini <[email protected]>
---
sound/soc/samsung/dma.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/samsung/dma.c b/sound/soc/samsung/dma.c
index 21b7926..827c0d8 100644
--- a/sound/soc/samsung/dma.c
+++ b/sound/soc/samsung/dma.c
@@ -341,7 +341,7 @@ static int dma_mmap(struct snd_pcm_substream *substream,
runtime->dma_bytes);
}

-static struct snd_pcm_ops dma_ops = {
+static struct snd_pcm_ops samsung_dma_ops = {
.open = dma_open,
.close = dma_close,
.ioctl = snd_pcm_lib_ioctl,
@@ -428,7 +428,7 @@ out:
}

static struct snd_soc_platform_driver samsung_asoc_platform = {
- .ops = &dma_ops,
+ .ops = &samsung_dma_ops,
.pcm_new = dma_new,
.pcm_free = dma_free_dma_buffers,
};
--
1.7.2.5

2013-09-27 16:35:29

by Stefano Stabellini

Subject: [PATCH v6 11/19] swiotlb-xen: use xen_alloc/free_coherent_pages

Use xen_alloc_coherent_pages and xen_free_coherent_pages to allocate or
free coherent pages.

We need to be careful handling the pointer returned by
xen_alloc_coherent_pages, because on ARM the pointer is not equal to
phys_to_virt(*dma_handle). In fact virt_to_phys only works for kernel
direct-mapped RAM.
On ARM the pointer could be an ioremap address, so passing it to
virt_to_phys would return a different physical address that doesn't
correspond to the buffer.

Make xen_create_contiguous_region take a phys_addr_t as its start parameter
to avoid the virt_to_phys calls, which would be incorrect.

Changes in v6:
- remove extra spaces.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/arm/xen/mm.c | 7 +++----
arch/x86/xen/mmu.c | 7 +++++--
drivers/xen/swiotlb-xen.c | 31 +++++++++++++++++++++----------
include/xen/xen-ops.h | 4 ++--
4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 4330b15..b305b94 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -55,11 +55,10 @@ static int xen_exchange_memory(xen_ulong_t extents_in,
return success;
}

-int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
+int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle)
{
- phys_addr_t pstart = __pa(vstart);
xen_pfn_t in_frame, out_frame;
int success;

@@ -78,9 +77,9 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
}
EXPORT_SYMBOL_GPL(xen_create_contiguous_region);

-void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
+void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
{
- xen_pfn_t in_frame = __pa(vstart) >> PAGE_SHIFT;
+ xen_pfn_t in_frame = pstart >> PAGE_SHIFT;
struct xen_unpin unpin = {
.in = {
.nr_extents = 1,
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6c34d7c..8830883 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2328,13 +2328,14 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
return success;
}

-int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
+int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle)
{
unsigned long *in_frames = discontig_frames, out_frame;
unsigned long flags;
int success;
+ unsigned long vstart = (unsigned long)phys_to_virt(pstart);

/*
* Currently an auto-translated guest will not perform I/O, nor will
@@ -2374,11 +2375,12 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
}
EXPORT_SYMBOL_GPL(xen_create_contiguous_region);

-void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
+void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
{
unsigned long *out_frames = discontig_frames, in_frame;
unsigned long flags;
int success;
+ unsigned long vstart;

if (xen_feature(XENFEAT_auto_translated_physmap))
return;
@@ -2386,6 +2388,7 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
if (unlikely(order > MAX_CONTIG_ORDER))
return;

+ vstart = (unsigned long)phys_to_virt(pstart);
memset((void *) vstart, 0, PAGE_SIZE << order);

spin_lock_irqsave(&xen_reservation_lock, flags);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index deb9131..96ad316 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -46,6 +46,7 @@
#include <xen/hvc-console.h>
#include <xen/features.h>
#include <asm/dma-mapping.h>
+#include <asm/xen/page-coherent.h>

/*
* Used to do a quick range check in swiotlb_tbl_unmap_single and
@@ -244,6 +245,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
{
int i, j, rc;
int dma_bits;
+ phys_addr_t p = virt_to_phys(buf);

dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;

@@ -253,7 +255,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)

do {
rc = xen_create_contiguous_region(
- (unsigned long)buf + (i << IO_TLB_SHIFT),
+ p + (i << IO_TLB_SHIFT),
get_order(slabs << IO_TLB_SHIFT),
dma_bits, &xen_dma_seg[j].dma_addr);
} while (rc && dma_bits++ < max_dma_bits);
@@ -389,7 +391,6 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
void *ret;
int order = get_order(size);
u64 dma_mask = DMA_BIT_MASK(32);
- unsigned long vstart;
phys_addr_t phys;
dma_addr_t dev_addr;
struct xen_dma_info *dma_info = NULL;
@@ -405,8 +406,12 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
if (dma_alloc_from_coherent(hwdev, size, dma_handle, &ret))
return ret;

- vstart = __get_free_pages(flags, order);
- ret = (void *)vstart;
+ /* On ARM this function returns an ioremap'ped virtual address for
+ * which virt_to_phys doesn't return the corresponding physical
+ * address. In fact on ARM virt_to_phys only works for kernel direct
+ * mapped RAM memory. Also see comment below.
+ */
+ ret = xen_alloc_coherent_pages(hwdev, size, dma_handle, flags, attrs);

if (!ret)
return ret;
@@ -414,16 +419,20 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
if (hwdev && hwdev->coherent_dma_mask)
dma_mask = dma_alloc_coherent_mask(hwdev, flags);

- phys = virt_to_phys(ret);
+ /* At this point dma_handle is the physical address, next we are
+ * going to set it to the machine address.
+ * Do not use virt_to_phys(ret) because on ARM it doesn't correspond
+ * to *dma_handle. */
+ phys = *dma_handle;
dev_addr = xen_phys_to_bus(phys);
if (!xen_feature(XENFEAT_auto_translated_physmap) &&
((dev_addr + size - 1 <= dma_mask)) &&
!range_straddles_page_boundary(phys, size))
*dma_handle = dev_addr;
else {
- if (xen_create_contiguous_region(vstart, order,
+ if (xen_create_contiguous_region(phys, order,
fls64(dma_mask), dma_handle) != 0) {
- free_pages(vstart, order);
+ xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
return NULL;
}

@@ -463,18 +472,20 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
if (hwdev && hwdev->coherent_dma_mask)
dma_mask = hwdev->coherent_dma_mask;

- phys = virt_to_phys(vaddr);
+ /* do not use virt_to_phys because on ARM it doesn't return you the
+ * physical address */
+ phys = xen_bus_to_phys(dev_addr);

if (xen_feature(XENFEAT_auto_translated_physmap) ||
(((dev_addr + size - 1 > dma_mask)) ||
range_straddles_page_boundary(phys, size))) {
- xen_destroy_contiguous_region((unsigned long)vaddr, order);
+ xen_destroy_contiguous_region(phys, order);
dma_info = xen_get_dma_info_from_dma(dev_addr);
rb_erase(&dma_info->rbnode, &bus_to_phys);
kfree(dma_info);
}

- free_pages((unsigned long)vaddr, order);
+ xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
}
EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);

diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 9ef704d..fb2ea8f 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -19,11 +19,11 @@ void xen_arch_resume(void);
int xen_setup_shutdown_event(void);

extern unsigned long *xen_contiguous_bitmap;
-int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
+int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle);

-void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order);
+void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order);

struct vm_area_struct;
int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
--
1.7.2.5

2013-09-27 16:35:37

by Stefano Stabellini

Subject: [PATCH v6 14/19] swiotlb: print a warning when the swiotlb is full

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 1 +
lib/swiotlb.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 96ad316..790c2eb 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -674,6 +674,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
sg->length,
dir);
if (map == SWIOTLB_MAP_ERROR) {
+ pr_warn("swiotlb buffer is full\n");
/* Don't panic here, we expect map_sg users
to do proper error handling. */
xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index eb45d17..f06da0d 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -502,6 +502,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,

not_found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
+ pr_warn("swiotlb buffer is full\n");
return SWIOTLB_MAP_ERROR;
found:
spin_unlock_irqrestore(&io_tlb_lock, flags);
--
1.7.2.5

2013-09-27 16:35:45

by Stefano Stabellini

Subject: [PATCH v6 19/19] swiotlb-xen: instead of bouncing on the swiotlb, pin single pages

If we are dealing with single page mappings that don't cross page
boundaries, we can try to pin the page and get the corresponding mfn,
using xen_pin_page. This avoids going through the swiotlb bounce
buffer. If xen_pin_page fails (because the underlying mfn doesn't
respect the dma_mask) fall back to the swiotlb bounce buffer.
Add a refcount to xen_dma_info, so that we can avoid pinning pages that
are already pinned.
Use a spinlock to protect accesses, insertions and deletions in the
rbtrees.

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 152 ++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 143 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 022bcaf..6f94285 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -57,6 +57,8 @@
#define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
static char *xen_io_tlb_start, *xen_io_tlb_end;
static unsigned long xen_io_tlb_nslabs;
+spinlock_t swiotlb_lock;
+
/*
* Quick lookup value of the bus address of the IOTLB.
*/
@@ -79,6 +81,7 @@ struct xen_dma_info {
dma_addr_t dma_addr;
phys_addr_t phys_addr;
size_t size;
+ atomic_t refs;
struct rb_node rbnode_dma;
struct rb_node rbnode_phys;
};
@@ -254,6 +257,48 @@ static dma_addr_t xen_virt_to_bus(void *address)
return xen_phys_to_bus_quick(virt_to_phys(address));
}

+static int xen_pin_dev_page(struct device *dev,
+ phys_addr_t phys,
+ dma_addr_t *dev_addr)
+{
+ u64 dma_mask = DMA_BIT_MASK(32);
+ xen_pfn_t in;
+ struct xen_dma_info *dma_info = xen_get_dma_info_from_phys(phys);
+
+ if (dma_info != NULL) {
+ atomic_inc(&dma_info->refs);
+ *dev_addr = dma_info->dma_addr + (phys - dma_info->phys_addr);
+ return 0;
+ }
+
+ if (dev && dev->coherent_dma_mask)
+ dma_mask = dma_alloc_coherent_mask(dev, GFP_KERNEL);
+
+ in = phys >> PAGE_SHIFT;
+ if (!xen_pin_page(&in, fls64(dma_mask))) {
+ *dev_addr = in << PAGE_SHIFT;
+ dma_info = kzalloc(sizeof(struct xen_dma_info), GFP_NOWAIT);
+ if (!dma_info) {
+ pr_warn("cannot allocate xen_dma_info\n");
+ xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
+ return -ENOMEM;
+ }
+ dma_info->phys_addr = phys & PAGE_MASK;
+ dma_info->size = PAGE_SIZE;
+ dma_info->dma_addr = *dev_addr;
+ if (xen_dma_add_entry(dma_info)) {
+ pr_warn("cannot add new entry to bus_to_phys\n");
+ xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
+ kfree(dma_info);
+ return -EFAULT;
+ }
+ atomic_set(&dma_info->refs, 1);
+ *dev_addr += (phys & ~PAGE_MASK);
+ return 0;
+ }
+ return -EFAULT;
+}
+
static int check_pages_physically_contiguous(unsigned long pfn,
unsigned int offset,
size_t length)
@@ -434,6 +479,7 @@ retry:
rc = 0;
} else
rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
+ spin_lock_init(&swiotlb_lock);
return rc;
error:
if (repeat--) {
@@ -461,6 +507,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
phys_addr_t phys;
dma_addr_t dev_addr;
struct xen_dma_info *dma_info = NULL;
+ unsigned long irqflags;

/*
* Ignore region specifiers - the kernel's ideas of
@@ -497,7 +544,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
!range_straddles_page_boundary(phys, size))
*dma_handle = dev_addr;
else {
- if (xen_create_contiguous_region(phys, order,
+ if (xen_create_contiguous_region(phys & PAGE_MASK, order,
fls64(dma_mask), dma_handle) != 0) {
xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
return NULL;
@@ -509,15 +556,19 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
xen_destroy_contiguous_region(phys, order);
return NULL;
}
- dma_info->phys_addr = phys;
- dma_info->size = size;
+ dma_info->phys_addr = phys & PAGE_MASK;
+ dma_info->size = (1U << order) << PAGE_SHIFT;
dma_info->dma_addr = *dma_handle;
+ atomic_set(&dma_info->refs, 1);
+ spin_lock_irqsave(&swiotlb_lock, irqflags);
if (xen_dma_add_entry(dma_info)) {
+ spin_unlock_irqrestore(&swiotlb_lock, irqflags);
pr_warn("cannot add new entry to bus_to_phys\n");
xen_destroy_contiguous_region(phys, order);
kfree(dma_info);
return NULL;
}
+ spin_unlock_irqrestore(&swiotlb_lock, irqflags);
}
memset(ret, 0, size);
return ret;
@@ -532,6 +583,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
phys_addr_t phys;
u64 dma_mask = DMA_BIT_MASK(32);
struct xen_dma_info *dma_info = NULL;
+ unsigned long flags;

if (dma_release_from_coherent(hwdev, order, vaddr))
return;
@@ -539,6 +591,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
if (hwdev && hwdev->coherent_dma_mask)
dma_mask = hwdev->coherent_dma_mask;

+ spin_lock_irqsave(&swiotlb_lock, flags);
/* do not use virt_to_phys because on ARM it doesn't return you the
* physical address */
phys = xen_bus_to_phys(dev_addr);
@@ -546,12 +599,16 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
if (xen_feature(XENFEAT_auto_translated_physmap) ||
(((dev_addr + size - 1 > dma_mask)) ||
range_straddles_page_boundary(phys, size))) {
- xen_destroy_contiguous_region(phys, order);
dma_info = xen_get_dma_info_from_dma(dev_addr);
- rb_erase(&dma_info->rbnode, &bus_to_phys);
- kfree(dma_info);
+ if (atomic_dec_and_test(&dma_info->refs)) {
+ xen_destroy_contiguous_region(phys & PAGE_MASK, order);
+ rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
+ rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
+ kfree(dma_info);
+ }
}

+ spin_unlock_irqrestore(&swiotlb_lock, flags);
xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
}
EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
@@ -583,6 +640,23 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
!range_straddles_page_boundary(phys, size) && !swiotlb_force)
return dev_addr;

+ if (xen_feature(XENFEAT_auto_translated_physmap) &&
+ size <= PAGE_SIZE &&
+ !range_straddles_page_boundary(phys, size) &&
+ !swiotlb_force) {
+ unsigned long flags;
+ int rc;
+
+ spin_lock_irqsave(&swiotlb_lock, flags);
+ rc = xen_pin_dev_page(dev, phys, &dev_addr);
+ spin_unlock_irqrestore(&swiotlb_lock, flags);
+
+ if (!rc) {
+ dma_mark_clean(phys_to_virt(phys), size);
+ return dev_addr;
+ }
+ }
+
/*
* Oh well, have to allocate and map a bounce buffer.
* Pass the dma_addr of the first slab in the iotlb buffer as
@@ -618,10 +692,37 @@ EXPORT_SYMBOL_GPL(xen_swiotlb_map_page);
static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir)
{
- phys_addr_t paddr = xen_bus_to_phys(dev_addr);
+ struct xen_dma_info *dma_info;
+ phys_addr_t paddr = DMA_ERROR_CODE;
+ char *vaddr = NULL;
+ unsigned long flags;

BUG_ON(dir == DMA_NONE);

+ spin_lock_irqsave(&swiotlb_lock, flags);
+ dma_info = xen_get_dma_info_from_dma(dev_addr);
+ if (dma_info != NULL) {
+ paddr = dma_info->phys_addr + (dev_addr - dma_info->dma_addr);
+ vaddr = phys_to_virt(paddr);
+ }
+
+ if (xen_feature(XENFEAT_auto_translated_physmap) &&
+ paddr != DMA_ERROR_CODE &&
+ !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
+ !swiotlb_force) {
+ if (atomic_dec_and_test(&dma_info->refs)) {
+ xen_destroy_contiguous_region(paddr & PAGE_MASK, 0);
+ rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
+ rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
+ kfree(dma_info);
+ }
+ spin_unlock_irqrestore(&swiotlb_lock, flags);
+ if ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))
+ dma_mark_clean(vaddr, size);
+ return;
+ }
+ spin_unlock_irqrestore(&swiotlb_lock, flags);
+
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr)) {
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir);
@@ -664,9 +765,19 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
enum dma_sync_target target)
{
phys_addr_t paddr = xen_bus_to_phys(dev_addr);
+ char *vaddr = phys_to_virt(paddr);

BUG_ON(dir == DMA_NONE);

+ if (xen_feature(XENFEAT_auto_translated_physmap) &&
+ paddr != DMA_ERROR_CODE &&
+ size <= PAGE_SIZE &&
+ !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
+ !range_straddles_page_boundary(paddr, size) && !swiotlb_force) {
+ dma_mark_clean(vaddr, size);
+ return;
+ }
+
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr)) {
swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
@@ -717,13 +828,36 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
struct dma_attrs *attrs)
{
struct scatterlist *sg;
- int i;
+ int i, rc;
+ u64 dma_mask = DMA_BIT_MASK(32);
+ unsigned long flags;

BUG_ON(dir == DMA_NONE);

+ if (hwdev && hwdev->coherent_dma_mask)
+ dma_mask = dma_alloc_coherent_mask(hwdev, GFP_KERNEL);
+
for_each_sg(sgl, sg, nelems, i) {
phys_addr_t paddr = sg_phys(sg);
- dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
+ dma_addr_t dev_addr;
+
+ if (xen_feature(XENFEAT_auto_translated_physmap) &&
+ !range_straddles_page_boundary(paddr, sg->length) &&
+ sg->length <= PAGE_SIZE &&
+ !swiotlb_force) {
+
+ spin_lock_irqsave(&swiotlb_lock, flags);
+ rc = xen_pin_dev_page(hwdev, paddr, &dev_addr);
+ spin_unlock_irqrestore(&swiotlb_lock, flags);
+
+ if (!rc) {
+ dma_mark_clean(phys_to_virt(paddr), sg->length);
+ sg_dma_len(sg) = sg->length;
+ sg->dma_address = dev_addr;
+ continue;
+ }
+ }
+ dev_addr = xen_phys_to_bus_quick(paddr);

if (swiotlb_force ||
xen_feature(XENFEAT_auto_translated_physmap) ||
--
1.7.2.5

2013-09-27 16:35:53

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH v6 16/19] arm,arm64: do not always merge biovec if we are running on Xen

This is similar to what is done on X86: biovecs are prevented from merging,
otherwise every dma request would be forced to bounce on the swiotlb buffer.

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/arm/include/asm/io.h | 8 ++++++++
arch/arm64/include/asm/io.h | 9 +++++++++
drivers/xen/biomerge.c | 4 +++-
3 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index d070741..c45effc 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -24,9 +24,11 @@
#ifdef __KERNEL__

#include <linux/types.h>
+#include <linux/blk_types.h>
#include <asm/byteorder.h>
#include <asm/memory.h>
#include <asm-generic/pci_iomap.h>
+#include <xen/xen.h>

/*
* ISA I/O bus memory addresses are 1:1 with the physical address.
@@ -372,6 +374,12 @@ extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr);
#define BIOVEC_MERGEABLE(vec1, vec2) \
((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))

+extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
+ const struct bio_vec *vec2);
+#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
+ (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
+ (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
+
#ifdef CONFIG_MMU
#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
extern int valid_phys_addr_range(phys_addr_t addr, size_t size);
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 1d12f89..c163287b 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -22,11 +22,14 @@
#ifdef __KERNEL__

#include <linux/types.h>
+#include <linux/blk_types.h>

#include <asm/byteorder.h>
#include <asm/barrier.h>
#include <asm/pgtable.h>

+#include <xen/xen.h>
+
/*
* Generic IO read/write. These perform native-endian accesses.
*/
@@ -263,5 +266,11 @@ extern int devmem_is_allowed(unsigned long pfn);
*/
#define xlate_dev_kmem_ptr(p) p

+extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
+ const struct bio_vec *vec2);
+#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
+ (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
+ (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
+
#endif /* __KERNEL__ */
#endif /* __ASM_IO_H */
diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
index 0edb91c..f323a2d 100644
--- a/drivers/xen/biomerge.c
+++ b/drivers/xen/biomerge.c
@@ -2,6 +2,7 @@
#include <linux/io.h>
#include <linux/export.h>
#include <xen/page.h>
+#include <xen/features.h>

bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
const struct bio_vec *vec2)
@@ -9,7 +10,8 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
unsigned long mfn1 = pfn_to_mfn(page_to_pfn(vec1->bv_page));
unsigned long mfn2 = pfn_to_mfn(page_to_pfn(vec2->bv_page));

- return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
+ return !xen_feature(XENFEAT_auto_translated_physmap) &&
+ __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
((mfn1 == mfn2) || ((mfn1+1) == mfn2));
}
EXPORT_SYMBOL(xen_biovec_phys_mergeable);
--
1.7.2.5

2013-09-27 16:36:00

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH v6 15/19] swiotlb-xen: call dma_capable only if dev->dma_mask is allocated

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 790c2eb..3011736 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -512,7 +512,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
* buffering it.
*/
if (!xen_feature(XENFEAT_auto_translated_physmap) &&
- dma_capable(dev, dev_addr, size) &&
+ dev->dma_mask && dma_capable(dev, dev_addr, size) &&
!range_straddles_page_boundary(phys, size) && !swiotlb_force)
return dev_addr;

@@ -532,7 +532,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
/*
* Ensure that the address returned is DMA'ble
*/
- if (!dma_capable(dev, dev_addr, size)) {
+ if (dev->dma_mask && !dma_capable(dev, dev_addr, size)) {
swiotlb_tbl_unmap_single(dev, map, size, dir);
dev_addr = 0;
}
@@ -660,7 +660,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,

if (swiotlb_force ||
xen_feature(XENFEAT_auto_translated_physmap) ||
- !dma_capable(hwdev, dev_addr, sg->length) ||
+ (hwdev->dma_mask && !dma_capable(hwdev, dev_addr, sg->length)) ||
range_straddles_page_boundary(paddr, sg->length)) {
/*
* Pass the dma_addr of the first slab in the iotlb buffer as
--
1.7.2.5

2013-09-27 16:36:07

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH v6 17/19] xen: introduce XENMEM_pin

Introduce a new hypercall to pin one or more pages whose machine
addresses respect a dma_mask passed as an argument.
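
A hypothetical usage sketch of the new helper (the 32-bit limit and the
"page" variable are just examples; error handling omitted):

	xen_pfn_t frame = page_to_pfn(page);	/* page we want to DMA to/from */
	dma_addr_t dev_addr;

	if (xen_pin_page(&frame, 32) == 0) {
		/* on success Xen has written the machine frame number back */
		dev_addr = (dma_addr_t)frame << PAGE_SHIFT;
	}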

Signed-off-by: Stefano Stabellini <[email protected]>
---
arch/arm/xen/mm.c | 16 ++++++++++++++++
arch/x86/xen/mmu.c | 7 +++++++
include/xen/interface/memory.h | 32 ++++++++++++++++++++++++++++++++
include/xen/xen-ops.h | 1 +
4 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index b305b94..146c1c3 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -55,6 +55,22 @@ static int xen_exchange_memory(xen_ulong_t extents_in,
return success;
}

+int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits)
+{
+ struct xen_pin pin = {
+ .in = {
+ .nr_extents = 1,
+ .extent_order = 0,
+ .domid = DOMID_SELF,
+ .address_bits = address_bits
+ },
+ };
+ set_xen_guest_handle(pin.in.extent_start, in_frame);
+
+ return HYPERVISOR_memory_op(XENMEM_pin, &pin);
+}
+EXPORT_SYMBOL_GPL(xen_pin_page);
+
int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle)
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 8830883..8f76ce2 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2568,3 +2568,10 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
return -EINVAL;
}
EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);
+
+int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits)
+{
+ return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_pin_page);
+
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 49db252..66ab578 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -314,4 +314,36 @@ struct xen_unpin {
};
DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);

+/*
+ * XENMEM_pin pins a set of pages to make sure that the hypervisor does
+ * not change the p2m mappings for them.
+ *
+ */
+#define XENMEM_pin 28
+struct xen_pin {
+ /*
+ * [IN/OUT] Details of memory extents to be pinned (GMFN bases).
+ * Xen copies back the MFNs corresponding to the GMFNs passed in as
+ * argument.
+ * @in.address_bits contains the maximum number of bits addressable
+ * by the caller. If the machine addresses of the pages to be pinned
+ * are not addressable according to @in.address_bits, the hypercall
+ * fails and returns an error. The pages are not pinned. Otherwise
+ * the hypercall succeeds.
+ */
+ struct xen_memory_reservation in;
+
+ /*
+ * [OUT] Number of input extents that were successfully pinned.
+ * 1. The first @nr_pinned input extents were successfully
+ * pinned.
+ * 2. All other input extents are untouched.
+ * 3. If not all input extents are pinned then the return code of this
+ * command will be non-zero.
+ */
+ xen_ulong_t nr_pinned;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_pin);
+
+
#endif /* __XEN_PUBLIC_MEMORY_H__ */
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index fb2ea8f..4cf4fc5 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -22,6 +22,7 @@ extern unsigned long *xen_contiguous_bitmap;
int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
unsigned int address_bits,
dma_addr_t *dma_handle);
+int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits);

void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order);

--
1.7.2.5

2013-09-27 16:36:14

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH v6 12/19] swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent

The swiotlb code has appropriate calls to dma_mark_clean in place for
buffers passed to swiotlb_map_page as an argument. However it assumes
that the swiotlb bounce buffer (io_tlb_start-io_tlb_end) is already
coherent and doesn't need any calls to dma_mark_clean.

On ARM the swiotlb bounce buffer is not coherent (the memory is
writealloc while it should be bufferable) and therefore we need to call
dma_mark_clean appropriately on the bounce buffer code paths too.

Note that most architectures have an empty dma_mark_clean implementation
anyway.
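
For comparison, on architectures that need no extra cache maintenance the
hook is just an empty inline, roughly:

	static inline void dma_mark_clean(void *addr, size_t size) { }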

Signed-off-by: Stefano Stabellini <[email protected]>
---
lib/swiotlb.c | 13 ++++++++++---
1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 4e8686c..eb45d17 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -515,6 +515,7 @@ found:
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
+ dma_mark_clean(phys_to_virt(tlb_addr), size);

return tlb_addr;
}
@@ -547,7 +548,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
* First, sync the memory before unmapping the entry
*/
if (orig_addr && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
+ {
+ dma_mark_clean(phys_to_virt(tlb_addr), size);
swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
+ }

/*
* Return the buffer to the free list by setting the corresponding
@@ -587,17 +591,20 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,

switch (target) {
case SYNC_FOR_CPU:
- if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+ if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
+ dma_mark_clean(phys_to_virt(tlb_addr), size);
swiotlb_bounce(orig_addr, tlb_addr,
size, DMA_FROM_DEVICE);
+ }
else
BUG_ON(dir != DMA_TO_DEVICE);
break;
case SYNC_FOR_DEVICE:
- if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
swiotlb_bounce(orig_addr, tlb_addr,
size, DMA_TO_DEVICE);
- else
+ dma_mark_clean(phys_to_virt(tlb_addr), size);
+ } else
BUG_ON(dir != DMA_FROM_DEVICE);
break;
default:
--
1.7.2.5

2013-09-27 16:36:22

by Stefano Stabellini

[permalink] [raw]
Subject: [PATCH v6 18/19] swiotlb-xen: introduce a rbtree to track phys to bus mappings

Introduce a second red-black tree to track phys to bus mappings created after
the initialization of the swiotlb buffer.
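
A minimal sketch of how the two trees are meant to be queried, using the
helpers added below (the offset arithmetic matches the later pinning patch):

	struct xen_dma_info *d;

	/* phys -> bus lookup, used on the map path */
	d = xen_get_dma_info_from_phys(paddr);
	if (d)
		dev_addr = d->dma_addr + (paddr - d->phys_addr);

	/* bus -> phys lookup, used on the unmap/free path */
	d = xen_get_dma_info_from_dma(dev_addr);
	if (d)
		paddr = d->phys_addr + (dev_addr - d->dma_addr);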

Signed-off-by: Stefano Stabellini <[email protected]>
---
drivers/xen/swiotlb-xen.c | 99 +++++++++++++++++++++++++++++++++++++-------
1 files changed, 83 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 3011736..022bcaf 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -79,7 +79,8 @@ struct xen_dma_info {
dma_addr_t dma_addr;
phys_addr_t phys_addr;
size_t size;
- struct rb_node rbnode;
+ struct rb_node rbnode_dma;
+ struct rb_node rbnode_phys;
};

/*
@@ -96,8 +97,13 @@ static struct xen_dma_info *xen_dma_seg;
* mappings.
*/
static struct rb_root bus_to_phys = RB_ROOT;
+/*
+ * This tree keeps track of physical address to bus address
+ * mappings apart from the ones belonging to the initial swiotlb buffer.
+ */
+static struct rb_root phys_to_bus = RB_ROOT;

-static int xen_dma_add_entry(struct xen_dma_info *new)
+static int xen_dma_add_entry_bus(struct xen_dma_info *new)
{
struct rb_node **link = &bus_to_phys.rb_node;
struct rb_node *parent = NULL;
@@ -106,7 +112,7 @@ static int xen_dma_add_entry(struct xen_dma_info *new)

while (*link) {
parent = *link;
- entry = rb_entry(parent, struct xen_dma_info, rbnode);
+ entry = rb_entry(parent, struct xen_dma_info, rbnode_dma);

if (new->dma_addr == entry->dma_addr)
goto err_out;
@@ -118,8 +124,41 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
else
link = &(*link)->rb_right;
}
- rb_link_node(&new->rbnode, parent, link);
- rb_insert_color(&new->rbnode, &bus_to_phys);
+ rb_link_node(&new->rbnode_dma, parent, link);
+ rb_insert_color(&new->rbnode_dma, &bus_to_phys);
+ goto out;
+
+err_out:
+ rc = -EINVAL;
+ pr_warn("%s: cannot add phys=%pa -> dma=%pa: phys=%pa -> dma=%pa already exists\n",
+ __func__, &new->phys_addr, &new->dma_addr, &entry->phys_addr, &entry->dma_addr);
+out:
+ return rc;
+}
+
+static int xen_dma_add_entry_phys(struct xen_dma_info *new)
+{
+ struct rb_node **link = &phys_to_bus.rb_node;
+ struct rb_node *parent = NULL;
+ struct xen_dma_info *entry;
+ int rc = 0;
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct xen_dma_info, rbnode_phys);
+
+ if (new->dma_addr == entry->dma_addr)
+ goto err_out;
+ if (new->phys_addr == entry->phys_addr)
+ goto err_out;
+
+ if (new->phys_addr < entry->phys_addr)
+ link = &(*link)->rb_left;
+ else
+ link = &(*link)->rb_right;
+ }
+ rb_link_node(&new->rbnode_phys, parent, link);
+ rb_insert_color(&new->rbnode_phys, &phys_to_bus);
goto out;

err_out:
@@ -130,13 +169,22 @@ out:
return rc;
}

+static int xen_dma_add_entry(struct xen_dma_info *new)
+{
+ int rc;
+ if ((rc = xen_dma_add_entry_bus(new) < 0) ||
+ (rc = xen_dma_add_entry_phys(new) < 0))
+ return rc;
+ return 0;
+}
+
static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
{
struct rb_node *n = bus_to_phys.rb_node;
struct xen_dma_info *entry;

while (n) {
- entry = rb_entry(n, struct xen_dma_info, rbnode);
+ entry = rb_entry(n, struct xen_dma_info, rbnode_dma);
if (entry->dma_addr <= dma_addr &&
entry->dma_addr + entry->size > dma_addr) {
return entry;
@@ -150,11 +198,30 @@ static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
return NULL;
}

-static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
+static struct xen_dma_info *xen_get_dma_info_from_phys(phys_addr_t phys)
{
- int nr_seg;
- unsigned long offset;
- char *vaddr;
+ struct rb_node *n = phys_to_bus.rb_node;
+ struct xen_dma_info *entry;
+
+ while (n) {
+ entry = rb_entry(n, struct xen_dma_info, rbnode_phys);
+ if (entry->phys_addr <= phys &&
+ entry->phys_addr + entry->size > phys) {
+ return entry;
+ }
+ if (phys < entry->phys_addr)
+ n = n->rb_left;
+ else
+ n = n->rb_right;
+ }
+
+ return NULL;
+}
+
+/* Only looks into the initial buffer allocation in case of
+ * XENFEAT_auto_translated_physmap guests. */
+static dma_addr_t xen_phys_to_bus_quick(phys_addr_t paddr)
+{
+ int nr_seg;
+ unsigned long offset;
+ char *vaddr;

if (!xen_feature(XENFEAT_auto_translated_physmap))
return phys_to_machine(XPADDR(paddr)).maddr;
@@ -184,7 +251,7 @@ static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)

static dma_addr_t xen_virt_to_bus(void *address)
{
- return xen_phys_to_bus(virt_to_phys(address));
+ return xen_phys_to_bus_quick(virt_to_phys(address));
}

static int check_pages_physically_contiguous(unsigned long pfn,
@@ -424,7 +491,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
* Do not use virt_to_phys(ret) because on ARM it doesn't correspond
* to *dma_handle. */
phys = *dma_handle;
- dev_addr = xen_phys_to_bus(phys);
+ dev_addr = xen_phys_to_bus_quick(phys);
if (!xen_feature(XENFEAT_auto_translated_physmap) &&
((dev_addr + size - 1 <= dma_mask)) &&
!range_straddles_page_boundary(phys, size))
@@ -503,7 +570,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
struct dma_attrs *attrs)
{
phys_addr_t map, phys = page_to_phys(page) + offset;
- dma_addr_t dev_addr = xen_phys_to_bus(phys);
+ dma_addr_t dev_addr = xen_phys_to_bus_quick(phys);

BUG_ON(dir == DMA_NONE);
/*
@@ -527,7 +594,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
if (map == SWIOTLB_MAP_ERROR)
return DMA_ERROR_CODE;

- dev_addr = xen_phys_to_bus(map);
+ dev_addr = xen_phys_to_bus_quick(map);

/*
* Ensure that the address returned is DMA'ble
@@ -656,7 +723,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,

for_each_sg(sgl, sg, nelems, i) {
phys_addr_t paddr = sg_phys(sg);
- dma_addr_t dev_addr = xen_phys_to_bus(paddr);
+ dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);

if (swiotlb_force ||
xen_feature(XENFEAT_auto_translated_physmap) ||
@@ -682,7 +749,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
sg_dma_len(sgl) = 0;
return DMA_ERROR_CODE;
}
- sg->dma_address = xen_phys_to_bus(map);
+ sg->dma_address = xen_phys_to_bus_quick(map);
} else
sg->dma_address = dev_addr;
sg_dma_len(sg) = sg->length;
--
1.7.2.5

2013-09-30 10:48:33

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v6 02/19] arm64: define DMA_ERROR_CODE

On Fri, Sep 27, 2013 at 05:09:50PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
> CC: [email protected]
> CC: [email protected]

Acked-by: Catalin Marinas <[email protected]>

2013-09-30 10:49:24

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v6 09/19] arm64/xen: get_dma_ops: return xen_dma_ops if we are running on Xen

On Fri, Sep 27, 2013 at 05:09:57PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
> Suggested-by: Catalin Marinas <[email protected]>
> CC: [email protected]
> CC: [email protected]

Acked-by: Catalin Marinas <[email protected]>

2013-09-30 14:54:43

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 03/19] xen: introduce XENMEM_exchange_and_pin and XENMEM_unpin

On Fri, Sep 27, 2013 at 05:09:51PM +0100, Stefano Stabellini wrote:
> XENMEM_exchange can't be used by autotranslate guests because of two
> severe limitations:
>
> - it does not copy back the mfns into the out field for autotranslate
> guests;
>
> - it does not guarantee that the hypervisor won't change the p2m
> mappings for the exchanged pages while the guest is using them. Xen
> never promises to keep the p2m mapping stable for autotranslate guests
> in general. In practice it won't happen unless one uses uncommon
> features like memory sharing or paging.
>
> To overcome these problems I am introducing two new hypercalls.
>
> Signed-off-by: Stefano Stabellini <[email protected]>

Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
>
>
> Changes in v6:
> - update the comments and the hypercalls structures.
>
> Changes in v5:
> - better comment for XENMEM_exchange_and_pin return codes;
>
> Changes in v4:
> - rename XENMEM_get_dma_buf to XENMEM_exchange_and_pin;
> - rename XENMEM_put_dma_buf to XENMEM_unpin;
> - improve the documentation of the new hypercalls;
> - add a note about out.address_bits for XENMEM_exchange.
> ---
> include/xen/interface/memory.h | 51 ++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 51 insertions(+), 0 deletions(-)
>
> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 2ecfe4f..49db252 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -66,6 +66,9 @@ struct xen_memory_exchange {
> /*
> * [IN] Details of memory extents to be exchanged (GMFN bases).
> * Note that @in.address_bits is ignored and unused.
> + * @out.address_bits contains the maximum number of bits addressable
> + * by the caller. The addresses of the newly allocated pages have to
> + * meet this restriction.
> */
> struct xen_memory_reservation in;
>
> @@ -263,4 +266,52 @@ struct xen_remove_from_physmap {
> };
> DEFINE_GUEST_HANDLE_STRUCT(xen_remove_from_physmap);
>
> +/*
> + * This hypercall is similar to XENMEM_exchange: it takes the same
> + * struct as an argument and it exchanges the pages passed in with a new
> + * set of pages. The new pages are going to be "pinned": it's guaranteed
> + * that their p2m mapping won't be changed until explicitly "unpinned".
> + * Only normal guest r/w memory can be pinned: no granted pages or
> + * ballooned pages.
> + * If return code is zero then @out.extent_list provides the frame
> + * numbers of the newly-allocated memory.
> + * On X86 the frame numbers are machine frame numbers (mfns).
> + * On ARMv7 and ARMv8 the frame numbers are machine frame numbers (mfns).
> + * Returns zero on complete success, otherwise a negative error code.
> + * The most common error codes are:
> + * -ENOSYS if not implemented
> + * -EPERM if the domain is not privileged for this operation
> + * -EBUSY if the page is already pinned
> + * -EFAULT if an internal error occurs
> + * On complete success then always @nr_exchanged == @in.nr_extents. On
> + * partial success @nr_exchanged indicates how much work was done and a
> + * negative error code is returned.
> + */
> +#define XENMEM_exchange_and_pin 26
> +
> +/*
> + * XENMEM_unpin unpins a set of pages, previously pinned by
> + * XENMEM_exchange_and_pin. After this call the p2m mapping of the pages can
> + * be transparently changed by the hypervisor, as usual. The pages are
> + * still accessible from the guest.
> + */
> +#define XENMEM_unpin 27
> +struct xen_unpin {
> + /*
> + * [IN] Details of memory extents to be unpinned (GMFN bases).
> + * Note that @in.address_bits is ignored and unused.
> + */
> + struct xen_memory_reservation in;
> + /*
> + * [OUT] Number of input extents that were successfully unpinned.
> + * 1. The first @nr_unpinned input extents were successfully
> + * unpinned.
> + * 2. All other input extents are untouched.
> + * 3. If not all input extents are unpinned then the return code of this
> + * command will be non-zero.
> + */
> + xen_ulong_t nr_unpinned;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);
> +
> #endif /* __XEN_PUBLIC_MEMORY_H__ */
> --
> 1.7.2.5
>

2013-09-30 15:15:05

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 06/19] xen/arm,arm64: enable SWIOTLB_XEN

On Fri, Sep 27, 2013 at 05:09:54PM +0100, Stefano Stabellini wrote:
> Xen on arm and arm64 needs SWIOTLB_XEN: when running on Xen we need to
> program the hardware with mfns rather than pfns for dma addresses.
> Remove SWIOTLB_XEN dependency on X86 and PCI and make XEN select
> SWIOTLB_XEN on arm and arm64.
>
> At the moment always rely on swiotlb-xen, but when Xen starts supporting
> hardware IOMMUs we'll be able to avoid it conditionally on the presence
> of an IOMMU on the platform.
>
> Implement xen_create_contiguous_region on arm and arm64 by using
> XENMEM_exchange_and_pin.
>
> Initialize the xen-swiotlb from xen_early_init (before the native
> dma_ops are initialized), set xen_dma_ops to &xen_swiotlb_dma_ops.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
>
>
> Changes in v6:
> - introduce and export xen_dma_ops;
> - call xen_mm_init from as arch_initcall.
>
> Changes in v4:
> - remove redefinition of DMA_ERROR_CODE;
> - update the code to use XENMEM_exchange_and_pin and XENMEM_unpin;
> - add a note about hardware IOMMU in the commit message.
>
> Changes in v3:
> - code style changes;
> - warn on XENMEM_put_dma_buf failures.
> ---
> arch/arm/Kconfig | 1 +
> arch/arm/include/asm/xen/hypervisor.h | 2 +
> arch/arm/include/asm/xen/page.h | 2 +
> arch/arm/xen/Makefile | 2 +-
> arch/arm/xen/mm.c | 121 +++++++++++++++++++++++++++++++++
> arch/arm64/Kconfig | 1 +
> arch/arm64/xen/Makefile | 2 +-
> drivers/xen/Kconfig | 1 -
> drivers/xen/swiotlb-xen.c | 16 +++++
> 9 files changed, 145 insertions(+), 3 deletions(-)
> create mode 100644 arch/arm/xen/mm.c
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index c0bfb33..2c9d112 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1848,6 +1848,7 @@ config XEN
> depends on CPU_V7 && !CPU_V6
> depends on !GENERIC_ATOMIC64
> select ARM_PSCI
> + select SWIOTLB_XEN
> help
> Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
>
> diff --git a/arch/arm/include/asm/xen/hypervisor.h b/arch/arm/include/asm/xen/hypervisor.h
> index d7ab99a..1317ee4 100644
> --- a/arch/arm/include/asm/xen/hypervisor.h
> +++ b/arch/arm/include/asm/xen/hypervisor.h
> @@ -16,4 +16,6 @@ static inline enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
> return PARAVIRT_LAZY_NONE;
> }
>
> +extern struct dma_map_ops *xen_dma_ops;
> +
> #endif /* _ASM_ARM_XEN_HYPERVISOR_H */
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 359a7b5..b0f7150 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -6,12 +6,14 @@
>
> #include <linux/pfn.h>
> #include <linux/types.h>
> +#include <linux/dma-mapping.h>
>
> #include <xen/interface/grant_table.h>
>
> #define pfn_to_mfn(pfn) (pfn)
> #define phys_to_machine_mapping_valid(pfn) (1)
> #define mfn_to_pfn(mfn) (mfn)
> +#define mfn_to_local_pfn(m) (mfn_to_pfn(m))
> #define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))
>
> #define pte_mfn pte_pfn
> diff --git a/arch/arm/xen/Makefile b/arch/arm/xen/Makefile
> index 4384103..66fc35d 100644
> --- a/arch/arm/xen/Makefile
> +++ b/arch/arm/xen/Makefile
> @@ -1 +1 @@
> -obj-y := enlighten.o hypercall.o grant-table.o
> +obj-y := enlighten.o hypercall.o grant-table.o mm.o
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> new file mode 100644
> index 0000000..b065c98
> --- /dev/null
> +++ b/arch/arm/xen/mm.c
> @@ -0,0 +1,121 @@
> +#include <linux/bootmem.h>
> +#include <linux/gfp.h>
> +#include <linux/export.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/vmalloc.h>
> +#include <linux/swiotlb.h>
> +
> +#include <xen/xen.h>
> +#include <xen/interface/memory.h>
> +#include <xen/swiotlb-xen.h>
> +
> +#include <asm/cacheflush.h>
> +#include <asm/xen/page.h>
> +#include <asm/xen/hypercall.h>
> +#include <asm/xen/interface.h>
> +
> +static int xen_exchange_memory(xen_ulong_t extents_in,
> + unsigned int order_in,
> + xen_pfn_t *pfns_in,
> + xen_ulong_t extents_out,
> + unsigned int order_out,
> + xen_pfn_t *mfns_out,
> + unsigned int address_bits)
> +{
> + long rc;
> + int success;
> +
> + struct xen_memory_exchange exchange = {
> + .in = {
> + .nr_extents = extents_in,
> + .extent_order = order_in,
> + .domid = DOMID_SELF
> + },
> + .out = {
> + .nr_extents = extents_out,
> + .extent_order = order_out,
> + .address_bits = address_bits,
> + .domid = DOMID_SELF
> + }

I think you need to set .nr_exchanged = 0 just in case there is
garbage on the stack and the hypercall is -ENOSYS.

> + };
> + set_xen_guest_handle(exchange.in.extent_start, pfns_in);
> + set_xen_guest_handle(exchange.out.extent_start, mfns_out);
> +
> + BUG_ON(extents_in << order_in != extents_out << order_out);
> +
> +
> + rc = HYPERVISOR_memory_op(XENMEM_exchange_and_pin, &exchange);
> + success = (exchange.nr_exchanged == extents_in);
> +
> + BUG_ON(!success && ((exchange.nr_exchanged != 0) || (rc == 0)));

I think you need to set nr_exchanged = 0 before you make the hypercall.
Otherwise you can get this:

a). Say rc = -ENOSYS, in which case exchange.nr_exchanged won't be touched.
success = 0

BUG(!0 && (<garbage> != 0)) <= BOOM.

If the hypercall failed (say -EBUSY), we have:
success = 0

BUG(!0 && (1)) <= BOOM

Which I thought would be more of a normal error - and we would
just return the number of successful operations.
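
One way to make that explicit (a sketch reusing the fields from the quoted
code; the designated initializer already zero-fills unnamed members, so the
extra line mainly documents the intent):

	struct xen_memory_exchange exchange = {
		.in = {
			.nr_extents   = extents_in,
			.extent_order = order_in,
			.domid        = DOMID_SELF
		},
		.out = {
			.nr_extents   = extents_out,
			.extent_order = order_out,
			.address_bits = address_bits,
			.domid        = DOMID_SELF
		},
		.nr_exchanged = 0,	/* cannot be mistaken for partial
					 * success if the call is -ENOSYS */
	};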

> + BUG_ON(success && (rc != 0));
> +
> + return success;
> +}
> +
> +int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> + unsigned int address_bits,
> + dma_addr_t *dma_handle)
> +{
> + phys_addr_t pstart = __pa(vstart);
> + xen_pfn_t in_frame, out_frame;
> + int success;
> +
> + /* Get a new contiguous memory extent. */
> + in_frame = out_frame = pstart >> PAGE_SHIFT;
> + success = xen_exchange_memory(1, order, &in_frame,
> + 1, order, &out_frame,
> + address_bits);
> +
> + if (!success)
> + return -ENOMEM;
> +
> + *dma_handle = out_frame << PAGE_SHIFT;
> +
> + return success ? 0 : -ENOMEM;
> +}
> +EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
> +
> +void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
> +{
> + xen_pfn_t in_frame = __pa(vstart) >> PAGE_SHIFT;
> + struct xen_unpin unpin = {
> + .in = {
> + .nr_extents = 1,
> + .extent_order = order,
> + .domid = DOMID_SELF
> + },
> + };
> + set_xen_guest_handle(unpin.in.extent_start, &in_frame);
> +
> + WARN_ON(HYPERVISOR_memory_op(XENMEM_unpin, &unpin));
> +}
> +EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
> +
> +struct dma_map_ops *xen_dma_ops;
> +EXPORT_SYMBOL_GPL(xen_dma_ops);
> +
> +static struct dma_map_ops xen_swiotlb_dma_ops = {
> + .mapping_error = xen_swiotlb_dma_mapping_error,
> + .alloc = xen_swiotlb_alloc_coherent,
> + .free = xen_swiotlb_free_coherent,
> + .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu,
> + .sync_single_for_device = xen_swiotlb_sync_single_for_device,
> + .sync_sg_for_cpu = xen_swiotlb_sync_sg_for_cpu,
> + .sync_sg_for_device = xen_swiotlb_sync_sg_for_device,
> + .map_sg = xen_swiotlb_map_sg_attrs,
> + .unmap_sg = xen_swiotlb_unmap_sg_attrs,
> + .map_page = xen_swiotlb_map_page,
> + .unmap_page = xen_swiotlb_unmap_page,
> + .dma_supported = xen_swiotlb_dma_supported,
> +};
> +
> +int __init xen_mm_init(void)
> +{
> + xen_swiotlb_init(1, false);
> + xen_dma_ops = &xen_swiotlb_dma_ops;
> + return 0;
> +}
> +arch_initcall(xen_mm_init);
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9737e97..aa1f6fb 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -209,6 +209,7 @@ config XEN_DOM0
> config XEN
> bool "Xen guest support on ARM64 (EXPERIMENTAL)"
> depends on ARM64 && OF
> + select SWIOTLB_XEN
> help
> Say Y if you want to run Linux in a Virtual Machine on Xen on ARM64.
>
> diff --git a/arch/arm64/xen/Makefile b/arch/arm64/xen/Makefile
> index be24040..0ef9637 100644
> --- a/arch/arm64/xen/Makefile
> +++ b/arch/arm64/xen/Makefile
> @@ -1,2 +1,2 @@
> -xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o)
> +xen-arm-y += $(addprefix ../../arm/xen/, enlighten.o grant-table.o mm.o)
> obj-y := xen-arm.o hypercall.o
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 9e02d60..7e83688 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -140,7 +140,6 @@ config XEN_GRANT_DEV_ALLOC
>
> config SWIOTLB_XEN
> def_bool y
> - depends on PCI && X86
> select SWIOTLB
>
> config XEN_TMEM
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 84aef43..a1cb0f4 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -45,6 +45,8 @@
> #include <xen/xen-ops.h>
> #include <xen/hvc-console.h>
> #include <xen/features.h>
> +#include <asm/dma-mapping.h>
> +
> /*
> * Used to do a quick range check in swiotlb_tbl_unmap_single and
> * swiotlb_tbl_sync_single_*, to see if the memory was in fact allocated by this
> @@ -58,6 +60,20 @@ static unsigned long xen_io_tlb_nslabs;
> * Quick lookup value of the bus address of the IOTLB.
> */
>
> +#ifndef CONFIG_X86
> +static unsigned long dma_alloc_coherent_mask(struct device *dev,
> + gfp_t gfp)
> +{
> + unsigned long dma_mask = 0;
> +
> + dma_mask = dev->coherent_dma_mask;
> + if (!dma_mask)
> + dma_mask = (gfp & GFP_DMA) ? DMA_BIT_MASK(24) : DMA_BIT_MASK(32);
> +
> + return dma_mask;
> +}
> +#endif
> +
> struct xen_dma_info {
> dma_addr_t dma_addr;
> phys_addr_t phys_addr;
> --
> 1.7.2.5
>

2013-09-30 15:18:38

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 08/19] arm/xen: get_dma_ops: return xen_dma_ops if we are running on Xen

On Fri, Sep 27, 2013 at 05:09:56PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
> Suggested-by: Catalin Marinas <[email protected]>
> CC: [email protected]
> CC: [email protected]

> ---
> arch/arm/include/asm/dma-mapping.h | 13 ++++++++++++-
> 1 files changed, 12 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
> index 8807124..b638d6d 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -12,17 +12,28 @@
> #include <asm/memory.h>
> #include <asm/cacheflush.h>
>
> +#include <xen/xen.h>
> +#include <asm/xen/hypervisor.h>
> +
> #define DMA_ERROR_CODE (~0)
> extern struct dma_map_ops arm_dma_ops;
> extern struct dma_map_ops arm_coherent_dma_ops;
>
> -static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> +static inline struct dma_map_ops *__get_dma_ops(struct device *dev)

How about calling it __generic_dma_ops?


> {
> if (dev && dev->archdata.dma_ops)
> return dev->archdata.dma_ops;
> return &arm_dma_ops;
> }
>
> +static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> +{
> + if (xen_domain())
> + return xen_dma_ops;
> + else
> + return __get_dma_ops(dev);


Could you explain (hopefully in the git commit description) why
we cannot overwrite arm_dma_ops with xen_dma_ops?


Thank you.

> +}
> +
> static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
> {
> BUG_ON(!dev);
> --
> 1.7.2.5
>

2013-09-30 15:32:18

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

On Fri, Sep 27, 2013 at 05:09:58PM +0100, Stefano Stabellini wrote:
> xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
> and devices. On native x86 and ARMv8 is sufficient to call
> __get_free_pages in order to get a coherent buffer, while on ARM we need
> to call the native dma_ops->alloc implementation.
>
> When arm64 stops using the swiotlb by default and starts having multiple
> dma_ops implementations, we'll use __get_dma_ops there too.

I presume this is a future TODO, not some further patch (in which
case you should mention its title here). If it is a TODO, could
you note that here to make it crystal clear that
it is not implemented.

Thank you.

>
> Introduce xen_alloc_coherent_pages to abstract the arch specific buffer
> allocation.
>
> Similarly introduce xen_free_coherent_pages to free a coherent buffer:
> on x86 and ARM64 is simply a call to free_pages while on ARM is
> arm_dma_ops.free.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
>
>
> Changes in v6:
> - call __get_dma_ops to get the native dma_ops pointer on arm.
> ---
> arch/arm/include/asm/xen/page-coherent.h | 22 ++++++++++++++++++++++
> arch/arm64/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
> arch/ia64/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
> arch/x86/include/asm/xen/page-coherent.h | 24 ++++++++++++++++++++++++
> 4 files changed, 94 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm/include/asm/xen/page-coherent.h
> create mode 100644 arch/arm64/include/asm/xen/page-coherent.h
> create mode 100644 arch/ia64/include/asm/xen/page-coherent.h
> create mode 100644 arch/x86/include/asm/xen/page-coherent.h
>
> diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h
> new file mode 100644
> index 0000000..92b7a8a
> --- /dev/null
> +++ b/arch/arm/include/asm/xen/page-coherent.h
> @@ -0,0 +1,22 @@
> +#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
> +#define _ASM_ARM_XEN_PAGE_COHERENT_H
> +
> +#include <asm/page.h>
> +#include <linux/dma-attrs.h>
> +#include <linux/dma-mapping.h>
> +
> +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> + dma_addr_t *dma_handle, gfp_t flags,
> + struct dma_attrs *attrs)
> +{
> + return __get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags, attrs);
> +}
> +
> +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> + void *cpu_addr, dma_addr_t dma_handle,
> + struct dma_attrs *attrs)
> +{
> + __get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
> +}
> +
> +#endif /* _ASM_ARM_XEN_PAGE_COHERENT_H */
> diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
> new file mode 100644
> index 0000000..0d6ad25
> --- /dev/null
> +++ b/arch/arm64/include/asm/xen/page-coherent.h
> @@ -0,0 +1,24 @@
> +#ifndef _ASM_ARM64_XEN_PAGE_COHERENT_H
> +#define _ASM_ARM64_XEN_PAGE_COHERENT_H
> +
> +#include <asm/page.h>
> +#include <linux/dma-attrs.h>
> +#include <linux/dma-mapping.h>
> +
> +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> + dma_addr_t *dma_handle, gfp_t flags,
> + struct dma_attrs *attrs)
> +{
> + void *vstart = (void*)__get_free_pages(flags, get_order(size));
> + *dma_handle = virt_to_phys(vstart);
> + return vstart;
> +}
> +
> +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> + void *cpu_addr, dma_addr_t dma_handle,
> + struct dma_attrs *attrs)
> +{
> + free_pages((unsigned long) cpu_addr, get_order(size));
> +}
> +
> +#endif /* _ASM_ARM64_XEN_PAGE_COHERENT_H */
> diff --git a/arch/ia64/include/asm/xen/page-coherent.h b/arch/ia64/include/asm/xen/page-coherent.h
> new file mode 100644
> index 0000000..37b929c
> --- /dev/null
> +++ b/arch/ia64/include/asm/xen/page-coherent.h
> @@ -0,0 +1,24 @@
> +#ifndef _ASM_IA64_XEN_PAGE_COHERENT_H
> +#define _ASM_IA64_XEN_PAGE_COHERENT_H
> +
> +#include <asm/page.h>
> +#include <linux/dma-attrs.h>
> +#include <linux/dma-mapping.h>
> +
> +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> + dma_addr_t *dma_handle, gfp_t flags,
> + struct dma_attrs *attrs)
> +{
> + void *vstart = (void*)__get_free_pages(flags, get_order(size));
> + *dma_handle = virt_to_phys(vstart);
> + return vstart;
> +}
> +
> +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> + void *cpu_addr, dma_addr_t dma_handle,
> + struct dma_attrs *attrs)
> +{
> + free_pages((unsigned long) cpu_addr, get_order(size));
> +}
> +
> +#endif /* _ASM_IA64_XEN_PAGE_COHERENT_H */
> diff --git a/arch/x86/include/asm/xen/page-coherent.h b/arch/x86/include/asm/xen/page-coherent.h
> new file mode 100644
> index 0000000..31de2e0
> --- /dev/null
> +++ b/arch/x86/include/asm/xen/page-coherent.h
> @@ -0,0 +1,24 @@
> +#ifndef _ASM_X86_XEN_PAGE_COHERENT_H
> +#define _ASM_X86_XEN_PAGE_COHERENT_H
> +
> +#include <asm/page.h>
> +#include <linux/dma-attrs.h>
> +#include <linux/dma-mapping.h>
> +
> +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> + dma_addr_t *dma_handle, gfp_t flags,
> + struct dma_attrs *attrs)
> +{
> + void *vstart = (void*)__get_free_pages(flags, get_order(size));
> + *dma_handle = virt_to_phys(vstart);
> + return vstart;
> +}
> +
> +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> + void *cpu_addr, dma_addr_t dma_handle,
> + struct dma_attrs *attrs)
> +{
> + free_pages((unsigned long) cpu_addr, get_order(size));
> +}
> +
> +#endif /* _ASM_X86_XEN_PAGE_COHERENT_H */
> --
> 1.7.2.5
>

2013-09-30 15:35:21

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 11/19] swiotlb-xen: use xen_alloc/free_coherent_pages

On Fri, Sep 27, 2013 at 05:09:59PM +0100, Stefano Stabellini wrote:
> Use xen_alloc_coherent_pages and xen_free_coherent_pages to allocate or
> free coherent pages.
>
> We need to be careful handling the pointer returned by
> xen_alloc_coherent_pages, because on ARM the pointer is not equal to
> phys_to_virt(*dma_handle). In fact virt_to_phys only works for kernel
> direct mapped RAM memory.
> In ARM case the pointer could be an ioremap address, therefore passing
> it to virt_to_phys would give you another physical address that doesn't
> correspond to it.
>
> Make xen_create_contiguous_region take a phys_addr_t as start parameter to
> avoid the virt_to_phys calls which would be incorrect.
>
> Changes in v6:
> - remove extra spaces.
>
> Signed-off-by: Stefano Stabellini <[email protected]>

Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> arch/arm/xen/mm.c | 7 +++----
> arch/x86/xen/mmu.c | 7 +++++--
> drivers/xen/swiotlb-xen.c | 31 +++++++++++++++++++++----------
> include/xen/xen-ops.h | 4 ++--
> 4 files changed, 31 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index 4330b15..b305b94 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -55,11 +55,10 @@ static int xen_exchange_memory(xen_ulong_t extents_in,
> return success;
> }
>
> -int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> +int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> unsigned int address_bits,
> dma_addr_t *dma_handle)
> {
> - phys_addr_t pstart = __pa(vstart);
> xen_pfn_t in_frame, out_frame;
> int success;
>
> @@ -78,9 +77,9 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> }
> EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
>
> -void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
> +void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
> {
> - xen_pfn_t in_frame = __pa(vstart) >> PAGE_SHIFT;
> + xen_pfn_t in_frame = pstart >> PAGE_SHIFT;
> struct xen_unpin unpin = {
> .in = {
> .nr_extents = 1,
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 6c34d7c..8830883 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2328,13 +2328,14 @@ static int xen_exchange_memory(unsigned long extents_in, unsigned int order_in,
> return success;
> }
>
> -int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> +int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> unsigned int address_bits,
> dma_addr_t *dma_handle)
> {
> unsigned long *in_frames = discontig_frames, out_frame;
> unsigned long flags;
> int success;
> + unsigned long vstart = (unsigned long)phys_to_virt(pstart);
>
> /*
> * Currently an auto-translated guest will not perform I/O, nor will
> @@ -2374,11 +2375,12 @@ int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> }
> EXPORT_SYMBOL_GPL(xen_create_contiguous_region);
>
> -void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
> +void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order)
> {
> unsigned long *out_frames = discontig_frames, in_frame;
> unsigned long flags;
> int success;
> + unsigned long vstart;
>
> if (xen_feature(XENFEAT_auto_translated_physmap))
> return;
> @@ -2386,6 +2388,7 @@ void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order)
> if (unlikely(order > MAX_CONTIG_ORDER))
> return;
>
> + vstart = (unsigned long)phys_to_virt(pstart);
> memset((void *) vstart, 0, PAGE_SIZE << order);
>
> spin_lock_irqsave(&xen_reservation_lock, flags);
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index deb9131..96ad316 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -46,6 +46,7 @@
> #include <xen/hvc-console.h>
> #include <xen/features.h>
> #include <asm/dma-mapping.h>
> +#include <asm/xen/page-coherent.h>
>
> /*
> * Used to do a quick range check in swiotlb_tbl_unmap_single and
> @@ -244,6 +245,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
> {
> int i, j, rc;
> int dma_bits;
> + phys_addr_t p = virt_to_phys(buf);
>
> dma_bits = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT) + PAGE_SHIFT;
>
> @@ -253,7 +255,7 @@ xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs)
>
> do {
> rc = xen_create_contiguous_region(
> - (unsigned long)buf + (i << IO_TLB_SHIFT),
> + p + (i << IO_TLB_SHIFT),
> get_order(slabs << IO_TLB_SHIFT),
> dma_bits, &xen_dma_seg[j].dma_addr);
> } while (rc && dma_bits++ < max_dma_bits);
> @@ -389,7 +391,6 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> void *ret;
> int order = get_order(size);
> u64 dma_mask = DMA_BIT_MASK(32);
> - unsigned long vstart;
> phys_addr_t phys;
> dma_addr_t dev_addr;
> struct xen_dma_info *dma_info = NULL;
> @@ -405,8 +406,12 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> if (dma_alloc_from_coherent(hwdev, size, dma_handle, &ret))
> return ret;
>
> - vstart = __get_free_pages(flags, order);
> - ret = (void *)vstart;
> + /* On ARM this function returns an ioremap'ped virtual address for
> + * which virt_to_phys doesn't return the corresponding physical
> + * address. In fact on ARM virt_to_phys only works for kernel direct
> + * mapped RAM memory. Also see comment below.
> + */
> + ret = xen_alloc_coherent_pages(hwdev, size, dma_handle, flags, attrs);
>
> if (!ret)
> return ret;
> @@ -414,16 +419,20 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> if (hwdev && hwdev->coherent_dma_mask)
> dma_mask = dma_alloc_coherent_mask(hwdev, flags);
>
> - phys = virt_to_phys(ret);
> + /* At this point dma_handle is the physical address, next we are
> + * going to set it to the machine address.
> + * Do not use virt_to_phys(ret) because on ARM it doesn't correspond
> + * to *dma_handle. */
> + phys = *dma_handle;
> dev_addr = xen_phys_to_bus(phys);
> if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> ((dev_addr + size - 1 <= dma_mask)) &&
> !range_straddles_page_boundary(phys, size))
> *dma_handle = dev_addr;
> else {
> - if (xen_create_contiguous_region(vstart, order,
> + if (xen_create_contiguous_region(phys, order,
> fls64(dma_mask), dma_handle) != 0) {
> - free_pages(vstart, order);
> + xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
> return NULL;
> }
>
> @@ -463,18 +472,20 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> if (hwdev && hwdev->coherent_dma_mask)
> dma_mask = hwdev->coherent_dma_mask;
>
> - phys = virt_to_phys(vaddr);
> + /* do not use virt_to_phys because on ARM it doesn't return you the
> + * physical address */
> + phys = xen_bus_to_phys(dev_addr);
>
> if (xen_feature(XENFEAT_auto_translated_physmap) ||
> (((dev_addr + size - 1 > dma_mask)) ||
> range_straddles_page_boundary(phys, size))) {
> - xen_destroy_contiguous_region((unsigned long)vaddr, order);
> + xen_destroy_contiguous_region(phys, order);
> dma_info = xen_get_dma_info_from_dma(dev_addr);
> rb_erase(&dma_info->rbnode, &bus_to_phys);
> kfree(dma_info);
> }
>
> - free_pages((unsigned long)vaddr, order);
> + xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
> }
> EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
>
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index 9ef704d..fb2ea8f 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -19,11 +19,11 @@ void xen_arch_resume(void);
> int xen_setup_shutdown_event(void);
>
> extern unsigned long *xen_contiguous_bitmap;
> -int xen_create_contiguous_region(unsigned long vstart, unsigned int order,
> +int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> unsigned int address_bits,
> dma_addr_t *dma_handle);
>
> -void xen_destroy_contiguous_region(unsigned long vstart, unsigned int order);
> +void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order);
>
> struct vm_area_struct;
> int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
> --
> 1.7.2.5
>

2013-09-30 15:56:55

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 12/19] swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent

On Fri, Sep 27, 2013 at 05:10:00PM +0100, Stefano Stabellini wrote:
> The swiotlb code has appropriate calls to dma_mark_clean in place for
> buffers passed to swiotlb_map_page as an argument. However it assumes
> that the swiotlb bounce buffer (io_tlb_start-io_tlb_end) is already
> coherent and doesn't need any calls to dma_mark_clean.
>
> On ARM the swiotlb bounce buffer is not coherent (the memory is
> writealloc while it should be bufferable) and therefore we need to call
> dma_mark_clean appropriately on the bounce buffer code paths too.
>
> Note that most architecures have an empty dma_mark_clean implementation
> anyway.

The other architecture that uses swiotlb is IA64 and that does have
an implementation where it touches on page attributes.

Which means I have to figure out why my HP zx6000 won't boot with 3.11 now :-(

>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> lib/swiotlb.c | 13 ++++++++++---
> 1 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 4e8686c..eb45d17 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -515,6 +515,7 @@ found:
> io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
> swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
> + dma_mark_clean(phys_to_virt(tlb_addr), size);
>
> return tlb_addr;
> }
> @@ -547,7 +548,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> * First, sync the memory before unmapping the entry
> */
> if (orig_addr && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
> + {
> + dma_mark_clean(phys_to_virt(tlb_addr), size);
> swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
> + }
>
> /*
> * Return the buffer to the free list by setting the corresponding
> @@ -587,17 +591,20 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
>
> switch (target) {
> case SYNC_FOR_CPU:
> - if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> + if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> + dma_mark_clean(phys_to_virt(tlb_addr), size);
> swiotlb_bounce(orig_addr, tlb_addr,
> size, DMA_FROM_DEVICE);
> + }
> else
> BUG_ON(dir != DMA_TO_DEVICE);
> break;
> case SYNC_FOR_DEVICE:
> - if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> + if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> swiotlb_bounce(orig_addr, tlb_addr,
> size, DMA_TO_DEVICE);
> - else
> + dma_mark_clean(phys_to_virt(tlb_addr), size);
> + } else
> BUG_ON(dir != DMA_FROM_DEVICE);
> break;
> default:
> --
> 1.7.2.5
>

2013-09-30 15:58:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 14/19] swiotlb: print a warning when the swiotlb is full

On Fri, Sep 27, 2013 at 05:10:02PM +0100, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> drivers/xen/swiotlb-xen.c | 1 +
> lib/swiotlb.c | 1 +
> 2 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 96ad316..790c2eb 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -674,6 +674,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> sg->length,
> dir);
> if (map == SWIOTLB_MAP_ERROR) {
> + pr_warn("swiotlb buffer is full\n");

It would be beneficial to use dev_warn instead.

And perhaps even call debug_dma_dump_mappings to help in diagnosing
a problem?
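
For illustration, a sketch of how the two suggestions could be folded into the hunk above (dev_warn() names the struggling device; debug_dma_dump_mappings() is a no-op unless CONFIG_DMA_API_DEBUG is enabled):

if (map == SWIOTLB_MAP_ERROR) {
        /* Name the device rather than emitting an anonymous warning. */
        dev_warn(hwdev, "swiotlb buffer is full\n");
        /* With CONFIG_DMA_API_DEBUG, dump the device's active mappings. */
        debug_dma_dump_mappings(hwdev);
        /* ... unmap and return the error as in the original hunk ... */
}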

> /* Don't panic here, we expect map_sg users
> to do proper error handling. */
> xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index eb45d17..f06da0d 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -502,6 +502,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>
> not_found:
> spin_unlock_irqrestore(&io_tlb_lock, flags);
> + pr_warn("swiotlb buffer is full\n");
> return SWIOTLB_MAP_ERROR;
> found:
> spin_unlock_irqrestore(&io_tlb_lock, flags);
> --
> 1.7.2.5
>

2013-09-30 16:02:56

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 15/19] swiotlb-xen: call dma_capable only if dev->dma_mask is allocated

On Fri, Sep 27, 2013 at 05:10:03PM +0100, Stefano Stabellini wrote:

Why? I am looking at X86 and IA64 and I see:

79 if (!dev->dma_mask)
80 return 0;


Why not port dma_capable over to ARM?
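
For reference, the helper being quoted looks roughly like this in the x86/IA64 headers of that era (paraphrased, not a verbatim copy of either header):

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
{
        if (!dev->dma_mask)     /* the check quoted above */
                return 0;

        return addr + size - 1 <= *dev->dma_mask;
}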

> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> drivers/xen/swiotlb-xen.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 790c2eb..3011736 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -512,7 +512,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> * buffering it.
> */
> if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> - dma_capable(dev, dev_addr, size) &&
> + dev->dma_mask && dma_capable(dev, dev_addr, size) &&
> !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> return dev_addr;
>
> @@ -532,7 +532,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> /*
> * Ensure that the address returned is DMA'ble
> */
> - if (!dma_capable(dev, dev_addr, size)) {
> + if (dev->dma_mask && !dma_capable(dev, dev_addr, size)) {
> swiotlb_tbl_unmap_single(dev, map, size, dir);
> dev_addr = 0;
> }
> @@ -660,7 +660,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
>
> if (swiotlb_force ||
> xen_feature(XENFEAT_auto_translated_physmap) ||
> - !dma_capable(hwdev, dev_addr, sg->length) ||
> + (hwdev->dma_mask && !dma_capable(hwdev, dev_addr, sg->length)) ||
> range_straddles_page_boundary(paddr, sg->length)) {
> /*
> * Pass the dma_addr of the first slab in the iotlb buffer as
> --
> 1.7.2.5
>

2013-09-30 16:07:27

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 16/19] arm,arm64: do not always merge biovec if we are running on Xen

On Fri, Sep 27, 2013 at 05:10:04PM +0100, Stefano Stabellini wrote:
> This is similar to what is done on X86: biovecs are prevented from merging,
> otherwise every dma request would be forced to bounce on the swiotlb buffer.
>
> Signed-off-by: Stefano Stabellini <[email protected]>


This also alters the generic (well, was x86, now it is generic) code.
Perhaps that change should be made in a separate patch that explains
what the autotranslate extra check will do for PV, PVH and HVM guests?

Wait, does that mean we had not been merging requests on HVM guests?
Is that good or bad?

> ---
> arch/arm/include/asm/io.h | 8 ++++++++
> arch/arm64/include/asm/io.h | 9 +++++++++
> drivers/xen/biomerge.c | 4 +++-
> 3 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index d070741..c45effc 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -24,9 +24,11 @@
> #ifdef __KERNEL__
>
> #include <linux/types.h>
> +#include <linux/blk_types.h>
> #include <asm/byteorder.h>
> #include <asm/memory.h>
> #include <asm-generic/pci_iomap.h>
> +#include <xen/xen.h>
>
> /*
> * ISA I/O bus memory addresses are 1:1 with the physical address.
> @@ -372,6 +374,12 @@ extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr);
> #define BIOVEC_MERGEABLE(vec1, vec2) \
> ((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
>
> +extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> + const struct bio_vec *vec2);
> +#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
> + (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
> + (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
> +
> #ifdef CONFIG_MMU
> #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
> extern int valid_phys_addr_range(phys_addr_t addr, size_t size);
> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> index 1d12f89..c163287b 100644
> --- a/arch/arm64/include/asm/io.h
> +++ b/arch/arm64/include/asm/io.h
> @@ -22,11 +22,14 @@
> #ifdef __KERNEL__
>
> #include <linux/types.h>
> +#include <linux/blk_types.h>
>
> #include <asm/byteorder.h>
> #include <asm/barrier.h>
> #include <asm/pgtable.h>
>
> +#include <xen/xen.h>
> +
> /*
> * Generic IO read/write. These perform native-endian accesses.
> */
> @@ -263,5 +266,11 @@ extern int devmem_is_allowed(unsigned long pfn);
> */
> #define xlate_dev_kmem_ptr(p) p
>
> +extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> + const struct bio_vec *vec2);
> +#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
> + (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
> + (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
> +
> #endif /* __KERNEL__ */
> #endif /* __ASM_IO_H */
> diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
> index 0edb91c..f323a2d 100644
> --- a/drivers/xen/biomerge.c
> +++ b/drivers/xen/biomerge.c
> @@ -2,6 +2,7 @@
> #include <linux/io.h>
> #include <linux/export.h>
> #include <xen/page.h>
> +#include <xen/features.h>
>
> bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> const struct bio_vec *vec2)
> @@ -9,7 +10,8 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> unsigned long mfn1 = pfn_to_mfn(page_to_pfn(vec1->bv_page));
> unsigned long mfn2 = pfn_to_mfn(page_to_pfn(vec2->bv_page));
>
> - return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
> + return !xen_feature(XENFEAT_auto_translated_physmap) &&
> + __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
> ((mfn1 == mfn2) || ((mfn1+1) == mfn2));
> }
> EXPORT_SYMBOL(xen_biovec_phys_mergeable);
> --
> 1.7.2.5
>

2013-09-30 17:22:44

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 17/19] xen: introduce XENMEM_pin

On Fri, Sep 27, 2013 at 05:10:05PM +0100, Stefano Stabellini wrote:
> Introduce a new hypercall to pin one or more pages whose machine
> addresses respect a dma_mask passed as an argument
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> arch/arm/xen/mm.c | 16 ++++++++++++++++
> arch/x86/xen/mmu.c | 7 +++++++
> include/xen/interface/memory.h | 32 ++++++++++++++++++++++++++++++++
> include/xen/xen-ops.h | 1 +
> 4 files changed, 56 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index b305b94..146c1c3 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -55,6 +55,22 @@ static int xen_exchange_memory(xen_ulong_t extents_in,
> return success;
> }
>
> +int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits)
> +{
> + struct xen_pin pin = {
> + .in = {
> + .nr_extents = 1,
> + .extent_order = 0,
> + .domid = DOMID_SELF,
> + .address_bits = address_bits
> + },
> + };
> + set_xen_guest_handle(pin.in.extent_start, in_frame);
> +
> + return HYPERVISOR_memory_op(XENMEM_pin, &pin);
> +}
> +EXPORT_SYMBOL_GPL(xen_pin_page);
> +
> int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> unsigned int address_bits,
> dma_addr_t *dma_handle)
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 8830883..8f76ce2 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -2568,3 +2568,10 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
> return -EINVAL;
> }
> EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);
> +
> +int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits)
> +{
> + return -ENOSYS;
> +}
> +EXPORT_SYMBOL_GPL(xen_pin_page);
> +
> diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
> index 49db252..66ab578 100644
> --- a/include/xen/interface/memory.h
> +++ b/include/xen/interface/memory.h
> @@ -314,4 +314,36 @@ struct xen_unpin {
> };
> DEFINE_GUEST_HANDLE_STRUCT(xen_unpin);
>
> +/*
> + * XENMEM_pin pins a set of pages to make sure that the hypervisor does
> + * not change the p2m mappings for them.
> + *
> + */
> +#define XENMEM_pin 28
> +struct xen_pin {
> + /*
> + * [IN/OUT] Details of memory extents to be pinned (GMFN bases).
> + * Xen copies back the MFNs corresponding to the GMFNs passed in as
> + * argument.
> + * @in.address_bits contains the maximum number of bits addressable
> + * by the caller. If the machine addresses of the pages to be pinned
> + * are not addressable according to @in.address_bits, the hypercall
> + * fails and returns an errors. The pages are not pinned. Otherwise
^^^^^-error.
What kind of error? And you should probably join the two sentences together
(the "If the ... The pages are not pinned.")


> + * the hypercall succeeds.

and does it return a number of pages that were pinned or just zero?

> + */
> + struct xen_memory_reservation in;
> +
> + /*
> + * [OUT] Number of input extents that were successfully pinned.
> + * 1. The first @nr_pinned input extents were successfully
> + * pinned.
> + * 2. All other input extents are untouched.
> + * 3. If not all input extents are pinned then the return code of this
> + * command will be non-zero.

OK, what return code?

> + */
> + xen_ulong_t nr_pinned;
> +};
> +DEFINE_GUEST_HANDLE_STRUCT(xen_pin);
> +
> +
> #endif /* __XEN_PUBLIC_MEMORY_H__ */
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index fb2ea8f..4cf4fc5 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -22,6 +22,7 @@ extern unsigned long *xen_contiguous_bitmap;
> int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order,
> unsigned int address_bits,
> dma_addr_t *dma_handle);
> +int xen_pin_page(xen_pfn_t *in_frame, unsigned int address_bits);
>
> void xen_destroy_contiguous_region(phys_addr_t pstart, unsigned int order);
>
> --
> 1.7.2.5
>

2013-09-30 17:27:55

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 18/19] swiotlb-xen: introduce a rbtree to track phys to bus mappings

On Fri, Sep 27, 2013 at 05:10:06PM +0100, Stefano Stabellini wrote:
> Introduce a second red-black tree to track phys to bus mappings created after
> the initialization of the swiotlb buffer.

Could you explain the use case a bit more please?

As in:
a) why is this needed
b) why are we using the rb tree instead of something else (say FIFO queue)
c) how long are these in usage?
d) do we need locking now?

>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> drivers/xen/swiotlb-xen.c | 99 +++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 83 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 3011736..022bcaf 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -79,7 +79,8 @@ struct xen_dma_info {
> dma_addr_t dma_addr;
> phys_addr_t phys_addr;
> size_t size;
> - struct rb_node rbnode;
> + struct rb_node rbnode_dma;
> + struct rb_node rbnode_phys;
> };
>
> /*
> @@ -96,8 +97,13 @@ static struct xen_dma_info *xen_dma_seg;
> * mappings.
> */
> static struct rb_root bus_to_phys = RB_ROOT;
> +/*
> + * This tree keeps track of physical address to bus address
> + * mappings apart from the ones belonging to the initial swiotlb buffer.
> + */
> +static struct rb_root phys_to_bus = RB_ROOT;
>
> -static int xen_dma_add_entry(struct xen_dma_info *new)
> +static int xen_dma_add_entry_bus(struct xen_dma_info *new)
> {
> struct rb_node **link = &bus_to_phys.rb_node;
> struct rb_node *parent = NULL;
> @@ -106,7 +112,7 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
>
> while (*link) {
> parent = *link;
> - entry = rb_entry(parent, struct xen_dma_info, rbnode);
> + entry = rb_entry(parent, struct xen_dma_info, rbnode_dma);
>
> if (new->dma_addr == entry->dma_addr)
> goto err_out;
> @@ -118,8 +124,41 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
> else
> link = &(*link)->rb_right;
> }
> - rb_link_node(&new->rbnode, parent, link);
> - rb_insert_color(&new->rbnode, &bus_to_phys);
> + rb_link_node(&new->rbnode_dma, parent, link);
> + rb_insert_color(&new->rbnode_dma, &bus_to_phys);
> + goto out;
> +
> +err_out:
> + rc = -EINVAL;
> + pr_warn("%s: cannot add phys=%pa -> dma=%pa: phys=%pa -> dma=%pa already exists\n",
> + __func__, &new->phys_addr, &new->dma_addr, &entry->phys_addr, &entry->dma_addr);
> +out:
> + return rc;
> +}
> +
> +static int xen_dma_add_entry_phys(struct xen_dma_info *new)
> +{
> + struct rb_node **link = &phys_to_bus.rb_node;
> + struct rb_node *parent = NULL;
> + struct xen_dma_info *entry;
> + int rc = 0;
> +
> + while (*link) {
> + parent = *link;
> + entry = rb_entry(parent, struct xen_dma_info, rbnode_phys);
> +
> + if (new->dma_addr == entry->dma_addr)
> + goto err_out;
> + if (new->phys_addr == entry->phys_addr)
> + goto err_out;
> +
> + if (new->phys_addr < entry->phys_addr)
> + link = &(*link)->rb_left;
> + else
> + link = &(*link)->rb_right;
> + }
> + rb_link_node(&new->rbnode_phys, parent, link);
> + rb_insert_color(&new->rbnode_phys, &phys_to_bus);
> goto out;
>
> err_out:
> @@ -130,13 +169,22 @@ out:
> return rc;
> }
>
> +static int xen_dma_add_entry(struct xen_dma_info *new)
> +{
> + int rc;
> + if ((rc = xen_dma_add_entry_bus(new) < 0) ||
> + (rc = xen_dma_add_entry_phys(new) < 0))

Please don't do that. Just do

rc = xen_dma_add_entry_bus(new);
if (rc)
return rc;
rc = xen_dma_add_entry_phys(new);
if (rc) {
// unwind it somehow? <<== This is important :-)
}
return rc;
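
A sketch of the unwinding step being asked for, reusing the trees and field names from the hunk under review (assumption: erasing the freshly inserted bus_to_phys node is the only cleanup needed):

static int xen_dma_add_entry(struct xen_dma_info *new)
{
        int rc;

        rc = xen_dma_add_entry_bus(new);
        if (rc)
                return rc;

        rc = xen_dma_add_entry_phys(new);
        if (rc)
                /* unwind: drop the node just added to bus_to_phys */
                rb_erase(&new->rbnode_dma, &bus_to_phys);

        return rc;
}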


> + return rc;
> + return 0;
> +}
> +
> static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> {
> struct rb_node *n = bus_to_phys.rb_node;
> struct xen_dma_info *entry;
>
> while (n) {
> - entry = rb_entry(n, struct xen_dma_info, rbnode);
> + entry = rb_entry(n, struct xen_dma_info, rbnode_dma);
> if (entry->dma_addr <= dma_addr &&
> entry->dma_addr + entry->size > dma_addr) {
> return entry;
> @@ -150,11 +198,30 @@ static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> return NULL;
> }
>
> -static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
> +static struct xen_dma_info *xen_get_dma_info_from_phys(phys_addr_t phys)
> {
> - int nr_seg;
> - unsigned long offset;
> - char *vaddr;
> + struct rb_node *n = phys_to_bus.rb_node;
> + struct xen_dma_info *entry;
> +
> + while (n) {
> + entry = rb_entry(n, struct xen_dma_info, rbnode_phys);
> + if (entry->phys_addr <= phys &&
> + entry->phys_addr + entry->size > phys) {
> + return entry;
> + }
> + if (phys < entry->phys_addr)
> + n = n->rb_left;
> + else
> + n = n->rb_right;
> + }
> +
> + return NULL;
> +}
> +
> +/* Only looks into the initial buffer allocation in case of
> + * XENFEAT_auto_translated_physmap guests. */
> +static dma_addr_t xen_phys_to_bus_quick(phys_addr_t paddr) { int nr_seg;
> + unsigned long offset; char *vaddr;
>
> if (!xen_feature(XENFEAT_auto_translated_physmap))
> return phys_to_machine(XPADDR(paddr)).maddr;
> @@ -184,7 +251,7 @@ static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
>
> static dma_addr_t xen_virt_to_bus(void *address)
> {
> - return xen_phys_to_bus(virt_to_phys(address));
> + return xen_phys_to_bus_quick(virt_to_phys(address));
> }
>
> static int check_pages_physically_contiguous(unsigned long pfn,
> @@ -424,7 +491,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> * Do not use virt_to_phys(ret) because on ARM it doesn't correspond
> * to *dma_handle. */
> phys = *dma_handle;
> - dev_addr = xen_phys_to_bus(phys);
> + dev_addr = xen_phys_to_bus_quick(phys);
> if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> ((dev_addr + size - 1 <= dma_mask)) &&
> !range_straddles_page_boundary(phys, size))
> @@ -503,7 +570,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> struct dma_attrs *attrs)
> {
> phys_addr_t map, phys = page_to_phys(page) + offset;
> - dma_addr_t dev_addr = xen_phys_to_bus(phys);
> + dma_addr_t dev_addr = xen_phys_to_bus_quick(phys);
>
> BUG_ON(dir == DMA_NONE);
> /*
> @@ -527,7 +594,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> if (map == SWIOTLB_MAP_ERROR)
> return DMA_ERROR_CODE;
>
> - dev_addr = xen_phys_to_bus(map);
> + dev_addr = xen_phys_to_bus_quick(map);
>
> /*
> * Ensure that the address returned is DMA'ble
> @@ -656,7 +723,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
>
> for_each_sg(sgl, sg, nelems, i) {
> phys_addr_t paddr = sg_phys(sg);
> - dma_addr_t dev_addr = xen_phys_to_bus(paddr);
> + dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
>
> if (swiotlb_force ||
> xen_feature(XENFEAT_auto_translated_physmap) ||
> @@ -682,7 +749,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> sg_dma_len(sgl) = 0;
> return DMA_ERROR_CODE;
> }
> - sg->dma_address = xen_phys_to_bus(map);
> + sg->dma_address = xen_phys_to_bus_quick(map);
> } else
> sg->dma_address = dev_addr;
> sg_dma_len(sg) = sg->length;
> --
> 1.7.2.5
>

2013-09-30 17:40:18

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 19/19] swiotlb-xen: instead of bouncing on the swiotlb, pin single pages

On Fri, Sep 27, 2013 at 05:10:07PM +0100, Stefano Stabellini wrote:
> If we are dealing with single page mappings that don't cross page
> boundaries, we can try to pin the page and get the corresponding mfn,
> using xen_pin_page. This avoids going through the swiotlb bounce
> buffer. If xen_pin_page fails (because the underlying mfn doesn't
> respect the dma_mask) fall back to the swiotlb bounce buffer.
> Add a ref count to xen_dma_info, so that we can avoid pinning pages that
> are already pinned.
> Use a spinlock to protect accesses, insertions and deletions in the
> rbtrees.
>
> Signed-off-by: Stefano Stabellini <[email protected]>
> ---
> drivers/xen/swiotlb-xen.c | 152 ++++++++++++++++++++++++++++++++++++++++++---
> 1 files changed, 143 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 022bcaf..6f94285 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -57,6 +57,8 @@
> #define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
> static char *xen_io_tlb_start, *xen_io_tlb_end;
> static unsigned long xen_io_tlb_nslabs;
> +spinlock_t swiotlb_lock;
> +
> /*
> * Quick lookup value of the bus address of the IOTLB.
> */
> @@ -79,6 +81,7 @@ struct xen_dma_info {
> dma_addr_t dma_addr;
> phys_addr_t phys_addr;
> size_t size;
> + atomic_t refs;
> struct rb_node rbnode_dma;
> struct rb_node rbnode_phys;
> };
> @@ -254,6 +257,48 @@ static dma_addr_t xen_virt_to_bus(void *address)
> return xen_phys_to_bus_quick(virt_to_phys(address));
> }
>
> +static int xen_pin_dev_page(struct device *dev,
> + phys_addr_t phys,
> + dma_addr_t *dev_addr)

Something is odd with your tabs.
> +{
> + u64 dma_mask = DMA_BIT_MASK(32);

Why 32?

> + xen_pfn_t in;
> + struct xen_dma_info *dma_info = xen_get_dma_info_from_phys(phys);
> +
> + if (dma_info != NULL) {
> + atomic_inc(&dma_info->refs);
> + *dev_addr = dma_info->dma_addr + (phys - dma_info->phys_addr);
> + return 0;
> + }
> +
> + if (dev && dev->coherent_dma_mask)
> + dma_mask = dma_alloc_coherent_mask(dev, GFP_KERNEL);
> +
> + in = phys >> PAGE_SHIFT;
> + if (!xen_pin_page(&in, fls64(dma_mask))) {

Why not just make xen_pin_page use a phys address so it can also
do the appropriate bit shifting in it?

> + *dev_addr = in << PAGE_SHIFT;
> + dma_info = kzalloc(sizeof(struct xen_dma_info), GFP_NOWAIT);
> + if (!dma_info) {
> + pr_warn("cannot allocate xen_dma_info\n");
> + xen_destroy_contiguous_region(phys & PAGE_MASK, 0);

Perhaps we should add an inline function for that called 'xen_unpin_page'?
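
A possible shape for that helper, mirroring the xen_destroy_contiguous_region() call used in the hunk (hypothetical; not part of the series as posted):

/* Hypothetical counterpart to xen_pin_page() for a single order-0 page. */
static inline void xen_unpin_page(phys_addr_t phys)
{
        xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
}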

> + return -ENOMEM;
> + }
> + dma_info->phys_addr = phys & PAGE_MASK;
> + dma_info->size = PAGE_SIZE;
> + dma_info->dma_addr = *dev_addr;
> + if (xen_dma_add_entry(dma_info)) {
> + pr_warn("cannot add new entry to bus_to_phys\n");
> + xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
> + kfree(dma_info);
> + return -EFAULT;
> + }
> + atomic_set(&dma_info->refs, 1);
> + *dev_addr += (phys & ~PAGE_MASK);
> + return 0;
> + }

Don't you want to do the opposite of dma_alloc_coherent_mask?

> + return -EFAULT;
> +}
> +
> static int check_pages_physically_contiguous(unsigned long pfn,
> unsigned int offset,
> size_t length)
> @@ -434,6 +479,7 @@ retry:
> rc = 0;
> } else
> rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
> + spin_lock_init(&swiotlb_lock);
> return rc;
> error:
> if (repeat--) {
> @@ -461,6 +507,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> phys_addr_t phys;
> dma_addr_t dev_addr;
> struct xen_dma_info *dma_info = NULL;
> + unsigned long irqflags;
>
> /*
> * Ignore region specifiers - the kernel's ideas of
> @@ -497,7 +544,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> !range_straddles_page_boundary(phys, size))
> *dma_handle = dev_addr;
> else {
> - if (xen_create_contiguous_region(phys, order,
> + if (xen_create_contiguous_region(phys & PAGE_MASK, order,
> fls64(dma_mask), dma_handle) != 0) {
> xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
> return NULL;
> @@ -509,15 +556,19 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> xen_destroy_contiguous_region(phys, order);
> return NULL;
> }
> - dma_info->phys_addr = phys;
> - dma_info->size = size;
> + dma_info->phys_addr = phys & PAGE_MASK;
> + dma_info->size = (1U << order) << PAGE_SHIFT;
> dma_info->dma_addr = *dma_handle;
> + atomic_set(&dma_info->refs, 1);
> + spin_lock_irqsave(&swiotlb_lock, irqflags);
> if (xen_dma_add_entry(dma_info)) {
> + spin_unlock_irqrestore(&swiotlb_lock, irqflags);
> pr_warn("cannot add new entry to bus_to_phys\n");
> xen_destroy_contiguous_region(phys, order);
> kfree(dma_info);
> return NULL;
> }
> + spin_unlock_irqrestore(&swiotlb_lock, irqflags);
> }
> memset(ret, 0, size);
> return ret;
> @@ -532,6 +583,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> phys_addr_t phys;
> u64 dma_mask = DMA_BIT_MASK(32);
> struct xen_dma_info *dma_info = NULL;
> + unsigned long flags;
>
> if (dma_release_from_coherent(hwdev, order, vaddr))
> return;
> @@ -539,6 +591,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> if (hwdev && hwdev->coherent_dma_mask)
> dma_mask = hwdev->coherent_dma_mask;
>
> + spin_lock_irqsave(&swiotlb_lock, flags);
> /* do not use virt_to_phys because on ARM it doesn't return you the
> * physical address */
> phys = xen_bus_to_phys(dev_addr);
> @@ -546,12 +599,16 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> if (xen_feature(XENFEAT_auto_translated_physmap) ||
> (((dev_addr + size - 1 > dma_mask)) ||
> range_straddles_page_boundary(phys, size))) {
> - xen_destroy_contiguous_region(phys, order);
> dma_info = xen_get_dma_info_from_dma(dev_addr);
> - rb_erase(&dma_info->rbnode, &bus_to_phys);
> - kfree(dma_info);
> + if (atomic_dec_and_test(&dma_info->refs)) {
> + xen_destroy_contiguous_region(phys & PAGE_MASK, order);
> + rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
> + rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
> + kfree(dma_info);
> + }

If xen_pin_dev_page failed or was not called we would still end up
calling this. And we would decrement a potentially garbage value? Or not.
> }
>
> + spin_unlock_irqrestore(&swiotlb_lock, flags);
> xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
> }
> EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
> @@ -583,6 +640,23 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> return dev_addr;
>
> + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> + size <= PAGE_SIZE &&
> + !range_straddles_page_boundary(phys, size) &&
> + !swiotlb_force) {
> + unsigned long flags;
> + int rc;
> +
> + spin_lock_irqsave(&swiotlb_lock, flags);
> + rc = xen_pin_dev_page(dev, phys, &dev_addr);
> + spin_unlock_irqrestore(&swiotlb_lock, flags);
> +
> + if (!rc) {
> + dma_mark_clean(phys_to_virt(phys), size);
> + return dev_addr;
> + }

And if there is an rc you should probably do
dev_warn(.., "RC ..")


But more importantly - all of this code adds an extra lock on the X86 side
which will get -ENOxxx on the xen_pin_dev_page.

I am wondering if it makes sense to make most of this code dependent
on CONFIG_ARM? As the check for auto-xlat falls flat on X86 + PVH. Though
I have no idea what we want to do with PVH and X86 at this point.

> + }
> +
> /*
> * Oh well, have to allocate and map a bounce buffer.
> * Pass the dma_addr of the first slab in the iotlb buffer as
> @@ -618,10 +692,37 @@ EXPORT_SYMBOL_GPL(xen_swiotlb_map_page);
> static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
> size_t size, enum dma_data_direction dir)
> {
> - phys_addr_t paddr = xen_bus_to_phys(dev_addr);
> + struct xen_dma_info *dma_info;
> + phys_addr_t paddr = DMA_ERROR_CODE;
> + char *vaddr = NULL;
> + unsigned long flags;
>
> BUG_ON(dir == DMA_NONE);
>
> + spin_lock_irqsave(&swiotlb_lock, flags);
> + dma_info = xen_get_dma_info_from_dma(dev_addr);
> + if (dma_info != NULL) {
> + paddr = dma_info->phys_addr + (dev_addr - dma_info->dma_addr);
> + vaddr = phys_to_virt(paddr);
> + }
> +
> + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> + paddr != DMA_ERROR_CODE &&
> + !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
> + !swiotlb_force) {
> + if (atomic_dec_and_test(&dma_info->refs)) {
> + xen_destroy_contiguous_region(paddr & PAGE_MASK, 0);
> + rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
> + rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
> + kfree(dma_info);
> + }
> + spin_unlock_irqrestore(&swiotlb_lock, flags);
> + if ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))
> + dma_mark_clean(vaddr, size);
> + return;
> + }
> + spin_unlock_irqrestore(&swiotlb_lock, flags);
> +
> /* NOTE: We use dev_addr here, not paddr! */
> if (is_xen_swiotlb_buffer(dev_addr)) {
> swiotlb_tbl_unmap_single(hwdev, paddr, size, dir);
> @@ -664,9 +765,19 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
> enum dma_sync_target target)
> {
> phys_addr_t paddr = xen_bus_to_phys(dev_addr);
> + char *vaddr = phys_to_virt(paddr);
>
> BUG_ON(dir == DMA_NONE);
>
> + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> + paddr != DMA_ERROR_CODE &&
> + size <= PAGE_SIZE &&
> + !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
> + !range_straddles_page_boundary(paddr, size) && !swiotlb_force) {
> + dma_mark_clean(vaddr, size);
> + return;
> + }
> +
> /* NOTE: We use dev_addr here, not paddr! */
> if (is_xen_swiotlb_buffer(dev_addr)) {
> swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
> @@ -717,13 +828,36 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> struct dma_attrs *attrs)
> {
> struct scatterlist *sg;
> - int i;
> + int i, rc;
> + u64 dma_mask = DMA_BIT_MASK(32);
> + unsigned long flags;
>
> BUG_ON(dir == DMA_NONE);
>
> + if (hwdev && hwdev->coherent_dma_mask)
> + dma_mask = dma_alloc_coherent_mask(hwdev, GFP_KERNEL);
> +
> for_each_sg(sgl, sg, nelems, i) {
> phys_addr_t paddr = sg_phys(sg);
> - dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
> + dma_addr_t dev_addr;
> +
> + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> + !range_straddles_page_boundary(paddr, sg->length) &&
> + sg->length <= PAGE_SIZE &&
> + !swiotlb_force) {
> +
> + spin_lock_irqsave(&swiotlb_lock, flags);
> + rc = xen_pin_dev_page(hwdev, paddr, &dev_addr);
> + spin_unlock_irqrestore(&swiotlb_lock, flags);
> +
> + if (!rc) {
> + dma_mark_clean(phys_to_virt(paddr), sg->length);
> + sg_dma_len(sg) = sg->length;
> + sg->dma_address = dev_addr;
> + continue;
> + }
> + }
> + dev_addr = xen_phys_to_bus_quick(paddr);
>
> if (swiotlb_force ||
> xen_feature(XENFEAT_auto_translated_physmap) ||
> --
> 1.7.2.5
>

2013-10-01 13:40:23

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

On 30 Sep 2013, at 16:31, Konrad Rzeszutek Wilk <[email protected]> wrote:
> On Fri, Sep 27, 2013 at 05:09:58PM +0100, Stefano Stabellini wrote:
>> xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
>> and devices. On native x86 and ARMv8 it is sufficient to call
>> __get_free_pages in order to get a coherent buffer, while on ARM we need
>> to call the native dma_ops->alloc implementation.
>>
>> When arm64 stops using the swiotlb by default and starts having multiple
>> dma_ops implementations, we'll use __get_dma_ops there too.
>
> I presume this is a future TODO, not some further patch (in which
> case you should say in here the title of it). If it is a TODO could
> you stick that in the sentence here somewhere to make it crystal clear that
> it is not implemented.

Thanks for pointing this out. The arm64 patch shouldn't be different
from arm here. I thought a subsequent patch already added
__get_dma_ops() for arm64.

Catalin

2013-10-02 17:04:56

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

On Tue, 1 Oct 2013, Catalin Marinas wrote:
> On 30 Sep 2013, at 16:31, Konrad Rzeszutek Wilk <[email protected]> wrote:
> > On Fri, Sep 27, 2013 at 05:09:58PM +0100, Stefano Stabellini wrote:
> >> xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
> >> and devices. On native x86 and ARMv8 it is sufficient to call
> >> __get_free_pages in order to get a coherent buffer, while on ARM we need
> >> to call the native dma_ops->alloc implementation.
> >>
> >> When arm64 stops using the swiotlb by default and starts having multiple
> >> dma_ops implementations, we'll use __get_dma_ops there too.
> >
> > I presume this is a future TODO, not some further patch (in which
> > case you should say in here the title of it). If it is a TODO could
> > you stick that in the sentence here somewhere to make it crystal clear that
> > it is not implemented.
>
> Thanks for pointing this out. The arm64 patch shouldn't be different
> from arm here. I thought a subsequent patch already added
> __get_dma_ops() for arm64.

Yes, the arm64 patch already added __get_dma_ops.

However I am not using __get_dma_ops to implement
xen_alloc_coherent_pages for arm64 because I can just use
__get_free_pages for the moment:


> diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h
> new file mode 100644
> index 0000000..0d6ad25
> --- /dev/null
> +++ b/arch/arm64/include/asm/xen/page-coherent.h
> @@ -0,0 +1,24 @@
> +#ifndef _ASM_ARM64_XEN_PAGE_COHERENT_H
> +#define _ASM_ARM64_XEN_PAGE_COHERENT_H
> +
> +#include <asm/page.h>
> +#include <linux/dma-attrs.h>
> +#include <linux/dma-mapping.h>
> +
> +static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
> + dma_addr_t *dma_handle, gfp_t flags,
> + struct dma_attrs *attrs)
> +{
> + void *vstart = (void*)__get_free_pages(flags, get_order(size));
> + *dma_handle = virt_to_phys(vstart);
> + return vstart;
> +}
> +
> +static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
> + void *cpu_addr, dma_addr_t dma_handle,
> + struct dma_attrs *attrs)
> +{
> + free_pages((unsigned long) cpu_addr, get_order(size));
> +}
> +
> +#endif /* _ASM_ARM64_XEN_PAGE_COHERENT_H */

2013-10-02 17:08:48

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

On Wed, Oct 02, 2013 at 06:03:46PM +0100, Stefano Stabellini wrote:
> On Tue, 1 Oct 2013, Catalin Marinas wrote:
> > On 30 Sep 2013, at 16:31, Konrad Rzeszutek Wilk <[email protected]> wrote:
> > > On Fri, Sep 27, 2013 at 05:09:58PM +0100, Stefano Stabellini wrote:
> > >> xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
> > >> and devices. On native x86 and ARMv8 it is sufficient to call
> > >> __get_free_pages in order to get a coherent buffer, while on ARM we need
> > >> to call the native dma_ops->alloc implementation.
> > >>
> > >> When arm64 stops using the swiotlb by default and starts having multiple
> > >> dma_ops implementations, we'll use __get_dma_ops there too.
> > >
> > > I presume this is a future TODO, not some further patch (in which
> > > case you should say in here the title of it). If it is a TODO could
> > > you stick that in the sentence here somewhere to make it crystal clear that
> > > it is not implemented.
> >
> > Thanks for pointing this out. The arm64 patch shouldn't be different
> > from arm here. I thought a subsequent patch already added
> > __get_dma_ops() for arm64.
>
> Yes, the arm64 patch already added __get_dma_ops.
>
> However I am not using __get_dma_ops to implement
> xen_alloc_coherent_pages for arm64 because I can just use
> __get_free_pages for the moment:

So why do the work twice when we'll get coherency maintenance in the DMA
ops?

--
Catalin

2013-10-02 17:14:42

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 15/19] swiotlb-xen: call dma_capable only if dev->dma_mask is allocated

The real issue is that some devices (xgmac, I am looking at you), don't
set the dma_mask, even though they are perfectly capable of doing dma.

Maybe I should modify the arm implementation of dma_capable (see
http://marc.info/?l=linux-kernel&m=138029832007821&w=2) to ignore the
dma_mask.
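
A rough sketch of the kind of change being floated here (hypothetical, not the actual patch at the link): treat a missing dma_mask as "assume 32-bit capable" rather than "not capable":

static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
{
        u64 mask = dev->dma_mask ? *dev->dma_mask : DMA_BIT_MASK(32);

        return addr + size - 1 <= mask;
}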

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:03PM +0100, Stefano Stabellini wrote:
>
> Why? I am looking at X86 and IA64 and I see:
>
> 79 if (!dev->dma_mask)
> 80 return 0;
>
>
> Why not port dma_capable over to ARM?
>
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > drivers/xen/swiotlb-xen.c | 6 +++---
> > 1 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 790c2eb..3011736 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -512,7 +512,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > * buffering it.
> > */
> > if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> > - dma_capable(dev, dev_addr, size) &&
> > + dev->dma_mask && dma_capable(dev, dev_addr, size) &&
> > !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> > return dev_addr;
> >
> > @@ -532,7 +532,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > /*
> > * Ensure that the address returned is DMA'ble
> > */
> > - if (!dma_capable(dev, dev_addr, size)) {
> > + if (dev->dma_mask && !dma_capable(dev, dev_addr, size)) {
> > swiotlb_tbl_unmap_single(dev, map, size, dir);
> > dev_addr = 0;
> > }
> > @@ -660,7 +660,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> >
> > if (swiotlb_force ||
> > xen_feature(XENFEAT_auto_translated_physmap) ||
> > - !dma_capable(hwdev, dev_addr, sg->length) ||
> > + (hwdev->dma_mask && !dma_capable(hwdev, dev_addr, sg->length)) ||
> > range_straddles_page_boundary(paddr, sg->length)) {
> > /*
> > * Pass the dma_addr of the first slab in the iotlb buffer as
> > --
> > 1.7.2.5
> >
>

2013-10-02 17:15:34

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 10/19] xen: introduce xen_alloc/free_coherent_pages

On Wed, 2 Oct 2013, Catalin Marinas wrote:
> On Wed, Oct 02, 2013 at 06:03:46PM +0100, Stefano Stabellini wrote:
> > On Tue, 1 Oct 2013, Catalin Marinas wrote:
> > > On 30 Sep 2013, at 16:31, Konrad Rzeszutek Wilk <[email protected]> wrote:
> > > > On Fri, Sep 27, 2013 at 05:09:58PM +0100, Stefano Stabellini wrote:
> > > >> xen_swiotlb_alloc_coherent needs to allocate a coherent buffer for cpu
> > > >> and devices. On native x86 and ARMv8 it is sufficient to call
> > > >> __get_free_pages in order to get a coherent buffer, while on ARM we need
> > > >> to call the native dma_ops->alloc implementation.
> > > >>
> > > >> When arm64 stops using the swiotlb by default and starts having multiple
> > > >> dma_ops implementations, we'll use __get_dma_ops there too.
> > > >
> > > > I presume this is a future TODO, not some further patch (in which
> > > > case you should say in here the title of it). If it is a TODO could
> > > > you stick that in the sentence here somewhere to make it crystal clear that
> > > > it is not implemented.
> > >
> > > Thanks for pointing this out. The arm64 patch shouldn't be different
> > > from arm here. I thought a subsequent patch already added
> > > __get_dma_ops() for arm64.
> >
> > Yes, the arm64 patch already added __get_dma_ops.
> >
> > However I am not using __get_dma_ops to implement
> > xen_alloc_coherent_pages for arm64 because I can just use
> > __get_free_pages for the moment:
>
> So why do the work twice when we'll get coherency maintenance in the DMA
> ops?

Yeah, good point.

2013-10-02 17:23:42

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 15/19] swiotlb-xen: call dma_capable only if dev->dma_mask is allocated

On Wed, Oct 02, 2013 at 06:13:35PM +0100, Stefano Stabellini wrote:
> The real issue is that some devices (xgmac, I am looking at you), don't
> set the dma_mask, even though they are perfectly capable of doing dma.

So this looks like a bug in the drivers. I thought I saw a huge patchset
by the ARM maintainer that tries to fix the dma_mask and dma_mask_coherent
bugs? And also fix the drivers to use the dma mask?

>
> Maybe I should modify the arm implementation of dma_capable (see
> http://marc.info/?l=linux-kernel&m=138029832007821&w=2) to ignore the
> dma_mask.

Or fix the 'xgmac'?
>
> On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 27, 2013 at 05:10:03PM +0100, Stefano Stabellini wrote:
> >
> > Why? I am looking at X86 and IA64 and I see:
> >
> > 79 if (!dev->dma_mask)
> > 80 return 0;
> >
> >
> > Why not port dma_capable over to ARM?
> >
> > > Signed-off-by: Stefano Stabellini <[email protected]>
> > > ---
> > > drivers/xen/swiotlb-xen.c | 6 +++---
> > > 1 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > > index 790c2eb..3011736 100644
> > > --- a/drivers/xen/swiotlb-xen.c
> > > +++ b/drivers/xen/swiotlb-xen.c
> > > @@ -512,7 +512,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > > * buffering it.
> > > */
> > > if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> > > - dma_capable(dev, dev_addr, size) &&
> > > + dev->dma_mask && dma_capable(dev, dev_addr, size) &&
> > > !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> > > return dev_addr;
> > >
> > > @@ -532,7 +532,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > > /*
> > > * Ensure that the address returned is DMA'ble
> > > */
> > > - if (!dma_capable(dev, dev_addr, size)) {
> > > + if (dev->dma_mask && !dma_capable(dev, dev_addr, size)) {
> > > swiotlb_tbl_unmap_single(dev, map, size, dir);
> > > dev_addr = 0;
> > > }
> > > @@ -660,7 +660,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> > >
> > > if (swiotlb_force ||
> > > xen_feature(XENFEAT_auto_translated_physmap) ||
> > > - !dma_capable(hwdev, dev_addr, sg->length) ||
> > > + (hwdev->dma_mask && !dma_capable(hwdev, dev_addr, sg->length)) ||
> > > range_straddles_page_boundary(paddr, sg->length)) {
> > > /*
> > > * Pass the dma_addr of the first slab in the iotlb buffer as
> > > --
> > > 1.7.2.5
> > >
> >

2013-10-02 17:24:45

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 18/19] swiotlb-xen: introduce a rbtree to track phys to bus mappings

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:06PM +0100, Stefano Stabellini wrote:
> > Introduce a second red-black tree to track phys to bus mappings created after
> > the initialization of the swiotlb buffer.
>
> Could you explain the use case a bit more please?
>
> As in:
> a) why is this needed

The reason I introduced it in this patch series was to keep track of
existing physical to dma mappings so that I could start to dynamically
pin single pages and avoid bouncing on the swiotlb all the time.
See the following patch in the series.

However it turns out that memcpying is faster than going to Xen for the
gpfn->mfn translation all the time.

So I'll drop "swiotlb-xen: instead of bouncing on the swiotlb, pin
single pages" from the series.

However with or without that patch, I would still like to keep this
second tree to keep track of physical to dma mappings created by
gnttab_map_refs: GNTTABOP_map_grant_ref returns the mfn of the granted
page and we can exploit it to avoid bouncing on the swiotlb buffer for
DMA operations on granted pages.



> b) why are we using the rb tree instead of something else (say FIFO queue)

Given the type of workload, I thought that it would be the best fit. It
also happens to be the same data structure used by XenServer in their
kernel to achieve something similar.
I don't have hard numbers though.


> c) how long are these in usage?

During the entire life of the guest VM.


> d) do we need locking now?

Yes, but only for tree insertions, deletions and lookups.


> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > drivers/xen/swiotlb-xen.c | 99 +++++++++++++++++++++++++++++++++++++-------
> > 1 files changed, 83 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 3011736..022bcaf 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -79,7 +79,8 @@ struct xen_dma_info {
> > dma_addr_t dma_addr;
> > phys_addr_t phys_addr;
> > size_t size;
> > - struct rb_node rbnode;
> > + struct rb_node rbnode_dma;
> > + struct rb_node rbnode_phys;
> > };
> >
> > /*
> > @@ -96,8 +97,13 @@ static struct xen_dma_info *xen_dma_seg;
> > * mappings.
> > */
> > static struct rb_root bus_to_phys = RB_ROOT;
> > +/*
> > + * This tree keeps track of physical address to bus address
> > + * mappings apart from the ones belonging to the initial swiotlb buffer.
> > + */
> > +static struct rb_root phys_to_bus = RB_ROOT;
> >
> > -static int xen_dma_add_entry(struct xen_dma_info *new)
> > +static int xen_dma_add_entry_bus(struct xen_dma_info *new)
> > {
> > struct rb_node **link = &bus_to_phys.rb_node;
> > struct rb_node *parent = NULL;
> > @@ -106,7 +112,7 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
> >
> > while (*link) {
> > parent = *link;
> > - entry = rb_entry(parent, struct xen_dma_info, rbnode);
> > + entry = rb_entry(parent, struct xen_dma_info, rbnode_dma);
> >
> > if (new->dma_addr == entry->dma_addr)
> > goto err_out;
> > @@ -118,8 +124,41 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
> > else
> > link = &(*link)->rb_right;
> > }
> > - rb_link_node(&new->rbnode, parent, link);
> > - rb_insert_color(&new->rbnode, &bus_to_phys);
> > + rb_link_node(&new->rbnode_dma, parent, link);
> > + rb_insert_color(&new->rbnode_dma, &bus_to_phys);
> > + goto out;
> > +
> > +err_out:
> > + rc = -EINVAL;
> > + pr_warn("%s: cannot add phys=%pa -> dma=%pa: phys=%pa -> dma=%pa already exists\n",
> > + __func__, &new->phys_addr, &new->dma_addr, &entry->phys_addr, &entry->dma_addr);
> > +out:
> > + return rc;
> > +}
> > +
> > +static int xen_dma_add_entry_phys(struct xen_dma_info *new)
> > +{
> > + struct rb_node **link = &phys_to_bus.rb_node;
> > + struct rb_node *parent = NULL;
> > + struct xen_dma_info *entry;
> > + int rc = 0;
> > +
> > + while (*link) {
> > + parent = *link;
> > + entry = rb_entry(parent, struct xen_dma_info, rbnode_phys);
> > +
> > + if (new->dma_addr == entry->dma_addr)
> > + goto err_out;
> > + if (new->phys_addr == entry->phys_addr)
> > + goto err_out;
> > +
> > + if (new->phys_addr < entry->phys_addr)
> > + link = &(*link)->rb_left;
> > + else
> > + link = &(*link)->rb_right;
> > + }
> > + rb_link_node(&new->rbnode_phys, parent, link);
> > + rb_insert_color(&new->rbnode_phys, &phys_to_bus);
> > goto out;
> >
> > err_out:
> > @@ -130,13 +169,22 @@ out:
> > return rc;
> > }
> >
> > +static int xen_dma_add_entry(struct xen_dma_info *new)
> > +{
> > + int rc;
> > + if ((rc = xen_dma_add_entry_bus(new) < 0) ||
> > + (rc = xen_dma_add_entry_phys(new) < 0))
>
> Please don't do that. Just do
>
> rc = xen_dma_add_entry_bus(new);
> if (rc)
> return rc;
> rc = xen_dma_add_entry_phys(new);
> if (rc) {
> // unwind it somehow? <<== This is important :-)
> }
> return rc;
>
>
> > + return rc;
> > + return 0;
> > +}
> > +
> > static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> > {
> > struct rb_node *n = bus_to_phys.rb_node;
> > struct xen_dma_info *entry;
> >
> > while (n) {
> > - entry = rb_entry(n, struct xen_dma_info, rbnode);
> > + entry = rb_entry(n, struct xen_dma_info, rbnode_dma);
> > if (entry->dma_addr <= dma_addr &&
> > entry->dma_addr + entry->size > dma_addr) {
> > return entry;
> > @@ -150,11 +198,30 @@ static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> > return NULL;
> > }
> >
> > -static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
> > +static struct xen_dma_info *xen_get_dma_info_from_phys(phys_addr_t phys)
> > {
> > - int nr_seg;
> > - unsigned long offset;
> > - char *vaddr;
> > + struct rb_node *n = phys_to_bus.rb_node;
> > + struct xen_dma_info *entry;
> > +
> > + while (n) {
> > + entry = rb_entry(n, struct xen_dma_info, rbnode_phys);
> > + if (entry->phys_addr <= phys &&
> > + entry->phys_addr + entry->size > phys) {
> > + return entry;
> > + }
> > + if (phys < entry->phys_addr)
> > + n = n->rb_left;
> > + else
> > + n = n->rb_right;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +/* Only looks into the initial buffer allocation in case of
> > + * XENFEAT_auto_translated_physmap guests. */
> > +static dma_addr_t xen_phys_to_bus_quick(phys_addr_t paddr) { int nr_seg;
> > + unsigned long offset; char *vaddr;
> >
> > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > return phys_to_machine(XPADDR(paddr)).maddr;
> > @@ -184,7 +251,7 @@ static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
> >
> > static dma_addr_t xen_virt_to_bus(void *address)
> > {
> > - return xen_phys_to_bus(virt_to_phys(address));
> > + return xen_phys_to_bus_quick(virt_to_phys(address));
> > }
> >
> > static int check_pages_physically_contiguous(unsigned long pfn,
> > @@ -424,7 +491,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > * Do not use virt_to_phys(ret) because on ARM it doesn't correspond
> > * to *dma_handle. */
> > phys = *dma_handle;
> > - dev_addr = xen_phys_to_bus(phys);
> > + dev_addr = xen_phys_to_bus_quick(phys);
> > if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> > ((dev_addr + size - 1 <= dma_mask)) &&
> > !range_straddles_page_boundary(phys, size))
> > @@ -503,7 +570,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > struct dma_attrs *attrs)
> > {
> > phys_addr_t map, phys = page_to_phys(page) + offset;
> > - dma_addr_t dev_addr = xen_phys_to_bus(phys);
> > + dma_addr_t dev_addr = xen_phys_to_bus_quick(phys);
> >
> > BUG_ON(dir == DMA_NONE);
> > /*
> > @@ -527,7 +594,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > if (map == SWIOTLB_MAP_ERROR)
> > return DMA_ERROR_CODE;
> >
> > - dev_addr = xen_phys_to_bus(map);
> > + dev_addr = xen_phys_to_bus_quick(map);
> >
> > /*
> > * Ensure that the address returned is DMA'ble
> > @@ -656,7 +723,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> >
> > for_each_sg(sgl, sg, nelems, i) {
> > phys_addr_t paddr = sg_phys(sg);
> > - dma_addr_t dev_addr = xen_phys_to_bus(paddr);
> > + dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
> >
> > if (swiotlb_force ||
> > xen_feature(XENFEAT_auto_translated_physmap) ||
> > @@ -682,7 +749,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> > sg_dma_len(sgl) = 0;
> > return DMA_ERROR_CODE;
> > }
> > - sg->dma_address = xen_phys_to_bus(map);
> > + sg->dma_address = xen_phys_to_bus_quick(map);
> > } else
> > sg->dma_address = dev_addr;
> > sg_dma_len(sg) = sg->length;
> > --
> > 1.7.2.5
> >
>

2013-10-02 17:32:43

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 12/19] swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:00PM +0100, Stefano Stabellini wrote:
> > The swiotlb code has appropriate calls to dma_mark_clean in place for
> > buffers passed to swiotlb_map_page as an argument. However it assumes
> > that the swiotlb bounce buffer (io_tlb_start-io_tlb_end) is already
> > coherent and doesn't need any calls to dma_mark_clean.
> >
> > On ARM the swiotlb bounce buffer is not coherent (the memory is
> > writealloc while it should be bufferable) and therefore we need to call
> > dma_mark_clean appropriately on the bounce buffer code paths too.
> >
> > > Note that most architectures have an empty dma_mark_clean implementation
> > anyway.
>
> The other architecture that uses swiotlb is IA64 and that does have
> an implementation where it touches on page attributes.
>
> Which means I have to figure out why my HP zx6000 won't boot with 3.11 now :-(
>

Now this is a very thorny issue.

Honestly I don't like the dma_mark_clean interface very much: it's one
big hammer, when we actually need some finesse to handle coherency.

For example on ARM some devices might not need the dma_mark_clean call,
while others do. Calling it all the time is inefficient at best
and incorrect at worst.

I am thinking of calling the original map/unmap_page functions instead
(arm_dma_map_page or arm_coherent_dma_map_page in the arm case).
However in order to do that I would need to add more __get_dma_ops calls in
both lib/swiotlb.c and drivers/xen/swiotlb-xen.c


> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > lib/swiotlb.c | 13 ++++++++++---
> > 1 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> > index 4e8686c..eb45d17 100644
> > --- a/lib/swiotlb.c
> > +++ b/lib/swiotlb.c
> > @@ -515,6 +515,7 @@ found:
> > io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> > if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
> > swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
> > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> >
> > return tlb_addr;
> > }
> > @@ -547,7 +548,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> > * First, sync the memory before unmapping the entry
> > */
> > if (orig_addr && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
> > + {
> > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
> > + }
> >
> > /*
> > * Return the buffer to the free list by setting the corresponding
> > @@ -587,17 +591,20 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
> >
> > switch (target) {
> > case SYNC_FOR_CPU:
> > - if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> > + if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > swiotlb_bounce(orig_addr, tlb_addr,
> > size, DMA_FROM_DEVICE);
> > + }
> > else
> > BUG_ON(dir != DMA_TO_DEVICE);
> > break;
> > case SYNC_FOR_DEVICE:
> > - if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> > + if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> > swiotlb_bounce(orig_addr, tlb_addr,
> > size, DMA_TO_DEVICE);
> > - else
> > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > + } else
> > BUG_ON(dir != DMA_FROM_DEVICE);
> > break;
> > default:
> > --
> > 1.7.2.5
> >
>

2013-10-03 18:35:29

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH v6 15/19] swiotlb-xen: call dma_capable only if dev->dma_mask is allocated

On Wed, Oct 2, 2013 at 12:22 PM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On Wed, Oct 02, 2013 at 06:13:35PM +0100, Stefano Stabellini wrote:
>> The real issue is that some devices (xgmac, I am looking at you), don't
>> set the dma_mask, even though they are perfectly capable of doing dma.
>
> So this looks like a bug in the drivers. I thought I saw a huge patchset
> by the ARM maintainer that tries to fix the dma_mask and dma_mask_coherent
> bugs? And also fix the drivers to use the dma mask?

I think Russell only fixed incorrect handling of masks, not missing handling.

>>
>> Maybe I should modify the arm implementation of dma_capable (see
>> http://marc.info/?l=linux-kernel&m=138029832007821&w=2) to ignore the
>> dma_mask.
>
> Or fix the 'xgmac'?

There are really some core changes needed to fix this, and I'd guess
there are lots of other cases of missing dma_mask. The problem is
platform devices don't create a mask to assign to dma_mask in the
first place. I think the right solution is assigning dma_mask to
&coherent_dma_mask by default as Russell did for AMBA devices. I don't
know if doing that would break anything or not.
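
A sketch of the default Rob describes, hypothetically applied where a platform device is set up (pdev is the struct platform_device being registered):

/* Sketch: give platform devices a usable dma_mask by default, pointing it
 * at coherent_dma_mask the way the AMBA bus code already does. */
if (!pdev->dev.dma_mask)
        pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
if (!pdev->dev.coherent_dma_mask)
        pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);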

Rob

>>
>> On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
>> > On Fri, Sep 27, 2013 at 05:10:03PM +0100, Stefano Stabellini wrote:
>> >
>> > Why? I am looking at X86 and IA64 and I see:
>> >
>> > 79 if (!dev->dma_mask)
>> > 80 return 0;
>> >
>> >
>> > Why not port dma_capable over to ARM?
>> >
>> > > Signed-off-by: Stefano Stabellini <[email protected]>
>> > > ---
>> > > drivers/xen/swiotlb-xen.c | 6 +++---
>> > > 1 files changed, 3 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
>> > > index 790c2eb..3011736 100644
>> > > --- a/drivers/xen/swiotlb-xen.c
>> > > +++ b/drivers/xen/swiotlb-xen.c
>> > > @@ -512,7 +512,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
>> > > * buffering it.
>> > > */
>> > > if (!xen_feature(XENFEAT_auto_translated_physmap) &&
>> > > - dma_capable(dev, dev_addr, size) &&
>> > > + dev->dma_mask && dma_capable(dev, dev_addr, size) &&
>> > > !range_straddles_page_boundary(phys, size) && !swiotlb_force)
>> > > return dev_addr;
>> > >
>> > > @@ -532,7 +532,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
>> > > /*
>> > > * Ensure that the address returned is DMA'ble
>> > > */
>> > > - if (!dma_capable(dev, dev_addr, size)) {
>> > > + if (dev->dma_mask && !dma_capable(dev, dev_addr, size)) {
>> > > swiotlb_tbl_unmap_single(dev, map, size, dir);
>> > > dev_addr = 0;
>> > > }
>> > > @@ -660,7 +660,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
>> > >
>> > > if (swiotlb_force ||
>> > > xen_feature(XENFEAT_auto_translated_physmap) ||
>> > > - !dma_capable(hwdev, dev_addr, sg->length) ||
>> > > + (hwdev->dma_mask && !dma_capable(hwdev, dev_addr, sg->length)) ||
>> > > range_straddles_page_boundary(paddr, sg->length)) {
>> > > /*
>> > > * Pass the dma_addr of the first slab in the iotlb buffer as
>> > > --
>> > > 1.7.2.5
>> > >
>> >
>

2013-10-04 13:23:41

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v6 12/19] swiotlb: don't assume that io_tlb_start-io_tlb_end is coherent

On Wed, Oct 02, 2013 at 06:31:57PM +0100, Stefano Stabellini wrote:
> On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 27, 2013 at 05:10:00PM +0100, Stefano Stabellini wrote:
> > > The swiotlb code has appropriate calls to dma_mark_clean in place for
> > > buffers passed to swiotlb_map_page as an argument. However it assumes
> > > that the swiotlb bounce buffer (io_tlb_start-io_tlb_end) is already
> > > coherent and doesn't need any calls to dma_mark_clean.
> > >
> > > On ARM the swiotlb bounce buffer is not coherent (the memory is
> > > writealloc while it should be bufferable) and therefore we need to call
> > > dma_mark_clean appropriately on the bounce buffer code paths too.
> > >
> > > Note that most architectures have an empty dma_mark_clean implementation
> > > anyway.
> >
> > The other architecture that uses swiotlb is IA64 and that does have
> > an implementation where it touches on page attributes.
> >
> > Which means I have to figure out why my HP zx6000 won't boot with 3.11 now :-(
> >
>
> Now this is a very thorny issue.
>
> Honestly I don't like the dma_mark_clean interface very much: it's one
> big hammer, when we actually need some finesse to handle coherency.
>
> For example on ARM some devices might not need the dma_mark_clean call,
> while others do. Calling it every time is inefficient at best and
> incorrect at worst.
>
> I am thinking of calling the original map/unmap_page functions instead
> (arm_dma_map_page or arm_coherent_dma_map_page in the arm case).
> However in order to do that I would need to add more __get_dma_ops calls in
> both lib/swiotlb.c and drivers/xen/swiotlb-xen.c

I think that is OK for the Xen-SWIOTLB case.

For the lib/swiotlb - would that mean that non-Xen-ARM would use the
SWIOTLB? If so, I am OK with that too.
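
(For context, a rough sketch of the dispatch being discussed: routing
bounce-buffer maintenance through the device's native dma_ops rather than a
bare dma_mark_clean(). __get_dma_ops is the arm-side accessor referenced in
this thread; the wrapper name below is purely illustrative and not code from
the posted series.)

    /* Hypothetical wrapper: let the native (arm) dma_ops do the cache
     * maintenance for a page instead of calling dma_mark_clean(). */
    static dma_addr_t xen_map_page_native(struct device *dev, struct page *page,
                                          unsigned long offset, size_t size,
                                          enum dma_data_direction dir,
                                          struct dma_attrs *attrs)
    {
            struct dma_map_ops *ops = __get_dma_ops(dev); /* e.g. arm_dma_ops */

            return ops->map_page(dev, page, offset, size, dir, attrs);
    }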

>
>
> > > Signed-off-by: Stefano Stabellini <[email protected]>
> > > ---
> > > lib/swiotlb.c | 13 ++++++++++---
> > > 1 files changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> > > index 4e8686c..eb45d17 100644
> > > --- a/lib/swiotlb.c
> > > +++ b/lib/swiotlb.c
> > > @@ -515,6 +515,7 @@ found:
> > > io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> > > if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
> > > swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
> > > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > >
> > > return tlb_addr;
> > > }
> > > @@ -547,7 +548,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> > > * First, sync the memory before unmapping the entry
> > > */
> > > if (orig_addr && ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
> > > + {
> > > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > > swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
> > > + }
> > >
> > > /*
> > > * Return the buffer to the free list by setting the corresponding
> > > @@ -587,17 +591,20 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
> > >
> > > switch (target) {
> > > case SYNC_FOR_CPU:
> > > - if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> > > + if (likely(dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> > > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > > swiotlb_bounce(orig_addr, tlb_addr,
> > > size, DMA_FROM_DEVICE);
> > > + }
> > > else
> > > BUG_ON(dir != DMA_TO_DEVICE);
> > > break;
> > > case SYNC_FOR_DEVICE:
> > > - if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> > > + if (likely(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) {
> > > swiotlb_bounce(orig_addr, tlb_addr,
> > > size, DMA_TO_DEVICE);
> > > - else
> > > + dma_mark_clean(phys_to_virt(tlb_addr), size);
> > > + } else
> > > BUG_ON(dir != DMA_FROM_DEVICE);
> > > break;
> > > default:
> > > --
> > > 1.7.2.5
> > >
> >

2013-10-09 17:26:51

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 18/19] swiotlb-xen: introduce a rbtree to track phys to bus mappings

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:06PM +0100, Stefano Stabellini wrote:
> > Introduce a second red-black tree to track phys to bus mappings created after
> > the initialization of the swiotlb buffer.
>
> Could you explain the use case a bit more please?
>
> As in:
> a) why is this needed
> b) why are we using the rb tree instead of something else (say FIFO queue)
> c) how long are these in usage?
> d) do we need locking now?

Given that I was rebuilding a P2M for autotranslate guests in
swiotlb-xen, making the code much harder to read, I decided to go the
alternative route of providing a real P2M for ARM guests.

The next swiotlb series will introduce arch/arm/xen/p2m.c, moving the
two rbtrees there. As a consequence we can start using swiotlb-xen on
ARM with minimal changes. The whole thing looks much nicer now.



> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > drivers/xen/swiotlb-xen.c | 99 +++++++++++++++++++++++++++++++++++++-------
> > 1 files changed, 83 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 3011736..022bcaf 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -79,7 +79,8 @@ struct xen_dma_info {
> > dma_addr_t dma_addr;
> > phys_addr_t phys_addr;
> > size_t size;
> > - struct rb_node rbnode;
> > + struct rb_node rbnode_dma;
> > + struct rb_node rbnode_phys;
> > };
> >
> > /*
> > @@ -96,8 +97,13 @@ static struct xen_dma_info *xen_dma_seg;
> > * mappings.
> > */
> > static struct rb_root bus_to_phys = RB_ROOT;
> > +/*
> > + * This tree keeps track of physical address to bus address
> > + * mappings apart from the ones belonging to the initial swiotlb buffer.
> > + */
> > +static struct rb_root phys_to_bus = RB_ROOT;
> >
> > -static int xen_dma_add_entry(struct xen_dma_info *new)
> > +static int xen_dma_add_entry_bus(struct xen_dma_info *new)
> > {
> > struct rb_node **link = &bus_to_phys.rb_node;
> > struct rb_node *parent = NULL;
> > @@ -106,7 +112,7 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
> >
> > while (*link) {
> > parent = *link;
> > - entry = rb_entry(parent, struct xen_dma_info, rbnode);
> > + entry = rb_entry(parent, struct xen_dma_info, rbnode_dma);
> >
> > if (new->dma_addr == entry->dma_addr)
> > goto err_out;
> > @@ -118,8 +124,41 @@ static int xen_dma_add_entry(struct xen_dma_info *new)
> > else
> > link = &(*link)->rb_right;
> > }
> > - rb_link_node(&new->rbnode, parent, link);
> > - rb_insert_color(&new->rbnode, &bus_to_phys);
> > + rb_link_node(&new->rbnode_dma, parent, link);
> > + rb_insert_color(&new->rbnode_dma, &bus_to_phys);
> > + goto out;
> > +
> > +err_out:
> > + rc = -EINVAL;
> > + pr_warn("%s: cannot add phys=%pa -> dma=%pa: phys=%pa -> dma=%pa already exists\n",
> > + __func__, &new->phys_addr, &new->dma_addr, &entry->phys_addr, &entry->dma_addr);
> > +out:
> > + return rc;
> > +}
> > +
> > +static int xen_dma_add_entry_phys(struct xen_dma_info *new)
> > +{
> > + struct rb_node **link = &phys_to_bus.rb_node;
> > + struct rb_node *parent = NULL;
> > + struct xen_dma_info *entry;
> > + int rc = 0;
> > +
> > + while (*link) {
> > + parent = *link;
> > + entry = rb_entry(parent, struct xen_dma_info, rbnode_phys);
> > +
> > + if (new->dma_addr == entry->dma_addr)
> > + goto err_out;
> > + if (new->phys_addr == entry->phys_addr)
> > + goto err_out;
> > +
> > + if (new->phys_addr < entry->phys_addr)
> > + link = &(*link)->rb_left;
> > + else
> > + link = &(*link)->rb_right;
> > + }
> > + rb_link_node(&new->rbnode_phys, parent, link);
> > + rb_insert_color(&new->rbnode_phys, &phys_to_bus);
> > goto out;
> >
> > err_out:
> > @@ -130,13 +169,22 @@ out:
> > return rc;
> > }
> >
> > +static int xen_dma_add_entry(struct xen_dma_info *new)
> > +{
> > + int rc;
> > + if ((rc = xen_dma_add_entry_bus(new) < 0) ||
> > + (rc = xen_dma_add_entry_phys(new) < 0))
>
> Please don't do that. Just do
>
> rc = xen_dma_add_entry(bus);
> if (rc)
> return rc;
> rc = xen_dma_add_entry_phys(new);
> if (rc) {
> // unwind it somehow? <<== This is important :-)
> }
> return rc;
>
>
> > + return rc;
> > + return 0;
> > +}
> > +
> > static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> > {
> > struct rb_node *n = bus_to_phys.rb_node;
> > struct xen_dma_info *entry;
> >
> > while (n) {
> > - entry = rb_entry(n, struct xen_dma_info, rbnode);
> > + entry = rb_entry(n, struct xen_dma_info, rbnode_dma);
> > if (entry->dma_addr <= dma_addr &&
> > entry->dma_addr + entry->size > dma_addr) {
> > return entry;
> > @@ -150,11 +198,30 @@ static struct xen_dma_info *xen_get_dma_info_from_dma(dma_addr_t dma_addr)
> > return NULL;
> > }
> >
> > -static dma_addr_t xen_phys_to_bus(phys_addr_t paddr)
> > +static struct xen_dma_info *xen_get_dma_info_from_phys(phys_addr_t phys)
> > {
> > - int nr_seg;
> > - unsigned long offset;
> > - char *vaddr;
> > + struct rb_node *n = phys_to_bus.rb_node;
> > + struct xen_dma_info *entry;
> > +
> > + while (n) {
> > + entry = rb_entry(n, struct xen_dma_info, rbnode_phys);
> > + if (entry->phys_addr <= phys &&
> > + entry->phys_addr + entry->size > phys) {
> > + return entry;
> > + }
> > + if (phys < entry->phys_addr)
> > + n = n->rb_left;
> > + else
> > + n = n->rb_right;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +/* Only looks into the initial buffer allocation in case of
> > + * XENFEAT_auto_translated_physmap guests. */
> > +static dma_addr_t xen_phys_to_bus_quick(phys_addr_t paddr)
> > +{
> > + int nr_seg;
> > + unsigned long offset;
> > + char *vaddr;
> >
> > if (!xen_feature(XENFEAT_auto_translated_physmap))
> > return phys_to_machine(XPADDR(paddr)).maddr;
> > @@ -184,7 +251,7 @@ static phys_addr_t xen_bus_to_phys(dma_addr_t baddr)
> >
> > static dma_addr_t xen_virt_to_bus(void *address)
> > {
> > - return xen_phys_to_bus(virt_to_phys(address));
> > + return xen_phys_to_bus_quick(virt_to_phys(address));
> > }
> >
> > static int check_pages_physically_contiguous(unsigned long pfn,
> > @@ -424,7 +491,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > * Do not use virt_to_phys(ret) because on ARM it doesn't correspond
> > * to *dma_handle. */
> > phys = *dma_handle;
> > - dev_addr = xen_phys_to_bus(phys);
> > + dev_addr = xen_phys_to_bus_quick(phys);
> > if (!xen_feature(XENFEAT_auto_translated_physmap) &&
> > ((dev_addr + size - 1 <= dma_mask)) &&
> > !range_straddles_page_boundary(phys, size))
> > @@ -503,7 +570,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > struct dma_attrs *attrs)
> > {
> > phys_addr_t map, phys = page_to_phys(page) + offset;
> > - dma_addr_t dev_addr = xen_phys_to_bus(phys);
> > + dma_addr_t dev_addr = xen_phys_to_bus_quick(phys);
> >
> > BUG_ON(dir == DMA_NONE);
> > /*
> > @@ -527,7 +594,7 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > if (map == SWIOTLB_MAP_ERROR)
> > return DMA_ERROR_CODE;
> >
> > - dev_addr = xen_phys_to_bus(map);
> > + dev_addr = xen_phys_to_bus_quick(map);
> >
> > /*
> > * Ensure that the address returned is DMA'ble
> > @@ -656,7 +723,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> >
> > for_each_sg(sgl, sg, nelems, i) {
> > phys_addr_t paddr = sg_phys(sg);
> > - dma_addr_t dev_addr = xen_phys_to_bus(paddr);
> > + dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
> >
> > if (swiotlb_force ||
> > xen_feature(XENFEAT_auto_translated_physmap) ||
> > @@ -682,7 +749,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> > sg_dma_len(sgl) = 0;
> > return DMA_ERROR_CODE;
> > }
> > - sg->dma_address = xen_phys_to_bus(map);
> > + sg->dma_address = xen_phys_to_bus_quick(map);
> > } else
> > sg->dma_address = dev_addr;
> > sg_dma_len(sg) = sg->length;
> > --
> > 1.7.2.5
> >
>

2013-10-09 17:28:39

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 19/19] swiotlb-xen: instead of bouncing on the swiotlb, pin single pages

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:07PM +0100, Stefano Stabellini wrote:
> > If we are dealing with single page mappings that don't cross page
> > boundaries, we can try to pin the page and get the corresponding mfn,
> > using xen_pin_page. This avoids going through the swiotlb bounce
> > buffer. If xen_pin_page fails (because the underlying mfn doesn't
> > respect the dma_mask) fall back to the swiotlb bounce buffer.
> > Add a ref count to xen_dma_info, so that we can avoid pinning pages that
> > are already pinned.
> > Use a spinlock to protect accesses, insertions and deletions in the
> > rbtrees.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>

Thanks for the review; however, I am dropping this patch because it
wasn't improving performance as much as I hoped it would.

I am taking a new approach now: I am keeping the 1:1 physical to machine
mapping for dom0 and using swiotlb-xen only to handle dma requests
involving foreign grants.

The code is much nicer, and it runs much faster.
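
(To make that concrete: with a 1:1 pfn-to-mfn mapping for dom0 memory, only
pages whose mfn differs from their pfn - i.e. foreign grants mapped into the
domain - need to go through the bounce buffer. A minimal, hypothetical check
might look like the following; this is not code from the posted series.)

    /* Hypothetical: under a 1:1 dom0 mapping a local page has mfn == pfn,
     * so anything else is a foreign grant and must be bounced. */
    static bool xen_page_needs_bounce(struct page *page)
    {
            unsigned long pfn = page_to_pfn(page);

            return pfn_to_mfn(pfn) != pfn;
    }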


> > drivers/xen/swiotlb-xen.c | 152 ++++++++++++++++++++++++++++++++++++++++++---
> > 1 files changed, 143 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 022bcaf..6f94285 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -57,6 +57,8 @@
> > #define NR_DMA_SEGS ((xen_io_tlb_nslabs + IO_TLB_SEGSIZE - 1) / IO_TLB_SEGSIZE)
> > static char *xen_io_tlb_start, *xen_io_tlb_end;
> > static unsigned long xen_io_tlb_nslabs;
> > +spinlock_t swiotlb_lock;
> > +
> > /*
> > * Quick lookup value of the bus address of the IOTLB.
> > */
> > @@ -79,6 +81,7 @@ struct xen_dma_info {
> > dma_addr_t dma_addr;
> > phys_addr_t phys_addr;
> > size_t size;
> > + atomic_t refs;
> > struct rb_node rbnode_dma;
> > struct rb_node rbnode_phys;
> > };
> > @@ -254,6 +257,48 @@ static dma_addr_t xen_virt_to_bus(void *address)
> > return xen_phys_to_bus_quick(virt_to_phys(address));
> > }
> >
> > +static int xen_pin_dev_page(struct device *dev,
> > + phys_addr_t phys,
> > + dma_addr_t *dev_addr)
>
> Something is odd with your tabs.
> > +{
> > + u64 dma_mask = DMA_BIT_MASK(32);
>
> Why 32?
>
> > + xen_pfn_t in;
> > + struct xen_dma_info *dma_info = xen_get_dma_info_from_phys(phys);
> > +
> > + if (dma_info != NULL) {
> > + atomic_inc(&dma_info->refs);
> > + *dev_addr = dma_info->dma_addr + (phys - dma_info->phys_addr);
> > + return 0;
> > + }
> > +
> > + if (dev && dev->coherent_dma_mask)
> > + dma_mask = dma_alloc_coherent_mask(dev, GFP_KERNEL);
> > +
> > + in = phys >> PAGE_SHIFT;
> > + if (!xen_pin_page(&in, fls64(dma_mask))) {
>
> Why not just make xen_pin_page take a phys address and do the
> appropriate bit shifting itself?
>
> > + *dev_addr = in << PAGE_SHIFT;
> > + dma_info = kzalloc(sizeof(struct xen_dma_info), GFP_NOWAIT);
> > + if (!dma_info) {
> > + pr_warn("cannot allocate xen_dma_info\n");
> > + xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
>
> Perhaps we should add an inline function for that called 'xen_unpin_page'?
>
> > + return -ENOMEM;
> > + }
> > + dma_info->phys_addr = phys & PAGE_MASK;
> > + dma_info->size = PAGE_SIZE;
> > + dma_info->dma_addr = *dev_addr;
> > + if (xen_dma_add_entry(dma_info)) {
> > + pr_warn("cannot add new entry to bus_to_phys\n");
> > + xen_destroy_contiguous_region(phys & PAGE_MASK, 0);
> > + kfree(dma_info);
> > + return -EFAULT;
> > + }
> > + atomic_set(&dma_info->refs, 1);
> > + *dev_addr += (phys & ~PAGE_MASK);
> > + return 0;
> > + }
>
> Don't you want the opposite of dma_alloc_coherent_mask?
>
> > + return -EFAULT;
> > +}
> > +
> > static int check_pages_physically_contiguous(unsigned long pfn,
> > unsigned int offset,
> > size_t length)
> > @@ -434,6 +479,7 @@ retry:
> > rc = 0;
> > } else
> > rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs);
> > + spin_lock_init(&swiotlb_lock);
> > return rc;
> > error:
> > if (repeat--) {
> > @@ -461,6 +507,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > phys_addr_t phys;
> > dma_addr_t dev_addr;
> > struct xen_dma_info *dma_info = NULL;
> > + unsigned long irqflags;
> >
> > /*
> > * Ignore region specifiers - the kernel's ideas of
> > @@ -497,7 +544,7 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > !range_straddles_page_boundary(phys, size))
> > *dma_handle = dev_addr;
> > else {
> > - if (xen_create_contiguous_region(phys, order,
> > + if (xen_create_contiguous_region(phys & PAGE_MASK, order,
> > fls64(dma_mask), dma_handle) != 0) {
> > xen_free_coherent_pages(hwdev, size, ret, (dma_addr_t)phys, attrs);
> > return NULL;
> > @@ -509,15 +556,19 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> > xen_destroy_contiguous_region(phys, order);
> > return NULL;
> > }
> > - dma_info->phys_addr = phys;
> > - dma_info->size = size;
> > + dma_info->phys_addr = phys & PAGE_MASK;
> > + dma_info->size = (1U << order) << PAGE_SHIFT;
> > dma_info->dma_addr = *dma_handle;
> > + atomic_set(&dma_info->refs, 1);
> > + spin_lock_irqsave(&swiotlb_lock, irqflags);
> > if (xen_dma_add_entry(dma_info)) {
> > + spin_unlock_irqrestore(&swiotlb_lock, irqflags);
> > pr_warn("cannot add new entry to bus_to_phys\n");
> > xen_destroy_contiguous_region(phys, order);
> > kfree(dma_info);
> > return NULL;
> > }
> > + spin_unlock_irqrestore(&swiotlb_lock, irqflags);
> > }
> > memset(ret, 0, size);
> > return ret;
> > @@ -532,6 +583,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> > phys_addr_t phys;
> > u64 dma_mask = DMA_BIT_MASK(32);
> > struct xen_dma_info *dma_info = NULL;
> > + unsigned long flags;
> >
> > if (dma_release_from_coherent(hwdev, order, vaddr))
> > return;
> > @@ -539,6 +591,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> > if (hwdev && hwdev->coherent_dma_mask)
> > dma_mask = hwdev->coherent_dma_mask;
> >
> > + spin_lock_irqsave(&swiotlb_lock, flags);
> > /* do not use virt_to_phys because on ARM it doesn't return you the
> > * physical address */
> > phys = xen_bus_to_phys(dev_addr);
> > @@ -546,12 +599,16 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
> > if (xen_feature(XENFEAT_auto_translated_physmap) ||
> > (((dev_addr + size - 1 > dma_mask)) ||
> > range_straddles_page_boundary(phys, size))) {
> > - xen_destroy_contiguous_region(phys, order);
> > dma_info = xen_get_dma_info_from_dma(dev_addr);
> > - rb_erase(&dma_info->rbnode, &bus_to_phys);
> > - kfree(dma_info);
> > + if (atomic_dec_and_test(&dma_info->refs)) {
> > + xen_destroy_contiguous_region(phys & PAGE_MASK, order);
> > + rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
> > + rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
> > + kfree(dma_info);
> > + }
>
> If xen_pin_dev_page failed or was not called we would still end up
> calling this. And we would decrement a potentially garbage value? Or not.
> > }
> >
> > + spin_unlock_irqrestore(&swiotlb_lock, flags);
> > xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
> > }
> > EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
> > @@ -583,6 +640,23 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
> > !range_straddles_page_boundary(phys, size) && !swiotlb_force)
> > return dev_addr;
> >
> > + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> > + size <= PAGE_SIZE &&
> > + !range_straddles_page_boundary(phys, size) &&
> > + !swiotlb_force) {
> > + unsigned long flags;
> > + int rc;
> > +
> > + spin_lock_irqsave(&swiotlb_lock, flags);
> > + rc = xen_pin_dev_page(dev, phys, &dev_addr);
> > + spin_unlock_irqrestore(&swiotlb_lock, flags);
> > +
> > + if (!rc) {
> > + dma_mark_clean(phys_to_virt(phys), size);
> > + return dev_addr;
> > + }
>
> And if there is an rc you should probably do
> dev_warn(.., "RC ..")
>
>
> But more importantly - all of this code adds an extra lock on the X86 side
> which will get -ENOxxx on the xen_pin_dev_page.
>
> I am wondering if it makes sense to make most of this code dependent
> on CONFIG_ARM? As the check for auto-xlat falls flat on X86 + PVH. Though
> I have no idea what we want to do with PVH and X86 at this point.
>
> > + }
> > +
> > /*
> > * Oh well, have to allocate and map a bounce buffer.
> > * Pass the dma_addr of the first slab in the iotlb buffer as
> > @@ -618,10 +692,37 @@ EXPORT_SYMBOL_GPL(xen_swiotlb_map_page);
> > static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
> > size_t size, enum dma_data_direction dir)
> > {
> > - phys_addr_t paddr = xen_bus_to_phys(dev_addr);
> > + struct xen_dma_info *dma_info;
> > + phys_addr_t paddr = DMA_ERROR_CODE;
> > + char *vaddr = NULL;
> > + unsigned long flags;
> >
> > BUG_ON(dir == DMA_NONE);
> >
> > + spin_lock_irqsave(&swiotlb_lock, flags);
> > + dma_info = xen_get_dma_info_from_dma(dev_addr);
> > + if (dma_info != NULL) {
> > + paddr = dma_info->phys_addr + (dev_addr - dma_info->dma_addr);
> > + vaddr = phys_to_virt(paddr);
> > + }
> > +
> > + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> > + paddr != DMA_ERROR_CODE &&
> > + !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
> > + !swiotlb_force) {
> > + if (atomic_dec_and_test(&dma_info->refs)) {
> > + xen_destroy_contiguous_region(paddr & PAGE_MASK, 0);
> > + rb_erase(&dma_info->rbnode_dma, &bus_to_phys);
> > + rb_erase(&dma_info->rbnode_phys, &phys_to_bus);
> > + kfree(dma_info);
> > + }
> > + spin_unlock_irqrestore(&swiotlb_lock, flags);
> > + if ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))
> > + dma_mark_clean(vaddr, size);
> > + return;
> > + }
> > + spin_unlock_irqrestore(&swiotlb_lock, flags);
> > +
> > /* NOTE: We use dev_addr here, not paddr! */
> > if (is_xen_swiotlb_buffer(dev_addr)) {
> > swiotlb_tbl_unmap_single(hwdev, paddr, size, dir);
> > @@ -664,9 +765,19 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
> > enum dma_sync_target target)
> > {
> > phys_addr_t paddr = xen_bus_to_phys(dev_addr);
> > + char *vaddr = phys_to_virt(paddr);
> >
> > BUG_ON(dir == DMA_NONE);
> >
> > + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> > + paddr != DMA_ERROR_CODE &&
> > + size <= PAGE_SIZE &&
> > + !(vaddr >= xen_io_tlb_start && vaddr < xen_io_tlb_end) &&
> > + !range_straddles_page_boundary(paddr, size) && !swiotlb_force) {
> > + dma_mark_clean(vaddr, size);
> > + return;
> > + }
> > +
> > /* NOTE: We use dev_addr here, not paddr! */
> > if (is_xen_swiotlb_buffer(dev_addr)) {
> > swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target);
> > @@ -717,13 +828,36 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> > struct dma_attrs *attrs)
> > {
> > struct scatterlist *sg;
> > - int i;
> > + int i, rc;
> > + u64 dma_mask = DMA_BIT_MASK(32);
> > + unsigned long flags;
> >
> > BUG_ON(dir == DMA_NONE);
> >
> > + if (hwdev && hwdev->coherent_dma_mask)
> > + dma_mask = dma_alloc_coherent_mask(hwdev, GFP_KERNEL);
> > +
> > for_each_sg(sgl, sg, nelems, i) {
> > phys_addr_t paddr = sg_phys(sg);
> > - dma_addr_t dev_addr = xen_phys_to_bus_quick(paddr);
> > + dma_addr_t dev_addr;
> > +
> > + if (xen_feature(XENFEAT_auto_translated_physmap) &&
> > + !range_straddles_page_boundary(paddr, sg->length) &&
> > + sg->length <= PAGE_SIZE &&
> > + !swiotlb_force) {
> > +
> > + spin_lock_irqsave(&swiotlb_lock, flags);
> > + rc = xen_pin_dev_page(hwdev, paddr, &dev_addr);
> > + spin_unlock_irqrestore(&swiotlb_lock, flags);
> > +
> > + if (!rc) {
> > + dma_mark_clean(phys_to_virt(paddr), sg->length);
> > + sg_dma_len(sg) = sg->length;
> > + sg->dma_address = dev_addr;
> > + continue;
> > + }
> > + }
> > + dev_addr = xen_phys_to_bus_quick(paddr);
> >
> > if (swiotlb_force ||
> > xen_feature(XENFEAT_auto_translated_physmap) ||
> > --
> > 1.7.2.5
> >
>

2013-10-09 17:32:11

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 14/19] swiotlb: print a warning when the swiotlb is full

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:02PM +0100, Stefano Stabellini wrote:
> > Signed-off-by: Stefano Stabellini <[email protected]>
> > ---
> > drivers/xen/swiotlb-xen.c | 1 +
> > lib/swiotlb.c | 1 +
> > 2 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 96ad316..790c2eb 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -674,6 +674,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
> > sg->length,
> > dir);
> > if (map == SWIOTLB_MAP_ERROR) {
> > + pr_warn("swiotlb buffer is full\n");
>
> It would be beneficial to use dev_warn instead.

OK
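
(Presumably something along these lines in xen_swiotlb_map_sg_attrs - a
sketch of the suggested change based on the quoted hunk, not the actual
respin:)

    if (map == SWIOTLB_MAP_ERROR) {
            dev_warn(hwdev, "swiotlb buffer is full\n");
            /* don't panic: map_sg callers are expected to handle errors */
            xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir, attrs);
            sg_dma_len(sgl) = 0;
            return DMA_ERROR_CODE;
    }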

> And perhaps even call debug_dma_dump_mappings to help in diagnosing
> a problem?

I don't think that would be very beneficial because this is an internal
swiotlb issue: we just run out of buffer space.


> > /* Don't panic here, we expect map_sg users
> > to do proper error handling. */
> > xen_swiotlb_unmap_sg_attrs(hwdev, sgl, i, dir,
> > diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> > index eb45d17..f06da0d 100644
> > --- a/lib/swiotlb.c
> > +++ b/lib/swiotlb.c
> > @@ -502,6 +502,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> >
> > not_found:
> > spin_unlock_irqrestore(&io_tlb_lock, flags);
> > + pr_warn("swiotlb buffer is full\n");
> > return SWIOTLB_MAP_ERROR;
> > found:
> > spin_unlock_irqrestore(&io_tlb_lock, flags);
> > --
> > 1.7.2.5
> >
>

2013-10-09 17:55:50

by Stefano Stabellini

[permalink] [raw]
Subject: Re: [PATCH v6 16/19] arm,arm64: do not always merge biovec if we are running on Xen

On Mon, 30 Sep 2013, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 27, 2013 at 05:10:04PM +0100, Stefano Stabellini wrote:
> > This is similar to what is done on X86: biovecs are prevented from merging,
> > otherwise every dma request would be forced to bounce on the swiotlb buffer.
> >
> > Signed-off-by: Stefano Stabellini <[email protected]>
>
>
> This also alters the generic (well, was x86, now it is generic) code.
> Perhaps that change should be made in a separate patch and explain
> what the autotranslate extra check will do for PV, PVH and HVM guests?
>
> Wait, does that mean we had not been merging requests on HVM guests?
> Is that good or bad?

With the new introduction of a p2m for xen on arm and arm64, I don't
need the change to drivers/xen/biomerge.c anymore.
No more autotranslate extra check.

> > ---
> > arch/arm/include/asm/io.h | 8 ++++++++
> > arch/arm64/include/asm/io.h | 9 +++++++++
> > drivers/xen/biomerge.c | 4 +++-
> > 3 files changed, 20 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> > index d070741..c45effc 100644
> > --- a/arch/arm/include/asm/io.h
> > +++ b/arch/arm/include/asm/io.h
> > @@ -24,9 +24,11 @@
> > #ifdef __KERNEL__
> >
> > #include <linux/types.h>
> > +#include <linux/blk_types.h>
> > #include <asm/byteorder.h>
> > #include <asm/memory.h>
> > #include <asm-generic/pci_iomap.h>
> > +#include <xen/xen.h>
> >
> > /*
> > * ISA I/O bus memory addresses are 1:1 with the physical address.
> > @@ -372,6 +374,12 @@ extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr);
> > #define BIOVEC_MERGEABLE(vec1, vec2) \
> > ((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
> >
> > +extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> > + const struct bio_vec *vec2);
> > +#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
> > + (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
> > + (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
> > +
> > #ifdef CONFIG_MMU
> > #define ARCH_HAS_VALID_PHYS_ADDR_RANGE
> > extern int valid_phys_addr_range(phys_addr_t addr, size_t size);
> > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> > index 1d12f89..c163287b 100644
> > --- a/arch/arm64/include/asm/io.h
> > +++ b/arch/arm64/include/asm/io.h
> > @@ -22,11 +22,14 @@
> > #ifdef __KERNEL__
> >
> > #include <linux/types.h>
> > +#include <linux/blk_types.h>
> >
> > #include <asm/byteorder.h>
> > #include <asm/barrier.h>
> > #include <asm/pgtable.h>
> >
> > +#include <xen/xen.h>
> > +
> > /*
> > * Generic IO read/write. These perform native-endian accesses.
> > */
> > @@ -263,5 +266,11 @@ extern int devmem_is_allowed(unsigned long pfn);
> > */
> > #define xlate_dev_kmem_ptr(p) p
> >
> > +extern bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> > + const struct bio_vec *vec2);
> > +#define BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
> > + (__BIOVEC_PHYS_MERGEABLE(vec1, vec2) && \
> > + (!xen_domain() || xen_biovec_phys_mergeable(vec1, vec2)))
> > +
> > #endif /* __KERNEL__ */
> > #endif /* __ASM_IO_H */
> > diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
> > index 0edb91c..f323a2d 100644
> > --- a/drivers/xen/biomerge.c
> > +++ b/drivers/xen/biomerge.c
> > @@ -2,6 +2,7 @@
> > #include <linux/io.h>
> > #include <linux/export.h>
> > #include <xen/page.h>
> > +#include <xen/features.h>
> >
> > bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> > const struct bio_vec *vec2)
> > @@ -9,7 +10,8 @@ bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
> > unsigned long mfn1 = pfn_to_mfn(page_to_pfn(vec1->bv_page));
> > unsigned long mfn2 = pfn_to_mfn(page_to_pfn(vec2->bv_page));
> >
> > - return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
> > + return !xen_feature(XENFEAT_auto_translated_physmap) &&
> > + __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
> > ((mfn1 == mfn2) || ((mfn1+1) == mfn2));
> > }
> > EXPORT_SYMBOL(xen_biovec_phys_mergeable);