2010-02-18 16:44:33

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [REPOST PATCH RFC SWIOTLB 0.5] Separate address translation routines from core SWIOTLB book-keeping

[Sorry for the repost - I sent the earlier one to the wrong LKML email]

Fujita-san et al.

Attached is a set of ten RFC patches that separate the address translation
(virt_to_phys, virt_to_bus, etc) from the SWIOTLB library. They also enhance
the SWIOTLB library a bit.

The idea behind this set of patches is to make it possible to have separate
mechanisms for translating virtual to physical or virtual to DMA addresses
on platforms which need an SWIOTLB, and where physical != PCI bus address.

One customers of this is the pv-ops project, which can switch between
different modes of operation depending on the environment it is running in:
bare-metal or virtualized (Xen for now).

On bare-metal SWIOTLB is used when there are no hardware IOMMU. In virtualized
environment it used when PCI pass-through is enabled for the guest. The problems
with PCI pass-through is that the guest's idea of PFN's is not the real thing.
To fix that, there is translation layer for PFN->machine frame number and vice-versa.
To bubble that up to the SWIOTLB layer there are two possible solutions.

One solution has been to wholesale copy the SWIOTLB, stick it in
arch/x86/xen/swiotlb.c and modify the virt_to_phys, phys_to_virt and others
to use the Xen address translation functions. Unfortunately, since the kernel can
run on bare-metal, there would be big code overlap with the real SWIOTLB.
(git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git xen/dom0/swiotlb-new)

Another approach, which this set of patches explores, is to abstract the
address translation and address determination functions away from the
SWIOTLB book-keeping functions. This way the core SWIOTLB library functions
are present in one place, while the address related functions are in
a separate library that can be loaded when running under non-bare-metal platform.

The set of ten patches is also accessible on:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6.git swiotlb-0.5

An example of how this can be utilized in both bare-metal and Xen environments
is this git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb-2.6.git xen-swiotlb-0.5

---

Documentation/x86/x86_64/boot-options.txt | 6 +-
include/linux/swiotlb.h | 44 ++++++-
lib/swiotlb.c | 187 ++++++++++++++++++-----------
3 files changed, 156 insertions(+), 81 deletions(-)


2010-02-18 16:44:26

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 01/10] swiotlb: fix: Update 'setup_io_tlb_npages' to accept both arguments in either order.

Before this patch, if you specified 'swiotlb=force,1024' it would
ignore both arguments. This fixes it and allows the user specify it
in any order (or none at all).

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 437eedb..e6d9e32 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -102,16 +102,18 @@ static int late_alloc;
static int __init
setup_io_tlb_npages(char *str)
{
- if (isdigit(*str)) {
- io_tlb_nslabs = simple_strtoul(str, &str, 0);
- /* avoid tail segment of size < IO_TLB_SEGSIZE */
- io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ while (*str) {
+ if (isdigit(*str)) {
+ io_tlb_nslabs = simple_strtoul(str, &str, 0);
+ /* avoid tail segment of size < IO_TLB_SEGSIZE */
+ io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+ }
+ if (!strncmp(str, "force", 5))
+ swiotlb_force = 1;
+ str += strcspn(str, ",");
+ if (*str == ',')
+ ++str;
}
- if (*str == ',')
- ++str;
- if (!strcmp(str, "force"))
- swiotlb_force = 1;
-
return 1;
}
__setup("swiotlb=", setup_io_tlb_npages);
--
1.6.2.5

2010-02-18 16:44:39

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 05/10] swiotlb: Make internal bookkeeping functions have 'do_' prefix.

The functions that operate on io_tlb_list/io_tlb_start/io_tlb_orig_addr
have the prefix 'do_' now.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 0eb64d7..9085eab 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -60,8 +60,8 @@ enum dma_sync_target {
int swiotlb_force;

/*
- * Used to do a quick range check in unmap_single and
- * sync_single_*, to see if the memory was in fact allocated by this
+ * Used to do a quick range check in do_unmap_single and
+ * do_sync_single_*, to see if the memory was in fact allocated by this
* API.
*/
static char *io_tlb_start, *io_tlb_end;
@@ -394,7 +394,7 @@ static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
* Allocates bounce buffer and returns its kernel virtual address.
*/
static void *
-map_single(struct device *hwdev, phys_addr_t phys, size_t size, int dir)
+do_map_single(struct device *hwdev, phys_addr_t phys, size_t size, int dir)
{
unsigned long flags;
char *dma_addr;
@@ -540,7 +540,7 @@ do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
}

static void
-sync_single(struct device *hwdev, char *dma_addr, size_t size,
+do_sync_single(struct device *hwdev, char *dma_addr, size_t size,
int dir, int target)
{
int index = (dma_addr - io_tlb_start) >> IO_TLB_SHIFT;
@@ -589,10 +589,10 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
if (!ret) {
/*
* We are either out of memory or the device can't DMA
- * to GFP_DMA memory; fall back on map_single(), which
+ * to GFP_DMA memory; fall back on do_map_single(), which
* will grab memory from the lowest available address range.
*/
- ret = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
+ ret = do_map_single(hwdev, 0, size, DMA_FROM_DEVICE);
if (!ret)
return NULL;
}
@@ -626,7 +626,7 @@ swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
if (!is_swiotlb_buffer(paddr))
free_pages((unsigned long)vaddr, get_order(size));
else
- /* DMA_TO_DEVICE to avoid memcpy in unmap_single */
+ /* DMA_TO_DEVICE to avoid memcpy in do_unmap_single */
do_unmap_single(hwdev, vaddr, size, DMA_TO_DEVICE);
}
EXPORT_SYMBOL(swiotlb_free_coherent);
@@ -682,7 +682,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
/*
* Oh well, have to allocate and map a bounce buffer.
*/
- map = map_single(dev, phys, size, dir);
+ map = do_map_single(dev, phys, size, dir);
if (!map) {
swiotlb_full(dev, size, dir, 1);
map = io_tlb_overflow_buffer;
@@ -759,7 +759,7 @@ swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr,
BUG_ON(dir == DMA_NONE);

if (is_swiotlb_buffer(paddr)) {
- sync_single(hwdev, phys_to_virt(paddr), size, dir, target);
+ do_sync_single(hwdev, phys_to_virt(paddr), size, dir, target);
return;
}

@@ -847,7 +847,7 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,

if (swiotlb_force ||
!dma_capable(hwdev, dev_addr, sg->length)) {
- void *map = map_single(hwdev, sg_phys(sg),
+ void *map = do_map_single(hwdev, sg_phys(sg),
sg->length, dir);
if (!map) {
/* Don't panic here, we expect map_sg users
--
1.6.2.5

2010-02-18 16:52:46

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 10/10] swiotlb: EXPORT_SYMBOL_GPL functions + variables that are defined in the header file.

Make the functions and variables that are now declared in the swiotlb.h
header file visible by the linker.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 674d025..ddf4a2d 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -56,19 +56,24 @@ int swiotlb_force;
* API.
*/
char *io_tlb_start, *io_tlb_end;
+EXPORT_SYMBOL_GPL(io_tlb_start);
+EXPORT_SYMBOL_GPL(io_tlb_end);

/*
* The number of IO TLB blocks (in groups of 64) betweeen io_tlb_start and
* io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
*/
unsigned long io_tlb_nslabs;
+EXPORT_SYMBOL_GPL(io_tlb_nslabs);

/*
* When the IOMMU overflows we return a fallback buffer. This sets the size.
*/
unsigned long io_tlb_overflow = 32*1024;
+EXPORT_SYMBOL_GPL(io_tlb_overflow);

void *io_tlb_overflow_buffer;
+EXPORT_SYMBOL_GPL(io_tlb_overflow_buffer);

/*
* This is a free list describing the number of free entries available from
@@ -204,6 +209,7 @@ swiotlb_init_early(size_t default_size, int verbose)
if (verbose)
swiotlb_print_info();
}
+EXPORT_SYMBOL_GPL(swiotlb_init_early);

void __init
swiotlb_init(int verbose)
@@ -337,6 +343,7 @@ int is_swiotlb_buffer(phys_addr_t paddr)
return paddr >= virt_to_phys(io_tlb_start) &&
paddr < virt_to_phys(io_tlb_end);
}
+EXPORT_SYMBOL_GPL(is_swiotlb_buffer);

/*
* Bounce: copy the swiotlb buffer back to the original dma location
@@ -378,6 +385,7 @@ void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
memcpy(phys_to_virt(phys), dma_addr, size);
}
}
+EXPORT_SYMBOL_GPL(swiotlb_bounce);

/*
* Allocates bounce buffer and returns its kernel virtual address.
@@ -484,6 +492,7 @@ found:

return dma_addr;
}
+EXPORT_SYMBOL_GPL(do_map_single);

/*
* dma_addr is the kernel virtual address of the bounce buffer to unmap.
@@ -528,6 +537,7 @@ do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
}
spin_unlock_irqrestore(&io_tlb_lock, flags);
}
+EXPORT_SYMBOL_GPL(do_unmap_single);

void
do_sync_single(struct device *hwdev, char *dma_addr, size_t size,
@@ -555,6 +565,7 @@ do_sync_single(struct device *hwdev, char *dma_addr, size_t size,
BUG();
}
}
+EXPORT_SYMBOL_GPL(do_sync_single);

void *
swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -646,6 +657,7 @@ swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
if (dir == DMA_TO_DEVICE)
panic("DMA: Random memory could be DMA read\n");
}
+EXPORT_SYMBOL_GPL(swiotlb_full);

/*
* Map a single buffer of the indicated size for DMA in streaming mode. The
--
1.6.2.5

2010-02-18 16:54:22

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 02/10] swiotlb: Make 'setup_io_tlb_npages' accept new 'swiotlb=' syntax.

The old syntax for 'swiotlb' is still in effect, and we extend it
now to include the overflow buffer size. The syntax is now:

swiotlb=[force,][nslabs=<pages>,][overflow=<size>] or more
commonly know as:

swiotlb=[force]
swiotlb=[nslabs=<pages>]
swiotlb=[overflow=<size>]

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
Documentation/x86/x86_64/boot-options.txt | 6 ++++-
lib/swiotlb.c | 36 +++++++++++++++++++++++++---
2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 29a6ff8..81f9b94 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -267,10 +267,14 @@ IOMMU (input/output memory management unit)

iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
implementation:
- swiotlb=<pages>[,force]
+ swiotlb=[npages=<pages>]
+ swiotlb=[force]
+ swiotlb=[overflow=<size>]
+
<pages> Prereserve that many 128K pages for the software IO
bounce buffering.
force Force all IO through the software TLB.
+ <size> Size in bytes of the overflow buffer.

Settings for the IBM Calgary hardware IOMMU currently found in IBM
pSeries and xSeries machines:
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index e6d9e32..0663879 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -102,7 +102,27 @@ static int late_alloc;
static int __init
setup_io_tlb_npages(char *str)
{
+ int get_value(const char *token, char *str, char **endp)
+ {
+ ssize_t len;
+ int val = 0;
+
+ len = strlen(token);
+ if (!strncmp(str, token, len)) {
+ str += len;
+ if (*str == '=')
+ ++str;
+ if (*str != '\0')
+ val = simple_strtoul(str, endp, 0);
+ }
+ *endp = str;
+ return val;
+ }
+
+ int val;
+
while (*str) {
+ /* The old syntax */
if (isdigit(*str)) {
io_tlb_nslabs = simple_strtoul(str, &str, 0);
/* avoid tail segment of size < IO_TLB_SEGSIZE */
@@ -110,14 +130,22 @@ setup_io_tlb_npages(char *str)
}
if (!strncmp(str, "force", 5))
swiotlb_force = 1;
- str += strcspn(str, ",");
- if (*str == ',')
- ++str;
+ /* The new syntax: swiotlb=nslabs=16384,overflow=32768,force */
+ val = get_value("nslabs", str, &str);
+ if (val)
+ io_tlb_nslabs = ALIGN(val, IO_TLB_SEGSIZE);
+
+ val = get_value("overflow", str, &str);
+ if (val)
+ io_tlb_overflow = val;
+ str = strpbrk(str, ",");
+ if (!str)
+ break;
+ str++; /* skip ',' */
}
return 1;
}
__setup("swiotlb=", setup_io_tlb_npages);
-/* make io_tlb_overflow tunable too? */

/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
--
1.6.2.5

2010-02-18 16:54:29

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 09/10] swiotlb: Make swiotlb bookkeeping functions visible in the header file.

We put the init, free, and functions dealing with the operations on the
SWIOTLB buffer at the top of the header. Also we export some of the variables
that are used by the dma_ops functions.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
include/linux/swiotlb.h | 31 ++++++++++++++++++++++++++++++-
lib/swiotlb.c | 26 +++++++++-----------------
2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 84e7a53..af66473 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -30,8 +30,37 @@ static inline void swiotlb_free(void) { }
#endif
extern void swiotlb_print_info(void);

+/* Internal book-keeping functions. Must be linked against the library
+ * to take advantage of them.*/
+#ifdef CONFIG_SWIOTLB
+/*
+ * Enumeration for sync targets
+ */
+enum dma_sync_target {
+ SYNC_FOR_CPU = 0,
+ SYNC_FOR_DEVICE = 1,
+};
+extern char *io_tlb_start;
+extern char *io_tlb_end;
+extern unsigned long io_tlb_nslabs;
+extern void *io_tlb_overflow_buffer;
+extern unsigned long io_tlb_overflow;
+extern int is_swiotlb_buffer(phys_addr_t paddr);
+extern void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
+ enum dma_data_direction dir);
+extern void *do_map_single(struct device *hwdev, phys_addr_t phys,
+ unsigned long start_dma_addr, size_t size, int dir);
+
+extern void do_unmap_single(struct device *hwdev, char *dma_addr, size_t size,
+ int dir);
+
+extern void do_sync_single(struct device *hwdev, char *dma_addr, size_t size,
+ int dir, int target);
+extern void swiotlb_full(struct device *dev, size_t size, int dir, int do_panic);
+extern void __init swiotlb_init_early(size_t default_size, int verbose);
+#endif

-/* IOMMU functions. */
+/* swiotlb.c: dma_ops functions. */
extern void
*swiotlb_alloc_coherent(struct device *hwdev, size_t size,
dma_addr_t *dma_handle, gfp_t flags);
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 80a2306..674d025 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -48,14 +48,6 @@
*/
#define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)

-/*
- * Enumeration for sync targets
- */
-enum dma_sync_target {
- SYNC_FOR_CPU = 0,
- SYNC_FOR_DEVICE = 1,
-};
-
int swiotlb_force;

/*
@@ -63,18 +55,18 @@ int swiotlb_force;
* do_sync_single_*, to see if the memory was in fact allocated by this
* API.
*/
-static char *io_tlb_start, *io_tlb_end;
+char *io_tlb_start, *io_tlb_end;

/*
* The number of IO TLB blocks (in groups of 64) betweeen io_tlb_start and
* io_tlb_end. This is command line adjustable via setup_io_tlb_npages.
*/
-static unsigned long io_tlb_nslabs;
+unsigned long io_tlb_nslabs;

/*
* When the IOMMU overflows we return a fallback buffer. This sets the size.
*/
-static unsigned long io_tlb_overflow = 32*1024;
+unsigned long io_tlb_overflow = 32*1024;

void *io_tlb_overflow_buffer;

@@ -340,7 +332,7 @@ void __init swiotlb_free(void)
}
}

-static int is_swiotlb_buffer(phys_addr_t paddr)
+int is_swiotlb_buffer(phys_addr_t paddr)
{
return paddr >= virt_to_phys(io_tlb_start) &&
paddr < virt_to_phys(io_tlb_end);
@@ -349,7 +341,7 @@ static int is_swiotlb_buffer(phys_addr_t paddr)
/*
* Bounce: copy the swiotlb buffer back to the original dma location
*/
-static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
+void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
enum dma_data_direction dir)
{
unsigned long pfn = PFN_DOWN(phys);
@@ -390,7 +382,7 @@ static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
/*
* Allocates bounce buffer and returns its kernel virtual address.
*/
-static void *
+void *
do_map_single(struct device *hwdev, phys_addr_t phys,
unsigned long start_dma_addr, size_t size, int dir)
{
@@ -496,7 +488,7 @@ found:
/*
* dma_addr is the kernel virtual address of the bounce buffer to unmap.
*/
-static void
+void
do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
{
unsigned long flags;
@@ -537,7 +529,7 @@ do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
spin_unlock_irqrestore(&io_tlb_lock, flags);
}

-static void
+void
do_sync_single(struct device *hwdev, char *dma_addr, size_t size,
int dir, int target)
{
@@ -632,7 +624,7 @@ swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
}
EXPORT_SYMBOL(swiotlb_free_coherent);

-static void
+void
swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
{
/*
--
1.6.2.5

2010-02-18 17:00:17

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 03/10] swiotlb: Normalize the swiotlb_init_* function's naming syntax.

The previous function names were misleading by including "with_default_size"
as the size was being passed in as argument. Change the functions names
to be clear on what they do.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 0663879..7b66fc3 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -174,7 +174,7 @@ void swiotlb_print_info(void)
* structures for the software IO TLB used to implement the DMA API.
*/
void __init
-swiotlb_init_with_default_size(size_t default_size, int verbose)
+swiotlb_init_early(size_t default_size, int verbose)
{
unsigned long i, bytes;

@@ -217,7 +217,7 @@ swiotlb_init_with_default_size(size_t default_size, int verbose)
void __init
swiotlb_init(int verbose)
{
- swiotlb_init_with_default_size(64 * (1<<20), verbose); /* default to 64MB */
+ swiotlb_init_early(64 * (1<<20), verbose); /* default to 64MB */
}

/*
@@ -226,7 +226,7 @@ swiotlb_init(int verbose)
* This should be just like above, but with some error catching.
*/
int
-swiotlb_late_init_with_default_size(size_t default_size)
+swiotlb_init_late(size_t default_size)
{
unsigned long i, bytes, req_nslabs = io_tlb_nslabs;
unsigned int order;
--
1.6.2.5

2010-02-18 17:02:16

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 06/10] swiotlb: do_map_single: abstract out swiotlb_virt_to_bus calls out.

We want to move that function out of do_map_single so that the caller
of this function does the virt->phys->bus address translation.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 9085eab..4ab3885 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -394,20 +394,19 @@ static void swiotlb_bounce(phys_addr_t phys, char *dma_addr, size_t size,
* Allocates bounce buffer and returns its kernel virtual address.
*/
static void *
-do_map_single(struct device *hwdev, phys_addr_t phys, size_t size, int dir)
+do_map_single(struct device *hwdev, phys_addr_t phys,
+ unsigned long start_dma_addr, size_t size, int dir)
{
unsigned long flags;
char *dma_addr;
unsigned int nslots, stride, index, wrap;
int i;
- unsigned long start_dma_addr;
unsigned long mask;
unsigned long offset_slots;
unsigned long max_slots;

mask = dma_get_seg_boundary(hwdev);
- start_dma_addr = swiotlb_virt_to_bus(hwdev, io_tlb_start) & mask;
-
+ start_dma_addr = start_dma_addr & mask;
offset_slots = ALIGN(start_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;

/*
@@ -574,6 +573,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
void *ret;
int order = get_order(size);
u64 dma_mask = DMA_BIT_MASK(32);
+ unsigned long start_dma_addr;

if (hwdev && hwdev->coherent_dma_mask)
dma_mask = hwdev->coherent_dma_mask;
@@ -592,7 +592,9 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
* to GFP_DMA memory; fall back on do_map_single(), which
* will grab memory from the lowest available address range.
*/
- ret = do_map_single(hwdev, 0, size, DMA_FROM_DEVICE);
+ start_dma_addr = swiotlb_virt_to_bus(hwdev, io_tlb_start);
+ ret = do_map_single(hwdev, 0, start_dma_addr, size,
+ DMA_FROM_DEVICE);
if (!ret)
return NULL;
}
@@ -607,7 +609,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
(unsigned long long)dma_mask,
(unsigned long long)dev_addr);

- /* DMA_TO_DEVICE to avoid memcpy in unmap_single */
+ /* DMA_TO_DEVICE to avoid memcpy in do_unmap_single */
do_unmap_single(hwdev, ret, size, DMA_TO_DEVICE);
return NULL;
}
@@ -666,6 +668,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
struct dma_attrs *attrs)
{
+ unsigned long start_dma_addr;
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dev_addr = phys_to_dma(dev, phys);
void *map;
@@ -682,7 +685,8 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
/*
* Oh well, have to allocate and map a bounce buffer.
*/
- map = do_map_single(dev, phys, size, dir);
+ start_dma_addr = swiotlb_virt_to_bus(dev, io_tlb_start);
+ map = do_map_single(dev, phys, start_dma_addr, size, dir);
if (!map) {
swiotlb_full(dev, size, dir, 1);
map = io_tlb_overflow_buffer;
@@ -836,11 +840,13 @@ int
swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
enum dma_data_direction dir, struct dma_attrs *attrs)
{
+ unsigned long start_dma_addr;
struct scatterlist *sg;
int i;

BUG_ON(dir == DMA_NONE);

+ start_dma_addr = swiotlb_virt_to_bus(hwdev, io_tlb_start);
for_each_sg(sgl, sg, nelems, i) {
phys_addr_t paddr = sg_phys(sg);
dma_addr_t dev_addr = phys_to_dma(hwdev, paddr);
@@ -848,7 +854,8 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
if (swiotlb_force ||
!dma_capable(hwdev, dev_addr, sg->length)) {
void *map = do_map_single(hwdev, sg_phys(sg),
- sg->length, dir);
+ start_dma_addr,
+ sg->length, dir);
if (!map) {
/* Don't panic here, we expect map_sg users
to do proper error handling. */
--
1.6.2.5

2010-02-18 17:06:55

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 08/10] swiotlb: Re-order the function declerations.

Move to the top the function declerations dealing with startup/shutdown
of SWIOTLB. This is in preperation of next set of patches which export
bookkeeping functions out of swiotlb.c. This will move all of them
at the top of the header file, while the dma_ops functions are the end
of the header file.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
include/linux/swiotlb.h | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index febedcf..84e7a53 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -23,7 +23,15 @@ extern int swiotlb_force;
#define IO_TLB_SHIFT 11

extern void swiotlb_init(int verbose);
+#ifdef CONFIG_SWIOTLB
+extern void __init swiotlb_free(void);
+#else
+static inline void swiotlb_free(void) { }
+#endif
+extern void swiotlb_print_info(void);
+

+/* IOMMU functions. */
extern void
*swiotlb_alloc_coherent(struct device *hwdev, size_t size,
dma_addr_t *dma_handle, gfp_t flags);
@@ -89,11 +97,4 @@ swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr);
extern int
swiotlb_dma_supported(struct device *hwdev, u64 mask);

-#ifdef CONFIG_SWIOTLB
-extern void __init swiotlb_free(void);
-#else
-static inline void swiotlb_free(void) { }
-#endif
-
-extern void swiotlb_print_info(void);
#endif /* __LINUX_SWIOTLB_H */
--
1.6.2.5

2010-02-18 17:11:58

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 07/10] swiotlb: Fix checkpatch warnings.

From: Konrad Rzeszutek <[email protected]>

I've fixed most of the checkpatch warnings except these three:

a). WARNING: consider using strict_strtoul in preference to simple_strtoul
115: FILE: swiotlb.c:115:
+ val = simple_strtoul(str, endp, 0);

b). WARNING: consider using strict_strtoul in preference to simple_strtoul
126: FILE: swiotlb.c:126:
+ io_tlb_nslabs = simple_strtoul(str, &str, 0);

c).WARNING: Use of volatile is usually wrong: see Documentation/volatile-considered-harmful.txt
151: FILE: swiotlb.c:151:
+ volatile void *address)

total: 0 errors, 3 warnings, 965 lines checked

As a) and b) are OK, we MUST use simple_strtoul. For c) the 'volatile-consider*'
document outlines that it is OK for pointers to data structrues in coherent memory
which this certainly could be, hence not fixing that.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 38 +++++++++++++++++++-------------------
1 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 4ab3885..80a2306 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -29,16 +29,15 @@
#include <linux/ctype.h>
#include <linux/highmem.h>

-#include <asm/io.h>
+#include <linux/io.h>
#include <asm/dma.h>
-#include <asm/scatterlist.h>
+#include <linux/scatterlist.h>

#include <linux/init.h>
#include <linux/bootmem.h>
#include <linux/iommu-helper.h>

-#define OFFSET(val,align) ((unsigned long) \
- ( (val) & ( (align) - 1)))
+#define OFFSET(val, align) ((unsigned long) ((val) & ((align) - 1)))

#define SLABS_PER_PAGE (1 << (PAGE_SHIFT - IO_TLB_SHIFT))

@@ -200,7 +199,7 @@ swiotlb_init_early(size_t default_size, int verbose)
*/
io_tlb_list = alloc_bootmem(io_tlb_nslabs * sizeof(int));
for (i = 0; i < io_tlb_nslabs; i++)
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
io_tlb_index = 0;
io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(phys_addr_t));

@@ -269,18 +268,16 @@ swiotlb_init_late(size_t default_size)
* between io_tlb_start and io_tlb_end.
*/
io_tlb_list = (unsigned int *)__get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs * sizeof(int)));
+ get_order(io_tlb_nslabs * sizeof(int)));
if (!io_tlb_list)
goto cleanup2;

for (i = 0; i < io_tlb_nslabs; i++)
- io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
+ io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
io_tlb_index = 0;

- io_tlb_orig_addr = (phys_addr_t *)
- __get_free_pages(GFP_KERNEL,
- get_order(io_tlb_nslabs *
- sizeof(phys_addr_t)));
+ io_tlb_orig_addr = (phys_addr_t *) __get_free_pages(GFP_KERNEL,
+ get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
if (!io_tlb_orig_addr)
goto cleanup3;

@@ -290,7 +287,7 @@ swiotlb_init_late(size_t default_size)
* Get the overflow emergency buffer
*/
io_tlb_overflow_buffer = (void *)__get_free_pages(GFP_DMA,
- get_order(io_tlb_overflow));
+ get_order(io_tlb_overflow));
if (!io_tlb_overflow_buffer)
goto cleanup4;

@@ -305,8 +302,8 @@ cleanup4:
get_order(io_tlb_nslabs * sizeof(phys_addr_t)));
io_tlb_orig_addr = NULL;
cleanup3:
- free_pages((unsigned long)io_tlb_list, get_order(io_tlb_nslabs *
- sizeof(int)));
+ free_pages((unsigned long)io_tlb_list,
+ get_order(io_tlb_nslabs * sizeof(int)));
io_tlb_list = NULL;
cleanup2:
io_tlb_end = NULL;
@@ -410,8 +407,8 @@ do_map_single(struct device *hwdev, phys_addr_t phys,
offset_slots = ALIGN(start_dma_addr, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;

/*
- * Carefully handle integer overflow which can occur when mask == ~0UL.
- */
+ * Carefully handle integer overflow which can occur when mask == ~0UL.
+ */
max_slots = mask + 1
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
@@ -458,7 +455,8 @@ do_map_single(struct device *hwdev, phys_addr_t phys,

for (i = index; i < (int) (index + nslots); i++)
io_tlb_list[i] = 0;
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE)
+ != IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
io_tlb_list[i] = ++count;
dma_addr = io_tlb_start + (index << IO_TLB_SHIFT);

@@ -532,7 +530,8 @@ do_unmap_single(struct device *hwdev, char *dma_addr, size_t size, int dir)
* Step 2: merge the returned slots with the preceding slots,
* if available (non zero)
*/
- for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
+ for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) !=
+ IO_TLB_SEGSIZE - 1) && io_tlb_list[i]; i--)
io_tlb_list[i] = ++count;
}
spin_unlock_irqrestore(&io_tlb_lock, flags);
@@ -888,7 +887,8 @@ EXPORT_SYMBOL(swiotlb_map_sg);
*/
void
swiotlb_unmap_sg_attrs(struct device *hwdev, struct scatterlist *sgl,
- int nelems, enum dma_data_direction dir, struct dma_attrs *attrs)
+ int nelems, enum dma_data_direction dir,
+ struct dma_attrs *attrs)
{
struct scatterlist *sg;
int i;
--
1.6.2.5

2010-02-18 17:12:15

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: [PATCH 04/10] swiotlb: Make printk's use same prefix and include dev_err when possible.

Various printk's had the prefix 'DMA' in them, but not all of them.
This makes all of the printk's have the 'DMA' in them and for error
cases use the 'dev_err' macro.

Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
---
lib/swiotlb.c | 20 ++++++++++----------
1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 7b66fc3..0eb64d7 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -162,9 +162,9 @@ void swiotlb_print_info(void)
pstart = virt_to_phys(io_tlb_start);
pend = virt_to_phys(io_tlb_end);

- printk(KERN_INFO "Placing %luMB software IO TLB between %p - %p\n",
+ printk(KERN_INFO "DMA: Placing %luMB software IO TLB between %p - %p\n",
bytes >> 20, io_tlb_start, io_tlb_end);
- printk(KERN_INFO "software IO TLB at phys %#llx - %#llx\n",
+ printk(KERN_INFO "DMA: software IO TLB at phys %#llx - %#llx\n",
(unsigned long long)pstart,
(unsigned long long)pend);
}
@@ -190,7 +190,7 @@ swiotlb_init_early(size_t default_size, int verbose)
*/
io_tlb_start = alloc_bootmem_low_pages(bytes);
if (!io_tlb_start)
- panic("Cannot allocate SWIOTLB buffer");
+ panic("DMA: Cannot allocate SWIOTLB buffer");
io_tlb_end = io_tlb_start + bytes;

/*
@@ -209,7 +209,7 @@ swiotlb_init_early(size_t default_size, int verbose)
*/
io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
if (!io_tlb_overflow_buffer)
- panic("Cannot allocate SWIOTLB overflow buffer!\n");
+ panic("DMA: Cannot allocate SWIOTLB overflow buffer!\n");
if (verbose)
swiotlb_print_info();
}
@@ -255,8 +255,8 @@ swiotlb_init_late(size_t default_size)
goto cleanup1;

if (order != get_order(bytes)) {
- printk(KERN_WARNING "Warning: only able to allocate %ld MB "
- "for software IO TLB\n", (PAGE_SIZE << order) >> 20);
+ printk(KERN_WARNING "DMA: Warning: only able to allocate %ld MB"
+ " for software IO TLB\n", (PAGE_SIZE << order) >> 20);
io_tlb_nslabs = SLABS_PER_PAGE << order;
bytes = io_tlb_nslabs << IO_TLB_SHIFT;
}
@@ -602,7 +602,8 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,

/* Confirm address can be DMA'd by device */
if (dev_addr + size - 1 > dma_mask) {
- printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
+ dev_err(hwdev, "DMA: hwdev DMA mask = 0x%016Lx, " \
+ "dev_addr = 0x%016Lx\n",
(unsigned long long)dma_mask,
(unsigned long long)dev_addr);

@@ -640,8 +641,7 @@ swiotlb_full(struct device *dev, size_t size, int dir, int do_panic)
* When the mapping is small enough return a static buffer to limit
* the damage, or panic when the transfer is too big.
*/
- printk(KERN_ERR "DMA: Out of SW-IOMMU space for %zu bytes at "
- "device %s\n", size, dev ? dev_name(dev) : "?");
+ dev_err(dev, "DMA: Out of SW-IOMMU space for %zu bytes.", size);

if (size <= io_tlb_overflow || !do_panic)
return;
@@ -694,7 +694,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
* Ensure that the address returned is DMA'ble
*/
if (!dma_capable(dev, dev_addr, size))
- panic("map_single: bounce buffer is not DMA'ble");
+ panic("DMA: swiotlb_map_single: bounce buffer is not DMA'ble");

return dev_addr;
}
--
1.6.2.5

2010-02-25 06:54:53

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 02/10] swiotlb: Make 'setup_io_tlb_npages' accept new 'swiotlb=' syntax.

On Thu, 18 Feb 2010 11:26:58 -0500
Konrad Rzeszutek Wilk <[email protected]> wrote:

> The old syntax for 'swiotlb' is still in effect, and we extend it
> now to include the overflow buffer size. The syntax is now:
>
> swiotlb=[force,][nslabs=<pages>,][overflow=<size>] or more
> commonly know as:
>
> swiotlb=[force]
> swiotlb=[nslabs=<pages>]
> swiotlb=[overflow=<size>]
>
> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> ---
> Documentation/x86/x86_64/boot-options.txt | 6 ++++-
> lib/swiotlb.c | 36 +++++++++++++++++++++++++---
> 2 files changed, 37 insertions(+), 5 deletions(-)

'overflow' is a workaround for broken drivers (that ignores DMA
mapping errors). We have fixed such drivers and other hardware IOMMU
implementations don't have such workaround. So please don't extend
it. We should remove it instead.

2010-02-25 13:07:04

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH 02/10] swiotlb: Make 'setup_io_tlb_npages' accept new 'swiotlb=' syntax.

On Thu, Feb 25, 2010 at 03:53:20PM +0900, FUJITA Tomonori wrote:
> On Thu, 18 Feb 2010 11:26:58 -0500
> Konrad Rzeszutek Wilk <[email protected]> wrote:
>
> > The old syntax for 'swiotlb' is still in effect, and we extend it
> > now to include the overflow buffer size. The syntax is now:
> >
> > swiotlb=[force,][nslabs=<pages>,][overflow=<size>] or more
> > commonly know as:
> >
> > swiotlb=[force]
> > swiotlb=[nslabs=<pages>]
> > swiotlb=[overflow=<size>]
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > ---
> > Documentation/x86/x86_64/boot-options.txt | 6 ++++-
> > lib/swiotlb.c | 36 +++++++++++++++++++++++++---
> > 2 files changed, 37 insertions(+), 5 deletions(-)
>
> 'overflow' is a workaround for broken drivers (that ignores DMA
> mapping errors). We have fixed such drivers and other hardware IOMMU
> implementations don't have such workaround. So please don't extend
> it. We should remove it instead.

Will do. Do you remember which drivers (or e-mail threads) exhibited this
behavior so that I can do a regression test to make absolutely certain
that removing the overflow functionality won't blow something up?

2010-02-25 23:53:05

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH 02/10] swiotlb: Make 'setup_io_tlb_npages' accept new 'swiotlb=' syntax.

On Thu, 25 Feb 2010 07:49:17 -0500
Konrad Rzeszutek Wilk <[email protected]> wrote:

> On Thu, Feb 25, 2010 at 03:53:20PM +0900, FUJITA Tomonori wrote:
> > On Thu, 18 Feb 2010 11:26:58 -0500
> > Konrad Rzeszutek Wilk <[email protected]> wrote:
> >
> > > The old syntax for 'swiotlb' is still in effect, and we extend it
> > > now to include the overflow buffer size. The syntax is now:
> > >
> > > swiotlb=[force,][nslabs=<pages>,][overflow=<size>] or more
> > > commonly know as:
> > >
> > > swiotlb=[force]
> > > swiotlb=[nslabs=<pages>]
> > > swiotlb=[overflow=<size>]
> > >
> > > Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
> > > ---
> > > Documentation/x86/x86_64/boot-options.txt | 6 ++++-
> > > lib/swiotlb.c | 36 +++++++++++++++++++++++++---
> > > 2 files changed, 37 insertions(+), 5 deletions(-)
> >
> > 'overflow' is a workaround for broken drivers (that ignores DMA
> > mapping errors). We have fixed such drivers and other hardware IOMMU
> > implementations don't have such workaround. So please don't extend
> > it. We should remove it instead.
>
> Will do. Do you remember which drivers (or e-mail threads) exhibited this
> behavior so that I can do a regression test to make absolutely certain
> that removing the overflow functionality won't blow something up?

You can easily find such drivers by doing grep
{dma|pci}_map_{single|Sig} and seeing if {dma|pci}_mapping_error is
properly used or not.

There are some yet but the majority of such drivers are unlikely to be
used with IOMMU (e.g. really old drivers).

'overflow' works in only some conditions; that is, broken drivers
could lead to data corruption even with 'overflow'
functionality.

I've not reviewed that patchset carefully but my other comments are:

- exporting too generic naming such as 'do_map_single' is not a good
idea.

- please don't mix up new feature patches with 'fixing the printk
format, checkpatch warnings' (which I'm not interested in).