2005-11-06 13:11:42

by Muli Ben-Yehuda

Subject: [PATCH] x86-64: dma_ops for DMA mapping - K3

Hi Andi,

Here's the latest version of the dma_ops patch, updated to address
your comments. The patch is against Linus's tree as of a few minutes
ago and applies cleanly to 2.6.14-git9. Tested on AMD64 with gart,
swiotlb, nommu and iommu=off. There are still a few cleanups left, but
I'd appreciate it if this could see wider testing at this
stage. Please apply...

Summary of changes:

This patch cleans up x86_64's DMA mapping dispatching code. Right now
we have three possible IOMMU types: AGP GART, swiotlb and nommu, and
in the future we will also have Xen's x86_64 swiotlb and other HW
IOMMUs for x86_64. In order to support all of them cleanly, this
patch:

- introduces a struct dma_mapping_ops with function pointers for each
of the DMA mapping operations of gart (AMD HW IOMMU), swiotlb
(software IOMMU) and nommu (no IOMMU).

- gets rid of:

	if (swiotlb)
		return swiotlb_xxx();

in various places in favor of:

	if (unlikely(dma_ops) && dma_ops->xxx)
		return dma_ops->xxx();

in dma-mapping.h.

- in order to keep the fast path fast, if no dma_ops is set we fall
back directly to the inline nommu operations.

- PCI_DMA_BUS_IS_PHYS is now derived from whether dma_ops is set
(gart enabled or swiotlb: BUS_IS_PHYS=0) or not set (nommu, or gart
disabled via iommu=off: BUS_IS_PHYS=1).
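
To see the dispatch in isolation, here is a toy userspace model of the
pattern (illustration only -- it is not part of the patch, and the
cut-down struct and the gart_ops / buf names are stand-ins for the
real thing):

	/* cc -o dispatch dispatch.c && ./dispatch */
	#include <stdio.h>
	#include <stddef.h>

	typedef unsigned long dma_addr_t;

	struct dma_mapping_ops {
		dma_addr_t (*map_single)(void *ptr, size_t size);
	};

	/* NULL means "no IOMMU"; the fast path never dereferences it */
	static struct dma_mapping_ops *dma_ops;

	static dma_addr_t nommu_map_single(void *ptr, size_t size)
	{
		return (dma_addr_t)ptr;	/* bus address == "physical" */
	}

	static dma_addr_t gart_map_single(void *ptr, size_t size)
	{
		return 0x1000;		/* pretend-remapped address */
	}

	static struct dma_mapping_ops gart_ops = {
		.map_single = gart_map_single,
	};

	static dma_addr_t dma_map_single(void *ptr, size_t size)
	{
		if (dma_ops && dma_ops->map_single)
			return dma_ops->map_single(ptr, size);
		return nommu_map_single(ptr, size);
	}

	int main(void)
	{
		char buf[16];

		printf("nommu: 0x%lx\n", dma_map_single(buf, sizeof buf));
		dma_ops = &gart_ops;	/* what pci_iommu_init() does */
		printf("gart:  0x%lx\n", dma_map_single(buf, sizeof buf));
		return 0;
	}

The real header does the same with the full set of operations and
wraps the NULL check in unlikely(), since dma_ops is NULL on most
machines.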

Signed-off-by: Muli Ben-Yehuda <[email protected]>

---

 arch/x86_64/Kconfig                |    9 -
 arch/x86_64/kernel/Makefile        |    4
 arch/x86_64/kernel/pci-dma.c       |   60 --------
 arch/x86_64/kernel/pci-gart.c      |  134 +++++++++++--------
 arch/x86_64/kernel/pci-nommu.c     |   83 ++++++++---
 arch/x86_64/kernel/setup.c         |   28 ++++
 arch/x86_64/mm/init.c              |   13 +
 include/asm-x86_64/dma-mapping.h   |  257 ++++++++++++++++++++++++++-----------
 include/asm-x86_64/nommu-mapping.h |   62 ++++++++
 include/asm-x86_64/pci.h           |   11 -
 include/asm-x86_64/swiotlb.h       |    7 -
 11 files changed, 437 insertions(+), 231 deletions(-)

diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/Kconfig hg/arch/x86_64/Kconfig
--- vanilla/arch/x86_64/Kconfig 2005-11-06 11:12:55.000000000 +0200
+++ hg/arch/x86_64/Kconfig 2005-11-06 11:14:28.000000000 +0200
@@ -348,15 +348,6 @@ config SWIOTLB
depends on GART_IOMMU
default y

-config DUMMY_IOMMU
- bool
- depends on !GART_IOMMU && !SWIOTLB
- default y
- help
- Don't use IOMMU code. This will cause problems when you have more than 4GB
- of memory and any 32-bit devices. Don't turn on unless you know what you
- are doing.
-
config X86_MCE
bool "Machine check support" if EMBEDDED
default y
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/kernel/Makefile hg/arch/x86_64/kernel/Makefile
--- vanilla/arch/x86_64/kernel/Makefile 2005-11-03 08:38:45.000000000 +0200
+++ hg/arch/x86_64/kernel/Makefile 2005-11-05 10:34:46.000000000 +0200
@@ -7,7 +7,8 @@ EXTRA_AFLAGS := -traditional
obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_x86_64.o \
x8664_ksyms.o i387.o syscall.o vsyscall.o \
- setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o
+ setup64.o bootflag.o e820.o reboot.o quirks.o i8237.o \
+ pci-nommu.o

obj-$(CONFIG_X86_MCE) += mce.o
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel.o
@@ -26,7 +27,6 @@ obj-$(CONFIG_SOFTWARE_SUSPEND) += suspen
obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-$(CONFIG_GART_IOMMU) += pci-gart.o aperture.o
-obj-$(CONFIG_DUMMY_IOMMU) += pci-nommu.o pci-dma.o
obj-$(CONFIG_KPROBES) += kprobes.o
obj-$(CONFIG_X86_PM_TIMER) += pmtimer.o

diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/kernel/pci-dma.c hg/arch/x86_64/kernel/pci-dma.c
--- vanilla/arch/x86_64/kernel/pci-dma.c 2005-09-08 14:06:40.000000000 +0300
+++ hg/arch/x86_64/kernel/pci-dma.c 1970-01-01 02:00:00.000000000 +0200
@@ -1,60 +0,0 @@
-/*
- * Dynamic DMA mapping support.
- */
-
-#include <linux/types.h>
-#include <linux/mm.h>
-#include <linux/string.h>
-#include <linux/pci.h>
-#include <linux/module.h>
-#include <asm/io.h>
-
-/* Map a set of buffers described by scatterlist in streaming
- * mode for DMA. This is the scatter-gather version of the
- * above pci_map_single interface. Here the scatter gather list
- * elements are each tagged with the appropriate dma address
- * and length. They are obtained via sg_dma_{address,length}(SG).
- *
- * NOTE: An implementation may be able to use a smaller number of
- * DMA address/length pairs than there are SG table elements.
- * (for example via virtual mapping capabilities)
- * The routine returns the number of addr/length pairs actually
- * used, at most nents.
- *
- * Device ownership issues as mentioned above for pci_map_single are
- * the same here.
- */
-int dma_map_sg(struct device *hwdev, struct scatterlist *sg,
- int nents, int direction)
-{
- int i;
-
- BUG_ON(direction == DMA_NONE);
- for (i = 0; i < nents; i++ ) {
- struct scatterlist *s = &sg[i];
- BUG_ON(!s->page);
- s->dma_address = virt_to_bus(page_address(s->page) +s->offset);
- s->dma_length = s->length;
- }
- return nents;
-}
-
-EXPORT_SYMBOL(dma_map_sg);
-
-/* Unmap a set of streaming mode DMA translations.
- * Again, cpu read rules concerning calls here are the same as for
- * pci_unmap_single() above.
- */
-void dma_unmap_sg(struct device *dev, struct scatterlist *sg,
- int nents, int dir)
-{
- int i;
- for (i = 0; i < nents; i++) {
- struct scatterlist *s = &sg[i];
- BUG_ON(s->page == NULL);
- BUG_ON(s->dma_address == 0);
- dma_unmap_single(dev, s->dma_address, s->dma_length, dir);
- }
-}
-
-EXPORT_SYMBOL(dma_unmap_sg);
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/kernel/pci-gart.c hg/arch/x86_64/kernel/pci-gart.c
--- vanilla/arch/x86_64/kernel/pci-gart.c 2005-10-30 11:52:08.000000000 +0200
+++ hg/arch/x86_64/kernel/pci-gart.c 2005-11-06 11:50:32.000000000 +0200
@@ -31,8 +31,6 @@
#include <asm/cacheflush.h>
#include <asm/kdebug.h>

-dma_addr_t bad_dma_address;
-
unsigned long iommu_bus_base; /* GART remapping area (physical) */
static unsigned long iommu_size; /* size of remapping area bytes */
static unsigned long iommu_pages; /* .. and in pages */
@@ -40,7 +38,6 @@ static unsigned long iommu_pages; /* ..
u32 *iommu_gatt_base; /* Remapping table */

int no_iommu;
-static int no_agp;
#ifdef CONFIG_IOMMU_DEBUG
int panic_on_overflow = 1;
int force_iommu = 1;
@@ -48,8 +45,8 @@ int force_iommu = 1;
int panic_on_overflow = 0;
int force_iommu = 0;
#endif
-int iommu_merge = 1;
-int iommu_sac_force = 0;
+
+extern int iommu_merge;

/* If this is disabled the IOMMU will use an optimized flushing strategy
of only flushing when an mapping is reused. With it true the GART is flushed
@@ -58,10 +55,6 @@ int iommu_sac_force = 0;
also seen with Qlogic at least). */
int iommu_fullflush = 1;

-/* This tells the BIO block layer to assume merging. Default to off
- because we cannot guarantee merging later. */
-int iommu_bio_merge = 0;
-
#define MAX_NB 8

/* Allocation bitmap for the remapping area */
@@ -113,6 +106,50 @@ static struct device fallback_dev = {
.dma_mask = &fallback_dev.coherent_dma_mask,
};

+static void
+gart_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle);
+
+static dma_addr_t
+gart_map_single(struct device *hwdev, void *ptr, size_t size,
+ int direction);
+
+static void
+gart_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
+ int direction);
+
+static int
+gart_map_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+
+static void
+gart_unmap_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+
+/* these two are temporarily used by swiotlb */
+void*
+gart_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp);
+
+int gart_dma_supported(struct device *hwdev, u64 mask);
+
+static struct dma_mapping_ops gart_dma_ops = {
+ .mapping_error = NULL,
+ .alloc_coherent = gart_alloc_coherent,
+ .free_coherent = gart_free_coherent,
+ .map_single = gart_map_single,
+ .unmap_single = gart_unmap_single,
+ .sync_single_for_cpu = NULL,
+ .sync_single_for_device = NULL,
+ .sync_single_range_for_cpu = NULL,
+ .sync_single_range_for_device = NULL,
+ .sync_sg_for_cpu = NULL,
+ .sync_sg_for_device = NULL,
+ .map_sg = gart_map_sg,
+ .unmap_sg = gart_unmap_sg,
+ .dma_supported = gart_dma_supported,
+};
+
static unsigned long alloc_iommu(int size)
{
unsigned long offset, flags;
@@ -203,8 +240,8 @@ static void *dma_alloc_pages(struct devi
* Allocate memory for a coherent mapping.
*/
void *
-dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
- gfp_t gfp)
+gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp)
{
void *memory;
unsigned long dma_mask = 0;
@@ -267,7 +304,7 @@ dma_alloc_coherent(struct device *dev, s

error:
if (panic_on_overflow)
- panic("dma_alloc_coherent: IOMMU overflow by %lu bytes\n", size);
+ panic("gart_alloc_coherent: IOMMU overflow by %lu bytes\n", size);
free_pages((unsigned long)memory, get_order(size));
return NULL;
}
@@ -276,15 +313,10 @@ error:
* Unmap coherent memory.
* The caller must ensure that the device has finished accessing the mapping.
*/
-void dma_free_coherent(struct device *dev, size_t size,
+void gart_free_coherent(struct device *dev, size_t size,
void *vaddr, dma_addr_t bus)
{
- if (swiotlb) {
- swiotlb_free_coherent(dev, size, vaddr, bus);
- return;
- }
-
- dma_unmap_single(dev, bus, size, 0);
+ gart_unmap_single(dev, bus, size, 0);
free_pages((unsigned long)vaddr, get_order(size));
}

@@ -403,14 +435,12 @@ static dma_addr_t dma_map_area(struct de
}

/* Map a single area into the IOMMU */
-dma_addr_t dma_map_single(struct device *dev, void *addr, size_t size, int dir)
+dma_addr_t gart_map_single(struct device *dev, void *addr, size_t size, int dir)
{
unsigned long phys_mem, bus;

BUG_ON(dir == DMA_NONE);

- if (swiotlb)
- return swiotlb_map_single(dev,addr,size,dir);
if (!dev)
dev = &fallback_dev;

@@ -440,7 +470,7 @@ static int dma_map_sg_nonforce(struct de
addr = dma_map_area(dev, addr, s->length, dir, 0);
if (addr == bad_dma_address) {
if (i > 0)
- dma_unmap_sg(dev, sg, i, dir);
+ gart_unmap_sg(dev, sg, i, dir);
nents = 0;
sg[0].dma_length = 0;
break;
@@ -509,7 +539,7 @@ static inline int dma_map_cont(struct sc
* DMA map all entries in a scatterlist.
* Merge chunks that have page aligned sizes into a continuous mapping.
*/
-int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir)
+int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, int dir)
{
int i;
int out;
@@ -521,8 +551,6 @@ int dma_map_sg(struct device *dev, struc
if (nents == 0)
return 0;

- if (swiotlb)
- return swiotlb_map_sg(dev,sg,nents,dir);
if (!dev)
dev = &fallback_dev;

@@ -565,7 +593,7 @@ int dma_map_sg(struct device *dev, struc

error:
flush_gart(NULL);
- dma_unmap_sg(dev, sg, nents, dir);
+ gart_unmap_sg(dev, sg, nents, dir);
/* When it was forced try again unforced */
if (force_iommu)
return dma_map_sg_nonforce(dev, sg, nents, dir);
@@ -580,18 +608,13 @@ error:
/*
* Free a DMA mapping.
*/
-void dma_unmap_single(struct device *dev, dma_addr_t dma_addr,
+void gart_unmap_single(struct device *dev, dma_addr_t dma_addr,
size_t size, int direction)
{
unsigned long iommu_page;
int npages;
int i;

- if (swiotlb) {
- swiotlb_unmap_single(dev,dma_addr,size,direction);
- return;
- }
-
if (dma_addr < iommu_bus_base + EMERGENCY_PAGES*PAGE_SIZE ||
dma_addr >= iommu_bus_base + iommu_size)
return;
@@ -607,13 +630,10 @@ void dma_unmap_single(struct device *dev
/*
* Wrapper for pci_unmap_single working with scatterlists.
*/
-void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, int dir)
+void gart_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, int dir)
{
int i;
- if (swiotlb) {
- swiotlb_unmap_sg(dev,sg,nents,dir);
- return;
- }
+
for (i = 0; i < nents; i++) {
struct scatterlist *s = &sg[i];
if (!s->dma_length || !s->length)
@@ -622,7 +642,7 @@ void dma_unmap_sg(struct device *dev, st
}
}

-int dma_supported(struct device *dev, u64 mask)
+int gart_dma_supported(struct device *dev, u64 mask)
{
/* Copied from i386. Doesn't make much sense, because it will
only work for pci_alloc_coherent.
@@ -648,24 +668,17 @@ int dma_supported(struct device *dev, u6
return 1;
}

-int dma_get_cache_alignment(void)
-{
- return boot_cpu_data.x86_clflush_size;
-}
-
-EXPORT_SYMBOL(dma_unmap_sg);
-EXPORT_SYMBOL(dma_map_sg);
-EXPORT_SYMBOL(dma_map_single);
-EXPORT_SYMBOL(dma_unmap_single);
-EXPORT_SYMBOL(dma_supported);
+EXPORT_SYMBOL(gart_unmap_sg);
+EXPORT_SYMBOL(gart_map_sg);
+EXPORT_SYMBOL(gart_map_single);
+EXPORT_SYMBOL(gart_unmap_single);
+EXPORT_SYMBOL(gart_dma_supported);
EXPORT_SYMBOL(no_iommu);
EXPORT_SYMBOL(force_iommu);
-EXPORT_SYMBOL(bad_dma_address);
-EXPORT_SYMBOL(iommu_bio_merge);
-EXPORT_SYMBOL(iommu_sac_force);
-EXPORT_SYMBOL(dma_get_cache_alignment);
-EXPORT_SYMBOL(dma_alloc_coherent);
-EXPORT_SYMBOL(dma_free_coherent);
+EXPORT_SYMBOL(gart_alloc_coherent);
+EXPORT_SYMBOL(gart_free_coherent);
+
+static int no_agp;

static __init unsigned long check_iommu_size(unsigned long aper, u64 aper_size)
{
@@ -803,6 +816,7 @@ static int __init pci_iommu_init(void)
(no_agp && init_k8_gatt(&info) < 0)) {
printk(KERN_INFO "PCI-DMA: Disabling IOMMU.\n");
no_iommu = 1;
+ dma_ops = NULL;
return -1;
}

@@ -879,6 +893,10 @@ static int __init pci_iommu_init(void)

flush_gart(NULL);

+ printk(KERN_DEBUG "%s: setting dma_ops to gart_dma_ops(%p)\n",
+ __func__, &gart_dma_ops);
+ dma_ops = &gart_dma_ops;
+
return 0;
}

@@ -910,11 +928,15 @@ __init int iommu_setup(char *p)
{
int arg;

+ iommu_merge = 1;
+
while (*p) {
if (!strncmp(p,"noagp",5))
no_agp = 1;
- if (!strncmp(p,"off",3))
+ if (!strncmp(p,"off",3)) {
no_iommu = 1;
+ dma_ops = NULL;
+ }
if (!strncmp(p,"force",5)) {
force_iommu = 1;
iommu_aperture_allowed = 1;
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/kernel/pci-nommu.c hg/arch/x86_64/kernel/pci-nommu.c
--- vanilla/arch/x86_64/kernel/pci-nommu.c 2005-10-30 11:52:08.000000000 +0200
+++ hg/arch/x86_64/kernel/pci-nommu.c 2005-11-05 10:45:22.000000000 +0200
@@ -7,12 +7,15 @@
#include <asm/proto.h>
#include <asm/processor.h>

+/* these are defined here because pci-nommu.c is always compiled in */
int iommu_merge = 0;
EXPORT_SYMBOL(iommu_merge);

dma_addr_t bad_dma_address;
EXPORT_SYMBOL(bad_dma_address);

+/* This tells the BIO block layer to assume merging. Default to off
+ because we cannot guarantee merging later. */
int iommu_bio_merge = 0;
EXPORT_SYMBOL(iommu_bio_merge);

@@ -23,8 +26,8 @@ EXPORT_SYMBOL(iommu_sac_force);
* Dummy IO MMU functions
*/

-void *dma_alloc_coherent(struct device *hwdev, size_t size,
- dma_addr_t *dma_handle, gfp_t gfp)
+void *nommu_alloc_coherent(struct device *hwdev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp)
{
void *ret;
u64 mask;
@@ -50,45 +53,77 @@ void *dma_alloc_coherent(struct device *
memset(ret, 0, size);
return ret;
}
-EXPORT_SYMBOL(dma_alloc_coherent);
+EXPORT_SYMBOL(nommu_alloc_coherent);

-void dma_free_coherent(struct device *hwdev, size_t size,
+void nommu_free_coherent(struct device *hwdev, size_t size,
void *vaddr, dma_addr_t dma_handle)
{
free_pages((unsigned long)vaddr, get_order(size));
}
-EXPORT_SYMBOL(dma_free_coherent);
+EXPORT_SYMBOL(nommu_free_coherent);

-int dma_supported(struct device *hwdev, u64 mask)
+/* Map a set of buffers described by scatterlist in streaming
+ * mode for DMA. This is the scatter-gather version of the
+ * above pci_map_single interface. Here the scatter gather list
+ * elements are each tagged with the appropriate dma address
+ * and length. They are obtained via sg_dma_{address,length}(SG).
+ *
+ * NOTE: An implementation may be able to use a smaller number of
+ * DMA address/length pairs than there are SG table elements.
+ * (for example via virtual mapping capabilities)
+ * The routine returns the number of addr/length pairs actually
+ * used, at most nents.
+ *
+ * Device ownership issues as mentioned above for pci_map_single are
+ * the same here.
+ */
+int nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction)
{
- /*
- * we fall back to GFP_DMA when the mask isn't all 1s,
- * so we can't guarantee allocations that must be
- * within a tighter range than GFP_DMA..
- * RED-PEN this won't work for pci_map_single. Caller has to
- * use GFP_DMA in the first place.
- */
- if (mask < 0x00ffffff)
- return 0;
+ int i;

- return 1;
-}
-EXPORT_SYMBOL(dma_supported);
+ BUG_ON(direction == DMA_NONE);
+ for (i = 0; i < nents; i++ ) {
+ struct scatterlist *s = &sg[i];
+ BUG_ON(!s->page);
+ s->dma_address = virt_to_bus(page_address(s->page) +s->offset);
+ s->dma_length = s->length;
+ }
+ return nents;
+}
+EXPORT_SYMBOL(nommu_map_sg);

-int dma_get_cache_alignment(void)
+/* Unmap a set of streaming mode DMA translations.
+ * Again, cpu read rules concerning calls here are the same as for
+ * pci_unmap_single() above.
+ */
+void nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
+ int nents, int dir)
{
- return boot_cpu_data.x86_clflush_size;
+ int i;
+ for (i = 0; i < nents; i++) {
+ struct scatterlist *s = &sg[i];
+ BUG_ON(s->page == NULL);
+ BUG_ON(s->dma_address == 0);
+ dma_unmap_single(dev, s->dma_address, s->dma_length, dir);
+ }
}
-EXPORT_SYMBOL(dma_get_cache_alignment);
+EXPORT_SYMBOL(nommu_unmap_sg);

-static int __init check_ram(void)
+static void check_ram(void)
{
if (end_pfn >= 0xffffffff>>PAGE_SHIFT) {
printk(
KERN_ERR "WARNING more than 4GB of memory but IOMMU not compiled in.\n"
KERN_ERR "WARNING 32bit PCI may malfunction.\n");
}
- return 0;
}
-__initcall(check_ram);

+static int __init nommu_init(void)
+{
+ check_ram();
+
+ return 0;
+}
+
+__initcall(nommu_init);
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/kernel/setup.c hg/arch/x86_64/kernel/setup.c
--- vanilla/arch/x86_64/kernel/setup.c 2005-11-02 14:32:08.000000000 +0200
+++ hg/arch/x86_64/kernel/setup.c 2005-11-05 10:42:33.000000000 +0200
@@ -42,6 +42,7 @@
#include <linux/edd.h>
#include <linux/mmzone.h>
#include <linux/kexec.h>
+#include <linux/dma-mapping.h>

#include <asm/mtrr.h>
#include <asm/uaccess.h>
@@ -60,6 +61,7 @@
#include <asm/setup.h>
#include <asm/mach_apic.h>
#include <asm/numa.h>
+#include <asm/swiotlb.h>

/*
* Machine setup..
@@ -86,6 +88,32 @@ unsigned long saved_video_mode;

#ifdef CONFIG_SWIOTLB
int swiotlb;
+
+/* these two are temporarily used by swiotlb */
+extern void*
+gart_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp);
+
+extern int gart_dma_supported(struct device *hwdev, u64 mask);
+
+struct dma_mapping_ops swiotlb_dma_ops = {
+ .mapping_error = swiotlb_dma_mapping_error,
+ .alloc_coherent = gart_alloc_coherent, /* FIXME: we are called via gart_alloc_coherent */
+ .free_coherent = swiotlb_free_coherent,
+ .map_single = swiotlb_map_single,
+ .unmap_single = swiotlb_unmap_single,
+ .sync_single_for_cpu = swiotlb_sync_single_for_cpu,
+ .sync_single_for_device = swiotlb_sync_single_for_device,
+ .sync_single_range_for_cpu = swiotlb_sync_single_range_for_cpu,
+ .sync_single_range_for_device = swiotlb_sync_single_range_for_device,
+ .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
+ .sync_sg_for_device = swiotlb_sync_sg_for_device,
+ .map_sg = swiotlb_map_sg,
+ .unmap_sg = swiotlb_unmap_sg,
+ /* FIXME: historically we used gart_dma_supported, keep it the same way */
+ .dma_supported = gart_dma_supported,
+};
+
EXPORT_SYMBOL(swiotlb);
#endif

diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/arch/x86_64/mm/init.c hg/arch/x86_64/mm/init.c
--- vanilla/arch/x86_64/mm/init.c 2005-09-26 15:10:03.000000000 +0300
+++ hg/arch/x86_64/mm/init.c 2005-11-06 11:51:00.000000000 +0200
@@ -22,6 +22,7 @@
#include <linux/pagemap.h>
#include <linux/bootmem.h>
#include <linux/proc_fs.h>
+#include <linux/dma-mapping.h>

#include <asm/processor.h>
#include <asm/system.h>
@@ -36,6 +37,7 @@
#include <asm/mmu_context.h>
#include <asm/proto.h>
#include <asm/smp.h>
+#include <asm/dma-mapping.h>

#ifndef Dprintk
#define Dprintk(x...)
@@ -45,6 +47,9 @@
extern int swiotlb;
#endif

+struct dma_mapping_ops* dma_ops;
+EXPORT_SYMBOL(dma_ops);
+
extern char _stext[];

DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
@@ -392,8 +397,12 @@ void __init mem_init(void)
if (!iommu_aperture &&
(end_pfn >= 0xffffffff>>PAGE_SHIFT || force_iommu))
swiotlb = 1;
- if (swiotlb)
- swiotlb_init();
+ if (swiotlb) {
+ swiotlb_init();
+ dma_ops = &swiotlb_dma_ops;
+ printk(KERN_DEBUG "%s: setting dma_ops to swiotlb_dma_ops(%p)\n",
+ __func__, &swiotlb_dma_ops);
+ }
#endif

/* How many end-of-memory variables you have, grandma! */
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/include/asm-x86_64/dma-mapping.h hg/include/asm-x86_64/dma-mapping.h
--- vanilla/include/asm-x86_64/dma-mapping.h 2005-11-03 08:38:45.000000000 +0200
+++ hg/include/asm-x86_64/dma-mapping.h 2005-11-05 10:41:43.000000000 +0200
@@ -11,143 +11,255 @@
#include <asm/scatterlist.h>
#include <asm/io.h>
#include <asm/swiotlb.h>
+#include <asm/nommu-mapping.h>
+
+struct dma_mapping_ops {
+ int (*mapping_error)(dma_addr_t dma_addr);
+ void* (*alloc_coherent)(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp);
+ void (*free_coherent)(struct device *dev, size_t size,
+ void *vaddr, dma_addr_t dma_handle);
+ dma_addr_t (*map_single)(struct device *hwdev, void *ptr,
+ size_t size, int direction);
+ void (*unmap_single)(struct device *dev, dma_addr_t addr,
+ size_t size, int direction);
+ void (*sync_single_for_cpu)(struct device *hwdev,
+ dma_addr_t dma_handle,
+ size_t size, int direction);
+ void (*sync_single_for_device)(struct device *hwdev,
+ dma_addr_t dma_handle,
+ size_t size, int direction);
+ void (*sync_single_range_for_cpu)(struct device *hwdev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction);
+ void (*sync_single_range_for_device)(struct device *hwdev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size,
+ int direction);
+ void (*sync_sg_for_cpu)(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction);
+ void (*sync_sg_for_device)(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction);
+ int (*map_sg)(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+ void (*unmap_sg)(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+ int (*dma_supported)(struct device *hwdev, u64 mask);
+};

extern dma_addr_t bad_dma_address;
-#define dma_mapping_error(x) \
- (swiotlb ? swiotlb_dma_mapping_error(x) : ((x) == bad_dma_address))
+extern struct dma_mapping_ops* dma_ops;

-void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
- gfp_t gfp);
-void dma_free_coherent(struct device *dev, size_t size, void *vaddr,
- dma_addr_t dma_handle);
+#define have_iommu (unlikely(dma_ops != NULL))

-#ifdef CONFIG_GART_IOMMU
+static inline int dma_mapping_error(dma_addr_t dma_addr)
+{
+ if (have_iommu && dma_ops->mapping_error)
+ return dma_ops->mapping_error(dma_addr);

-extern dma_addr_t dma_map_single(struct device *hwdev, void *ptr, size_t size,
- int direction);
-extern void dma_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
- int direction);
+ return (dma_addr == bad_dma_address);
+}

-#else
+static inline void*
+dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp)
+{
+ if (have_iommu && dma_ops->alloc_coherent)
+ return dma_ops->alloc_coherent(dev, size, dma_handle, gfp);

-/* No IOMMU */
+ return nommu_alloc_coherent(dev, size, dma_handle, gfp);
+}

-static inline dma_addr_t dma_map_single(struct device *hwdev, void *ptr,
- size_t size, int direction)
+static inline void
+dma_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle)
{
- dma_addr_t addr;
+ if (have_iommu && dma_ops->free_coherent) {
+ dma_ops->free_coherent(dev, size, vaddr, dma_handle);
+ return;
+ }

- if (direction == DMA_NONE)
- out_of_line_bug();
- addr = virt_to_bus(ptr);
-
- if ((addr+size) & ~*hwdev->dma_mask)
- out_of_line_bug();
- return addr;
+ nommu_free_coherent(dev, size, vaddr, dma_handle);
}

-static inline void dma_unmap_single(struct device *hwdev, dma_addr_t dma_addr,
- size_t size, int direction)
+static inline dma_addr_t
+dma_map_single(struct device *hwdev, void *ptr, size_t size,
+ int direction)
{
- if (direction == DMA_NONE)
- out_of_line_bug();
- /* Nothing to do */
+ if (have_iommu && dma_ops->map_single)
+ return dma_ops->map_single(hwdev, ptr, size, direction);
+
+ return nommu_map_single(hwdev, ptr, size, direction);
}

-#endif
+static inline void
+dma_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
+ int direction)
+{
+ if (have_iommu && dma_ops->unmap_single) {
+ dma_ops->unmap_single(dev, addr, size, direction);
+ return;
+ }
+
+ nommu_unmap_single(dev, addr, size, direction);
+}

#define dma_map_page(dev,page,offset,size,dir) \
dma_map_single((dev), page_address(page)+(offset), (size), (dir))

-static inline void dma_sync_single_for_cpu(struct device *hwdev,
- dma_addr_t dma_handle,
- size_t size, int direction)
+#define dma_unmap_page dma_unmap_single
+
+static inline void
+dma_sync_single_for_cpu(struct device *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction)
{
+ void (*f)(struct device *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction);
+
if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_single_for_cpu(hwdev,dma_handle,size,direction);
+ if (have_iommu && dma_ops->sync_single_for_cpu) {
+ f = dma_ops->sync_single_for_cpu;
+ f(hwdev, dma_handle, size, direction);
+ return;
+ }

flush_write_buffers();
}

-static inline void dma_sync_single_for_device(struct device *hwdev,
- dma_addr_t dma_handle,
- size_t size, int direction)
+static inline void
+dma_sync_single_for_device(struct device *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction)
{
- if (direction == DMA_NONE)
+ void (*f)(struct device *hwdev, dma_addr_t dma_handle,
+ size_t size, int direction);
+
+ if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_single_for_device(hwdev,dma_handle,size,direction);
+ if (have_iommu && dma_ops->sync_single_for_device) {
+ f = dma_ops->sync_single_for_device;
+ f(hwdev, dma_handle, size, direction);
+ return;
+ }

flush_write_buffers();
}

-static inline void dma_sync_single_range_for_cpu(struct device *hwdev,
- dma_addr_t dma_handle,
- unsigned long offset,
- size_t size, int direction)
+static inline void
+dma_sync_single_range_for_cpu(struct device *hwdev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size, int direction)
{
+ void (*f)(struct device *hwdev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size, int direction);
+
if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_single_range_for_cpu(hwdev,dma_handle,offset,size,direction);
+ if (have_iommu && dma_ops->sync_single_range_for_cpu) {
+ f = dma_ops->sync_single_range_for_cpu;
+ f(hwdev, dma_handle, offset, size, direction);
+ return;
+ }

flush_write_buffers();
}

-static inline void dma_sync_single_range_for_device(struct device *hwdev,
- dma_addr_t dma_handle,
- unsigned long offset,
- size_t size, int direction)
+static inline void
+dma_sync_single_range_for_device(struct device *hwdev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size, int direction)
{
- if (direction == DMA_NONE)
+ void (*f)(struct device *hwdev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size, int direction);
+
+ if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_single_range_for_device(hwdev,dma_handle,offset,size,direction);
+ if (have_iommu && dma_ops->sync_single_range_for_device) {
+ f = dma_ops->sync_single_range_for_device;
+ f(hwdev, dma_handle, offset, size, direction);
+ return;
+ }

flush_write_buffers();
}

-static inline void dma_sync_sg_for_cpu(struct device *hwdev,
- struct scatterlist *sg,
- int nelems, int direction)
+static inline void
+dma_sync_sg_for_cpu(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction)
{
+ void (*f)(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction);
+
if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_sg_for_cpu(hwdev,sg,nelems,direction);
+ if (have_iommu && dma_ops->sync_sg_for_cpu) {
+ f = dma_ops->sync_sg_for_cpu;
+ f(hwdev, sg, nelems, direction);
+ return;
+ }

flush_write_buffers();
}

-static inline void dma_sync_sg_for_device(struct device *hwdev,
- struct scatterlist *sg,
- int nelems, int direction)
+static inline void
+dma_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction)
{
+ void (*f)(struct device *hwdev, struct scatterlist *sg,
+ int nelems, int direction);
+
if (direction == DMA_NONE)
out_of_line_bug();

- if (swiotlb)
- return swiotlb_sync_sg_for_device(hwdev,sg,nelems,direction);
+ if (have_iommu && dma_ops->sync_sg_for_device) {
+ f = dma_ops->sync_sg_for_device;
+ f(hwdev, sg, nelems, direction);
+ return;
+ }

flush_write_buffers();
}

-extern int dma_map_sg(struct device *hwdev, struct scatterlist *sg,
- int nents, int direction);
-extern void dma_unmap_sg(struct device *hwdev, struct scatterlist *sg,
- int nents, int direction);
+static inline int
+dma_map_sg(struct device *hwdev, struct scatterlist *sg, int nents, int direction)
+{
+ if (have_iommu && dma_ops->map_sg)
+ return dma_ops->map_sg(hwdev, sg, nents, direction);

-#define dma_unmap_page dma_unmap_single
+ return nommu_map_sg(hwdev, sg, nents, direction);
+}
+
+static inline void
+dma_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents,
+ int direction)
+{
+ if (have_iommu && dma_ops->unmap_sg) {
+ dma_ops->unmap_sg(hwdev, sg, nents, direction);
+ return;
+ }
+
+ nommu_unmap_sg(hwdev, sg, nents, direction);
+}
+
+static inline int dma_supported(struct device *hwdev, u64 mask)
+{
+ if (have_iommu && dma_ops->dma_supported)
+ return dma_ops->dma_supported(hwdev, mask);
+
+ return nommu_dma_supported(hwdev, mask);
+}
+
+/* same for gart, swiotlb, and nommu */
+static inline int dma_get_cache_alignment(void)
+{
+ return boot_cpu_data.x86_clflush_size;
+}

-extern int dma_supported(struct device *hwdev, u64 mask);
-extern int dma_get_cache_alignment(void);
#define dma_is_consistent(h) 1

static inline int dma_set_mask(struct device *dev, u64 mask)
@@ -158,9 +270,10 @@ static inline int dma_set_mask(struct de
return 0;
}

-static inline void dma_cache_sync(void *vaddr, size_t size, enum dma_data_direction dir)
+static inline void
+dma_cache_sync(void *vaddr, size_t size, enum dma_data_direction dir)
{
flush_write_buffers();
}

-#endif
+#endif /* _X8664_DMA_MAPPING_H */
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/include/asm-x86_64/nommu-mapping.h hg/include/asm-x86_64/nommu-mapping.h
--- vanilla/include/asm-x86_64/nommu-mapping.h 1970-01-01 02:00:00.000000000 +0200
+++ hg/include/asm-x86_64/nommu-mapping.h 2005-11-05 10:37:37.000000000 +0200
@@ -0,0 +1,62 @@
+#ifndef _ASM_NOMMU_MAPPING_H
+#define _ASM_NOMMU_MAPPING_H 1
+
+#include <linux/config.h>
+
+/* nommu DMA mapping implementation */
+extern void*
+nommu_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t gfp);
+
+extern void
+nommu_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle);
+
+static inline dma_addr_t
+nommu_map_single(struct device *hwdev, void *ptr, size_t size, int direction)
+{
+ dma_addr_t addr;
+
+ if (direction == DMA_NONE)
+ out_of_line_bug();
+ addr = virt_to_bus(ptr);
+
+ if ((addr+size) & ~*hwdev->dma_mask)
+ out_of_line_bug();
+ return addr;
+}
+
+static inline void
+nommu_unmap_single(struct device *dev, dma_addr_t addr,size_t size,
+ int direction)
+{
+ if (direction == DMA_NONE)
+ out_of_line_bug();
+ /* Nothing to do */
+}
+
+extern int
+nommu_map_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+
+extern void
+nommu_unmap_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction);
+
+static inline int
+nommu_dma_supported(struct device *hwdev, u64 mask)
+{
+ /*
+ * we fall back to GFP_DMA when the mask isn't all 1s,
+ * so we can't guarantee allocations that must be
+ * within a tighter range than GFP_DMA..
+ * RED-PEN this won't work for pci_map_single. Caller has to
+ * use GFP_DMA in the first place.
+ */
+ if (mask < 0x00ffffff)
+ return 0;
+
+ return 1;
+}
+
+#endif /* _ASM_NOMMU_MAPPING_H */
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/include/asm-x86_64/pci.h hg/include/asm-x86_64/pci.h
--- vanilla/include/asm-x86_64/pci.h 2005-10-28 16:49:06.000000000 +0200
+++ hg/include/asm-x86_64/pci.h 2005-11-06 12:01:10.000000000 +0200
@@ -42,18 +42,20 @@ int pcibios_set_irq_routing(struct pci_d
#include <asm/scatterlist.h>
#include <linux/string.h>
#include <asm/page.h>
+#include <linux/dma-mapping.h> /* for have_iommu */

extern int iommu_setup(char *opt);

-#ifdef CONFIG_GART_IOMMU
/* The PCI address space does equal the physical memory
* address space. The networking and block device layers use
* this boolean for bounce buffer decisions
*
- * On AMD64 it mostly equals, but we set it to zero to tell some subsystems
- * that an IOMMU is available.
+ * On AMD64 it mostly equals, but we set it to zero if a hardware
+ * IOMMU (gart) or software IOMMU (swiotlb) is available.
*/
-#define PCI_DMA_BUS_IS_PHYS (no_iommu ? 1 : 0)
+#define PCI_DMA_BUS_IS_PHYS (have_iommu ? 0 : 1)
+
+#ifdef CONFIG_GART_IOMMU

/*
* x86-64 always supports DAC, but sometimes it is useful to force
@@ -79,7 +81,6 @@ extern int iommu_sac_force;
#else
/* No IOMMU */

-#define PCI_DMA_BUS_IS_PHYS 1
#define pci_dac_dma_supported(pci_dev, mask) 1

#define DECLARE_PCI_UNMAP_ADDR(ADDR_NAME)
diff -Naurp --exclude-from /home/muli/w/dontdiff vanilla/include/asm-x86_64/swiotlb.h hg/include/asm-x86_64/swiotlb.h
--- vanilla/include/asm-x86_64/swiotlb.h 2005-11-03 08:38:45.000000000 +0200
+++ hg/include/asm-x86_64/swiotlb.h 2005-11-04 15:18:18.000000000 +0200
@@ -3,6 +3,8 @@

#include <linux/config.h>

+#include <asm/dma-mapping.h>
+
/* SWIOTLB interface */

extern dma_addr_t swiotlb_map_single(struct device *hwdev, void *ptr, size_t size,
@@ -38,6 +40,9 @@ extern void *swiotlb_alloc_coherent (str
dma_addr_t *dma_handle, gfp_t flags);
extern void swiotlb_free_coherent (struct device *hwdev, size_t size,
void *vaddr, dma_addr_t dma_handle);
+extern int swiotlb_dma_supported(struct device *hwdev, u64 mask);
+
+extern struct dma_mapping_ops swiotlb_dma_ops;

#ifdef CONFIG_SWIOTLB
extern int swiotlb;
@@ -45,4 +50,4 @@ extern int swiotlb;
#define swiotlb 0
#endif

-#endif
+#endif /* _ASM_SWIOTLB_H */


--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/


2005-11-06 14:59:49

by Matti Aarnio

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sun, Nov 06, 2005 at 03:11:12PM +0200, Muli Ben-Yehuda wrote:
> Hi Andi,
>
> Here's the latest version of the dma_ops patch, updated to address
> your comments. The patch is against Linus's tree as of a few minutes
> ago and applies cleanly to 2.6.14-git9. Tested on AMD64 with gart,
> swiotlb, nommu and iommu=off. There are still a few cleanups left, but
> I'd appreciate it if this could see wider testing at this
> stage. Please apply...

Works mostly.
There is some problem which I am not sure is related
to this at all or not. BUG report attached below.

My machine has 4 GB of memory, and is an ASUS A8N-SLI board with
an AMD Athlon 64 X2 processor on it.

/Matti Aarnio


PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
ohci_hcd 0000:00:02.0: irq 58, io mem 0xca102000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 10 ports detected
usb 1-2: new high speed USB device using ehci_hcd and address 2
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/kernel/traps.c:336
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: ohci_hcd ehci_hcd tuner tvaudio msp3400 bttv video_buf i2c_algo_bit v4l2_common btcx_risc tveeprom videodev ns558 gameport parport_pc parport i2c_nforce2 i2c_core shpchp snd_mpu401 snd_mpu401_uart snd_rawmidi snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc ext3 jbd raid1 sata_nv sata_sil libata sd_mod scsi_mod
Pid: 17, comm: khubd Not tainted 2.6.14-git9 #1
RIP: 0010:[<ffffffff801108d0>] <ffffffff801108d0>{out_of_line_bug+0}
RSP: 0018:ffff810037eb7be0 EFLAGS: 00010206
RAX: ffffffff00000000 RBX: ffff81013dbee070 RCX: 000000013db02928
RDX: 000000013db02930 RSI: ffff81013db02928 RDI: ffff81000c90ad50
RBP: 000000013db02928 R08: 0000000000000000 R09: ffff81013dbee070
R10: 000000000000000f R11: ffffffff802320b0 R12: ffff81013eff3178
R13: ffff81013dbee09c R14: ffff81013de7cad8 R15: 0000000000000010
FS: 00002aaaaaacf3f0(0000) GS:ffffffff80581800(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aaaaab8800f CR3: 000000013dab2000 CR4: 00000000000006e0
Process khubd (pid: 17, threadinfo ffff810037eb6000, task ffff810037e7e0c0)
Stack: ffffffff80232842 0000000000000400 ffffffff8051c4e0 0000000000000246
0000000000000246 000000000000537c ffffffff8013c358 ffff810037eb7cd8
ffffffff803c36c0 ffffffff8051c523
Call Trace:<ffffffff80232842>{hcd_submit_urb+1938} <ffffffff8013c358>{release_console_sem+424}
<ffffffff8013cfe0>{vprintk+800} <ffffffff802336bf>{usb_start_wait_urb+143}
<ffffffff8017305f>{poison_obj+63} <ffffffff80172e1c>{dbg_redzone1+28}
<ffffffff80233a6e>{usb_control_msg+254} <ffffffff8022f096>{hub_port_init+614}
<ffffffff8023018c>{hub_thread+1548} <ffffffff80385d7f>{thread_return+95}
<ffffffff8017305f>{poison_obj+63} <ffffffff80153b80>{autoremove_wake_function+0}
<ffffffff8022fb80>{hub_thread+0} <ffffffff80153620>{keventd_create_kthread+0}
<ffffffff801537eb>{kthread+219} <ffffffff801380fd>{schedule_tail+77}
<ffffffff8010fdf6>{child_rip+8} <ffffffff80153620>{keventd_create_kthread+0}
<ffffffff80153710>{kthread+0} <ffffffff8010fdee>{child_rip+0}

Code: 0f 0b 68 ad 46 3a 80 c2 50 01 c3 66 66 90 66 90 48 83 ec 18
RIP <ffffffff801108d0>{out_of_line_bug+0} RSP <ffff810037eb7be0>

2005-11-06 15:51:01

by Muli Ben-Yehuda

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sun, Nov 06, 2005 at 04:59:47PM +0200, Matti Aarnio wrote:
> On Sun, Nov 06, 2005 at 03:11:12PM +0200, Muli Ben-Yehuda wrote:
> > Hi Andi,
> >
> > Here's the latest version of the dma_ops patch, updated to address
> > your comments. The patch is against Linus's tree as of a few minutes
> > ago and applies cleanly to 2.6.14-git9. Tested on AMD64 with gart,
> > swiotlb, nommu and iommu=off. There are still a few cleanups left, but
> > I'd appreciate it if this could see wider testing at this
> > stage. Please apply...
>
> Works mostly.
> There is some problem which I am not sure is related
> to this at all or not. BUG report attached below.

I think this is the same problem as the one referenced in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113063216900541&w=2
which predates my changes. The error message looks different
because the "no IOMMU" code path now goes through nommu, rather than
gart with no_iommu set, but it appears to be the same underlying
problem. Can you please send me your vmlinux and/or apply this debug
patch and post the results to verify?

Thanks,
Muli

Exporting patch:
# HG changeset patch
# User Muli Ben-Yehuda <[email protected]>
# Node ID 64c901d4194adab05d2401208bb24e69d9993dba
# Parent d75ad581e6f925a389f7f69bf3ec725459c77f9f
debug patch for Matti Aarnio

diff -r d75ad581e6f925a389f7f69bf3ec725459c77f9f -r 64c901d4194adab05d2401208bb24e69d9993dba drivers/usb/core/hcd.c
--- a/drivers/usb/core/hcd.c Sun Nov 6 10:08:10 2005
+++ b/drivers/usb/core/hcd.c Sun Nov 6 15:48:07 2005
@@ -1183,14 +1183,34 @@
*/
if (hcd->self.controller->dma_mask) {
if (usb_pipecontrol (urb->pipe)
- && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP))
+ && !(urb->transfer_flags & URB_NO_SETUP_DMA_MAP)) {
+ printk("%s: calling dma_map_single(direction = %d, "
+ "ptr = %p, virt_to_bus(ptr) = 0x%lx, "
+ "size = %d, hwdev->dma_mask = 0x%Lx)\n",
+ __func__, DMA_TO_DEVICE,
+ urb->setup_packet,
+ virt_to_bus(urb->setup_packet),
+ (int)sizeof(struct usb_ctrlrequest),
+ *hcd->self.controller->dma_mask);
urb->setup_dma = dma_map_single (
hcd->self.controller,
urb->setup_packet,
sizeof (struct usb_ctrlrequest),
DMA_TO_DEVICE);
+ }
+ if (urb->transfer_buffer_length != 0
+ && !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
+ printk("%s: calling dma_map_single(direction = %d, "
+ "ptr = %p, virt_to_bus(ptr) = 0x%lx, "
+ "size = %d, hwdev->dma_mask = 0x%Lx)\n",
+ __func__, (usb_pipein (urb->pipe) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE),
+ urb->transfer_buffer,
+ virt_to_bus(urb->transfer_buffer),
+ urb->transfer_buffer_length,
+ *hcd->self.controller->dma_mask);
if (urb->transfer_buffer_length != 0
&& !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
urb->transfer_dma = dma_map_single (
hcd->self.controller,
urb->transfer_buffer,
diff -r d75ad581e6f925a389f7f69bf3ec725459c77f9f -r 64c901d4194adab05d2401208bb24e69d9993dba include/asm-x86_64/nommu-mapping.h
--- a/include/asm-x86_64/nommu-mapping.h Sun Nov 6 10:08:10 2005
+++ b/include/asm-x86_64/nommu-mapping.h Sun Nov 6 15:48:07 2005
@@ -21,8 +21,13 @@
out_of_line_bug();
addr = virt_to_bus(ptr);

- if ((addr+size) & ~*hwdev->dma_mask)
+ if ((addr+size) & ~*hwdev->dma_mask) {
+ printk("%s: addr 0x%Lx, size %d, ~mask 0x%Lx, "
+ "(addr+size) & ~*hwdev->dma_mask = 0x%lx\n",
+ __func__, addr, (int)size, ~*hwdev->dma_mask,
+ (addr+size) & ~*hwdev->dma_mask);
out_of_line_bug();
+ }
return addr;
}


--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

2005-11-06 16:46:08

by Andi Kleen

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sunday 06 November 2005 15:59, Matti Aarnio wrote:
> On Sun, Nov 06, 2005 at 03:11:12PM +0200, Muli Ben-Yehuda wrote:
> > Hi Andi,
> >
> > Here's the latest version of the dma_ops patch, updated to address
> > your comments. The patch is against Linus's tree as of a few minutes
> > ago and applies cleanly to 2.6.14-git9. Tested on AMD64 with gart,
> > swiotlb, nommu and iommu=off. There are still a few cleanups left, but
> > I'd appreciate it if this could see wider testing at this
> > stage. Please apply...
>
> Works mostly.
> There is some problem which I am not sure is related
> to this at all or not.

You can easily find out: does it happen without the patch?
If yes, please post the full boot log.

Thanks,
-Andi

2005-11-06 17:06:51

by Matti Aarnio

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sun, Nov 06, 2005 at 05:45:53PM +0100, Andi Kleen wrote:
> On Sunday 06 November 2005 15:59, Matti Aarnio wrote:
> > On Sun, Nov 06, 2005 at 03:11:12PM +0200, Muli Ben-Yehuda wrote:
> > > Hi Andi,
> > >
> > > Here's the latest version of the dma_ops patch, updated to address
> > > your comments. The patch is against Linus's tree as of a few minutes
> > > ago and applies cleanly to 2.6.14-git9. Tested on AMD64 with gart,
> > > swiotlb, nommu and iommu=off. There are still a few cleanups left, but
> > > I'd appreciate it if this could see wider testing at this
> > > stage. Please apply...
> >
> > Works mostly.
> > There is some problem which I am not sure is related
> > to this at all or not.
>
> You can easily find out: Does it happen without the patch?
> If yes please post the full boot log.

git7 blows up like git2; plain git9 was not tested at all.
I am applying the debug patch and compiling right now for a test.

> Thanks,
> -Andi

/Matti Aarnio

2005-11-06 17:19:04

by Andi Kleen

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sunday 06 November 2005 18:06, Matti Aarnio wrote:

>
> git7 blows up like git2; plain git9 was not tested at all.
> I am applying the debug patch and compiling right now for a test.

Please just test plain git9 and post full boot log if it fails.

-Andi

2005-11-07 08:57:37

by Muli Ben-Yehuda

Subject: Re: [PATCH] x86-64: dma_ops for DMA mapping - K3

On Sun, Nov 06, 2005 at 06:18:44PM +0100, Andi Kleen wrote:
> On Sunday 06 November 2005 18:06, Matti Aarnio wrote:
>
> >
> > git7 blows up like git2; plain git9 was not tested at all.
> > I am applying the debug patch and compiling right now for a test.
>
> Please just test plain git9 and post full boot log if it fails.

On git2 and git7 (and probably also git9) we die on Matti's machine
because we fall back to gart with no_iommu set when we find we don't
have an IOMMU in init_k8_gatt(). That causes a panic when we try to
DMA above 4GB, because the USB controller is only 32-bit-DMA
capable.

I think that the right thing to do at that point is fall back to
swiotlb[0]. Now, we could theoretically switch to swiotlb in
pci_iommu_init() when init_k8_gatt() fails, but at that point it's too
late to call swiotlb_init() (it causes a crash in the bootmem
allocator in my tests). That leaves the following options:

- realize earlier, when we can still call the standard swiotlb_init(),
that we are going to need it (is this possible?)

- switch to having our own swiotlb_init() which relies on either
GFP_DMA32 or allocating the swiotlb scratch space statically in
vmlinux (thanks Arjan), and use it when init_k8_gatt() fails.

- call swiotlb_init() unconditionally in mem_init(), and free it later
if we don't need it.

Thoughts?
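
To make the last option concrete, the rough shape would be something
like this (untested sketch; gatt_ok stands in for the init_k8_gatt()
success path, and swiotlb_late_free() is a made-up helper that would
have to be written to hand the scratch space back to the page
allocator):

	/* in mem_init(), while the bootmem allocator is still usable */
	swiotlb_init();

	/* later, in pci_iommu_init(), once we know the gart works */
	if (gatt_ok) {
		swiotlb_late_free();
		dma_ops = &gart_dma_ops;
	}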

As a side note, none of this is related to my dma_ops patch - we die
on Matti's machine the same way with and without it, and we can fix it
roughly the same way in both cases. I do think the dma_ops patch
makes the potential fixes cleaner, though.

[0] Matti's machine dies at the moment even with swiotlb=soft. I think
that's a separate, orthogonal bug.

Cheers,
Muli
--
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/