Hi,
this patchset introduces code to debug drivers' usage of the DMA-API.
Tests with hardware IOMMUs have shown several bugs in drivers regarding
the usage of that API.
Problems were found especially in network card drivers.
These bugs often don't show up or have any negative impact if there is
no hardware IOMMU in use in the system. But with a hardware IOMMU these
bugs render the hardware unusable or, in the worst case, cause data
corruption on devices which are managed by other (good) drivers.
With the code these patches introduce, driver developers can find several
kinds of DMA-API misuse in their drivers. But be aware, it cannot
find all possible bugs. If it finds a problem it prints out messages
like
tg3 0000:08:04.0: PCI-DMA: device driver tries to free DMA memory it has not allocated [device address=0x000000042f0f3ae7] [size=48 bytes]
Pid: 6285, comm: bash Not tainted 2.6.28-rc5-00176-g6ae6379-dirty #6
Call Trace:
<IRQ> [<ffffffff80221276>] check_unmap+0x52/0x1ce
[<ffffffff80221af0>] debug_unmap_single+0x61/0xa4
[<ffffffff8053d396>] skb_dma_unmap+0xf2/0x10c
[<ffffffff8040a986>] tg3_poll+0xe8/0x822
[<ffffffff803abe2f>] mix_pool_bytes_extract+0x5c/0x155
[<ffffffff80540e42>] net_rx_action+0x9d/0x170
[<ffffffff8023bc3c>] __do_softirq+0x7a/0x13d
[<ffffffff8020c3cc>] call_softirq+0x1c/0x28
[<ffffffff8020d8ac>] do_softirq+0x2c/0x68
[<ffffffff8023bb7c>] irq_exit+0x3f/0x85
[<ffffffff8020db5e>] do_IRQ+0x14d/0x16f
[<ffffffff8020b686>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff80368187>] memcmp+0xb/0x22
[<ffffffff8029f11e>] __d_lookup+0xb9/0xf9
[<ffffffff80298166>] do_lookup+0x2a/0x1c1
[<ffffffff8029979f>] __link_path_walk+0x331/0xc0d
[<ffffffff8029a0c1>] path_walk+0x46/0x8b
[<ffffffff8029a245>] do_path_lookup+0xff/0x121
[<ffffffff8029ad2d>] path_lookup_open+0x54/0x95
[<ffffffff8029af7c>] do_filp_open+0x9d/0x782
[<ffffffff802a234e>] alloc_fd+0x69/0x10c
[<ffffffff8028fda9>] do_sys_open+0x48/0xcc
[<ffffffff8020b17b>] system_call_fastpath+0x16/0x1b
or (from another machine with AMD IOMMU):
ixgbe 0000:02:00.0: PCI-DMA: device driver frees DMA memory with different size [device address=0x0000000003fed812] [map size=258 bytes] [unmap size=256 bytes]
Pid: 6178, comm: rmmod Not tainted 2.6.28-rc5 #4
Call Trace:
[<ffffffff8022a2ae>] iommu_queue_inv_iommu_pages+0x5e/0x70
[<ffffffff80225956>] check_unmap+0x1c6/0x240
[<ffffffff80225ff5>] debug_unmap_single+0xb5/0x110
[<ffffffffa0213997>] ixgbe_clean_rx_ring+0x147/0x220
[<ffffffffa0214d7d>] ixgbe_down+0x2fd/0x3d0 [ixgbe]
[<ffffffffa02150b3>] ixgbe_close+0x13/0xc0 [ixgbe]
[<ffffffff80431326>] dev_close+0x56/0xa0
[<ffffffff804313b3>] rollback_registered+0x43/0x220
[<ffffffff804315a5>] unregister_netdevice+0x15/0x60
[<ffffffff80431601>] unregister_netdev+0x11/0x20
[<ffffffffa021aef8>] ixgbe_remove+0x48/0x16e [ixgbe]
[<ffffffff80386ffc>] pci_device_remove+0x2c/0x60
[<ffffffff803ef929>] __device_release_driver+0x99/0x100
[<ffffffff803efa48>] driver_detach+0xb8/0xc0
[<ffffffff803eea6e>] bus_remove_driver+0x8e/0xd0
[<ffffffff80387374>] pci_unregister_driver+0x34/0x90
[<ffffffff8026c6c7>] sys_delete_module+0x1c7/0x2a0
[<ffffffff802a9ce9>] do_munmap+0x349/0x390
[<ffffffff80374481>] __up_write+0x21/0x150
[<ffffffff8020c30b>] system_call_fastpath+0x16/0x1b
This way driver developers get an idea of where the problem is in their
code.
Please review and send any objections or, if there are none, consider it
for inclusion ;)
Thanks,
Joerg
Impact: adds a new header file for DMA-API debugging
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma_debug.h | 41 ++++++++++++++++++++++++++++++++++++++
1 files changed, 41 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/dma_debug.h
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
new file mode 100644
index 0000000..d79f024
--- /dev/null
+++ b/arch/x86/include/asm/dma_debug.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (C) 2008 Advanced Micro Devices, Inc.
+ *
+ * Author: Joerg Roedel <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_X86_DMA_DEBUG
+#define __ASM_X86_DMA_DEBUG
+
+/* Allocation flags */
+#define DMA_DEBUG_SINGLE 0
+#define DMA_DEBUG_SG 1
+#define DMA_DEBUG_COHERENT 2
+
+struct device;
+struct list_head;
+
+struct dma_debug_entry {
+ struct list_head list;
+ struct device *dev;
+ int type;
+ void *cpu_addr;
+ u64 dev_addr;
+ u64 size;
+ int direction;
+};
+
+#endif /* __ASM_X86_DMA_DEBUG */
--
1.5.6.4
Impact: detect bugs in map/unmap_single usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 9 +++++-
arch/x86/include/asm/dma_debug.h | 20 +++++++++++++
arch/x86/kernel/pci-dma-debug.c | 55 ++++++++++++++++++++++++++++++++++++
3 files changed, 83 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 83d7b7d..c9bead2 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -98,9 +98,14 @@ dma_map_single(struct device *hwdev, void *ptr, size_t size,
int direction)
{
struct dma_mapping_ops *ops = get_dma_ops(hwdev);
+ dma_addr_t addr;
BUG_ON(!valid_dma_direction(direction));
- return ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
+ addr = ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
+
+ debug_map_single(hwdev, ptr, size, direction, addr);
+
+ return addr;
}
static inline void
@@ -112,6 +117,8 @@ dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size,
BUG_ON(!valid_dma_direction(direction));
if (ops->unmap_single)
ops->unmap_single(dev, addr, size, direction);
+
+ debug_unmap_single(dev, addr, size, direction);
}
static inline int
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index f2c3d53..ba4d9b7 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -43,6 +43,14 @@ struct dma_debug_entry {
extern
void dma_debug_init(void);
+extern
+void debug_map_single(struct device *dev, void *ptr, size_t size,
+ int direction, dma_addr_t dma_addr);
+
+extern
+void debug_unmap_single(struct device *dev, dma_addr_t addr,
+ size_t size, int direction);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -50,6 +58,18 @@ void dma_debug_init(void)
{
}
+static inline
+void debug_map_single(struct device *dev, void *ptr, size_t size,
+ int direction, dma_addr_t dma_addr)
+{
+}
+
+static inline
+void debug_unmap_single(struct device *dev, dma_addr_t addr,
+ size_t size, int direction)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index fc95631..9afb6c8 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -234,3 +234,58 @@ void dma_debug_init(void)
printk(KERN_INFO "PCI-DMA: DMA API debugging enabled by kernel config\n");
}
+void debug_map_single(struct device *dev, void *ptr, size_t size,
+ int direction, dma_addr_t dma_addr)
+{
+ unsigned long flags;
+ struct dma_debug_entry *entry;
+
+ if (dma_addr == bad_dma_address)
+ return;
+
+ entry = dma_entry_alloc();
+ if (!entry)
+ return;
+
+ entry->dev = dev;
+ entry->type = DMA_DEBUG_SINGLE;
+ entry->cpu_addr = ptr;
+ entry->dev_addr = dma_addr;
+ entry->size = size;
+ entry->direction = direction;
+
+ spin_lock_irqsave(&dma_lock, flags);
+ add_dma_entry(entry);
+ spin_unlock_irqrestore(&dma_lock, flags);
+}
+EXPORT_SYMBOL(debug_map_single);
+
+void debug_unmap_single(struct device *dev, dma_addr_t addr,
+ size_t size, int direction)
+{
+ unsigned long flags;
+ struct dma_debug_entry ref = {
+ .type = DMA_DEBUG_SINGLE,
+ .dev = dev,
+ .dev_addr = addr,
+ .size = size,
+ .direction = direction,
+ };
+ struct dma_debug_entry *entry;
+
+ if (addr == bad_dma_address)
+ return;
+
+ spin_lock_irqsave(&dma_lock, flags);
+
+ entry = find_dma_entry(&ref);
+
+ if (check_unmap(&ref, entry)) {
+ remove_dma_entry(entry);
+ dma_entry_free(entry);
+ }
+
+ spin_unlock_irqrestore(&dma_lock, flags);
+}
+EXPORT_SYMBOL(debug_unmap_single);
+
--
1.5.6.4
Impact: adds a new Kconfig option
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/Kconfig.debug | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 2a3dfbd..d4fafd5 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -162,6 +162,18 @@ config DOUBLEFAULT
option saves about 4k and might cause you much additional grey
hair.
+config DMA_API_DEBUG
+ default n
+ bool "Enable debugging of DMA-API usage"
+ depends on X86
+ help
+ Enable this option to debug the use of the DMA API by device drivers.
+ With this option you will be able to detect common bugs in device
+ drivers like double-freeing of DMA mappings or freeing mappings that
+ were never allocated.
+ This option causes a performance degradation. Use only if you want to
+ debug device drivers. If unsure, say N.
+
config IOMMU_DEBUG
bool "Enable IOMMU debugging"
depends on GART_IOMMU && DEBUG_KERNEL
--
1.5.6.4
Impact: detect bugs in map/unmap_sg usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 9 ++++-
arch/x86/include/asm/dma_debug.h | 20 +++++++++++
arch/x86/kernel/pci-dma-debug.c | 63 ++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index c9bead2..c7bdb75 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -126,9 +126,14 @@ dma_map_sg(struct device *hwdev, struct scatterlist *sg,
int nents, int direction)
{
struct dma_mapping_ops *ops = get_dma_ops(hwdev);
+ int ret;
BUG_ON(!valid_dma_direction(direction));
- return ops->map_sg(hwdev, sg, nents, direction);
+ ret = ops->map_sg(hwdev, sg, nents, direction);
+
+ debug_map_sg(hwdev, sg, ret, direction);
+
+ return ret;
}
static inline void
@@ -140,6 +145,8 @@ dma_unmap_sg(struct device *hwdev, struct scatterlist *sg, int nents,
BUG_ON(!valid_dma_direction(direction));
if (ops->unmap_sg)
ops->unmap_sg(hwdev, sg, nents, direction);
+
+ debug_unmap_sg(hwdev, sg, nents, direction);
}
static inline void
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index ba4d9b7..ff06d1c 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -51,6 +51,14 @@ extern
void debug_unmap_single(struct device *dev, dma_addr_t addr,
size_t size, int direction);
+extern
+void debug_map_sg(struct device *dev, struct scatterlist *sg,
+ int nents, int direction);
+
+extern
+void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
+ int nelems, int dir);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -70,6 +78,18 @@ void debug_unmap_single(struct device *dev, dma_addr_t addr,
{
}
+static inline
+void debug_map_sg(struct device *dev, struct scatterlist *sg,
+ int nents, int direction)
+{
+}
+
+static inline
+void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
+ int nelems, int dir)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index 9afb6c8..55ef69a 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -289,3 +289,66 @@ void debug_unmap_single(struct device *dev, dma_addr_t addr,
}
EXPORT_SYMBOL(debug_unmap_single);
+void debug_map_sg(struct device *dev, struct scatterlist *sg,
+ int nents, int direction)
+{
+ unsigned long flags;
+ struct dma_debug_entry *entry;
+ struct scatterlist *s;
+ int i;
+
+ for_each_sg(sg, s, nents, i) {
+ entry = dma_entry_alloc();
+ if (!entry)
+ return;
+
+ entry->type = DMA_DEBUG_SG;
+ entry->dev = dev;
+ entry->cpu_addr = sg_virt(s);
+ entry->size = s->length;
+ entry->dev_addr = s->dma_address;
+ entry->direction = direction;
+
+ spin_lock_irqsave(&dma_lock, flags);
+ add_dma_entry(entry);
+ spin_unlock_irqrestore(&dma_lock, flags);
+ }
+}
+EXPORT_SYMBOL(debug_map_sg);
+
+void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
+ int nelems, int dir)
+{
+ unsigned long flags;
+ struct dma_debug_entry *entry;
+ struct scatterlist *s;
+ int i;
+
+ spin_lock_irqsave(&dma_lock, flags);
+
+ for_each_sg(sglist, s, nelems, i) {
+
+ struct dma_debug_entry ref = {
+ .type = DMA_DEBUG_SG,
+ .dev = dev,
+ .cpu_addr = sg_virt(s),
+ .dev_addr = s->dma_address,
+ .size = s->length,
+ .direction = dir,
+ };
+
+ if (ref.dev_addr == bad_dma_address)
+ continue;
+
+ entry = find_dma_entry(&ref);
+
+ if (check_unmap(&ref, entry)) {
+ remove_dma_entry(entry);
+ dma_entry_free(entry);
+ }
+ }
+
+ spin_unlock_irqrestore(&dma_lock, flags);
+}
+EXPORT_SYMBOL(debug_unmap_sg);
+
--
1.5.6.4
Impact: detect bugs in sync_single_range* usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 4 ++++
arch/x86/include/asm/dma_debug.h | 26 ++++++++++++++++++++++++++
arch/x86/kernel/pci-dma-debug.c | 17 +++++++++++++++++
3 files changed, 47 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 63bed40..2b6399d 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -185,6 +185,8 @@ dma_sync_single_range_for_cpu(struct device *hwdev, dma_addr_t dma_handle,
if (ops->sync_single_range_for_cpu)
ops->sync_single_range_for_cpu(hwdev, dma_handle, offset,
size, direction);
+ debug_sync_single_range_for_cpu(hwdev, dma_handle, offset,
+ size, direction);
flush_write_buffers();
}
@@ -199,6 +201,8 @@ dma_sync_single_range_for_device(struct device *hwdev, dma_addr_t dma_handle,
if (ops->sync_single_range_for_device)
ops->sync_single_range_for_device(hwdev, dma_handle,
offset, size, direction);
+ debug_sync_single_range_for_device(hwdev, dma_handle, offset,
+ size, direction);
flush_write_buffers();
}
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index 8262cd1..bc4a841 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -75,6 +75,17 @@ extern
void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
size_t size, int direction);
+extern
+void debug_sync_single_range_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ int direction);
+
+extern
+void debug_sync_single_range_for_device(struct device *dev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -130,6 +141,21 @@ void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
{
}
+static inline
+void debug_sync_single_range_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ int direction)
+{
+}
+
+static inline
+void debug_sync_single_range_for_device(struct device *dev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index 1dfcd33..92b9491 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -422,3 +422,20 @@ void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
}
EXPORT_SYMBOL(debug_sync_single_for_device);
+void debug_sync_single_range_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ unsigned long offset, size_t size,
+ int direction)
+{
+ check_sync(dev, dma_handle, size, offset, direction, true);
+}
+EXPORT_SYMBOL(debug_sync_single_range_for_cpu);
+
+void debug_sync_single_range_for_device(struct device *dev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction)
+{
+ check_sync(dev, dma_handle, size, offset, direction, false);
+}
+EXPORT_SYMBOL(debug_sync_single_range_for_device);
+
--
1.5.6.4
Impact: detect bugs in sync_single* usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 2 ++
arch/x86/include/asm/dma_debug.h | 20 ++++++++++++++++++++
arch/x86/kernel/pci-dma-debug.c | 14 ++++++++++++++
3 files changed, 36 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 2893adb..63bed40 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -158,6 +158,7 @@ dma_sync_single_for_cpu(struct device *hwdev, dma_addr_t dma_handle,
BUG_ON(!valid_dma_direction(direction));
if (ops->sync_single_for_cpu)
ops->sync_single_for_cpu(hwdev, dma_handle, size, direction);
+ debug_sync_single_for_cpu(hwdev, dma_handle, size, direction);
flush_write_buffers();
}
@@ -170,6 +171,7 @@ dma_sync_single_for_device(struct device *hwdev, dma_addr_t dma_handle,
BUG_ON(!valid_dma_direction(direction));
if (ops->sync_single_for_device)
ops->sync_single_for_device(hwdev, dma_handle, size, direction);
+ debug_sync_single_for_device(hwdev, dma_handle, size, direction);
flush_write_buffers();
}
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index 7245e27..8262cd1 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -67,6 +67,14 @@ extern
void debug_free_coherent(struct device *dev, size_t size,
void *virt, dma_addr_t addr);
+extern
+void debug_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction);
+
+extern
+void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -110,6 +118,18 @@ void debug_free_coherent(struct device *dev, size_t size,
{
}
+static inline
+void debug_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction)
+{
+}
+
+static inline
+void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index db5ef9a..1dfcd33 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -408,3 +408,17 @@ void debug_free_coherent(struct device *dev, size_t size,
}
EXPORT_SYMBOL(debug_free_coherent);
+void debug_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction)
+{
+ check_sync(dev, dma_handle, size, 0, direction, true);
+}
+EXPORT_SYMBOL(debug_sync_single_for_cpu);
+
+void debug_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
+ size_t size, int direction)
+{
+ check_sync(dev, dma_handle, size, 0, direction, false);
+}
+EXPORT_SYMBOL(debug_sync_single_for_device);
+
--
1.5.6.4
Impact: detect bugs in sync_sg* usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 2 ++
arch/x86/include/asm/dma_debug.h | 20 ++++++++++++++++++++
arch/x86/kernel/pci-dma-debug.c | 24 ++++++++++++++++++++++++
3 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 2b6399d..6a580e9 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -215,6 +215,7 @@ dma_sync_sg_for_cpu(struct device *hwdev, struct scatterlist *sg,
BUG_ON(!valid_dma_direction(direction));
if (ops->sync_sg_for_cpu)
ops->sync_sg_for_cpu(hwdev, sg, nelems, direction);
+ debug_sync_sg_for_cpu(hwdev, sg, nelems, direction);
flush_write_buffers();
}
@@ -227,6 +228,7 @@ dma_sync_sg_for_device(struct device *hwdev, struct scatterlist *sg,
BUG_ON(!valid_dma_direction(direction));
if (ops->sync_sg_for_device)
ops->sync_sg_for_device(hwdev, sg, nelems, direction);
+ debug_sync_sg_for_device(hwdev, sg, nelems, direction);
flush_write_buffers();
}
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index bc4a841..b30d3c9 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -86,6 +86,14 @@ void debug_sync_single_range_for_device(struct device *dev,
unsigned long offset,
size_t size, int direction);
+extern
+void debug_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction);
+
+extern
+void debug_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -156,6 +164,18 @@ void debug_sync_single_range_for_device(struct device *dev,
{
}
+static inline
+void debug_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction)
+{
+}
+
+static inline
+void debug_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index 92b9491..ac8e1f9 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -439,3 +439,27 @@ void debug_sync_single_range_for_device(struct device *dev,
}
EXPORT_SYMBOL(debug_sync_single_range_for_device);
+void debug_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction)
+{
+ struct scatterlist *s;
+ int i;
+
+ for_each_sg(sg, s, nelems, i)
+ check_sync(dev, s->dma_address, s->dma_length, 0,
+ direction, true);
+}
+EXPORT_SYMBOL(debug_sync_sg_for_cpu);
+
+void debug_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+ int nelems, int direction)
+{
+ struct scatterlist *s;
+ int i;
+
+ for_each_sg(sg, s, nelems, i)
+ check_sync(dev, s->dma_address, s->dma_length, 0,
+ direction, false);
+}
+EXPORT_SYMBOL(debug_sync_sg_for_device);
+
--
1.5.6.4
Impact: detect bugs in alloc/free_coherent usage
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 10 +++++-
arch/x86/include/asm/dma_debug.h | 20 +++++++++++++
arch/x86/kernel/pci-dma-debug.c | 56 ++++++++++++++++++++++++++++++++++++
3 files changed, 84 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index c7bdb75..2893adb 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -304,8 +304,12 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
if (!ops->alloc_coherent)
return NULL;
- return ops->alloc_coherent(dev, size, dma_handle,
- dma_alloc_coherent_gfp_flags(dev, gfp));
+ memory = ops->alloc_coherent(dev, size, dma_handle,
+ dma_alloc_coherent_gfp_flags(dev, gfp));
+
+ debug_alloc_coherent(dev, size, *dma_handle, memory);
+
+ return memory;
}
static inline void dma_free_coherent(struct device *dev, size_t size,
@@ -320,6 +324,8 @@ static inline void dma_free_coherent(struct device *dev, size_t size,
if (ops->free_coherent)
ops->free_coherent(dev, size, vaddr, bus);
+
+ debug_free_coherent(dev, size, vaddr, bus);
}
#endif
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index ff06d1c..7245e27 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -59,6 +59,14 @@ extern
void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
int nelems, int dir);
+extern
+void debug_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t dma_addr, void *virt);
+
+extern
+void debug_free_coherent(struct device *dev, size_t size,
+ void *virt, dma_addr_t addr);
+
#else /* CONFIG_DMA_API_DEBUG */
static inline
@@ -90,6 +98,18 @@ void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
{
}
+static inline
+void debug_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t dma_addr, void *virt)
+{
+}
+
+static inline
+void debug_free_coherent(struct device *dev, size_t size,
+ void *virt, dma_addr_t addr)
+{
+}
+
#endif /* CONFIG_DMA_API_DEBUG */
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index 55ef69a..db5ef9a 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -352,3 +352,59 @@ void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
}
EXPORT_SYMBOL(debug_unmap_sg);
+void debug_alloc_coherent(struct device *dev, size_t size,
+ dma_addr_t dma_addr, void *virt)
+{
+ unsigned long flags;
+ struct dma_debug_entry *entry;
+
+ if (dma_addr == bad_dma_address)
+ return;
+
+ entry = dma_entry_alloc();
+ if (!entry)
+ return;
+
+ entry->type = DMA_DEBUG_COHERENT;
+ entry->dev = dev;
+ entry->cpu_addr = virt;
+ entry->size = size;
+ entry->dev_addr = dma_addr;
+ entry->direction = DMA_BIDIRECTIONAL;
+
+ spin_lock_irqsave(&dma_lock, flags);
+ add_dma_entry(entry);
+ spin_unlock_irqrestore(&dma_lock, flags);
+}
+EXPORT_SYMBOL(debug_alloc_coherent);
+
+void debug_free_coherent(struct device *dev, size_t size,
+ void *virt, dma_addr_t addr)
+{
+ unsigned long flags;
+ struct dma_debug_entry ref = {
+ .type = DMA_DEBUG_COHERENT,
+ .dev = dev,
+ .cpu_addr = virt,
+ .dev_addr = addr,
+ .size = size,
+ .direction = DMA_BIDIRECTIONAL,
+ };
+ struct dma_debug_entry *entry;
+
+ if (addr == bad_dma_address)
+ return;
+
+ spin_lock_irqsave(&dma_lock, flags);
+
+ entry = find_dma_entry(&ref);
+
+ if (check_unmap(&ref, entry)) {
+ remove_dma_entry(entry);
+ dma_entry_free(entry);
+ }
+
+ spin_unlock_irqrestore(&dma_lock, flags);
+}
+EXPORT_SYMBOL(debug_free_coherent);
+
--
1.5.6.4
Impact: adds helper functions to be used later
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/kernel/pci-dma-debug.c | 125 +++++++++++++++++++++++++++++++++++++++
1 files changed, 125 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
index c2d3408..fc95631 100644
--- a/arch/x86/kernel/pci-dma-debug.c
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -42,6 +42,11 @@ static struct kmem_cache *dma_entry_cache;
/* lock to protect the data structures */
static DEFINE_SPINLOCK(dma_lock);
+static char *type2name[3] = { "single", "scatter-gather", "coherent" };
+
+static char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
+ "DMA_FROM_DEVICE", "DMA_NONE" };
+
static int hash_fn(struct dma_debug_entry *entry)
{
/*
@@ -95,6 +100,126 @@ static void remove_dma_entry(struct dma_debug_entry *entry)
list_del(&entry->list);
}
+static bool check_unmap(struct dma_debug_entry *ref,
+ struct dma_debug_entry *entry)
+{
+ bool errors = false;
+
+ if (!entry) {
+ dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver tries "
+ "to free DMA memory it has not allocated "
+ "[device address=0x%016llx] [size=%llu bytes]\n",
+ ref->dev_addr, ref->size);
+ dump_stack();
+
+ return false;
+ }
+
+ if (ref->size != entry->size) {
+ dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver frees "
+ "DMA memory with different size "
+ "[device address=0x%016llx] [map size=%llu bytes] "
+ "[unmap size=%llu bytes]\n",
+ ref->dev_addr, entry->size, ref->size);
+ errors = true;
+ }
+
+ if (ref->type != entry->type) {
+ dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver frees "
+ "DMA memory different that it was allocated "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped as %s] [unmapped as %s]\n",
+ ref->dev_addr, ref->size,
+ type2name[entry->type], type2name[ref->type]);
+ errors = true;
+ } else if ((entry->type == DMA_DEBUG_COHERENT) &&
+ (ref->cpu_addr != entry->cpu_addr)) {
+ dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver frees "
+ "DMA memory with different CPU address "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[cpu alloc address=%p] [cpu free address=%p]",
+ ref->dev_addr, ref->size,
+ entry->cpu_addr, ref->cpu_addr);
+ errors = true;
+
+ }
+
+ /*
+ * This may not be a bug in reality - but most implementations of the
+ * DMA API don't handle this properly, so check for it here
+ */
+ if (ref->direction != entry->direction) {
+ dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver frees "
+ "DMA memory with different direction "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped with %s] [unmapped with %s]\n",
+ ref->dev_addr, ref->size,
+ dir2name[entry->direction],
+ dir2name[ref->direction]);
+ errors = true;
+ }
+
+ if (errors)
+ dump_stack();
+
+ return true;
+}
+
+static void check_sync(struct device *dev, dma_addr_t addr,
+ u64 size, u64 offset, int direction, bool to_cpu)
+{
+ bool error = false;
+ unsigned long flags;
+ struct dma_debug_entry ref = {
+ .dev = dev,
+ .dev_addr = addr,
+ .size = size,
+ .direction = direction,
+ };
+ struct dma_debug_entry *entry;
+
+ spin_lock_irqsave(&dma_lock, flags);
+
+ entry = find_dma_entry(&ref);
+
+ if (!entry) {
+ dev_printk(KERN_ERR, dev, "PCI-DMA: device driver tries "
+ "to sync DMA memory it has not allocated "
+ "[device address=0x%016llx] [size=%llu bytes]\n",
+ addr, size);
+ error = true;
+ goto out;
+ }
+
+ if ((offset + size) > entry->size) {
+ dev_printk(KERN_ERR, dev, "PCI-DMA: device driver syncs"
+ " DMA memory outside allocated range "
+ "[device address=0x%016llx] "
+ "[allocation size=%llu bytes] [sync offset=%llu] "
+ "[sync size=%llu]\n", entry->dev_addr, entry->size,
+ offset, size);
+ error = true;
+ }
+
+ if (direction != entry->direction) {
+ dev_printk(KERN_ERR, dev, "PCI-DMA: device driver syncs "
+ "DMA memory with different direction "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped with %s] [synced with %s]\n",
+ addr, entry->size,
+ dir2name[entry->direction],
+ dir2name[direction]);
+ error = true;
+ }
+
+out:
+ spin_unlock_irqrestore(&dma_lock, flags);
+
+ if (error)
+ dump_stack();
+}
+
+
void dma_debug_init(void)
{
int i;
--
1.5.6.4
Impact: creates necessary data structures for DMA-API debugging
Signed-off-by: Joerg Roedel <[email protected]>
---
arch/x86/include/asm/dma-mapping.h | 1 +
arch/x86/include/asm/dma_debug.h | 14 +++++
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/pci-dma-debug.c | 111 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/pci-dma.c | 2 +
5 files changed, 130 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/kernel/pci-dma-debug.c
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 7f225a4..83d7b7d 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -9,6 +9,7 @@
#include <linux/scatterlist.h>
#include <asm/io.h>
#include <asm/swiotlb.h>
+#include <asm/dma_debug.h>
#include <asm-generic/dma-coherent.h>
extern dma_addr_t bad_dma_address;
diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
index d79f024..f2c3d53 100644
--- a/arch/x86/include/asm/dma_debug.h
+++ b/arch/x86/include/asm/dma_debug.h
@@ -38,4 +38,18 @@ struct dma_debug_entry {
int direction;
};
+#ifdef CONFIG_DMA_API_DEBUG
+
+extern
+void dma_debug_init(void);
+
+#else /* CONFIG_DMA_API_DEBUG */
+
+static inline
+void dma_debug_init(void)
+{
+}
+
+#endif /* CONFIG_DMA_API_DEBUG */
+
#endif /* __ASM_X86_DMA_DEBUG */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index e489ff9..6271cd2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -105,6 +105,8 @@ microcode-$(CONFIG_MICROCODE_INTEL) += microcode_intel.o
microcode-$(CONFIG_MICROCODE_AMD) += microcode_amd.o
obj-$(CONFIG_MICROCODE) += microcode.o
+obj-$(CONFIG_DMA_API_DEBUG) += pci-dma-debug.o
+
###
# 64 bit specific files
ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
new file mode 100644
index 0000000..c2d3408
--- /dev/null
+++ b/arch/x86/kernel/pci-dma-debug.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright (C) 2008 Advanced Micro Devices, Inc.
+ *
+ * Author: Joerg Roedel <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/types.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/module.h>
+#include <linux/hardirq.h>
+#include <linux/dma-mapping.h>
+#include <asm/bug.h>
+#include <asm/dma-mapping.h>
+#include <asm/dma_debug.h>
+
+#define HASH_SIZE 256
+#define HASH_FN_SHIFT 20
+#define HASH_FN_MASK 0xffULL
+
+/* Hash list to save the allocated dma addresses */
+static struct list_head dma_entry_hash[HASH_SIZE];
+
+/* A slab cache to allocate dma_map_entries fast */
+static struct kmem_cache *dma_entry_cache;
+
+/* lock to protect the data structures */
+static DEFINE_SPINLOCK(dma_lock);
+
+static int hash_fn(struct dma_debug_entry *entry)
+{
+ /*
+ * Hash function is based on the dma address.
+ * We use bits 20-27 here as the index into the hash
+ */
+ BUG_ON(entry->dev_addr == bad_dma_address);
+
+ return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
+}
+
+static struct dma_debug_entry *dma_entry_alloc(void)
+{
+ gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
+
+ if (in_atomic())
+ gfp |= GFP_ATOMIC;
+
+ return kmem_cache_alloc(dma_entry_cache, gfp);
+}
+
+static void dma_entry_free(struct dma_debug_entry *entry)
+{
+ kmem_cache_free(dma_entry_cache, entry);
+}
+
+static struct dma_debug_entry *
+find_dma_entry(struct dma_debug_entry *ref)
+{
+ int idx = hash_fn(ref);
+ struct dma_debug_entry *entry;
+
+ list_for_each_entry(entry, &dma_entry_hash[idx], list) {
+ if ((entry->dev_addr == ref->dev_addr) &&
+ (entry->dev == ref->dev))
+ return entry;
+ }
+
+ return NULL;
+}
+
+static void add_dma_entry(struct dma_debug_entry *entry)
+{
+ int idx = hash_fn(entry);
+
+ list_add_tail(&entry->list, &dma_entry_hash[idx]);
+}
+
+static void remove_dma_entry(struct dma_debug_entry *entry)
+{
+ list_del(&entry->list);
+}
+
+void dma_debug_init(void)
+{
+ int i;
+
+ for (i = 0; i < HASH_SIZE; ++i)
+ INIT_LIST_HEAD(&dma_entry_hash[i]);
+
+ dma_entry_cache = kmem_cache_create("dma_debug_entry_cache",
+ sizeof(struct dma_debug_entry),
+ 0, SLAB_PANIC, NULL);
+
+ printk(KERN_INFO "PCI-DMA: DMA API debugging enabled by kernel config\n");
+}
+
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 1926248..94096b8 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -275,6 +275,8 @@ EXPORT_SYMBOL(dma_supported);
static int __init pci_iommu_init(void)
{
+ dma_debug_init();
+
calgary_iommu_init();
intel_iommu_init();
--
1.5.6.4
* Joerg Roedel <[email protected]> wrote:
> Hi,
>
> this patchset introduces code to debug drivers usage of the DMA-API.
> Tests with hardware IOMMUs have shown several bugs in drivers
> regarding the usage of that API. Problems were found especially in
> network card drivers.
>
> These bugs often don't show up or have any negative impact if there
> is no hardware IOMMU in use in the system. But with an hardware
> IOMMU these bugs turn the hardware unusable or, in the worst case,
> cause data corruption on devices which are managed by other (good)
> drivers.
>
> With the code these patches introduce driver developers can find
> several bugs of misusing the DMA-API in their drivers. But be aware,
> it can not find all possible bugs. If it finds a problem it prints
> out messages like
>
> tg3 0000:08:04.0: PCI-DMA: device driver tries to free DMA memory it
> has not allocated [device address=0x000000042f0f3ae7] [size=48
> bytes]
>
> Pid: 6285, comm: bash Not tainted 2.6.28-rc5-00176-g6ae6379-dirty #6
> Call Trace:
> <IRQ> [<ffffffff80221276>] check_unmap+0x52/0x1ce
> [<ffffffff80221af0>] debug_unmap_single+0x61/0xa4
> [<ffffffff8053d396>] skb_dma_unmap+0xf2/0x10c
> [<ffffffff8040a986>] tg3_poll+0xe8/0x822
> [<ffffffff803abe2f>] mix_pool_bytes_extract+0x5c/0x155
> [<ffffffff80540e42>] net_rx_action+0x9d/0x170
very nice! I like this approach - and have a few comments on some
details - will post them as a reply to the patches.
Ingo
* Joerg Roedel <[email protected]> wrote:
> +config DMA_API_DEBUG
> + default n
'default n' can be omitted in general from interactive Kconfig
entries.
> + bool "Enable debugging of DMA-API usage"
> + depends on X86
perhaps add a proper dependency on iommu or swiotlb presence as well -
in case we decide to make it all disable-able again and don't have the
callbacks present?
Ingo
On Fri, Nov 21, 2008 at 05:37:17PM +0100, Ingo Molnar wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > Hi,
> >
> > this patchset introduces code to debug drivers usage of the DMA-API.
> > Tests with hardware IOMMUs have shown several bugs in drivers
> > regarding the usage of that API. Problems were found especially in
> > network card drivers.
> >
> > These bugs often don't show up or have any negative impact if there
> > is no hardware IOMMU in use in the system. But with an hardware
> > IOMMU these bugs turn the hardware unusable or, in the worst case,
> > cause data corruption on devices which are managed by other (good)
> > drivers.
> >
> > With the code these patches introduce driver developers can find
> > several bugs of misusing the DMA-API in their drivers. But be aware,
> > it can not find all possible bugs. If it finds a problem it prints
> > out messages like
> >
> > tg3 0000:08:04.0: PCI-DMA: device driver tries to free DMA memory it
> > has not allocated [device address=0x000000042f0f3ae7] [size=48
> > bytes]
> >
> > Pid: 6285, comm: bash Not tainted 2.6.28-rc5-00176-g6ae6379-dirty #6
> > Call Trace:
> > <IRQ> [<ffffffff80221276>] check_unmap+0x52/0x1ce
> > [<ffffffff80221af0>] debug_unmap_single+0x61/0xa4
> > [<ffffffff8053d396>] skb_dma_unmap+0xf2/0x10c
> > [<ffffffff8040a986>] tg3_poll+0xe8/0x822
> > [<ffffffff803abe2f>] mix_pool_bytes_extract+0x5c/0x155
> > [<ffffffff80540e42>] net_rx_action+0x9d/0x170
>
> very nice! I like this approach - and have a few comments on some
> details - will post them as a reply to the patches.
Cool. Thank you.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Joerg Roedel <[email protected]> wrote:
> +#ifndef __ASM_X86_DMA_DEBUG
> +#define __ASM_X86_DMA_DEBUG
> +
> +/* Allocation flags */
> +#define DMA_DEBUG_SINGLE 0
> +#define DMA_DEBUG_SG 1
> +#define DMA_DEBUG_COHERENT 2
please use enum for such internal flags, not define.
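For illustration, the enum variant could look roughly like this (a sketch
only, keeping the names from the patch):

enum dma_debug_type {
	DMA_DEBUG_SINGLE,
	DMA_DEBUG_SG,
	DMA_DEBUG_COHERENT,
};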
> +
> +struct device;
> +struct list_head;
> +
> +struct dma_debug_entry {
> + struct list_head list;
> + struct device *dev;
> + int type;
> + void *cpu_addr;
> + u64 dev_addr;
> + u64 size;
> + int direction;
> +};
please align new x86/include structures vertically like this:
> +struct dma_debug_entry {
> +	struct list_head	list;
> +	struct device		*dev;
> +	int			type;
> +	void			*cpu_addr;
> +	u64			dev_addr;
> +	u64			size;
> +	int			direction;
> +};
[ for the arts major students reading lkml ;-) ]
Ingo
On Fri, Nov 21, 2008 at 05:40:14PM +0100, Ingo Molnar wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > +config DMA_API_DEBUG
> > + default n
>
> 'default n' can be omitted in general from interactive Kconfig
> entries.
Ok, then I remove it.
>
> > + bool "Enable debugging of DMA-API usage"
> > + depends on X86
>
> perhaps add a proper dependency on iommu or swiotlb presence as well -
> in case we decide to make it all disable-able again and dont have the
> callbacks present?
No problem. But it's hard to believe that we'll make the DMA-API
disable-able some day ;)
The best dependency in this case would be SWIOTLB, I think.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
On Fri, Nov 21, 2008 at 05:42:54PM +0100, Ingo Molnar wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > +#ifndef __ASM_X86_DMA_DEBUG
> > +#define __ASM_X86_DMA_DEBUG
> > +
> > +/* Allocation flags */
> > +#define DMA_DEBUG_SINGLE 0
> > +#define DMA_DEBUG_SG 1
> > +#define DMA_DEBUG_COHERENT 2
>
> please use enum for such internal flags, not define.
>
> > +
> > +struct device;
> > +struct list_head;
> > +
> > +struct dma_debug_entry {
> > + struct list_head list;
> > + struct device *dev;
> > + int type;
> > + void *cpu_addr;
> > + u64 dev_addr;
> > + u64 size;
> > + int direction;
> > +};
>
> please align new x86/include structures vertically like this:
>
> > +struct dma_debug_entry {
> > +	struct list_head	list;
> > +	struct device		*dev;
> > +	int			type;
> > +	void			*cpu_addr;
> > +	u64			dev_addr;
> > +	u64			size;
> > +	int			direction;
> > +};
>
> [ for the arts major students reading lkml ;-) ]
Ok, I will update the patch :)
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
btw, if somebody wants to try out this code, it can also be pulled from
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu.git dma-api-debug
Have fun,
Joerg
On Fri, 2008-11-21 at 17:26 +0100, Joerg Roedel wrote:
> this patchset introduces code to debug drivers usage of the DMA-API.
> Tests with hardware IOMMUs have shown several bugs in drivers
> regarding the usage of that API.
> Problems were found especially in network card drivers.
This is really useful -- but surely it shouldn't be x86-specific?
All the code except the hooks in the architecture's dma_map_single() et
al functions could be generic, couldn't it?
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
* Joerg Roedel <[email protected]> wrote:
> +extern
> +void dma_debug_init(void);
this can be on a single line.
> +
> +#else /* CONFIG_DMA_API_DEBUG */
> +
> +static inline
> +void dma_debug_init(void)
this too. (when something fits on a single line, we prefer it so)
> +#include <linux/types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/module.h>
> +#include <linux/hardirq.h>
> +#include <linux/dma-mapping.h>
> +#include <asm/bug.h>
> +#include <asm/dma-mapping.h>
> +#include <asm/dma_debug.h>
to reduce the chances of commit conflicts in the future, we
generally sort include lines in x86 files the following way:
> +#include <linux/scatterlist.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/spinlock.h>
> +#include <linux/hardirq.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
>
> +#include <asm/bug.h>
> +#include <asm/dma-mapping.h>
> +#include <asm/dma_debug.h>
[ note the extra newline too between the linux/ and asm/ portions. ]
> +#define HASH_SIZE 256
> +#define HASH_FN_SHIFT 20
> +#define HASH_FN_MASK 0xffULL
please align the values vertically, i.e.:
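#define HASH_SIZE	256
#define HASH_FN_SHIFT	20
#define HASH_FN_MASK	0xffULL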
> +/* Hash list to save the allocated dma addresses */
> +static struct list_head dma_entry_hash[HASH_SIZE];
Should be cacheline-aligned I guess - if this feature is enabled this
is a hot area.
> +/* A slab cache to allocate dma_map_entries fast */
> +static struct kmem_cache *dma_entry_cache;
__read_mostly - to isolate it from the above hot area.
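i.e. something like:

static struct kmem_cache *dma_entry_cache __read_mostly;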
> +/* lock to protect the data structures */
> +static DEFINE_SPINLOCK(dma_lock);
should have a separate cacheline too, I guess.
> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> + /*
> + * Hash function is based on the dma address.
> + * We use bits 20-27 here as the index into the hash
> + */
> + BUG_ON(entry->dev_addr == bad_dma_address);
please use WARN_ON_ONCE() instead of crashing the box in possibly hard
to debug spots.
> + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> +}
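For illustration, the WARN_ON_ONCE() variant could look roughly like this
(a sketch only; falling back to bucket 0 for a bad address is an arbitrary
choice, not something from the patch):

static int hash_fn(struct dma_debug_entry *entry)
{
	/* warn once instead of crashing the box, then fall back to bucket 0 */
	if (WARN_ON_ONCE(entry->dev_addr == bad_dma_address))
		return 0;

	return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
}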
> +
> +static struct dma_debug_entry *dma_entry_alloc(void)
> +{
> + gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> +
> + if (in_atomic())
> + gfp |= GFP_ATOMIC;
> +
> + return kmem_cache_alloc(dma_entry_cache, gfp);
hm. that slab allocation in the middle of DMA mapping ops makes me a
bit nervous. the DMA mapping ops are generally rather atomic.
and the in_atomic() check there is a bug on !PREEMPT kernels, so it won't
fly.
We dont have a gfp flag passed in as all the DMA mapping APIs really
expect all allocations having been done in advance already.
Any chance to attach the debug entry to the iotlb entries themselves -
either during build time (for swiotlb) or during iommu init time (for
the hw iommu's) ?
Ingo
On Fri, Nov 21, 2008 at 04:54:52PM +0000, David Woodhouse wrote:
> On Fri, 2008-11-21 at 17:26 +0100, Joerg Roedel wrote:
> > this patchset introduces code to debug drivers usage of the DMA-API.
> > Tests with hardware IOMMUs have shown several bugs in drivers
> > regarding the usage of that API.
> > Problems were found especially in network card drivers.
>
> This is really useful -- but surely it shouldn't be x86-specific?
>
> All the code except the hooks in the architecture's dma_map_single() et
> al functions could be generic, couldn't it?
Yes, in principle we could move most of it to generic code. There is
nothing architecture specific in it.
Anybody who prefers this to be arch/x86 before moving it to lib/?
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
please use this nice and clean structure init style:
> + entry->type = DMA_DEBUG_COHERENT;
> + entry->dev = dev;
> + entry->cpu_addr = virt;
> + entry->size = size;
> + entry->dev_addr = dma_addr;
> + entry->direction = DMA_BIDIRECTIONAL;
here too:
> + struct dma_debug_entry ref = {
> + .type = DMA_DEBUG_COHERENT,
> + .dev = dev,
> + .cpu_addr = virt,
> + .dev_addr = addr,
> + .size = size,
> + .direction = DMA_BIDIRECTIONAL,
> + };
Ingo
please use this nice structure init style:
> + entry->type = DMA_DEBUG_SG;
> + entry->dev = dev;
> + entry->cpu_addr = sg_virt(s);
> + entry->size = s->length;
> + entry->dev_addr = s->dma_address;
> + entry->direction = direction;
here too:
> + struct dma_debug_entry ref = {
> + .type = DMA_DEBUG_SG,
> + .dev = dev,
> + .cpu_addr = sg_virt(s),
> + .dev_addr = s->dma_address,
> + .size = s->length,
> + .direction = dir,
Ingo
* Joerg Roedel <[email protected]> wrote:
> + for_each_sg(sg, s, nelems, i)
> + check_sync(dev, s->dma_address, s->dma_length, 0,
> + direction, true);
curly braces needed for the multi-line loop body.
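i.e. (same code, just with braces):

	for_each_sg(sg, s, nelems, i) {
		check_sync(dev, s->dma_address, s->dma_length, 0,
			   direction, true);
	}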
> + for_each_sg(sg, s, nelems, i)
> + check_sync(dev, s->dma_address, s->dma_length, 0,
> + direction, false);
ditto.
Ingo
* Joerg Roedel <[email protected]> wrote:
> On Fri, Nov 21, 2008 at 04:54:52PM +0000, David Woodhouse wrote:
> > On Fri, 2008-11-21 at 17:26 +0100, Joerg Roedel wrote:
> > > this patchset introduces code to debug drivers usage of the DMA-API.
> > > Tests with hardware IOMMUs have shown several bugs in drivers
> > > regarding the usage of that API.
> > > Problems were found especially in network card drivers.
> >
> > This is really useful -- but surely it shouldn't be x86-specific?
> >
> > All the code except the hooks in the architecture's dma_map_single() et
> > al functions could be generic, couldn't it?
>
> Yes, in principle we could move most of it to generic code. There is
> nothing architecture specific in it. Anybody who prefers this to be
> arch/x86 before moving it to lib/?
yeah, we want to make it generic once it works.
but my comments about the allocation need to be addressed (see my
comments on [03/10]), and solving that will likely impact the
structure of the approach in a way that will generalize it anyway, as
a side-effect.
Ingo
On Fri, 2008-11-21 at 17:57 +0100, Joerg Roedel wrote:
> Yes, in principle we could move most of it to generic code. There is
> nothing architecture specific in it.
> Anybody who prefers this to be arch/x86 before moving it to lib/?
I think it's better to start off with a generic version, and test it on
more than one architecture before feeding it upstream. That way, you
know you aren't going to have to go back to the drawing board with it.
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
* Joerg Roedel <[email protected]> wrote:
> +static bool check_unmap(struct dma_debug_entry *ref,
> + struct dma_debug_entry *entry)
> +{
> + bool errors = false;
> +
> + if (!entry) {
> + dev_printk(KERN_ERR, ref->dev, "PCI-DMA: device driver tries "
> + "to free DMA memory it has not allocated "
> + "[device address=0x%016llx] [size=%llu bytes]\n",
> + ref->dev_addr, ref->size);
> + dump_stack();
> +
> + return false;
okay, the warnings need to be one-shot. It will be enabled by distros
in debug kernels to test a wide range of drivers, and the output will
be collected by kerneloops.org. Distros will disable the debug feature
fast if it spams the logs.
So please make these WARN_ONCE()-type warnings. Don't use dump_stack()
directly but use the WARN() signature so that it's picked up by
automated bug collection mechanisms.
This holds true for all the other warnings as well. Plus probably the
whole mechanism should self-deactivate like lockdep does, when it
notices the first error. That guarantees that even if it has a false
positive or some other bug it won't break more stuff.
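For illustration, a one-shot, self-deactivating report could be structured
roughly like this (a sketch only; global_disable and err_printk are made-up
names, and the map/unmap checks would have to bail out early once
global_disable is set):

static bool global_disable;

#define err_printk(dev, fmt, arg...)					\
	do {								\
		/* report the error, then turn all further checking off */ \
		WARN(1, "%s %s: " fmt,					\
		     dev_driver_string(dev), dev_name(dev), ## arg);	\
		global_disable = true;					\
	} while (0)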
> + struct dma_debug_entry ref = {
> + .dev = dev,
> + .dev_addr = addr,
> + .size = size,
> + .direction = direction,
> + };
(align the field init vertically please.)
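i.e. roughly:

	struct dma_debug_entry ref = {
		.dev		= dev,
		.dev_addr	= addr,
		.size		= size,
		.direction	= direction,
	};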
Ingo
On Fri, Nov 21, 2008 at 05:56:28PM +0100, Ingo Molnar wrote:
> > + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> > +}
> > +
> > +static struct dma_debug_entry *dma_entry_alloc(void)
> > +{
> > + gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> > +
> > + if (in_atomic())
> > + gfp |= GFP_ATOMIC;
> > +
> > + return kmem_cache_alloc(dma_entry_cache, gfp);
>
> hm. that slab allocation in the middle of DMA mapping ops makes me a
> bit nervous. the DMA mapping ops are generally rather atomic.
>
> and the in_atomic() check there is a bug on !PREEMPT kernels, so it won't
> fly.
I am not sure I understand this correctly. You say the check for
in_atomic() will break on !PREEMPT kernels?
> We dont have a gfp flag passed in as all the DMA mapping APIs really
> expect all allocations having been done in advance already.
Hmm, I can change the code to pre-allocate a certain (configurable?)
number of these entries and disable the checking if we run out of it.
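Roughly along these lines, for illustration (a sketch only, building on the
dma_lock and struct dma_debug_entry from the patches; free_entries,
num_free_entries and global_disable are made-up names):

static LIST_HEAD(free_entries);	/* filled with entries at init time */
static int num_free_entries;
static bool global_disable;

static struct dma_debug_entry *dma_entry_alloc(void)
{
	struct dma_debug_entry *entry = NULL;
	unsigned long flags;

	spin_lock_irqsave(&dma_lock, flags);

	if (num_free_entries > 0) {
		entry = list_entry(free_entries.next,
				   struct dma_debug_entry, list);
		list_del(&entry->list);
		num_free_entries -= 1;
	} else {
		/* out of pre-allocated entries - turn the checking off */
		global_disable = true;
	}

	spin_unlock_irqrestore(&dma_lock, flags);

	return entry;
}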
> Any chance to attach the debug entry to the iotlb entries themselves -
> either during build time (for swiotlb) or during iommu init time (for
> the hw iommu's) ?
Hm, I want to avoid that. This approach has the advantage that it
works independently of any dma_ops implementation. So it can be used to
find out if a DMA bug originates from the device driver or (in my case)
from the IOMMU driver.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
* Joerg Roedel <[email protected]> wrote:
> + ret = ops->map_sg(hwdev, sg, nents, direction);
stray double space in '= ops'.
another very small detail:
> + unsigned long flags;
> + struct dma_debug_entry *entry;
> + struct scatterlist *s;
> + int i;
please order them like this, similar to the include line:
> + struct dma_debug_entry *entry;
> + struct scatterlist *s;
> + unsigned long flags;
> + int i;
that makes the whole variable section non-intrusive. (and also acts as
a patch-conflict-reducer mechanism in the future - when variable
definition lines get particularly long)
(not a hard rule, exceptions are possible - grouping same-type fields
together, etc.)
Ingo
* David Woodhouse <[email protected]> wrote:
> On Fri, 2008-11-21 at 17:57 +0100, Joerg Roedel wrote:
> > Yes, in principle we could move most of it to generic code. There is
> > nothing architecture specific in it.
> > Anybody who prefers this to be arch/x86 before moving it to lib/?
>
> I think it's better to start off with a generic version, and test it
> on more than one architecture before feeding it upstream. That way,
> you know you aren't going to have to go back to the drawing board
> with it.
Joerg, I'd suggest making the generic conceptual bits generic, and
add an HAVE_DMA_DEBUG_SUPPORT Kconfig bool flag to every architecture
that enables it. It needs active changes to every architecture that
supports this facility.
Ingo
* Joerg Roedel <[email protected]> wrote:
> On Fri, Nov 21, 2008 at 05:56:28PM +0100, Ingo Molnar wrote:
> > > + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> > > +}
> > > +
> > > +static struct dma_debug_entry *dma_entry_alloc(void)
> > > +{
> > > + gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> > > +
> > > + if (in_atomic())
> > > + gfp |= GFP_ATOMIC;
> > > +
> > > + return kmem_cache_alloc(dma_entry_cache, gfp);
> >
> > hm. that slab allocation in the middle of DMA mapping ops makes me a
> > bit nervous. the DMA mapping ops are generally rather atomic.
> >
> > and in_atomic() check there is a bug on !PREEMPT kernels, so it wont
> > fly.
>
> I am not sure I understand this correctly. You say the check for
> in_atomic() will break on !PREEMPT kernels?
Correct. There is no check that can tell whether we are allowed to
schedule or not, i.e. on !PREEMPT your patches will crash and burn.
> > We dont have a gfp flag passed in as all the DMA mapping APIs
> > really expect all allocations having been done in advance already.
>
> Hmm, I can change the code to pre-allocate a certain (configurable?)
> > number of these entries and disable the checking if we run out of it.
yeah, that's a good approach too - that's what lockdep does. Perhaps
make the max entries nr a Kconfig entity - so it can be tuned up/down
depending on what type of iommu scheme is enabled.
Ingo
On Fri, Nov 21, 2008 at 06:18:02PM +0100, Ingo Molnar wrote:
>
> * David Woodhouse <[email protected]> wrote:
>
> > On Fri, 2008-11-21 at 17:57 +0100, Joerg Roedel wrote:
> > > Yes, in principle we could move most of it to generic code. There is
> > > nothing architecture specific in it.
> > > Anybody who prefers this to be arch/x86 before moving it to lib/?
> >
> > I think it's better to start off with a generic version, and test it
> > on more than one architecture before feeding it upstream. That way,
> > you know you aren't going to have to go back to the drawing board
> > with it.
>
> Joerg, I'd suggest making the generic conceptual bits generic, and
> add an HAVE_DMA_DEBUG_SUPPORT Kconfig bool flag to every architecture
> that enables it. It needs active changes to every architecture that
> supports this facility.
Ok, I will move the generic bits to lib/ and include/linux and let
architectures decide if they want to use it.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
On Fri, 2008-11-21 at 18:18 +0100, Ingo Molnar wrote:
> Joerg, i'd suggest to make the generic conceptual bits generic, and
> add an HAVE_DMA_DEBUG_SUPPORT Kconfig bool flag to every architecture
> that enables it. It needs active changes to every architecture that
> supports this facility.
Yes, that makes sense.
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> Ok, I will move the generic bits to lib/ and include/linux and let
> architectures decide if they want to use it.
Once you've done that, I'll try to hook it up on PowerPC to make sure it
works there.
--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation
* Ingo Molnar <[email protected]> wrote:
> > > We dont have a gfp flag passed in as all the DMA mapping APIs
> > > really expect all allocations having been done in advance
> > > already.
> >
> > Hmm, I can change the code to pre-allocate a certain
> > (configurable?) number of these entries and disable the checking if
> > we run out of them.
>
> yeah, that's a good approach too - that's what lockdep does. Perhaps
> make the max entries nr a Kconfig entity - so it can be tuned
> up/down depending on what type of iommu scheme is enabled.
there's another reason why we want to do that: this scheme covers all
DMA mappings - not just the ones which need to go via an iommu and for
which there's an IOTLB entry present. So the pool should probably be
sized according to RAM size, to be on the safe side.
Ingo
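Sizing the pool according to RAM could look like the following sketch; the
one-entry-per-4MB ratio and the function name are purely illustrative, and
totalram_pages is the kernel's global page count variable:

#include <linux/mm.h>
#include <linux/kernel.h>

/* scale the number of preallocated entries with memory size, since every
 * mapping in the system needs an entry, not only IOMMU-backed ones */
static unsigned long dma_debug_nr_entries(void)
{
        /* one entry per 4 MB of RAM, clamped to a sane minimum */
        return max(totalram_pages >> 10, 1024UL);
}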
On Fri, Nov 21, 2008 at 05:24:29PM +0000, David Woodhouse wrote:
> On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> > Ok, I will move the generic bits to lib/ and include/linux and let
> > architectures decide if they want to use it.
>
> Once you've done that, I'll try to hook it up on PowerPC to make sure it
> works there.
Ok, cool. Thanks
* Joerg Roedel <[email protected]> wrote:
> +static struct list_head dma_entry_hash[HASH_SIZE];
> +
> +/* A slab cache to allocate dma_map_entries fast */
> +static struct kmem_cache *dma_entry_cache;
> +
> +/* lock to protect the data structures */
> +static DEFINE_SPINLOCK(dma_lock);
some more generic comments about the data structure: its main purpose
is to provide a mapping based on (dev,addr). There's little if any
cross-entry interaction - same-address+same-dev DMA is checked.
1)
the hash:
+ return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
should mix in entry->dev as well - that way we get not just per
address but per device hash space separation as well.
2)
HASH_FN_SHIFT is 1MB chunks right now - that's probably fine in
practice albeit perhaps a bit too small. There's seldom any coherency
between the physical addresses of DMA - we rarely have any real
(performance-relevant) physical co-location of DMA addresses beyond 4K
granularity. So using 1MB chunking here will discard a good deal of
random low bits we should be hashing on.
3)
And the most scalable locking would be per hash bucket locking - no
global lock is needed. The bucket hash heads should probably be
cacheline sized - so we'd get one lock per bucket.
This way if there's irq+DMA traffic on one CPU from one device into
one range of memory, and irq+DMA traffic on another CPU to another
device, they will map to two different hash buckets.
4)
Plus it might be an option to make hash lookup lockless as well:
depending on the DMA flux we can get a lot of lookups, and taking the
bucket lock can be avoided, if you use RCU-safe list ops and drive the
refilling of the free entries pool from RCU.
Ingo
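For illustration, points 1) and 3) above could look roughly like the sketch
below. HASH_SIZE, the shift value and struct hash_bucket are arbitrary choices
made for the sketch; hash_ptr() comes from linux/hash.h:

#include <linux/hash.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/cache.h>

#define HASH_SIZE       1024
#define HASH_FN_SHIFT   13              /* finer-grained than 1MB chunks */
#define HASH_FN_MASK    (HASH_SIZE - 1)

/* one lock per bucket instead of a single global dma_lock */
struct hash_bucket {
        struct list_head list;
        spinlock_t lock;
} ____cacheline_aligned_in_smp;

static struct hash_bucket dma_entry_hash[HASH_SIZE];

static int hash_fn(struct dma_debug_entry *entry)
{
        /* mix the device pointer into the index so different devices
         * land in different buckets even for equal dma addresses */
        return (int)(((entry->dev_addr >> HASH_FN_SHIFT) ^
                      hash_ptr(entry->dev, 10)) & HASH_FN_MASK);
}

Lookup and insert would then take only the per-bucket lock, and the RCU
variant from point 4) could later replace the lock on the read side.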
* Joerg Roedel <[email protected]> wrote:
> On Fri, Nov 21, 2008 at 05:24:29PM +0000, David Woodhouse wrote:
> > On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> > > Ok, I will move the generic bits to lib/ and include/linux and let
> > > architectures decide if they want to use it.
> >
> > Once you've done that, I'll try to hook it up on PowerPC to make
> > sure it works there.
>
> Ok, cool. Thanks
i'll give it a whirl on x86 once the allocation bug is resolved. x86
testing will be the most interesting in practice, because most drivers
there are developed with no dynamic DMA in mind. (many of the x86
drivers were developed before IOMMUs were supported in Linux)
Ingo
From: Joerg Roedel <[email protected]>
Date: Fri, 21 Nov 2008 17:26:01 +0100
> Impact: adds a new Kconfig option
>
> Signed-off-by: Joerg Roedel <[email protected]>
This definitely should be generic code, rather than something
tucked away in x86 platform code.
On Fri, 21 Nov 2008 17:26:00 +0100
Joerg Roedel <[email protected]> wrote:
> Hi,
>
> this patchset introduces code to debug drivers usage of the DMA-API.
> Tests with hardware IOMMUs have shown several bugs in drivers regarding
> the usage of that API.
> Problems were found especially in network card drivers.
Especially? Have you seen similar dma bugs with storage (scsi, etc.)
drivers?
> This way driver developers get an idea where the problem is in their
> code.
>
> Please review and send any objections or, if there are none, consider
> for inclusion ;)
This should be architecture independent code. Except for that, it
looks very useful.
Thanks,
On Fri, 21 Nov 2008 17:26:05 +0100
Joerg Roedel <[email protected]> wrote:
> Impact: detect bugs in map/unmap_single usage
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/include/asm/dma-mapping.h | 9 +++++-
> arch/x86/include/asm/dma_debug.h | 20 +++++++++++++
> arch/x86/kernel/pci-dma-debug.c | 55 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 83 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index 83d7b7d..c9bead2 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -98,9 +98,14 @@ dma_map_single(struct device *hwdev, void *ptr, size_t size,
> int direction)
> {
> struct dma_mapping_ops *ops = get_dma_ops(hwdev);
> + dma_addr_t addr;
>
> BUG_ON(!valid_dma_direction(direction));
> - return ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
> + addr = ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
> +
> + debug_map_single(hwdev, ptr, size, direction, addr);
debug_map_single could fail due to OOM. Then debug_unmap_single in
dma_unmap_single gives a false warning because it can't find the
dma_debug_entry?
On Fri, 21 Nov 2008 17:26:07 +0100
Joerg Roedel <[email protected]> wrote:
> Impact: detect bugs in alloc/free_coherent usage
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/include/asm/dma-mapping.h | 10 +++++-
> arch/x86/include/asm/dma_debug.h | 20 +++++++++++++
> arch/x86/kernel/pci-dma-debug.c | 56 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index c7bdb75..2893adb 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -304,8 +304,12 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
> if (!ops->alloc_coherent)
> return NULL;
>
> - return ops->alloc_coherent(dev, size, dma_handle,
> - dma_alloc_coherent_gfp_flags(dev, gfp));
> + memory = ops->alloc_coherent(dev, size, dma_handle,
> + dma_alloc_coherent_gfp_flags(dev, gfp));
> +
> + debug_alloc_coherent(dev, size, *dma_handle, memory);
> +
> + return memory;
> }
>
> static inline void dma_free_coherent(struct device *dev, size_t size,
> @@ -320,6 +324,8 @@ static inline void dma_free_coherent(struct device *dev, size_t size,
>
> if (ops->free_coherent)
> ops->free_coherent(dev, size, vaddr, bus);
> +
> + debug_free_coherent(dev, size, vaddr, bus);
> }
>
> #endif
> diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
> index ff06d1c..7245e27 100644
> --- a/arch/x86/include/asm/dma_debug.h
> +++ b/arch/x86/include/asm/dma_debug.h
> @@ -59,6 +59,14 @@ extern
> void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
> int nelems, int dir);
>
> +extern
> +void debug_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t dma_addr, void *virt);
> +
> +extern
> +void debug_free_coherent(struct device *dev, size_t size,
> + void *virt, dma_addr_t addr);
> +
> #else /* CONFIG_DMA_API_DEBUG */
>
> static inline
> @@ -90,6 +98,18 @@ void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
> {
> }
>
> +static inline
> +void debug_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t dma_addr, void *virt)
> +{
> +}
> +
> +static inline
> +void debug_free_coherent(struct device *dev, size_t size,
> + void *virt, dma_addr_t addr)
> +{
> +}
> +
> #endif /* CONFIG_DMA_API_DEBUG */
>
> #endif /* __ASM_X86_DMA_DEBUG */
> diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
> index 55ef69a..db5ef9a 100644
> --- a/arch/x86/kernel/pci-dma-debug.c
> +++ b/arch/x86/kernel/pci-dma-debug.c
> @@ -352,3 +352,59 @@ void debug_unmap_sg(struct device *dev, struct scatterlist *sglist,
> }
> EXPORT_SYMBOL(debug_unmap_sg);
>
> +void debug_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t dma_addr, void *virt)
> +{
> + unsigned long flags;
> + struct dma_debug_entry *entry;
> +
> + if (dma_addr == bad_dma_address)
> + return;
> +
> + entry = dma_entry_alloc();
> + if (!entry)
> + return;
> +
> + entry->type = DMA_DEBUG_COHERENT;
> + entry->dev = dev;
> + entry->cpu_addr = virt;
> + entry->size = size;
> + entry->dev_addr = dma_addr;
> + entry->direction = DMA_BIDIRECTIONAL;
> +
> + spin_lock_irqsave(&dma_lock, flags);
> + add_dma_entry(entry);
> + spin_unlock_irqrestore(&dma_lock, flags);
> +}
> +EXPORT_SYMBOL(debug_alloc_coherent);
Can you clean up the duplication in debug_map_single, debug_map_sg,
and debug_alloc_coherent? The higher-level helper functions might
help.
> +void debug_free_coherent(struct device *dev, size_t size,
> + void *virt, dma_addr_t addr)
> +{
> + unsigned long flags;
> + struct dma_debug_entry ref = {
> + .type = DMA_DEBUG_COHERENT,
> + .dev = dev,
> + .cpu_addr = virt,
> + .dev_addr = addr,
> + .size = size,
> + .direction = DMA_BIDIRECTIONAL,
> + };
> + struct dma_debug_entry *entry;
> +
> + if (addr == bad_dma_address)
> + return;
> +
> + spin_lock_irqsave(&dma_lock, flags);
> +
> + entry = find_dma_entry(&ref);
> +
> + if (check_unmap(&ref, entry)) {
> + remove_dma_entry(entry);
> + dma_entry_free(entry);
> + }
> +
> + spin_unlock_irqrestore(&dma_lock, flags);
> +}
> +EXPORT_SYMBOL(debug_free_coherent);
Ditto.
On Fri, 21 Nov 2008 17:26:03 +0100
Joerg Roedel <[email protected]> wrote:
> Impact: creates necessary data structures for DMA-API debugging
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> arch/x86/include/asm/dma-mapping.h | 1 +
> arch/x86/include/asm/dma_debug.h | 14 +++++
> arch/x86/kernel/Makefile | 2 +
> arch/x86/kernel/pci-dma-debug.c | 111 ++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/pci-dma.c | 2 +
> 5 files changed, 130 insertions(+), 0 deletions(-)
> create mode 100644 arch/x86/kernel/pci-dma-debug.c
>
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index 7f225a4..83d7b7d 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -9,6 +9,7 @@
> #include <linux/scatterlist.h>
> #include <asm/io.h>
> #include <asm/swiotlb.h>
> +#include <asm/dma_debug.h>
> #include <asm-generic/dma-coherent.h>
>
> extern dma_addr_t bad_dma_address;
> diff --git a/arch/x86/include/asm/dma_debug.h b/arch/x86/include/asm/dma_debug.h
> index d79f024..f2c3d53 100644
> --- a/arch/x86/include/asm/dma_debug.h
> +++ b/arch/x86/include/asm/dma_debug.h
> @@ -38,4 +38,18 @@ struct dma_debug_entry {
> int direction;
> };
>
> +#ifdef CONFIG_DMA_API_DEBUG
> +
> +extern
> +void dma_debug_init(void);
> +
> +#else /* CONFIG_DMA_API_DEBUG */
> +
> +static inline
> +void dma_debug_init(void)
> +{
> +}
> +
> +#endif /* CONFIG_DMA_API_DEBUG */
> +
> #endif /* __ASM_X86_DMA_DEBUG */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index e489ff9..6271cd2 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -105,6 +105,8 @@ microcode-$(CONFIG_MICROCODE_INTEL) += microcode_intel.o
> microcode-$(CONFIG_MICROCODE_AMD) += microcode_amd.o
> obj-$(CONFIG_MICROCODE) += microcode.o
>
> +obj-$(CONFIG_DMA_API_DEBUG) += pci-dma-debug.o
> +
> ###
> # 64 bit specific files
> ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/pci-dma-debug.c b/arch/x86/kernel/pci-dma-debug.c
> new file mode 100644
> index 0000000..c2d3408
> --- /dev/null
> +++ b/arch/x86/kernel/pci-dma-debug.c
> @@ -0,0 +1,111 @@
> +/*
> + * Copyright (C) 2008 Advanced Micro Devices, Inc.
> + *
> + * Author: Joerg Roedel <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> + */
> +
> +#include <linux/types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/list.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/module.h>
> +#include <linux/hardirq.h>
> +#include <linux/dma-mapping.h>
> +#include <asm/bug.h>
> +#include <asm/dma-mapping.h>
> +#include <asm/dma_debug.h>
> +
> +#define HASH_SIZE 256
> +#define HASH_FN_SHIFT 20
> +#define HASH_FN_MASK 0xffULL
> +
> +/* Hash list to save the allocated dma addresses */
> +static struct list_head dma_entry_hash[HASH_SIZE];
> +
> +/* A slab cache to allocate dma_map_entries fast */
> +static struct kmem_cache *dma_entry_cache;
> +
> +/* lock to protect the data structures */
> +static DEFINE_SPINLOCK(dma_lock);
> +
> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> + /*
> + * Hash function is based on the dma address.
> + * We use bits 20-27 here as the index into the hash
> + */
> + BUG_ON(entry->dev_addr == bad_dma_address);
'bad_dma_address' is x86 specific. You already found it though.
On Fri, 21 Nov 2008 18:45:51 +0100
Ingo Molnar <[email protected]> wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > On Fri, Nov 21, 2008 at 05:24:29PM +0000, David Woodhouse wrote:
> > > On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> > > > Ok, I will move the generic bits to lib/ and include/linux and let
> > > > architectures decide if they want to use it.
> > >
> > > Once you've done that, I'll try to hook it up on PowerPC to make
> > > sure it works there.
> >
> > Ok, cool. Thanks
>
> i'll give it a whirl on x86 once the allocation bug is resolved. x86
> testing will be the most interesting in practice, because most drivers
> there are developed with no dynamic DMA in mind. (many of the x86
> drivers were developed before IOMMUs were supported in Linux)
Yeah, one of the problems due to this is that some drivers wrongly
assume that the dma mapping operations never fail (they do DMA with a
wrong address). With VT-d and AMD IOMMU, the dma mapping operations
fail just because of OOM.
Some time ago, I hooked the fault injection mechanism to the dma
mapping operations (I linked struct fault_attr to struct pci_dev so
you can make dma_map_single/sg fail with a particular pci device). It
might interest some people:
git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git dmafault
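The idea is roughly the following; this is only a sketch of the concept, not
the patch from the branch above. The attribute name and the wrapper are
invented, and the actual patch attaches the fault_attr to struct pci_dev
rather than using a single global one:

#include <linux/fault-inject.h>
#include <linux/dma-mapping.h>

/* one global attribute keeps the sketch short; it would normally be
 * configured through debugfs like the other fault-injection knobs */
static DECLARE_FAULT_ATTR(fail_dma_map);

static dma_addr_t map_single_with_faults(struct device *dev, void *ptr,
                                         size_t size, int direction)
{
        /* occasionally pretend the mapping failed, to exercise the
         * driver's error handling path */
        if (should_fail(&fail_dma_map, size))
                return bad_dma_address;         /* x86-only error value */

        return dma_map_single(dev, ptr, size, direction);
}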
On Sat, Nov 22, 2008 at 12:27:39PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 17:26:00 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > Hi,
> >
> > this patchset introduces code to debug drivers usage of the DMA-API.
> > Tests with hardware IOMMUs have shown several bugs in drivers regarding
> > the usage of that API.
> > Problems were found especially in network card drivers.
>
> Especially? Have you seen similar dma bugs with storage (scsi, etc.)
> drivers?
No. For me the code only triggered network card drivers. Namely the
ixgbe, tg3 and bnx2 driver.
>
> > This way driver developers get an idea where the problem is in their
> > code.
> >
> > Please review and send any objections or, if there are none, consider
> > for inclusion ;)
>
> This should be architecture independent code. Except for that, it
> looks very useful.
Thanks,
Joerg
On Sat, Nov 22, 2008 at 12:27:43PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 18:45:51 +0100
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Joerg Roedel <[email protected]> wrote:
> >
> > > On Fri, Nov 21, 2008 at 05:24:29PM +0000, David Woodhouse wrote:
> > > > On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> > > > > Ok, I will move the generic bits to lib/ and include/linux and let
> > > > > architectures decide if they want to use it.
> > > >
> > > > Once you've done that, I'll try to hook it up on PowerPC to make
> > > > sure it works there.
> > >
> > > Ok, cool. Thanks
> >
> > i'll give it a whirl on x86 once the allocation bug is resolved. x86
> > testing will be the most interesting in practice, because most drivers
> > there are developed with no dynamic DMA in mind. (many of the x86
> > drivers were developed before IOMMUs were supported in Linux)
>
> Yeah, one of the problems due to this is that some drivers wrongly
> assume that the dma mapping operations never fail (they do DMA with a
> wrong address). With VT-d and AMD IOMMU, the dma mapping operations
> fail just because of OOM.
At least AMD IOMMU code will not fail because of memory shortage. All
necessary data structures, including the pagetables, are preallocated.
The only place where it might fail is dma_alloc_coherent. But that is not
specific to that driver.
> Some time ago, I hooked the fault injection mechanism to the dma
> mapping operations (I linked struct fault_attr to struct pci_dev so
> you can make dma_map_single/sg fail with a particular pci device). It
> might interest some people:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git dmafault
This would also be helpful to find bugs in drivers together with this
code. Do you plan to submit it?
Joerg
On Sat, Nov 22, 2008 at 12:27:42PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 17:26:07 +0100
> Joerg Roedel <[email protected]> wrote:
> > +void debug_alloc_coherent(struct device *dev, size_t size,
> > + dma_addr_t dma_addr, void *virt)
> > +{
> > + unsigned long flags;
> > + struct dma_debug_entry *entry;
> > +
> > + if (dma_addr == bad_dma_address)
> > + return;
> > +
> > + entry = dma_entry_alloc();
> > + if (!entry)
> > + return;
> > +
> > + entry->type = DMA_DEBUG_COHERENT;
> > + entry->dev = dev;
> > + entry->cpu_addr = virt;
> > + entry->size = size;
> > + entry->dev_addr = dma_addr;
> > + entry->direction = DMA_BIDIRECTIONAL;
> > +
> > + spin_lock_irqsave(&dma_lock, flags);
> > + add_dma_entry(entry);
> > + spin_unlock_irqrestore(&dma_lock, flags);
> > +}
> > +EXPORT_SYMBOL(debug_alloc_coherent);
>
> Can you clean up the duplication in debug_map_single, debug_map_sg,
> and debug_alloc_coherent? The higher-level helper functions might
> help.
Hmm, let's see. For me it only makes sense if it doesn't result in helper
functions with tons of parameters. That would be worse than a little code
duplication. But let's see.
Joerg
On Sat, Nov 22, 2008 at 12:27:41PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 17:26:05 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > Impact: detect bugs in map/unmap_single usage
> >
> > Signed-off-by: Joerg Roedel <[email protected]>
> > ---
> > arch/x86/include/asm/dma-mapping.h | 9 +++++-
> > arch/x86/include/asm/dma_debug.h | 20 +++++++++++++
> > arch/x86/kernel/pci-dma-debug.c | 55 ++++++++++++++++++++++++++++++++++++
> > 3 files changed, 83 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> > index 83d7b7d..c9bead2 100644
> > --- a/arch/x86/include/asm/dma-mapping.h
> > +++ b/arch/x86/include/asm/dma-mapping.h
> > @@ -98,9 +98,14 @@ dma_map_single(struct device *hwdev, void *ptr, size_t size,
> > int direction)
> > {
> > struct dma_mapping_ops *ops = get_dma_ops(hwdev);
> > + dma_addr_t addr;
> >
> > BUG_ON(!valid_dma_direction(direction));
> > - return ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
> > + addr = ops->map_single(hwdev, virt_to_phys(ptr), size, direction);
> > +
> > + debug_map_single(hwdev, ptr, size, direction, addr);
>
> debug_map_single could fail due to OOM. Then debug_unmap_single in
> dma_unmap_single gives a false warning because it can't find the
> dma_debug_entry?
True. I will add a check to disable checking when a map operation has
failed.
Joerg
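One way to avoid the false warning is to stop all checking once an entry could
not be recorded. As a sketch against the posted pci-dma-debug.c (global_disable
and DMA_DEBUG_SINGLE are assumed names; the locking around add_dma_entry() is
kept as in the posted code):

/* sketch: once an entry cannot be recorded, stop checking entirely so
 * that a later unmap is not reported as a driver bug */
static bool global_disable;

void debug_map_single(struct device *dev, void *ptr, size_t size,
                      int direction, dma_addr_t dma_addr)
{
        struct dma_debug_entry *entry;
        unsigned long flags;

        if (global_disable || dma_addr == bad_dma_address)
                return;

        entry = dma_entry_alloc();
        if (!entry) {
                global_disable = true;          /* no false positives later */
                return;
        }

        entry->type      = DMA_DEBUG_SINGLE;    /* assumed entry type */
        entry->dev       = dev;
        entry->cpu_addr  = ptr;
        entry->size      = size;
        entry->dev_addr  = dma_addr;
        entry->direction = direction;

        spin_lock_irqsave(&dma_lock, flags);
        add_dma_entry(entry);
        spin_unlock_irqrestore(&dma_lock, flags);
}

debug_unmap_single() and the other check functions would bail out early in the
same way when global_disable is set.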
On Sat, Nov 22, 2008 at 12:27:41PM +0900, FUJITA Tomonori wrote:
> On Fri, 21 Nov 2008 17:26:03 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > Impact: creates necessary data structures for DMA-API debugging
> >
> > Signed-off-by: Joerg Roedel <[email protected]>
> > +static int hash_fn(struct dma_debug_entry *entry)
> > +{
> > + /*
> > + * Hash function is based on the dma address.
> > + * We use bits 20-27 here as the index into the hash
> > + */
> > + BUG_ON(entry->dev_addr == bad_dma_address);
>
> 'bad_dma_address' is x86 specific. You already found it though.
Interesting. Is there another value for dma_addr_t which drivers can
check for to find out if a dma-api operation failed?
Joerg
On Fri, Nov 21, 2008 at 06:43:48PM +0100, Ingo Molnar wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > +static struct list_head dma_entry_hash[HASH_SIZE];
> > +
> > +/* A slab cache to allocate dma_map_entries fast */
> > +static struct kmem_cache *dma_entry_cache;
> > +
> > +/* lock to protect the data structures */
> > +static DEFINE_SPINLOCK(dma_lock);
>
> some more generic comments about the data structure: its main purpose
> is to provide a mapping based on (dev,addr). There's little if any
> cross-entry interaction - same-address+same-dev DMA is checked.
>
> 1)
>
> the hash:
>
> + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
>
> should mix in entry->dev as well - that way we get not just per
> address but per device hash space separation as well.
>
> 2)
>
> HASH_FN_SHIFT is 1MB chunks right now - that's probably fine in
> practice albeit perhaps a bit too small. There's seldom any coherency
> between the physical addresses of DMA - we rarely have any real
> (performance-relevant) physical co-location of DMA addresses beyond 4K
> granularity. So using 1MB chunking here will discard a good deal of
> random low bits we should be hashing on.
>
> 3)
>
> And the most scalable locking would be per hash bucket locking - no
> global lock is needed. The bucket hash heads should probably be
> cacheline sized - so we'd get one lock per bucket.
Hmm, I just had the idea of saving this data in struct device. How about
that? The locking should scale too and we can extend it more easily. For
example it simplifies a per-device disable function for the checking. Or
another future feature might be leak tracing.
> This way if there's irq+DMA traffic on one CPU from one device into
> one range of memory, and irq+DMA traffic on another CPU to another
> device, they will map to two different hash buckets.
>
> 4)
>
> Plus it might be an option to make hash lookup lockless as well:
> depending on the DMA flux we can get a lot of lookups, and taking the
> bucket lock can be avoided, if you use RCU-safe list ops and drive the
> refilling of the free entries pool from RCU.
Joerg
On Sat, 22 Nov 2008 10:40:32 +0100
Joerg Roedel <[email protected]> wrote:
> > > +static int hash_fn(struct dma_debug_entry *entry)
> > > +{
> > > + /*
> > > + * Hash function is based on the dma address.
> > > + * We use bits 20-27 here as the index into the hash
> > > + */
> > > + BUG_ON(entry->dev_addr == bad_dma_address);
> >
> > 'bad_dma_address' is x86 specific. You already found it though.
>
> Interesting. Is there another value for dma_addr_t which drivers can
> check for to find out if a dma-api operation failed?
They are architecture dependent. But only X86 uses a variable because
of GART and swiotlb, I think.
On Sat, 22 Nov 2008 10:33:18 +0100
Joerg Roedel <[email protected]> wrote:
> On Sat, Nov 22, 2008 at 12:27:43PM +0900, FUJITA Tomonori wrote:
> > On Fri, 21 Nov 2008 18:45:51 +0100
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Joerg Roedel <[email protected]> wrote:
> > >
> > > > On Fri, Nov 21, 2008 at 05:24:29PM +0000, David Woodhouse wrote:
> > > > > On Fri, 2008-11-21 at 18:20 +0100, Joerg Roedel wrote:
> > > > > > Ok, I will move the generic bits to lib/ and include/linux and let
> > > > > > architectures decide if they want to use it.
> > > > >
> > > > > Once you've done that, I'll try to hook it up on PowerPC to make
> > > > > sure it works there.
> > > >
> > > > Ok, cool. Thanks
> > >
> > > i'll give it a whirl on x86 once the allocation bug is resolved. x86
> > > testing will be the most interesting in practice, because most drivers
> > > there are developed with no dynamic DMA in mind. (many of the x86
> > > drivers were developed before IOMMUs were supported in Linux)
> >
> > Yeah, one of the problems due to this is that some drivers wrongly
> > assume that the dma mapping operations never fail (they do DMA with a
> > wrong address). With VT-d and AMD IOMMU, the dma mapping operations
> > fail just because of OOM.
>
> At least AMD IOMMU code will not fail because of memory shortage. All
> necessary data structures, including the pagetables, are preallocated.
> The only place where it might fail is dma_alloc_coherent. But that is not
> specific to that driver.
Oops, you are right. Somehow I wrongly thought that AMD IOMMU
allocates pte on the fly like VT-d.
> > Some time ago, I hooked the fault injection mechanism to the dma
> > mapping operations (I linked struct fault_attr to struct pci_dev so
> > you can make dma_map_single/sg fail with a particular pci device). It
> > might interest some people:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git dmafault
>
> This would also be helpful to find bugs in drivers together with this
> code. Do you plan to submit it?
Yeah, together with your debug mechanism this might be more helpful,
because it can catch drivers that don't handle dma mapping failures
properly. I'll submit this after you finish the debug mechanism.
Thanks,
On Sat, 22 Nov 2008 19:16:05 +0900
FUJITA Tomonori <[email protected]> wrote:
> On Sat, 22 Nov 2008 10:40:32 +0100
> Joerg Roedel <[email protected]> wrote:
>
> > > > +static int hash_fn(struct dma_debug_entry *entry)
> > > > +{
> > > > + /*
> > > > + * Hash function is based on the dma address.
> > > > + * We use bits 20-27 here as the index into the hash
> > > > + */
> > > > + BUG_ON(entry->dev_addr == bad_dma_address);
> > >
> > > 'bad_dma_address' is x86 specific. You already found it though.
> >
> > Interesting. Is there another value for dma_addr_t which drivers can
> > check for to find out if a dma-api operation failed?
>
> They are architecture dependent. But only X86 uses a variable because
> of GART and swiotlb, I think.
BTW, this code doesn't even work everywhere on x86 (with swiotlb).
dma_mapping_error() should be used, which is the architecture-independent
way to test a dma address for failure.
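In driver terms the portable check looks roughly like this sketch (the
function and variable names are made up):

#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* test the returned handle with dma_mapping_error() instead of
 * comparing it against the x86-only bad_dma_address value */
static int map_buffer(struct device *dev, void *buf, size_t size,
                      dma_addr_t *handle)
{
        dma_addr_t addr = dma_map_single(dev, buf, size, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, addr))
                return -ENOMEM;         /* never hand this address to hw */

        *handle = addr;
        return 0;
}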
* Joerg Roedel <[email protected]> wrote:
> On Fri, Nov 21, 2008 at 06:43:48PM +0100, Ingo Molnar wrote:
> >
> > * Joerg Roedel <[email protected]> wrote:
> >
> > > +static struct list_head dma_entry_hash[HASH_SIZE];
> > > +
> > > +/* A slab cache to allocate dma_map_entries fast */
> > > +static struct kmem_cache *dma_entry_cache;
> > > +
> > > +/* lock to protect the data structures */
> > > +static DEFINE_SPINLOCK(dma_lock);
> >
> > some more generic comments about the data structure: its main purpose
> > is to provide a mapping based on (dev,addr). There's little if any
> > cross-entry interaction - same-address+same-dev DMA is checked.
> >
> > 1)
> >
> > the hash:
> >
> > + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> >
> > should mix in entry->dev as well - that way we get not just per
> > address but per device hash space separation as well.
> >
> > 2)
> >
> > HASH_FN_SHIFT is 1MB chunks right now - that's probably fine in
> > practice albeit perhaps a bit too small. There's seldom any coherency
> > between the physical addresses of DMA - we rarely have any real
> > (performance-relevant) physical co-location of DMA addresses beyond 4K
> > granularity. So using 1MB chunking here will discard a good deal of
> > random low bits we should be hashing on.
> >
> > 3)
> >
> > And the most scalable locking would be per hash bucket locking - no
> > global lock is needed. The bucket hash heads should probably be
> > cacheline sized - so we'd get one lock per bucket.
>
> Hmm, I just had the idea of saving this data in struct device. How
> about that? The locking should scale too and we can extend it more
> easily. For example it simplifies a per-device disable function for
> the checking. Or another future feature might be leak tracing.
that will help with spreading the hash across devices, but brings in
lifetime issues: you must be absolutely sure all DMA has drained at
the point a device is deinitialized.
Dunno ... i think it's still better to have a central hash for all DMA
ops that is aware of per device details.
The moment we spread this out to struct device we've lost the ability
to _potentially_ do inter-device checking. (for example to make sure
no other device is DMA-ing on the same address - which is a possible
bug pattern as well although right now Linux does not really avoid it
actively)
Hm?
Btw., also have a look at lib/debugobjects.c: i think we should also
consider extending that facility and then just hook it up to the DMA
ops. The DMA checking is kind of a similar "op lifetime" scenario -
with a few extras to extend lib/debugobjects.c with. Note how it
already has pooling, a good hash, etc.
Ingo
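For reference, hooking map/unmap into the lib/debugobjects.c interface could
look roughly like the purely conceptual sketch below. debugobjects tracks
objects by their cpu address only, so the size/direction/device checks of this
patchset would still need the extensions mentioned above; the descriptor name
is made up:

#include <linux/debugobjects.h>

static struct debug_obj_descr dma_mapping_descr = {
        .name = "dma mapping",
};

static void debug_map(void *cpu_addr)
{
        /* start tracking this mapping; mapping the same cpu address
         * twice would be flagged by the debugobjects core */
        debug_object_init(cpu_addr, &dma_mapping_descr);
        debug_object_activate(cpu_addr, &dma_mapping_descr);
}

static void debug_unmap(void *cpu_addr)
{
        /* an unmap without a prior map shows up as a state violation */
        debug_object_deactivate(cpu_addr, &dma_mapping_descr);
        debug_object_free(cpu_addr, &dma_mapping_descr);
}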
* Ingo Molnar <[email protected]> wrote:
>
> * Joerg Roedel <[email protected]> wrote:
>
> > On Fri, Nov 21, 2008 at 06:43:48PM +0100, Ingo Molnar wrote:
> > >
> > > * Joerg Roedel <[email protected]> wrote:
> > >
> > > > +static struct list_head dma_entry_hash[HASH_SIZE];
> > > > +
> > > > +/* A slab cache to allocate dma_map_entries fast */
> > > > +static struct kmem_cache *dma_entry_cache;
> > > > +
> > > > +/* lock to protect the data structures */
> > > > +static DEFINE_SPINLOCK(dma_lock);
> > >
> > > some more generic comments about the data structure: its main purpose
> > > is to provide a mapping based on (dev,addr). There's little if any
> > > cross-entry interaction - same-address+same-dev DMA is checked.
> > >
> > > 1)
> > >
> > > the hash:
> > >
> > > + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
> > >
> > > should mix in entry->dev as well - that way we get not just per
> > > address but per device hash space separation as well.
> > >
> > > 2)
> > >
> > > HASH_FN_SHIFT is 1MB chunks right now - that's probably fine in
> > > practice albeit perhaps a bit too small. There's seldom any coherency
> > > between the physical addresses of DMA - we rarely have any real
> > > (performance-relevant) physical co-location of DMA addresses beyond 4K
> > > granularity. So using 1MB chunking here will discard a good deal of
> > > random low bits we should be hashing on.
> > >
> > > 3)
> > >
> > > And the most scalable locking would be per hash bucket locking - no
> > > global lock is needed. The bucket hash heads should probably be
> > > cacheline sized - so we'd get one lock per bucket.
> >
> > Hmm, I just had the idea of saving this data in struct device. How
> > about that? The locking should scale too and we can extend it more
> > easily. For example it simplifies a per-device disable function for
> > the checking. Or another future feature might be leak tracing.
>
> that will help with spreading the hash across devices, but brings in
> lifetime issues: you must be absolutely sure all DMA has drained at
> the point a device is deinitialized.
Note that obviously proper DMA quiescence is a must-have during device
deinit anyway, but still, it's an extra complication to init/deinit the
hashes, etc.
Ingo
Another generic suggestion:
Why not just replace dma_ops with a debug version? That way it could
be a runtime feature based off the already existing DMA op callbacks,
without any extra overhead.
I'd prefer such a solution much more with an x86 architecture
maintainer hat on, because this way distributions could enable this
debug feature (with the facility being off by default) without
worrying about the wrapping overhead.
This would basically be a special variant of stacked DMA ops support.
Ingo
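A sketch of that stacking idea, based on the struct dma_mapping_ops used in
the patches above (only map_single is shown; the install function and the
variable names are invented):

#include <linux/dma-mapping.h>
#include <asm/dma-mapping.h>
#include <asm/io.h>

static struct dma_mapping_ops *real_dma_ops;

static dma_addr_t debug_ops_map_single(struct device *hwdev, phys_addr_t paddr,
                                       size_t size, int direction)
{
        dma_addr_t addr;

        /* forward to the real implementation first ... */
        addr = real_dma_ops->map_single(hwdev, paddr, size, direction);

        /* ... then record the mapping for later checking */
        debug_map_single(hwdev, phys_to_virt(paddr), size, direction, addr);

        return addr;
}

static struct dma_mapping_ops debug_dma_ops;

/* switch a dma_ops pointer to the debug wrapper at runtime */
void dma_debug_install(struct dma_mapping_ops **ops)
{
        real_dma_ops             = *ops;
        debug_dma_ops            = *real_dma_ops;        /* inherit all callbacks */
        debug_dma_ops.map_single = debug_ops_map_single; /* then override hooks */
        *ops                     = &debug_dma_ops;
}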
Joerg Roedel <[email protected]> writes:
> +/* Hash list to save the allocated dma addresses */
> +static struct list_head dma_entry_hash[HASH_SIZE];
Hash tables should use hlists.
> +static int hash_fn(struct dma_debug_entry *entry)
> +{
> + /*
> + * Hash function is based on the dma address.
> + * We use bits 20-27 here as the index into the hash
> + */
> + BUG_ON(entry->dev_addr == bad_dma_address);
> +
> + return (entry->dev_addr >> HASH_FN_SHIFT) & HASH_FN_MASK;
It would probably be safer to use a stronger hash like FNV.
There are a couple to reuse in include/
> +}
> +
> +static struct dma_debug_entry *dma_entry_alloc(void)
> +{
> + gfp_t gfp = GFP_KERNEL | __GFP_ZERO;
> +
> + if (in_atomic())
> + gfp |= GFP_ATOMIC;
> +
> + return kmem_cache_alloc(dma_entry_cache, gfp);
> +}
While the basic idea is reasonable, this function is unfortunately
broken. It's not always safe to allocate memory (e.g. in the block
writeout path, which uses map_sg). You would need to use
a mempool or something.
Besides, the other problem with using GFP_ATOMIC is that it can
fail under high load, and you don't handle this case very well
(you would report a bug incorrectly). Stress tests tend to
trigger that, and reporting false positives in such a case is a very
bad thing: it leads to QA people putting these messages
on their blacklists.
-Andi
--
[email protected]