2012-10-03 14:56:32

by Shuah Khan

[permalink] [raw]
Subject: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Patch v3: Addresses review and design comments from Patch v2.
Patch v2: Addressed design issues from Patch v1.

Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.

map_errors: (system wide counter)
Total number of dma mapping errors returned by the dma mapping interfaces,
in response to mapping requests from all devices in the system.
map_errors_not_checked: (system wide counter)
Total number of dma mapping errors devices failed to check before using
the returned address.
unmap_errors: (system wide counter)
Total number of times devices tried to unmap or free an invalid dma
address.
map_err_type: (new field added to dma_debug_entry structure)
New field to maintain dma mapping error check status. This error type
is applicable to the dma map page and dma map single entries tracked by
dma-debug api. This status indicates whether or not a good mapping is
checked by the device before its use. dma_map_single() and dma_map_page()
could fail to create a mapping in some cases, and drivers are expected to
call dma_mapping_error() to check for errors. Please note that this is not
counter.

Enhancements to dma-debug api are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are system wide counters for
all devices in the system.

The following new dma-debug interface is added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
Sets dma map error checked status for the dma map entry if one is
found. Decrements the system wide dma_map_errors_not_checked counter
that is incremented by debug_dma_map_page() when it checks for
mapping error before adding it to the dma debug entry table.

New dma-debug internal interface:
check_mapping_error(struct device *dev, dma_addr_t dma_addr)
Calling dma_mapping_error() from dma-debug api will result in dma mapping
check when it shouldn't. This internal routine checks dma mapping error
without any debug checks.

The following existing dma-debug api are changed to support this feature:
debug_dma_map_page()
Increments dma_map_errors and dma_map_errors_not_checked errors totals
for the system, dma-debug api keeps track of, when dma_addr is invalid.
This routine now calls internal check_mapping_error() interface to
avoid doing dma mapping debug checks from dma-debug internal mapping
error checks.
check_unmap()
This is an existing internal routines that checks for various mapping
errors. Changed to increment system wide dma_unmap_errors, when a
device requests an invalid address to be unmapped. This routine now
calls internal check_mapping_error() interface to avoid doing dma
mapping debug checks from dma-debug internal mapping error checks.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
CONFIG_DMA_API_DEBUG enabled and disabled.

Signed-off-by: Shuah Khan <[email protected]>
---
Documentation/DMA-API.txt | 13 ++++
arch/x86/include/asm/dma-mapping.h | 1 +
include/linux/dma-debug.h | 7 ++
lib/dma-debug.c | 128 ++++++++++++++++++++++++++++++++----
4 files changed, 136 insertions(+), 13 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..886aaf9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
dma-api/error_count This file is read-only and shows the total
numbers of errors found.

+ dma-api/map_errors This file is read-only and shows the total
+ number of dma mapping errors detected.
+
+ dma-api/map_errors_not_checked
+ This file is read-only and shows the total
+ number of dma mapping errors, device drivers
+ failed to check prior to using the returned
+ address.
+
+ dma-api/unmap_errors
+ This file is read-only and shows the total
+ number of invalid dma unmapping attempts.
+
dma-api/num_errors The number in this file shows how many
warnings will be printed to the kernel log
before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
struct dma_map_ops *ops = get_dma_ops(dev);
+ debug_dma_mapping_error(dev, dma_addr);
if (ops->mapping_error)
return ops->mapping_error(dev, dma_addr);

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..fc0e34c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
int direction, dma_addr_t dma_addr,
bool map_single);

+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction, bool map_single);

@@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
{
}

+static inline void debug_dma_mapping_error(struct device *dev,
+ dma_addr_t dma_addr)
+{
+}
+
static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction,
bool map_single)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..1708a10 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -45,18 +45,25 @@ enum {
dma_debug_coherent,
};

+enum map_err_types {
+ MAP_ERR_CHECK_NOT_APPLICABLE,
+ MAP_ERR_NOT_CHECKED,
+ MAP_ERR_CHECKED,
+};
+
#define DMA_DEBUG_STACKTRACE_ENTRIES 5

struct dma_debug_entry {
- struct list_head list;
- struct device *dev;
- int type;
- phys_addr_t paddr;
- u64 dev_addr;
- u64 size;
- int direction;
- int sg_call_ents;
- int sg_mapped_ents;
+ struct list_head list;
+ struct device *dev;
+ int type;
+ phys_addr_t paddr;
+ u64 dev_addr;
+ u64 size;
+ int direction;
+ int sg_call_ents;
+ int sg_mapped_ents;
+ enum map_err_types map_err_type;
#ifdef CONFIG_STACKTRACE
struct stack_trace stacktrace;
unsigned long st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
@@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
/* Global error count */
static u32 error_count;

+/* dma mapping error counts */
+static u32 map_errors;
+static u32 map_errors_not_checked;
+static u32 unmap_errors;
+
/* Global error show enable*/
static u32 show_all_errors __read_mostly;
/* Number of errors to show */
@@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent __read_mostly;
static struct dentry *num_free_entries_dent __read_mostly;
static struct dentry *min_free_entries_dent __read_mostly;
static struct dentry *filter_dent __read_mostly;
+static struct dentry *map_errors_dent __read_mostly;
+static struct dentry *map_errors_not_checked_dent __read_mostly;
+static struct dentry *unmap_errors_dent __read_mostly;

/* per-driver filter related state */

@@ -114,6 +129,12 @@ static struct device_driver *current_driver __read_mostly;

static DEFINE_RWLOCK(driver_name_lock);

+static const char *const maperr2str[] = {
+ [MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
+ [MAP_ERR_NOT_CHECKED] = "dma map error not checked",
+ [MAP_ERR_CHECKED] = "dma map error checked",
+};
+
static const char *type2name[4] = { "single", "page",
"scather-gather", "coherent" };

@@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
list_for_each_entry(entry, &bucket->list, list) {
if (!dev || dev == entry->dev) {
dev_info(entry->dev,
- "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
+ "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
type2name[entry->type], idx,
(unsigned long long)entry->paddr,
entry->dev_addr, entry->size,
- dir2name[entry->direction]);
+ dir2name[entry->direction],
+ maperr2str[entry->map_err_type]);
}
}

@@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
if (!filter_dent)
goto out_err;

+ map_errors_dent = debugfs_create_u32("map_errors", 0444,
+ dma_debug_dent,
+ &map_errors);
+
+ if (!map_errors_dent)
+ goto out_err;
+
+ map_errors_not_checked_dent = debugfs_create_u32(
+ "map_errors_not_checked",
+ 0444,
+ dma_debug_dent,
+ &map_errors_not_checked);
+
+ if (!map_errors_not_checked_dent)
+ goto out_err;
+
+ unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
+ dma_debug_dent,
+ &unmap_errors);
+ if (!unmap_errors_dent)
+ goto out_err;
+
return 0;

out_err:
@@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
__setup("dma_debug=", dma_debug_cmdline);
__setup("dma_debug_entries=", dma_debug_entries_cmdline);

+/* Calling dma_mapping_error() from dma-debug api will result in calling
+ debug_dma_mapping_error() - need internal mapping error routine to
+ avoid debug checks */
+#ifndef DMA_ERROR_CODE
+#define DMA_ERROR_CODE 0
+#endif
+static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ if (ops->mapping_error)
+ return ops->mapping_error(dev, dma_addr);
+
+ return (dma_addr == DMA_ERROR_CODE);
+}
+
static void check_unmap(struct dma_debug_entry *ref)
{
struct dma_debug_entry *entry;
struct hash_bucket *bucket;
unsigned long flags;

- if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+ if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
+ unmap_errors += 1;
err_printk(ref->dev, NULL, "DMA-API: device driver tries "
"to free an invalid DMA memory address\n");
return;
@@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
dir2name[ref->direction]);
}

+ if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
+ err_printk(ref->dev, entry,
+ "DMA-API: device driver failed to check map error"
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped as %s]",
+ ref->dev_addr, ref->size,
+ type2name[entry->type]);
+ }
+
hash_bucket_del(entry);
dma_entry_free(entry);

@@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
if (unlikely(global_disable))
return;

- if (unlikely(dma_mapping_error(dev, dma_addr)))
+ if (unlikely(check_mapping_error(dev, dma_addr))) {
+ map_errors += 1;
+ map_errors_not_checked += 1;
return;
+ }

entry = dma_entry_alloc();
if (!entry)
@@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
+ entry->map_err_type = MAP_ERR_NOT_CHECKED;

if (map_single)
entry->type = dma_debug_single;
@@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
}
EXPORT_SYMBOL(debug_dma_map_page);

+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+ struct dma_debug_entry ref;
+ struct dma_debug_entry *entry;
+ struct hash_bucket *bucket;
+ unsigned long flags;
+
+ if (unlikely(global_disable))
+ return;
+
+ ref.dev = dev;
+ ref.dev_addr = dma_addr;
+ bucket = get_hash_bucket(&ref, &flags);
+ entry = bucket_find_exact(bucket, &ref);
+
+ if (!entry) {
+ /* very likley dma-api didn't call debug_dma_map_page() or
+ debug_dma_map_page() detected mapping error */
+ if (map_errors_not_checked)
+ map_errors_not_checked -= 1;
+ goto out;
+ }
+
+ entry->map_err_type = MAP_ERR_CHECKED;
+out:
+ put_hash_bucket(bucket, &flags);
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction, bool map_single)
{
--
1.7.9.5



2012-10-03 21:45:15

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

On Wed, 03 Oct 2012 08:55:59 -0600
Shuah Khan <[email protected]> wrote:

> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
>
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
>
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
>
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?

This seems to be a strange way of addressing kernel programming errors.
Instead of fixing them up, we generate lots of statistics about how
often they happen!

Would it not be better to find and fix the buggy code sites? A
coccinelle script wold probably help here.

And let's also look at *why* we keep doing this. Partly it's because
these things are self-propagating - people copy-n-paste bad code so we
get more bad code.


Another reason surely is the poor documentation. Suppose our diligent
programmer goes to the dma_map_single() definition site:

#define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)

No documentation at all. Because it's a stupid macro it doesn't even
give the types and names of the arguments or the type of the return
value.

So he goes to dma_map_single_attrs() and finds that is altogether
undocmented.

So he goes into Documentation/DMA-API-HOWTO.txt, searches for
"dma_map_single" and finds

: To map a single region, you do:
:
: struct device *dev = &my_dev->dev;
: dma_addr_t dma_handle;
: void *addr = buffer->ptr;
: size_t size = buffer->len;
:
: dma_handle = dma_map_single(dev, addr, size, direction);
:
: and to unmap it:
:
: dma_unmap_single(dev, dma_handle, size, direction);


So it is hardly surprising that we keep screwing this up!

2012-10-04 14:13:25

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

On Wed, Oct 03, 2012 at 02:45:11PM -0700, Andrew Morton wrote:
> On Wed, 03 Oct 2012 08:55:59 -0600
> Shuah Khan <[email protected]> wrote:
>
> > A recent dma mapping error analysis effort showed that a large percentage
> > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > errors.
> >
> > Reference:
> > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> >
> > Adding support for tracking dma mapping and unmapping errors to help assess
> > the following:
> >
> > When do dma mapping errors get detected?
> > How often do these errors occur?
> > Why don't we see failures related to missing dma mapping error checks?
> > Are they silent failures?
>
> This seems to be a strange way of addressing kernel programming errors.
> Instead of fixing them up, we generate lots of statistics about how
> often they happen!

And by using this we can fix the drivers.
>
> Would it not be better to find and fix the buggy code sites? A
> coccinelle script wold probably help here.

That is the end goal (fixing the buggy code sites). Shuah has
identified the bad culprits.

>
> And let's also look at *why* we keep doing this. Partly it's because
> these things are self-propagating - people copy-n-paste bad code so we
> get more bad code.


And this patch will now tell the poor engineers that write new code that
they pasted bad code.

>
>
> Another reason surely is the poor documentation. Suppose our diligent
> programmer goes to the dma_map_single() definition site:
>
> #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)
>
> No documentation at all. Because it's a stupid macro it doesn't even
> give the types and names of the arguments or the type of the return
> value.
>
> So he goes to dma_map_single_attrs() and finds that is altogether
> undocmented.
>
> So he goes into Documentation/DMA-API-HOWTO.txt, searches for
> "dma_map_single" and finds
>
> : To map a single region, you do:
> :
> : struct device *dev = &my_dev->dev;
> : dma_addr_t dma_handle;
> : void *addr = buffer->ptr;
> : size_t size = buffer->len;
> :
> : dma_handle = dma_map_single(dev, addr, size, direction);
> :
> : and to unmap it:
> :
> : dma_unmap_single(dev, dma_handle, size, direction);
>
>
> So it is hardly surprising that we keep screwing this up!

Right, so that should be also modified (Thank you for looking at that)!

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-04 17:49:57

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

On Wed, Oct 03, 2012 at 08:55:59AM -0600, Shuah Khan wrote:
> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
>
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
>
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
>
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?
>
> Patch v3: Addresses review and design comments from Patch v2.
> Patch v2: Addressed design issues from Patch v1.
>
> Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
>
> map_errors: (system wide counter)
> Total number of dma mapping errors returned by the dma mapping interfaces,
> in response to mapping requests from all devices in the system.
> map_errors_not_checked: (system wide counter)
> Total number of dma mapping errors devices failed to check before using
> the returned address.
> unmap_errors: (system wide counter)
> Total number of times devices tried to unmap or free an invalid dma
> address.
> map_err_type: (new field added to dma_debug_entry structure)
> New field to maintain dma mapping error check status. This error type
> is applicable to the dma map page and dma map single entries tracked by
> dma-debug api. This status indicates whether or not a good mapping is
> checked by the device before its use. dma_map_single() and dma_map_page()
> could fail to create a mapping in some cases, and drivers are expected to
> call dma_mapping_error() to check for errors. Please note that this is not
> counter.
>
> Enhancements to dma-debug api are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are system wide counters for
> all devices in the system.
>
> The following new dma-debug interface is added:
>
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Sets dma map error checked status for the dma map entry if one is
> found. Decrements the system wide dma_map_errors_not_checked counter
> that is incremented by debug_dma_map_page() when it checks for
> mapping error before adding it to the dma debug entry table.
>
> New dma-debug internal interface:
> check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Calling dma_mapping_error() from dma-debug api will result in dma mapping
> check when it shouldn't. This internal routine checks dma mapping error
> without any debug checks.
>
> The following existing dma-debug api are changed to support this feature:
> debug_dma_map_page()
> Increments dma_map_errors and dma_map_errors_not_checked errors totals
> for the system, dma-debug api keeps track of, when dma_addr is invalid.
> This routine now calls internal check_mapping_error() interface to
> avoid doing dma mapping debug checks from dma-debug internal mapping
> error checks.
> check_unmap()
> This is an existing internal routines that checks for various mapping
> errors. Changed to increment system wide dma_unmap_errors, when a
> device requests an invalid address to be unmapped. This routine now
> calls internal check_mapping_error() interface to avoid doing dma
> mapping debug checks from dma-debug internal mapping error checks.
>
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.
>
> Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
> CONFIG_DMA_API_DEBUG enabled and disabled.
>
> Signed-off-by: Shuah Khan <[email protected]>
> ---
> Documentation/DMA-API.txt | 13 ++++
> arch/x86/include/asm/dma-mapping.h | 1 +
> include/linux/dma-debug.h | 7 ++
> lib/dma-debug.c | 128 ++++++++++++++++++++++++++++++++----
> 4 files changed, 136 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> index 66bd97a..886aaf9 100644
> --- a/Documentation/DMA-API.txt
> +++ b/Documentation/DMA-API.txt
> @@ -638,6 +638,19 @@ this directory the following files can currently be found:
> dma-api/error_count This file is read-only and shows the total
> numbers of errors found.
>
> + dma-api/map_errors This file is read-only and shows the total
> + number of dma mapping errors detected.
> +
> + dma-api/map_errors_not_checked
> + This file is read-only and shows the total
> + number of dma mapping errors, device drivers
> + failed to check prior to using the returned
> + address.
> +
> + dma-api/unmap_errors
> + This file is read-only and shows the total
> + number of invalid dma unmapping attempts.
> +
> dma-api/num_errors The number in this file shows how many
> warnings will be printed to the kernel log
> before it stops. This number is initialized to
> diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> index f7b4c79..808dae6 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> {
> struct dma_map_ops *ops = get_dma_ops(dev);
> + debug_dma_mapping_error(dev, dma_addr);
> if (ops->mapping_error)
> return ops->mapping_error(dev, dma_addr);
>
> diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> index 171ad8a..fc0e34c 100644
> --- a/include/linux/dma-debug.h
> +++ b/include/linux/dma-debug.h
> @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
> int direction, dma_addr_t dma_addr,
> bool map_single);
>
> +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> +
> extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> size_t size, int direction, bool map_single);
>
> @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
> {
> }
>
> +static inline void debug_dma_mapping_error(struct device *dev,
> + dma_addr_t dma_addr)
> +{
> +}
> +
> static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> size_t size, int direction,
> bool map_single)
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 66ce414..1708a10 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -45,18 +45,25 @@ enum {
> dma_debug_coherent,
> };
>
> +enum map_err_types {
> + MAP_ERR_CHECK_NOT_APPLICABLE,
> + MAP_ERR_NOT_CHECKED,
> + MAP_ERR_CHECKED,
> +};
> +
> #define DMA_DEBUG_STACKTRACE_ENTRIES 5
>
> struct dma_debug_entry {
> - struct list_head list;
> - struct device *dev;
> - int type;
> - phys_addr_t paddr;
> - u64 dev_addr;
> - u64 size;
> - int direction;
> - int sg_call_ents;
> - int sg_mapped_ents;
> + struct list_head list;
> + struct device *dev;
> + int type;
> + phys_addr_t paddr;
> + u64 dev_addr;
> + u64 size;
> + int direction;
> + int sg_call_ents;
> + int sg_mapped_ents;
> + enum map_err_types map_err_type;

Something gone wild with your tabs/space. It should be only
one extra field added.

Otherwise:
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

> #ifdef CONFIG_STACKTRACE
> struct stack_trace stacktrace;
> unsigned long st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> @@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
> /* Global error count */
> static u32 error_count;
>
> +/* dma mapping error counts */
> +static u32 map_errors;
> +static u32 map_errors_not_checked;
> +static u32 unmap_errors;
> +
> /* Global error show enable*/
> static u32 show_all_errors __read_mostly;
> /* Number of errors to show */
> @@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent __read_mostly;
> static struct dentry *num_free_entries_dent __read_mostly;
> static struct dentry *min_free_entries_dent __read_mostly;
> static struct dentry *filter_dent __read_mostly;
> +static struct dentry *map_errors_dent __read_mostly;
> +static struct dentry *map_errors_not_checked_dent __read_mostly;
> +static struct dentry *unmap_errors_dent __read_mostly;
>
> /* per-driver filter related state */
>
> @@ -114,6 +129,12 @@ static struct device_driver *current_driver __read_mostly;
>
> static DEFINE_RWLOCK(driver_name_lock);
>
> +static const char *const maperr2str[] = {
> + [MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
> + [MAP_ERR_NOT_CHECKED] = "dma map error not checked",
> + [MAP_ERR_CHECKED] = "dma map error checked",
> +};
> +
> static const char *type2name[4] = { "single", "page",
> "scather-gather", "coherent" };
>
> @@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
> list_for_each_entry(entry, &bucket->list, list) {
> if (!dev || dev == entry->dev) {
> dev_info(entry->dev,
> - "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> + "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
> type2name[entry->type], idx,
> (unsigned long long)entry->paddr,
> entry->dev_addr, entry->size,
> - dir2name[entry->direction]);
> + dir2name[entry->direction],
> + maperr2str[entry->map_err_type]);
> }
> }
>
> @@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
> if (!filter_dent)
> goto out_err;
>
> + map_errors_dent = debugfs_create_u32("map_errors", 0444,
> + dma_debug_dent,
> + &map_errors);
> +
> + if (!map_errors_dent)
> + goto out_err;
> +
> + map_errors_not_checked_dent = debugfs_create_u32(
> + "map_errors_not_checked",
> + 0444,
> + dma_debug_dent,
> + &map_errors_not_checked);
> +
> + if (!map_errors_not_checked_dent)
> + goto out_err;
> +
> + unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
> + dma_debug_dent,
> + &unmap_errors);
> + if (!unmap_errors_dent)
> + goto out_err;
> +
> return 0;
>
> out_err:
> @@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
> __setup("dma_debug=", dma_debug_cmdline);
> __setup("dma_debug_entries=", dma_debug_entries_cmdline);
>
> +/* Calling dma_mapping_error() from dma-debug api will result in calling
> + debug_dma_mapping_error() - need internal mapping error routine to
> + avoid debug checks */
> +#ifndef DMA_ERROR_CODE
> +#define DMA_ERROR_CODE 0
> +#endif
> +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + const struct dma_map_ops *ops = get_dma_ops(dev);
> + if (ops->mapping_error)
> + return ops->mapping_error(dev, dma_addr);
> +
> + return (dma_addr == DMA_ERROR_CODE);
> +}
> +
> static void check_unmap(struct dma_debug_entry *ref)
> {
> struct dma_debug_entry *entry;
> struct hash_bucket *bucket;
> unsigned long flags;
>
> - if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> + if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> + unmap_errors += 1;
> err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> "to free an invalid DMA memory address\n");
> return;
> @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> dir2name[ref->direction]);
> }
>
> + if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> + err_printk(ref->dev, entry,
> + "DMA-API: device driver failed to check map error"
> + "[device address=0x%016llx] [size=%llu bytes] "
> + "[mapped as %s]",
> + ref->dev_addr, ref->size,
> + type2name[entry->type]);
> + }
> +
> hash_bucket_del(entry);
> dma_entry_free(entry);
>
> @@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> if (unlikely(global_disable))
> return;
>
> - if (unlikely(dma_mapping_error(dev, dma_addr)))
> + if (unlikely(check_mapping_error(dev, dma_addr))) {
> + map_errors += 1;
> + map_errors_not_checked += 1;
> return;
> + }
>
> entry = dma_entry_alloc();
> if (!entry)
> @@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> entry->dev_addr = dma_addr;
> entry->size = size;
> entry->direction = direction;
> + entry->map_err_type = MAP_ERR_NOT_CHECKED;
>
> if (map_single)
> entry->type = dma_debug_single;
> @@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> }
> EXPORT_SYMBOL(debug_dma_map_page);
>
> +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + struct dma_debug_entry ref;
> + struct dma_debug_entry *entry;
> + struct hash_bucket *bucket;
> + unsigned long flags;
> +
> + if (unlikely(global_disable))
> + return;
> +
> + ref.dev = dev;
> + ref.dev_addr = dma_addr;
> + bucket = get_hash_bucket(&ref, &flags);
> + entry = bucket_find_exact(bucket, &ref);
> +
> + if (!entry) {
> + /* very likley dma-api didn't call debug_dma_map_page() or
> + debug_dma_map_page() detected mapping error */
> + if (map_errors_not_checked)
> + map_errors_not_checked -= 1;
> + goto out;
> + }
> +
> + entry->map_err_type = MAP_ERR_CHECKED;
> +out:
> + put_hash_bucket(bucket, &flags);
> +}
> +EXPORT_SYMBOL(debug_dma_mapping_error);
> +
> void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> size_t size, int direction, bool map_single)
> {
> --
> 1.7.9.5
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-10-04 22:16:48

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

On Thu, 2012-10-04 at 10:01 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 03, 2012 at 02:45:11PM -0700, Andrew Morton wrote:
> > On Wed, 03 Oct 2012 08:55:59 -0600
> > Shuah Khan <[email protected]> wrote:
> >
> > > A recent dma mapping error analysis effort showed that a large percentage
> > > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > > errors.
> > >
> > > Reference:
> > > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> > >
> > > Adding support for tracking dma mapping and unmapping errors to help assess
> > > the following:
> > >
> > > When do dma mapping errors get detected?
> > > How often do these errors occur?
> > > Why don't we see failures related to missing dma mapping error checks?
> > > Are they silent failures?
> >
> > This seems to be a strange way of addressing kernel programming errors.
> > Instead of fixing them up, we generate lots of statistics about how
> > often they happen!
>
> And by using this we can fix the drivers.

Sorry I was out sick for a bit and catching up. Several drivers are
missing checks for mapping errors and in several cases in addition to
missing mapping error checks, drivers are not doing unmap as they
should. Is is very likely the result of cut and paste as you pointed
out. After the analysis, I realized it is worth while spending time to
provide debug infrastructure driver writers can use to find problems and
continue to use it when new mapping code gets added to an existing
driver and/or a new driver is written.

> >
> > Would it not be better to find and fix the buggy code sites? A
> > coccinelle script wold probably help here.
>
> That is the end goal (fixing the buggy code sites). Shuah has
> identified the bad culprits.

I compiled a list and documented it for review. We have to start fixing
them.

>
> >
> > And let's also look at *why* we keep doing this. Partly it's because
> > these things are self-propagating - people copy-n-paste bad code so we
> > get more bad code.
>
>
> And this patch will now tell the poor engineers that write new code that
> they pasted bad code.
>
> >
> >
> > Another reason surely is the poor documentation. Suppose our diligent
> > programmer goes to the dma_map_single() definition site:
> >
> > #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, NULL)
> >
> > No documentation at all. Because it's a stupid macro it doesn't even
> > give the types and names of the arguments or the type of the return
> > value.
> >
> > So he goes to dma_map_single_attrs() and finds that is altogether
> > undocmented.
> >
> > So he goes into Documentation/DMA-API-HOWTO.txt, searches for
> > "dma_map_single" and finds
> >
> > : To map a single region, you do:
> > :
> > : struct device *dev = &my_dev->dev;
> > : dma_addr_t dma_handle;
> > : void *addr = buffer->ptr;
> > : size_t size = buffer->len;
> > :
> > : dma_handle = dma_map_single(dev, addr, size, direction);
> > :
> > : and to unmap it:
> > :
> > : dma_unmap_single(dev, dma_handle, size, direction);
> >
> >
> > So it is hardly surprising that we keep screwing this up!
>
> Right, so that should be also modified (Thank you for looking at that)!
>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >

2012-10-04 22:19:26

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH v3] dma-debug: New interfaces to debug dma mapping errors

On Thu, 2012-10-04 at 13:38 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 03, 2012 at 08:55:59AM -0600, Shuah Khan wrote:
> > A recent dma mapping error analysis effort showed that a large percentage
> > of dma_map_single() and dma_map_page() returns are not checked for mapping
> > errors.
> >
> > Reference:
> > http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
> >
> > Adding support for tracking dma mapping and unmapping errors to help assess
> > the following:
> >
> > When do dma mapping errors get detected?
> > How often do these errors occur?
> > Why don't we see failures related to missing dma mapping error checks?
> > Are they silent failures?
> >
> > Patch v3: Addresses review and design comments from Patch v2.
> > Patch v2: Addressed design issues from Patch v1.
> >
> > Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
> >
> > map_errors: (system wide counter)
> > Total number of dma mapping errors returned by the dma mapping interfaces,
> > in response to mapping requests from all devices in the system.
> > map_errors_not_checked: (system wide counter)
> > Total number of dma mapping errors devices failed to check before using
> > the returned address.
> > unmap_errors: (system wide counter)
> > Total number of times devices tried to unmap or free an invalid dma
> > address.
> > map_err_type: (new field added to dma_debug_entry structure)
> > New field to maintain dma mapping error check status. This error type
> > is applicable to the dma map page and dma map single entries tracked by
> > dma-debug api. This status indicates whether or not a good mapping is
> > checked by the device before its use. dma_map_single() and dma_map_page()
> > could fail to create a mapping in some cases, and drivers are expected to
> > call dma_mapping_error() to check for errors. Please note that this is not
> > counter.
> >
> > Enhancements to dma-debug api are made to add new debugfs interfaces to
> > report total dma errors, dma errors that are not checked, and unmap errors
> > for the entire system. Please note that these are system wide counters for
> > all devices in the system.
> >
> > The following new dma-debug interface is added:
> >
> > debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > Sets dma map error checked status for the dma map entry if one is
> > found. Decrements the system wide dma_map_errors_not_checked counter
> > that is incremented by debug_dma_map_page() when it checks for
> > mapping error before adding it to the dma debug entry table.
> >
> > New dma-debug internal interface:
> > check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > Calling dma_mapping_error() from dma-debug api will result in dma mapping
> > check when it shouldn't. This internal routine checks dma mapping error
> > without any debug checks.
> >
> > The following existing dma-debug api are changed to support this feature:
> > debug_dma_map_page()
> > Increments dma_map_errors and dma_map_errors_not_checked errors totals
> > for the system, dma-debug api keeps track of, when dma_addr is invalid.
> > This routine now calls internal check_mapping_error() interface to
> > avoid doing dma mapping debug checks from dma-debug internal mapping
> > error checks.
> > check_unmap()
> > This is an existing internal routines that checks for various mapping
> > errors. Changed to increment system wide dma_unmap_errors, when a
> > device requests an invalid address to be unmapped. This routine now
> > calls internal check_mapping_error() interface to avoid doing dma
> > mapping debug checks from dma-debug internal mapping error checks.
> >
> > Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> > to validate these new interfaces on x86_64. Other architectures will be
> > changed in a subsequent patch.
> >
> > Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
> > CONFIG_DMA_API_DEBUG enabled and disabled.
> >
> > Signed-off-by: Shuah Khan <[email protected]>
> > ---
> > Documentation/DMA-API.txt | 13 ++++
> > arch/x86/include/asm/dma-mapping.h | 1 +
> > include/linux/dma-debug.h | 7 ++
> > lib/dma-debug.c | 128 ++++++++++++++++++++++++++++++++----
> > 4 files changed, 136 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
> > index 66bd97a..886aaf9 100644
> > --- a/Documentation/DMA-API.txt
> > +++ b/Documentation/DMA-API.txt
> > @@ -638,6 +638,19 @@ this directory the following files can currently be found:
> > dma-api/error_count This file is read-only and shows the total
> > numbers of errors found.
> >
> > + dma-api/map_errors This file is read-only and shows the total
> > + number of dma mapping errors detected.
> > +
> > + dma-api/map_errors_not_checked
> > + This file is read-only and shows the total
> > + number of dma mapping errors, device drivers
> > + failed to check prior to using the returned
> > + address.
> > +
> > + dma-api/unmap_errors
> > + This file is read-only and shows the total
> > + number of invalid dma unmapping attempts.
> > +
> > dma-api/num_errors The number in this file shows how many
> > warnings will be printed to the kernel log
> > before it stops. This number is initialized to
> > diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
> > index f7b4c79..808dae6 100644
> > --- a/arch/x86/include/asm/dma-mapping.h
> > +++ b/arch/x86/include/asm/dma-mapping.h
> > @@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
> > static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > {
> > struct dma_map_ops *ops = get_dma_ops(dev);
> > + debug_dma_mapping_error(dev, dma_addr);
> > if (ops->mapping_error)
> > return ops->mapping_error(dev, dma_addr);
> >
> > diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
> > index 171ad8a..fc0e34c 100644
> > --- a/include/linux/dma-debug.h
> > +++ b/include/linux/dma-debug.h
> > @@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
> > int direction, dma_addr_t dma_addr,
> > bool map_single);
> >
> > +extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
> > +
> > extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> > size_t size, int direction, bool map_single);
> >
> > @@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
> > {
> > }
> >
> > +static inline void debug_dma_mapping_error(struct device *dev,
> > + dma_addr_t dma_addr)
> > +{
> > +}
> > +
> > static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> > size_t size, int direction,
> > bool map_single)
> > diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> > index 66ce414..1708a10 100644
> > --- a/lib/dma-debug.c
> > +++ b/lib/dma-debug.c
> > @@ -45,18 +45,25 @@ enum {
> > dma_debug_coherent,
> > };
> >
> > +enum map_err_types {
> > + MAP_ERR_CHECK_NOT_APPLICABLE,
> > + MAP_ERR_NOT_CHECKED,
> > + MAP_ERR_CHECKED,
> > +};
> > +
> > #define DMA_DEBUG_STACKTRACE_ENTRIES 5
> >
> > struct dma_debug_entry {
> > - struct list_head list;
> > - struct device *dev;
> > - int type;
> > - phys_addr_t paddr;
> > - u64 dev_addr;
> > - u64 size;
> > - int direction;
> > - int sg_call_ents;
> > - int sg_mapped_ents;
> > + struct list_head list;
> > + struct device *dev;
> > + int type;
> > + phys_addr_t paddr;
> > + u64 dev_addr;
> > + u64 size;
> > + int direction;
> > + int sg_call_ents;
> > + int sg_mapped_ents;
> > + enum map_err_types map_err_type;
>
> Something gone wild with your tabs/space. It should be only
> one extra field added.
I tabbed the existing fields over for readability. Guess I should have
avoided the churn and left it alone. Will fix the tabs and send v4.

>
> Otherwise:
> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>

Is it ok to add your reviewed by tag when I send v4?

-- Shuah
>
> > #ifdef CONFIG_STACKTRACE
> > struct stack_trace stacktrace;
> > unsigned long st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
> > @@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
> > /* Global error count */
> > static u32 error_count;
> >
> > +/* dma mapping error counts */
> > +static u32 map_errors;
> > +static u32 map_errors_not_checked;
> > +static u32 unmap_errors;
> > +
> > /* Global error show enable*/
> > static u32 show_all_errors __read_mostly;
> > /* Number of errors to show */
> > @@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent __read_mostly;
> > static struct dentry *num_free_entries_dent __read_mostly;
> > static struct dentry *min_free_entries_dent __read_mostly;
> > static struct dentry *filter_dent __read_mostly;
> > +static struct dentry *map_errors_dent __read_mostly;
> > +static struct dentry *map_errors_not_checked_dent __read_mostly;
> > +static struct dentry *unmap_errors_dent __read_mostly;
> >
> > /* per-driver filter related state */
> >
> > @@ -114,6 +129,12 @@ static struct device_driver *current_driver __read_mostly;
> >
> > static DEFINE_RWLOCK(driver_name_lock);
> >
> > +static const char *const maperr2str[] = {
> > + [MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
> > + [MAP_ERR_NOT_CHECKED] = "dma map error not checked",
> > + [MAP_ERR_CHECKED] = "dma map error checked",
> > +};
> > +
> > static const char *type2name[4] = { "single", "page",
> > "scather-gather", "coherent" };
> >
> > @@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
> > list_for_each_entry(entry, &bucket->list, list) {
> > if (!dev || dev == entry->dev) {
> > dev_info(entry->dev,
> > - "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
> > + "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
> > type2name[entry->type], idx,
> > (unsigned long long)entry->paddr,
> > entry->dev_addr, entry->size,
> > - dir2name[entry->direction]);
> > + dir2name[entry->direction],
> > + maperr2str[entry->map_err_type]);
> > }
> > }
> >
> > @@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
> > if (!filter_dent)
> > goto out_err;
> >
> > + map_errors_dent = debugfs_create_u32("map_errors", 0444,
> > + dma_debug_dent,
> > + &map_errors);
> > +
> > + if (!map_errors_dent)
> > + goto out_err;
> > +
> > + map_errors_not_checked_dent = debugfs_create_u32(
> > + "map_errors_not_checked",
> > + 0444,
> > + dma_debug_dent,
> > + &map_errors_not_checked);
> > +
> > + if (!map_errors_not_checked_dent)
> > + goto out_err;
> > +
> > + unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
> > + dma_debug_dent,
> > + &unmap_errors);
> > + if (!unmap_errors_dent)
> > + goto out_err;
> > +
> > return 0;
> >
> > out_err:
> > @@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
> > __setup("dma_debug=", dma_debug_cmdline);
> > __setup("dma_debug_entries=", dma_debug_entries_cmdline);
> >
> > +/* Calling dma_mapping_error() from dma-debug api will result in calling
> > + debug_dma_mapping_error() - need internal mapping error routine to
> > + avoid debug checks */
> > +#ifndef DMA_ERROR_CODE
> > +#define DMA_ERROR_CODE 0
> > +#endif
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > + const struct dma_map_ops *ops = get_dma_ops(dev);
> > + if (ops->mapping_error)
> > + return ops->mapping_error(dev, dma_addr);
> > +
> > + return (dma_addr == DMA_ERROR_CODE);
> > +}
> > +
> > static void check_unmap(struct dma_debug_entry *ref)
> > {
> > struct dma_debug_entry *entry;
> > struct hash_bucket *bucket;
> > unsigned long flags;
> >
> > - if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > + if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > + unmap_errors += 1;
> > err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> > "to free an invalid DMA memory address\n");
> > return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> > dir2name[ref->direction]);
> > }
> >
> > + if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > + err_printk(ref->dev, entry,
> > + "DMA-API: device driver failed to check map error"
> > + "[device address=0x%016llx] [size=%llu bytes] "
> > + "[mapped as %s]",
> > + ref->dev_addr, ref->size,
> > + type2name[entry->type]);
> > + }
> > +
> > hash_bucket_del(entry);
> > dma_entry_free(entry);
> >
> > @@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> > if (unlikely(global_disable))
> > return;
> >
> > - if (unlikely(dma_mapping_error(dev, dma_addr)))
> > + if (unlikely(check_mapping_error(dev, dma_addr))) {
> > + map_errors += 1;
> > + map_errors_not_checked += 1;
> > return;
> > + }
> >
> > entry = dma_entry_alloc();
> > if (!entry)
> > @@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> > entry->dev_addr = dma_addr;
> > entry->size = size;
> > entry->direction = direction;
> > + entry->map_err_type = MAP_ERR_NOT_CHECKED;
> >
> > if (map_single)
> > entry->type = dma_debug_single;
> > @@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
> > }
> > EXPORT_SYMBOL(debug_dma_map_page);
> >
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > + struct dma_debug_entry ref;
> > + struct dma_debug_entry *entry;
> > + struct hash_bucket *bucket;
> > + unsigned long flags;
> > +
> > + if (unlikely(global_disable))
> > + return;
> > +
> > + ref.dev = dev;
> > + ref.dev_addr = dma_addr;
> > + bucket = get_hash_bucket(&ref, &flags);
> > + entry = bucket_find_exact(bucket, &ref);
> > +
> > + if (!entry) {
> > + /* very likley dma-api didn't call debug_dma_map_page() or
> > + debug_dma_map_page() detected mapping error */
> > + if (map_errors_not_checked)
> > + map_errors_not_checked -= 1;
> > + goto out;
> > + }
> > +
> > + entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > + put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> > +
> > void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
> > size_t size, int direction, bool map_single)
> > {
> > --
> > 1.7.9.5
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >

2012-10-05 01:23:20

by Shuah Khan

[permalink] [raw]
Subject: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors

A recent dma mapping error analysis effort showed that a large percentage
of dma_map_single() and dma_map_page() returns are not checked for mapping
errors.

Reference:
http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis

Adding support for tracking dma mapping and unmapping errors to help assess
the following:

When do dma mapping errors get detected?
How often do these errors occur?
Why don't we see failures related to missing dma mapping error checks?
Are they silent failures?

Patch v4: Addresses extra tab review comments from Patch v3.
Patch v3: Addresses review and design comments from Patch v2.
Patch v2: Addressed design issues from Patch v1.

Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.

map_errors: (system wide counter)
Total number of dma mapping errors returned by the dma mapping interfaces,
in response to mapping requests from all devices in the system.
map_errors_not_checked: (system wide counter)
Total number of dma mapping errors devices failed to check before using
the returned address.
unmap_errors: (system wide counter)
Total number of times devices tried to unmap or free an invalid dma
address.
map_err_type: (new field added to dma_debug_entry structure)
New field to maintain dma mapping error check status. This error type
is applicable to the dma map page and dma map single entries tracked by
dma-debug api. This status indicates whether or not a good mapping is
checked by the device before its use. dma_map_single() and dma_map_page()
could fail to create a mapping in some cases, and drivers are expected to
call dma_mapping_error() to check for errors. Please note that this is not
counter.

Enhancements to dma-debug api are made to add new debugfs interfaces to
report total dma errors, dma errors that are not checked, and unmap errors
for the entire system. Please note that these are system wide counters for
all devices in the system.

The following new dma-debug interface is added:

debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
Sets dma map error checked status for the dma map entry if one is
found. Decrements the system wide dma_map_errors_not_checked counter
that is incremented by debug_dma_map_page() when it checks for
mapping error before adding it to the dma debug entry table.

New dma-debug internal interface:
check_mapping_error(struct device *dev, dma_addr_t dma_addr)
Calling dma_mapping_error() from dma-debug api will result in dma mapping
check when it shouldn't. This internal routine checks dma mapping error
without any debug checks.

The following existing dma-debug api are changed to support this feature:
debug_dma_map_page()
Increments dma_map_errors and dma_map_errors_not_checked errors totals
for the system, dma-debug api keeps track of, when dma_addr is invalid.
This routine now calls internal check_mapping_error() interface to
avoid doing dma mapping debug checks from dma-debug internal mapping
error checks.
check_unmap()
This is an existing internal routines that checks for various mapping
errors. Changed to increment system wide dma_unmap_errors, when a
device requests an invalid address to be unmapped. This routine now
calls internal check_mapping_error() interface to avoid doing dma
mapping debug checks from dma-debug internal mapping error checks.

Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
to validate these new interfaces on x86_64. Other architectures will be
changed in a subsequent patch.

Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
CONFIG_DMA_API_DEBUG enabled and disabled.

Signed-off-by: Shuah Khan <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
---
Documentation/DMA-API.txt | 13 +++++
arch/x86/include/asm/dma-mapping.h | 1 +
include/linux/dma-debug.h | 7 +++
lib/dma-debug.c | 110 ++++++++++++++++++++++++++++++++++--
4 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 66bd97a..886aaf9 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -638,6 +638,19 @@ this directory the following files can currently be found:
dma-api/error_count This file is read-only and shows the total
numbers of errors found.

+ dma-api/map_errors This file is read-only and shows the total
+ number of dma mapping errors detected.
+
+ dma-api/map_errors_not_checked
+ This file is read-only and shows the total
+ number of dma mapping errors, device drivers
+ failed to check prior to using the returned
+ address.
+
+ dma-api/unmap_errors
+ This file is read-only and shows the total
+ number of invalid dma unmapping attempts.
+
dma-api/num_errors The number in this file shows how many
warnings will be printed to the kernel log
before it stops. This number is initialized to
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index f7b4c79..808dae6 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -47,6 +47,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
struct dma_map_ops *ops = get_dma_ops(dev);
+ debug_dma_mapping_error(dev, dma_addr);
if (ops->mapping_error)
return ops->mapping_error(dev, dma_addr);

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index 171ad8a..fc0e34c 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -39,6 +39,8 @@ extern void debug_dma_map_page(struct device *dev, struct page *page,
int direction, dma_addr_t dma_addr,
bool map_single);

+extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction, bool map_single);

@@ -105,6 +107,11 @@ static inline void debug_dma_map_page(struct device *dev, struct page *page,
{
}

+static inline void debug_dma_mapping_error(struct device *dev,
+ dma_addr_t dma_addr)
+{
+}
+
static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction,
bool map_single)
diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 66ce414..74a7da9 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -45,6 +45,12 @@ enum {
dma_debug_coherent,
};

+enum map_err_types {
+ MAP_ERR_CHECK_NOT_APPLICABLE,
+ MAP_ERR_NOT_CHECKED,
+ MAP_ERR_CHECKED,
+};
+
#define DMA_DEBUG_STACKTRACE_ENTRIES 5

struct dma_debug_entry {
@@ -57,6 +63,7 @@ struct dma_debug_entry {
int direction;
int sg_call_ents;
int sg_mapped_ents;
+ enum map_err_types map_err_type;
#ifdef CONFIG_STACKTRACE
struct stack_trace stacktrace;
unsigned long st_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
@@ -83,6 +90,11 @@ static u32 global_disable __read_mostly;
/* Global error count */
static u32 error_count;

+/* dma mapping error counts */
+static u32 map_errors;
+static u32 map_errors_not_checked;
+static u32 unmap_errors;
+
/* Global error show enable*/
static u32 show_all_errors __read_mostly;
/* Number of errors to show */
@@ -104,6 +116,9 @@ static struct dentry *show_num_errors_dent __read_mostly;
static struct dentry *num_free_entries_dent __read_mostly;
static struct dentry *min_free_entries_dent __read_mostly;
static struct dentry *filter_dent __read_mostly;
+static struct dentry *map_errors_dent __read_mostly;
+static struct dentry *map_errors_not_checked_dent __read_mostly;
+static struct dentry *unmap_errors_dent __read_mostly;

/* per-driver filter related state */

@@ -114,6 +129,12 @@ static struct device_driver *current_driver __read_mostly;

static DEFINE_RWLOCK(driver_name_lock);

+static const char *const maperr2str[] = {
+ [MAP_ERR_CHECK_NOT_APPLICABLE] = "dma map error check not applicable",
+ [MAP_ERR_NOT_CHECKED] = "dma map error not checked",
+ [MAP_ERR_CHECKED] = "dma map error checked",
+};
+
static const char *type2name[4] = { "single", "page",
"scather-gather", "coherent" };

@@ -381,11 +402,12 @@ void debug_dma_dump_mappings(struct device *dev)
list_for_each_entry(entry, &bucket->list, list) {
if (!dev || dev == entry->dev) {
dev_info(entry->dev,
- "%s idx %d P=%Lx D=%Lx L=%Lx %s\n",
+ "%s idx %d P=%Lx D=%Lx L=%Lx %s %s\n",
type2name[entry->type], idx,
(unsigned long long)entry->paddr,
entry->dev_addr, entry->size,
- dir2name[entry->direction]);
+ dir2name[entry->direction],
+ maperr2str[entry->map_err_type]);
}
}

@@ -695,6 +717,28 @@ static int dma_debug_fs_init(void)
if (!filter_dent)
goto out_err;

+ map_errors_dent = debugfs_create_u32("map_errors", 0444,
+ dma_debug_dent,
+ &map_errors);
+
+ if (!map_errors_dent)
+ goto out_err;
+
+ map_errors_not_checked_dent = debugfs_create_u32(
+ "map_errors_not_checked",
+ 0444,
+ dma_debug_dent,
+ &map_errors_not_checked);
+
+ if (!map_errors_not_checked_dent)
+ goto out_err;
+
+ unmap_errors_dent = debugfs_create_u32("unmap_errors", 0444,
+ dma_debug_dent,
+ &unmap_errors);
+ if (!unmap_errors_dent)
+ goto out_err;
+
return 0;

out_err:
@@ -843,13 +887,29 @@ static __init int dma_debug_entries_cmdline(char *str)
__setup("dma_debug=", dma_debug_cmdline);
__setup("dma_debug_entries=", dma_debug_entries_cmdline);

+/* Calling dma_mapping_error() from dma-debug api will result in calling
+ debug_dma_mapping_error() - need internal mapping error routine to
+ avoid debug checks */
+#ifndef DMA_ERROR_CODE
+#define DMA_ERROR_CODE 0
+#endif
+static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ if (ops->mapping_error)
+ return ops->mapping_error(dev, dma_addr);
+
+ return (dma_addr == DMA_ERROR_CODE);
+}
+
static void check_unmap(struct dma_debug_entry *ref)
{
struct dma_debug_entry *entry;
struct hash_bucket *bucket;
unsigned long flags;

- if (dma_mapping_error(ref->dev, ref->dev_addr)) {
+ if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
+ unmap_errors += 1;
err_printk(ref->dev, NULL, "DMA-API: device driver tries "
"to free an invalid DMA memory address\n");
return;
@@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
dir2name[ref->direction]);
}

+ if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
+ err_printk(ref->dev, entry,
+ "DMA-API: device driver failed to check map error"
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[mapped as %s]",
+ ref->dev_addr, ref->size,
+ type2name[entry->type]);
+ }
+
hash_bucket_del(entry);
dma_entry_free(entry);

@@ -1022,8 +1091,11 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
if (unlikely(global_disable))
return;

- if (unlikely(dma_mapping_error(dev, dma_addr)))
+ if (unlikely(check_mapping_error(dev, dma_addr))) {
+ map_errors += 1;
+ map_errors_not_checked += 1;
return;
+ }

entry = dma_entry_alloc();
if (!entry)
@@ -1035,6 +1107,7 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
+ entry->map_err_type = MAP_ERR_NOT_CHECKED;

if (map_single)
entry->type = dma_debug_single;
@@ -1050,6 +1123,35 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
}
EXPORT_SYMBOL(debug_dma_map_page);

+void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+ struct dma_debug_entry ref;
+ struct dma_debug_entry *entry;
+ struct hash_bucket *bucket;
+ unsigned long flags;
+
+ if (unlikely(global_disable))
+ return;
+
+ ref.dev = dev;
+ ref.dev_addr = dma_addr;
+ bucket = get_hash_bucket(&ref, &flags);
+ entry = bucket_find_exact(bucket, &ref);
+
+ if (!entry) {
+ /* very likley dma-api didn't call debug_dma_map_page() or
+ debug_dma_map_page() detected mapping error */
+ if (map_errors_not_checked)
+ map_errors_not_checked -= 1;
+ goto out;
+ }
+
+ entry->map_err_type = MAP_ERR_CHECKED;
+out:
+ put_hash_bucket(bucket, &flags);
+}
+EXPORT_SYMBOL(debug_dma_mapping_error);
+
void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
size_t size, int direction, bool map_single)
{
--
1.7.9.5


2012-10-05 22:51:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors

On Thu, 04 Oct 2012 19:23:13 -0600
Shuah Khan <[email protected]> wrote:

> A recent dma mapping error analysis effort showed that a large percentage
> of dma_map_single() and dma_map_page() returns are not checked for mapping
> errors.
>
> Reference:
> http://linuxdriverproject.org/mediawiki/index.php/DMA_Mapping_Error_Analysis
>
> Adding support for tracking dma mapping and unmapping errors to help assess
> the following:
>
> When do dma mapping errors get detected?
> How often do these errors occur?
> Why don't we see failures related to missing dma mapping error checks?
> Are they silent failures?
>
> Patch v4: Addresses extra tab review comments from Patch v3.
> Patch v3: Addresses review and design comments from Patch v2.
> Patch v2: Addressed design issues from Patch v1.
>
> Enhance dma-debug infrastructure to track dma mapping, and unmapping errors.
>
> map_errors: (system wide counter)
> Total number of dma mapping errors returned by the dma mapping interfaces,
> in response to mapping requests from all devices in the system.
> map_errors_not_checked: (system wide counter)
> Total number of dma mapping errors devices failed to check before using
> the returned address.
> unmap_errors: (system wide counter)
> Total number of times devices tried to unmap or free an invalid dma
> address.
> map_err_type: (new field added to dma_debug_entry structure)
> New field to maintain dma mapping error check status. This error type
> is applicable to the dma map page and dma map single entries tracked by
> dma-debug api. This status indicates whether or not a good mapping is
> checked by the device before its use. dma_map_single() and dma_map_page()
> could fail to create a mapping in some cases, and drivers are expected to
> call dma_mapping_error() to check for errors. Please note that this is not
> counter.
>
> Enhancements to dma-debug api are made to add new debugfs interfaces to
> report total dma errors, dma errors that are not checked, and unmap errors
> for the entire system. Please note that these are system wide counters for
> all devices in the system.
>
> The following new dma-debug interface is added:
>
> debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Sets dma map error checked status for the dma map entry if one is
> found. Decrements the system wide dma_map_errors_not_checked counter
> that is incremented by debug_dma_map_page() when it checks for
> mapping error before adding it to the dma debug entry table.
>
> New dma-debug internal interface:
> check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> Calling dma_mapping_error() from dma-debug api will result in dma mapping
> check when it shouldn't. This internal routine checks dma mapping error
> without any debug checks.
>
> The following existing dma-debug api are changed to support this feature:
> debug_dma_map_page()
> Increments dma_map_errors and dma_map_errors_not_checked errors totals
> for the system, dma-debug api keeps track of, when dma_addr is invalid.
> This routine now calls internal check_mapping_error() interface to
> avoid doing dma mapping debug checks from dma-debug internal mapping
> error checks.
> check_unmap()
> This is an existing internal routines that checks for various mapping
> errors. Changed to increment system wide dma_unmap_errors, when a
> device requests an invalid address to be unmapped. This routine now
> calls internal check_mapping_error() interface to avoid doing dma
> mapping debug checks from dma-debug internal mapping error checks.
>
> Changed arch/x86/include/asm/dma-mapping.h to call debug_dma_mapping_error()
> to validate these new interfaces on x86_64. Other architectures will be
> changed in a subsequent patch.
>
> Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with
> CONFIG_DMA_API_DEBUG enabled and disabled.

Still seems overly complicated to me, but whatev.

I think the way to handle this is pretty simple: set a flag in the dma
entry when someone runs dma_mapping_error() and, if that flag wasn't
set at unmap time, emit a loud warning.

>From my reading of the code, this patch indeed does that, along with a
bunch of other (unnecessary?) stuff. But boy, the changelog conceals
this information well!

> Documentation/DMA-API.txt | 13 +++++
> arch/x86/include/asm/dma-mapping.h | 1 +
> include/linux/dma-debug.h | 7 +++
> lib/dma-debug.c | 110 ++++++++++++++++++++++++++++++++++--

Please, go through Documentation/DMA-API-HOWTO.txt with a toothcomb and
ensure that all this is appropriately covered and that the examples are
completed for error checking.

>
> ...
>
> +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + const struct dma_map_ops *ops = get_dma_ops(dev);
> + if (ops->mapping_error)
> + return ops->mapping_error(dev, dma_addr);
> +
> + return (dma_addr == DMA_ERROR_CODE);
> +}

I'm not a fan of functions called check_foo() because I never know
whether their return value means "foo is true" or "foo is false". It
doesn't help that the return value here is undocumented. Names such as
has_mapping_error() or mapping_has_error() will remove this question,
thereby making the call sites more readable.

> static void check_unmap(struct dma_debug_entry *ref)
> {
> struct dma_debug_entry *entry;
> struct hash_bucket *bucket;
> unsigned long flags;
>
> - if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> + if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> + unmap_errors += 1;
> err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> "to free an invalid DMA memory address\n");
> return;
> @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> dir2name[ref->direction]);
> }
>
> + if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> + err_printk(ref->dev, entry,
> + "DMA-API: device driver failed to check map error"
> + "[device address=0x%016llx] [size=%llu bytes] "
> + "[mapped as %s]",
> + ref->dev_addr, ref->size,
> + type2name[entry->type]);
> + }

It's important that this warning be associated with a backtrace so we
can identify the offending call site in the usual fashion.
err_printk() does include a backtrace under some circumstances, but
those circumstances hurt my brain.

Is it guaranteed that we'll have that backtrace? If not, I'd suggest
making it so.

> hash_bucket_del(entry);
> dma_entry_free(entry);
>
>
> ...
>
> +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> +{
> + struct dma_debug_entry ref;
> + struct dma_debug_entry *entry;
> + struct hash_bucket *bucket;
> + unsigned long flags;
> +
> + if (unlikely(global_disable))
> + return;
> +
> + ref.dev = dev;
> + ref.dev_addr = dma_addr;
> + bucket = get_hash_bucket(&ref, &flags);
> + entry = bucket_find_exact(bucket, &ref);
> +
> + if (!entry) {
> + /* very likley dma-api didn't call debug_dma_map_page() or
> + debug_dma_map_page() detected mapping error */
> + if (map_errors_not_checked)
> + map_errors_not_checked -= 1;
> + goto out;
> + }
> +
> + entry->map_err_type = MAP_ERR_CHECKED;
> +out:
> + put_hash_bucket(bucket, &flags);
> +}
> +EXPORT_SYMBOL(debug_dma_mapping_error);

Well, it's a global, exported-to-modules symbol. Some formal
documentation is appropriate for such things.

>
> ...
>

2012-10-08 17:07:32

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors

On Fri, 2012-10-05 at 15:51 -0700, Andrew Morton wrote:

>
> Still seems overly complicated to me, but whatev.
>
> I think the way to handle this is pretty simple: set a flag in the dma
> entry when someone runs dma_mapping_error() and, if that flag wasn't
> set at unmap time, emit a loud warning.
>
> From my reading of the code, this patch indeed does that, along with a
> bunch of other (unnecessary?) stuff. But boy, the changelog conceals
> this information well!

Are you referring to the system wide error counters when you say
unnecessary stuff. The reason I added those was to catch errors when
drivers don't do unmap. Several drivers fail to do unmap. Checking flag
from unmap debug interfaces, doesn't cover these cases. However, I think
the value of system wide counters is limited in the sense that they
don't really identify the driver that needs fixing. In that sense it can
be deemed unnecessary. I dropped them in v5 patch, which I am sending
out.

I also changed the changelog to be clear.

>
> > Documentation/DMA-API.txt | 13 +++++
> > arch/x86/include/asm/dma-mapping.h | 1 +
> > include/linux/dma-debug.h | 7 +++
> > lib/dma-debug.c | 110 ++++++++++++++++++++++++++++++++++--
>
> Please, go through Documentation/DMA-API-HOWTO.txt with a toothcomb and
> ensure that all this is appropriately covered and that the examples are
> completed for error checking.
>
> >
> > ...
> >
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > + const struct dma_map_ops *ops = get_dma_ops(dev);
> > + if (ops->mapping_error)
> > + return ops->mapping_error(dev, dma_addr);
> > +
> > + return (dma_addr == DMA_ERROR_CODE);
> > +}
>
> I'm not a fan of functions called check_foo() because I never know
> whether their return value means "foo is true" or "foo is false". It
> doesn't help that the return value here is undocumented. Names such as
> has_mapping_error() or mapping_has_error() will remove this question,
> thereby making the call sites more readable.

Changed the name to has_mapping_error().

>
> > static void check_unmap(struct dma_debug_entry *ref)
> > {
> > struct dma_debug_entry *entry;
> > struct hash_bucket *bucket;
> > unsigned long flags;
> >
> > - if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > + if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > + unmap_errors += 1;
> > err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> > "to free an invalid DMA memory address\n");
> > return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> > dir2name[ref->direction]);
> > }
> >
> > + if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > + err_printk(ref->dev, entry,
> > + "DMA-API: device driver failed to check map error"
> > + "[device address=0x%016llx] [size=%llu bytes] "
> > + "[mapped as %s]",
> > + ref->dev_addr, ref->size,
> > + type2name[entry->type]);
> > + }
>
> It's important that this warning be associated with a backtrace so we
> can identify the offending call site in the usual fashion.
> err_printk() does include a backtrace under some circumstances, but
> those circumstances hurt my brain.
>
> Is it guaranteed that we'll have that backtrace? If not, I'd suggest
> making it so.

Yes it does. Here is one from my test run:

[ 5232.368453] e1000e 0000:00:19.0: DMA-API: device driver failed to
check map error[device address=0x00000000ffe4e1c0] [size=1522 bytes]
[mapped as single]

[ 5232.368546] Pid: 0, comm: swapper/1 Not tainted 3.6.0-next-20121002+
#2
[ 5232.368549] Call Trace:
[ 5232.368553] <IRQ> [<ffffffff81055faf>] warn_slowpath_common
+0x7f/0xc0
[ 5232.368570] [<ffffffff810560a6>] warn_slowpath_fmt+0x46/0x50
[ 5232.368579] [<ffffffff81341fbe>] check_unmap+0x48e/0x860
[ 5232.368588] [<ffffffff81092ee9>] ? check_preempt_wakeup+0x1c9/0x270
[ 5232.368595] [<ffffffff8134251c>] debug_dma_unmap_page+0x5c/0x60
[ 5232.368616] [<ffffffffa0016e4e>] e1000_clean_rx_irq+0x1be/0x410
[e1000e]
[ 5232.368626] [<ffffffff8108c885>] ? wake_up_process+0x15/0x20
[ 5232.368644] [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368653] [<ffffffff8107e539>] ? enqueue_hrtimer+0x39/0xc0
[ 5232.368662] [<ffffffff815756c4>] net_rx_action+0x134/0x240
[ 5232.368670] [<ffffffff813b15c3>] ? arch_local_irq_enable+0x8/0xd
[ 5232.368678] [<ffffffff8105e9d0>] __do_softirq+0xc0/0x240
[ 5232.368688] [<ffffffff816856ce>] ? _raw_spin_lock+0xe/0x20
[ 5232.368697] [<ffffffff8168f31c>] call_softirq+0x1c/0x30
[ 5232.368706] [<ffffffff81016315>] do_softirq+0x65/0xa0
[ 5232.368713] [<ffffffff8105ecae>] irq_exit+0x8e/0xb0
[ 5232.368721] [<ffffffff8168fba3>] do_IRQ+0x63/0xe0
[ 5232.368728] [<ffffffff81685cad>] common_interrupt+0x6d/0x6d
[ 5232.368731] <EOI> [<ffffffff813b15c3>] ? arch_local_irq_enable
+0x8/0xd
[ 5232.368744] [<ffffffff813b2159>] acpi_idle_enter_c1+0x94/0xb9
[ 5232.368752] [<ffffffff81533fd9>] cpuidle_enter+0x19/0x20
[ 5232.368758] [<ffffffff815346ac>] cpuidle_idle_call+0xac/0x290
[ 5232.368767] [<ffffffff8101d59f>] cpu_idle+0xcf/0x120
[ 5232.368775] [<ffffffff81670bc6>] start_secondary+0x1ec/0x1f3
[ 5232.368780] ---[ end trace 58eeb6eaa1eca88f ]---
[ 5232.368782] Mapped at:
[ 5232.368785] [<ffffffff81340fcb>] debug_dma_map_page+0x8b/0x150
[ 5232.368792] [<ffffffffa001823f>] e1000_alloc_rx_buffers+0x15f/0x2d0
[e1000e]
[ 5232.368808] [<ffffffffa0016f8b>] e1000_clean_rx_irq+0x2fb/0x410
[e1000e]
[ 5232.368823] [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368838] [<ffffffff815756c4>] net_rx_action+0x134/0x240

>
> > hash_bucket_del(entry);
> > dma_entry_free(entry);
> >
> >
> > ...
> >
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > + struct dma_debug_entry ref;
> > + struct dma_debug_entry *entry;
> > + struct hash_bucket *bucket;
> > + unsigned long flags;
> > +
> > + if (unlikely(global_disable))
> > + return;
> > +
> > + ref.dev = dev;
> > + ref.dev_addr = dma_addr;
> > + bucket = get_hash_bucket(&ref, &flags);
> > + entry = bucket_find_exact(bucket, &ref);
> > +
> > + if (!entry) {
> > + /* very likley dma-api didn't call debug_dma_map_page() or
> > + debug_dma_map_page() detected mapping error */
> > + if (map_errors_not_checked)
> > + map_errors_not_checked -= 1;
> > + goto out;
> > + }
> > +
> > + entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > + put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
>
> Well, it's a global, exported-to-modules symbol. Some formal
> documentation is appropriate for such things.

I added information on what it does and who can/should call to
DMA-API.txt
>
> >
> > ...
> >
>

2012-10-09 21:02:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors

On Mon, 08 Oct 2012 11:07:20 -0600
Shuah Khan <[email protected]> wrote:

> >
> > Still seems overly complicated to me, but whatev.
> >
> > I think the way to handle this is pretty simple: set a flag in the dma
> > entry when someone runs dma_mapping_error() and, if that flag wasn't
> > set at unmap time, emit a loud warning.
> >
> > From my reading of the code, this patch indeed does that, along with a
> > bunch of other (unnecessary?) stuff. But boy, the changelog conceals
> > this information well!
>
> Are you referring to the system wide error counters when you say
> unnecessary stuff. The reason I added those was to catch errors when
> drivers don't do unmap. Several drivers fail to do unmap. Checking flag
> from unmap debug interfaces, doesn't cover these cases. However, I think
> the value of system wide counters is limited in the sense that they
> don't really identify the driver that needs fixing. In that sense it can
> be deemed unnecessary. I dropped them in v5 patch, which I am sending
> out.

hm. Could we keep a counter of the number of map/unmap calls within
the dma object and then emit a warning if that is non-zero at teardown
time? That should identify the offending driver.

2012-10-10 21:50:37

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors

On Tue, 2012-10-09 at 14:02 -0700, Andrew Morton wrote:
> On Mon, 08 Oct 2012 11:07:20 -0600
> Shuah Khan <[email protected]> wrote:
>
> > >
> > > Still seems overly complicated to me, but whatev.
> > >
> > > I think the way to handle this is pretty simple: set a flag in the dma
> > > entry when someone runs dma_mapping_error() and, if that flag wasn't
> > > set at unmap time, emit a loud warning.
> > >
> > > From my reading of the code, this patch indeed does that, along with a
> > > bunch of other (unnecessary?) stuff. But boy, the changelog conceals
> > > this information well!
> >
> > Are you referring to the system wide error counters when you say
> > unnecessary stuff. The reason I added those was to catch errors when
> > drivers don't do unmap. Several drivers fail to do unmap. Checking flag
> > from unmap debug interfaces, doesn't cover these cases. However, I think
> > the value of system wide counters is limited in the sense that they
> > don't really identify the driver that needs fixing. In that sense it can
> > be deemed unnecessary. I dropped them in v5 patch, which I am sending
> > out.
>
> hm. Could we keep a counter of the number of map/unmap calls within
> the dma object and then emit a warning if that is non-zero at teardown
> time? That should identify the offending driver.
>

Thanks. That would work. Will take a look and see what it takes to
implement.

-- Shuah