2009-06-05 08:34:03

by FUJITA Tomonori

[permalink] [raw]
Subject: [PATCH] dma-debug: disable DMA_API_DEBUG for now

dma-debugs wrongly assumes that no multiple DMA transfers are
performed against the same dma address on one device at the same
time. However it's true only with hardware IOMMUs. For example, an
application can easily send the same buffer twice with different
lengths to one device by using DIO and AIO. If these requests are not
unmapped in the same order in which they were mapped,
hash_bucket_find() finds a wrong entry and gives a false warning.

We should fix this before 2.6.30 release. Seems that there is no
easy way to fix it. I think that it's better to just disable
dma-debug for now.

Torsten Kaiser found this bug with the RAID1 configuration:

http://marc.info/?t=124336541900008&r=1&w=2

Reported-by: Torsten Kaiser <[email protected]>
Signed-off-by: FUJITA Tomonori <[email protected]>
---
arch/sh/Kconfig | 1 -
arch/x86/Kconfig | 1 -
2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index e7390dd..4fb19d7 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -14,7 +14,6 @@ config SUPERH
select HAVE_GENERIC_DMA_COHERENT
select HAVE_IOREMAP_PROT if MMU
select HAVE_ARCH_TRACEHOOK
- select HAVE_DMA_API_DEBUG
help
The SuperH is a RISC processor targeted for use in embedded systems
and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a6efe0a..b3cf490 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -42,7 +42,6 @@ config X86
select HAVE_GENERIC_DMA_COHERENT if X86_32
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select USER_STACKTRACE_SUPPORT
- select HAVE_DMA_API_DEBUG
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
select HAVE_KERNEL_LZMA
--
1.6.0.6


2009-06-05 10:42:18

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now


On Fri, Jun 05, 2009 at 05:33:14PM +0900, FUJITA Tomonori wrote:
> dma-debugs wrongly assumes that no multiple DMA transfers are
> performed against the same dma address on one device at the same
> time. However it's true only with hardware IOMMUs. For example, an
> application can easily send the same buffer twice with different
> lengths to one device by using DIO and AIO. If these requests are not
> unmapped in the same order in which they were mapped,
> hash_bucket_find() finds a wrong entry and gives a false warning.
>
> We should fix this before 2.6.30 release. Seems that there is no
> easy way to fix it. I think that it's better to just disable
> dma-debug for now.
>
> Torsten Kaiser found this bug with the RAID1 configuration:
>
> http://marc.info/?t=124336541900008&r=1&w=2
>

Argh, I never thought that this can happen. But its not explicitly
forbidden so we have to handle this situation. Thanks for tracking down
the bug to this issue.
However, I think there is a somehow simple fix for the issue. Patch is
attached. Its the least intrusive way I can think of to fix this
problem.
But its up to Linus/Ingo to decide if it can be accepted at this very
late point in the cycle. Since dma-debug is new with 2.6.30 it will at
least not introduce any regression. The patch is boot and basically
load-tested with some disk and network load and showed no issues on my
test machine. The diffstat looks big but more than the half of the patch
adds comments to explain what it does. So the diffstat looks bigger than
the actual change.
Linus, if you think my patch is not acceptable at this point then please
apply the patch from FUJITA Tomonori.

Regards,

Joerg

>From c0baa4d43a674c9cc321af3b23a6d5c9b46d65bd Mon Sep 17 00:00:00 2001
From: Joerg Roedel <[email protected]>
Date: Fri, 5 Jun 2009 12:01:35 +0200
Subject: [PATCH] dma-debug: change hash_bucket_find from first-fit to best-fit

Some device drivers map the same physical address multiple times to a
dma address. Without an IOMMU this results in the same dma address being
put into the dma-debug hash multiple times. With a first-fit match in
hash_bucket_find() this function may return the wrong dma_debug_entry.
This can result in false positive warnings. This patch fixes it by
changing the first-fit behavior of hash_bucket_find() into a best-fit
algorithm.

Reported-by: Torsten Kaiser <[email protected]>
Reported-by: FUJITA Tomonori <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
---
lib/dma-debug.c | 43 +++++++++++++++++++++++++++++++++++++++----
1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 69da09a..2b16536 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -185,15 +185,50 @@ static void put_hash_bucket(struct hash_bucket *bucket,
static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
struct dma_debug_entry *ref)
{
- struct dma_debug_entry *entry;
+ struct dma_debug_entry *entry, *ret = NULL;
+ int matches = 0, match_lvl, last_lvl = 0;

list_for_each_entry(entry, &bucket->list, list) {
- if ((entry->dev_addr == ref->dev_addr) &&
- (entry->dev == ref->dev))
+ if ((entry->dev_addr != ref->dev_addr) ||
+ (entry->dev != ref->dev))
+ continue;
+
+ /*
+ * Some drivers map the same physical address multiple
+ * times. Without a hardware IOMMU this results in the
+ * same device addresses being put into the dma-debug
+ * hash multiple times too. This can result in false
+ * positives being reported. Therfore we implement a
+ * best-fit algorithm here which returns the entry from
+ * the hash which fits best to the reference value
+ * instead of the first-fit.
+ */
+ matches += 1;
+ match_lvl = 0;
+ entry->size == ref->size ? ++match_lvl : match_lvl;
+ entry->type == ref->type ? ++match_lvl : match_lvl;
+ entry->direction == ref->direction ? ++match_lvl : match_lvl;
+
+ if (match_lvl == 3) {
+ /* perfect-fit - return the result */
return entry;
+ } else if (match_lvl > last_lvl) {
+ /*
+ * We found an entry that fits better then the
+ * previous one
+ */
+ last_lvl = match_lvl;
+ ret = entry;
+ }
}

- return NULL;
+ /*
+ * If we have multiple matches but no perfect-fit, just return
+ * NULL.
+ */
+ ret = (matches == 1) ? ret : NULL;
+
+ return ret;
}

/*
--
1.6.3.1




--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-05 11:39:27

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, 5 Jun 2009 12:41:33 +0200
Joerg Roedel <[email protected]> wrote:

>
> On Fri, Jun 05, 2009 at 05:33:14PM +0900, FUJITA Tomonori wrote:
> > dma-debugs wrongly assumes that no multiple DMA transfers are
> > performed against the same dma address on one device at the same
> > time. However it's true only with hardware IOMMUs. For example, an
> > application can easily send the same buffer twice with different
> > lengths to one device by using DIO and AIO. If these requests are not
> > unmapped in the same order in which they were mapped,
> > hash_bucket_find() finds a wrong entry and gives a false warning.
> >
> > We should fix this before 2.6.30 release. Seems that there is no
> > easy way to fix it. I think that it's better to just disable
> > dma-debug for now.
> >
> > Torsten Kaiser found this bug with the RAID1 configuration:
> >
> > http://marc.info/?t=124336541900008&r=1&w=2
> >
>
> Argh, I never thought that this can happen. But its not explicitly
> forbidden so we have to handle this situation. Thanks for tracking down
> the bug to this issue.
> However, I think there is a somehow simple fix for the issue. Patch is
> attached. Its the least intrusive way I can think of to fix this
> problem.
> But its up to Linus/Ingo to decide if it can be accepted at this very
> late point in the cycle. Since dma-debug is new with 2.6.30 it will at
> least not introduce any regression. The patch is boot and basically
> load-tested with some disk and network load and showed no issues on my
> test machine. The diffstat looks big but more than the half of the patch
> adds comments to explain what it does. So the diffstat looks bigger than
> the actual change.
> Linus, if you think my patch is not acceptable at this point then please
> apply the patch from FUJITA Tomonori.
>
> Regards,
>
> Joerg
>
> From c0baa4d43a674c9cc321af3b23a6d5c9b46d65bd Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <[email protected]>
> Date: Fri, 5 Jun 2009 12:01:35 +0200
> Subject: [PATCH] dma-debug: change hash_bucket_find from first-fit to best-fit
>
> Some device drivers map the same physical address multiple times to a
> dma address. Without an IOMMU this results in the same dma address being
> put into the dma-debug hash multiple times. With a first-fit match in
> hash_bucket_find() this function may return the wrong dma_debug_entry.
> This can result in false positive warnings. This patch fixes it by
> changing the first-fit behavior of hash_bucket_find() into a best-fit
> algorithm.

I thought about something like this fix however we can do better; with
this fix, we could overlook a bad hardware IOMMU bug.

I think that the better fix can handle both cases per device:

- multiple identical dma addresses should not happen (with devices
behind hardware IOMMU)
- multiple identical dma addresses could happen


It's better to use a strict-fit algorithm (as we do now) with the
former. It's pretty difficult to do something better than a best-fit
algorithm with the latter though.



> Reported-by: Torsten Kaiser <[email protected]>
> Reported-by: FUJITA Tomonori <[email protected]>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> lib/dma-debug.c | 43 +++++++++++++++++++++++++++++++++++++++----
> 1 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 69da09a..2b16536 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -185,15 +185,50 @@ static void put_hash_bucket(struct hash_bucket *bucket,
> static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
> struct dma_debug_entry *ref)
> {
> - struct dma_debug_entry *entry;
> + struct dma_debug_entry *entry, *ret = NULL;
> + int matches = 0, match_lvl, last_lvl = 0;
>
> list_for_each_entry(entry, &bucket->list, list) {
> - if ((entry->dev_addr == ref->dev_addr) &&
> - (entry->dev == ref->dev))
> + if ((entry->dev_addr != ref->dev_addr) ||
> + (entry->dev != ref->dev))
> + continue;
> +
> + /*
> + * Some drivers map the same physical address multiple
> + * times. Without a hardware IOMMU this results in the
> + * same device addresses being put into the dma-debug
> + * hash multiple times too. This can result in false
> + * positives being reported. Therfore we implement a
> + * best-fit algorithm here which returns the entry from
> + * the hash which fits best to the reference value
> + * instead of the first-fit.
> + */
> + matches += 1;
> + match_lvl = 0;
> + entry->size == ref->size ? ++match_lvl : match_lvl;
> + entry->type == ref->type ? ++match_lvl : match_lvl;
> + entry->direction == ref->direction ? ++match_lvl : match_lvl;
> +
> + if (match_lvl == 3) {
> + /* perfect-fit - return the result */
> return entry;
> + } else if (match_lvl > last_lvl) {
> + /*
> + * We found an entry that fits better then the
> + * previous one
> + */
> + last_lvl = match_lvl;
> + ret = entry;
> + }
> }
>
> - return NULL;
> + /*
> + * If we have multiple matches but no perfect-fit, just return
> + * NULL.
> + */
> + ret = (matches == 1) ? ret : NULL;
> +
> + return ret;
> }
>
> /*
> --
> 1.6.3.1
>
>
>
>
> --
> | Advanced Micro Devices GmbH
> Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
> System |
> Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
> Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
> | Registergericht M?nchen, HRB Nr. 43632
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-06-05 12:44:35

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 05, 2009 at 08:38:22PM +0900, FUJITA Tomonori wrote:
> On Fri, 5 Jun 2009 12:41:33 +0200
> Joerg Roedel <[email protected]> wrote:

> I thought about something like this fix however we can do better; with
> this fix, we could overlook a bad hardware IOMMU bug.

I don't buy this yet. Can you explain what kind of hardware IOMMU bug we
can miss here? For the match in my patch it is still strictly necessary
that the dma address matches.

> I think that the better fix can handle both cases per device:
>
> - multiple identical dma addresses should not happen (with devices
> behind hardware IOMMU)
> - multiple identical dma addresses could happen
>
>
> It's better to use a strict-fit algorithm (as we do now) with the
> former. It's pretty difficult to do something better than a best-fit
> algorithm with the latter though.

When we understand the same under 'strict-fit' this will not work I
think. The idea behind dma-debug is to find the cases where the
dma_unmap parameters are not equal to the dma_map parameters. If we use
strict-fit we risk to oversee some sets of these cases because there is
no dma_debug_entry which strictly matches the reference value.

Regards,

Joerg


--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-05 14:58:37

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Friday 05 June 2009, FUJITA Tomonori wrote:
> I think that the better fix can handle both cases per device:
>
> - multiple identical dma addresses should not happen (with devices
> behind hardware IOMMU)
> - multiple identical dma addresses could happen

I guess you could also have the case where for a given range
of addresses, you use a linear mapping and the dma addresses
can be identical, while for other physical addresses you would
rely on address translation.

For example on PowerPC/Cell with infiniband adapters, you can get
linear mapping behavior for DMA_ATTR_WEAK_ORDERING but IOMMU
translation without that flag, for the same device and same
physical address.

Arnd <><

2009-06-05 15:52:36

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 5, 2009 at 12:41 PM, Joerg Roedel <[email protected]> wrote:
> From c0baa4d43a674c9cc321af3b23a6d5c9b46d65bd Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <[email protected]>
> Date: Fri, 5 Jun 2009 12:01:35 +0200
> Subject: [PATCH] dma-debug: change hash_bucket_find from first-fit to best-fit
>
> Some device drivers map the same physical address multiple times to a
> dma address. Without an IOMMU this results in the same dma address being
> put into the dma-debug hash multiple times. With a first-fit match in
> hash_bucket_find() this function may return the wrong dma_debug_entry.
> This can result in false positive warnings. This patch fixes it by
> changing the first-fit behavior of hash_bucket_find() into a best-fit
> algorithm.
>
> Reported-by: Torsten Kaiser <[email protected]>
> Reported-by: FUJITA Tomonori <[email protected]>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> lib/dma-debug.c | 43 +++++++++++++++++++++++++++++++++++++++----
> 1 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/lib/dma-debug.c b/lib/dma-debug.c
> index 69da09a..2b16536 100644
> --- a/lib/dma-debug.c
> +++ b/lib/dma-debug.c
> @@ -185,15 +185,50 @@ static void put_hash_bucket(struct hash_bucket *bucket,
> static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
> struct dma_debug_entry *ref)
> {
> - struct dma_debug_entry *entry;
> + struct dma_debug_entry *entry, *ret = NULL;
> + int matches = 0, match_lvl, last_lvl = 0;
>
> list_for_each_entry(entry, &bucket->list, list) {
> - if ((entry->dev_addr == ref->dev_addr) &&
> - (entry->dev == ref->dev))
> + if ((entry->dev_addr != ref->dev_addr) ||
> + (entry->dev != ref->dev))
> + continue;
> +
> + /*
> + * Some drivers map the same physical address multiple
> + * times. Without a hardware IOMMU this results in the
> + * same device addresses being put into the dma-debug
> + * hash multiple times too. This can result in false
> + * positives being reported. Therfore we implement a
> + * best-fit algorithm here which returns the entry from
> + * the hash which fits best to the reference value
> + * instead of the first-fit.
> + */
> + matches += 1;
> + match_lvl = 0;
> + entry->size == ref->size ? ++match_lvl : match_lvl;
> + entry->type == ref->type ? ++match_lvl : match_lvl;
> + entry->direction == ref->direction ? ++match_lvl : match_lvl;
> +
> + if (match_lvl == 3) {
> + /* perfect-fit - return the result */
> return entry;
> + } else if (match_lvl > last_lvl) {
> + /*
> + * We found an entry that fits better then the
> + * previous one
> + */
> + last_lvl = match_lvl;
> + ret = entry;
> + }
> }
>
> - return NULL;
> + /*
> + * If we have multiple matches but no perfect-fit, just return
> + * NULL.
> + */
> + ret = (matches == 1) ? ret : NULL;

This doesn't look right to me.
The comment above says "returns the entry from the hash which fits
best", but this will always return NULL, if there are are multiple
entrys, but no perfect match.

On a wrong unmap that would result in "DMA-API: device driver tries to
free DMA memory it has not allocated" instead of a report what kind of
mismatches happend.

Or did you mean to return NULL if there are more then one matches at
the last_lvl? Then a matches=1; is missing from the "found an entry"
block.

Should there be a warning if more then one possible match were found?

* driver maps address 'a' with size 1
* driver maps same address 'a' with size 2
* driver wrongly unmaps the second allocation with size 1
-> no warning, because the first allocation is returned
* driver wants to correctly unmap the first allocation
-> wrong warning about this unmap because size mismatch

Also what about sg_call_ents and sg_mapped_ents?
Could it be possible to get the same address/size with different sg-counts.

As in my case most of the warnings where about a wrong unmap count and
only a few about the memory size, that seems possible.

For reference, here part of the warnings from my system:

Wrong count:
Jun 5 03:32:58 treogen [ 51.469720] sata_sil24 0000:04:00.0:
DMA-API: device driver frees DMA sg list with different entry count
[map count=1] [unmap count=2]
Jun 5 03:32:58 treogen [ 51.469725] Modules linked in: msp3400
tuner tea5767 tda8290 tuner_xc2028 xc
5000 tda9887 tuner_simple tuner_types mt20xx tea5761 bttv ir_common
v4l2_common videodev v4l1_compat v4
l2_compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom sg pata_amd
Jun 5 03:32:58 treogen [ 51.469771] Pid: 0, comm: swapper Not
tainted 2.6.30-rc7 #1
Jun 5 03:32:58 treogen [ 51.469775] Call Trace:
Jun 5 03:32:58 treogen [ 51.469779] <IRQ> [<ffffffff8041755e>] ?
check_unmap+0x65e/0x6a0
Jun 5 03:32:58 treogen [ 51.469797] [<ffffffff802432d8>]
warn_slowpath_common+0x78/0xd0
Jun 5 03:32:58 treogen [ 51.469804] [<ffffffff802433b4>]
warn_slowpath_fmt+0x64/0x70
Jun 5 03:32:58 treogen [ 51.469816] [<ffffffff8028dd42>] ?
mempool_free_slab+0x12/0x20
Jun 5 03:32:58 treogen [ 51.469828] [<ffffffff8068d74d>] ?
_spin_lock_irqsave+0x1d/0x40
Jun 5 03:32:58 treogen [ 51.469835] [<ffffffff8041755e>]
check_unmap+0x65e/0x6a0
Jun 5 03:32:58 treogen [ 51.469842] [<ffffffff804176ae>]
debug_dma_unmap_sg+0x10e/0x1b0

Wrong size:
Jun 3 06:50:33 treogen [ 276.421432] sata_sil24 0000:04:00.0:
DMA-API: device driver frees DMA memory with different size [device
address=0x000000007b590000] [map size=16384 bytes] [unmap size=12288
bytes]
Jun 3 06:50:33 treogen [
276.421438] Modules linked in: msp3400 tuner tea5767 tda8290
tuner_xc2028 xc5000 tda9887 tuner_simple tuner_types mt20xx tea5761
bttv ir_common v4l2_common videodev v4l1_compat v4l2_compat_ioctl32
videobuf_dma_sg videobuf_core btcx_risc sg pata_amd tveeprom
Jun 3 06:50:33 treogen [ 276.421480] Pid: 1301, comm: kcryptd Not
tainted 2.6.30-rc7 #1
Jun 3 06:50:33 treogen [ 276.421485] Call Trace:
Jun 3 06:50:33 treogen [ 276.421490] <IRQ> [<ffffffff80417355>] ?
check_unmap+0x455/0x6a0
Jun 3 06:50:33 treogen [ 276.421507] [<ffffffff802432d8>]
warn_slowpath_common+0x78/0xd0
Jun 3 06:50:33 treogen [ 276.421514] [<ffffffff802433b4>]
warn_slowpath_fmt+0x64/0x70
Jun 3 06:50:33 treogen [ 276.421524] [<ffffffff8028dd42>] ?
mempool_free_slab+0x12/0x20
Jun 3 06:50:33 treogen [ 276.421535] [<ffffffff8068d74d>] ?
_spin_lock_irqsave+0x1d/0x40
Jun 3 06:50:33 treogen [ 276.421542] [<ffffffff80417355>]
check_unmap+0x455/0x6a0
Jun 3 06:50:33 treogen [ 276.421549] [<ffffffff804176ae>]
debug_dma_unmap_sg+0x10e/0x1b0

> +
> + return ret;
> }
>
> /*
> --
> 1.6.3.1
>
>
>
>
> --
> | Advanced Micro Devices GmbH
> Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
> System |
> Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
> Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
> | Registergericht M?nchen, HRB Nr. 43632
>
>

2009-06-05 18:20:33

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 05, 2009 at 05:52:27PM +0200, Torsten Kaiser wrote:
>
> This doesn't look right to me.
> The comment above says "returns the entry from the hash which fits
> best", but this will always return NULL, if there are are multiple
> entrys, but no perfect match.

This is intentional. If there is no perfect-fit there is no way to tell
which entry was meant. So we potentially report wrong errors with a
wrong mapping backtrace which confuses even more than the wrong
"DMA-API: device driver tries to free DMA memory it has not allocated".

> Should there be a warning if more then one possible match were found?

True. That would be better. But I tried to keep the code change as small
as possible without disabling the feature completly.

> * driver maps address 'a' with size 1
> * driver maps same address 'a' with size 2
> * driver wrongly unmaps the second allocation with size 1
> -> no warning, because the first allocation is returned

Hmm, I am not sure if we can handle this situation correctly in the
dma-debug code. There is no unique key to identify a mapping request
which allows to assign an unmap request to it. Currently dma-debug uses
device and dma-address. But that seems not to be sufficient. The
best-fit algorithm is also a but fuzzy of course.

> * driver wants to correctly unmap the first allocation
> -> wrong warning about this unmap because size mismatch

Ok, at least we get a warning about a bug. Not very useful because it
reports the wrong bug. Is this the situation which triggered the
original bug report?

> Also what about sg_call_ents and sg_mapped_ents?
> Could it be possible to get the same address/size with different sg-counts.

It looks not forbidden in the API. So I guess this can happen too.

Regards,

Joerg

2009-06-05 20:25:15

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 5, 2009 at 8:20 PM, Joerg Roedel <[email protected]> wrote:
> On Fri, Jun 05, 2009 at 05:52:27PM +0200, Torsten Kaiser wrote:
>>
>> This doesn't look right to me.
>> The comment above says "returns the entry from the hash which fits
>> best", but this will always return NULL, if there are are multiple
>> entrys, but no perfect match.
>
> This is intentional. If there is no perfect-fit there is no way to tell
> which entry was meant. So we potentially report wrong errors with a
> wrong mapping backtrace which confuses even more than the wrong
> "DMA-API: device driver tries to free DMA memory it has not allocated".

Ah, OK.
I thought that reporting a (maybe) wrong difference in map/unmap would
be better that a definitely wrong message "freed not allocated
memory". And if the fuzzy matcher would warn about the fuzzy match, a
developer would know that this difference might be wrong.

>> Should there be a warning if more then one possible match were found?
>
> True. That would be better. But I tried to keep the code change as small
> as possible without disabling the feature completly.

OK, it just looked like the code did something else the comment said.

>> * driver maps address 'a' with size 1
>> * driver maps same address 'a' with size 2
>> * driver wrongly unmaps the second allocation with size 1
>> -> no warning, because the first allocation is returned
>
> Hmm, I am not sure if we can handle this situation correctly in the
> dma-debug code. There is no unique key to identify a mapping request
> which allows to assign an unmap request to it. Currently dma-debug uses
> device and dma-address. But that seems not to be sufficient. The
> best-fit algorithm is also a but fuzzy of course.

Yes, I just read pci-gart_64.c to see if there is a better key.
The API always seems to include size and direction too.

Would it work to to use all device, address, size and direction as
key, as the proposes exact matcher does?
This would obvious not work with the current diagnostics in
check_unmap, as not even single near matches would be returned.
But if you would move the diagnostics from check_unmap into
hash_bucket_find it could work:
If hash_bucket_find does not find a perfect match, it could output:
* no match -> tries to free DMA memory it has not allocated
* 1 match -> warn about any differences (size, type, cpu-address, sg
list count, direction)
* 2+ matches, with perfect match -> no warning; set flag in other matches.
* 2+ matches, without perfect match -> list differences from all
matches; set flag in other matches after returning the best

If differences from a flagged match are output, add a note, that this
warning is unreliable because of these possible map/unmap mismatches.

>> * driver wants to correctly unmap the first allocation
>> -> wrong warning about this unmap because size mismatch
>
> Ok, at least we get a warning about a bug. Not very useful because it
> reports the wrong bug. Is this the situation which triggered the
> original bug report?

No, the original report was with a correctly working driver that just
confused the dma-debugging code into issuing wrong warnings:
http://marc.info/?l=linux-kernel&m=124336530609750&w=2

This constructed case was just me trying to find more evil cases that
are currently not covered.

Your patch would blame the correctly programmed second unmap for
"freeing unallocated memory",
my proposal would blame it for the size difference, but with a note
that there was a concurrent second map at the same address.

And in a case where the code maps the same address twice but wants to
unmap it wrong, it would correctly output both possible, but
mismatching maps, instead of the wrong "unallocated memory" warning.

>> Also what about sg_call_ents and sg_mapped_ents?
>> Could it be possible to get the same address/size with different sg-counts.
>
> It looks not forbidden in the API. So I guess this can happen too.

Should it then be added to the fuzzy matcher?
Otherwise I would except the warnings like "DMA-API: device driver
frees DMA sg list with different entry count [map count=2] [unmap
count=1]" to still trigger. (I didn't have the time to test your patch
yet)

Torsten

2009-06-05 22:11:26

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 5, 2009 at 8:20 PM, Joerg Roedel <[email protected]> wrote:
> On Fri, Jun 05, 2009 at 05:52:27PM +0200, Torsten Kaiser wrote:
>> Should there be a warning if more then one possible match were found?
>
> True. That would be better. But I tried to keep the code change as small
> as possible without disabling the feature completly.
>
>> * driver maps address 'a' with size 1
>> * driver maps same address 'a' with size 2
>> * driver wrongly unmaps the second allocation with size 1
>> -> no warning, because the first allocation is returned
>
> Hmm, I am not sure if we can handle this situation correctly in the
> dma-debug code. There is no unique key to identify a mapping request
> which allows to assign an unmap request to it. Currently dma-debug uses
> device and dma-address. But that seems not to be sufficient. The
> best-fit algorithm is also a but fuzzy of course.

Maybe we just shouldn't try to handle it at all:

static void add_dma_entry(struct dma_debug_entry *entry)
{
struct hash_bucket *bucket;
unsigned long flags;

bucket = get_hash_bucket(entry, &flags);
if(hash_bucket_find(bucket, entry)) {
printk(KERN_ERR "DMA-API: device mapped same address twice, "
"this use case cannot be handled currently -
disabling debugging\n");
global_disable = true;
}
hash_bucket_add(bucket, entry);
put_hash_bucket(bucket, &flags);
}

This would allow this feature to remain for most cased, but would also
prevent all false warnings.

Torsten

2009-06-07 08:13:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now


* Joerg Roedel <[email protected]> wrote:

>
> On Fri, Jun 05, 2009 at 05:33:14PM +0900, FUJITA Tomonori wrote:
> > dma-debugs wrongly assumes that no multiple DMA transfers are
> > performed against the same dma address on one device at the same
> > time. However it's true only with hardware IOMMUs. For example, an
> > application can easily send the same buffer twice with different
> > lengths to one device by using DIO and AIO. If these requests are not
> > unmapped in the same order in which they were mapped,
> > hash_bucket_find() finds a wrong entry and gives a false warning.
> >
> > We should fix this before 2.6.30 release. Seems that there is no
> > easy way to fix it. I think that it's better to just disable
> > dma-debug for now.
> >
> > Torsten Kaiser found this bug with the RAID1 configuration:
> >
> > http://marc.info/?t=124336541900008&r=1&w=2
> >
>
> Argh, I never thought that this can happen. But its not explicitly
> forbidden so we have to handle this situation. Thanks for tracking
> down the bug to this issue.
>
> However, I think there is a somehow simple fix for the issue.
> Patch is attached. Its the least intrusive way I can think of to
> fix this problem.
>
> But its up to Linus/Ingo to decide if it can be accepted at this
> very late point in the cycle. Since dma-debug is new with 2.6.30
> it will at least not introduce any regression. [...]

I think it's too late for v2.6.30 to do any of the changes - and the
DMA debug facility is off by default.

Also, i think such DMA patterns, while 'allowed' can be quite
dangerous as its such a rare usage combination really. AIO and DIO
are crazy to begin with, mixing AIO and DIO for the same buffer is
madness square two. (It can result in 3 agents for the same memory
address: CPU, dma1 and dma2. How many interesting chipset erratums
could there be related to such scenarios?)

But it is certainly not the task of a debug facility to restrict
existing user-visible ABIs, so fixing the false positive is correct.

So i've applied your fix to the iommu branch for v2.6.31 and marked
it for -stable backporting, that way 2.6.30.1 will be able to pick
the patch up (if it remains problem-free in testing).

Thanks,

Ingo

2009-06-07 08:22:14

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Sun, Jun 7, 2009 at 10:13 AM, Ingo Molnar <[email protected]> wrote:
> * Joerg Roedel <[email protected]> wrote:
>> But its up to Linus/Ingo to decide if it can be accepted at this
>> very late point in the cycle. Since dma-debug is new with 2.6.30
>> it will at least not introduce any regression. [...]
>
> I think it's too late for v2.6.30 to do any of the changes - and the
> DMA debug facility is off by default.
>
> Also, i think such DMA patterns, while 'allowed' can be quite
> dangerous as its such a rare usage combination really. AIO and DIO
> are crazy to begin with, mixing AIO and DIO for the same buffer is
> madness square two. (It can result in 3 agents for the same memory
> address: CPU, dma1 and dma2. How many interesting chipset erratums
> could there be related to such scenarios?)

I think in my case the cause is somewhat simpler: RAID1
My root filesystem is on a RAID1 that consists of two disks, both
connected to the sata_sil24 controller.
So it is only natural for the md driver to issue two dma read requests
for the same address: one for each drive.

Torsten

2009-06-07 10:23:33

by Joerg Roedel

[permalink] [raw]
Subject: [tip:core/iommu] dma-debug: change hash_bucket_find from first-fit to best-fit

Commit-ID: 7caf6a49bb17d0377210693af5737563b31aa5ee
Gitweb: http://git.kernel.org/tip/7caf6a49bb17d0377210693af5737563b31aa5ee
Author: Joerg Roedel <[email protected]>
AuthorDate: Fri, 5 Jun 2009 12:01:35 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 7 Jun 2009 10:04:53 +0200

dma-debug: change hash_bucket_find from first-fit to best-fit

Some device drivers map the same physical address multiple times to a
dma address. Without an IOMMU this results in the same dma address being
put into the dma-debug hash multiple times. With a first-fit match in
hash_bucket_find() this function may return the wrong dma_debug_entry.

This can result in false positive warnings. This patch fixes it by
changing the first-fit behavior of hash_bucket_find() into a best-fit
algorithm.

Reported-by: Torsten Kaiser <[email protected]>
Reported-by: FUJITA Tomonori <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: FUJITA Tomonori <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
lib/dma-debug.c | 43 +++++++++++++++++++++++++++++++++++++++----
1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index cdd205d..8fcc09c 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -186,15 +186,50 @@ static void put_hash_bucket(struct hash_bucket *bucket,
static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
struct dma_debug_entry *ref)
{
- struct dma_debug_entry *entry;
+ struct dma_debug_entry *entry, *ret = NULL;
+ int matches = 0, match_lvl, last_lvl = 0;

list_for_each_entry(entry, &bucket->list, list) {
- if ((entry->dev_addr == ref->dev_addr) &&
- (entry->dev == ref->dev))
+ if ((entry->dev_addr != ref->dev_addr) ||
+ (entry->dev != ref->dev))
+ continue;
+
+ /*
+ * Some drivers map the same physical address multiple
+ * times. Without a hardware IOMMU this results in the
+ * same device addresses being put into the dma-debug
+ * hash multiple times too. This can result in false
+ * positives being reported. Therfore we implement a
+ * best-fit algorithm here which returns the entry from
+ * the hash which fits best to the reference value
+ * instead of the first-fit.
+ */
+ matches += 1;
+ match_lvl = 0;
+ entry->size == ref->size ? ++match_lvl : match_lvl;
+ entry->type == ref->type ? ++match_lvl : match_lvl;
+ entry->direction == ref->direction ? ++match_lvl : match_lvl;
+
+ if (match_lvl == 3) {
+ /* perfect-fit - return the result */
return entry;
+ } else if (match_lvl > last_lvl) {
+ /*
+ * We found an entry that fits better then the
+ * previous one
+ */
+ last_lvl = match_lvl;
+ ret = entry;
+ }
}

- return NULL;
+ /*
+ * If we have multiple matches but no perfect-fit, just return
+ * NULL.
+ */
+ ret = (matches == 1) ? ret : NULL;
+
+ return ret;
}

/*

2009-06-07 10:45:55

by FUJITA Tomonori

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Sun, 7 Jun 2009 10:13:05 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Joerg Roedel <[email protected]> wrote:
>
> >
> > On Fri, Jun 05, 2009 at 05:33:14PM +0900, FUJITA Tomonori wrote:
> > > dma-debugs wrongly assumes that no multiple DMA transfers are
> > > performed against the same dma address on one device at the same
> > > time. However it's true only with hardware IOMMUs. For example, an
> > > application can easily send the same buffer twice with different
> > > lengths to one device by using DIO and AIO. If these requests are not
> > > unmapped in the same order in which they were mapped,
> > > hash_bucket_find() finds a wrong entry and gives a false warning.
> > >
> > > We should fix this before 2.6.30 release. Seems that there is no
> > > easy way to fix it. I think that it's better to just disable
> > > dma-debug for now.
> > >
> > > Torsten Kaiser found this bug with the RAID1 configuration:
> > >
> > > http://marc.info/?t=124336541900008&r=1&w=2
> > >
> >
> > Argh, I never thought that this can happen. But its not explicitly
> > forbidden so we have to handle this situation. Thanks for tracking
> > down the bug to this issue.
> >
> > However, I think there is a somehow simple fix for the issue.
> > Patch is attached. Its the least intrusive way I can think of to
> > fix this problem.
> >
> > But its up to Linus/Ingo to decide if it can be accepted at this
> > very late point in the cycle. Since dma-debug is new with 2.6.30
> > it will at least not introduce any regression. [...]
>
> I think it's too late for v2.6.30 to do any of the changes - and the
> DMA debug facility is off by default.

Fedora enables it by default and seems that we got some bogus bug
reports. I like not to see any bogus bug reports about v2.6.30. So I
prefer to disable dma-debug feature.


> Also, i think such DMA patterns, while 'allowed' can be quite
> dangerous as its such a rare usage combination really. AIO and DIO
> are crazy to begin with, mixing AIO and DIO for the same buffer is
> madness square two. (It can result in 3 agents for the same memory
> address: CPU, dma1 and dma2. How many interesting chipset erratums
> could there be related to such scenarios?)

As Torsten pointed out, this bug happens on common and sane
configurations. And if you want to write the same contents to two
places on disk, AIO and DIO is a reasonable solution, I think.


> But it is certainly not the task of a debug facility to restrict
> existing user-visible ABIs, so fixing the false positive is correct.
>
> So i've applied your fix to the iommu branch for v2.6.31 and marked
> it for -stable backporting, that way 2.6.30.1 will be able to pick
> the patch up (if it remains problem-free in testing).
>

Joerg patch weakens the checking capability. IMHO, it's the wrong
direction.

2009-06-10 20:42:06

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 5, 2009 at 12:41 PM, Joerg Roedel<[email protected]> wrote:
> From c0baa4d43a674c9cc321af3b23a6d5c9b46d65bd Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <[email protected]>
> Date: Fri, 5 Jun 2009 12:01:35 +0200
> Subject: [PATCH] dma-debug: change hash_bucket_find from first-fit to best-fit
>
> Some device drivers map the same physical address multiple times to a
> dma address. Without an IOMMU this results in the same dma address being
> put into the dma-debug hash multiple times. With a first-fit match in
> hash_bucket_find() this function may return the wrong dma_debug_entry.
> This can result in false positive warnings. This patch fixes it by
> changing the first-fit behavior of hash_bucket_find() into a best-fit
> algorithm.
>
> Reported-by: Torsten Kaiser <[email protected]>
> Reported-by: FUJITA Tomonori <[email protected]>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> lib/dma-debug.c | 43 +++++++++++++++++++++++++++++++++++++++----
> 1 files changed, 39 insertions(+), 4 deletions(-)

I applied this patch to the just released 2.6.30, but it does not fix
the false warning on my System.

Jun 10 21:10:14 treogen [ 2611.715341] ------------[ cut here ]------------
Jun 10 21:10:14 treogen [ 2611.715359] WARNING: at lib/dma-debug.c:565
check_unmap+0x536/0x620()
Jun 10 21:10:14 treogen [ 2611.715363] Hardware name: KFN5-D SLI
Jun 10 21:10:14 treogen [ 2611.715369] sata_sil24 0000:04:00.0:
DMA-API: device driver frees DMA sg list with different entry count
[map count=2] [unmap count=1]
Jun 10 21:10:14 treogen [ 2611.715374] Modules linked in: msp3400
tuner tea5767 tda8290 tuner_xc2028 xc5000 tda9887 tuner_simple
tuner_types mt20xx tea5761 bttv ir_common v4l2_common videodev
v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg videobuf_core
btcx_risc sg pata_amd tveeprom
Jun 10 21:10:14 treogen [ 2611.715416] Pid: 0, comm: swapper Not
tainted 2.6.30 #1
Jun 10 21:10:14 treogen [ 2611.715420] Call Trace:
Jun 10 21:10:14 treogen [ 2611.715424] <IRQ> [<ffffffff804175c6>] ?
check_unmap+0x536/0x620
Jun 10 21:10:14 treogen [ 2611.715441] [<ffffffff80243308>]
warn_slowpath_common+0x78/0xd0
Jun 10 21:10:14 treogen [ 2611.715448] [<ffffffff802433e4>]
warn_slowpath_fmt+0x64/0x70
Jun 10 21:10:14 treogen [ 2611.715458] [<ffffffff8057c5d8>] ?
dec_pending+0x88/0x170
Jun 10 21:10:14 treogen [ 2611.715465] [<ffffffff8057c8bf>] ?
clone_endio+0x9f/0xd0
Jun 10 21:10:14 treogen [ 2611.715475] [<ffffffff8068e52d>] ?
_spin_lock_irqsave+0x1d/0x40
Jun 10 21:10:14 treogen [ 2611.715482] [<ffffffff804175c6>]
check_unmap+0x536/0x620
Jun 10 21:10:14 treogen [ 2611.715493] [<ffffffff8028de5a>] ?
mempool_free+0x8a/0xa0
Jun 10 21:10:14 treogen [ 2611.715501] [<ffffffff804177bd>]
debug_dma_unmap_sg+0x10d/0x190
Jun 10 21:10:14 treogen [ 2611.715510] [<ffffffff802c0e55>] ?
__slab_free+0x185/0x340
Jun 10 21:10:14 treogen [ 2611.715519] [<ffffffff804c1121>] ?
__scsi_put_command+0x61/0xa0
Jun 10 21:10:14 treogen [ 2611.715528] [<ffffffff804d4258>]
ata_sg_clean+0x78/0xf0
Jun 10 21:10:14 treogen [ 2611.715535] [<ffffffff804d4305>]
__ata_qc_complete+0x35/0x110
Jun 10 21:10:14 treogen [ 2611.715544] [<ffffffff804c7b88>] ?
scsi_io_completion+0x398/0x530
Jun 10 21:10:14 treogen [ 2611.715551] [<ffffffff804d449d>]
ata_qc_complete+0xbd/0x250
Jun 10 21:10:14 treogen [ 2611.715559] [<ffffffff804d49db>]
ata_qc_complete_multiple+0xab/0xf0
Jun 10 21:10:14 treogen [ 2611.715568] [<ffffffff804ea289>]
sil24_interrupt+0xb9/0x5b0
Jun 10 21:10:14 treogen [ 2611.715576] [<ffffffff802614cc>] ?
getnstimeofday+0x5c/0xf0
Jun 10 21:10:14 treogen [ 2611.715584] [<ffffffff8025d349>] ?
ktime_get_ts+0x59/0x60
Jun 10 21:10:14 treogen [ 2611.715593] [<ffffffff802730b0>]
handle_IRQ_event+0x70/0x180
Jun 10 21:10:14 treogen [ 2611.715601] [<ffffffff802753bd>]
handle_fasteoi_irq+0x6d/0xe0
Jun 10 21:10:14 treogen [ 2611.715609] [<ffffffff8020e42f>]
handle_irq+0x1f/0x30
Jun 10 21:10:14 treogen [ 2611.715614] [<ffffffff8020db7a>] do_IRQ+0x6a/0xf0
Jun 10 21:10:14 treogen [ 2611.715623] [<ffffffff8020be53>]
ret_from_intr+0x0/0xf
Jun 10 21:10:14 treogen [ 2611.715627] <EOI> [<ffffffff80213657>] ?
default_idle+0x77/0xe0
Jun 10 21:10:14 treogen [ 2611.715640] [<ffffffff80213655>] ?
default_idle+0x75/0xe0
Jun 10 21:10:14 treogen [ 2611.715648] [<ffffffff8025e3bf>] ?
notifier_call_chain+0x3f/0x80
Jun 10 21:10:14 treogen [ 2611.715655] [<ffffffff802136f8>] ?
c1e_idle+0x38/0x110
Jun 10 21:10:14 treogen [ 2611.715663] [<ffffffff8020a71e>] ?
cpu_idle+0x6e/0xd0
Jun 10 21:10:14 treogen [ 2611.715672] [<ffffffff8067bb8d>] ?
rest_init+0x6d/0x80
Jun 10 21:10:14 treogen [ 2611.715682] [<ffffffff80929cc5>] ?
start_kernel+0x35a/0x422
Jun 10 21:10:14 treogen [ 2611.715690] [<ffffffff80929289>] ?
x86_64_start_reservations+0x99/0xb9
Jun 10 21:10:14 treogen [ 2611.715697] [<ffffffff80929389>] ?
x86_64_start_kernel+0xe0/0xf2
Jun 10 21:10:14 treogen [ 2611.715702] ---[ end trace db81115dbc8b11c5 ]---
Jun 10 21:10:14 treogen [ 2611.715706] Mapped at:
Jun 10 21:10:14 treogen [ 2611.715709] [<ffffffff80416ef9>]
debug_dma_map_sg+0x159/0x180
Jun 10 21:10:14 treogen [ 2611.715717] [<ffffffff804d47cc>]
ata_qc_issue+0x19c/0x300
Jun 10 21:10:14 treogen [ 2611.715724] [<ffffffff804da6c8>]
ata_scsi_translate+0xa8/0x180
Jun 10 21:10:14 treogen [ 2611.715733] [<ffffffff804dd261>]
ata_scsi_queuecmd+0xb1/0x2d0
Jun 10 21:10:14 treogen [ 2611.715739] [<ffffffff804c0f23>]
scsi_dispatch_cmd+0xe3/0x220

So I implemented a check that would turn of the DMA-API debugging, if
such a double mapping was observed.
The resulting place where this was triggered also verified that is
seems to be the RAID1 that is causing these mappings:

Jun 10 21:39:30 treogen [ 9.251818] md: Autodetecting RAID arrays.
Jun 10 21:39:30 treogen [ 9.334044] md: Scanned 5 and added 5 devices.
Jun 10 21:39:30 treogen [ 9.334048] md: autorun ...
Jun 10 21:39:30 treogen [ 9.334052] md: considering sdb3 ...
Jun 10 21:39:30 treogen [ 9.334065] md: adding sdb3 ...
Jun 10 21:39:30 treogen [ 9.334073] md: sdb1 has different UUID to sdb3
Jun 10 21:39:30 treogen [ 9.334081] md: sdc2 has different UUID to sdb3
Jun 10 21:39:30 treogen [ 9.334091] md: adding sda3 ...
Jun 10 21:39:30 treogen [ 9.334101] md: sda1 has different UUID to sdb3
Jun 10 21:39:30 treogen [ 9.334111] md: created md3
Jun 10 21:39:30 treogen [ 9.334115] md: bind<sda3>
Jun 10 21:39:30 treogen [ 9.334149] md: bind<sdb3>
Jun 10 21:39:30 treogen [ 9.334166] md: running: <sdb3><sda3>
Jun 10 21:39:30 treogen [ 9.346111] raid1: raid set md3 active with
2 out of 2 mirrors
Jun 10 21:39:30 treogen [ 9.354255] md3: bitmap initialized from
disk: read 10/10 pages, set 24 bits
Jun 10 21:39:30 treogen [ 9.354260] created bitmap (145 pages) for device md3
Jun 10 21:39:30 treogen [ 9.354399] DMA-API: device mapped same
address twice, this use case cannot be handled currently - disabling
debugging
Jun 10 21:39:30 treogen [ 9.359858] md: considering sdb1 ...
Jun 10 21:39:30 treogen [ 9.359874] md: adding sdb1 ...
Jun 10 21:39:30 treogen [ 9.359882] md: sdc2 has different UUID to sdb1

Here is the patch, I was using. I hope Gmail does mangle it to badly...

From: Torsten Kaiser <[email protected]>

The DMA-API allowes a device to map the same addresse multiple times.
If this happens (for example with md-raid1) the DMA-API debugging code can no
longer associate the map and unmap request with 100% confidence, as the keys
collide.
If the wrong entry gets returned on unmap, this can trigger bogus warnings about
mismatching parameters. This was visible on my system.
As such double mappings seem to be rather uncommon, only disable the DMA-API
if such a mapping is observed. This way all wrong warnings can be
prevented without
removing this debugging tool.

Possible enhancement: If the returned entry is 100% identical to the
entry that needs
to be added, it would be irrelevant which entry will be returned on
unmap. So could
allow identical entries into the hash list without turning itself off.

Signed-off-by: Torsten Kaiser <[email protected]
---
--- lib/dma-debug.c.orig 2009-06-10 21:22:53.371813622 +0200
+++ lib/dma-debug.c 2009-06-10 21:26:40.892282983 +0200
@@ -253,6 +253,12 @@ static void add_dma_entry(struct dma_deb
unsigned long flags;

bucket = get_hash_bucket(entry, &flags);
+ if (hash_bucket_find(bucket, entry)) {
+ printk(KERN_ERR "DMA-API: device mapped same address twice, "
+ "this use case cannot be handled currently "
+ "- disabling debugging\n");
+ global_disable = true;
+ }
hash_bucket_add(bucket, entry);
put_hash_bucket(bucket, &flags);
}

2009-06-11 08:11:04

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Wed, Jun 10, 2009 at 10:41:53PM +0200, Torsten Kaiser wrote:
> I applied this patch to the just released 2.6.30, but it does not fix
> the false warning on my System.
>
> Jun 10 21:10:14 treogen [ 2611.715341] ------------[ cut here ]------------
> Jun 10 21:10:14 treogen [ 2611.715359] WARNING: at lib/dma-debug.c:565
> check_unmap+0x536/0x620()
> Jun 10 21:10:14 treogen [ 2611.715363] Hardware name: KFN5-D SLI
> Jun 10 21:10:14 treogen [ 2611.715369] sata_sil24 0000:04:00.0:
> DMA-API: device driver frees DMA sg list with different entry count [map count=2] [unmap count=1]

Ok, thats because we need also to check for sg_call_ents in the best-fit
checks. Can you please test if you can reproduce it with the attached
patch?

>From 1e83c7eab546314ad9dbe08602d243bb83e93b50 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <[email protected]>
Date: Thu, 11 Jun 2009 10:03:42 +0200
Subject: [PATCH] dma-debug: check for sg_call_ents in best-fit algorithm too

If we don't check for sg_call_ents the hash_bucket_find function might
still return the wrong dma_debug_entry for sg mappings.

Signed-off-by: Joerg Roedel <[email protected]>
---
lib/dma-debug.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index ad65fc0..eb86ec3 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -262,11 +262,12 @@ static struct dma_debug_entry *hash_bucket_find(struct hash_bucket *bucket,
*/
matches += 1;
match_lvl = 0;
- entry->size == ref->size ? ++match_lvl : match_lvl;
- entry->type == ref->type ? ++match_lvl : match_lvl;
- entry->direction == ref->direction ? ++match_lvl : match_lvl;
+ entry->size == ref->size ? ++match_lvl : 0;
+ entry->type == ref->type ? ++match_lvl : 0;
+ entry->direction == ref->direction ? ++match_lvl : 0;
+ entry->sg_call_ents == ref->sg_call_ents ? ++match_lvl : 0;

- if (match_lvl == 3) {
+ if (match_lvl == 4) {
/* perfect-fit - return the result */
return entry;
} else if (match_lvl > last_lvl) {
--
1.6.3.1


--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-11 17:38:58

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Thu, Jun 11, 2009 at 10:10 AM, Joerg Roedel<[email protected]> wrote:
> On Wed, Jun 10, 2009 at 10:41:53PM +0200, Torsten Kaiser wrote:
>> I applied this patch to the just released 2.6.30, but it does not fix
>> the false warning on my System.
>>
>> Jun 10 21:10:14 treogen [ 2611.715341] ------------[ cut here ]------------
>> Jun 10 21:10:14 treogen [ 2611.715359] WARNING: at lib/dma-debug.c:565
>> check_unmap+0x536/0x620()
>> Jun 10 21:10:14 treogen [ 2611.715363] Hardware name: KFN5-D SLI
>> Jun 10 21:10:14 treogen [ 2611.715369] sata_sil24 0000:04:00.0:
>> DMA-API: device driver frees DMA sg list with different entry count [map count=2] [unmap count=1]
>
> Ok, thats because we need also to check for sg_call_ents in the best-fit
> checks. Can you please test if you can reproduce it with the attached
> patch?
>
> From 1e83c7eab546314ad9dbe08602d243bb83e93b50 Mon Sep 17 00:00:00 2001
> From: Joerg Roedel <[email protected]>
> Date: Thu, 11 Jun 2009 10:03:42 +0200
> Subject: [PATCH] dma-debug: check for sg_call_ents in best-fit algorithm too
>
> If we don't check for sg_call_ents the hash_bucket_find function might
> still return the wrong dma_debug_entry for sg mappings.
>
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> lib/dma-debug.c | 9 +++++----
> 1 files changed, 5 insertions(+), 4 deletions(-)

I tried this patch, but I still get a wrong warning from the DMA-API:
Jun 11 19:24:57 treogen [ 9.451064] raid1: raid set md2 active with
2 out of 2 mirrors
Jun 11 19:24:57 treogen [ 9.479278] md2: bitmap initialized from
disk: read 10/10 pages, set 0 bits
Jun 11 19:24:57 treogen [ 9.479282] created bitmap (153 pages) for device md2
Jun 11 19:24:57 treogen [ 9.513544] md: ... autorun DONE.
Jun 11 19:24:57 treogen [ 9.517590] md3: unknown partition table
Jun 11 19:24:57 treogen [ 20.718608] XFS mounting filesystem dm-0
Jun 11 19:24:57 treogen [ 21.220452] ------------[ cut here ]------------
Jun 11 19:24:57 treogen [ 21.220477] WARNING: at lib/dma-debug.c:532
check_unmap+0x49e/0x620()
Jun 11 19:24:57 treogen [ 21.220482] Hardware name: KFN5-D SLI
Jun 11 19:24:57 treogen [ 21.220488] sata_sil24 0000:04:00.0:
DMA-API: device driver tries to free DM
A memory it has not allocated [device address=0x000000011e4c3000]
[size=4096 bytes]
Jun 11 19:24:57 treogen [ 21.220494] Modules linked in:
Jun 11 19:24:57 treogen [ 21.220502] Pid: 1301, comm: kcryptd Not
tainted 2.6.30 #3
Jun 11 19:24:57 treogen [ 21.220507] Call Trace:
Jun 11 19:24:57 treogen [ 21.220510] <IRQ> [<ffffffff8041753e>] ?
check_unmap+0x49e/0x620
Jun 11 19:24:57 treogen [ 21.220528] [<ffffffff80243308>]
warn_slowpath_common+0x78/0xd0
Jun 11 19:24:57 treogen [ 21.220535] [<ffffffff802433e4>]
warn_slowpath_fmt+0x64/0x70
Jun 11 19:24:57 treogen [ 21.220545] [<ffffffff802c10b4>] ?
kmem_cache_free+0xa4/0x130
Jun 11 19:24:57 treogen [ 21.220555] [<ffffffff8028ddc2>] ?
mempool_free_slab+0x12/0x20
Jun 11 19:24:57 treogen [ 21.220562] [<ffffffff8028de5a>] ?
mempool_free+0x8a/0xa0
Jun 11 19:24:57 treogen [ 21.220573] [<ffffffff8068e53d>] ?
_spin_lock_irqsave+0x1d/0x40
Jun 11 19:24:57 treogen [ 21.220580] [<ffffffff8041753e>]
check_unmap+0x49e/0x620
Jun 11 19:24:57 treogen [ 21.220587] [<ffffffff8028ddc2>] ?
mempool_free_slab+0x12/0x20
Jun 11 19:24:57 treogen [ 21.220597] [<ffffffff803f17bc>] ?
__freed_request+0x10c/0x160
Jun 11 19:24:57 treogen [ 21.220605] [<ffffffff804177cd>]
debug_dma_unmap_sg+0x10d/0x190
Jun 11 19:24:57 treogen [ 21.220614] [<ffffffff804c1131>] ?
__scsi_put_command+0x61/0xa0
Jun 11 19:24:57 treogen [ 21.220624] [<ffffffff804d4268>]
ata_sg_clean+0x78/0xf0
Jun 11 19:24:57 treogen [ 21.220631] [<ffffffff804d4315>]
__ata_qc_complete+0x35/0x110
Jun 11 19:24:57 treogen [ 21.220640] [<ffffffff804c7b98>] ?
scsi_io_completion+0x398/0x530
Jun 11 19:24:57 treogen [ 21.220647] [<ffffffff804d44ad>]
ata_qc_complete+0xbd/0x250
Jun 11 19:24:57 treogen [ 21.220654] [<ffffffff804d49eb>]
ata_qc_complete_multiple+0xab/0xf0
Jun 11 19:24:57 treogen [ 21.220664] [<ffffffff804ea299>]
sil24_interrupt+0xb9/0x5b0
Jun 11 19:24:57 treogen [ 21.220673] [<ffffffff802730b0>]
handle_IRQ_event+0x70/0x180
Jun 11 19:24:57 treogen [ 21.220681] [<ffffffff802753bd>]
handle_fasteoi_irq+0x6d/0xe0
Jun 11 19:24:57 treogen [ 21.220689] [<ffffffff8020e42f>]
handle_irq+0x1f/0x30
Jun 11 19:24:57 treogen [ 21.220695] [<ffffffff8020db7a>] do_IRQ+0x6a/0xf0
Jun 11 19:24:57 treogen [ 21.220704] [<ffffffff8020be53>]
ret_from_intr+0x0/0xf
Jun 11 19:24:57 treogen [ 21.220707] <EOI> [<ffffffff80409aab>] ?
memcpy_c+0xb/0x20
Jun 11 19:24:57 treogen [ 21.220722] [<ffffffff803e5bc8>] ?
crypto_cbc_encrypt+0xd8/0x1b0
Jun 11 19:24:57 treogen [ 21.220729] [<ffffffff8022f320>] ?
twofish_encrypt+0x0/0x10
Jun 11 19:24:57 treogen [ 21.220739] [<ffffffff803dcd68>] ?
async_encrypt+0x38/0x40
Jun 11 19:24:57 treogen [ 21.220749] [<ffffffff8058447a>] ?
crypt_convert+0x20a/0x2a0
Jun 11 19:24:57 treogen [ 21.220757] [<ffffffff805847dd>] ?
kcryptd_crypt+0x2cd/0x500
Jun 11 19:24:57 treogen [ 21.220765] [<ffffffff80584510>] ?
kcryptd_crypt+0x0/0x500
Jun 11 19:24:57 treogen [ 21.220774] [<ffffffff80255cf7>] ?
worker_thread+0x137/0x1f0
Jun 11 19:24:57 treogen [ 21.220782] [<ffffffff80259d20>] ?
autoremove_wake_function+0x0/0x40
Jun 11 19:24:57 treogen [ 21.220790] [<ffffffff80255bc0>] ?
worker_thread+0x0/0x1f0
Jun 11 19:24:57 treogen [ 21.220797] [<ffffffff80255bc0>] ?
worker_thread+0x0/0x1f0
Jun 11 19:24:57 treogen [ 21.220804] [<ffffffff80259886>] ? kthread+0x56/0x90
Jun 11 19:24:57 treogen [ 21.220812] [<ffffffff8020c4aa>] ?
child_rip+0xa/0x20
Jun 11 19:24:57 treogen [ 21.220819] [<ffffffff8020bea9>] ?
restore_args+0x0/0x30
Jun 11 19:24:57 treogen [ 21.220825] [<ffffffff80259830>] ? kthread+0x0/0x90
Jun 11 19:24:57 treogen [ 21.220832] [<ffffffff8020c4a0>] ?
child_rip+0x0/0x20
Jun 11 19:24:57 treogen [ 21.220836] ---[ end trace f020083379b5b162 ]---
Jun 11 19:24:57 treogen [ 21.339740] Ending clean XFS mount for
filesystem: dm-0

Torsten

2009-06-12 07:51:19

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
> I tried this patch, but I still get a wrong warning from the DMA-API:
> Jun 11 19:24:57 treogen [ 9.451064] raid1: raid set md2 active with
> 2 out of 2 mirrors
> Jun 11 19:24:57 treogen [ 9.479278] md2: bitmap initialized from
> disk: read 10/10 pages, set 0 bits
> Jun 11 19:24:57 treogen [ 9.479282] created bitmap (153 pages) for device md2
> Jun 11 19:24:57 treogen [ 9.513544] md: ... autorun DONE.
> Jun 11 19:24:57 treogen [ 9.517590] md3: unknown partition table
> Jun 11 19:24:57 treogen [ 20.718608] XFS mounting filesystem dm-0
> Jun 11 19:24:57 treogen [ 21.220452] ------------[ cut here ]------------
> Jun 11 19:24:57 treogen [ 21.220477] WARNING: at lib/dma-debug.c:532
> check_unmap+0x49e/0x620()
> Jun 11 19:24:57 treogen [ 21.220482] Hardware name: KFN5-D SLI
> Jun 11 19:24:57 treogen [ 21.220488] sata_sil24 0000:04:00.0:
> DMA-API: device driver tries to free DM A memory it has not allocated [device address=0x000000011e4c3000]

This happens with the best-fit if there is no perfect fit but multiple
matches. This warning needs to be adjusted to also cover this case. I
have this on my list and will fix it soon.

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632

2009-06-12 14:14:25

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
> DMA-API: device driver tries to free DM
> A memory it has not allocated [device address=0x000000011e4c3000]
> [size=4096 bytes]

Hmm, looking again over the code I've seen that the ref
dma_debug_entries are not alway filled with all necessary information
for best-fit. Can you please try if you still get false warnings when
you apply the two patches attached instead of the one I sent yesterday?

Thanks,

Joerg

--
| Advanced Micro Devices GmbH
Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei M?nchen
System |
Research | Gesch?ftsf?hrer: Thomas M. McCoy, Giuliano Meroni
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis M?nchen
| Registergericht M?nchen, HRB Nr. 43632


Attachments:
(No filename) (787.00 B)
0001-dma-debug-check-for-sg_call_ents-in-best-fit-algorit.patch (1.79 kB)
0002-dma-debug-be-more-careful-when-building-reference-en.patch (8.61 kB)
Download all attachments

2009-06-12 14:51:20

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 12, 2009 at 4:13 PM, Joerg Roedel<[email protected]> wrote:
> On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
>> DMA-API: device driver tries to free DM
>> A memory it has not allocated [device address=0x000000011e4c3000]
>> [size=4096 bytes]
>
> Hmm, looking again over the code I've seen that the ref
> dma_debug_entries are not alway filled with all necessary information
> for best-fit. Can you please try if you still get false warnings when
> you apply the two patches attached instead of the one I sent yesterday?

Hmm... There isn't a get_nr_mapped_entries() in 2.6.30.
Are these patches again the current linus tree?

For unfilled information I wanted to complain that for non-sg mapping
sg_call_ents never gets filled. But as dma_entry_alloc() does a
memset() to clean the entrys it should not matter.
And what about dma_debug_entry.paddr? Should this also be compared?

Torsten

2009-06-13 11:10:27

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 12, 2009 at 04:51:01PM +0200, Torsten Kaiser wrote:
> On Fri, Jun 12, 2009 at 4:13 PM, Joerg Roedel<[email protected]> wrote:
> > On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
> >> DMA-API: device driver tries to free DM
> >> A memory it has not allocated [device address=0x000000011e4c3000]
> >> [size=4096 bytes]
> >
> > Hmm, looking again over the code I've seen that the ref
> > dma_debug_entries are not alway filled with all necessary information
> > for best-fit. Can you please try if you still get false warnings when
> > you apply the two patches attached instead of the one I sent yesterday?
>
> Hmm... There isn't a get_nr_mapped_entries() in 2.6.30.
> Are these patches again the current linus tree?

Indeed. These patches are against tip/core/iommu but should also apply
against current linus/master too.

> For unfilled information I wanted to complain that for non-sg mapping
> sg_call_ents never gets filled. But as dma_entry_alloc() does a
> memset() to clean the entrys it should not matter.
> And what about dma_debug_entry.paddr? Should this also be compared?

paddr shouldn't matter. We also can't compare against it because it is
not a parameter we get in the unmap path (not always).

Joerg

2009-06-13 14:26:24

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Sat, Jun 13, 2009 at 1:10 PM, Joerg Roedel<[email protected]> wrote:
> On Fri, Jun 12, 2009 at 04:51:01PM +0200, Torsten Kaiser wrote:
>> On Fri, Jun 12, 2009 at 4:13 PM, Joerg Roedel<[email protected]> wrote:
>> > On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
>> >> DMA-API: device driver tries to free DM
>> >> A memory it has not allocated [device address=0x000000011e4c3000]
>> >> [size=4096 bytes]
>> >
>> > Hmm, looking again over the code I've seen that the ref
>> > dma_debug_entries are not alway filled with all necessary information
>> > for best-fit. Can you please try if you still get false warnings when
>> > you apply the two patches attached instead of the one I sent yesterday?
>>
>> Hmm... There isn't a get_nr_mapped_entries() in 2.6.30.
>> Are these patches again the current linus tree?
>
> Indeed. These patches are against tip/core/iommu but should also apply
> against current linus/master too.

OK, I will try 2.6.30-git6 with and without these patches.

>> For unfilled information I wanted to complain that for non-sg mapping
>> sg_call_ents never gets filled. But as dma_entry_alloc() does a
>> memset() to clean the entrys it should not matter.
>> And what about dma_debug_entry.paddr? Should this also be compared?
>
> paddr shouldn't matter. We also can't compare against it because it is
> not a parameter we get in the unmap path (not always).

check_unmap() checks it and sg_call_ents are also optional.

Torsten

2009-06-13 17:08:19

by Torsten Kaiser

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Fri, Jun 12, 2009 at 4:13 PM, Joerg Roedel<[email protected]> wrote:
> On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
>> DMA-API: device driver tries to free DM
>> A memory it has not allocated [device address=0x000000011e4c3000]
>> [size=4096 bytes]
>
> Hmm, looking again over the code I've seen that the ref
> dma_debug_entries are not alway filled with all necessary information
> for best-fit. Can you please try if you still get false warnings when
> you apply the two patches attached instead of the one I sent yesterday?

I tested these patches
(0001-dma-debug-check-for-sg_call_ents-in-best-fit-algorit.patch and
0002-dma-debug-be-more-careful-when-building-reference-en.patch)
against 2.6.30-git6 and did not see any warnings.

I can't be 100% sure about the fix, because I do not have a reliable
trigger, but it looks quite good.

Torsten


Just for reference, with an unpatched 2.6.30-git6 the warning was still there:
Jun 13 16:54:56 treogen [ 168.201493] ------------[ cut here ]------------
Jun 13 16:54:56 treogen [ 168.201512] WARNING: at lib/dma-debug.c:827
check_unmap+0x593/0x670()
Jun 13 16:54:56 treogen [ 168.201517] Hardware name: KFN5-D SLI
Jun 13 16:54:56 treogen [ 168.201522] sata_sil24 0000:04:00.0:
DMA-API: device driver frees DMA sg lis
t with different entry count [map count=1] [unmap count=2]
Jun 13 16:54:56 treogen [ 168.201526] Modules linked in: msp3400
tuner tea5767 tda8290 tuner_xc2028 xc
5000 tda9887 tuner_simple tuner_types mt20xx tea5761 bttv ir_common
v4l2_common videodev v4l1_compat v4
l2_compat_ioctl32 videobuf_dma_sg videobuf_core pata_amd btcx_risc tveeprom sg
Jun 13 16:54:56 treogen [ 168.201562] Pid: 0, comm: swapper Not
tainted 2.6.30-git6 #1
Jun 13 16:54:56 treogen [ 168.201566] Call Trace:
Jun 13 16:54:56 treogen [ 168.201569] <IRQ> [<ffffffff8120bb63>] ?
check_unmap+0x593/0x670
Jun 13 16:54:56 treogen [ 168.201585] [<ffffffff81044138>]
warn_slowpath_common+0x78/0xd0
Jun 13 16:54:56 treogen [ 168.201591] [<ffffffff81044214>]
warn_slowpath_fmt+0x64/0x70
Jun 13 16:54:56 treogen [ 168.201599] [<ffffffff810b3bf5>] ?
__slab_free+0x185/0x340
Jun 13 16:54:56 treogen [ 168.201606] [<ffffffff8120bb63>]
check_unmap+0x593/0x670
Jun 13 16:54:56 treogen [ 168.201615] [<ffffffff810e532d>] ?
bio_free+0x4d/0x60
Jun 13 16:54:56 treogen [ 168.201622] [<ffffffff810e5350>] ?
bio_fs_destructor+0x10/0x20
Jun 13 16:54:56 treogen [ 168.201628] [<ffffffff8120bd3e>]
debug_dma_unmap_sg+0xfe/0x140
Jun 13 16:54:56 treogen [ 168.201636] [<ffffffff812ab102>] ?
put_device+0x12/0x20
Jun 13 16:54:56 treogen [ 168.201645] [<ffffffff812c9248>]
ata_sg_clean+0x78/0xf0
Jun 13 16:54:56 treogen [ 168.201651] [<ffffffff812c92f5>]
__ata_qc_complete+0x35/0x110
Jun 13 16:54:56 treogen [ 168.201658] [<ffffffff812c948d>]
ata_qc_complete+0xbd/0x250
Jun 13 16:54:56 treogen [ 168.201665] [<ffffffff812c99bc>]
ata_qc_complete_multiple+0x9c/0xf0
Jun 13 16:54:56 treogen [ 168.201674] [<ffffffff812df219>]
sil24_interrupt+0xb9/0x5b0
Jun 13 16:54:56 treogen [ 168.201681] [<ffffffff81061c5c>] ?
getnstimeofday+0x5c/0xf0
Jun 13 16:54:56 treogen [ 168.201688] [<ffffffff8105dba9>] ?
ktime_get_ts+0x59/0x60
Jun 13 16:54:56 treogen [ 168.201696] [<ffffffff81074598>]
handle_IRQ_event+0x68/0x160
Jun 13 16:54:56 treogen [ 168.201703] [<ffffffff8107663d>]
handle_fasteoi_irq+0x6d/0xe0
Jun 13 16:54:56 treogen [ 168.201710] [<ffffffff8100e33f>]
handle_irq+0x1f/0x30
Jun 13 16:54:56 treogen [ 168.201715] [<ffffffff8100d9ba>] do_IRQ+0x6a/0xe0
Jun 13 16:54:56 treogen [ 168.201722] [<ffffffff8100bd53>]
ret_from_intr+0x0/0xa
Jun 13 16:54:56 treogen [ 168.201725] <EOI> [<ffffffff8101340a>] ?
default_idle+0x6a/0xd0
Jun 13 16:54:56 treogen [ 168.201738] [<ffffffff8105ebff>] ?
notifier_call_chain+0x3f/0x80
Jun 13 16:54:56 treogen [ 168.201744] [<ffffffff810134a8>] ?
c1e_idle+0x38/0x100
Jun 13 16:54:56 treogen [ 168.201751] [<ffffffff8105ec65>] ?
atomic_notifier_call_chain+0x15/0x20
Jun 13 16:54:56 treogen [ 168.201757] [<ffffffff8100a662>] ?
cpu_idle+0x62/0xb0
Jun 13 16:54:56 treogen [ 168.201766] [<ffffffff8147190d>] ?
rest_init+0x6d/0x80
Jun 13 16:54:56 treogen [ 168.201774] [<ffffffff8171dcad>] ?
start_kernel+0x342/0x405
Jun 13 16:54:56 treogen [ 168.201781] [<ffffffff8171d289>] ?
x86_64_start_reservations+0x99/0xb9
Jun 13 16:54:56 treogen [ 168.201787] [<ffffffff8171d389>] ?
x86_64_start_kernel+0xe0/0xf2
Jun 13 16:54:56 treogen [ 168.201792] ---[ end trace 7ab4a9443c6c6e69 ]---
Jun 13 16:54:56 treogen [ 168.201795] Mapped at:
Jun 13 16:54:56 treogen [ 168.201797] [<ffffffff8120c609>]
debug_dma_map_sg+0x159/0x180
Jun 13 16:54:56 treogen [ 168.201805] [<ffffffff812c97bc>]
ata_qc_issue+0x19c/0x300
Jun 13 16:54:56 treogen [ 168.201812] [<ffffffff812cf6b8>]
ata_scsi_translate+0xa8/0x180
Jun 13 16:54:56 treogen [ 168.201818] [<ffffffff812d2251>]
ata_scsi_queuecmd+0xb1/0x2d0
Jun 13 16:54:56 treogen [ 168.201823] [<ffffffff812b5fb3>]
scsi_dispatch_cmd+0xe3/0x220

2009-06-13 17:35:04

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Sat, Jun 13, 2009 at 04:26:06PM +0200, Torsten Kaiser wrote:
> On Sat, Jun 13, 2009 at 1:10 PM, Joerg Roedel<[email protected]> wrote:
> >
> > paddr shouldn't matter. We also can't compare against it because it is
> > not a parameter we get in the unmap path (not always).
>
> check_unmap() checks it and sg_call_ents are also optional.

check_unmap() only checks it for coherent mappings (where it makes
sense). But for a best-fit search it does not make sense because we
already check for the type and multiple coherent mappings to the same
dma_addr for the same device are not possible (because for coherent
mappings the memory is allocated by the dma_ops backend).

Joerg

2009-06-15 07:46:24

by Joerg Roedel

[permalink] [raw]
Subject: Re: [PATCH] dma-debug: disable DMA_API_DEBUG for now

On Sat, Jun 13, 2009 at 07:08:01PM +0200, Torsten Kaiser wrote:
> On Fri, Jun 12, 2009 at 4:13 PM, Joerg Roedel<[email protected]> wrote:
> > On Thu, Jun 11, 2009 at 07:38:47PM +0200, Torsten Kaiser wrote:
> >> DMA-API: device driver tries to free DM
> >> A memory it has not allocated [device address=0x000000011e4c3000]
> >> [size=4096 bytes]
> >
> > Hmm, looking again over the code I've seen that the ref
> > dma_debug_entries are not alway filled with all necessary information
> > for best-fit. Can you please try if you still get false warnings when
> > you apply the two patches attached instead of the one I sent yesterday?
>
> I tested these patches
> (0001-dma-debug-check-for-sg_call_ents-in-best-fit-algorit.patch and
> 0002-dma-debug-be-more-careful-when-building-reference-en.patch)
> against 2.6.30-git6 and did not see any warnings.
>
> I can't be 100% sure about the fix, because I do not have a reliable
> trigger, but it looks quite good.

Ok cool, I will push these two patches upstream then and will send them
to -stable too. So we get this fixed in a 30.x release too. Thanks a lot
for your testing.

Joerg