2013-05-27 16:13:27

by Ming Lei

[permalink] [raw]
Subject: [RFC PATCH 0/2] dma-unmap: allow to only unmap completed DMA buffer

This patchset tries to loose the check on DMA buffer size in check_unmap()
of dma-debug first, then only unmap the actual completed DMA buffer in USB
unmap path.

Considered that DMA unmapping often runs in hard irq context, the patch set
may save irq handling time. And the improvement can be observed on
ARMv7(Pandaboard) with the chage, at average ~25us is saved about ehci irq
handling under usbnet ping test case.

Thanks,
--
Ming Lei


2013-05-27 16:13:39

by Ming Lei

[permalink] [raw]
Subject: [RFC PATCH 1/2] dma-debug: allow size to become smaller in dma_unmap

This patch looses the check on DMA buffer size for streaming
DMA unmap, based on the below fact:

- it is common to see only part of DMA transfer is completed,
especially in case of DMA_FROM_DEVICE

So it isn't necessary to unmap the whole DMA buffer inside DMA
unmapping, and unmapping the actual completed buffer should be more
efficient. Considered that unmapping is often called in hard irq
context, time of irq handling can be saved.

Cc: Shuah Khan <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
lib/dma-debug.c | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index d87a17a..202c522 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -857,6 +857,7 @@ static void check_unmap(struct dma_debug_entry *ref)
struct dma_debug_entry *entry;
struct hash_bucket *bucket;
unsigned long flags;
+ unsigned int size_invalid = 0;

bucket = get_hash_bucket(ref, &flags);
entry = bucket_find_exact(bucket, ref);
@@ -879,13 +880,8 @@ static void check_unmap(struct dma_debug_entry *ref)
return;
}

- if (ref->size != entry->size) {
- err_printk(ref->dev, entry, "DMA-API: device driver frees "
- "DMA memory with different size "
- "[device address=0x%016llx] [map size=%llu bytes] "
- "[unmap size=%llu bytes]\n",
- ref->dev_addr, entry->size, ref->size);
- }
+ if (ref->size > entry->size)
+ size_invalid = 1;

if (ref->type != entry->type) {
err_printk(ref->dev, entry, "DMA-API: device driver frees "
@@ -894,18 +890,27 @@ static void check_unmap(struct dma_debug_entry *ref)
"[mapped as %s] [unmapped as %s]\n",
ref->dev_addr, ref->size,
type2name[entry->type], type2name[ref->type]);
- } else if ((entry->type == dma_debug_coherent) &&
- (ref->paddr != entry->paddr)) {
- err_printk(ref->dev, entry, "DMA-API: device driver frees "
- "DMA memory with different CPU address "
- "[device address=0x%016llx] [size=%llu bytes] "
- "[cpu alloc address=0x%016llx] "
- "[cpu free address=0x%016llx]",
- ref->dev_addr, ref->size,
- (unsigned long long)entry->paddr,
- (unsigned long long)ref->paddr);
+ } else if (entry->type == dma_debug_coherent) {
+ if (ref->paddr != entry->paddr)
+ err_printk(ref->dev, entry, "DMA-API: device driver frees "
+ "DMA memory with different CPU address "
+ "[device address=0x%016llx] [size=%llu bytes] "
+ "[cpu alloc address=0x%016llx] "
+ "[cpu free address=0x%016llx]",
+ ref->dev_addr, ref->size,
+ (unsigned long long)entry->paddr,
+ (unsigned long long)ref->paddr);
+ if (ref->size != entry->size)
+ size_invalid = 1;
}

+ if (size_invalid)
+ err_printk(ref->dev, entry, "DMA-API: device driver frees "
+ "DMA memory with different size "
+ "[device address=0x%016llx] [map size=%llu bytes] "
+ "[unmap size=%llu bytes]\n",
+ ref->dev_addr, entry->size, ref->size);
+
if (ref->sg_call_ents && ref->type == dma_debug_sg &&
ref->sg_call_ents != entry->sg_call_ents) {
err_printk(ref->dev, entry, "DMA-API: device driver frees "
--
1.7.9.5

2013-05-27 16:13:45

by Ming Lei

[permalink] [raw]
Subject: [PATCH 2/2] USB: hcd: only unmap the actual completed DMA buffer

This patch only unmap the actual completed DMA buffer instead of
the whole transfer buffer.

It is common to see only part of DMA transfer is completed, especially
in case of DMA_FROM_DEVICE because the length of incoming traffic often
is unknown before submitting URB, so this patch may improve USB
DMA unmapping which runs in hard irq context.

The patch has been tested on ARMv7(Pandaboard), and it is observed that
at average ~25us is saved about ehci interrupt handling on below usbnet
test case:

- Pandaboard: IP address is IP_A
- on one x86 box, run below command:
#ping -f -s 1472 IP_A
- compute ehci interrupt handling time on Pandaboard during ping
test

Cc: Alan Stern <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
drivers/usb/core/hcd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 53baa87..2722487 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1374,12 +1374,12 @@ void usb_hcd_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
else if (urb->transfer_flags & URB_DMA_MAP_PAGE)
dma_unmap_page(hcd->self.controller,
urb->transfer_dma,
- urb->transfer_buffer_length,
+ urb->actual_length,
dir);
else if (urb->transfer_flags & URB_DMA_MAP_SINGLE)
dma_unmap_single(hcd->self.controller,
urb->transfer_dma,
- urb->transfer_buffer_length,
+ urb->actual_length,
dir);
else if (urb->transfer_flags & URB_MAP_LOCAL)
hcd_free_coherent(urb->dev->bus,
--
1.7.9.5

2013-05-27 17:13:45

by Sergei Shtylyov

[permalink] [raw]
Subject: Re: [PATCH 2/2] USB: hcd: only unmap the actual completed DMA buffer

Hello.

On 27-05-2013 20:13, Ming Lei wrote:

> This patch only unmap the actual completed DMA buffer instead of
> the whole transfer buffer.

Who will unmap the rest of the buffer?

> It is common to see only part of DMA transfer is completed, especially
> in case of DMA_FROM_DEVICE because the length of incoming traffic often
> is unknown before submitting URB, so this patch may improve USB
> DMA unmapping which runs in hard irq context.

> The patch has been tested on ARMv7(Pandaboard), and it is observed that
> at average ~25us is saved about ehci interrupt handling on below usbnet
> test case:

> - Pandaboard: IP address is IP_A
> - on one x86 box, run below command:
> #ping -f -s 1472 IP_A
> - compute ehci interrupt handling time on Pandaboard during ping
> test

This seems just crazy to me. What has been mapped, should be unmapped.

> Cc: Alan Stern <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> drivers/usb/core/hcd.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

WBR, Sergei

2013-05-27 20:18:28

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] dma-unmap: allow to only unmap completed DMA buffer

On Tue, 28 May 2013, Ming Lei wrote:

> This patchset tries to loose the check on DMA buffer size in check_unmap()
> of dma-debug first, then only unmap the actual completed DMA buffer in USB
> unmap path.
>
> Considered that DMA unmapping often runs in hard irq context, the patch set
> may save irq handling time. And the improvement can be observed on
> ARMv7(Pandaboard) with the chage, at average ~25us is saved about ehci irq
> handling under usbnet ping test case.

Are you aware that these changes directly contradict the advice in
Documentation/DMA-API.txt? That file says:


void
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
enum dma_data_direction direction)

Unmaps the region previously mapped. All the parameters passed in
must be identical to those passed in (and returned) by the mapping
API.


If you are going to make a change like this then you have to change the
documentation too. But in fact I doubt that the proposed change will
work correctly in all circumstances. For example, if an IOMMU is
present.

Alan Stern

2013-05-27 20:37:10

by Alexander Duyck

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] dma-debug: allow size to become smaller in dma_unmap

On 05/27/2013 09:13 AM, Ming Lei wrote:
> This patch looses the check on DMA buffer size for streaming
> DMA unmap, based on the below fact:
>
> - it is common to see only part of DMA transfer is completed,
> especially in case of DMA_FROM_DEVICE
>
> So it isn't necessary to unmap the whole DMA buffer inside DMA
> unmapping, and unmapping the actual completed buffer should be more
> efficient. Considered that unmapping is often called in hard irq
> context, time of irq handling can be saved.
>
> Cc: Shuah Khan <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Alexander Duyck <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>

What you are proposing doesn't make much sense. If you are only wanting
to use part of a buffer then just use the dma_sync primitives.

The idea behind unmapping a buffer is to free any resources associated
with it. Calling map once, and unmap multiple times per buffer is just
asking for trouble in the form of use after free or memory leaks.

Thanks,

Alex