2019-08-03 06:28:14

by Shik Chen

[permalink] [raw]
Subject: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
APIs to transfer buffers and sync them explicitly, because accessing
buffers allocated by usb_alloc_coherent() is slow on platforms without
hardware coherent DMA.

Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].

| | URB (us) | Decode (Gbps) | CPU (%) |
|------------------|------------|---------------|---------|
| x86-64 coherent | 53 +- 20 | 50.6 | 0.24 |
| x86-64 streaming | 55 +- 19 | 50.1 | 0.25 |
| armv7l coherent | 342 +- 379 | 1.8 | 2.16 |
| armv7l streaming | 99 +- 98 | 11.0 | 0.36 |

The performance characteristics are improved a lot on armv7l, and
remained (almost) unchanged on x86-64. The code used for measurement can
be found at [2].

[1] https://git.kernel.org/torvalds/c/1161db6776bd
[2] https://crrev.com/c/1729133
[3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/

Signed-off-by: Shik Chen <[email protected]>
---
The allocated buffer could be as large as 768K when streaming 4K video.
Ideally we should use some generic helper to allocate non-coherent
memory in a more efficient way, such as https://lwn.net/Articles/774429/
("make the non-consistent DMA allocator more userful").

But I cannot find any existing helper so a simple kzalloc() is used in
this patch. The logic to figure out the DMA addressable GFP flags is
similar to __dma_direct_alloc_pages() without the optimistic retries:
https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96

drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
index 8fa77a81dd7f2c..962c35478896c4 100644
--- a/drivers/media/usb/uvc/uvc_video.c
+++ b/drivers/media/usb/uvc/uvc_video.c
@@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
* Process the URB headers, and optionally queue expensive memcpy tasks
* to be deferred to a work queue.
*/
+ dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
+ urb->transfer_buffer_length, DMA_FROM_DEVICE);
stream->decode(uvc_urb, buf, buf_meta);

/* If no async work is needed, resubmit the URB immediately. */
@@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
if (!uvc_urb->buffer)
continue;

-#ifndef CONFIG_DMA_NONCOHERENT
- usb_free_coherent(stream->dev->udev, stream->urb_size,
- uvc_urb->buffer, uvc_urb->dma);
-#else
+ dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
+ stream->urb_size, DMA_FROM_DEVICE);
kfree(uvc_urb->buffer);
-#endif
- uvc_urb->buffer = NULL;
}

stream->urb_size = 0;
}

+static gfp_t uvc_alloc_gfp_flags(struct device *dev)
+{
+ u64 mask = dma_get_mask(dev);
+
+ if (dev->bus_dma_mask)
+ mask &= dev->bus_dma_mask;
+
+ if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
+ return GFP_DMA;
+
+ if (mask < DMA_BIT_MASK(64)) {
+ if (IS_ENABLED(CONFIG_ZONE_DMA32))
+ return GFP_DMA32;
+ if (IS_ENABLED(CONFIG_ZONE_DMA))
+ return GFP_DMA;
+ }
+
+ return 0;
+}
+
+static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
+ gfp_t gfp_flags, dma_addr_t *dma_handle)
+{
+ void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
+
+ if (!buffer)
+ return NULL;
+
+ *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(dev, *dma_handle)) {
+ kfree(buffer);
+ return NULL;
+ }
+
+ return buffer;
+}
+
/*
* Allocate transfer buffers. This function can be called with buffers
* already allocated when resuming from suspend, in which case it will
@@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,

/* Retry allocations until one succeed. */
for (; npackets > 1; npackets /= 2) {
+ stream->urb_size = psize * npackets;
+
for (i = 0; i < UVC_URBS; ++i) {
struct uvc_urb *uvc_urb = &stream->uvc_urb[i];

- stream->urb_size = psize * npackets;
-#ifndef CONFIG_DMA_NONCOHERENT
- uvc_urb->buffer = usb_alloc_coherent(
- stream->dev->udev, stream->urb_size,
+ uvc_urb->buffer = uvc_alloc_urb_buffer(
+ &stream->dev->udev->dev, stream->urb_size,
gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
-#else
- uvc_urb->buffer =
- kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
-#endif
if (!uvc_urb->buffer) {
uvc_free_urb_buffers(stream);
break;
@@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
urb->context = uvc_urb;
urb->pipe = usb_rcvisocpipe(stream->dev->udev,
ep->desc.bEndpointAddress);
-#ifndef CONFIG_DMA_NONCOHERENT
urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
urb->transfer_dma = uvc_urb->dma;
-#else
- urb->transfer_flags = URB_ISO_ASAP;
-#endif
urb->interval = ep->desc.bInterval;
urb->transfer_buffer = uvc_urb->buffer;
urb->complete = uvc_video_complete;
@@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,

usb_fill_bulk_urb(urb, stream->dev->udev, pipe, uvc_urb->buffer,
size, uvc_video_complete, uvc_urb);
-#ifndef CONFIG_DMA_NONCOHERENT
urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
urb->transfer_dma = uvc_urb->dma;
-#endif

uvc_urb->urb = urb;
}
--
2.22.0.770.g0f2c4a37fd-goog


2019-08-27 02:06:17

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Fri, Aug 2, 2019 at 10:12 PM Shik Chen <[email protected]> wrote:
>
> Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
> DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
> APIs to transfer buffers and sync them explicitly, because accessing
> buffers allocated by usb_alloc_coherent() is slow on platforms without
> hardware coherent DMA.
>
> Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
> Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].
>
> | | URB (us) | Decode (Gbps) | CPU (%) |
> |------------------|------------|---------------|---------|
> | x86-64 coherent | 53 +- 20 | 50.6 | 0.24 |
> | x86-64 streaming | 55 +- 19 | 50.1 | 0.25 |
> | armv7l coherent | 342 +- 379 | 1.8 | 2.16 |
> | armv7l streaming | 99 +- 98 | 11.0 | 0.36 |
>
> The performance characteristics are improved a lot on armv7l, and
> remained (almost) unchanged on x86-64. The code used for measurement can
> be found at [2].
>
> [1] https://git.kernel.org/torvalds/c/1161db6776bd
> [2] https://crrev.com/c/1729133
> [3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/
>
> Signed-off-by: Shik Chen <[email protected]>
> ---
> The allocated buffer could be as large as 768K when streaming 4K video.
> Ideally we should use some generic helper to allocate non-coherent
> memory in a more efficient way, such as https://lwn.net/Articles/774429/
> ("make the non-consistent DMA allocator more userful").
>
> But I cannot find any existing helper so a simple kzalloc() is used in
> this patch. The logic to figure out the DMA addressable GFP flags is
> similar to __dma_direct_alloc_pages() without the optimistic retries:
> https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96
>
> drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
> 1 file changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 8fa77a81dd7f2c..962c35478896c4 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
> * Process the URB headers, and optionally queue expensive memcpy tasks
> * to be deferred to a work queue.
> */
> + dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
> + urb->transfer_buffer_length, DMA_FROM_DEVICE);
> stream->decode(uvc_urb, buf, buf_meta);
>
> /* If no async work is needed, resubmit the URB immediately. */
> @@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
> if (!uvc_urb->buffer)
> continue;
>
> -#ifndef CONFIG_DMA_NONCOHERENT
> - usb_free_coherent(stream->dev->udev, stream->urb_size,
> - uvc_urb->buffer, uvc_urb->dma);
> -#else
> + dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
> + stream->urb_size, DMA_FROM_DEVICE);
> kfree(uvc_urb->buffer);
> -#endif
> - uvc_urb->buffer = NULL;
> }
>
> stream->urb_size = 0;
> }
>
> +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> +{
> + u64 mask = dma_get_mask(dev);
> +
> + if (dev->bus_dma_mask)
> + mask &= dev->bus_dma_mask;
> +
> + if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> + return GFP_DMA;
> +
> + if (mask < DMA_BIT_MASK(64)) {
> + if (IS_ENABLED(CONFIG_ZONE_DMA32))
> + return GFP_DMA32;
> + if (IS_ENABLED(CONFIG_ZONE_DMA))
> + return GFP_DMA;
> + }
> +
> + return 0;
> +}
> +
> +static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
> + gfp_t gfp_flags, dma_addr_t *dma_handle)
> +{
> + void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
> +
> + if (!buffer)
> + return NULL;
> +
> + *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(dev, *dma_handle)) {
> + kfree(buffer);
> + return NULL;
> + }
> +
> + return buffer;
> +}
> +
> /*
> * Allocate transfer buffers. This function can be called with buffers
> * already allocated when resuming from suspend, in which case it will
> @@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>
> /* Retry allocations until one succeed. */
> for (; npackets > 1; npackets /= 2) {
> + stream->urb_size = psize * npackets;
> +
> for (i = 0; i < UVC_URBS; ++i) {
> struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>
> - stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> - uvc_urb->buffer = usb_alloc_coherent(
> - stream->dev->udev, stream->urb_size,
> + uvc_urb->buffer = uvc_alloc_urb_buffer(
> + &stream->dev->udev->dev, stream->urb_size,
> gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> - uvc_urb->buffer =
> - kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
> if (!uvc_urb->buffer) {
> uvc_free_urb_buffers(stream);
> break;
> @@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
> urb->context = uvc_urb;
> urb->pipe = usb_rcvisocpipe(stream->dev->udev,
> ep->desc.bEndpointAddress);
> -#ifndef CONFIG_DMA_NONCOHERENT
> urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
> urb->transfer_dma = uvc_urb->dma;
> -#else
> - urb->transfer_flags = URB_ISO_ASAP;
> -#endif
> urb->interval = ep->desc.bInterval;
> urb->transfer_buffer = uvc_urb->buffer;
> urb->complete = uvc_video_complete;
> @@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
>
> usb_fill_bulk_urb(urb, stream->dev->udev, pipe, uvc_urb->buffer,
> size, uvc_video_complete, uvc_urb);
> -#ifndef CONFIG_DMA_NONCOHERENT
> urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
> urb->transfer_dma = uvc_urb->dma;
> -#endif
>
> uvc_urb->urb = urb;
> }
> --
> 2.22.0.770.g0f2c4a37fd-goog
>

Gentle ping.

Also, oops, looks like we forgot to add Kieran on CC. Let me fix it now.

Best regards,
Tomasz

2019-09-28 03:34:37

by Nicolas Boichat

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Fri, Aug 2, 2019 at 9:13 PM Shik Chen <[email protected]> wrote:
>
> Similar to the commit 1161db6776bd ("media: usb: pwc: Don't use coherent
> DMA buffers for ISO transfer") [1] for the pwc driver. Use streaming DMA
> APIs to transfer buffers and sync them explicitly, because accessing
> buffers allocated by usb_alloc_coherent() is slow on platforms without
> hardware coherent DMA.
>
> Tested on x86-64 (Intel Celeron 4305U) and armv7l (Rockchip RK3288) with
> Logitech Brio 4K camera at 1920x1080 using the WebRTC sample site [3].
>
> | | URB (us) | Decode (Gbps) | CPU (%) |
> |------------------|------------|---------------|---------|
> | x86-64 coherent | 53 +- 20 | 50.6 | 0.24 |
> | x86-64 streaming | 55 +- 19 | 50.1 | 0.25 |
> | armv7l coherent | 342 +- 379 | 1.8 | 2.16 |
> | armv7l streaming | 99 +- 98 | 11.0 | 0.36 |
>
> The performance characteristics are improved a lot on armv7l, and
> remained (almost) unchanged on x86-64. The code used for measurement can
> be found at [2].
>
> [1] https://git.kernel.org/torvalds/c/1161db6776bd
> [2] https://crrev.com/c/1729133
> [3] https://webrtc.github.io/samples/src/content/getusermedia/resolution/
>
> Signed-off-by: Shik Chen <[email protected]>
> ---
> The allocated buffer could be as large as 768K when streaming 4K video.
> Ideally we should use some generic helper to allocate non-coherent
> memory in a more efficient way, such as https://lwn.net/Articles/774429/
> ("make the non-consistent DMA allocator more userful").
>
> But I cannot find any existing helper so a simple kzalloc() is used in
> this patch. The logic to figure out the DMA addressable GFP flags is
> similar to __dma_direct_alloc_pages() without the optimistic retries:
> https://elixir.bootlin.com/linux/v5.2.5/source/kernel/dma/direct.c#L96
>
> drivers/media/usb/uvc/uvc_video.c | 65 +++++++++++++++++++++----------
> 1 file changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c
> index 8fa77a81dd7f2c..962c35478896c4 100644
> --- a/drivers/media/usb/uvc/uvc_video.c
> +++ b/drivers/media/usb/uvc/uvc_video.c
> @@ -1539,6 +1539,8 @@ static void uvc_video_complete(struct urb *urb)
> * Process the URB headers, and optionally queue expensive memcpy tasks
> * to be deferred to a work queue.
> */
> + dma_sync_single_for_cpu(&urb->dev->dev, urb->transfer_dma,
> + urb->transfer_buffer_length, DMA_FROM_DEVICE);
> stream->decode(uvc_urb, buf, buf_meta);
>
> /* If no async work is needed, resubmit the URB immediately. */
> @@ -1565,18 +1567,51 @@ static void uvc_free_urb_buffers(struct uvc_streaming *stream)
> if (!uvc_urb->buffer)
> continue;
>
> -#ifndef CONFIG_DMA_NONCOHERENT
> - usb_free_coherent(stream->dev->udev, stream->urb_size,
> - uvc_urb->buffer, uvc_urb->dma);
> -#else
> + dma_unmap_single(&stream->dev->udev->dev, uvc_urb->dma,
> + stream->urb_size, DMA_FROM_DEVICE);
> kfree(uvc_urb->buffer);
> -#endif
> - uvc_urb->buffer = NULL;
> }
>
> stream->urb_size = 0;
> }
>
> +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> +{
> + u64 mask = dma_get_mask(dev);
> +
> + if (dev->bus_dma_mask)
> + mask &= dev->bus_dma_mask;
> +
> + if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> + return GFP_DMA;
> +
> + if (mask < DMA_BIT_MASK(64)) {
> + if (IS_ENABLED(CONFIG_ZONE_DMA32))
> + return GFP_DMA32;

We're hitting issues with this on 64-bit ARM platform, where
ZONE_DMA32 is enabled (default), the kmalloc allocation with GFP_DMA32
fails.

There is no slab cache for GFP_DMA32, so those calls are expected to
fail, AFAICT there are no other (working) kmalloc(..., .. | GFP_DMA32)
users in the kernel, so I don't think we want to add a cache. If this
helps, some discussion here why the cache wasn't added:
https://lore.kernel.org/patchwork/patch/1009563/#1198622

This function looks out of place in a high-level driver, but from your
comment above (below ---), I guess we may have to live with that until
the kernel provides a better API?

For the time being, looking at how __dma_direct_alloc_pages works,
could you use alloc_pages_node (or dma_alloc_contiguous?) instead of
kmalloc, that would let you use GFP_DMA32 as needed?

> + if (IS_ENABLED(CONFIG_ZONE_DMA))
> + return GFP_DMA;
> + }
> +
> + return 0;
> +}
> +
> +static char *uvc_alloc_urb_buffer(struct device *dev, size_t size,
> + gfp_t gfp_flags, dma_addr_t *dma_handle)
> +{
> + void *buffer = kzalloc(size, gfp_flags | uvc_alloc_gfp_flags(dev));
> +
> + if (!buffer)
> + return NULL;
> +
> + *dma_handle = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(dev, *dma_handle)) {
> + kfree(buffer);
> + return NULL;
> + }
> +
> + return buffer;
> +}
> +
> /*
> * Allocate transfer buffers. This function can be called with buffers
> * already allocated when resuming from suspend, in which case it will
> @@ -1607,18 +1642,14 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream,
>
> /* Retry allocations until one succeed. */
> for (; npackets > 1; npackets /= 2) {
> + stream->urb_size = psize * npackets;
> +
> for (i = 0; i < UVC_URBS; ++i) {
> struct uvc_urb *uvc_urb = &stream->uvc_urb[i];
>
> - stream->urb_size = psize * npackets;
> -#ifndef CONFIG_DMA_NONCOHERENT
> - uvc_urb->buffer = usb_alloc_coherent(
> - stream->dev->udev, stream->urb_size,
> + uvc_urb->buffer = uvc_alloc_urb_buffer(
> + &stream->dev->udev->dev, stream->urb_size,
> gfp_flags | __GFP_NOWARN, &uvc_urb->dma);
> -#else
> - uvc_urb->buffer =
> - kmalloc(stream->urb_size, gfp_flags | __GFP_NOWARN);
> -#endif
> if (!uvc_urb->buffer) {
> uvc_free_urb_buffers(stream);
> break;
> @@ -1728,12 +1759,8 @@ static int uvc_init_video_isoc(struct uvc_streaming *stream,
> urb->context = uvc_urb;
> urb->pipe = usb_rcvisocpipe(stream->dev->udev,
> ep->desc.bEndpointAddress);
> -#ifndef CONFIG_DMA_NONCOHERENT
> urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
> urb->transfer_dma = uvc_urb->dma;
> -#else
> - urb->transfer_flags = URB_ISO_ASAP;
> -#endif
> urb->interval = ep->desc.bInterval;
> urb->transfer_buffer = uvc_urb->buffer;
> urb->complete = uvc_video_complete;
> @@ -1793,10 +1820,8 @@ static int uvc_init_video_bulk(struct uvc_streaming *stream,
>
> usb_fill_bulk_urb(urb, stream->dev->udev, pipe, uvc_urb->buffer,
> size, uvc_video_complete, uvc_urb);
> -#ifndef CONFIG_DMA_NONCOHERENT
> urb->transfer_flags = URB_NO_TRANSFER_DMA_MAP;
> urb->transfer_dma = uvc_urb->dma;
> -#endif
>
> uvc_urb->urb = urb;
> }
> --
> 2.22.0.770.g0f2c4a37fd-goog
>

2019-09-30 08:26:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Sat, Sep 28, 2019 at 11:33:16AM +0800, Nicolas Boichat wrote:
> > +static gfp_t uvc_alloc_gfp_flags(struct device *dev)
> > +{
> > + u64 mask = dma_get_mask(dev);
> > +
> > + if (dev->bus_dma_mask)
> > + mask &= dev->bus_dma_mask;
> > +
> > + if (mask < DMA_BIT_MASK(32) && IS_ENABLED(CONFIG_ZONE_DMA))
> > + return GFP_DMA;
> > +
> > + if (mask < DMA_BIT_MASK(64)) {
> > + if (IS_ENABLED(CONFIG_ZONE_DMA32))
> > + return GFP_DMA32;
>
> We're hitting issues with this on 64-bit ARM platform, where
> ZONE_DMA32 is enabled (default), the kmalloc allocation with GFP_DMA32
> fails.
>
> There is no slab cache for GFP_DMA32, so those calls are expected to
> fail, AFAICT there are no other (working) kmalloc(..., .. | GFP_DMA32)
> users in the kernel, so I don't think we want to add a cache. If this
> helps, some discussion here why the cache wasn't added:
> https://lore.kernel.org/patchwork/patch/1009563/#1198622

And drivers really have no business looking at the dma mask. I have
a plan for dma_alloc_pages API that could replace that cruft, but
until then please use GFP_KERNEL and let the dma subsystem bounce
buffer if needed.

2019-10-01 06:41:00

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> And drivers really have no business looking at the dma mask. I have
> a plan for dma_alloc_pages API that could replace that cruft, but
> until then please use GFP_KERNEL and let the dma subsystem bounce
> buffer if needed.

Can you try this series:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages

and see if it does whay you need for usb?

2020-02-27 06:30:29

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

+Sergey Senozhatsky who's going to be looking into this.

Hi Christoph,

On Tue, Oct 1, 2019 at 3:37 PM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> > And drivers really have no business looking at the dma mask. I have
> > a plan for dma_alloc_pages API that could replace that cruft, but
> > until then please use GFP_KERNEL and let the dma subsystem bounce
> > buffer if needed.
>
> Can you try this series:
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
>
> and see if it does whay you need for usb?

Reviving this thread. Sorry for no updates for a long time.

dma_alloc_pages() still wouldn't be an equivalent replacement of the
existing dma_alloc_coherent() (used behind the scenes by
usb_alloc_coherent()). That's because the latter can allocate
non-contiguous memory if the DMA device can handle it (i.e. is behind
an IOMMU), but the former can only allocate a contiguous range of
pages.

That said, I noticed that you also put a lot of effort into making the
NONCONSISTENT attribute more usable. Perhaps that's the way to go here
then? Of course we would need to make sure that the attribute is
handled properly on ARM and ARM64, which are the most affected
platforms. Right now neither handles them. The former doesn't use the
generic DMA mapping ops, while the latter does, but doesn't enable a
Kconfig option needed to allow generic inconsistent allocations.

Any hints would be appreciated.

Best regards,
Tomasz

2020-04-21 11:23:08

by Tomasz Figa

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Thu, Feb 27, 2020 at 7:28 AM Tomasz Figa <[email protected]> wrote:
>
> +Sergey Senozhatsky who's going to be looking into this.
>
> Hi Christoph,
>
> On Tue, Oct 1, 2019 at 3:37 PM Christoph Hellwig <[email protected]> wrote:
> >
> > On Mon, Sep 30, 2019 at 01:23:10AM -0700, Christoph Hellwig wrote:
> > > And drivers really have no business looking at the dma mask. I have
> > > a plan for dma_alloc_pages API that could replace that cruft, but
> > > until then please use GFP_KERNEL and let the dma subsystem bounce
> > > buffer if needed.
> >
> > Can you try this series:
> >
> > http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
> >
> > and see if it does whay you need for usb?
>
> Reviving this thread. Sorry for no updates for a long time.
>
> dma_alloc_pages() still wouldn't be an equivalent replacement of the
> existing dma_alloc_coherent() (used behind the scenes by
> usb_alloc_coherent()). That's because the latter can allocate
> non-contiguous memory if the DMA device can handle it (i.e. is behind
> an IOMMU), but the former can only allocate a contiguous range of
> pages.
>
> That said, I noticed that you also put a lot of effort into making the
> NONCONSISTENT attribute more usable. Perhaps that's the way to go here
> then? Of course we would need to make sure that the attribute is
> handled properly on ARM and ARM64, which are the most affected
> platforms. Right now neither handles them. The former doesn't use the
> generic DMA mapping ops, while the latter does, but doesn't enable a
> Kconfig option needed to allow generic inconsistent allocations.
>
> Any hints would be appreciated.

Hi Christoph, would you have some time to check the above?

Hi Catalin, Will, do you know why CONFIG_DMA_NONCOHERENT_CACHE_SYNC is
not enabled on arm64?

Thanks in advance. :)

Best regards,
Tomasz

2020-04-27 12:50:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] media: uvcvideo: Use streaming DMA APIs to transfer buffers

On Tue, Apr 21, 2020 at 01:21:15PM +0200, Tomasz Figa wrote:
> On Thu, Feb 27, 2020 at 7:28 AM Tomasz Figa <[email protected]> wrote:
> >
> > +Sergey Senozhatsky who's going to be looking into this.
> >
> > Hi Christoph,
> >
> > That said, I noticed that you also put a lot of effort into making the
> > NONCONSISTENT attribute more usable. Perhaps that's the way to go here
> > then? Of course we would need to make sure that the attribute is
> > handled properly on ARM and ARM64, which are the most affected
> > platforms. Right now neither handles them. The former doesn't use the
> > generic DMA mapping ops, while the latter does, but doesn't enable a
> > Kconfig option needed to allow generic inconsistent allocations.
> >
> > Any hints would be appreciated.
>
> Hi Christoph, would you have some time to check the above?
>
> Hi Catalin, Will, do you know why CONFIG_DMA_NONCOHERENT_CACHE_SYNC is
> not enabled on arm64?

NONCONSISTENT is still a mess, mostly because dma_cache_sync is such
a horrible API. I've been wanting to switch to the normal
sync_for_device / sync_for_cpu primitives instead. Let me see if I can
expedite that.