LinuxLists.cc - Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")

2009-09-01 13:28:46

Subject: Re: How to efficiently handle DMA and cache on ARMv7 ? (was "Is get_user_pages() enough to prevent pages from being swapped out ?")

On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> Linux<[email protected]> wrote:
> [...]
> > As far as userspace DMA coherency, the only way you could do it with
> > current kernel APIs is by using get_user_pages(), creating a scatterlist
> > from those, and then passing it to dma_map_sg(). While the device has
> > ownership of the SG, userspace must _not_ touch the buffer until after
> > DMA has completed.
> [...]
>
> Would that work on a processor with VIVT caches? It seems not. In
> particular, dma_map_page uses page_address to get a virtual address to
> pass to map_single(). map_single() in turn uses this address to
> perform cache maintenance. Since page_address() returns the kernel
> virtual address, I don't see how any cache-lines for the userspace
> virtual address would get invalidated (for the DMA_FROM_DEVICE case).

You are correct.

> If that's true, then what is the correct way to allow DMA to/from a
> userspace buffer with a VIVT cache? If not true, what am I missing?

I don't think you read what I said (but I've also forgotten what I did
say).

To put it simply, the kernel does not support DMA direct from userspace
pages. Solutions which have been proposed in the past only work with a
sub-set of conditions (such as the one above only works with VIPT
caches.)
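
Steven's objection can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: `struct machine`, `cpu_read()`, `dma_write()`, `invalidate()` and `read_user_after_dma()` are hypothetical names, and the "cache" holds one byte per line and is indexed and tagged purely by virtual address. Two virtual addresses alias one physical page; invalidating only the kernel alias leaves the stale line cached under the user alias.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES     4   /* physical pages, one byte of data each */
#define NVADDRS    8   /* virtual pages the toy CPU can address  */
#define KERNEL_VA  1   /* hypothetical kernel direct-map alias   */
#define USER_VA    5   /* hypothetical userspace alias           */

struct vivt_line {
    int     valid;
    uint8_t data;
};

struct machine {
    uint8_t          ram[NPAGES];      /* physical memory               */
    unsigned         pgtable[NVADDRS]; /* virtual page -> physical page */
    struct vivt_line cache[NVADDRS];   /* one line per VIRTUAL address  */
};

/* CPU load: a hit is decided by the virtual address alone. */
uint8_t cpu_read(struct machine *m, unsigned va)
{
    struct vivt_line *l = &m->cache[va];

    if (!l->valid) {
        l->data  = m->ram[m->pgtable[va]];
        l->valid = 1;
    }
    return l->data;
}

/* Device write: goes straight to RAM, bypassing the cache. */
void dma_write(struct machine *m, unsigned pa, uint8_t val)
{
    m->ram[pa] = val;
}

/* Cache maintenance by virtual address, as on a VIVT cache. */
void invalidate(struct machine *m, unsigned va)
{
    m->cache[va].valid = 0;
}

/* What does userspace read after a DMA_FROM_DEVICE transfer? */
uint8_t read_user_after_dma(int invalidate_user_alias)
{
    struct machine m;

    memset(&m, 0, sizeof(m));
    m.pgtable[KERNEL_VA] = 2;
    m.pgtable[USER_VA]   = 2;   /* same physical page: an alias */
    m.ram[2] = 0x11;

    cpu_read(&m, USER_VA);      /* user alias now caches 0x11 */
    cpu_read(&m, KERNEL_VA);

    invalidate(&m, KERNEL_VA);  /* maintenance via the kernel address only */
    if (invalidate_user_alias)
        invalidate(&m, USER_VA);

    dma_write(&m, 2, 0x22);     /* device fills the buffer */
    return cpu_read(&m, USER_VA);
}
```

In this model `read_user_after_dma(0)` returns the stale value 0x11, while `read_user_after_dma(1)` returns the freshly DMA'ed 0x22. A real VIVT cache has many more details (line size, write-back state, context switches), but the aliasing problem is the same.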

2009-09-01 13:44:24

by Laurent Pinchart

On Tuesday 01 September 2009 15:28:24 Russell King - ARM Linux wrote:
> On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > Linux<[email protected]> wrote:
> > [...]
> >
> > > As far as userspace DMA coherency, the only way you could do it with
> > > current kernel APIs is by using get_user_pages(), creating a
> > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > until after DMA has completed.
> >
> > [...]
> >
> > Would that work on a processor with VIVT caches? It seems not. In
> > particular, dma_map_page uses page_address to get a virtual address to
> > pass to map_single(). map_single() in turn uses this address to
> > perform cache maintenance. Since page_address() returns the kernel
> > virtual address, I don't see how any cache-lines for the userspace
> > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
>
> You are correct.
>
> > If that's true, then what is the correct way to allow DMA to/from a
> > userspace buffer with a VIVT cache? If not true, what am I missing?
>
> I don't think you read what I said (but I've also forgotten what I did
> say).
>
> To put it simply, the kernel does not support DMA direct from userspace
> pages. Solutions which have been proposed in the past only work with a
> sub-set of conditions (such as the one above only works with VIPT
> caches.)

I might be missing something obvious, but I fail to see how VIVT caches could
work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
we certainly want to invalidate all cache lines that store buffer data. As the
cache doesn't care about physical addresses we thus need to invalidate all
virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
see how that would be done.

--
Laurent Pinchart

2009-09-01 14:18:32

by Russell King - ARM Linux

On Tue, Sep 01, 2009 at 03:43:48PM +0200, Laurent Pinchart wrote:
> I might be missing something obvious, but I fail to see how VIVT caches
> could work at all with multiple mappings. If a kernel-allocated buffer
> is DMA'ed to, we certainly want to invalidate all cache lines that store
> buffer data. As the cache doesn't care about physical addresses we thus
> need to invalidate all virtual mappings for the buffer. If the buffer is
> mmap'ed in userspace I don't see how that would be done.

You need to ask MM gurus about that. I don't touch the Linux MM very
often so tend to keep forgetting how it works. However, it does work
for shared mappings of files on CPUs with VIVT caches.

2009-09-01 16:54:26

by Hugh Dickins

On Tue, 1 Sep 2009, Russell King - ARM Linux wrote:
> On Tue, Sep 01, 2009 at 03:43:48PM +0200, Laurent Pinchart wrote:
> > I might be missing something obvious, but I fail to see how VIVT caches
> > could work at all with multiple mappings. If a kernel-allocated buffer
> > is DMA'ed to, we certainly want to invalidate all cache lines that store
> > buffer data. As the cache doesn't care about physical addresses we thus
> > need to invalidate all virtual mappings for the buffer. If the buffer is
> > mmap'ed in userspace I don't see how that would be done.
>
> You need to ask MM gurus about that. I don't touch the Linux MM very
> often so tend to keep forgetting how it works. However, it does work
> for shared mappings of files on CPUs with VIVT caches.

I believe arch/arm/mm/flush.c __flush_dcache_aliases() is what does it.

Hugh

2009-09-02 15:15:05

by Imre Deak

On Tue, Sep 01, 2009 at 03:43:48PM +0200, ext Laurent Pinchart wrote:
> On Tuesday 01 September 2009 15:28:24 Russell King - ARM Linux wrote:
> > On Tue, Aug 25, 2009 at 08:53:29AM -0400, Steven Walter wrote:
> > > On Thu, Aug 6, 2009 at 6:25 PM, Russell King - ARM
> > > Linux<[email protected]> wrote:
> > > [...]
> > >
> > > > As far as userspace DMA coherency, the only way you could do it with
> > > > current kernel APIs is by using get_user_pages(), creating a
> > > > scatterlist from those, and then passing it to dma_map_sg(). While the
> > > > device has ownership of the SG, userspace must _not_ touch the buffer
> > > > until after DMA has completed.
> > >
> > > [...]
> > >
> > > Would that work on a processor with VIVT caches? It seems not. In
> > > particular, dma_map_page uses page_address to get a virtual address to
> > > pass to map_single(). map_single() in turn uses this address to
> > > perform cache maintenance. Since page_address() returns the kernel
> > > virtual address, I don't see how any cache-lines for the userspace
> > > virtual address would get invalidated (for the DMA_FROM_DEVICE case).
> >
> > You are correct.
> >
> > > If that's true, then what is the correct way to allow DMA to/from a
> > > userspace buffer with a VIVT cache? If not true, what am I missing?
> >
> > I don't think you read what I said (but I've also forgotten what I did
> > say).
> >
> > To put it simply, the kernel does not support DMA direct from userspace
> > pages. Solutions which have been proposed in the past only work with a
> > sub-set of conditions (such as the one above only works with VIPT
> > caches.)
>
> I might be missing something obvious, but I fail to see how VIVT caches could
> work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
> we certainly want to invalidate all cache lines that store buffer data. As the
> cache doesn't care about physical addresses we thus need to invalidate all
> virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
> see how that would be done.

To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
are ok:

The cache lines for direct mapping are flushed in dma_alloc_* and
vmalloc. After this you are not supposed to access the buffers
through the direct mapping until you're done with the DMA.

For kmalloc you use the direct mapping in the first place, so the
flush in dma_map_* will be enough.

For user mappings I think you'd have to do an additional flush for
the direct mapping, while the user mapping is flushed in dma_map_*.

--Imre

2009-09-03 07:37:16

by Imre Deak

On Wed, Sep 02, 2009 at 05:10:44PM +0200, Deak Imre (Nokia-D/Helsinki) wrote:
> On Tue, Sep 01, 2009 at 03:43:48PM +0200, ext Laurent Pinchart wrote:
> > [...]
> > I might be missing something obvious, but I fail to see how VIVT caches could
> > work at all with multiple mappings. If a kernel-allocated buffer is DMA'ed to,
> > we certainly want to invalidate all cache lines that store buffer data. As the
> > cache doesn't care about physical addresses we thus need to invalidate all
> > virtual mappings for the buffer. If the buffer is mmap'ed in userspace I don't
> > see how that would be done.
>
> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
> are ok:
>
> The cache lines for direct mapping are flushed in dma_alloc_* and
> vmalloc. After this you are not supposed to access the buffers
> through the direct mapping until you're done with the DMA.
>
> For kmalloc you use the direct mapping in the first place, so the
> flush in dma_map_* will be enough.
>
> For user mappings I think you'd have to do an additional flush for
> the direct mapping, while the user mapping is flushed in dma_map_*.

Based on the discussion so far, this is my understanding of how
zero-copy DMA is possible on ARM. Could you please confirm or correct
these points?

- user space passes an arbitrary buffer:
    - get_user_pages(user address range)
    - DMA(user address range)
    - user space reads from the buffer

  Problems:
    - not supported according to Russell
    - unhandled faults for cache ops on not-present PTEs, but patch
      from Laurent fixes this

- mmap a kernel buffer to user space with cacheable mapping:
    - user space writes to the buffer
    - flush cache(user address range)
    - DMA(kernel buffer)
    - user space reads from the buffer

  The additional flush cache is needed for VIVT/aliasing VIPT.
  Instead of the flush cache:
    - the mapping can be done with writethrough, non-writeallocate or
      non-cacheable mapping, or
    - for aliasing VIPT a non-aliasing user address is picked

DMA(address range) is:
    - dma_map_*(address range)
    - perform DMA to/from address range
    - dma_unmap_*(address range)
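
The second scheme above (a cacheable mmap of a kernel buffer, with an explicit flush of the user address range before the DMA) can be sketched with a toy write-back cache. As a hedged illustration only: this is a user-space model, not kernel code, and `struct wb_machine`, `cpu_write()`, `clean_line()`, `dma_read()` and `device_sees()` are hypothetical names. It assumes a cache keyed by virtual address in which a store stays in the cache line until that line is written back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES   4
#define NVADDRS  8
#define USER_VA  5     /* hypothetical userspace alias of the buffer */
#define BUF_PA   2     /* hypothetical physical page of the buffer   */

struct wb_line {
    int     valid;
    int     dirty;
    uint8_t data;
};

struct wb_machine {
    uint8_t        ram[NPAGES];        /* physical memory, one byte per page   */
    unsigned       pgtable[NVADDRS];   /* virtual page -> physical page        */
    struct wb_line cache[NVADDRS];     /* write-back, keyed by virtual address */
};

/* CPU store: lands in the cache line only; RAM is updated later. */
void cpu_write(struct wb_machine *m, unsigned va, uint8_t val)
{
    m->cache[va].data  = val;
    m->cache[va].valid = 1;
    m->cache[va].dirty = 1;
}

/* "flush cache(user address range)": write a dirty line back to RAM. */
void clean_line(struct wb_machine *m, unsigned va)
{
    struct wb_line *l = &m->cache[va];

    if (l->valid && l->dirty) {
        m->ram[m->pgtable[va]] = l->data;
        l->dirty = 0;
    }
}

/* Device read: sees RAM only, never the CPU cache. */
uint8_t dma_read(const struct wb_machine *m, unsigned pa)
{
    return m->ram[pa];
}

/* Userspace writes 0x22, then the device reads the buffer. */
uint8_t device_sees(int flush_user_range)
{
    struct wb_machine m;

    memset(&m, 0, sizeof(m));
    m.pgtable[USER_VA] = BUF_PA;
    m.ram[BUF_PA] = 0x11;              /* old buffer contents */

    cpu_write(&m, USER_VA, 0x22);      /* user space writes to the buffer */
    if (flush_user_range)
        clean_line(&m, USER_VA);       /* the additional flush */

    return dma_read(&m, BUF_PA);       /* DMA(kernel buffer), to device */
}
```

Here `device_sees(0)` returns the old contents 0x11 because the new data is still sitting in the line cached under the user address, while `device_sees(1)` returns 0x22. A writethrough or non-cacheable user mapping would make the store reach RAM directly, which is why those are listed as alternatives to the flush.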

Thanks,
Imre

2009-09-03 08:36:24

by Russell King - ARM Linux

On Wed, Sep 02, 2009 at 06:10:44PM +0300, Imre Deak wrote:
> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
> are ok:

For dma_map_*, the only pages/addresses which are valid to pass are
those returned by get_free_pages() or kmalloc. Everything else is
not permitted.

Use of vmalloc'd and dma_alloc_* pages with the dma_map_* APIs is invalid
use of the DMA API. See the notes in the DMA-mapping.txt document
against "dma_map_single".

> For user mappings I think you'd have to do an additional flush for
> the direct mapping, while the user mapping is flushed in dma_map_*.

I will not accept a patch which adds flushing of anything other than
the kernel direct mapping in the dma_map_* functions, so please find
a different approach.

2009-09-08 13:05:16

by Steven Walter

On Thu, Sep 3, 2009 at 4:36 AM, Russell King - ARM
Linux<[email protected]> wrote:
> On Wed, Sep 02, 2009 at 06:10:44PM +0300, Imre Deak wrote:
>> To my understanding buffers returned by dma_alloc_*, kmalloc, vmalloc
>> are ok:
>
> For dma_map_*, the only pages/addresses which are valid to pass are
> those returned by get_free_pages() or kmalloc. Everything else is
> not permitted.
>
> Use of vmalloc'd and dma_alloc_* pages with the dma_map_* APIs is invalid
> use of the DMA API. See the notes in the DMA-mapping.txt document
> against "dma_map_single".

Actually, DMA-mapping.txt seems to explicitly say that it's allowed to
use pages allocated by vmalloc:

"It is possible to DMA to the _underlying_ memory mapped into a
vmalloc() area, but this requires walking page tables to get the
physical addresses, and then translating each of those pages back to a
kernel address using something like __va()."

>> For user mappings I think you'd have to do an additional flush for
>> the direct mapping, while the user mapping is flushed in dma_map_*.
>
> I will not accept a patch which adds flushing of anything other than
> the kernel direct mapping in the dma_map_* functions, so please find
> a different approach.

What's the concern here? Just the performance overhead of the checks
and additional flushes? It seems much more desirable for the
dma_map_* API to take care of potential cache aliases than to require
every driver to manage it for itself. After all, part of the purpose
of the DMA API is to manage the cache maintenance around DMAs in an
architecture-independent way.
--
-Steven Walter <[email protected]>