2015-11-04 09:58:25

by Sharma, Sanjeev

Subject: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

__dma_page_cpu_to_dev() treats DMA_BIDIRECTIONAL the same as DMA_TO_DEVICE,
which implies that the destination is the device. The CPU may have
written data to the source buffer, and that data may still be in cache
lines. For a cleaner operation, call outer_flush_range(), which cleans
and invalidates the outer cache lines.

Signed-off-by: Sanjeev Sharma <[email protected]>
---
arch/arm/mm/dma-mapping.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e62400e..e195235 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -850,12 +850,20 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	dma_cache_maint_page(page, off, size, dir, dmac_map_area);

 	paddr = page_to_phys(page) + off;
-	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-	} else {
-		outer_clean_range(paddr, paddr + size);
+
+	switch (dir) {
+	case DMA_FROM_DEVICE:
+		outer_inv_range(paddr, paddr + size);
+		break;
+	case DMA_TO_DEVICE:
+		outer_clean_range(paddr, paddr + size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		outer_flush_range(paddr, paddr + size);
+		break;
+	default:
+		break;
 	}
-	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }

static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
--
1.7.11.7


2015-11-04 10:39:14

by Will Deacon

Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
> _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to DMA_TO_DEVICE
> which means that destination buffer is device memory,means cpu may have
> written some data to source buffer and data may be in cache line.For
> cleaner operation we need to call outer_flush_range() which will
> clean and invalidate outer cache lines.

Why isn't the clean sufficient in this case? We're mapping the buffer
to the device, so we clean the dirty lines in the CPU caches and make
them visible to the device. If the CPU later wants to read the buffer
(i.e. after the device has DMA'd into it), you'll need to map the
buffer to the CPU, which will perform the invalidation of the CPU caches.

Will

> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index e62400e..e195235 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -850,12 +850,20 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
>  	dma_cache_maint_page(page, off, size, dir, dmac_map_area);
>
>  	paddr = page_to_phys(page) + off;
> -	if (dir == DMA_FROM_DEVICE) {
> -		outer_inv_range(paddr, paddr + size);
> -	} else {
> -		outer_clean_range(paddr, paddr + size);
> +
> +	switch (dir) {
> +	case DMA_FROM_DEVICE:
> +		outer_inv_range(paddr, paddr + size);
> +		break;
> +	case DMA_TO_DEVICE:
> +		outer_clean_range(paddr, paddr + size);
> +		break;
> +	case DMA_BIDIRECTIONAL:
> +		outer_flush_range(paddr, paddr + size);
> +		break;
> +	default:
> +		break;
>  	}
> -	/* FIXME: non-speculating: flush on bidirectional mappings? */
>  }
>
> static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
> --
> 1.7.11.7
>
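
The argument in this reply (clean when the buffer goes to the device, invalidate when it comes back to the CPU) can be sketched with a toy userspace model in C. This is not kernel code: `struct toy_line` and every `toy_*` function are hypothetical names invented for illustration, and a single integer stands in for a whole buffer and the cache line covering it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one RAM word with one write-back cache line. */
struct toy_line {
	int  mem;     /* what the device sees (RAM) */
	int  cached;  /* the CPU's cached copy */
	bool valid;   /* line is present in the cache */
	bool dirty;   /* cached copy is newer than RAM */
};

/* CPU store: hits the cache and marks the line dirty. */
static void toy_cpu_write(struct toy_line *l, int v)
{
	l->cached = v;
	l->valid = true;
	l->dirty = true;
}

/* CPU load: refills from RAM only when the line is not valid. */
static int toy_cpu_read(struct toy_line *l)
{
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
		l->dirty = false;
	}
	return l->cached;
}

/* Device DMA write: goes straight to RAM, bypassing the cache. */
static void toy_dev_write(struct toy_line *l, int v)
{
	l->mem = v;
}

/* Clean: write dirty data back to RAM, keep the line valid. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		l->mem = l->cached;
		l->dirty = false;
	}
}

/* Invalidate: drop the line so the next load refills from RAM. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/*
 * One bidirectional round trip: the CPU writes 1, the buffer is handed to
 * the device with a clean only, the device writes 2, and the CPU reads.
 * Returns the value the CPU observes.
 */
static int toy_bidirectional_round_trip(bool invalidate_before_read)
{
	struct toy_line l = { .mem = 0, .cached = 0, .valid = false, .dirty = false };

	toy_cpu_write(&l, 1);
	toy_clean(&l);              /* map to device: RAM now holds 1 */
	toy_dev_write(&l, 2);       /* device DMAs into the buffer */
	if (invalidate_before_read)
		toy_invalidate(&l); /* map back to the CPU */
	return toy_cpu_read(&l);
}
```

In this model `toy_bidirectional_round_trip(true)` returns the device's value and `toy_bidirectional_round_trip(false)` returns the stale CPU value, which matches the claim that the invalidation belongs to the step that maps the buffer back to the CPU rather than to the step that maps it to the device.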

2015-11-04 10:54:32

by Russell King - ARM Linux

Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Wed, Nov 04, 2015 at 10:39:13AM +0000, Will Deacon wrote:
> On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
> > _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to DMA_TO_DEVICE
> > which means that destination buffer is device memory,means cpu may have
> > written some data to source buffer and data may be in cache line.For
> > cleaner operation we need to call outer_flush_range() which will
> > clean and invalidate outer cache lines.
>
> Why isn't the clean sufficient in this case? We're mapping the buffer
> to the device, so we clean the dirty lines in the CPU caches and make
> them visible to the device. If the CPU later wants to read the buffer
> (i.e. after the device has DMA'd into it), you'll need to map the
> buffer to the CPU, which will perform the invalidation of the CPU caches.

Indeed. Bidirectional mode is already handled perfectly well by this
code. No patches are required.

(I never received the original email.)

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2015-11-05 05:57:36

by Sharma, Sanjeev

Subject: RE: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

-----Original Message-----
From: Russell King - ARM Linux [mailto:[email protected]]
Sent: Wednesday, November 04, 2015 4:24 PM
To: Will Deacon
Cc: Sharma, Sanjeev; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Wed, Nov 04, 2015 at 10:39:13AM +0000, Will Deacon wrote:
> On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
> > _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to
> > DMA_TO_DEVICE which means that destination buffer is device
> > memory,means cpu may have written some data to source buffer and
> > data may be in cache line.For cleaner operation we need to call
> > outer_flush_range() which will clean and invalidate outer cache lines.
>
> Why isn't the clean sufficient in this case? We're mapping the buffer
> to the device, so we clean the dirty lines in the CPU caches and make
> them visible to the device. If the CPU later wants to read the buffer
> (i.e. after the device has DMA'd into it), you'll need to map the
> buffer to the CPU, which will perform the invalidation of the CPU caches.

> Indeed. Bidirectional mode is already handled perfectly well by this code. No patches are required.

Thanks, Russell and Will, for providing input.

Let's assume the CPU doesn't read the buffer: could there be a problem then? IMO, outer_flush_range() could be used to handle every use case.
If it still doesn't make sense to flush on bidirectional mappings, then the FIXME comment should be removed from the function to avoid any
confusion.


2015-11-09 10:08:09

by Will Deacon

Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Mon, Nov 09, 2015 at 11:29:17AM +0530, sanjeev sharma wrote:
> On Wed, Nov 04, 2015 at 10:39:13AM +0000, Will Deacon wrote:
> > On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
> > > _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to
> > > DMA_TO_DEVICE which means that destination buffer is device
> > > memory,means cpu may have written some data to source buffer and
> > > data may be in cache line.For cleaner operation we need to call
> > > outer_flush_range() which will clean and invalidate outer cache lines.
> >
> > Why isn't the clean sufficient in this case? We're mapping the buffer
> > to the device, so we clean the dirty lines in the CPU caches and make
> > them visible to the device. If the CPU later wants to read the buffer
> > (i.e. after the device has DMA'd into it), you'll need to map the
> > buffer to the CPU, which will perform the invalidation of the CPU caches.
>
> Indeed. Bidirectional mode is already handled perfectly well by this
> code. No patches are required.
>
> Thanks Russell & Will for providing input.
>
> Let's assume , CPU don't read the buffer then there could be the problem
> correct ? IMO, to handle every use case outer_flush_range can be used ?
> If still it doesn't make sense to use flush on bidirectional mappings, then
> FIXME comment should be removed from the function to avoid any
> Confusion.
>
>
>
> Please let me know what you think on above comment ?

I still don't understand the problem that you're trying to fix.

Sorry,

Will

2015-11-09 10:17:03

by Sharma, Sanjeev

Subject: RE: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Mon, Nov 09, 2015 at 11:29:17AM +0530, sanjeev sharma wrote:
> On Wed, Nov 04, 2015 at 10:39:13AM +0000, Will Deacon wrote:
> > On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
> > > _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to
> > > DMA_TO_DEVICE which means that destination buffer is device
> > > memory,means cpu may have written some data to source buffer and
> > > data may be in cache line.For cleaner operation we need to call
> > > outer_flush_range() which will clean and invalidate outer cache lines.
> >
> > Why isn't the clean sufficient in this case? We're mapping the buffer
> > to the device, so we clean the dirty lines in the CPU caches and make
> > them visible to the device. If the CPU later wants to read the buffer
> > (i.e. after the device has DMA'd into it), you'll need to map the
> > buffer to the CPU, which will perform the invalidation of the CPU caches.
>
> Indeed. Bidirectional mode is already handled perfectly well by this
> code. No patches are required.
>
> Thanks Russell & Will for providing input.
>
> Let's assume , CPU don't read the buffer then there could be the problem
> correct ? IMO, to handle every use case outer_flush_range can be used ?
> If still it doesn't make sense to use flush on bidirectional mappings, then
> FIXME comment should be removed from the function to avoid any
> Confusion.
>
>
>
> Please let me know what you think on above comment ?

> I still don't understand the problem that you're trying to fix.

It may cause the following issue:
1. We create a cacheable buffer, and in some cases the cache may hold dirty lines for it.
2. We then call the sync_for_device function with DMA_BIDIRECTIONAL to avoid cache problems.
3. However, __dma_page_cpu_to_dev() treats DMA_BIDIRECTIONAL the same as
DMA_TO_DEVICE, which means the kernel will not invalidate the cache when DMA_BIDIRECTIONAL is used.
4. Since the dirty cache lines are not invalidated, the dirty content may show up in the buffer during later rendering.

> Sorry,
>
> Will

2015-11-09 10:50:47

by Robin Murphy

Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On 09/11/15 10:15, Sharma, Sanjeev wrote:
> On Mon, Nov 09, 2015 at 11:29:17AM +0530, sanjeev sharma wrote:
>> On Wed, Nov 04, 2015 at 10:39:13AM +0000, Will Deacon wrote:
>> > On Wed, Nov 04, 2015 at 03:26:48PM +0530, Sanjeev Sharma wrote:
>> > > _dma_page_cpu_to_dev() treat DMA_BIDIRECTIONAL similar to
>> > > DMA_TO_DEVICE which means that destination buffer is device
>> > > memory,means cpu may have written some data to source buffer and
>> > > data may be in cache line.For cleaner operation we need to call
>> > > outer_flush_range() which will clean and invalidate outer cache lines.
>> >
>> > Why isn't the clean sufficient in this case? We're mapping the buffer
>> > to the device, so we clean the dirty lines in the CPU caches and make
>> > them visible to the device. If the CPU later wants to read the buffer
>> > (i.e. after the device has DMA'd into it), you'll need to map the
>> > buffer to the CPU, which will perform the invalidation of the CPU caches.
>>
>> Indeed. Bidirectional mode is already handled perfectly well by this
>> code. No patches are required.
>>
>> Thanks Russell & Will for providing input.
>>
>> Let's assume , CPU don't read the buffer then there could be the problem
>> correct ? IMO, to handle every use case outer_flush_range can be used ?
>> If still it doesn't make sense to use flush on bidirectional mappings, then
>> FIXME comment should be removed from the function to avoid any
>> Confusion.
>>
>>
>>
>> Please let me know what you think on above comment ?
>
> I still don't understand the problem that you're trying to fix.
>
> It may cause the following issue.
> 1.we create the buffer with cache, and in some cases, the cache may be dirty.
> 2.then we call the sync_for_device function with flag DMA_BIDIRECTIONAL to avoid some cache problems.

This performs a cache clean, so the dirty lines are flushed out and
cache and memory contents now match.

> 3. however __dma_page_cpu_to_dev() just see DMA_BIDIRECTIONAL the same as
> DMA_TO_DEVICE, which means the kernel will not invalid the cache if we use the flag DMA_BIDIRECTIONAL.

The CPU doesn't need to invalidate the cache at this point, since a)
it's valid, and, crucially b) it will now refrain from accessing the
buffer until the device has finished writing.

> 4.since the dirty cache is not invalid, the dirty content may be showed on the buffer in the future rendering.

The CPU _must_ call *_sync_for_cpu before it either reads or writes the
buffer again. In this case, DMA_BIDIRECTIONAL is equivalent to
DMA_FROM_DEVICE, thus will invalidate what the CPU still thinks are
clean cache lines, so that whatever the device wrote to memory is then
visible.

If you're seeing wrong data anywhere, that implies you have some
necessary sync calls missing.

Robin.
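
The direction handling described in this reply can be summarised as a small lookup, sketched here in userspace C. The `toy_*` names are hypothetical. The CPU-to-device half mirrors the pre-patch `if`/`else` shown in the diff, and the device-to-CPU half follows the statement above that DMA_BIDIRECTIONAL behaves like DMA_FROM_DEVICE. That DMA_TO_DEVICE needs no outer-cache invalidate on the way back is an assumption not stated in this thread.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's enum dma_data_direction values. */
enum toy_dir { TOY_BIDIRECTIONAL, TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* Outer-cache maintenance operations discussed in the thread. */
enum toy_op { TOY_OP_NONE, TOY_OP_CLEAN, TOY_OP_INVALIDATE };

/*
 * Handing the buffer to the device (map / sync_for_device).
 * Mirrors the pre-patch code: invalidate for FROM_DEVICE, clean otherwise,
 * so BIDIRECTIONAL is cleaned exactly like TO_DEVICE.
 */
static enum toy_op toy_op_cpu_to_dev(enum toy_dir dir)
{
	return dir == TOY_FROM_DEVICE ? TOY_OP_INVALIDATE : TOY_OP_CLEAN;
}

/*
 * Handing the buffer back to the CPU (unmap / sync_for_cpu).
 * BIDIRECTIONAL is treated like FROM_DEVICE and invalidated.
 * Assumption: nothing is needed for TO_DEVICE, since the device
 * did not write to the buffer.
 */
static enum toy_op toy_op_dev_to_cpu(enum toy_dir dir)
{
	return dir == TOY_TO_DEVICE ? TOY_OP_NONE : TOY_OP_INVALIDATE;
}
```

Read together, the two functions show that a DMA_BIDIRECTIONAL buffer gets a clean on the way out and an invalidate on the way back, so the combined effect of a flush is already there as long as both sync calls are made.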

2015-11-09 12:00:32

by Russell King - ARM Linux

Subject: Re: [PATCH] ARM:dma-mapping: Handle DMA_BIDIRECTIONAL in _dma_page_cpu_to_dev()

On Mon, Nov 09, 2015 at 10:15:34AM +0000, Sharma, Sanjeev wrote:
> It may cause the following issue.
> 1.we create the buffer with cache, and in some cases, the cache may be dirty.
> 2.then we call the sync_for_device function with flag DMA_BIDIRECTIONAL to
> avoid some cache problems.

This is wrong. Please read the DMA-API document on proper use of these
functions. Enable CONFIG_DMA_API_DEBUG as well.

> 3. however __dma_page_cpu_to_dev() just see DMA_BIDIRECTIONAL the same as
> DMA_TO_DEVICE, which means the kernel will not invalid the cache if we use
> the flag DMA_BIDIRECTIONAL.
> 4.since the dirty cache is not invalid, the dirty content may be showed on
> the buffer in the future rendering.

This is again wrong. __dma_page_cpu_to_dev() with DMA_BIDIRECTIONAL will
_clean_ the cache, which means it will push out all the dirty content
in the cache. However, it leaves the data in the cache in case we want
to read it later (for the FROM_DEVICE case.)

It is _invalid_ to read from the mapping while the device owns it, and
as Cortex CPUs speculatively prefetch, you can end up with new cache
lines allocated in this memory region. So, before reading the memory,
you _must_ either unmap the DMA buffer, or call dma_sync_for_cpu().
Either of those two functions will then invalidate the cache for a
DMA_BIDIRECTIONAL mapping, allowing you to safely read the data.

--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
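
The speculative-prefetch point in this reply can be illustrated with a toy userspace model in C. It is only a sketch under simplifying assumptions: `struct toy_line` and the `toy_*` functions are hypothetical, one integer stands in for the buffer, and a "prefetch" is modelled as a line fill that happens while the device still owns the buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one RAM word with one write-back cache line. */
struct toy_line {
	int  mem;     /* RAM contents, as seen by the device */
	int  cached;  /* CPU's cached copy */
	bool valid;
	bool dirty;
};

/* Line fill, used for both demand loads and speculative prefetches. */
static void toy_fill(struct toy_line *l)
{
	if (!l->valid) {
		l->cached = l->mem;
		l->valid = true;
		l->dirty = false;
	}
}

/* Clean: write dirty data back to RAM, keep the line valid. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		l->mem = l->cached;
		l->dirty = false;
	}
}

/* Invalidate: drop the line from the cache. */
static void toy_invalidate(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/*
 * The CPU writes 1 and maps the buffer to the device, the device writes 2,
 * then the CPU reads. Returns the value the CPU observes.
 *   flush_at_map: clean AND invalidate at map time (what the patch proposed)
 *   prefetch:     a speculative line fill occurs while the device owns the buffer
 *   sync_for_cpu: invalidate before the CPU reads (unmap / sync for CPU)
 */
static int toy_read_after_dma(bool flush_at_map, bool prefetch, bool sync_for_cpu)
{
	struct toy_line l = { .mem = 0, .cached = 0, .valid = false, .dirty = false };

	l.cached = 1;                 /* CPU store leaves a dirty line */
	l.valid = true;
	l.dirty = true;

	toy_clean(&l);                /* map to device: RAM now holds 1 */
	if (flush_at_map)
		toy_invalidate(&l);   /* extra invalidate at map time */
	if (prefetch)
		toy_fill(&l);         /* speculation re-allocates the line */

	l.mem = 2;                    /* device DMA write, bypasses the cache */

	if (sync_for_cpu)
		toy_invalidate(&l);   /* hand the buffer back to the CPU */
	toy_fill(&l);                 /* demand load by the CPU */
	return l.cached;
}
```

In this model, flushing at map time does not help once a prefetch has occurred: `toy_read_after_dma(true, true, false)` still returns the stale value, while `toy_read_after_dma(false, true, true)` returns the device's data. The invalidate on the CPU side is what makes the read safe in either case.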