2015-06-18 21:49:13

by Lorenzo Nava

[permalink] [raw]
Subject: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA


This patch allows the use of CMA for DMA coherent memory allocation.
At the moment if the input parameter "is_coherent" is set to true
the allocation is not made using the CMA, which I think is not the
desired behaviour.

Signed-off-by: Lorenzo Nava <lorenx4@xxxxxxxx>
---
Changes in v2:
correct __arm_dma_free() according to __dma_alloc() allocation
---
Changes in v3:
now __dma_alloc(), if 'is_coherent' is true, returns memory from CMA
if there is no need for atomic allocation. If CMA is not available
the function returns the result of __alloc_simple_buffer().
__arm_dma_free() frees memory according to the new alloc function
avoiding __dma_free_remap() for coherent DMA if CMA is not enable.
arm_dma_alloc() mark pages as cacheable if attrs are set by default
to NULL. If attrs is not NULL, attributes are preserved in the allocation.

Coherent allocation tested on Xilinx Zynq processor.
---
arch/arm/mm/dma-mapping.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7e7583d..8e7f402 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -645,15 +645,29 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
size = PAGE_ALIGN(size);
want_vaddr = !dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs);

- if (is_coherent || nommu())
+ if (nommu()) {
addr = __alloc_simple_buffer(dev, size, gfp, &page);
- else if (!(gfp & __GFP_WAIT))
+ goto dma_alloc_done;
+ }
+
+ if (dev_get_cma_area(dev) && (gfp & __GFP_WAIT)) {
+ addr = __alloc_from_contiguous(dev, size, prot, &page,
+ caller, want_vaddr);
+ goto dma_alloc_done;
+ }
+
+ if (is_coherent) {
+ addr = __alloc_simple_buffer(dev, size, gfp, &page);
+ goto dma_alloc_done;
+ }
+
+ if (!(gfp & __GFP_WAIT))
addr = __alloc_from_pool(size, &page);
- else if (!dev_get_cma_area(dev))
- addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller, want_vaddr);
else
- addr = __alloc_from_contiguous(dev, size, prot, &page, caller, want_vaddr);
+ addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
+ caller, want_vaddr);

+dma_alloc_done:
if (page)
*handle = pfn_to_dma(dev, page_to_pfn(page));

@@ -680,9 +694,14 @@ void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
{
- pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
+ pgprot_t prot;
void *memory;

+ if (attrs == NULL)
+ prot = PAGE_KERNEL;
+ else
+ prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
+
if (dma_alloc_from_coherent(dev, size, handle, &memory))
return memory;

@@ -735,12 +754,12 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,

size = PAGE_ALIGN(size);

- if (is_coherent || nommu()) {
+ if (nommu()) {
__dma_free_buffer(page, size);
} else if (__free_from_pool(cpu_addr, size)) {
return;
} else if (!dev_get_cma_area(dev)) {
- if (want_vaddr)
+ if (want_vaddr && !is_coherent)
__dma_free_remap(cpu_addr, size);
__dma_free_buffer(page, size);
} else {
--
1.7.10.4


2015-06-26 16:26:03

by Catalin Marinas

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Thu, Jun 18, 2015 at 11:44:22PM +0200, Lorenzo Nava wrote:
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7e7583d..8e7f402 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -645,15 +645,29 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> size = PAGE_ALIGN(size);
> want_vaddr = !dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs);
>
> - if (is_coherent || nommu())
> + if (nommu()) {
> addr = __alloc_simple_buffer(dev, size, gfp, &page);
> - else if (!(gfp & __GFP_WAIT))
> + goto dma_alloc_done;
> + }
> +
> + if (dev_get_cma_area(dev) && (gfp & __GFP_WAIT)) {
> + addr = __alloc_from_contiguous(dev, size, prot, &page,
> + caller, want_vaddr);
> + goto dma_alloc_done;
> + }
> +
> + if (is_coherent) {
> + addr = __alloc_simple_buffer(dev, size, gfp, &page);
> + goto dma_alloc_done;
> + }
> +
> + if (!(gfp & __GFP_WAIT))
> addr = __alloc_from_pool(size, &page);
> - else if (!dev_get_cma_area(dev))
> - addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller, want_vaddr);
> else
> - addr = __alloc_from_contiguous(dev, size, prot, &page, caller, want_vaddr);
> + addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
> + caller, want_vaddr);
>
> +dma_alloc_done:
> if (page)
> *handle = pfn_to_dma(dev, page_to_pfn(page));

The logic here looks alright but I would have preferred the original
more concise "if ... else if" constructs than the gotos (just personal
preference).

> @@ -680,9 +694,14 @@ void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
> dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
> {
> - pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
> + pgprot_t prot;
> void *memory;
>
> + if (attrs == NULL)
> + prot = PAGE_KERNEL;
> + else
> + prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
> +
> if (dma_alloc_from_coherent(dev, size, handle, &memory))
> return memory;

I still think this is the wrong way to fix. It doesn't address the
coherent dma mmap operation. I already replied on the previous version
that we should rather have an extra argument "coherent" to
__get_dma_pgprot().

> @@ -735,12 +754,12 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
>
> size = PAGE_ALIGN(size);
>
> - if (is_coherent || nommu()) {
> + if (nommu()) {
> __dma_free_buffer(page, size);
> } else if (__free_from_pool(cpu_addr, size)) {
> return;

You have an unnecessary __free_from_pool() call here in the is_coherent
case.

> } else if (!dev_get_cma_area(dev)) {
> - if (want_vaddr)
> + if (want_vaddr && !is_coherent)
> __dma_free_remap(cpu_addr, size);
> __dma_free_buffer(page, size);
> } else {

--
Catalin

2015-06-28 20:14:50

by Lorenzo Nava

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Fri, Jun 26, 2015 at 6:25 PM, Catalin Marinas
<[email protected]> wrote:
> On Thu, Jun 18, 2015 at 11:44:22PM +0200, Lorenzo Nava wrote:
>> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
>> index 7e7583d..8e7f402 100644
>> --- a/arch/arm/mm/dma-mapping.c
>> +++ b/arch/arm/mm/dma-mapping.c
>> @@ -645,15 +645,29 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
>> size = PAGE_ALIGN(size);
>> want_vaddr = !dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs);
>>
>> - if (is_coherent || nommu())
>> + if (nommu()) {
>> addr = __alloc_simple_buffer(dev, size, gfp, &page);
>> - else if (!(gfp & __GFP_WAIT))
>> + goto dma_alloc_done;
>> + }
>> +
>> + if (dev_get_cma_area(dev) && (gfp & __GFP_WAIT)) {
>> + addr = __alloc_from_contiguous(dev, size, prot, &page,
>> + caller, want_vaddr);
>> + goto dma_alloc_done;
>> + }
>> +
>> + if (is_coherent) {
>> + addr = __alloc_simple_buffer(dev, size, gfp, &page);
>> + goto dma_alloc_done;
>> + }
>> +
>> + if (!(gfp & __GFP_WAIT))
>> addr = __alloc_from_pool(size, &page);
>> - else if (!dev_get_cma_area(dev))
>> - addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller, want_vaddr);
>> else
>> - addr = __alloc_from_contiguous(dev, size, prot, &page, caller, want_vaddr);
>> + addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
>> + caller, want_vaddr);
>>
>> +dma_alloc_done:
>> if (page)
>> *handle = pfn_to_dma(dev, page_to_pfn(page));
>
> The logic here looks alright but I would have preferred the original
> more concise "if ... else if" constructs than the gotos (just personal
> preference).
>
Ok, I can switch back to "if..else": I thought it was becoming too
difficult to read, but in the end the code looks good also with the
original style.

>> @@ -680,9 +694,14 @@ void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
>> static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
>> dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
>> {
>> - pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
>> + pgprot_t prot;
>> void *memory;
>>
>> + if (attrs == NULL)
>> + prot = PAGE_KERNEL;
>> + else
>> + prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
>> +
>> if (dma_alloc_from_coherent(dev, size, handle, &memory))
>> return memory;
>
> I still think this is the wrong way to fix. It doesn't address the
> coherent dma mmap operation. I already replied on the previous version
> that we should rather have an extra argument "coherent" to
> __get_dma_pgprot().
>
I avoided touching the __get_dma_pgprot() function because it affects
a lot of different functions.
If you think that the implementation you suggested in previous reply
was ok and doesn't introduce problems on the other functions using the
__get_dma_pgprot(), for me it's of course ok as well. Do you see any
code that maybe need a double check: I'm thinking, for example, at the
function arm_iommu_alloc_attrs() and arm_iommu_mmap_attrs()? I agree
with you that the extra argument in the __get_dma_pgprot() function is
definitely the best solution, but I have to be sure that this doesn't
affect any other functions with unexpected behaviour.
For the dma mmap there is still the patch "[PATCH v3]
arm/mm/dma-mapping.c: Add arm_coherent_dma_mmap" which corrects the
behaviour of mapped attributes and makes this version of patch ok.

>> @@ -735,12 +754,12 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
>>
>> size = PAGE_ALIGN(size);
>>
>> - if (is_coherent || nommu()) {
>> + if (nommu()) {
>> __dma_free_buffer(page, size);
>> } else if (__free_from_pool(cpu_addr, size)) {
>> return;
>
> You have an unnecessary __free_from_pool() call here in the is_coherent
> case.
Ok, I'll fix it. Do you think that short-circuit evaluation can be
used here or it is better to use another solution?

>
>> } else if (!dev_get_cma_area(dev)) {
>> - if (want_vaddr)
>> + if (want_vaddr && !is_coherent)
>> __dma_free_remap(cpu_addr, size);
>> __dma_free_buffer(page, size);
>> } else {
>
> --
> Catalin

Thanks.
Lorenzo

2015-06-29 10:37:49

by Catalin Marinas

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Sun, Jun 28, 2015 at 10:14:43PM +0200, Lorenzo Nava wrote:
> On Fri, Jun 26, 2015 at 6:25 PM, Catalin Marinas
> <[email protected]> wrote:
> > On Thu, Jun 18, 2015 at 11:44:22PM +0200, Lorenzo Nava wrote:
> >> @@ -680,9 +694,14 @@ void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> >> static void *arm_coherent_dma_alloc(struct device *dev, size_t size,
> >> dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
> >> {
> >> - pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
> >> + pgprot_t prot;
> >> void *memory;
> >>
> >> + if (attrs == NULL)
> >> + prot = PAGE_KERNEL;
> >> + else
> >> + prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
> >> +
> >> if (dma_alloc_from_coherent(dev, size, handle, &memory))
> >> return memory;
> >
> > I still think this is the wrong way to fix. It doesn't address the
> > coherent dma mmap operation. I already replied on the previous version
> > that we should rather have an extra argument "coherent" to
> > __get_dma_pgprot().
>
> I avoided touching the __get_dma_pgprot() function because it affects
> a lot of different functions.
> If you think that the implementation you suggested in previous reply
> was ok and doesn't introduce problems on the other functions using the
> __get_dma_pgprot(), for me it's of course ok as well.

I forgot about the arm_dma_mmap fix here:

https://lkml.org/lkml/2015/5/7/512

So we either fix both cases by changing __get_dma_pgprot() or just go
for Mike's and your patches as above. It's up to Russell.

At some point, we could do with some more clean-up in the dma-mapping.c.
For example, both __alloc_simple_buffer() and __alloc_from_contiguous()
end up calling __dma_clear_buffer() even when not necessary (cacheable
mapping). Not too bad though as this is only done when setting up the
buffer.

> Do you see any code that maybe need a double check: I'm thinking, for
> example, at the function arm_iommu_alloc_attrs() and
> arm_iommu_mmap_attrs()?

I think this has bigger problems. The code seems to only be right for
non-cacheable mappings. For cacheable/coherent iommu mappings, I don't
see any use of the IOMMU_CACHE attribute (should it be returned by
__dma_direction_to_prot?).

--
Catalin

2015-06-29 10:47:03

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Mon, Jun 29, 2015 at 11:37:37AM +0100, Catalin Marinas wrote:
> I forgot about the arm_dma_mmap fix here:
>
> https://lkml.org/lkml/2015/5/7/512
>
> So we either fix both cases by changing __get_dma_pgprot() or just go
> for Mike's and your patches as above. It's up to Russell.

I'd prefer the "least code" option. :)

> At some point, we could do with some more clean-up in the dma-mapping.c.
> For example, both __alloc_simple_buffer() and __alloc_from_contiguous()
> end up calling __dma_clear_buffer() even when not necessary (cacheable
> mapping). Not too bad though as this is only done when setting up the
> buffer.

The reason we always clear the buffer is that we can't be sure that a
driver will not map a buffer allocated by dma_alloc_coherent() into
userspace without it first being initialised. There have been drivers
which do this (ALSA in particular.) I haven't checked whether this
instance still does this, but it used to - and the problem is once
one instance exists, it gets copied...

If it isn't initialised, userspace gets access to whatever data was in
that page prior to the allocation, which could be something like your
root password, or bank details, some other sensitive data.

So, removing the clearing probably opens a rather large security hole.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-06-29 11:05:47

by Catalin Marinas

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Mon, Jun 29, 2015 at 11:46:04AM +0100, Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2015 at 11:37:37AM +0100, Catalin Marinas wrote:
> > I forgot about the arm_dma_mmap fix here:
> >
> > https://lkml.org/lkml/2015/5/7/512
> >
> > So we either fix both cases by changing __get_dma_pgprot() or just go
> > for Mike's and your patches as above. It's up to Russell.
>
> I'd prefer the "least code" option. :)
>
> > At some point, we could do with some more clean-up in the dma-mapping.c.
> > For example, both __alloc_simple_buffer() and __alloc_from_contiguous()
> > end up calling __dma_clear_buffer() even when not necessary (cacheable
> > mapping). Not too bad though as this is only done when setting up the
> > buffer.
>
> The reason we always clear the buffer is that we can't be sure that a
> driver will not map a buffer allocated by dma_alloc_coherent() into
> userspace without it first being initialised. There have been drivers
> which do this (ALSA in particular.) I haven't checked whether this
> instance still does this, but it used to - and the problem is once
> one instance exists, it gets copied...

You are right, the memset'ing is probably still necessary to patch
potential security holes. The cache flushing is not for coherent buffers
(sometimes this may be more expensive than the memset itself, though
lost in the noise if only done once).

--
Catalin

2015-06-29 11:30:35

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH v3] arm DMA: Fix allocation from CMA for coherent DMA

On Mon, Jun 29, 2015 at 12:05:33PM +0100, Catalin Marinas wrote:
> On Mon, Jun 29, 2015 at 11:46:04AM +0100, Russell King - ARM Linux wrote:
> > The reason we always clear the buffer is that we can't be sure that a
> > driver will not map a buffer allocated by dma_alloc_coherent() into
> > userspace without it first being initialised. There have been drivers
> > which do this (ALSA in particular.) I haven't checked whether this
> > instance still does this, but it used to - and the problem is once
> > one instance exists, it gets copied...
>
> You are right, the memset'ing is probably still necessary to patch
> potential security holes. The cache flushing is not for coherent buffers
> (sometimes this may be more expensive than the memset itself, though
> lost in the noise if only done once).

Well, CMA itself is _not_ a fast allocator. The code is highly inefficient
- multiple loops within loops within loops.

cma_alloc() walks over the range by the alignment size, trying to allocate
each block in sequence.

The worst case is: CMA region size nMB, you're trying to allocate with a
4k alignment, CMA memory is all pinned. It will attempt to allocate the
first 4k page, calling alloc_contig_range() on the range. That tries to
reclaim the pages, and while it may be able to reclaim some, it will try
five times to do so before failing. We'll then increment the starting
address by 4k and repeat, until we get to the end of CMA. So, the outer
loop tries nMB/4kB times to do the allocation - for a 256MB CMA region,
that's 64k calls to alloc_contig_range(), and 327680 calls to the page
reclaim functions.

I've recently seen a single allocation call for 8MB (1080p frame) from
256MB of CMA with scattered pinned pages taking about two seconds to
complete.

One of the problems is that CMA has no knowledge of other users of the
pages, and has no feedback from alloc_contig_range() about which pages
are pinned, so cma_alloc() can't make a sensible guess at the next block
to try.

zeroing and cache flushing is the _least_ of the problems in that code
path.

(Pages can be pinned by ext4fs - see the LWN articles on the subject of
pinned pages and CMA - and also by drivers that hold on to GFP_MOVABLE
pages.)

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.