Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751903AbbESW15 (ORCPT ); Tue, 19 May 2015 18:27:57 -0400
Received: from mail-wi0-f180.google.com ([209.85.212.180]:37624 "EHLO
        mail-wi0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751174AbbESW1z (ORCPT );
        Tue, 19 May 2015 18:27:55 -0400
MIME-Version: 1.0
In-Reply-To: <2882347.Nj1Dq9Wlqh@wuerfel>
References: <20150519163436.GZ21251@e104818-lin.cambridge.arm.com>
        <2882347.Nj1Dq9Wlqh@wuerfel>
Date: Wed, 20 May 2015 00:27:53 +0200
Message-ID:
Subject: Re: [RFC] arm: DMA-API contiguous cacheable memory
From: Lorenzo Nava
To: Arnd Bergmann
Cc: linux-arm-kernel@lists.infradead.org, Catalin Marinas,
        linux@arm.linux.org.uk, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5175
Lines: 104

On Wed, May 20, 2015 at 12:14 AM, Arnd Bergmann wrote:
> On Wednesday 20 May 2015 00:05:54 Lorenzo Nava wrote:
>>
>> On Tue, May 19, 2015 at 6:34 PM, Catalin Marinas wrote:
>> > On Mon, May 18, 2015 at 10:56:06PM +0200, Lorenzo Nava wrote:
>> >> it's been a while since I've started working with DMA on ARM processor
>> >> for a smart camera project. Typically the requirements is to have a
>> >> large memory area which can be accessed by both DMA and user. I've
>> >> already noticed that many people wonder about which would be the best
>> >> way to have data received from DMA mapped in user space and, more
>> >> important, mapped in a cacheable area of memory. Having a memory
>> >> mapped region which is cacheable is very important if the user must
>> >> access the data and make some sort of processing on that.
>> >> My question is: why don't we introduce a function in the DMA-API
>> >> interface for ARM processors which allows to allocate a contiguous and
>> >> cacheable area of memory (> 4MB)?
>> >> This new function can take advantage of the CMA mechanism as
>> >> dma_alloc_coherent() function does, but using different PTE attribute
>> >> for the allocated pages. Basically making a function similar to
>> >> arm_dma_alloc() and set the attributes differently would do the trick:
>> >>
>> >> pgprot_t prot = __pgprot_modify(prot, L_PTE_MT_MASK,
>> >>                 L_PTE_MT_WRITEALLOC | L_PTE_XN)
>> >
>> > We already have a way to specify whether a device is coherent via the
>> > "dma-coherent" DT property. This allows the correct dma_map_ops to be
>> > set for a device. For cache coherent devices, the
>> > arm_coherent_dma_alloc() and __dma_alloc() should return cacheable
>> > memory.
>
> That is not what Lorenzo was asking about though.
>
>> > However, looking at the code, it seems that __dma_alloc() does not use
>> > the CMA when is_coherent == true, though you would hit a limit on the
>> > number of pages that can be allocated.
>> >
>> > As for mmap'ing to user space, there is arm_dma_mmap(). This one sets
>> > the vm_page_prot to what __get_dma_pgprot() returns which is always
>> > non-cacheable.
>> >
>> > I haven't checked the history cache coherent DMA support on arm but I
>> > think some of the above can be changed. As an example, on arm64
>> > __dma_alloc() allocates from CMA independent of whether the device is
>> > coherent or not. Also __get_dma_pgprot() returns cacheable attributes
>> > for coherent devices, which in turn allows cacheable user mapping of
>> > such buffers. You don't really need to implement additional functions,
>> > just tweaks to the existing ones.
>>
>> Thanks for the answer. I do agree with you on that: I'll take a look
>> at arm64 code and I'll be glad to contribute with patches as soon as
>> possible.
>>
>> Anyway I'd like to focus on a different aspect: I think that this
>> solution can manage cache coherent DMA, so devices which guarantees
>> the coherency using cache snooping mechanism.
>> However how can I manage
>> devices which needs contiguous memory and don't guarantee cache
>> coherency? If the device doesn't implement sg functionality, I can't
>> allocate buffers which is greater than 4MB because I can't use neither
>> dma_alloc_coherent() nor accessing directly to CMA (well, actually I
>> can use dma_alloc_coherent(), but it sounds a little bit confusing).
>
> So you have a device that is not cache-coherent, and you want to
> allocate cacheable memory and manage coherency manually.
>
> This is normally done using alloc_pages() and dma_map_single(),
> but as you have realized, that does not use the CMA area.
>
>> Do you think that dma_alloc_coherent() can be used as well with this
>> type of devices? Do you think that a new dma_alloc_contiguous()
>> function would help in this case?
>> Maybe my interpretation of dma_alloc_coherent() is not correct, and
>> the coherency can be managed using the dma_sync_single_for_* functions
>> and it doesn't require hardware mechanism.
>
> I believe dma_alloc_attrs is the interface you want, with attributes
> DMA_ATTR_FORCE_CONTIGUOUS and DMA_ATTR_NON_CONSISTENT. I don't
> know if that is already implemented on arm64, but this is something
> that can definitely be done.
>
> With that memory, you should be able to use the normal streaming
> API (dma_sync_single_for_*). There is an older interface called
> dma_alloc_noncoherent(), but that cannot be easily implemented on
> ARM.
>
> Arnd

Yes, this is exactly the point. Currently this function is used only
through dma_alloc_coherent() (which actually calls dma_alloc_attrs()).
This function, however, is not available in the DMA API of Linux, but I
think it could be useful for managing some kinds of devices (see my
previous mail).
What do you think would be the best way to access the dma_alloc_attrs()
function from a device driver? Call the function directly?

Thank you.
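
To make the question concrete, here is a rough sketch of what calling
dma_alloc_attrs() directly from a driver might look like, as I understand
the suggestion above. It assumes the struct dma_attrs interface
(DEFINE_DMA_ATTRS, dma_set_attr) and that the architecture actually honours
both attributes, which I have not checked. The cam_buf names are made up
for illustration, and the stand-in definitions at the top exist only so the
snippet builds outside a kernel tree; a driver would include
<linux/dma-mapping.h> and <linux/dma-attrs.h> instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- userspace stand-ins (NOT the kernel definitions) ---- */
typedef uintptr_t dma_addr_t;
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u
struct device { int unused; };
enum dma_attr { DMA_ATTR_NON_CONSISTENT, DMA_ATTR_FORCE_CONTIGUOUS };
enum dma_data_direction { DMA_TO_DEVICE = 1, DMA_FROM_DEVICE = 2 };
struct dma_attrs { unsigned long flags; };
#define DEFINE_DMA_ATTRS(x) struct dma_attrs x = { 0 }

static void dma_set_attr(enum dma_attr attr, struct dma_attrs *attrs)
{
	attrs->flags |= 1UL << attr;
}

/* Stand-in: the real call would be expected to take the pages from CMA. */
static void *dma_alloc_attrs(struct device *dev, size_t size,
			     dma_addr_t *handle, gfp_t gfp,
			     struct dma_attrs *attrs)
{
	void *cpu = malloc(size);

	(void)dev; (void)gfp; (void)attrs;
	*handle = (dma_addr_t)cpu;
	return cpu;
}

static void dma_free_attrs(struct device *dev, size_t size, void *cpu,
			   dma_addr_t handle, struct dma_attrs *attrs)
{
	(void)dev; (void)size; (void)handle; (void)attrs;
	free(cpu);
}

/* Stand-in: the real call does the cache maintenance for the range. */
static void dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle,
				    size_t size, enum dma_data_direction dir)
{
	(void)dev; (void)handle; (void)size; (void)dir;
}

/* ---- the part a driver would actually contain (hypothetical names) ---- */
struct cam_buf {
	void *cpu;		/* kernel virtual address, cacheable */
	dma_addr_t dma;		/* address programmed into the device */
	size_t size;
};

static int cam_buf_alloc(struct device *dev, struct cam_buf *buf, size_t size)
{
	DEFINE_DMA_ATTRS(attrs);

	dma_set_attr(DMA_ATTR_FORCE_CONTIGUOUS, &attrs);
	dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs);

	buf->cpu = dma_alloc_attrs(dev, size, &buf->dma, GFP_KERNEL, &attrs);
	if (!buf->cpu)
		return -1;	/* -ENOMEM in a real driver */
	buf->size = size;
	return 0;
}

/* Call once the device has finished writing a frame into the buffer. */
static void cam_buf_frame_done(struct device *dev, struct cam_buf *buf)
{
	dma_sync_single_for_cpu(dev, buf->dma, buf->size, DMA_FROM_DEVICE);
}

static void cam_buf_free(struct device *dev, struct cam_buf *buf)
{
	DEFINE_DMA_ATTRS(attrs);

	/* Free with the same attributes that were used for the allocation. */
	dma_set_attr(DMA_ATTR_FORCE_CONTIGUOUS, &attrs);
	dma_set_attr(DMA_ATTR_NON_CONSISTENT, &attrs);
	dma_free_attrs(dev, buf->size, buf->cpu, buf->dma, &attrs);
	buf->cpu = NULL;
}
```

The idea would be to call cam_buf_alloc() once at probe time with the
8MB-or-so frame buffer size, and cam_buf_frame_done() from the completion
path before the CPU touches the data.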
Lorenzo