Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755594AbcKOIR1 (ORCPT ); Tue, 15 Nov 2016 03:17:27 -0500 Received: from mailout1.samsung.com ([203.254.224.24]:39493 "EHLO mailout1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755399AbcKOIRZ (ORCPT ); Tue, 15 Nov 2016 03:17:25 -0500 MIME-version: 1.0 Content-type: text/plain; charset=utf-8 X-AuditID: cbfee61b-f796f6d000004092-16-582ac237f343 Content-transfer-encoding: 8BIT Subject: Re: [PATCH] [RFC] drivers: dma-coherent: use MEMREMAP_WB instead of MEMREMAP_WC To: Brian Starkey References: <1478682609-26477-1-git-send-email-jaewon31.kim@samsung.com> <20161109092726.GA6009@e106950-lin.cambridge.arm.com> <5822F0AE.30101@samsung.com> <20161109102336.GB6009@e106950-lin.cambridge.arm.com> <5823D057.2050509@samsung.com> <20161110095155.GA28852@e106950-lin.cambridge.arm.com> <58259520.208@samsung.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, jaewon31.kim@samsung.com From: Jaewon Kim Message-id: <582AC24E.8090909@samsung.com> Date: Tue, 15 Nov 2016 17:07:42 +0900 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 In-reply-to: <58259520.208@samsung.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprDIsWRmVeSWpSXmKPExsVy+t9jQV3zQ1oRBluncls8OnWB0aL3/Ssm i8u75rBZ3Fvzn9WBxWPNvDWMHps+TWL36NuyitHj8ya5AJYoN5uM1MSU1CKF1Lzk/JTMvHRb pdAQN10LJYW8xNxUW6UIXd+QICWFssScUiDPyAANODgHuAcr6dsluGW8vLKXqWBGaMW114vY GxhPOHcxcnJICJhIPNmzlgnCFpO4cG89WxcjF4eQwCxGiQ+nv7KCJHgFBCV+TL7H0sXIwcEs IC9x5FI2hKkuMWVKLkT5Q0aJl19OsYOUCwtESVz7eoENpEZEQFOicU48SFhIYBGzxLJHnCA2 s4CfxN5FZ8DK2QS0Jd4vmAS1SUtix8zHYHEWAVWJf83nGEFsUYEIidXrrjGD2JxAa0/eecAy gRHoRoTjZiEcNwvhuAWMzKsYJVILkguKk9JzjfJSy/WKE3OLS/PS9ZLzczcxgmPmmfQOxsO7 3A8xCnAwKvHwOntpRQixJpYVV+YeYpTgYFYS4a3cBxTiTUmsrEotyo8vKs1JLT7EaAp04kRm KdHkfGA855XEG5qYm5gbG1iYW1qaGCmJ8zbOfhYuJJCeWJKanZpakFoE08fEwSnVwLjraVFQ 3IejVj8u17G6/t+kkMsfayO0aM/J0ws4ri6odnrqzJxWl+i43mRu3YMNx1TWq7n/nP9Bq2me h3bI2aJNDHyiEZwPfdOPMxt336k5/OO07MqzpyN/qilnqZ9pynj5RfHch4WJPo6npp+aYtts eMOqQiibo9Jop39gXLRjXxWPQdqrBUosxRmJhlrMRcWJAE4VMj6vAgAA X-MTR: 20000000000000000@CPGS Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10658 Lines: 246 Hi Brian please look into my new approach. I may need to change other part to use this patch but I want to get your comment for dma_alloc_from_coherent. [PATCH] [RFC] drivers: dma-coherent: pass struct dma_attrs to dma_alloc_from_coherent dma_alloc_from_coherent does not get struct dma_attrs information. If dma_attrs information is passed to dma_alloc_from_coherent, dma_alloc_from_coherent can do more jobs accodring to the information. As a example I added DMA_ATTR_SKIP_ZEROING to skip zeroing. Accoring to driver implementation ZEROING could be skipped or could be done later. Signed-off-by: Jaewon Kim --- drivers/base/dma-coherent.c | 6 +++++- include/linux/dma-mapping.h | 7 ++++--- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c index 640a7e6..428eced 100644 --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -151,6 +151,7 @@ void *dma_mark_declared_memory_occupied(struct device *dev, * @dma_handle: This will be filled with the correct dma handle * @ret: This pointer will be filled with the virtual address * to allocated area. + * @attrs: dma_attrs to pass additional information * * This function should be only called from per-arch dma_alloc_coherent() * to support allocation from per-device coherent memory pools. @@ -159,7 +160,8 @@ void *dma_mark_declared_memory_occupied(struct device *dev, * generic memory areas, or !0 if dma_alloc_coherent should return @ret. */ int dma_alloc_from_coherent(struct device *dev, ssize_t size, - dma_addr_t *dma_handle, void **ret) + dma_addr_t *dma_handle, void **ret, + struct dma_attrs *attrs) { struct dma_coherent_mem *mem; int order = get_order(size); @@ -190,6 +192,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size, *ret = mem->virt_base + (pageno << PAGE_SHIFT); dma_memory_map = (mem->flags & DMA_MEMORY_MAP); spin_unlock_irqrestore(&mem->spinlock, flags); + if (dma_get_attr(DMA_ATTR_SKIP_ZEROING, attrs)) + return 1; if (dma_memory_map) memset(*ret, 0, size); else diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 08528af..737fd71 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -151,13 +151,14 @@ static inline int is_device_dma_capable(struct device *dev) * Don't use them in device drivers. */ int dma_alloc_from_coherent(struct device *dev, ssize_t size, - dma_addr_t *dma_handle, void **ret); + dma_addr_t *dma_handle, void **ret, + struct dma_attrs *attrs); int dma_release_from_coherent(struct device *dev, int order, void *vaddr); int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, size_t size, int *ret); #else -#define dma_alloc_from_coherent(dev, size, handle, ret) (0) +#define dma_alloc_from_coherent(dev, size, handle, ret, attrs) (0) #define dma_release_from_coherent(dev, order, vaddr) (0) #define dma_mmap_from_coherent(dev, vma, vaddr, order, ret) (0) #endif /* CONFIG_HAVE_GENERIC_DMA_COHERENT */ @@ -456,7 +457,7 @@ static inline void *dma_alloc_attrs(struct device *dev, size_t size, BUG_ON(!ops); - if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr)) + if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr, attrs)) return cpu_addr; if (!arch_dma_alloc_attrs(&dev, &flag)) Thank you Jaewon Kim On 2016년 11월 11일 18:53, Jaewon Kim wrote: > Hi > > On 2016년 11월 10일 18:51, Brian Starkey wrote: >> Hi Jaewon, >> >> On Thu, Nov 10, 2016 at 10:41:43AM +0900, Jaewon Kim wrote: >>> Hi >>> >>> On 2016년 11월 09일 19:23, Brian Starkey wrote: >>>> Hi, >>>> >>>> On Wed, Nov 09, 2016 at 06:47:26PM +0900, Jaewon Kim wrote: >>>>> >>>>> On 2016년 11월 09일 18:27, Brian Starkey wrote: >>>>>> Hi Jaewon, >>>>>> >>>>>> On Wed, Nov 09, 2016 at 06:10:09PM +0900, Jaewon Kim wrote: >>>>>>> Commit 6b03ae0d42bf (drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MA) >>>>>>> added MEMREMAP_WC for DMA_MEMORY_MAP. If, however, CPU cache can be used on >>>>>>> DMA_MEMORY_MAP, I think MEMREMAP_WC can be changed to MEMREMAP_WB. On my local >>>>>>> ARM device, memset in dma_alloc_from_coherent sometimes takes much longer with >>>>>>> MEMREMAP_WC compared to MEMREMAP_WB. >>>>>>> >>>>>>> Test results on AArch64 by allocating 4MB with putting trace_printk right >>>>>>> before and after memset. >>>>>>> MEMREMAP_WC : 11.0ms, 5.7ms, 4.2ms, 4.9ms, 5.4ms, 4.3ms, 3.5ms >>>>>>> MEMREMAP_WB : 0.7ms, 0.6ms, 0.6ms, 0.6ms, 0.6ms, 0.5ms, 0.4 ms >>>>>>> >>>>>> This doesn't look like a good idea to me. The point of coherent memory >>>>>> is to have it non-cached, however WB will make writes hit the cache. >>>>>> >>>>>> Writing to the cache is of course faster than writing to RAM, but >>>>>> that's not what we want to do here. >>>>>> >>>>>> -Brian >>>>>> >>>>> Hi Brian >>>>> >>>>> Thank you for your comment. >>>>> If allocated memory will be used by TZ side, however, I think cacheable >>>>> also can be used to be fast on memset in dma_alloc_from_coherent. >>>> Are you trying to share the buffer between the secure and non-secure >>>> worlds on the CPU? In that case, I don't think caching really helps >>>> you. I'm not a TZ expert, but I believe the two worlds can never >>>> share cached data. >>> I do not want memory sharing between the secure and non-secure worlds. >>> I just want faster allocation. >> So now I'm confused. Why did you mention TZ? >> >> Could you explain what your use-case for the buffer you are allocating >> is? > I wanted to send physically contiguous memory to TZapp size at Linux runtime. > And during discussion I realized I need to consider more if dma-coherent is proper approach only for TZapp. > But if another DMA master is joined, l think we can think this issue again. > Like secure HW device get memory from dma-coherent and it passes to TZapp. >>> I am not a TZ expert, either. I also think they cannot share cached data. >>> As far as I know secure world can decide its cache policy with secure >>> world page table regardless of non-secure world. >>>> If you want the secure world to see the non-secure world's data, as >>>> far as I know you will need to clean the cache in the non-secure world >>>> to make sure the secure world can see it (and vice-versa). I'd expect >>>> this to remove most of the speed advantage of using WB in the first >>>> place, except for some possible speedup from more efficient bursting. >>> Yes I also think non-secure world need to clean the cache before secure world >>> access the memory region to avoid invalid data issue. But if other software >>> like Linux driver or hypervisor do the cache cleaning, or engineer confirm, >>> then we may be able to use MEMREMAP_WB or just skipping memset for faster >>> memory allocation in dma_alloc_from_coherent. >> Skipping the memset doesn't sound like a good plan, you'd potentially >> be leaking information to whatever device receives the memory. And >> adding WB mappings to an API which is intended to be used for sharing >> buffers with DMA masters doesn't sound like a good move either. >> >>>> If you're sharing the buffer with other DMA masters, regardless of >>>> secure/non-secure you're not going to want WB mappings. >>>> >>> If there is a scenario where another DMA master works on this memory, >>> an engineer, I think, need to consider cache clean if he/she uses WB. >> The whole point of dma-coherent memory is for use by DMA masters. >> >>>>> How do you think to add another flag to distinguish this case? >>>> You could look into the streaming DMA API. It will depend on the exact >>>> implementation, but at some point you're still going to have to pay >>>> the penalty of syncing the CPU and device. >>>> >>>> -Brian >>>> >>> I cannot find DMA API and flag for WB. So I am considering additional flag >>> to meet my request. In my opinion the flag can be either WB or non-zeroing. >> To me, it sounds like dma-coherent isn't the right tool to achieve >> what you want, but I'm not clear on exactly what it is you're trying >> to do (I know you want faster allocations, the point is what for?) >> > I actually looking into enum dma_attr which has DMA_ATTR_SKIP_ZEROING. > If dma_alloc_attrs in arch/arm64/include/asm/dma-mapping.h passes struct dma_attrs *attrs to dma_alloc_from_coherent, > I think I can do what I want. > > Thank you for your comment. >> -Brian >> >>> For case #1 - DMA_MEMORY_MAP_WB >>> --- a/drivers/base/dma-coherent.c >>> +++ b/drivers/base/dma-coherent.c >>> @@ -32,7 +32,9 @@ static bool dma_init_coherent_memory( >>> if (!size) >>> goto out; >>> >>> - if (flags & DMA_MEMORY_MAP) >>> + if (flags & DMA_MEMORY_MAP_WB) >>> + mem_base = memremap(phys_addr, size, MEMREMAP_WB); >>> + else if (flags & DMA_MEMORY_MAP) >>> mem_base = memremap(phys_addr, size, MEMREMAP_WC); >>> else >>> mem_base = ioremap(phys_addr, size); >>> >>> For case #2 - DMA_MEMORY_MAP_NOZEROING >>> --- a/drivers/base/dma-coherent.c >>> +++ b/drivers/base/dma-coherent.c >>> @@ -190,6 +190,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size, >>> *ret = mem->virt_base + (pageno << PAGE_SHIFT); >>> dma_memory_map = (mem->flags & DMA_MEMORY_MAP); >>> spin_unlock_irqrestore(&mem->spinlock, flags); >>> + if (mem->flags & DMA_MEMORY_MAP_NOZEROING) >>> + return 1; >>> if (dma_memory_map) >>> memset(*ret, 0, size); >>> else >>> >>> Can I get your comment? >>> >>> Thank you >>> Jaewon Kim >>> >>>>>>> Signed-off-by: Jaewon Kim >>>>>>> --- >>>>>>> drivers/base/dma-coherent.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c >>>>>>> index 640a7e6..0512a1d 100644 >>>>>>> --- a/drivers/base/dma-coherent.c >>>>>>> +++ b/drivers/base/dma-coherent.c >>>>>>> @@ -33,7 +33,7 @@ static bool dma_init_coherent_memory( >>>>>>> goto out; >>>>>>> >>>>>>> if (flags & DMA_MEMORY_MAP) >>>>>>> - mem_base = memremap(phys_addr, size, MEMREMAP_WC); >>>>>>> + mem_base = memremap(phys_addr, size, MEMREMAP_WB); >>>>>>> else >>>>>>> mem_base = ioremap(phys_addr, size); >>>>>>> if (!mem_base) >>>>>>> -- >>>>>>> 1.9.1 >>>>>>> >>>>>> >>>>>> >>>> >>>> >> >>