Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755187AbcKJBle (ORCPT ); Wed, 9 Nov 2016 20:41:34 -0500 Received: from mailout4.samsung.com ([203.254.224.34]:50514 "EHLO mailout4.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752532AbcKJBld (ORCPT ); Wed, 9 Nov 2016 20:41:33 -0500 MIME-version: 1.0 Content-type: text/plain; charset=utf-8 X-AuditID: cbfee61b-f796f6d000004092-ba-5823d04b0495 Content-transfer-encoding: 8BIT Subject: Re: [PATCH] [RFC] drivers: dma-coherent: use MEMREMAP_WB instead of MEMREMAP_WC To: Brian Starkey References: <1478682609-26477-1-git-send-email-jaewon31.kim@samsung.com> <20161109092726.GA6009@e106950-lin.cambridge.arm.com> <5822F0AE.30101@samsung.com> <20161109102336.GB6009@e106950-lin.cambridge.arm.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, jaewon31.kim@samsung.com From: Jaewon Kim Message-id: <5823D057.2050509@samsung.com> Date: Thu, 10 Nov 2016 10:41:43 +0900 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2 In-reply-to: <20161109102336.GB6009@e106950-lin.cambridge.arm.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprFIsWRmVeSWpSXmKPExsVy+t9jAV3vC8oRBv+f81s8OnWB0aL3/Ssm i8u75rBZ3Fvzn9WBxWPNvDWMHps+TWL36NuyitHj8ya5AJYoN5uM1MSU1CKF1Lzk/JTMvHRb pdAQN10LJYW8xNxUW6UIXd+QICWFssScUiDPyAANODgHuAcr6dsluGWc3rKYsWCKZsWbfSkN jB0KXYycHBICJhK7lz5igbDFJC7cW88GYgsJLGWU+H06E8TmFRCU+DH5HlANBwezgLzEkUvZ EKa6xJQpuV2MXEDVDxklnqz6zAxSLiwQJXHt6wU2kBoRAU2JxjnxEBNXM0nc/ZcAYjML+Ens XXSGHcRmE9CWeL9gEivEJi2JYyv+MoHYLAKqEkunTgK7RlQgQmL1umtg4zkFnCRObz/KPIFR YBaS42YhHDcL4bgFjMyrGCVSC5ILipPSc43yUsv1ihNzi0vz0vWS83M3MYLj5Zn0DsbDu9wP MQpwMCrx8FpUKkcIsSaWFVfmHmKU4GBWEuGdcRooxJuSWFmVWpQfX1Sak1p8iNEU6MSJzFKi yfnAWM4riTc0MTcxNzawMLe0NDFSEudtnP0sXEggPbEkNTs1tSC1CKaPiYNTqoFR71FLgF/G z8nbqxkvZ2f7lkvo3NWsc70rzvEsXTQhTWzjfs1y7rNlPcJ/D768s+flphCBOQvfJPtLh7qd OB7SKp+7ksXXVbEluuauwuoiw5udzdtn/LeweKBUsGr5k19dzy8FLk5Q5AswcjFc+CysSJd9 AutPMcMN6V3KW4tOewbJbL1UyqLEUpyRaKjFXFScCADfCFZ4rQIAAA== X-MTR: 20000000000000000@CPGS Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5130 Lines: 136 Hi On 2016년 11월 09일 19:23, Brian Starkey wrote: > Hi, > > On Wed, Nov 09, 2016 at 06:47:26PM +0900, Jaewon Kim wrote: >> >> >> On 2016년 11월 09일 18:27, Brian Starkey wrote: >>> Hi Jaewon, >>> >>> On Wed, Nov 09, 2016 at 06:10:09PM +0900, Jaewon Kim wrote: >>>> Commit 6b03ae0d42bf (drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MA) >>>> added MEMREMAP_WC for DMA_MEMORY_MAP. If, however, CPU cache can be used on >>>> DMA_MEMORY_MAP, I think MEMREMAP_WC can be changed to MEMREMAP_WB. On my local >>>> ARM device, memset in dma_alloc_from_coherent sometimes takes much longer with >>>> MEMREMAP_WC compared to MEMREMAP_WB. >>>> >>>> Test results on AArch64 by allocating 4MB with putting trace_printk right >>>> before and after memset. >>>> MEMREMAP_WC : 11.0ms, 5.7ms, 4.2ms, 4.9ms, 5.4ms, 4.3ms, 3.5ms >>>> MEMREMAP_WB : 0.7ms, 0.6ms, 0.6ms, 0.6ms, 0.6ms, 0.5ms, 0.4 ms >>>> >>> >>> This doesn't look like a good idea to me. The point of coherent memory >>> is to have it non-cached, however WB will make writes hit the cache. >>> >>> Writing to the cache is of course faster than writing to RAM, but >>> that's not what we want to do here. >>> >>> -Brian >>> >> Hi Brian >> >> Thank you for your comment. >> If allocated memory will be used by TZ side, however, I think cacheable >> also can be used to be fast on memset in dma_alloc_from_coherent. > > Are you trying to share the buffer between the secure and non-secure > worlds on the CPU? In that case, I don't think caching really helps > you. I'm not a TZ expert, but I believe the two worlds can never > share cached data. I do not want memory sharing between the secure and non-secure worlds. I just want faster allocation. I am not a TZ expert, either. I also think they cannot share cached data. As far as I know secure world can decide its cache policy with secure world page table regardless of non-secure world. > If you want the secure world to see the non-secure world's data, as > far as I know you will need to clean the cache in the non-secure world > to make sure the secure world can see it (and vice-versa). I'd expect > this to remove most of the speed advantage of using WB in the first > place, except for some possible speedup from more efficient bursting. Yes I also think non-secure world need to clean the cache before secure world access the memory region to avoid invalid data issue. But if other software like Linux driver or hypervisor do the cache cleaning, or engineer confirm, then we may be able to use MEMREMAP_WB or just skipping memset for faster memory allocation in dma_alloc_from_coherent. > > If you're sharing the buffer with other DMA masters, regardless of > secure/non-secure you're not going to want WB mappings. > If there is a scenario where another DMA master works on this memory, an engineer, I think, need to consider cache clean if he/she uses WB. >> How do you think to add another flag to distinguish this case? > > You could look into the streaming DMA API. It will depend on the exact > implementation, but at some point you're still going to have to pay > the penalty of syncing the CPU and device. > > -Brian > I cannot find DMA API and flag for WB. So I am considering additional flag to meet my request. In my opinion the flag can be either WB or non-zeroing. For case #1 - DMA_MEMORY_MAP_WB --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -32,7 +32,9 @@ static bool dma_init_coherent_memory( if (!size) goto out; - if (flags & DMA_MEMORY_MAP) + if (flags & DMA_MEMORY_MAP_WB) + mem_base = memremap(phys_addr, size, MEMREMAP_WB); + else if (flags & DMA_MEMORY_MAP) mem_base = memremap(phys_addr, size, MEMREMAP_WC); else mem_base = ioremap(phys_addr, size); For case #2 - DMA_MEMORY_MAP_NOZEROING --- a/drivers/base/dma-coherent.c +++ b/drivers/base/dma-coherent.c @@ -190,6 +190,8 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size, *ret = mem->virt_base + (pageno << PAGE_SHIFT); dma_memory_map = (mem->flags & DMA_MEMORY_MAP); spin_unlock_irqrestore(&mem->spinlock, flags); + if (mem->flags & DMA_MEMORY_MAP_NOZEROING) + return 1; if (dma_memory_map) memset(*ret, 0, size); else Can I get your comment? Thank you Jaewon Kim >>>> Signed-off-by: Jaewon Kim >>>> --- >>>> drivers/base/dma-coherent.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c >>>> index 640a7e6..0512a1d 100644 >>>> --- a/drivers/base/dma-coherent.c >>>> +++ b/drivers/base/dma-coherent.c >>>> @@ -33,7 +33,7 @@ static bool dma_init_coherent_memory( >>>> goto out; >>>> >>>> if (flags & DMA_MEMORY_MAP) >>>> - mem_base = memremap(phys_addr, size, MEMREMAP_WC); >>>> + mem_base = memremap(phys_addr, size, MEMREMAP_WB); >>>> else >>>> mem_base = ioremap(phys_addr, size); >>>> if (!mem_base) >>>> -- >>>> 1.9.1 >>>> >>> >>> >>> >> > > >