From: Daniel Vetter
Date: Thu, 26 Apr 2018 11:17:46 +0200
Subject: Re: noveau vs arm dma ops
To: Russell King - ARM Linux
Cc: Christoph Hellwig, Linux Kernel Mailing List, amd-gfx list,
 "moderated list:DMA BUFFER SHARING FRAMEWORK", Jerome Glisse,
 iommu@lists.linux-foundation.org, dri-devel, Dan Williams,
 Thierry Reding, Logan Gunthorpe, Christian König, Linux ARM,
 "open list:DMA BUFFER SHARING FRAMEWORK"
In-Reply-To: <20180425232646.GR16141@n2100.armlinux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 26, 2018 at 1:26 AM, Russell King - ARM Linux wrote:
> On Wed, Apr 25, 2018 at 11:35:13PM +0200, Daniel Vetter wrote:
>> On arm that doesn't work. The iommu api seems like a good fit,
>> except the dma-api tends to get in the way a bit (drm/msm apparently
>> has similar problems like tegra), and if you need contiguous memory
>> dma_alloc_coherent is the only way to get at contiguous memory.
>> There was a huge discussion years ago about that, and direct cma
>> access was shot down because it would have exposed too much of the
>> caching attribute mangling required (most arm platforms need
>> wc-pages to not be in the kernel's linear map apparently).
>
> I think you completely misunderstand ARM from what you've written
> above, and this worries me greatly about giving DRM the level of
> control that is being asked for.
>
> Modern ARMs have a PIPT cache or a non-aliasing VIPT cache, and cache
> attributes are stored in the page tables. These caches are inherently
> non-aliasing when there are multiple mappings (which is a great step
> forward compared to the previous aliasing caches.)
>
> As the cache attributes are stored in the page tables, this in theory
> allows different virtual mappings of the same physical memory to have
> different cache attributes. However, there's a problem, and that's
> called speculative prefetching.
>
> Let's say you have one mapping which is cacheable, and another that
> is marked as write combining. If a cache line is speculatively
> prefetched through the cacheable mapping of this memory, and then you
> read the same physical location through the write combining mapping,
> it is possible that you could read cached data.
>
> So, it is generally accepted that all mappings of any particular
> physical bit of memory should have the same cache attributes to avoid
> unpredictable behaviour.
>
> This presents a problem with what is generally called "lowmem" where
> the memory is mapped in kernel virtual space with cacheable
> attributes. It can also happen with highmem if the memory is kmapped.
>
> This is why, on ARM, you can't use something like get_free_pages() to
> grab some pages from the system, pass it to the GPU, map it into
> userspace as write-combining, etc.
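[The stale-read hazard described above can be sketched as a toy model.
This is purely illustrative Python standing in for hardware behaviour:
"phys" and "cache" are made-up stand-ins, no kernel API is modelled,
and real CPUs do all of this invisibly in the memory system.]

```python
# Toy model of the mismatched-attribute hazard described above.
# "phys" stands in for a physical memory location and "cache" for a
# speculatively filled cache line; both names are illustrative only.

phys = bytearray([0x11])   # one "physical" byte of a GPU buffer
cache = {}                 # the CPU cache

def speculative_prefetch(addr):
    """The CPU speculates through the cacheable (lowmem) mapping,
    snapshotting the line at a moment of its own choosing."""
    cache[addr] = phys[addr]

def read_wc_mapping(addr):
    """Read through the write-combining mapping. This should bypass
    the cache, but with a stale speculatively fetched line present a
    CPU may serve the cached copy instead - the 'unpredictable
    behaviour' described above."""
    return cache.get(addr, phys[addr])

speculative_prefetch(0)    # happens at any time, the CPU's choice
phys[0] = 0x22             # buffer updated via the WC mapping / device

assert phys[0] == 0x22             # memory really holds the new data
assert read_wc_mapping(0) == 0x11  # ...but the reader sees stale 0x11
```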
> It _might_ work for some CPUs, but ARM CPUs vary in how much
> prefetching they do, and what may work for one particular CPU is in
> no way guaranteed to work for another ARM CPU.
>
> The official line from architecture folk is to assume that the
> caches infinitely speculate, are of infinite size, and can writeback
> *dirty* data at any moment.
>
> The way to stop things like speculative prefetches to particular
> physical memory is to, quite "simply", not have any cacheable
> mappings of that physical memory anywhere in the system.
>
> Now, cache flushes on ARM tend to be fairly expensive for GPU
> buffers. If you have, say, an 8MB buffer (for a 1080p frame) and you
> need to do a cache operation on that buffer, you'll be iterating
> over it 32 or maybe 64 bytes at a time "just in case" there's a
> cache line present. Referring to my previous email, where I detailed
> the potential need for _two_ flushes, one before the GPU operation
> and one after, and this becomes _really_ expensive. At that point,
> you're probably way better off using write-combine memory where you
> don't need to spend CPU cycles performing cache flushing -
> potentially across all CPUs in the system if cache operations aren't
> broadcasted.
>
> This isn't a simple matter of "just provide some APIs for cache
> operations" - there's much more that needs to be understood by all
> parties here, especially when we have GPU drivers that can be used
> with quite different CPUs.
>
> It may well be that for some combinations of CPUs and workloads,
> it's better to use write-combine memory without cache flushing, but
> for other CPUs that tradeoff (for the same workload) could well be
> different.
>
> Older ARMs get more interesting, because they have aliasing caches.
> That means the CPU cache aliases across different virtual space
> mappings in some way, which complicates (a) the mapping of memory
> and (b) handling the cache operations on it.
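[The flush cost above is simple arithmetic; a quick sanity check,
taking "8MB" as 8 MiB (an assumption) and the 32/64-byte line sizes
and two-flush pattern straight from the mail:]

```python
# Back-of-the-envelope check of the flush cost described above: one
# cache op per line over the whole buffer, "just in case" the line is
# present. 8MB is taken as 8 MiB; nothing here is a measurement.

buf = 8 * 1024 * 1024    # 8 MiB, roughly a 1080p frame

ops_32 = buf // 32       # ops per flush with 32-byte cache lines
ops_64 = buf // 64       # ops per flush with 64-byte cache lines
print(ops_32, ops_64)    # 262144 131072

# One flush before the GPU operation and one after doubles the work -
# and without broadcast cache ops it may repeat on every CPU.
print(2 * ops_32)        # 524288 ops worst case per GPU pass
```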
>
> It's too late for me to go into that tonight, and I probably won't
> be reading mail for the next week and a half, sorry.

I didn't know all the details well enough (nor had the time to write a
few paragraphs like you did), but the above is what I had in mind and
meant. Sorry if my sloppy reply sounded like I was mixing stuff up.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch