Date: Thu, 26 Apr 2018 00:26:46 +0100
From: Russell King - ARM Linux
To: Daniel Vetter
Cc: Christoph Hellwig, Linux Kernel Mailing List, amd-gfx list,
    "moderated list:DMA BUFFER SHARING FRAMEWORK", Jerome Glisse,
    iommu@lists.linux-foundation.org, dri-devel, Dan Williams,
    Thierry Reding, Logan Gunthorpe, Christian König, Linux ARM,
    "open list:DMA BUFFER SHARING FRAMEWORK"
Subject: Re: noveau vs arm dma ops
Message-ID: <20180425232646.GR16141@n2100.armlinux.org.uk>
References: <20180425054855.GA17038@infradead.org>
    <20180425064335.GB28100@infradead.org>
    <20180425074151.GA2271@ulmo>
    <20180425085439.GA29996@infradead.org>
    <20180425100429.GR25142@phenom.ffwll.local>
    <20180425153312.GD27076@infradead.org>

On Wed, Apr 25, 2018 at 11:35:13PM +0200, Daniel Vetter wrote:
> On arm that doesn't work. The iommu api seems like a good fit, except
> the dma-api tends to get in the way a bit (drm/msm apparently has
> similar problems like tegra), and if you need contiguous memory
> dma_alloc_coherent is the only way to get at contiguous memory. There
> was a huge discussion years ago about that, and direct cma access was
> shot down because it would have exposed too much of the caching
> attribute mangling required (most arm platforms need wc-pages to not
> be in the kernel's linear map apparently).

I think you completely misunderstand ARM from what you've written above,
and this worries me greatly about giving DRM the level of control that
is being asked for.

Modern ARMs have a PIPT cache or a non-aliasing VIPT cache, and cache
attributes are stored in the page tables.
These caches are inherently non-aliasing when there are multiple
mappings (which is a great step forward compared to the previous
aliasing caches).

As the cache attributes are stored in the page tables, this in theory
allows different virtual mappings of the same physical memory to have
different cache attributes.  However, there's a problem, and that's
called speculative prefetching.

Let's say you have one mapping which is cacheable, and another that is
marked as write combining.  If a cache line is speculatively prefetched
through the cacheable mapping of this memory, and then you read the
same physical location through the write combining mapping, it is
possible that you could read cached data.

So, it is generally accepted that all mappings of any particular
region of physical memory should have the same cache attributes to
avoid unpredictable behaviour.

This presents a problem with what is generally called "lowmem", where
the memory is mapped in kernel virtual space with cacheable
attributes.  It can also happen with highmem if the memory is
kmapped.

This is why, on ARM, you can't use something like get_free_pages() to
grab some pages from the system, pass them to the GPU, map them into
userspace as write-combining, etc.  It _might_ work for some CPUs,
but ARM CPUs vary in how much prefetching they do, and what may work
for one particular CPU is in no way guaranteed to work for another
ARM CPU.

The official line from architecture folk is to assume that the caches
infinitely speculate, are of infinite size, and can write back *dirty*
data at any moment.

The way to stop things like speculative prefetches to particular
physical memory is to, quite "simply", not have any cacheable
mappings of that physical memory anywhere in the system.

Now, cache flushes on ARM tend to be fairly expensive for GPU buffers.
If you have, say, an 8MB buffer (for a 1080p frame) and you need to
do a cache operation on that buffer, you'll be iterating over it 32 or
maybe 64 bytes at a time "just in case" there's a cache line present.
As I detailed in my previous email, there is potentially a need for
_two_ flushes, one before the GPU operation and one after, and this
becomes _really_ expensive.  At that point, you're probably way better
off using write-combine memory where you don't need to spend CPU
cycles performing cache flushing - potentially across all CPUs in the
system if cache operations aren't broadcast.

This isn't a simple matter of "just provide some APIs for cache
operations" - there's much more that needs to be understood by
all parties here, especially when we have GPU drivers that can be
used with quite different CPUs.

It may well be that for some combinations of CPUs and workloads, it's
better to use write-combine memory without cache flushing, but for
other CPUs that tradeoff (for the same workload) could well be
different.

Older ARMs get more interesting, because they have aliasing caches.
That means the CPU cache aliases across different virtual space
mappings in some way, which complicates (a) the mapping of memory
and (b) handling the cache operations on it.

It's too late for me to go into that tonight, and I probably won't be
reading mail for the next week and a half, sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
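
To put rough numbers on the flush cost described above, here is a
minimal sketch of the arithmetic only.  It is not kernel code and it
performs no cache maintenance.  The function names are hypothetical,
and the 4 bytes per pixel and the reading of "8MB" as 8 MiB are
assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one frame.  The caller picks bytes_per_pixel; 4 (for
 * example a 32bpp format) is an assumption, not something the
 * message above states. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Number of per-line maintenance operations needed to walk the whole
 * buffer once, rounding up to cover a partial final line. */
size_t cache_line_ops(size_t buf_bytes, size_t line_bytes)
{
    assert(line_bytes != 0);
    return (buf_bytes + line_bytes - 1) / line_bytes;
}

/* Operations for one GPU job when the buffer has to be walked
 * `passes` times (for example once before and once after the GPU
 * touches it). */
size_t ops_per_gpu_job(size_t buf_bytes, size_t line_bytes, unsigned passes)
{
    return cache_line_ops(buf_bytes, line_bytes) * passes;
}
```

With these assumptions, a 1920x1080 frame at 4 bytes per pixel is
8294400 bytes, which fits the "8MB" figure.  Walking an 8 MiB buffer
takes 131072 operations with 64-byte lines or 262144 with 32-byte
lines, and two passes double that to as many as 524288 per GPU job,
before counting any repetition across CPUs when the operations are
not broadcast.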