Date: Wed, 25 Apr 2018 23:54:43 +0100
From: Russell King - ARM Linux
To: Christoph Hellwig
Cc: Thierry Reding, Christian König,
	"moderated list:DMA BUFFER SHARING FRAMEWORK",
	Linux Kernel Mailing List, amd-gfx list, Jerome Glisse, dri-devel,
	Dan Williams, Logan Gunthorpe,
	"open list:DMA BUFFER SHARING FRAMEWORK",
	iommu@lists.linux-foundation.org, linux-arm-kernel@lists.infradead.org
Subject: Re: noveau vs arm dma ops
Message-ID: <20180425225443.GQ16141@n2100.armlinux.org.uk>
References: <20180424184847.GA3247@infradead.org>
	<20180425054855.GA17038@infradead.org>
	<20180425064335.GB28100@infradead.org>
	<20180425074151.GA2271@ulmo>
	<20180425085439.GA29996@infradead.org>
	<20180425100429.GR25142@phenom.ffwll.local>
	<20180425153312.GD27076@infradead.org>
In-Reply-To: <20180425153312.GD27076@infradead.org>

On Wed, Apr 25, 2018 at 08:33:12AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 12:04:29PM +0200, Daniel Vetter wrote:
> > - dma api hides the cache flushing requirements from us. GPUs love
> >   non-snooped access, and worse give userspace control over that. We
> >   want a strict separation between mapping stuff and flushing stuff.
> >   With the IOMMU api we mostly have the former, but for the latter,
> >   arch maintainers regularly tell us they won't allow that. So we
> >   have drm_clflush.c.
>
> The problem is that a cache flushing API entirely separate is hard. That
> being said if you look at my generic dma-noncoherent API series it tries
> to move that way. So far it is in early stages and apparently rather
> buggy unfortunately.

And if folk want a cacheable mapping with explicit cache flushing, the
cache flushing must not be defined in terms of "this is what the CPU
seems to need", but from the point of view of a CPU with infinite
prefetching, infinite caching, and infinite capacity to perform
writebacks of dirty cache lines at unexpected moments while the memory
is mapped in a cacheable mapping.

(The reason for that is that you're operating in a non-CPU-specific
space, so you can't make any guarantees about how much caching or
prefetching the CPU will do - different CPUs will do different amounts.)

So, for example, take the sequence:

	GPU writes to memory
	CPU reads from cacheable memory

If the memory was previously dirty (iow, the CPU has written to it), you
need to flush the dirty cache lines _before_ the GPU writes happen.  And
since you don't know whether the CPU has speculatively prefetched, you
also need to flush any prefetched cache lines _after_ the GPU has
finished writing, before reading from the cacheable memory.

Also note that "flush" there can mean "clean the cache", "clean and
invalidate the cache" or "invalidate the cache" as appropriate - some
CPUs are able to perform all three operations, and which one is
appropriate depends not only on where in the above sequence it's being
used, but also on what the operations are.

So, the above sequence could be:

	CPU invalidates cache for memory
		(due to possible dirty cache lines)
	GPU writes to memory
	CPU invalidates cache for memory
		(to get rid of any speculatively prefetched lines)
	CPU reads from cacheable memory

Yes, in the above case, _two_ cache operations are required to ensure
correct behaviour.  However, if you know for certain that the memory was
previously clean, then the first cache operation can be skipped.
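For comparison, the streaming DMA API already expresses that pairing as
ownership transfers.  Very roughly - this is only a sketch, assuming the
buffer was mapped non-coherently with dma_map_single() elsewhere, and
with gpu_run()/cpu_consume() standing in for the real GPU kick and the
CPU-side reader:

	#include <linux/dma-mapping.h>

	/* Placeholders for this sketch, not real functions. */
	void gpu_run(struct device *dev, dma_addr_t dma, size_t size);
	void cpu_consume(void *cpu_addr, size_t size);

	static void cpu_read_after_gpu_write(struct device *dev,
					     dma_addr_t dma, void *cpu_addr,
					     size_t size)
	{
		/*
		 * First operation: hand the buffer to the device.  The
		 * architecture performs whatever maintenance it needs to
		 * deal with cache lines the CPU may have dirtied.
		 */
		dma_sync_single_for_device(dev, dma, size, DMA_FROM_DEVICE);

		gpu_run(dev, dma, size);	/* GPU writes the buffer */

		/*
		 * Second operation: take the buffer back for the CPU,
		 * discarding anything speculatively prefetched while the
		 * GPU was writing.
		 */
		dma_sync_single_for_cpu(dev, dma, size, DMA_FROM_DEVICE);

		/* Only now is it safe to read via the cacheable mapping. */
		cpu_consume(cpu_addr, size);
	}

Note that the streaming API performs the first operation unconditionally
- it has no way of knowing that the buffer is already clean, which is
part of what a finer-grained interface would need to be able to express.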
What I'm pointing out is that there's much more to this than just "I
want to flush the cache", which is what DRM currently seems to assume
with the code in drm_cache.c.

If we can agree a set of interfaces that allows _proper_ use of these
facilities, one which can be used appropriately, then there shouldn't be
a problem.  The DMA API does that via its ideas about who owns a
particular buffer (because of the above problem), and that's something
which would need to be carried over to such a cache flushing API (it
should be pretty obvious that having a GPU read or write memory while
the cache for that memory is being cleaned will lead to unexpected
results.)
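To be concrete about what "carried over" might mean, something shaped
roughly like this - entirely made-up names, nothing like it exists
today - where the caller states the ownership transfer and what has
been written, rather than requesting a particular flush:

	/*
	 * Purely illustrative: these interfaces are invented for the
	 * sake of the example.  Callers describe an ownership transfer;
	 * the architecture picks the appropriate clean/invalidate
	 * combination itself.
	 */
	struct cache_ownership_ops {
		/* CPU is finished with the buffer; the device goes next. */
		void (*release_to_device)(void *cpu_addr, size_t size,
					  bool cpu_has_written);

		/* Device is finished; the CPU may touch the buffer again. */
		void (*claim_for_cpu)(void *cpu_addr, size_t size,
				      bool device_has_written);
	};

The cpu_has_written/device_has_written flags are where "the first cache
operation can be skipped" would live, and the pairing is what encodes
who owns the buffer at any given point.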
Also note that things get even more interesting in an SMP environment if
cache operations aren't broadcast across the SMP cluster (which means
cache operations have to be IPI'd to the other CPUs.)

The next issue, which I've brought up before, is that exposing cache
flushing to userspace on architectures where it isn't already exposed
comes with risks of its own.  As has been shown by Google Project Zero,
it risks exposing those architectures to Spectre and Meltdown exploits
where they weren't at such a risk before.

(I've pretty much shown here that you _do_ need to control which cache
lines get flushed to make these exploits work, and that flushing the
cache by reading lots of data, in lieu of having the ability to
explicitly flush bits of the cache, makes it very difficult or
impossible for them to work.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up