Date: Mon, 14 Jan 2019 19:13:50 +0000
From: Will Deacon
To: "Koenig, Christian"
Cc: Ard Biesheuvel, Michel Dänzer, Linux Kernel Mailing List, Carsten Haitzler, David Airlie, dri-devel, "Huang, Ray", "Zhang, Jerry", linux-arm-kernel, Bernhard Rosenkränzer
Subject: Re: [RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM
Message-ID: <20190114191350.GA29600@fuggles.cambridge.arm.com>
In-Reply-To: <9f956898-7973-98ee-6bf1-e1d445e9d365@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jan 14, 2019 at 07:07:54PM +0000, Koenig, Christian wrote:
> On 14.01.19 at 18:32, Ard Biesheuvel wrote:
> - The reason remapping the CPU side as cacheable does work (which I
> did test) is because the GPU's uncacheable accesses (which I assume
> are made using the NoSnoop PCIe transaction attribute) are actually
> emitted as cacheable in some cases.
> . On my AMD Seattle, with or without SMMU (which is stage 2 only), I
> must use cacheable accesses from the CPU side or things are broken.
> This might be a h/w flaw, though.
> . On systems with stage 1+2 SMMUs, the driver uses stage 1
> translations which always override the memory attributes to cacheable
> for DMA coherent devices. This is what is affecting the Cavium
> ThunderX2 (although it appears the attributes emitted by the RC may be
> incorrect as well.)
>
> The latter issue is a shortcoming in the SMMU driver that we have to
> fix, i.e., it should take care not to modify the incoming attributes
> of DMA coherent PCIe devices for NoSnoop to be able to work.
>
> So in summary, the mismatch appears to be between the CPU accessing
> the vmap region with non-cacheable attributes and the GPU accessing
> the same memory with cacheable attributes, resulting in a loss of
> coherency and lots of visible corruption.
>
> Actually it is the other way around. The CPU thinks some data is in the
> cache and the GPU only updates the system memory version because the
> snoop flag is not set.
>
> That doesn't seem to be what is happening. As far as we can tell from
> our experiments, all inbound transactions are always cacheable, and so
> the only way to make things work is to ensure that the CPU uses the
> same attributes.
>
> Ok that doesn't make any sense.
> If inbound transactions are cacheable or not is
> irrelevant when the CPU always uses uncached accesses.
>
> See on the PCIe side you have the snoop bit in the read/write transactions
> which tells the root hub if the device wants to snoop caches or not.
>
> When the CPU accesses some memory as cached then devices need to snoop the
> cache for coherent accesses.
>
> When the CPU accesses some memory as uncached then devices can disable snooping
> to improve performance, but when they don't do this it is mandated by the spec
> that this still works.

Which spec? The Arm architecture (and others including Power afaiu) doesn't
guarantee coherency when memory is accessed using mismatched cacheability
attributes.

Will
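
To make the mismatched-attribute point concrete, here is a rough toy model in Python. It is only a sketch under loudly stated assumptions: ToySystem and observe are hypothetical names, there is a single memory word and a single cache line, and the rule that a non-cacheable access never looks up the cache is a simplification chosen for illustration. Real interconnects are implementation specific and this does not describe any particular SoC or the patch under discussion.

```python
class ToySystem:
    """Toy model (assumption): one memory word plus one CPU-side cache line."""

    def __init__(self, initial=0):
        self.memory = initial
        self.cache = None  # None means the line is not present in the cache

    def cpu_read(self, cacheable):
        if cacheable:
            if self.cache is None:
                self.cache = self.memory  # cache miss: fill the line from memory
            return self.cache
        # Assumption: a non-cacheable read goes straight to memory and
        # never looks up the cache.
        return self.memory

    def device_write(self, value, cacheable):
        if cacheable:
            # Assumption: a cacheable (snooping) write lands in the cache and
            # is not written back to memory within this toy scenario.
            self.cache = value
        else:
            # NoSnoop-style write: memory is updated, the cache is untouched.
            self.memory = value


def observe(cpu_cacheable, device_cacheable):
    """Return what the CPU reads after the device overwrites 1 with 2."""
    system = ToySystem(initial=1)
    system.cpu_read(cpu_cacheable)            # CPU touches the buffer first
    system.device_write(2, device_cacheable)  # device then updates it
    return system.cpu_read(cpu_cacheable)     # CPU reads it back


if __name__ == "__main__":
    for cpu in (True, False):
        for dev in (True, False):
            print(f"cpu_cacheable={cpu} device_cacheable={dev} "
                  f"-> CPU sees {observe(cpu, dev)}")
```

In this model the CPU sees the new value only when both sides use the same attribute. With the CPU cacheable and the device non-cacheable, the CPU reads a stale cache line; with the CPU non-cacheable and the device cacheable, the CPU reads stale memory. Both mismatched cases lose coherency, which is the behaviour the architecture declines to rule out.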