References: <20190110072841.3283-1-ard.biesheuvel@linaro.org> <5d8135de-80fe-9c0e-2206-ecb809f64cdb@daenzer.net>
In-Reply-To: <5d8135de-80fe-9c0e-2206-ecb809f64cdb@daenzer.net>
From: Ard Biesheuvel
Date: Mon, 14 Jan 2019 11:53:13 +0100
Subject: Re: [RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM
To: Michel Dänzer
Cc: Linux Kernel Mailing List, Carsten Haitzler, David Airlie, Will Deacon, dri-devel, Huang Rui, Junwei Zhang, Christian Koenig, linux-arm-kernel, Bernhard Rosenkränzer
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 10 Jan 2019 at 10:34, Michel Dänzer wrote:
>
> On 2019-01-10 8:28 a.m., Ard Biesheuvel wrote:
> > ARM systems do not permit the use of anything other than cached
> > mappings for system memory, since that memory may be mapped in the
> > linear region as well, and the architecture does not permit aliases
> > with mismatched attributes.
> >
> > So short-circuit the evaluation in ttm_io_prot() if the flags include
> > TTM_PL_SYSTEM when running on ARM or arm64, and just return cached
> > attributes immediately.
> >
> > This fixes the radeon and amdgpu [TBC] drivers when running on arm64.
> > Without this change, amdgpu does not start at all, and radeon only
> > produces corrupt display output.
> >
> > Cc: Christian Koenig
> > Cc: Huang Rui
> > Cc: Junwei Zhang
> > Cc: David Airlie
> > Reported-by: Carsten Haitzler
> > Signed-off-by: Ard Biesheuvel
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo_util.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > index 046a6dda690a..0c1eef5f7ae3 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > @@ -530,6 +530,11 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
> >          if (caching_flags & TTM_PL_FLAG_CACHED)
> >                  return tmp;
> >
> > +#if defined(__arm__) || defined(__aarch64__)
> > +        /* ARM only permits cached mappings of system memory */
> > +        if (caching_flags & TTM_PL_SYSTEM)
> > +                return tmp;
> > +#endif
> >  #if defined(__i386__) || defined(__x86_64__)
> >          if (caching_flags & TTM_PL_FLAG_WC)
> >                  tmp = pgprot_writecombine(tmp);
> >
>
> Apart from Christian's concerns, I think this is the wrong place for
> this, because other TTM / driver code will still consider the memory
> uncacheable. E.g. the amdgpu driver will program the GPU to treat the
> memory as uncacheable, so it won't participate in cache coherency
> protocol for it, which is unlikely to work as expected in general if the
> CPU treats the memory as cacheable.
>

Will and I have spent some time digging into this, so allow me to
share some preliminary findings while we carry on and try to fix this
properly.

- The patch above is flawed, i.e., it doesn't do what it intends to
  since it uses TTM_PL_SYSTEM instead of TTM_PL_FLAG_SYSTEM. Apologies
  for that.
- The existence of a linear region mapping with mismatched attributes
  is likely not the culprit here. (We do something similar with
  non-cache coherent DMA in other places).
- The reason remapping the CPU side as cacheable does work (which I
  did test) is because the GPU's uncacheable accesses (which I assume
  are made using the NoSnoop PCIe transaction attribute) are actually
  emitted as cacheable in some cases.
  . On my AMD Seattle, with or without SMMU (which is stage 2 only), I
    must use cacheable accesses from the CPU side or things are
    broken. This might be a h/w flaw, though.
  . On systems with stage 1+2 SMMUs, the driver uses stage 1
    translations which always override the memory attributes to
    cacheable for DMA coherent devices. This is what is affecting the
    Cavium ThunderX2 (although it appears the attributes emitted by
    the RC may be incorrect as well.)

The latter issue is a shortcoming in the SMMU driver that we have to
fix, i.e., it should take care not to modify the incoming attributes
of DMA coherent PCIe devices for NoSnoop to be able to work.

So in summary, the mismatch appears to be between the CPU accessing
the vmap region with non-cacheable attributes and the GPU accessing
the same memory with cacheable attributes, resulting in a loss of
coherency and lots of visible corruption.

To be able to debug this further, could you elaborate a bit on
- How does the hardware emit those uncached/wc inbound accesses? Do
  they rely on NoSnoop?
- Christian pointed out that some accesses must be uncached even when
  not using WC. What kind of accesses are those? And do they access
  system RAM?
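
For reference, the TTM_PL_SYSTEM vs TTM_PL_FLAG_SYSTEM mix-up mentioned
in the first finding can be reproduced outside the kernel. This is only
a minimal userspace sketch: the constant values are assumptions that
mirror what include/drm/ttm/ttm_placement.h defined at the time as far
as I recall (placement index 0, flag as the corresponding bit), and the
two helper functions are hypothetical names that exist only for this
illustration.

```c
#include <stdint.h>

/* Assumed values, mirroring the TTM placement header of that era. */
#define TTM_PL_SYSTEM      0                      /* placement index, not a mask */
#define TTM_PL_FLAG_SYSTEM (1u << TTM_PL_SYSTEM)  /* bit mask for that placement */
#define TTM_PL_FLAG_WC     (1u << 18)             /* write-combined caching flag */

/* Hypothetical helper: the test as written in the RFC patch.
 * Masking with the index (0) can never be non-zero. */
int tests_system_with_index(uint32_t caching_flags)
{
	return (caching_flags & TTM_PL_SYSTEM) != 0;
}

/* Hypothetical helper: the test the patch intended.
 * Masking with the flag bit detects system placement. */
int tests_system_with_flag(uint32_t caching_flags)
{
	return (caching_flags & TTM_PL_FLAG_SYSTEM) != 0;
}
```

Under those assumptions, calling both helpers with
TTM_PL_FLAG_SYSTEM | TTM_PL_FLAG_WC shows that the index-based test
never fires, so the short-circuit in the patch is dead code, while the
flag-based test does fire.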