Message-ID: <3c088a9a511762f7868b10dbe431942d3724917a.camel@pengutronix.de>
Subject: Re: DMA-buf and uncached system memory
From: Lucas Stach
To: Christian König, Pekka Paalanen
Cc: "Sharma, Shashank", lkml, dri-devel, Nicolas Dufresne,
 linaro-mm-sig@lists.linaro.org, Sumit Semwal, linux-media
Date: Thu, 23 Jun 2022 14:14:48 +0200
In-Reply-To: <34a1efd9-5447-848b-c08c-de75b48e997e@amd.com>
References: <91ff0bbb-ea3a-2663-3453-dea96ccd6dd8@amd.com>
 <9178e19f5c0e141772b61b759abaa0d176f902b6.camel@ndufresne.ca>
 <20220623101326.18beeab3@eldfell>
 <954d0a9b-29ef-52ef-f6ca-22d7e6aa3f4d@amd.com>
 <4b69f9f542d6efde2190b73c87096e87fa24d8ef.camel@pengutronix.de>
 <95cca943bbfda6af07339fb8d2dc7f4da3aa0280.camel@pengutronix.de>
 <05814ddb-4f3e-99d8-025a-c31db7b2c46b@amd.com>
 <708e27755317a7650ca08ba2e4c14691ac0d6ba2.camel@pengutronix.de>
 <6287f5f8-d9af-e03d-a2c8-ea8ddcbdc0d8@amd.com>
 <34a1efd9-5447-848b-c08c-de75b48e997e@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thursday, 23.06.2022 at 13:54 +0200, Christian König wrote:
> On 23.06.22 at 13:29, Lucas Stach wrote:
> > [SNIP]
> > > Well then the existing DMA-buf framework is not what you want to use for
> > > this.
> > >
> > Sorry, but this is just ignoring reality. You try to flag 8+ years of
> > DMA-buf usage on non-coherent arches as "you shouldn't do this". At
> > this point there are probably a lot more users (drivers) of DMA-buf in
> > the kernel for devices, which are used on non-coherent arches, than
> > there are on coherent arches.
>
> Well, it's my reality that people come up with bug reports about that
> and we have been pushing back on this with the explanation "Hey this is
> not supported" as long as I can think about it.
>
> I mean I even had somebody from ARM which told me that this is not going
> to work with our GPUs on a specific SoC.
> That there are ARM internal use cases which just seem to work because all
> the devices are non-coherent is completely new to me.
>
Yes, trying to hook up a peripheral that assumes cache snooping in some
design details to a non-coherent SoC may end up exploding in various
ways. On the other hand, you can work around most of those assumptions
by marking the memory as uncached to the CPU, which may tank
performance, but will work from a correctness PoV.

> I'm as much surprised as you are about this lack of agreement about such
> fundamental stuff.
>
> > > > Non-coherent without explicit domain transfer points is just not going
> > > > to work. So why can't we solve the issue for DMA-buf in the same way as
> > > > the DMA API already solved it years ago: by adding the equivalent of
> > > > the dma_sync calls that do cache maintenance when necessary? On x86 (or
> > > > any system where things are mostly coherent) you could still no-op them
> > > > for the common case and only trigger cache cleaning if the importer
> > > > explicitly says that is going to do a non-snooping access.
> > > Because DMA-buf is a framework for buffer sharing between cache coherent
> > > devices which don't signal transitions.
> > >
> > > We intentionally didn't implemented any of the dma_sync_* functions
> > > because that would break the intended use case.
> > >
> > Non coherent access, including your non-snoop scanout, and no domain
> > transition signal just doesn't go together when you want to solve
> > things in a generic way.
>
> Yeah, that's the stuff I totally agree on.
>
> See we absolutely do have the requirement of implementing coherent
> access without domain transitions for Vulkan and OpenGL+extensions.
>
Coherent can mean 2 different things:
1. CPU cached with snooping from the IO device
2. CPU uncached

The Vulkan and GL "coherent" uses are really coherent without explicit
domain transitions, so on non-coherent arches that require the
transitions, the only way to implement this is by making the memory CPU
uncached. From a performance PoV that will probably not be what app
developers expect, but it will still expose the correct behavior.

> When we now have to introduce domain transitions to get non coherent
> access working we are essentially splitting all the drivers into
> coherent and non-coherent ones.
>
> That doesn't sounds like it would improve interop.
>
> > Remember that in a fully (not only IO) coherent system the CPU isn't
> > the only agent that may cache the content you are trying to access
> > here. The dirty cacheline could reasonably still be sitting in a GPU or
> > VPU cache, so you need some way to clean those cachelines, which isn't
> > a magic "importer knows how to call CPU cache clean instructions".
>
> IIRC we do already have/had a SYNC_IOCTL for cases like this, but (I
> need to double check as well, that's way to long ago) this was kicked
> out because of the requirements above.
>
The DMA_BUF_IOCTL_SYNC is available in upstream, with the explicit
documentation that "userspace can not rely on coherent access".

>
> > > You can of course use DMA-buf in an incoherent environment, but then you
> > > can't expect that this works all the time.
> > >
> > > This is documented behavior and so far we have bluntly rejected any of
> > > the complains that it doesn't work on most ARM SoCs and I don't really
> > > see a way to do this differently.
> > Can you point me to that part of the documentation? A quick grep for
> > "coherent" didn't immediately turn something up within the DMA-buf
> > dirs.
>
> Search for "cache coherency management". It's quite a while ago, but I
> do remember helping to review that stuff.
>
That only turns up the lines in the DMA_BUF_IOCTL_SYNC doc, which say
the exact opposite of "DMA-buf is always coherent".
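
For reference, a minimal userspace sketch of how CPU access to an
mmap()ed dma-buf is bracketed with DMA_BUF_IOCTL_SYNC. The ioctl,
struct dma_buf_sync and the DMA_BUF_SYNC_* flags come from
<linux/dma-buf.h>; the helper names dmabuf_sync() and dmabuf_cpu_fill()
are made up for illustration, and the retry on EINTR/EAGAIN reflects
what the ioctl documentation asks for, as best understood here. Treat
it as a sketch, not as what any particular driver or compositor does:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, restarting it if it was interrupted. */
static int dmabuf_sync(int fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Hypothetical helper: fill an mmap()ed dma-buf from the CPU. The write is
 * bracketed by SYNC_START/SYNC_END so the exporter gets a chance to do cache
 * maintenance where the platform needs it.
 */
static int dmabuf_cpu_fill(int fd, void *map, size_t len, int value)
{
	/* Begin CPU write access; skip the write if this fails. */
	if (dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE))
		return -1;

	memset(map, value, len);

	/* End CPU write access before handing the buffer back to a device. */
	return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```

On a platform where the buffer is coherent or CPU uncached, an exporter
could reasonably turn both calls into no-ops, so the bracketing costs
little there.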

I also don't see why you think that both world views are so totally
different. We could just require explicit domain transitions for
non-snoop access, which would probably solve your scanout issue. That
would not be a problem for most ARM systems, where we could no-op the
transition if the buffer is already in uncached memory, and at the same
time it would keep the "x86 assumes cached + snooped access by default"
semantics.

Regards,
Lucas