Message-ID: <708e27755317a7650ca08ba2e4c14691ac0d6ba2.camel@pengutronix.de>
Subject: Re: DMA-buf and uncached system memory
From: Lucas Stach
To: Christian König, Pekka Paalanen
Cc: "Sharma, Shashank", lkml, dri-devel, Nicolas Dufresne, linaro-mm-sig@lists.linaro.org, Sumit Semwal, linux-media
Date: Thu, 23 Jun 2022 12:13:16 +0200
In-Reply-To: <05814ddb-4f3e-99d8-025a-c31db7b2c46b@amd.com>
References: <91ff0bbb-ea3a-2663-3453-dea96ccd6dd8@amd.com> <9178e19f5c0e141772b61b759abaa0d176f902b6.camel@ndufresne.ca> <20220623101326.18beeab3@eldfell> <954d0a9b-29ef-52ef-f6ca-22d7e6aa3f4d@amd.com>
 <4b69f9f542d6efde2190b73c87096e87fa24d8ef.camel@pengutronix.de> <95cca943bbfda6af07339fb8d2dc7f4da3aa0280.camel@pengutronix.de> <05814ddb-4f3e-99d8-025a-c31db7b2c46b@amd.com>

On Thursday, 23.06.2022 at 11:46 +0200, Christian König wrote:
> On 23.06.22 at 11:33, Lucas Stach wrote:
> > [SNIP]
> > > > > > In the DMA API keeping things mapped is also a valid use-case, but then
> > > > > > you need to do explicit domain transfers via the dma_sync_* family,
> > > > > > which DMA-buf has not inherited. Again those sync are no-ops on cache
> > > > > > coherent architectures, but do any necessary cache maintenance on non
> > > > > > coherent arches.
> > > > > Correct, yes. Coherency is mandatory for DMA-buf, you can't use
> > > > > dma_sync_* on it when you are the importer.
> > > > >
> > > > > The exporter could of course make use of that because he is the owner of
> > > > > the buffer.
> > > > In the example given here with UVC video, you don't know that the
> > > > buffer will be exported and needs to be coherent without
> > > > synchronization points, due to the mapping cache at the DRM side. So
> > > > V4L2 naturally allocates the buffers from CPU cached memory.
> > > > If the expectation is that those buffers are device coherent without relying
> > > > on the map/unmap_attachment calls, then V4L2 needs to always
> > > > synchronize caches on DQBUF when the buffer is allocated from CPU
> > > > cached memory and a single DMA-buf attachment exists. And while writing
> > > > this I realize that this is probably exactly what V4L2 should do...
> > > No, the expectation is that the importer can deal with whatever the
> > > exporter provides.
> > >
> > > If the importer can't access the DMA-buf coherently it's his job to
> > > handle that gracefully.
> > How does the importer know that the memory behind the DMA-buf is in CPU
> > cached memory?
> >
> > If you now tell me that an importer always needs to assume this and
> > reject the import if it can't do snooping, then any DMA-buf usage on
> > most ARM SoCs is currently invalid usage.
>
> Yes, exactly that. I've pointed out a couple of times now that a lot of
> ARM SoCs don't implement that the way we need it.
>
> We already had tons of bug reports because somebody attached a random
> PCI root complex to an ARM SoC and expected it to work with for example
> an AMD GPU.
>
> Non-cache coherent applications are currently not really supported by
> the DMA-buf framework in any way.
>
I'm not talking about bolting a PCIe root complex, with its implicitly inherited "PCI is cache coherent" expectations, onto an ARM SoC. I'm talking about the standard VPU/GPU/display engines, which do not snoop on most ARM SoCs.

> > On most of the multimedia
> > targeted ARM SoCs being unable to snoop the cache is the norm, not an
> > exception.
> >
> > > See for example on AMD/Intel hardware most of the engines can perfectly
> > > deal with cache coherent memory accesses. Only the display engines can't.
> > >
> > > So on import time we can't even say if the access can be coherent and
> > > snoop the CPU cache or not because we don't know how the imported
> > > DMA-buf will be used later on.
> > >
> > So for those mixed use cases, wouldn't it help to have something
> > similar to the dma_sync in the DMA-buf API, so your scanout usage can
> > tell the exporter that it's going to do non-snoop access and any dirty
> > cache lines must be cleaned? Signaling this to the exporter would allow
> > to skip the cache maintenance if the buffer is in CPU uncached memory,
> > which again is a default case for the ARM SoC world.
>
> Well for the AMD and Intel use cases we at least have the opportunity to
> signal cache flushing, but I'm not sure if that counts for everybody.
>
Sure, all the non-coherent arches have some way to do cache maintenance explicitly. Being non-coherent without any cache maintenance instructions would be a recipe for disaster. ;)

> What we would rather do for those use cases is an indicator on the
> DMA-buf if the underlying backing store is CPU cached or not. The
> importer can then cleanly reject the use cases where it can't support
> CPU cache snooping.
>
> This then results in the normal fallback paths which we have anyway for
> those use cases because DMA-buf sharing is not always possible.
>
That's a very x86-centric world view you have there. 99% of DMA-buf uses on those cheap ARM SoCs are non-snooping. We cannot do any fallbacks here, as the whole graphics world on those SoCs, with their different IP cores mixed together, depends on DMA-buf sharing working efficiently even when the SoC is mostly non-coherent.

In fact, DMA-buf sharing works fine on most of those SoCs because everyone just assumes that none of the accelerators snoop, so the memory shared via DMA-buf is mostly CPU uncached. It only falls apart for uses like UVC cameras, where the shared buffer ends up being CPU cached. Non-coherent without explicit domain transfer points is just not going to work.
So why can't we solve the issue for DMA-buf the same way the DMA API already solved it years ago: by adding the equivalent of the dma_sync calls, which do cache maintenance when necessary? On x86 (or any system where things are mostly coherent) you could still make them no-ops in the common case and only trigger cache cleaning if the importer explicitly says that it is going to do a non-snooping access.

Regards,
Lucas