Date: Sat, 25 Jun 2022 00:02:25 +0200
From: Daniel Vetter
To: Christian König
Cc: Daniel Stone, Pekka Paalanen, "Sharma, Shashank", lkml, dri-devel,
 Nicolas Dufresne, linaro-mm-sig@lists.linaro.org, Sumit Semwal,
 linux-media
Subject: Re: [Linaro-mm-sig] Re: DMA-buf and uncached system memory

On Thu, Jun 23, 2022 at 01:32:18PM +0200, Christian König wrote:
> On 23.06.22 at 13:27, Daniel Stone wrote:
> > Hi Christian,
> >
> > On Thu, 23 Jun 2022 at 12:11, Christian König wrote:
> > > > In fact DMA-buf sharing works fine on most of those SoCs because
> > > > everyone just assumes that all the accelerators don't snoop, so the
> > > > memory shared via DMA-buf is mostly CPU uncached. It only falls apart
> > > > for uses like the UVC cameras, where the shared buffer ends up being
> > > > CPU cached.
> > > Well then the existing DMA-buf framework is not what you want to use for
> > > this.
> > >
> > > > Non-coherent without explicit domain transfer points is just not going
> > > > to work. So why can't we solve the issue for DMA-buf in the same way as
> > > > the DMA API already solved it years ago: by adding the equivalent of
> > > > the dma_sync calls that do cache maintenance when necessary? On x86 (or
> > > > any system where things are mostly coherent) you could still no-op them
> > > > for the common case and only trigger cache cleaning if the importer
> > > > explicitly says that is going to do a non-snooping access.
> > > Because DMA-buf is a framework for buffer sharing between cache coherent
> > > devices which don't signal transitions.
> > >
> > > We intentionally didn't implemented any of the dma_sync_* functions
> > > because that would break the intended use case.
> > >
> > > You can of course use DMA-buf in an incoherent environment, but then you
> > > can't expect that this works all the time.
> > >
> > > This is documented behavior and so far we have bluntly rejected any of
> > > the complains that it doesn't work on most ARM SoCs and I don't really
> > > see a way to do this differently.
> > For some strange reason, 'let's do buffer sharing but make sure it
> > doesn't work on Arm' wasn't exactly the intention of the groups who
> > came together under the Linaro umbrella to create dmabuf.
> >
> > If it's really your belief that dmabuf requires universal snooping, I
> > recommend you send the patch to update the documentation, as well as
> > to remove DRIVER_PRIME from, realistically, most non-PCIE drivers.
>
> Well, to be honest I think that would indeed be necessary.
>
> What we have created are essentially two different worlds, one for PCI
> devices and one for the rest.
>
> This was indeed not the intention, but it's a fact that basically all
> DMA-buf based PCI drivers assume coherent access.

dma-buf does not require universal snooping. It does de facto require that
all device access is coherent with all other device access, and consistent
with the exporter's notion of how cpu coherency is achieved. Note that
coherent does not mean snooping: as long as all devices do unsnooped access
and the exporter either does wc/uc or flushes caches, that's perfectly
fine, and it's how all the arm soc dma-buf sharing works.

We did originally have the wording in there that you have to map/unmap
around every device access, but that got dropped because no one was doing
that anyway.

Now where this totally breaks down is how we make this work, because the
idea was that dma_buf_attach validates this all.
Where this means all the hilarious reasons buffer sharing might not work:
- wrong coherency mode (cpu cached or not)
- not contiguous (we do check that, but only once we get the sg from
  dma_buf_map_attachment, which strictly speaking is a bit too late but
  most drivers do attach&map as one step so not that bad in practice)
- whether the dma api will throw in bounce buffers or not
- random shit like "oh this is in the wrong memory bank", which I think
  never landed in upstream

p2p connectivity is about the only one that gets this right, yay. And the
only reason we can even get it right is because all the information is
exposed to drivers fully.

The issue is that the device dma api refuses to share this information
because it would "leak". Which sucks, because we have de facto built every
single cross-device use-case of dma-buf on the assumption we can check
this (up to gl/vk specs), but oh well.

So in practice this gets sorted out by endless piles of hacks to make
individual use-cases work.

Oh and: This is definitely not limited to arm socs. x86 socs with intel
at least have exactly all the same issues, and they get solved by adding
various shitty hacks to the involved drivers (like i915+amdgpu). Luckily
the intel camera driver isn't in upstream yet, since that would break a
bunch of the hacks, because suddenly there would be 2 cpu cache
incoherent devices in an x86 system.

Ideally someone fixes this, but I'm not hopeful.

I recommend pouring more drinks.

What is definitely not correct is claiming that dma-buf wasn't meant for
this. We discussed cache coherency issues endlessly in budapest 12 or so
years ago, I was there. It's just that the reality of the current
implementation is falling short, and every time someone tries to fix it we
get shouted down by dma api maintainers for looking behind their curtain.

tldr; You have to magically know to not use cpu cached allocators on these
machines.

Aside: This is also why vgem allocates wc memory on x86.
It's the least common denominator that works. arm unfortunately doesn't
allow you to allocate wc memory, so there stuff is simply somewhat broken.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
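
To make the coherency rule from the mail concrete, here is a toy model of
the check that dma_buf_attach was meant to perform. It is a sketch, not
kernel code: the Exporter and Importer types, the attach_problems function
and the device names are all hypothetical, and it only covers the first
item of the list (cpu cached or not), assuming a device either snoops cpu
caches or doesn't.

```python
from dataclasses import dataclass
from enum import Enum


class CpuMapping(Enum):
    """How the exporter maps the buffer for the CPU (hypothetical model)."""
    CACHED = "cached"
    WRITE_COMBINED = "wc"
    UNCACHED = "uc"


@dataclass(frozen=True)
class Exporter:
    name: str
    cpu_mapping: CpuMapping
    flushes_caches: bool = False  # exporter does explicit cache maintenance


@dataclass(frozen=True)
class Importer:
    name: str
    snoops_cpu_cache: bool


def attach_problems(exporter, importers):
    """Return one message per importer whose access would be incoherent.

    Simplified rule from the mail: sharing is fine when every device does
    unsnooped access and the exporter either maps wc/uc or flushes caches.
    It breaks when the CPU may hold dirty cache lines that a non-snooping
    device cannot see.
    """
    cpu_may_hold_dirty_lines = (
        exporter.cpu_mapping is CpuMapping.CACHED
        and not exporter.flushes_caches
    )
    problems = []
    for dev in importers:
        if cpu_may_hold_dirty_lines and not dev.snoops_cpu_cache:
            problems.append(
                f"{dev.name}: unsnooped access to cpu cached buffer "
                f"from {exporter.name}"
            )
    return problems


# Typical arm soc setup: wc buffer, nobody snoops, nothing to complain about.
soc_exporter = Exporter("gpu", CpuMapping.WRITE_COMBINED)
soc_devices = [Importer("display", False), Importer("codec", False)]

# The UVC-camera-like case: cpu cached buffer handed to a non-snooping device.
cached_exporter = Exporter("camera", CpuMapping.CACHED)
cached_result = attach_problems(cached_exporter, [Importer("display", False)])

# Same cached buffer, but the exporter flushes caches itself.
flushing_exporter = Exporter("camera", CpuMapping.CACHED, flushes_caches=True)
```

In this model attach_problems(soc_exporter, soc_devices) returns an empty
list, while cached_result holds one message for the display. The point of
the mail is that drivers cannot actually make this decision today, because
the dma api does not expose the information the check needs.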