From: Oded Gabbay
Date: Wed, 23 Jun 2021 12:14:59 +0300
Subject: Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF
To: Christian König
Cc: Jason Gunthorpe, Christian König, Gal Pressman, sleybo@amazon.com,
    linux-rdma, Oded Gabbay, Christoph Hellwig,
    Linux Kernel Mailing List, dri-devel,
    "moderated list:DMA BUFFER SHARING FRAMEWORK", Doug Ledford,
    Tomer Tayar, amd-gfx list, Greg KH, Alex Deucher, Leon Romanovsky,
    "open list:DMA BUFFER SHARING FRAMEWORK"
References: <20210621232912.GK1096940@ziepe.ca>
    <20210622120142.GL1096940@ziepe.ca>
    <20210622152343.GO1096940@ziepe.ca>
    <3fabe8b7-7174-bf49-5ffe-26db30968a27@amd.com>
    <20210622154027.GS1096940@ziepe.ca>
    <09df4a03-d99c-3949-05b2-8b49c71a109e@amd.com>
    <20210622160538.GT1096940@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 23, 2021 at 11:57 AM Christian König wrote:
>
> Am 22.06.21 um 18:05 schrieb Jason Gunthorpe:
> > On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
> >> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
> >>> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
> >>>> [SNIP]
> >>>> No, absolutely not. NVidia GPUs work exactly the same way.
> >>>>
> >>>> And you have tons of similar cases in embedded and SoC systems where
> >>>> intermediate memory between devices isn't directly addressable with
> >>>> the CPU.
> >>> None of that is PCI P2P.
> >>>
> >>> It is all some specialty direct transfer.
> >>>
> >>> You can't reasonably call dma_map_resource() on non-CPU-mapped memory,
> >>> for instance; what address would you pass?
> >>>
> >>> Do not confuse "I am doing transfers between two HW blocks" with PCI
> >>> Peer to Peer DMA transfers - the latter is a very narrow subcase.
> >>>
> >>>> No, just using the dma_map_resource() interface.
> >>> Ick, but yes, that does "work". Logan's series is better.
> >> No it isn't. It makes devices depend on allocating struct pages for
> >> their BARs, which is neither necessary nor desired.
> > Which dramatically reduces the cost of establishing DMA mappings; a
> > loop of dma_map_resource() is very expensive.
>
> Yeah, but that is perfectly OK. Our BAR allocations are either in chunks
> of at least 2MiB or only a single 4KiB page.
>
> Oded might run into more performance problems, but those DMA-buf
> mappings are usually set up only once.
>
> >> How do you prevent direct I/O on those pages, for example?
> > GUP fails.
>
> At least that is calming.
>
> >> Allocating struct pages has its use cases, for example exposing VRAM
> >> as memory for HMM. But that is something very specific and should not
> >> limit PCIe P2P DMA in general.
> > Sure, but that is an ideal we are far from obtaining, and nobody wants
> > to work on it, preferring hacks like this instead.
> >
> > If you believe in this, then remove the scatterlist from dmabuf, add a
> > new set of dma_map* APIs that work on physical addresses, and all the
> > other stuff needed.
>
> Yeah, that's what I totally agree on. And I actually hoped that the new
> P2P work for PCIe would go in that direction, but that didn't
> materialize.
>
> But allocating struct pages for PCIe BARs which are essentially
> registers and not memory is much more hacky than the dma_map_resource()
> approach.
>
> To reiterate why I think that having struct pages for those BARs is a
> bad idea: our doorbells on AMD GPUs are write and read pointers for ring
> buffers.
>
> When you write to the BAR, you essentially tell the firmware that you
> have either filled the ring buffer or read a bunch of it. This in turn
> triggers an interrupt in the hardware/firmware, which may have gone to
> sleep in the meantime.
>
> By using PCIe P2P we want to avoid the round trip to the CPU when one
> device has filled the ring buffer and another device must be woken up to
> process it.
>
> Think of it as MSI-X in reverse. Allocating struct pages for those BARs
> just to work around the shortcomings of the DMA API makes no sense at
> all to me.

We would also like to do that *in the future*. In Gaudi it will never be
supported (due to security limitations), but I definitely see it
happening in future ASICs.
Oded

> We also do have the VRAM BAR, and for HMM we do allocate struct pages
> for the address range exposed there. But this is a different use case.
>
> Regards,
> Christian.
>
> > Otherwise, we have what we have and drivers don't get to opt out. This
> > is why the stuff in AMDGPU was NAK'd.
> >
> > Jason
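
For readers following the API argument above, here is a minimal sketch of
the dma_map_resource() route, i.e. handing a chunk of a peer device's BAR
to another device's DMA engine with no struct pages behind it. The wrapper
function and where peer_bar_phys comes from are hypothetical; only the
dma_map_resource()/dma_mapping_error() calls are real kernel API.

    /*
     * Sketch only: map one chunk of a peer device's BAR for DMA.
     * There is no struct page behind the BAR; we map the raw MMIO
     * physical address. peer_bar_phys and size come from wherever
     * the exporter learned the BAR layout (hypothetical here).
     */
    #include <linux/dma-mapping.h>

    static dma_addr_t map_peer_bar_chunk(struct device *dma_dev,
                                         phys_addr_t peer_bar_phys,
                                         size_t size)
    {
            dma_addr_t addr;

            addr = dma_map_resource(dma_dev, peer_bar_phys, size,
                                    DMA_BIDIRECTIONAL, 0);
            if (dma_mapping_error(dma_dev, addr))
                    return DMA_MAPPING_ERROR;

            /* This bus address can now be programmed into the peer. */
            return addr;
    }

Jason's cost point is that a large BAR mapped this way needs one such call
per contiguous chunk; Christian's counterpoint is that AMD's allocations
are either 2MiB chunks or a single 4KiB page, so the loop stays short.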
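For contrast, a rough sketch of the struct-page route the thread calls
"Logan's series", i.e. the in-tree PCI P2PDMA infrastructure. The BAR
index, size, and wrapper are made up for illustration; the real calls are
pci_p2pdma_add_resource() and pci_alloc_p2pmem().

    /*
     * Sketch only: expose part of a BAR through the PCI P2PDMA
     * infrastructure, which backs it with ZONE_DEVICE struct pages.
     * BAR index and size are illustrative.
     */
    #include <linux/pci-p2pdma.h>
    #include <linux/sizes.h>

    static void *expose_bar_as_p2pmem(struct pci_dev *pdev)
    {
            /* Back 2 MiB of BAR 4 with ZONE_DEVICE struct pages. */
            if (pci_p2pdma_add_resource(pdev, 4, SZ_2M, 0))
                    return NULL;

            /*
             * Allocations from the pool have struct pages, so the
             * normal scatterlist DMA mapping paths work; GUP still
             * refuses them, which is the "GUP fails" point above.
             */
            return pci_alloc_p2pmem(pdev, SZ_2M);
    }

The trade-off debated in the thread falls out directly: the P2PDMA pool
gives cheap scatterlist-based mappings but requires struct pages for what
may be pure MMIO registers (such as doorbells), while dma_map_resource()
needs no pages but must be called once per contiguous chunk.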