Date: Fri, 20 Aug 2021 09:33:16 -0300
From: Jason Gunthorpe
To: Daniel Vetter
Cc: Gal Pressman, Sumit Semwal, Christian König, Doug Ledford,
    "open list:DMA BUFFER SHARING FRAMEWORK", dri-devel,
    Linux Kernel Mailing List, linux-rdma, Oded Gabbay, Tomer Tayar,
    Yossi Leybovich, Alexander Matushevsky, Leon Romanovsky,
    Jianxin Xiong, John Hubbard
Subject: Re: [RFC] Make use of non-dynamic dmabuf in RDMA
Message-ID: <20210820123316.GV543798@ziepe.ca>
References: <20210818074352.29950-1-galpress@amazon.com>
    <20210819230602.GU543798@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 20, 2021 at 09:25:30AM +0200, Daniel Vetter wrote:
> On Fri, Aug 20, 2021 at 1:06 AM Jason Gunthorpe wrote:
> > On Wed, Aug 18, 2021 at 11:34:51AM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 18, 2021 at 9:45 AM Gal Pressman wrote:
> > > >
> > > > Hey all,
> > > >
> > > > Currently, the RDMA subsystem can only work with dynamic dmabuf
> > > > attachments, which requires the RDMA device to support on-demand-paging
> > > > (ODP) which is not common on most devices (only supported by mlx5).
> > > >
> > > > While the dynamic requirement makes sense for certain GPUs, some devices
> > > > (such as habanalabs) have device memory that is always "pinned" and do
> > > > not need/use the move_notify operation.
> > > >
> > > > The motivation of this RFC is to use habanalabs as the dmabuf exporter,
> > > > and EFA as the importer to allow for peer2peer access through libibverbs.
> > > >
> > > > This draft patch changes the dmabuf driver to differentiate between
> > > > static/dynamic attachments by looking at the move_notify op instead of
> > > > the importer_ops struct, and allowing the peer2peer flag to be enabled
> > > > in case of a static exporter.
> > > >
> > > > Thanks
> > > >
> > > > Signed-off-by: Gal Pressman
> > >
> > > Given that habanalabs dma-buf support is very firmly in limbo (at
> > > least it's not yet in linux-next or anywhere else) I think you want to
> > > solve that problem first before we tackle the additional issue of
> > > making p2p work without dynamic dma-buf. Without that it just doesn't
> > > make a lot of sense really to talk about solutions here.
> >
> > I have been thinking about adding a dmabuf exporter to VFIO, for
> > basically the same reason habana labs wants to do it.
> >
> > In that situation we'd want to see an approach similar to this as well
> > to have a broad usability.
> >
> > The GPU drivers also want this for certain sophisticated scenarios
> > with RDMA, the intree drivers just haven't quite got there yet.
> >
> > So, I think it is worthwhile to start thinking about this regardless
> > of habana labs.
>
> Oh sure, I've been having these for a while.
> I think there's two options:
> - some kind of soft-pin, where the contract is that we only revoke
> when absolutely necessary, and it's expected to be catastrophic on the
> importer's side.

Honestly, I'm not very keen on this. We don't really have HW support
in several RDMA scenarios for even catastrophic unpin.

Gal, can EFA even do this for a MR? You basically have to resize the
rkey/lkey to zero length (or invalidate it like a FMR) under the
catastrophic revoke. The rkey/lkey cannot just be destroyed as that
opens a security problem with rkey/lkey re-use.

I think I saw EFA's current out of tree implementations had this bug.

> to do is mmap revoke), and I think that model of exclusive device
> ownership with the option to revoke fits pretty well for at least some
> of the accelerators floating around. In that case importers would
> never get a move_notify (maybe we should call this revoke_notify to
> make it clear it's a bit different) callback, except when the entire
> thing has been yanked. I think that would fit pretty well for VFIO,
> and I think we should be able to make it work for rdma too as some
> kind of auto-deregister. The locking might be fun with both of these
> since I expect some inversions compared to the register path, we'll
> have to figure these out.

It fits semantically nicely, VFIO also has a revoke semantic for BAR
mappings.

The challenge is the RDMA side which doesn't have a 'dma disabled
error state' for objects as part of the spec.

Some HW, like mlx5, can implement this for MR objects (see revoke_mr),
but I don't know if anything else can, and even mlx5 currently can't
do a revoke for any other object type.

I don't know how useful it would be, need to check on some of the use
cases.

The locking is tricky as we have to issue a device command, but that
device command cannot run concurrently with destruction or the tail
part of creation.

Jason
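
To illustrate the rkey/lkey re-use concern above, here is a minimal
userspace C model. It is only a sketch, not code from EFA, mlx5 or the
RDMA core: the key table, the lowest-free-key allocator and every
model_* name are assumptions made up for the example. It shows why
shrinking a key to zero length on revoke rejects stale remote accesses,
while destroying the key lets a later registration inherit it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_KEYS 4

/* Hypothetical model of one key-table slot; not a real driver structure. */
struct model_mr {
	bool in_use;	/* key is currently allocated to a registration */
	size_t length;	/* bytes reachable through the key; 0 once revoked */
	int owner;	/* id of the registration that owns the key */
};

static struct model_mr model_keys[MODEL_NUM_KEYS];

/* Register memory: hand out the lowest free key (assumed allocator policy). */
static int model_reg_mr(int owner, size_t length)
{
	for (int key = 0; key < MODEL_NUM_KEYS; key++) {
		if (!model_keys[key].in_use) {
			model_keys[key].in_use = true;
			model_keys[key].length = length;
			model_keys[key].owner = owner;
			return key;
		}
	}
	return -1;
}

/* Revoke: the key stays allocated but no longer covers any memory. */
static void model_revoke_mr(int key)
{
	assert(key >= 0 && key < MODEL_NUM_KEYS);
	model_keys[key].length = 0;
}

/* Destroy: the key goes back to the allocator and may be handed out again. */
static void model_dereg_mr(int key)
{
	assert(key >= 0 && key < MODEL_NUM_KEYS);
	model_keys[key].in_use = false;
	model_keys[key].length = 0;
}

/*
 * Remote access check. Returns the owner whose memory would be touched,
 * or -1 if the access is rejected.
 */
static int model_remote_access(int key, size_t offset, size_t len)
{
	if (key < 0 || key >= MODEL_NUM_KEYS || !model_keys[key].in_use)
		return -1;
	if (len == 0 || offset > model_keys[key].length ||
	    len > model_keys[key].length - offset)
		return -1;
	return model_keys[key].owner;
}
```

In this model a peer that still holds a revoked key is always rejected,
and a new registration gets a different key. Once the key is destroyed
instead, the next registration receives the same key value, so the same
stale access lands in the new owner's memory.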