Date: Mon, 25 Mar 2024 00:22:15 +0100
From: Christoph Hellwig
To: Jason Gunthorpe
Cc: Christoph Hellwig, Leon Romanovsky, Robin Murphy, Marek Szyprowski,
 Joerg Roedel, Will Deacon, Chaitanya Kulkarni, Jonathan Corbet,
 Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas, Shameer Kolothum,
 Kevin Tian, Alex Williamson, Jérôme Glisse, Andrew Morton,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
 iommu@lists.linux.dev, linux-nvme@lists.infradead.org,
 kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche,
 Damien Le Moal, Amir Goldstein, "josef@toxicpanda.com",
 "Martin K. Petersen", "daniel@iogearbox.net", Dan Williams,
 "jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240324232215.GC20765@lst.de>
References: <20240306221400.GA8663@lst.de> <20240307000036.GP9225@ziepe.ca>
 <20240307150505.GA28978@lst.de> <20240307210116.GQ9225@ziepe.ca>
 <20240308164920.GA17991@lst.de> <20240308202342.GZ9225@ziepe.ca>
 <20240309161418.GA27113@lst.de> <20240319153620.GB66976@ziepe.ca>
 <20240321223910.GA22663@lst.de> <20240322184330.GL66976@ziepe.ca>
In-Reply-To: <20240322184330.GL66976@ziepe.ca>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> If we are going to make caller provided uniformity a requirement, lets
> imagine a formal memory type idea to help keep this a little
> abstracted?
>
>  DMA_MEMORY_TYPE_NORMAL
>  DMA_MEMORY_TYPE_P2P_NOT_ACS
>  DMA_MEMORY_TYPE_ENCRYPTED
>  DMA_MEMORY_TYPE_BOUNCE_BUFFER  // ??
>
> Then maybe the driver flow looks like:
>
> 	if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {

Add a nice helper to make this somewhat readable, but yes.

> 	} else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> 		num_hwsgls = transcation.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> 			hwsgl[i].len = range.size;
> 		}
> 	} else {
> 		/* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> 		num_hwsgls = transcation.num_sgls;
> 		for_each_range(transaction, range) {
> 			hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> 			hwsgl[i].len = range.size;
> 		}
>

And these two are really the same, except that we call a different map
helper underneath.  So I think as far as the driver is concerned they
should be the same; the DMA API just needs to key off the memory type.

> And the hmm_range_fault case is sort of like:
>
> 	struct dma_api_iommu_state state;
> 	dma_api_iommu_start(&state, mr.num_pages);
>
> 	[..]
> 	hmm_range_fault(...)
> 	if (present)
> 		dma_link_page(&state, faulting_address_offset, page);
> 	else
> 		dma_unlink_page(&state, faulting_address_offset, page);
>
> Is this looking closer?

Yes.

> > > So I take it as a requirement that RDMA MUST make single MR's out of a
> > > hodgepodge of page types. RDMA MRs cannot be split. Multiple MR's are
> > > not a functional replacement for a single MR.
> >
> > But MRs consolidate multiple dma addresses anyway.
>
> I'm not sure I understand this?

The RDMA MRs take a list of PFNish addresses (or SGLs with the enhanced
MRs from Mellanox) and give you back a single rkey/lkey.

> To go back to my main thesis - I would like a high performance low
> level DMA API that is capable enough that it could implement
> scatterlist dma_map_sg() and thus also implement any future
> scatterlist_v2, bio, hmm_range_fault or any other thing we come up
> with on top of it. This is broadly what I thought we agreed to at LSF
> last year.

I think the biggest underlying problem of the scatterlist based DMA
implementation for IOMMUs is that it's trying to handle too much, that
is, magic coalescing even if the segment boundaries don't align with
the IOMMU page size.  If we can get rid of that misfeature I think we'd
greatly simplify the API and implementation.
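
To make the alignment condition concrete, here is a minimal userspace
sketch, not kernel code.  struct seg and segs_iommu_aligned() are
made-up names for illustration only, and the rule it encodes (every
interior segment boundary must sit on an IOMMU page boundary, only the
very first start and very last end may be unaligned) is my reading of
what "no magic coalescing" would require from the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one physical segment; not a kernel type. */
struct seg {
	uint64_t phys;	/* physical start address */
	uint64_t len;	/* length in bytes */
};

/*
 * Return true if the segments could be placed back to back in one
 * contiguous IOVA range without any splitting or padding: every
 * boundary between two segments must fall on an IOMMU page boundary.
 * The start of the first segment and the end of the last one are
 * allowed to be unaligned.
 */
static bool segs_iommu_aligned(const struct seg *segs, size_t n,
			       uint64_t iommu_page_size)
{
	uint64_t mask = iommu_page_size - 1;
	size_t i;

	/* Assumption: the IOMMU page size is a non-zero power of two. */
	assert(iommu_page_size && !(iommu_page_size & mask));

	for (i = 0; i < n; i++) {
		/* Every segment but the first must start aligned. */
		if (i > 0 && (segs[i].phys & mask))
			return false;
		/* Every segment but the last must end aligned. */
		if (i + 1 < n && ((segs[i].phys + segs[i].len) & mask))
			return false;
	}
	return true;
}
```

A caller that can guarantee this up front would not need the mapping
code to second-guess its segment boundaries.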