Date: Fri, 8 Mar 2024 17:49:20 +0100
From: Christoph Hellwig
To: Jason Gunthorpe
Cc: Christoph Hellwig, Leon Romanovsky, Robin Murphy, Marek Szyprowski,
    Joerg Roedel, Will Deacon, Chaitanya Kulkarni, Jonathan Corbet,
    Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas,
    Shameer Kolothum, Kevin Tian, Alex Williamson, Jérôme Glisse,
    Andrew Morton, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    linux-rdma@vger.kernel.org, iommu@lists.linux.dev,
    linux-nvme@lists.infradead.org, kvm@vger.kernel.org,
    linux-mm@kvack.org, Bart Van Assche, Damien Le Moal, Amir Goldstein,
    "josef@toxicpanda.com", "Martin K. Petersen", "daniel@iogearbox.net",
    Dan Williams, "jack@suse.com", Zhu Yanjun
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
Message-ID: <20240308164920.GA17991@lst.de>
In-Reply-To: <20240307210116.GQ9225@ziepe.ca>
References: <47afacda-3023-4eb7-b227-5f725c3187c2@arm.com>
    <20240305122935.GB36868@unreal> <20240306144416.GB19711@lst.de>
    <20240306154328.GM9225@ziepe.ca> <20240306162022.GB28427@lst.de>
    <20240306174456.GO9225@ziepe.ca> <20240306221400.GA8663@lst.de>
    <20240307000036.GP9225@ziepe.ca> <20240307150505.GA28978@lst.de>
    <20240307210116.GQ9225@ziepe.ca>

On Thu, Mar 07, 2024 at 05:01:16PM -0400, Jason Gunthorpe wrote:
> >
> > It's just kinda hard to do. For aligned IOMMU mapping you'd only
> > have one dma_addr_t mappings (or maybe a few if P2P regions are
> > involved), so this probably doesn't matter. For direct mappings
> > you'd have a few, but maybe the better answer is to use THP
> > more aggressively and reduce the number of segments.
>
> Right, those things have all been done. 100GB of huge pages is still
> using a fair amount of memory for storing dma_addr_t's.
>
> It is hard to do perfectly, but I think it is not so bad if we focus
> on the direct only case and simple systems that can exclude swiotlb
> early on.

Even with direct mappings only we still need to take care of
cache synchronization.

> > If all flows includes multiple non-coalesced regions that just makes
> > things very complicated, and that's exactly what I'd want to avoid.
>
> I don't see how to avoid it unless we say RDMA shouldn't use this API,
> which is kind of the whole point from my perspective..

The DMA API callers really need to know what is P2P or not for
various reasons.
And they should generally have that information available, either from
pin_user_pages, which needs to special-case it, or from the in-kernel
I/O submitter that builds it from P2P and normal memory.

> Sure, 3 SGL entries is fine, that isn't what I'm pointing at
>
> I'm saying that today if you give such a scatterlist to dma_map_sg()
> it scans it and computes the IOVA space need, allocates one IOVA
> space, then subdivides that single space up into the 3 HW SGLs you
> show.
>
> If you don't preserve that then we are calling, 4k at a time, a
> dma_map_page() which is not anywhere close to the same outcome as what
> dma_map_sg did. I may not get contiguous IOVA, I may not get 3 SGLs,
> and we call into the IOVA allocator a huge number of times.

Again, your callers must know what is a P2P region and what is not.
I don't think it is a heavy burden to do mappings at that granularity,
and we can encapsulate this in nice helpers for, say, the block layer
and pin_user_pages callers to start.

>
> It needs to work following the same basic structure of dma_map_sg,
> unfolding that logic into helpers so that the driver can provide
> the data structure:
>
> - Scan the io ranges and figure out how much IOVA needed
>   (dma_io_summarize_range)

That is in general a function of the upper layer and not the DMA code.

> - Allocate the IOVA (dma_init_io)

And this step is only needed for the iommu case.

> > That's why I really just want 2 cases. If the caller guarantees the
> > range is coalescable and there is an IOMMU use the iommu-API like
> > API, else just iter over map_single/page.
>
> But how does the caller even know if it is coalescable? Other than the
> trivial case of a single CPU range, that is a complicated detail based
> on what pages are inside the range combined with the capability of the
> device doing DMA. I don't see a simple way for the caller to figure
> this out. You need to sweep every page and collect some information on
> it. The above is to abstract that detail.

dma_get_merge_boundary already provides this information in terms of
the device capabilities. And given that the caller knows what is P2P
and what is not, we have all the information that is needed.
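
To make the coalescability check concrete, here is a rough userspace
sketch, not kernel code. It assumes the merge boundary is a mask in the
style returned by dma_get_merge_boundary (0 meaning the device cannot
merge, otherwise something like PAGE_SIZE - 1), and it borrows the
alignment rule the block layer applies for a virt boundary mask. The
struct io_range type, the is_p2p flag and ranges_coalescable() are
hypothetical names made up for illustration, and refusing to mix P2P and
normal memory in one mapping is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one CPU physical range; not a kernel type. */
struct io_range {
	uint64_t addr;   /* start address of the range */
	uint64_t len;    /* length in bytes */
	bool     is_p2p; /* supplied by the caller, who knows what is P2P */
};

/*
 * Return true if all ranges could share one contiguous IOVA mapping.
 * merge_boundary models the dma_get_merge_boundary() mask: 0 means the
 * device cannot merge, otherwise it is a mask such as PAGE_SIZE - 1.
 */
static bool ranges_coalescable(const struct io_range *r, size_t n,
			       uint64_t merge_boundary)
{
	assert(r != NULL && n > 0);

	if (n == 1)
		return true;	/* a single range needs no merging */
	if (merge_boundary == 0)
		return false;	/* device cannot merge segments */

	for (size_t i = 0; i < n; i++) {
		/* Assumption: P2P and normal memory are mapped separately. */
		if (r[i].is_p2p != r[0].is_p2p)
			return false;
		/* Every range but the first must start on the boundary. */
		if (i > 0 && (r[i].addr & merge_boundary))
			return false;
		/* Every range but the last must end on the boundary. */
		if (i + 1 < n && ((r[i].addr + r[i].len) & merge_boundary))
			return false;
	}
	return true;
}
```

With a 4k boundary (mask 0xfff), two page-aligned normal-memory ranges
pass this check, while a first range that ends mid-page, or a mix of P2P
and normal ranges, falls back to the map-each-range path.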