Received: by 2002:a89:2c3:0:b0:1ed:23cc:44d1 with SMTP id d3csp1111578lqs; Wed, 6 Mar 2024 06:44:40 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVvVTwVgVSYxmNX9RkQC7mnaItRgifb9u0r4DCmWoT7yiJE9/apBy1BAa3tGx0JEze7+FMPdMEvyLxYmxnoXjmxhpDEtPDPr42+UjKVMw== X-Google-Smtp-Source: AGHT+IEeaDm+95uiiGmrXGvKXQUABeP0CU0C1JroDhwIdve4YQrZ80yKuQLoa3zjZRLimRdUje7s X-Received: by 2002:a50:bacb:0:b0:567:1797:426f with SMTP id x69-20020a50bacb000000b005671797426fmr7875460ede.24.1709736280127; Wed, 06 Mar 2024 06:44:40 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709736280; cv=pass; d=google.com; s=arc-20160816; b=PomyAfsjT/JcMDQGlUQNpNJDDlYABYD3jSipQh1KaRlgLwui2Wt9N3XnW5oeEfrYDa BIDIivb/MKKIFL9yffK4mKrvaAx736F5DBiz0USpV4jBVAPiR4l5TTS5Xe/Az4o+iqsp aG1XktP6Jsrni4N/3T8QR7mkoLS7AR40tRXcLpzMtWx1wk6rPieIVclPbLzcw6QZOZ0P iru1cmcfgfF7b0J9/EArkpnU873eR43y8F5EFnMnL+TfKuhI47MWTaeRl1qPaII99Qt7 OKJoFFgraWSn4vsiFLHVgseUfvKGbwPEuw6U/ws3aBqzHX/O7VarvVLmp0PqCVLJ5nX6 fsNg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:content-disposition:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:references :message-id:subject:cc:to:from:date; bh=ubmPeJg46t3K7leg5ZeQN7O0CINWlIT7YUSXrdtujmc=; fh=R0Sp0xJ1jknWgejWtw+Wg9Lys8PGRPF3ucknzRBTtl0=; b=N/CrYCd81oJ3BnCA0bsOsF4Fd3teOhN+fVW6v1jdlDUk3/71ZSitG6BaJdg5aq1RHC ucIpmhVWtUKmA2OrR3jXCHptjQi7RbyQL5td3fnhVJb/P/rkcfQsHI+K9bYXl8aw88WA kiz8al2PaTsTrwd2ZPhZkgqZKGQpxAkH1UIcBoft1nHvsaAUdku5fUruIYeRdT7wOnhV xn/NQS3ym4wUL0wEci2dDiO+ZHnDy6j6bXsEyKcb32JOwPSHUlbeg336xUoo3SFeNGGP Wn0ZwNNzqUFZHJnBEFFJFUnlFFb+A5MSvBPHU8RegP1ustcUCUosVwVhP1DIIQTX2q0t H+sw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=lst.de); spf=pass (google.com: domain of linux-kernel+bounces-94077-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94077-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id f30-20020a50a6de000000b0056734263fb6si3259518edc.585.2024.03.06.06.44.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Mar 2024 06:44:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-94077-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=lst.de); spf=pass (google.com: domain of linux-kernel+bounces-94077-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-94077-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id D50B61F21504 for ; Wed, 6 Mar 2024 14:44:39 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EBA8A13473B; Wed, 6 Mar 2024 14:44:28 +0000 (UTC) Received: from verein.lst.de (verein.lst.de [213.95.11.211]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7BC6712FB31; Wed, 6 Mar 2024 14:44:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=213.95.11.211 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709736268; cv=none; b=LopTb/fKh34Npc6RH+aVljXKQu9LKnQxIKOKMWoJbAU09CeBdkWJC7Q8EE3Ah8ZNrZ0zg8v7blR49lzR0EeNi9IZz0vIFniPstE1OEOiqcpBaS1oSrTOE1c7DWdjdbES98Ql6S8yqpFs8/ldkd50uJR7LWElMOXdb1KXA5EKAAQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709736268; c=relaxed/simple; bh=Cg+/N+/d48kp+Bii6xwPc6Sv3b/fJs9pKZfyKHqumHo=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ihvgbR0V/c/+G46kcT7Bcac8cuquZlRMadzCs1boqg+FblI1xFMXQnUOPOsjafPpPmu6fpSU/pgk80tmbucADUtbKT9gGPTp5fEwgKEw1CJaIz7PFpE6BXodGmpYwH5/ijMc+EDLE10issr+RjDPsHwhU8cfrbNF+/WouN1+KzE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de; spf=pass smtp.mailfrom=lst.de; arc=none smtp.client-ip=213.95.11.211 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=lst.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=lst.de Received: by verein.lst.de (Postfix, from userid 2407) id 6727E68C4E; Wed, 6 Mar 2024 15:44:17 +0100 (CET) Date: Wed, 6 Mar 2024 15:44:16 +0100 From: Christoph Hellwig To: Leon Romanovsky Cc: Robin Murphy , Christoph Hellwig , Marek Szyprowski , Joerg Roedel , Will Deacon , Jason Gunthorpe , Chaitanya Kulkarni , Jonathan Corbet , Jens Axboe , Keith Busch , Sagi Grimberg , Yishai Hadas , Shameer Kolothum , Kevin Tian , Alex Williamson , =?iso-8859-1?B?Suly9G1l?= Glisse , Andrew Morton , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, Bart Van Assche , Damien Le Moal , Amir Goldstein , "josef@toxicpanda.com" , "Martin K. Petersen" , "daniel@iogearbox.net" , Dan Williams , "jack@suse.com" , Zhu Yanjun Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps Message-ID: <20240306144416.GB19711@lst.de> References: <47afacda-3023-4eb7-b227-5f725c3187c2@arm.com> <20240305122935.GB36868@unreal> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240305122935.GB36868@unreal> User-Agent: Mutt/1.5.17 (2007-11-01) On Tue, Mar 05, 2024 at 02:29:35PM +0200, Leon Romanovsky wrote: > > > These advanced DMA mapping APIs are needed to calculate IOVA size to > > > allocate it as one chunk and some sort of offset calculations to know > > > which part of IOVA to map. > > > > I don't follow this part at all - at *some* point, something must know a > > range of memory addresses involved in a DMA transfer, so that's where it > > should map that range for DMA. > > In all presented cases in this series, the overall DMA size is known in > advance. In RDMA case, it is known when user registers the memory, in > VFIO, when live migration is happening and in NVMe, when BIO is created. > > So once we allocated IOVA, we will need to link ranges, which si the > same as map but without IOVA allocation. But above you say: "These advanced DMA mapping APIs are needed to calculate IOVA size to allocate it as one chunk and some sort of offset calculations to know which part of IOVA to map." this suggests you need helpers to calculate the len and offset. I can't see where that would ever make sense. The total transfer size should just be passed in by the callers and be known, and there should be no offset. > > > Instead of teaching DMA to know these specific datatypes, let's separate > > > existing DMA mapping routine to two steps and give an option to advanced > > > callers (subsystems) perform all calculations internally in advance and > > > map pages later when it is needed. > > > > From a brief look, this is clearly an awkward reinvention of the IOMMU API. > > If IOMMU-aware drivers/subsystems want to explicitly manage IOMMU address > > spaces then they can and should use the IOMMU API. Perhaps there's room for > > some quality-of-life additions to the IOMMU API to help with common usage > > patterns, but the generic DMA mapping API is absolutely not the place for > > it. > > DMA mapping gives nice abstraction from IOMMU, and allows us to have > same flow for IOMMU and non-IOMMU flows without duplicating code, while > you suggest to teach almost every part in the kernel to know about IOMMU. Except that the flows are fundamentally different for the "can coalesce" vs "can't coalesce" case. In the former we have one dma_addr_t range, and in the latter as many as there are input vectors (this is ignoring the weird iommu merging case where we we coalesce some but not all segments, but I'd rather not have that in a new API). So if we want to efficiently be able to handle these cases we need two APIs in the driver and a good framework to switch between them. Robins makes a point here that the iommu API handles the can coalesce case and he has a point as that's exactly how the IOMMU API works. I'd still prefer to wrap it with dma callers to handle things like swiotlb and maybe Xen grant tables and to avoid the type confusion between dma_addr_t and then untyped iova in the iommu layer, but having this layer or not is probably worth a discussion. > > In this series, we changed RDMA, VFIO and NVMe, and in all cases we > removed more code than added. From what I saw, VDPA and virito-blk will > benefit from proposed API too. > > Even in this RFC, where Chaitanya did partial job and didn't convert > whole driver, the gain is pretty obvious: > https://lore.kernel.org/linux-rdma/016fc02cbfa9be3c156a6f74df38def1e09c08f1.1709635535.git.leon@kernel.org/T/#u > I have no idea how that nvme patch is even supposed to work. It removes the PRP path in nvme-pci, which not only is the most common I/O path but actually required for the admin queue as NVMe doesn't support SGLs for the admin queue.