Date:
Thu, 7 Mar 2024 07:01:10 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps
To: Leon Romanovsky, Christoph Hellwig, Robin Murphy, Marek Szyprowski,
 Joerg Roedel, Will Deacon, Jason Gunthorpe, Chaitanya Kulkarni
Cc: Jonathan Corbet, Jens Axboe, Keith Busch, Sagi Grimberg, Yishai Hadas,
 Shameer Kolothum, Kevin Tian, Alex Williamson, Jérôme Glisse, Andrew Morton,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
 iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org,
 linux-mm@kvack.org, Bart Van Assche, Damien Le Moal, Amir Goldstein,
 "josef@toxicpanda.com", "Martin K. Petersen", "daniel@iogearbox.net",
 Dan Williams, "jack@suse.com", Leon Romanovsky, Zhu Yanjun
From: Zhu Yanjun

On 2024/3/5 12:18, Leon Romanovsky wrote:
> This is a complementary part to the proposed LSF/MM topic.
> https://lore.kernel.org/linux-rdma/22df55f8-cf64-4aa8-8c0b-b556c867b926@linux.dev/T/#m85672c860539fdbbc8fe0f5ccabdc05b40269057

I am interested in this topic. I hope I can join the meeting to discuss it.

Zhu Yanjun

>
> This is posted as RFC to get feedback on the proposed split, but RDMA, VFIO and
> DMA patches are ready for review and inclusion, the NVMe patches are still in
> progress as they require agreement on API first.
>
> Thanks
>
> -------------------------------------------------------------------------------
> The DMA mapping operation performs two steps at the same time: allocates
> IOVA space and actually maps DMA pages to that space.
> This one-shot operation works perfectly for non-complex scenarios, where
> callers use that DMA API in the control path when they set up hardware.
>
> However, in more complex scenarios, when DMA mapping is needed in the data
> path and especially when some sort of specific datatype is involved,
> such a one-shot approach has its drawbacks.
>
> That approach pushes developers to introduce new DMA APIs for each specific
> datatype. For example, the existing scatter-gather mapping functions, or
> Chuck's latest RFC series to add biovec-related DMA mapping [1], and
> probably struct folio will need it too.
>
> These advanced DMA mapping APIs are needed to calculate the IOVA size so
> that it can be allocated as one chunk, and to perform some sort of offset
> calculations to know which part of the IOVA to map.
>
> Instead of teaching DMA to know these specific datatypes, let's separate
> the existing DMA mapping routine into two steps and give advanced
> callers (subsystems) the option to perform all calculations internally in
> advance and map pages later when it is needed.
>
> In this series, three users are converted and each such conversion
> presents a different positive gain:
> 1. RDMA simplifies and speeds up its pagefault handling for
>    on-demand-paging (ODP) mode.
> 2. VFIO PCI live migration code saves a huge chunk of memory.
> 3. NVMe PCI avoids intermediate SG table manipulation and operates
>    directly on BIOs.
>
> Thanks
>
> [1] https://lore.kernel.org/all/169772852492.5232.17148564580779995849.stgit@klimt.1015granger.net
>
> Chaitanya Kulkarni (2):
>   block: add dma_link_range() based API
>   nvme-pci: use blk_rq_dma_map() for NVMe SGL
>
> Leon Romanovsky (14):
>   mm/hmm: let users to tag specific PFNs
>   dma-mapping: provide an interface to allocate IOVA
>   dma-mapping: provide callbacks to link/unlink pages to specific IOVA
>   iommu/dma: Provide an interface to allow preallocate IOVA
>   iommu/dma: Prepare map/unmap page functions to receive IOVA
>   iommu/dma: Implement link/unlink page callbacks
>   RDMA/umem: Preallocate and cache IOVA for UMEM ODP
>   RDMA/umem: Store ODP access mask information in PFN
>   RDMA/core: Separate DMA mapping to caching IOVA and page linkage
>   RDMA/umem: Prevent UMEM ODP creation with SWIOTLB
>   vfio/mlx5: Explicitly use number of pages instead of allocated length
>   vfio/mlx5: Rewrite create mkey flow to allow better code reuse
>   vfio/mlx5: Explicitly store page list
>   vfio/mlx5: Convert vfio to use DMA link API
>
>  Documentation/core-api/dma-attributes.rst |   7 +
>  block/blk-merge.c                         | 156 ++++++++++++++
>  drivers/infiniband/core/umem_odp.c        | 219 +++++++------------
>  drivers/infiniband/hw/mlx5/mlx5_ib.h      |   1 +
>  drivers/infiniband/hw/mlx5/odp.c          |  59 +++--
>  drivers/iommu/dma-iommu.c                 | 129 ++++++++---
>  drivers/nvme/host/pci.c                   | 220 +++++--------------
>  drivers/vfio/pci/mlx5/cmd.c               | 252 ++++++++++++----------
>  drivers/vfio/pci/mlx5/cmd.h               |  22 +-
>  drivers/vfio/pci/mlx5/main.c              | 136 +++++-------
>  include/linux/blk-mq.h                    |   9 +
>  include/linux/dma-map-ops.h               |  13 ++
>  include/linux/dma-mapping.h               |  39 ++++
>  include/linux/hmm.h                       |   3 +
>  include/rdma/ib_umem_odp.h                |  22 +-
>  include/rdma/ib_verbs.h                   |  54 +++++
>  kernel/dma/debug.h                        |   2 +
>  kernel/dma/direct.h                       |   7 +-
>  kernel/dma/mapping.c                      |  91 ++++++++
>  mm/hmm.c                                  |  34 +--
>  20 files changed, 870 insertions(+), 605 deletions(-)
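
For readers following the thread, the two-step idea in the quoted cover letter (reserve IOVA space first, link pages to it later) can be illustrated with a small user-space toy in plain C. This is only a sketch under stated assumptions: every name here (toy_iova_range, toy_iova_alloc, toy_link_page, toy_unlink_page, TOY_PAGE_SIZE, TOY_MAX_PAGES) is hypothetical and is not the API proposed in the series, and no IOMMU is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* hypothetical page size for the toy model */
#define TOY_MAX_PAGES 16u     /* hypothetical upper bound on a reservation */

/* Hypothetical stand-in for a preallocated IOVA range; not a kernel type. */
struct toy_iova_range {
	uint64_t base;                 /* first I/O virtual address of the range */
	size_t npages;                 /* number of pages reserved in step 1 */
	uint64_t phys[TOY_MAX_PAGES];  /* linked physical address, 0 = not linked */
};

/* Step 1: reserve address space only. Nothing is mapped yet. */
static int toy_iova_alloc(struct toy_iova_range *r, uint64_t base, size_t npages)
{
	if (npages == 0 || npages > TOY_MAX_PAGES)
		return -1;
	r->base = base;
	r->npages = npages;
	for (size_t i = 0; i < TOY_MAX_PAGES; i++)
		r->phys[i] = 0;
	return 0;
}

/*
 * Step 2: link one physical page into slot idx of the reserved range.
 * On success the resulting I/O virtual address is stored in *iova.
 */
static int toy_link_page(struct toy_iova_range *r, size_t idx,
			 uint64_t phys, uint64_t *iova)
{
	if (idx >= r->npages || phys == 0 || r->phys[idx] != 0)
		return -1;
	r->phys[idx] = phys;
	*iova = r->base + (uint64_t)idx * TOY_PAGE_SIZE;
	return 0;
}

/* Undo step 2 for one slot; the reservation itself stays in place. */
static int toy_unlink_page(struct toy_iova_range *r, size_t idx)
{
	if (idx >= r->npages || r->phys[idx] == 0)
		return -1;
	r->phys[idx] = 0;
	return 0;
}
```

In this toy, a caller would invoke toy_iova_alloc() once in its setup path and then toy_link_page() / toy_unlink_page() per page in its data path, which mirrors the split the cover letter describes: the size and offset calculations stay with the caller, and the mapping layer never needs to know the caller's datatype.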