From: David Stevens
Date: Fri, 9 Jul 2021 16:25:20 +0900
Subject: Re: [PATCH 0/4] Add dynamic iommu backed bounce buffers
To: Robin Murphy
Cc: Joerg Roedel, Will Deacon, Christoph Hellwig, Sergey Senozhatsky,
    iommu@lists.linux-foundation.org, open list, David Stevens
References: <20210707075505.2896824-1-stevensd@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 9, 2021 at 2:14 AM Robin Murphy wrote:
>
> On 2021-07-08 10:29, Joerg Roedel wrote:
> > Adding Robin too.
> >
> > On Wed, Jul 07, 2021 at 04:55:01PM +0900, David Stevens wrote:
> >> Add support for per-domain dynamic pools of iommu bounce buffers to the
> >> dma-iommu API. This allows iommu mappings to be reused while still
> >> maintaining strict iommu protection. Allocating buffers dynamically
> >> instead of using swiotlb carveouts makes per-domain pools more amenable
> >> on systems with large numbers of devices or where devices are unknown.
>
> But isn't that just as true for the currently-supported case? All you
> need is a large enough Thunderbolt enclosure and you could suddenly plug
> in a dozen untrusted GPUs all wanting to map hundreds of megabytes of
> memory. If there's a real concern worth addressing, surely it's worth
> addressing properly for everyone.

Bounce buffers consume memory, so there is always going to be some
limitation on how many devices are supported. This patch series limits
the memory consumption at a given point in time to approximately the
amount needed by active DMA transactions. There's really no way to
improve significantly on that. The 'approximately' qualification could
be removed by adding a shrinker, but that doesn't change things
materially.

This is compared to reusing swiotlb, where the amount of memory consumed
would be the largest amount of active DMA transactions you want bounce
buffers to handle. I see two concrete shortcomings here. First, most of
the time you're not doing heavy IO, especially for consumer workloads,
so buffers sized for peak load sit idle. Second, it raises the problem
of per-device tuning, since you don't want to waste performance by
having too few bounce buffers but you also don't want to waste memory by
preallocating too many bounce buffers. This tuning becomes more
problematic once you start dealing with external devices.

Also, although this doesn't directly address the raised concern, the
bounce buffers are only used for relatively small DMA transactions. So
large allocations like framebuffers won't actually consume extra memory
via bounce buffers.

> >> When enabled, all non-direct streaming mappings below a configurable
> >> size will go through bounce buffers. Note that this means drivers which
> >> don't properly use the DMA API (e.g. i915) cannot use an iommu when this
> >> feature is enabled. However, all drivers which work with swiotlb=force
> >> should work.
> >>
> >> Bounce buffers serve as an optimization in situations where interactions
> >> with the iommu are very costly. For example, virtio-iommu operations in
> >> a guest on a linux host require a vmexit, involvement of the VMM, and a
> >> VFIO syscall. For relatively small DMA operations, memcpy can be
> >> significantly faster.
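
The bounce policy quoted above ("all non-direct streaming mappings below
a configurable size") can be written down as a small predicate. This is
only a rough sketch in Python, not the code from the series: the
MappingRequest type, its field names, and the max_bounce_size parameter
are hypothetical, and reading "below" as a strict less-than is an
assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRequest:
    """Hypothetical description of one DMA mapping request."""
    size: int           # length of the mapping in bytes
    is_streaming: bool  # streaming mapping, as opposed to a coherent allocation
    is_direct: bool     # True if the mapping bypasses IOMMU translation


def should_bounce(req: MappingRequest, max_bounce_size: int) -> bool:
    """Return True if the request would be routed through a bounce buffer.

    Assumption: "below a configurable size" is read as a strict less-than.
    """
    return req.is_streaming and not req.is_direct and req.size < max_bounce_size
```

With a threshold of 64 KiB, for example, a 4 KiB streaming mapping would
bounce, while a coherent allocation or a direct mapping of the same size
would not.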
>
> Yup, back when the bounce-buffering stuff first came up I know
> networking folks were interested in terms of latency for small packets -
> virtualised IOMMUs are indeed another interesting case I hadn't thought
> of. It's definitely been on the radar as another use-case we'd like to
> accommodate with the bounce-buffering scheme. However, that's the thing:
> bouncing is bouncing and however you look at it it still overlaps so
> much with the untrusted case - there's no reason that couldn't use
> pre-mapped bounce buffers too, for instance - that the only necessary
> difference is really the policy decision of when to bounce. iommu-dma
> has already grown complicated enough, and having *three* different ways
> of doing things internally just seems bonkers and untenable. Pre-map the
> bounce buffers? Absolutely. Dynamically grow them on demand? Yes please!
> Do it all as a special thing in its own NIH module and leave the
> existing mess to rot? Sorry, but no.

I do agree that iommu-dma is getting fairly complicated. Since a
virtualized IOMMU uses bounce buffers much more heavily than sub-granule
untrusted DMA, and for the reasons stated earlier in this email, I don't
think pre-allocated bounce buffers are viable for the virtualized IOMMU
case. I can look at migrating the sub-granule untrusted DMA case to
dynamic bounce buffers, if that's an acceptable approach.

-David

> Thanks,
> Robin.
>
> >> As a performance comparison, on a device with an i5-10210U, I ran fio
> >> with a VFIO passthrough NVMe drive with '--direct=1 --rw=read
> >> --ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k, and
> >> 128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
> >> spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
> >> 94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
> >> by >99%, as bounce buffers don't require syncing here in the read case.
> >> Running with multiple jobs doesn't serve as a useful performance
> >> comparison because virtio-iommu and vfio_iommu_type1 both have big
> >> locks that significantly limit multithreaded DMA performance.
> >>
> >> This patch set is based on v5.13-rc7 plus the patches at [1].
> >>
> >> David Stevens (4):
> >>   dma-iommu: add kalloc gfp flag to alloc helper
> >>   dma-iommu: replace device arguments
> >>   dma-iommu: expose a few helper functions to module
> >>   dma-iommu: Add iommu bounce buffers to dma-iommu api
> >>
> >>  drivers/iommu/Kconfig          |  10 +
> >>  drivers/iommu/Makefile         |   1 +
> >>  drivers/iommu/dma-iommu.c      | 119 ++++--
> >>  drivers/iommu/io-buffer-pool.c | 656 +++++++++++++++++++++++++++++++++
> >>  drivers/iommu/io-buffer-pool.h |  91 +++++
> >>  include/linux/dma-iommu.h      |  12 +
> >>  6 files changed, 861 insertions(+), 28 deletions(-)
> >>  create mode 100644 drivers/iommu/io-buffer-pool.c
> >>  create mode 100644 drivers/iommu/io-buffer-pool.h
> >>
> >> --
> >> 2.32.0.93.g670b81a890-goog
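
The memory argument in this reply (a preallocated carveout has to be
sized for the largest load it should handle, while a dynamic pool tracks
roughly what is active) can be illustrated with a toy model. This is a
sketch under stated assumptions, not a measurement: the trace format and
both function names are hypothetical, and the dynamic case is idealized
as if a shrinker returned unused buffers immediately.

```python
from typing import Dict, List


def preallocated_footprint(trace: Dict[str, List[int]]) -> int:
    """Memory held by per-device carveouts, each sized for that device's peak.

    `trace` maps a device name to the bounce-buffer bytes it has in flight
    at each time step (hypothetical format; all lists have equal length).
    """
    return sum(max(samples) for samples in trace.values())


def dynamic_footprint(trace: Dict[str, List[int]]) -> List[int]:
    """Memory held at each time step by an idealized dynamic pool.

    Assumption: unused buffers are released immediately, so the pool holds
    exactly the bytes that are in flight across all devices.
    """
    return [sum(step) for step in zip(*trace.values())]
```

For a made-up trace such as {"nic": [4, 64, 8, 0], "nvme": [0, 16, 256, 0]},
the preallocated figure is constant at the sum of the per-device peaks,
whereas the dynamic figure follows the load and drops to zero when the
devices are idle.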