Date: Tue, 16 May 2023 18:59:30 +0100
From: Catalin Marinas
To: Petr Tesařík
Cc: Christoph Hellwig, "Michael Kelley (LINUX)", Petr Tesarik,
 Jonathan Corbet, Greg Kroah-Hartman, "Rafael J. Wysocki",
 Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
 Daniel Vetter, Marek Szyprowski, Robin Murphy, "Paul E. McKenney",
 Borislav Petkov, Randy Dunlap, Damien Le Moal, Kim Phillips,
 "Steven Rostedt (Google)", Andy Shevchenko, Hans de Goede,
 Jason Gunthorpe, Kees Cook, Thomas Gleixner,
 "open list:DOCUMENTATION", open list, "open list:DRM DRIVERS",
 "open list:DMA MAPPING HELPERS", Roberto Sassu, Kefeng Wang
Subject: Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers
References: <346abecdb13b565820c414ecf3267275577dbbf3.1683623618.git.petr.tesarik.ext@huawei.com>
 <20230516061309.GA7219@lst.de>
 <20230516083942.0303b5fb@meshulam.tesarici.cz>
In-Reply-To: <20230516083942.0303b5fb@meshulam.tesarici.cz>

On Tue, May 16, 2023 at 08:39:42AM +0200, Petr Tesařík wrote:
> On Tue, 16 May 2023 08:13:09 +0200 Christoph Hellwig wrote:
> > On Mon, May 15, 2023 at 07:43:52PM +0000, Michael Kelley (LINUX) wrote:
> > > FWIW, I don't think the approach you have implemented here will be
> > > practical to use for CoCo VMs (SEV, TDX, whatever else). The problem
> > > is that dma_direct_alloc_pages() and dma_direct_free_pages() must
> > > call dma_set_decrypted() and dma_set_encrypted(), respectively. In
> > > CoCo VMs, these calls are expensive because they require a hypercall
> > > to the host, and the operation on the host isn't trivial either. I
> > > haven't measured the overhead, but doing a hypercall on every DMA
> > > map operation and on every unmap operation has long been something
> > > we thought we must avoid. The fixed swiotlb bounce buffer space
> > > solves this problem by doing set_decrypted() in batch at boot time,
> > > and never doing set_encrypted().
> >
> > I also suspect it doesn't really scale too well due to the number of
> > allocations. I suspect a better way to implement things would be to
> > add more large chunks that are used just like the main swiotlb
> > buffers.
> >
> > That is, when we run out of space, try to allocate another chunk of
> > the same size in the background, similar to what we do with the pool
> > in dma-pool.c. This means we'll do a fairly large allocation, so
> > we'll need compaction or even CMA to back it up, but the other big
> > upside is that it also reduces the number of buffers that need to be
> > checked in is_swiotlb_buffer() or on the free / sync side.
>
> I have considered this approach. The two main issues I ran into were:
>
> 1. MAX_ORDER allocations were too small (at least with 4K pages), and
>    even then they would often fail.
>
> 2. Allocating from CMA did work, but only from process context.
>    I made a stab at modifying the CMA allocator to work from interrupt
>    context, but there are non-trivial interactions with the buddy
>    allocator. Making them safe from interrupt context looked like a
>    major task.

Can you kick off a worker thread when the swiotlb allocation gets past
some reserve limit? There is still a risk of failing to bounce until
the swiotlb buffer has been extended.
> I also had some fears about the length of the dynamic buffer list.
> I observed the maximum length for block devices, where it roughly
> followed the queue depth. Walking a few hundred buffers was still
> fast enough, but I admit the list length may become an issue with
> high-end NVMe and I/O-intensive applications.

You could replace the list with an rbtree: an O(log n) look-up instead
of O(n) could be faster when many bounce buffers are active. A sketch
of what I mean is below.
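Untested sketch using the kernel's <linux/rbtree.h> API, keyed by the
bounce buffer's physical address. struct swiotlb_dyn_slot and
swiotlb_dyn_find() are made-up names for whatever structure the series
uses to track the dynamic buffers:

	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct swiotlb_dyn_slot {
		struct rb_node node;
		phys_addr_t start;	/* first byte of the bounce buffer */
		size_t size;
	};

	static struct rb_root slot_tree = RB_ROOT;

	/* O(log n) replacement for walking the dynamic buffer list. */
	static struct swiotlb_dyn_slot *swiotlb_dyn_find(phys_addr_t paddr)
	{
		struct rb_node *n = slot_tree.rb_node;

		while (n) {
			struct swiotlb_dyn_slot *s =
				rb_entry(n, struct swiotlb_dyn_slot, node);

			if (paddr < s->start)
				n = n->rb_left;
			else if (paddr >= s->start + s->size)
				n = n->rb_right;
			else
				return s;	/* inside this buffer */
		}
		return NULL;	/* not a dynamic bounce buffer */
	}

Insertion would need the usual rb_link_node()/rb_insert_color() pair
under the same lock that currently protects the list.

-- 
Catalin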