From: Vishal Annapurve
Date: Sat, 24 Feb 2024 22:37:19 +0530
Subject: Re: [RFC V1 1/5] swiotlb: Support allocating DMA memory from SWIOTLB
To: Michael Kelley
Cc: Alexander Graf, Kirill A. Shutemov, x86@kernel.org,
 linux-kernel@vger.kernel.org, pbonzini@redhat.com, rientjes@google.com,
 seanjc@google.com, erdemaktas@google.com, ackerleytng@google.com,
 jxgao@google.com, sagis@google.com, oupton@google.com, peterx@redhat.com,
 vkuznets@redhat.com, dmatlack@google.com, pgonda@google.com,
 michael.roth@amd.com, thomas.lendacky@amd.com,
 dave.hansen@linux.intel.com, linux-coco@lists.linux.dev,
 chao.p.peng@linux.intel.com, isaku.yamahata@gmail.com,
 andrew.jones@linux.dev, corbet@lwn.net, hch@lst.de,
 m.szyprowski@samsung.com, rostedt@goodmis.org, iommu@lists.linux.dev
References: <20240112055251.36101-1-vannapurve@google.com>
 <20240112055251.36101-2-vannapurve@google.com>
 <8a6dabdf-dc11-4989-b6b4-b49871ff9ca6@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 16, 2024 at 1:56 AM Michael Kelley wrote:
>
> From: Alexander Graf Sent: Thursday, February 15, 2024 1:44 AM
> >
> > On 15.02.24 04:33, Vishal Annapurve wrote:
> > > On Wed, Feb 14, 2024 at 8:20 PM Kirill A. Shutemov wrote:
> > >> On Fri, Jan 12, 2024 at 05:52:47AM +0000, Vishal Annapurve wrote:
> > >>> Modify SWIOTLB framework to allocate DMA memory always from SWIOTLB.
> > >>>
> > >>> CVMs use SWIOTLB buffers for bouncing memory when using dma_map_* APIs
> > >>> to setup memory for IO operations. SWIOTLB buffers are marked as shared
> > >>> once during early boot.
> > >>>
> > >>> Buffers allocated using dma_alloc_* APIs are allocated from kernel memory
> > >>> and then converted to shared during each API invocation. This patch ensures
> > >>> that such buffers are also allocated from already shared SWIOTLB
> > >>> regions. This allows enforcing alignment requirements on regions marked
> > >>> as shared.
> > >> But does it work in practice?
> > >>
> > >> Some devices (like GPUs) require a lot of DMA memory. So with this approach
> > >> we would need to have huge SWIOTLB buffer that is unused in most VMs.
> > >>
> > > Current implementation limits the size of statically allocated SWIOTLB
> > > memory pool to 1G. I was proposing to enable dynamic SWIOTLB for CVMs
> > > in addition to aligning the memory allocations to hugepage sizes, so
> > > that the SWIOTLB pool can be scaled up on demand.
> > >
>
> Vishal --
>
> When the dynamic swiotlb mechanism tries to grow swiotlb space
> by adding another pool, it gets the additional memory as a single
> physically contiguous chunk using alloc_pages(). It starts by trying
> to allocate a chunk the size of the original swiotlb size, and if that
> fails, halves the size until it gets a size where the allocation succeeds.
> The minimum size is 1 Mbyte, and if that fails, the "grow" fails.
>

Thanks for pointing this out.

> So it seems like dynamic swiotlb is subject to the almost the same
> memory fragmentation limitations as trying to allocate huge pages.
> swiotlb needs a minimum of 1 Mbyte contiguous in order to grow,
> while huge pages need 2 Mbytes, but either is likely to be
> problematic in a VM that has been running a while. With that
> in mind, I'm not clear on the benefit of enabling dynamic swiotlb.
> It seems like it just moves around the problem of needing high order
> contiguous memory allocations. Or am I missing something?
>

Currently the SWIOTLB pool is limited to 1GB in size. Kirill has
pointed out that devices like GPUs could need a significant amount of
memory to be allocated from the SWIOTLB pool. Without dynamic SWIOTLB,
such devices run the risk of memory exhaustion without any hope of
recovery. In addition, I am proposing to have the dma_alloc_* APIs use
the SWIOTLB area as well, adding to the memory pressure.

If there were a way to calculate the maximum amount of memory needed
for all DMA allocations for all possible devices used by CoCo VMs, then
one could use that number to preallocate the SWIOTLB pool. I am arguing
that calculating this upper bound would be difficult, and that instead
of trying to calculate it, allowing SWIOTLB to scale dynamically would
be better.

So if the above argument for enabling dynamic SWIOTLB makes sense, then
it should be relatively easy to add hugepage alignment restrictions for
SWIOTLB pool increments (in line with the fact that 2MB and 1MB
allocations are nearly equally prone to failure due to memory
fragmentation).

> Michael
>
> > > The issue with aligning the pool areas to hugepages is that hugepage
> > > allocation at runtime is not guaranteed. Guaranteeing the hugepage
> > > allocation might need calculating the upper bound in advance, which
> > > defeats the purpose of enabling dynamic SWIOTLB. I am open to
> > > suggestions here.
> > >
>
> > You could allocate a max bound at boot using CMA and then only fill into
> > the CMA area when SWIOTLB size requirements increase? The CMA region
> > will allow movable allocations as long as you don't require the CMA space.
> >
> >
> > Alex
>
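
For reference, the grow behaviour Michael describes above (start at the original pool size, halve on failure, give up below 1 Mbyte) can be modelled in a few lines. This is only a userspace sketch in Python, not the kernel code: `grow_pool`, `try_alloc`, `contiguous_limit` and the 64 MiB starting size are hypothetical names and values chosen for illustration. Only the halve-until-success loop and the 1 MiB floor come from the description in this thread; the 2 MiB floor stands in for the proposed hugepage alignment.

```python
from typing import Callable, Optional

MIB = 1 << 20


def grow_pool(
    initial_size: int,
    try_alloc: Callable[[int], bool],
    min_size: int = 1 * MIB,
) -> Optional[int]:
    """Return the size of the chunk obtained, or None if the grow fails.

    Starts at the original pool size and halves on each failure,
    giving up once the size would drop below min_size.
    try_alloc is a stand-in for a physically contiguous allocation attempt.
    """
    size = initial_size
    while size >= min_size:
        if try_alloc(size):
            return size
        size //= 2
    return None


def contiguous_limit(largest_free: int) -> Callable[[int], bool]:
    """Fake allocator: succeeds only if the request fits the largest free run."""
    return lambda size: size <= largest_free


# Hypothetical fragmented guest whose largest free contiguous run is 1.5 MiB.
fragmented = contiguous_limit(3 * MIB // 2)

# 1 MiB floor, as described for dynamic swiotlb in this thread.
size_1m_floor = grow_pool(64 * MIB, fragmented)

# 2 MiB floor, modelling hugepage-aligned pool increments.
size_2m_floor = grow_pool(64 * MIB, fragmented, min_size=2 * MIB)
```

In this model the two floors behave differently only when the largest free contiguous run is at least 1 MiB but smaller than 2 MiB; outside that window both either succeed or fail together. How often a long-running VM actually lands in that window is not something the sketch can answer.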