Received: by 2002:ab2:3141:0:b0:1ed:23cc:44d1 with SMTP id i1csp442604lqg; Fri, 1 Mar 2024 09:42:43 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVlg1+Fv+x/GYMz2cpupbV6fFHH3jjRn42D39Awok5m3ndHazacvc9NsKEUbXg0Nvw9I/nLzxlhxhqE2sQ3t//H4Pjov6OPbQFA3GF5EQ== X-Google-Smtp-Source: AGHT+IEGccFImWB+PK4Rh1ocUp2a939y9/g58C9gJHBGEjvOEQCDY/CU7AbZbs6wBUrRtu7qrfX1 X-Received: by 2002:a17:90a:d70e:b0:299:5913:db1b with SMTP id y14-20020a17090ad70e00b002995913db1bmr2325971pju.14.1709314962665; Fri, 01 Mar 2024 09:42:42 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709314962; cv=pass; d=google.com; s=arc-20160816; b=DnhbhsWvj6sfVN5sYlI3J7juH4NdY7ekr2m8Qdc1MfwyjiSwL/xXNqxBULIChbPitr kMTW4OrR4mkVNHCsoiRnNxyaHsvvVOLXrUD4U7yEf4ePsr2i23JZFH/UBJuJBeqKfjyP R2blcDT4eh9cO1wC+7d7aIkM1vsMmcvbcZ0X3ah1d7/iFGeGK5yMvP7p3JRJQ5DoAyQ0 0Qbd1nm64xtjOIhSyVtK5/kr07Oe0EO7cjVo/DjLL8W2n3ziPKFaSqcZzW4L4J4vhcoP iBfdhEOxTePm/Blncig1eKhC1BpMM5mBMC5CFb3l9T6SNAIiwMF12KpsOsyiFTLRym+c HFMA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=y5F9nq6Ty0/dF9UJva7lXz28jL0DsAwabnPmvR0UHIE=; fh=wVOxbXQt7McHwT1Ph3SewFU5h4SGFK1x/O6dX82lPp0=; b=zlmuiW5MMDK2ckI/j0Ia0MEGJSTPjwTCb3bzHU8evPvka18wd2ROn8EvF5yQmQvkop 6AzYgLgjLM/lwL72i9+GcjO9LruMKVR45Y4oMMu0ZrI29LzyagBzPjFcVnW4S5dlicqU 71D81leO+z8/zNmsCSa6Dbsgh2PUrqTBw6OfYxJ7x4z5gzbu5R3RNrf6aRdcZdvbOupt AZ0H6EM4wON2oCa+Ro9Ci1/GogTf+ebG9DffqvfzQhQb0SfgjoFUwAurg/k+JkUCQJZ/ KFJj8jdS0jKy3bpOEm7cxF7fVSV7mV8dUOY9epLlbazOIuisELJmgsGtHFb/JoVeHvxR YsTw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@tesarici.cz header.s=mail header.b="q8/zY7Ut"; arc=pass (i=1 spf=pass spfdomain=tesarici.cz dkim=pass dkdomain=tesarici.cz dmarc=pass fromdomain=tesarici.cz); spf=pass (google.com: domain of linux-kernel+bounces-88839-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-88839-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=tesarici.cz Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id w2-20020a634902000000b005d64d951c89si3895464pga.143.2024.03.01.09.42.42 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Mar 2024 09:42:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-88839-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@tesarici.cz header.s=mail header.b="q8/zY7Ut"; arc=pass (i=1 spf=pass spfdomain=tesarici.cz dkim=pass dkdomain=tesarici.cz dmarc=pass fromdomain=tesarici.cz); spf=pass (google.com: domain of linux-kernel+bounces-88839-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-88839-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=tesarici.cz Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id A748028F941 for ; Fri, 1 Mar 2024 17:34:50 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9AC171BF2A; Fri, 1 Mar 2024 17:33:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=tesarici.cz header.i=@tesarici.cz header.b="q8/zY7Ut" Received: from bee.tesarici.cz (unknown [77.93.223.253]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 497E0F9FF for ; Fri, 1 Mar 2024 17:33:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=77.93.223.253 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709314410; cv=none; b=QbvaG/9+8tCYCY8C1uFXTf45gBnFqXS7OTJRjfbKmgJZoIfTYdIwMTvFj1mb/DRLBUdi6HR9rHewhF+gjsAbO/DJuNxaGx/6DKevZeToEbqGvXv8SO0wlUVJUK+nBryY4akjO/Z0k1/p1iKqvERwVaPR1jKRyOsi1oxQ8pBvvsY= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709314410; c=relaxed/simple; bh=kqFNymNhulpq3C9J15L9aRCKq+ZlvGBWEINffETSl9Y=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=bEdGVXq5X6wTRdRfaRPRBAvFbzn7/JS0Kei+JCh/DAVNEL4SLX8jGlTV7L21miQs/IqgnZiTnHuekhzkvmh4Jrhaj1QzX1B8kHDTdwBfpl2Q866WZ6wAkzc7A477P2GbkYoLdfRsS+n7MfQAR3t0gW5On+5Z46eKY6z3VCBim8c= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tesarici.cz; spf=pass smtp.mailfrom=tesarici.cz; dkim=pass (2048-bit key) header.d=tesarici.cz header.i=@tesarici.cz header.b=q8/zY7Ut; arc=none smtp.client-ip=77.93.223.253 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=tesarici.cz Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tesarici.cz Received: from meshulam.tesarici.cz (dynamic-2a00-1028-83b8-1e7a-4427-cc85-6706-c595.ipv6.o2.cz [IPv6:2a00:1028:83b8:1e7a:4427:cc85:6706:c595]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bee.tesarici.cz (Postfix) with ESMTPSA id C6EDC1C163E; Fri, 1 Mar 2024 18:33:25 +0100 (CET) Authentication-Results: mail.tesarici.cz; dmarc=fail (p=quarantine dis=none) header.from=tesarici.cz DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tesarici.cz; s=mail; t=1709314406; bh=y5F9nq6Ty0/dF9UJva7lXz28jL0DsAwabnPmvR0UHIE=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=q8/zY7UtxCWkrAUnWSom6lRc1oWwIcjPkd1EABwXDjtLeNLlAm/gioZB9DVWI32Q4 cqFsBTSoCbjJ5rAO7SE2TdjtDKLcJwFX7LQMFzMRm7/1ya2TeqZT11eqJcByM5werx sipjchVa0oxNkwVdcEgIrHJ3dAa5ndy0VSYqptc4a+C6IbL6ZEDO7ZyL+IzGDACu5L R2MafJFrtLRiCWSHezTBXeh8DNklH+cT6GrRYL5r+bL/HO5/C6lE7L0PliCOzE1wtk dAuI5QfzE2I4F+kTmEf2TZX8iGr4QYKBJ5SXpdN75JJ0t6JQv1uC3HxGUknYq3vP+e BPjRubOeuEDpg== Date: Fri, 1 Mar 2024 18:33:24 +0100 From: Petr =?UTF-8?B?VGVzYcWZw61r?= To: Robin Murphy Cc: Christoph Hellwig , Michael Kelley , Will Deacon , "linux-kernel@vger.kernel.org" , Petr Tesarik , "kernel-team@android.com" , "iommu@lists.linux.dev" , Marek Szyprowski , Dexuan Cui , Nicolin Chen Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE Message-ID: <20240301183324.11e76df2@meshulam.tesarici.cz> In-Reply-To: <4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com> References: <20240228133930.15400-1-will@kernel.org> <20240228133930.15400-7-will@kernel.org> <20240229133346.GA7177@lst.de> <20240229154756.GA10137@lst.de> <20240301163927.18358ee2@meshulam.tesarici.cz> <4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com> X-Mailer: Claws Mail 4.2.0 (GTK 3.24.39; x86_64-suse-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Fri, 1 Mar 2024 16:38:33 +0000 Robin Murphy wrote: > On 2024-03-01 3:39 pm, Petr Tesa=C5=99=C3=ADk wrote: > > On Thu, 29 Feb 2024 16:47:56 +0100 > > Christoph Hellwig wrote: > > =20 > >> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote: =20 > >>> Any thoughts on how that historical behavior should apply if > >>> the DMA min_align_mask is non-zero, or the alloc_align_mask > >>> parameter to swiotbl_tbl_map_single() is non-zero? As currently > >>> used, alloc_align_mask is page aligned if the IOMMU granule is =20 > >>>> =3D PAGE_SIZE. But a non-zero min_align_mask could mandate =20 > >>> returning a buffer that is not page aligned. Perhaps do the > >>> historical behavior only if alloc_align_mask and min_align_mask > >>> are both zero? =20 > >> > >> I think the driver setting min_align_mask is a clear indicator > >> that the driver requested a specific alignment and the defaults > >> don't apply. For swiotbl_tbl_map_single as used by dma-iommu > >> I'd have to tak a closer look at how it is used. =20 > >=20 > > I'm not sure it helps in this discussion, but let me dive into a bit > > of ancient history to understand how we ended up here. > >=20 > > IIRC this behaviour was originally motivated by limitations of PC AT > > hardware. Intel 8237 is a 16-bit DMA controller. To make it somehow > > usable with addresses up to 16MB (yeah, the infamous DMA zone), IBM > > added a page register, but it was on a separate chip and it did not > > increment when the 8237 address rolled over back to zero. Effectively, > > the page register selected a 64K-aligned window of 64K buffers. > > Consequently, DMA buffers could not cross a 64K physical boundary. > >=20 > > Thanks to how the buddy allocator works, the 64K-boundary constraint > > was satisfied by allocation size, and drivers took advantage of it when > > allocating device buffers. IMO software bounce buffers simply followed > > the same logic that worked for buffers allocated by the buddy allocator. > >=20 > > OTOH min_align_mask was motivated by NVME which prescribes the value of > > a certain number of low bits in the DMA address (for simplicity assumed > > to be identical with the same bits in the physical address). > >=20 > > The only pre-existing user of alloc_align_mask is x86 IOMMU code, and > > IIUC it is used to guarantee that unaligned transactions do not share > > the IOMMU granule with another device. This whole thing is weird, > > because swiotlb_tbl_map_single() is called like this: > >=20 > > aligned_size =3D iova_align(iovad, size); > > phys =3D swiotlb_tbl_map_single(dev, phys, size, align= ed_size, > > iova_mask(iovad), dir, a= ttrs); > >=20 > > Here: > >=20 > > * alloc_size =3D iova_align(iovad, size) > > * alloc_align_mask =3D iova_mask(iovad) > >=20 > > Now, iova_align() rounds up its argument to a multiple of iova granule > > and iova_mask() is simply "granule - 1". This works, because granule > > size must be a power of 2, and I assume it must also be >=3D PAGE_SIZE.= =20 >=20 > Not quite, the granule must *not* be larger than PAGE_SIZE (but can be=20 > smaller). OK... I must rethink what is achieved with the alignment then. > > In that case, the alloc_align_mask argument is not even needed if you > > adjust the code to match documentation---the resulting buffer will be > > aligned to a granule boundary by virtue of having a size that is a > > multiple of the granule size. =20 >=20 > I think we've conflated two different notions of "allocation" here. The=20 > page-alignment which Christoph quoted applies for dma_alloc_coherent(),=20 > which nowadays *should* only be relevant to SWIOTLB code in the=20 > swiotlb_alloc() path for stealing coherent pages out of restricted DMA=20 > pools. Historically there was never any official alignment requirement=20 > for the DMA addresses returned by dma_map_{page,sg}(), until=20 > min_align_mask was introduced. >=20 > > To sum it up: > >=20 > > 1. min_align_mask is by far the most important constraint. Devices will > > simply stop working if it is not met. > > 2. Alignment to the smallest PAGE_SIZE order which is greater than or > > equal to the requested size has been documented, and some drivers > > may rely on it. =20 >=20 > Strictly it stopped being necessary since fafadcd16595 when the=20 > historical swiotlb_alloc() was removed, but 602d9858f07c implies that=20 > indeed the assumption of it for streaming mappings had already set in.=20 > Of course NVMe is now using min_align_mask, but I'm not sure if it's=20 > worth taking the risk to find out who else should also be. >=20 > > 3. alloc_align_mask is a misguided fix for a bug in the above. =20 >=20 > No, alloc_align_mask is about describing the explicit requirement rather= =20 > than relying on any implicit behaviour, and thus being able to do the=20 > optimal thing for, say, a 9KB mapping given a 4KB IOVA granule and 64KB=20 > PAGE_SIZE. It's getting confusing. Can we stay with the IOMMU use case for a moment? Since you don't use IO_TLB_SIZE or IO_TLB_SHIFT, I assume the code should work with any SWIOTLB slot size. For granule sizes up to SWIOTLB slot size, you always get a buffer that is not shared with any other granule. For granule sizes between SWIOTLB slot size and page size, you want to get as many slots as needed to cover the whole granule. Indeed, that would not be achieved with size alone. What if we change the size-based alignment constraint to: Aligned to the smallest IO_TLB_SIZE order which is greater than or equal to the requested size. AFAICS this is stricter, i.e. it covers the already documented PAGE_SIZE alignment, but it would also cover IOMMU needs. I have the feeling that the slot search has too many parameters. This alloc_align_mask has only one in-tree user, so it's a logical candidate for reduction. Petr T >=20 > Thanks, > Robin. >=20 > >=20 > > Correct me if anything of the above is wrong. > >=20 > > HTH > > Petr T =20 >=20