Date: Fri, 1 Mar 2024 16:39:27 +0100
From: Petr Tesařík
To: Christoph Hellwig
Cc: Michael Kelley, Will Deacon, "linux-kernel@vger.kernel.org", Petr Tesarik, "kernel-team@android.com", "iommu@lists.linux.dev", Marek Szyprowski, Robin Murphy, Dexuan Cui, Nicolin Chen
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for allocations >= PAGE_SIZE
Message-ID: <20240301163927.18358ee2@meshulam.tesarici.cz>
In-Reply-To: <20240229154756.GA10137@lst.de>
References: <20240228133930.15400-1-will@kernel.org> <20240228133930.15400-7-will@kernel.org> <20240229133346.GA7177@lst.de> <20240229154756.GA10137@lst.de>

On Thu, 29 Feb 2024 16:47:56 +0100
Christoph Hellwig wrote:

> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:
> > Any thoughts on how that historical behavior should apply if
> > the DMA min_align_mask is non-zero, or the alloc_align_mask
> > parameter to swiotlb_tbl_map_single() is non-zero? As currently
> > used, alloc_align_mask is page aligned if the IOMMU granule is
> > >= PAGE_SIZE. But a non-zero min_align_mask could mandate
> > returning a buffer that is not page aligned. Perhaps do the
> > historical behavior only if alloc_align_mask and min_align_mask
> > are both zero?
>
> I think the driver setting min_align_mask is a clear indicator
> that the driver requested a specific alignment and the defaults
> don't apply. For swiotlb_tbl_map_single as used by dma-iommu
> I'd have to take a closer look at how it is used.

I'm not sure it helps in this discussion, but let me dive into a bit of
ancient history to understand how we ended up here.

IIRC this behaviour was originally motivated by limitations of PC AT
hardware. The Intel 8237 DMA controller can only generate 16-bit
addresses. To make it somehow usable with addresses up to 16MB (yeah,
the infamous DMA zone), IBM added a page register, but it was on a
separate chip and it did not increment when the 8237 address rolled
over back to zero. Effectively, the page register selected a
64K-aligned window of 64K bytes. Consequently, DMA buffers could not
cross a 64K physical boundary.

Thanks to how the buddy allocator works, the 64K-boundary constraint
was satisfied by allocation size, and drivers took advantage of it when
allocating device buffers. IMO software bounce buffers simply followed
the same logic that worked for buffers allocated by the buddy
allocator.

OTOH min_align_mask was motivated by NVMe, which prescribes the value
of a certain number of low bits in the DMA address (for simplicity
assumed to be identical with the same bits in the physical address).

The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
IIUC it is used to guarantee that unaligned transactions do not share
the IOMMU granule with another device.
This whole thing is weird, because swiotlb_tbl_map_single() is called
like this:

	aligned_size = iova_align(iovad, size);
	phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
				      iova_mask(iovad), dir, attrs);

Here:

* alloc_size = iova_align(iovad, size)
* alloc_align_mask = iova_mask(iovad)

Now, iova_align() rounds up its argument to a multiple of iova granule
and iova_mask() is simply "granule - 1". This works, because granule
size must be a power of 2, and I assume it must also be >= PAGE_SIZE.

In that case, the alloc_align_mask argument is not even needed if you
adjust the code to match documentation: the resulting buffer will be
aligned to a granule boundary by virtue of having a size that is a
multiple of the granule size.

To sum it up:

1. min_align_mask is by far the most important constraint. Devices will
   simply stop working if it is not met.
2. Alignment to the smallest PAGE_SIZE order which is greater than or
   equal to the requested size has been documented, and some drivers
   may rely on it.
3. alloc_align_mask is a misguided fix for a bug in the above.

Correct me if anything of the above is wrong.

HTH
Petr T
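
For illustration only, the granule argument above can be checked with a
small standalone C sketch. This is not kernel code: the sketch_*
helpers and SKETCH_PAGE_SIZE are hypothetical names that merely mimic
the behaviour described for iova_align() and iova_mask(), and the
sketch assumes the documented rule that a buffer is aligned to the
smallest power-of-two multiple of the page size that is >= its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch; the real PAGE_SIZE depends on the arch. */
#define SKETCH_PAGE_SIZE 4096u

/* Mimics the description of iova_mask(): simply "granule - 1". */
static uint64_t sketch_iova_mask(uint64_t granule)
{
	/* The granule size must be a non-zero power of 2. */
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	return granule - 1;
}

/* Mimics the description of iova_align(): round up to a granule multiple. */
static uint64_t sketch_iova_align(uint64_t granule, uint64_t size)
{
	uint64_t mask = sketch_iova_mask(granule);

	return (size + mask) & ~mask;
}

/* Smallest power-of-two multiple of the page size that is >= size. */
static uint64_t sketch_natural_alignment(uint64_t size)
{
	uint64_t align = SKETCH_PAGE_SIZE;

	while (align < size)
		align <<= 1;
	return align;
}

/*
 * True if a buffer of the granule-aligned size, placed at any address
 * that honours its natural alignment, is also granule aligned, i.e. if
 * alloc_align_mask adds no extra constraint. Expects size > 0.
 */
static bool sketch_size_implies_granule_alignment(uint64_t granule,
						  uint64_t size)
{
	uint64_t alloc_size = sketch_iova_align(granule, size);
	uint64_t align = sketch_natural_alignment(alloc_size);

	/* Both are powers of 2, so this tests "align is a granule multiple". */
	return (align & sketch_iova_mask(granule)) == 0;
}
```

Calling sketch_size_implies_granule_alignment() with, say, a 64K
granule and a 100-byte request rounds the size up to 64K, whose natural
alignment is 64K as well, so the granule mask is already satisfied.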