Received: by 2002:ab2:6816:0:b0:1f9:5764:f03e with SMTP id t22csp1008896lqo; Fri, 17 May 2024 08:08:29 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVUf1L0p8VfJBTNygSG+TMMcBUfpJ7wjq6aaPfg9SRAcgkzvsGkDj0xfc96hWm+SecDpDLgojyFQKw/KV8Vzz85Y7lv8JlVvbMJfZnlrA== X-Google-Smtp-Source: AGHT+IHvkjWWvqW+IYb3eDGFfpFaPUw3gsmDWqTcZFtHIRnK390LWpLB0iw7p/7Z0DcKOEpknKRB X-Received: by 2002:a17:903:289:b0:1e6:114c:2e54 with SMTP id d9443c01a7336-1ef44050ea8mr307132495ad.69.1715958508926; Fri, 17 May 2024 08:08:28 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715958508; cv=pass; d=google.com; s=arc-20160816; b=l2o3wG7Hh/ZL43kOxkUJTRU95M20Zr/tsF/s1WdiFf8oA84Xf1ab4L2A5EdSHaCI/I lGB7/HvDQY0du+pOgRPqmq4g9+oNNzYyP0ulrG9s509kxR/jsffUvyyyDxkFxiUrs1yJ UXJeqNxFRkiRfg3VnR6QFgyGMtxZcVBssrnRyowzHvnY9BDFHi1bM4kAcIZeD9x/5qv9 TvNUvgDepOYC3UEWbPILWi+LKB8WsOctabjJF6pJJwCQkxX7/2Gc63KsXhQAvkOBjiUF NIqGAwet9v9ZfLYTNgfEelXuIlaIpWoVDiQmF9/8PAzhgb1ODk3fDfdALpsGwUVAnaaN V3Hw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:content-language:from :references:cc:to:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id; bh=YuH6JVU0A6J9d7Tk6R9DmaLiAUO7HcyAtgabV/IAkww=; fh=YJcd6zXQqfX5jeH8Kq9dATK6dVb1QN0BcDdPA/JYSzM=; b=0PygJVxKgZlAntMdOU8FqrfsRlQyB3sIfV6jMNPrHK2VJN3x7NHtHL12MG/qDeaRwx E+QgrRqTv5HEM7952kzYlVrlogX77hiyz7HDzquqxBBiN3ryyWeii4iJ67JA6ev5ibPi 4g2vixxTXMLMWX3doZhGOfWBmDk+YQb2NwgRewsm9zZGP5KTonud9CGeIbyEXi6PbEW0 lN9V0zYth1Nq/DmH0UyxqsQOZD0P6Vaa14zOo7ceJ3KiIj4ZWKysME3jpse9oW4jGE1Q bZ+KokRyWjClG+KOLYd5PeM2ZKEo1PQDJi/9kIYg5+Cxnr3n9D+0gjy0rhHvZTL2AB2K eWVA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-182267-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182267-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id d9443c01a7336-1ef0b9cfb3asi187788745ad.129.2024.05.17.08.08.28 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 May 2024 08:08:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-182267-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-182267-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182267-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 4B7DE288011 for ; Fri, 17 May 2024 15:04:22 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 53BE66A8C1; Fri, 17 May 2024 15:04:10 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id F1EC464CE1; Fri, 17 May 2024 15:04:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715958249; cv=none; b=J5LKj+d3FHBQGUvAdF4/LjinAJUzHWKp4vh2BAw7MvcQwaDi23COzfofEPKKWFdK8nCQ0eSUOiY9sS4Vg1QlhFFGUm85G3S4LrQddDyvT6vvSYujv1tbelkr3bCJG0ng7YFG3WBxdkBOgLtxVE0qwJpfvEY6YnsA5YpdpW6tu9k= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715958249; c=relaxed/simple; bh=rmCP6QYgkI9Sl3Ub1SJ2idoZZMhLo/H/14aEl6gI9G4=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=TTJZ1hv1o4fOvSqIDRVSO7ZGsOlk1WH+TDYX/sKSSr87SuE3Wp+rxKZINuKQbWuNfcLAUmKWybO3ys4WkSBF1Zznig4vhazkwEBNcCiQMrWFkZm7xR+DYC4G480WIvaveXZoHDBuFQ0+rGWX+JsSZbDklu7cKpsdYDBwpc8iw70= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E2BE01424; Fri, 17 May 2024 08:04:27 -0700 (PDT) Received: from [10.1.196.40] (e121345-lin.cambridge.arm.com [10.1.196.40]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 85E6D3F762; Fri, 17 May 2024 08:03:59 -0700 (PDT) Message-ID: <981c85f3-6d43-4c2b-a440-88bf81a18e55@arm.com> Date: Fri, 17 May 2024 16:03:57 +0100 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 5/7] iommu/dma: Make limit checks self-contained To: Jon Hunter , Joerg Roedel , Christoph Hellwig Cc: Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Huacai Chen , WANG Xuerui , Thomas Bogendoerfer , Paul Walmsley , Palmer Dabbelt , Albert Ou , Lorenzo Pieralisi , Hanjun Guo , Sudeep Holla , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Suravee Suthikulpanit , David Woodhouse , Lu Baolu , Niklas Schnelle , Matthew Rosato , Gerald Schaefer , Jean-Philippe Brucker , Rob Herring , Frank Rowand , Marek Szyprowski , Jason Gunthorpe , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org, iommu@lists.linux.dev, devicetree@vger.kernel.org, Jason Gunthorpe , "linux-tegra@vger.kernel.org" References: <243d441d-dda8-442a-a495-83bf9725a14c@nvidia.com> <48c39306-c226-4e7f-a013-d679ca80157e@arm.com> <46fc1b7f-7d10-4233-b089-aa173ad3bbeb@nvidia.com> From: Robin Murphy Content-Language: en-GB In-Reply-To: <46fc1b7f-7d10-4233-b089-aa173ad3bbeb@nvidia.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 17/05/2024 3:21 pm, Jon Hunter wrote: > > On 15/05/2024 15:59, Robin Murphy wrote: >> Hi Jon, >> >> On 2024-05-14 2:27 pm, Jon Hunter wrote: >>> Hi Robin, >>> >>> On 19/04/2024 17:54, Robin Murphy wrote: >>>> It's now easy to retrieve the device's DMA limits if we want to check >>>> them against the domain aperture, so do that ourselves instead of >>>> relying on them being passed through the callchain. >>>> >>>> Reviewed-by: Jason Gunthorpe >>>> Tested-by: Hanjun Guo >>>> Signed-off-by: Robin Murphy >>>> --- >>>>   drivers/iommu/dma-iommu.c | 21 +++++++++------------ >>>>   1 file changed, 9 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c >>>> index a3039005b696..f542eabaefa4 100644 >>>> --- a/drivers/iommu/dma-iommu.c >>>> +++ b/drivers/iommu/dma-iommu.c >>>> @@ -660,19 +660,16 @@ static void iommu_dma_init_options(struct >>>> iommu_dma_options *options, >>>>   /** >>>>    * iommu_dma_init_domain - Initialise a DMA mapping domain >>>>    * @domain: IOMMU domain previously prepared by >>>> iommu_get_dma_cookie() >>>> - * @base: IOVA at which the mappable address space starts >>>> - * @limit: Last address of the IOVA space >>>>    * @dev: Device the domain is being initialised for >>>>    * >>>> - * @base and @limit + 1 should be exact multiples of IOMMU page >>>> granularity to >>>> - * avoid rounding surprises. If necessary, we reserve the page at >>>> address 0 >>>> + * If the geometry and dma_range_map include address 0, we reserve >>>> that page >>>>    * to ensure it is an invalid IOVA. It is safe to reinitialise a >>>> domain, but >>>>    * any change which could make prior IOVAs invalid will fail. >>>>    */ >>>> -static int iommu_dma_init_domain(struct iommu_domain *domain, >>>> dma_addr_t base, >>>> -                 dma_addr_t limit, struct device *dev) >>>> +static int iommu_dma_init_domain(struct iommu_domain *domain, >>>> struct device *dev) >>>>   { >>>>       struct iommu_dma_cookie *cookie = domain->iova_cookie; >>>> +    const struct bus_dma_region *map = dev->dma_range_map; >>>>       unsigned long order, base_pfn; >>>>       struct iova_domain *iovad; >>>>       int ret; >>>> @@ -684,18 +681,18 @@ static int iommu_dma_init_domain(struct >>>> iommu_domain *domain, dma_addr_t base, >>>>       /* Use the smallest supported page size for IOVA granularity */ >>>>       order = __ffs(domain->pgsize_bitmap); >>>> -    base_pfn = max_t(unsigned long, 1, base >> order); >>>> +    base_pfn = 1; >>>>       /* Check the domain allows at least some access to the >>>> device... */ >>>> -    if (domain->geometry.force_aperture) { >>>> +    if (map) { >>>> +        dma_addr_t base = dma_range_map_min(map); >>>>           if (base > domain->geometry.aperture_end || >>>> -            limit < domain->geometry.aperture_start) { >>>> +            dma_range_map_max(map) < >>>> domain->geometry.aperture_start) { >>>>               pr_warn("specified DMA range outside IOMMU >>>> capability\n"); >>>>               return -EFAULT; >>>>           } >>>>           /* ...then finally give it a kicking to make sure it fits */ >>>> -        base_pfn = max_t(unsigned long, base_pfn, >>>> -                domain->geometry.aperture_start >> order); >>>> +        base_pfn = max(base, domain->geometry.aperture_start) >> >>>> order; >>>>       } >>>>       /* start_pfn is always nonzero for an already-initialised >>>> domain */ >>>> @@ -1760,7 +1757,7 @@ void iommu_setup_dma_ops(struct device *dev, >>>> u64 dma_base, u64 dma_limit) >>>>        * underlying IOMMU driver needs to support via the dma-iommu >>>> layer. >>>>        */ >>>>       if (iommu_is_dma_domain(domain)) { >>>> -        if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) >>>> +        if (iommu_dma_init_domain(domain, dev)) >>>>               goto out_err; >>>>           dev->dma_ops = &iommu_dma_ops; >>>>       } >>> >>> >>> I have noticed some random test failures on Tegra186 and Tegra194 and >>> bisect is pointing to this commit. Reverting this along with the >>> various dependencies does fix the problem. On Tegra186 CPU hotplug is >>> failing and on Tegra194 suspend is failing. Unfortunately, on neither >>> platform do I see any particular crash but the boards hang somewhere. >> >> That is... thoroughly bemusing :/ Not only is there supposed to be no >> real functional change here - we should merely be recalculating the >> same information from dev->dma_range_map that the callers were already >> doing to generate the base/limit arguments - but the act of initially >> setting up a default domain for a device behind an IOMMU should have >> no connection whatsoever to suspend and especially not to CPU hotplug. > > > Yes it does look odd, but this is what bisect reported ... > > git bisect start > # good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9 > git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6 > # bad: [6ba6c795dc73c22ce2c86006f17c4aa802db2a60] Add linux-next > specific files for 20240513 > git bisect bad 6ba6c795dc73c22ce2c86006f17c4aa802db2a60 > # good: [29e7f949865a023a21ecdfbd82d68ac697569f34] Merge branch 'main' > of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git > git bisect good 29e7f949865a023a21ecdfbd82d68ac697569f34 > # skip: [150e6cc14e51f2a07034106a4529cdaafd812c46] Merge branch 'next' > of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git > git bisect skip 150e6cc14e51f2a07034106a4529cdaafd812c46 > # good: [f5d75327d30af49acf2e4b55f35ce2e6c45d1287] drm/amd/display: Fix > invalid Copyright notice > git bisect good f5d75327d30af49acf2e4b55f35ce2e6c45d1287 > # skip: [f1ec9a9ffc526df7c9523006c2abbb8ea554cdd8] Merge branch > 'for-next' of > git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-dt.git > git bisect skip f1ec9a9ffc526df7c9523006c2abbb8ea554cdd8 > # bad: [f091e93306e0429ebb7589b9874590b6a9705e64] dma-mapping: Simplify > arch_setup_dma_ops() > git bisect bad f091e93306e0429ebb7589b9874590b6a9705e64 > # good: [91cfd679f9e8b9a7bf2f26adf66eff99dbe2026b] ACPI/IORT: Handle > memory address size limits as limits > git bisect good 91cfd679f9e8b9a7bf2f26adf66eff99dbe2026b > # bad: [ad4750b07d3462ce29a0c9b1e88b2a1f9795290e] iommu/dma: Make limit > checks self-contained > git bisect bad ad4750b07d3462ce29a0c9b1e88b2a1f9795290e > # good: [fece6530bf4b59b01a476a12851e07751e73d69f] dma-mapping: Add > helpers for dma_range_map bounds > git bisect good fece6530bf4b59b01a476a12851e07751e73d69f > # first bad commit: [ad4750b07d3462ce29a0c9b1e88b2a1f9795290e] > iommu/dma: Make limit checks self-contained > > There is a couple skips in there and so I will try this again. > >>> If you have any ideas on things we can try let me know. >> >> Since the symptom seems inexplicable, I'd throw the usual memory >> debugging stuff like KASAN at it first. I'd also try >> "no_console_suspend" to check whether any late output is being missed >> in the suspend case (and if it's already broken, then any additional >> issues that may be caused by the console itself hopefully shouldn't >> matter). >> >> For more base-covering, do you have the "arm64: Properly clean up >> iommu-dma remnants" fix in there already as well? That bug has >> bisected to patch #6 each time though, so I do still suspect that what >> you're seeing is likely something else. It does seem potentially >> significant that those Tegra platforms are making fairly wide use of >> dma-ranges, but there's no clear idea forming out of that observation >> just yet... > > I was hoping it was the same issue other people had reported, > but the fix provided did not help. I have also tried today's > -next and I am still seeing the issue. > > I should have more time next week to look at this further. Let > me confirm which change is causing this and add more debug. Thanks. From staring at the code I think I've spotted one subtlety which may not be quite as intended - can you see if the diff below helps? It occurs to me that suspend and CPU hotplug may not *cause* the symptom, but they could certainly stall if one or more relevant CPUs is *already* stuck in a loop somewhere... Thanks, Robin. ----->8----- diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 89a53c2f2cf9..85eb1846c637 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -686,6 +686,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev /* Check the domain allows at least some access to the device... */ if (map) { dma_addr_t base = dma_range_map_min(map); + base = max(base, (dma_addr_t)1 << order); if (base > domain->geometry.aperture_end || dma_range_map_max(map) < domain->geometry.aperture_start) { pr_warn("specified DMA range outside IOMMU capability\n");