Message-ID: <65d4d7e0-4d90-48d7-8e4a-d16800df148a@arm.com>
Date: Mon, 15 Apr 2024 22:44:34 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU
To: Eric Wagner, Jason Gunthorpe
Cc: Joerg Roedel, Will Deacon, Suravee Suthikulpanit, iommu@lists.linux.dev, linux-kernel@vger.kernel.org
References: <20240415163056.GP223006@ziepe.ca>
From: Robin Murphy

On 2024-04-15 7:57 pm, Eric Wagner wrote:
> Apologies if I made a mistake in the first bisect, I'm new to kernel
> debugging.
>
> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
> Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> bisect log is attached. It ended up at the same commit as before.
>
> I've also attached a picture of the boot screen that occurs when it hangs.
> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> problem.

Looks like 59ddce4418da483 probably broke things most - prior to that,
the fact that it's behind a Thunderbolt port would have always taken
precedence and forced IOMMU_DOMAIN_DMA regardless of what the driver may
have wanted to say, whereas now we ask the driver first, then complain
that it conflicts with the untrusted status and ultimately don't
configure the IOMMU at all. Meanwhile the GPU driver presumably goes on
to believe it's using dma-direct with no IOMMU present, resulting in
fireworks when its traffic reaches the IOMMU. Great :(

However the other notable thing that also happened between 6.6 and 6.7
was the removal of the AMD iommu_v2 code, so there's some possibility
that the GPU driver still may have only been working before due to that
also subverting the default domain with its own identity domain, so
whether it would actually work again with
iommu_get_default_domain_type() sorted out is yet another question...

As a first step I'd test the quick hack below, but be prepared for
things to still break slightly differently.

Cheers,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 996e79dc582d..063e1eb32fbd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1774,7 +1774,7 @@ static int iommu_get_default_domain_type(struct iommu_group *group,
 				untrusted,
 				"Device is not trusted, but driver is overriding group %u to %s, refusing to probe.\n",
 				group->id, iommu_domain_type_str(driver_type));
-			return -1;
+			//return -1;
 		}
 		driver_type = IOMMU_DOMAIN_DMA;
 	}
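
For anyone following along, here is a rough user-space model of the three behaviours discussed in this reply: the pre-6.7 logic where untrusted status always wins, the 6.7 logic that refuses to probe on a conflict, and the effect of the quick hack. It is only a sketch, not kernel code. The enum and the pick_old(), pick_new() and pick_hacked() helpers are made-up names for illustration, and it assumes the driver asks for an identity domain, which the thread does not confirm.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's domain types; values are illustrative. */
enum domain_type { DOMAIN_NONE = 0, DOMAIN_IDENTITY, DOMAIN_DMA };

/* Model of the older behaviour: an untrusted device always gets a DMA domain. */
static int pick_old(int untrusted, enum domain_type driver_type)
{
	if (untrusted)
		return DOMAIN_DMA;
	return driver_type;
}

/* Model of the 6.7 behaviour: a conflicting driver request is treated as an error. */
static int pick_new(int untrusted, enum domain_type driver_type)
{
	if (untrusted) {
		if (driver_type && driver_type != DOMAIN_DMA)
			return -1;	/* "refusing to probe" */
		driver_type = DOMAIN_DMA;
	}
	return driver_type;
}

/* Model of the quick hack: the conflict is still detected, but no longer fatal. */
static int pick_hacked(int untrusted, enum domain_type driver_type)
{
	if (untrusted) {
		if (driver_type && driver_type != DOMAIN_DMA) {
			/* the kernel would still log the complaint here */
		}
		driver_type = DOMAIN_DMA;
	}
	return driver_type;
}

/* Small self-check covering the untrusted eGPU case described in the thread. */
static void self_check(void)
{
	assert(pick_old(1, DOMAIN_IDENTITY) == DOMAIN_DMA);
	assert(pick_new(1, DOMAIN_IDENTITY) == -1);
	assert(pick_hacked(1, DOMAIN_IDENTITY) == DOMAIN_DMA);
}
```

In this model, an untrusted device whose driver asks for identity gets DOMAIN_DMA from pick_old() and pick_hacked(), and -1 from pick_new(). That matches the description above, where the hack restores the forced DMA domain while still printing the complaint.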