Received: by 2002:a05:6358:16cc:b0:ea:6187:17c9 with SMTP id r12csp1903562rwl; Thu, 5 Jan 2023 22:15:48 -0800 (PST) X-Google-Smtp-Source: AMrXdXvjAxM/7UHC4PpOD+bFA9eo6EB7zev21cyGNvwolYUdlLAgbpOS+afvTodzOIiMfi04Rjat X-Received: by 2002:a17:907:6ea9:b0:7c0:e7ad:5296 with SMTP id sh41-20020a1709076ea900b007c0e7ad5296mr59287719ejc.55.1672985748427; Thu, 05 Jan 2023 22:15:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672985748; cv=none; d=google.com; s=arc-20160816; b=SfinPU8yiWHtkd3IL4hOdFXHPD+D4Q3S16wXz9yg53xRPHm6GLxmmaZNKzoqJN9yk9 FvcLpSthbfWX9wrh/k6mXHsJcsehSTV0NAUwNrIug1TtUYDjQJmTk29rlZafFR5TBvEI ivzj0oXtKbkknvRqqfO4RBd8uahUxsLlmYBDVmPs00xq1RHQJ/3feQzU63DgeOlMWCQU NeKYRhB/FL+MFJ9WfkuKloLr15b7MNm584lBr9X/3lWblMBt5u621rH7SVs5a3EV+8Hv 1vibykHo43o4UteiJozmNu66BIdaOrlBEl+hZX1xO6Lj48HPODbneFKyyC3DRaoiEq78 obkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:to:content-language:subject:cc:user-agent:mime-version :date:message-id:dkim-signature; bh=LfHv2PBRTUy5P6IsLwp2ySRPvYAwyWW8bFHdTNVr9rs=; b=Insf+EnFCurvxG9fw8tjgRDBOVPfmO0LNd98yhr73rex0sPGBPDdcb3vZ8CBLjawD8 WUceT72yRTpYsjy1xFjbdD2Otzw+BFrF6x0xGfaXS9uR9VkYXIZciOuRe6bEbZgxUyRq rOcqqOIPHOKMTWAu3jY8s3d+DhbRvuju8Ux1fhXdVU/cyf88X7ifH75fIADf4z7/p1/f 8g2JedqT3Q4YPQ9DtA3z6mAJHL9YsyqZUm2ebdZKPGY1i5R92b1LirQcdW48a467f/OH YaGfIkwcKzxnQ0+FD7jAt5dZBNRwo0QzlQqRCYPYW3SyOe4qv4Fnps2ssqnWUUILwOJD pdwA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=XYxLxvmy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ga38-20020a1709070c2600b0073d9f16e5b3si456330ejc.258.2023.01.05.22.15.34; Thu, 05 Jan 2023 22:15:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=XYxLxvmy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231167AbjAFF4H (ORCPT + 54 others); Fri, 6 Jan 2023 00:56:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47462 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229597AbjAFF4F (ORCPT ); Fri, 6 Jan 2023 00:56:05 -0500 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B864060CF5; Thu, 5 Jan 2023 21:56:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1672984562; x=1704520562; h=message-id:date:mime-version:cc:subject:to:references: from:in-reply-to:content-transfer-encoding; bh=5FYmc733b78fRd3B3tStq5rzw45ZjZPIs3hLf607W+U=; b=XYxLxvmyeH7K8fy2k89VWV/X3oSSfIviWF+c6l/hMrlxK8yFmmX9KEuj G2HmiD23KTWi317DuMl0NCs+cNOxSXf/PkJeXccmbXhhuz0X3JRulFtTs 10CI5l7zfYrl8e+w/IjdPf0oJ/8LB5tFQC3Jo++6/AtUeyCo1uRXpLUdI XmKObNNosy42nuTTrUMuGV1MEY6L3oYj2nyplcExhCI8X6kRUYqSK4Gek rLBsJZEp3kJ9zxXhdtwrJq6AqYa7R4Qvv2FycLyAa4c24HIDTXtmkFzQN AOTORdplrkGVgvujw95nL9P8RK2yFjSxM79mUubzJwNmmJHjp3jH3NNhB Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10581"; a="323663145" X-IronPort-AV: E=Sophos;i="5.96,304,1665471600"; d="scan'208";a="323663145" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Jan 2023 21:56:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10581"; a="724333183" X-IronPort-AV: E=Sophos;i="5.96,304,1665471600"; d="scan'208";a="724333183" Received: from allen-box.sh.intel.com (HELO [10.239.159.48]) ([10.239.159.48]) by fmsmga004.fm.intel.com with ESMTP; 05 Jan 2023 21:55:58 -0800 Message-ID: <7bbc0f65-e1c6-f388-29a8-390b8c9c92c8@linux.intel.com> Date: Fri, 6 Jan 2023 13:48:11 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Cc: baolu.lu@linux.intel.com, "iommu@lists.linux.dev" , LKML , "regressions@lists.linux.dev" , Linux PCI , Bjorn Helgaas Subject: =?UTF-8?Q?Re=3a_=5bregression=2c_bisected=2c_pci/iommu=5d_Bug=c2=a0?= =?UTF-8?Q?216865_-_Black_screen_when_amdgpu_started_during_6=2e2-rc1_boot_w?= =?UTF-8?Q?ith_AMD_IOMMU_enabled?= Content-Language: en-US To: Felix Kuehling , "Deucher, Alexander" , "Hegde, Vasant" , Matt Fagnani , Thorsten Leemhuis , Joerg Roedel , Jason Gunthorpe References: <15d0f9ff-2a56-b3e9-5b45-e6b23300ae3b@leemhuis.info> <5aa0e698-f715-0481-36e5-46505024ebc1@bell.net> <157c4ca4-370a-5d7e-fe32-c64d934f6979@amd.com> <223ee6d6-70ea-1d53-8bc2-2d22201d8dde@bell.net> <6fff9d10-f77f-e55a-9020-8a1bd34cf508@amd.com> <6b5baa30-c218-0845-d6c2-32ac21ed6a6d@linux.intel.com> <86099ef0-5a8d-bd1e-4e38-a3b361a68f10@amd.com> <12db77f6-736d-a423-4cc0-e536eb7bb712@amd.com> From: Baolu Lu In-Reply-To: <12db77f6-736d-a423-4cc0-e536eb7bb712@amd.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.2 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_MED, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org +Jason On 1/5/23 11:27 PM, Felix Kuehling wrote: > Am 2023-01-05 um 09:46 schrieb Deucher, Alexander: >> [AMD Official Use Only - General] >> >>> -----Original Message----- >>> From: Hegde, Vasant >>> Sent: Thursday, January 5, 2023 5:46 AM >>> To: Baolu Lu ; Matt Fagnani >>> ; Thorsten Leemhuis ; >>> Deucher, Alexander ; Joerg Roedel >>> >>> Cc: iommu@lists.linux.dev; LKML ; >>> regressions@lists.linux.dev; Linux PCI ; >>> Bjorn >>> Helgaas >>> Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen >>> when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled >>> >>> Baolu, >>> >>> >>> On 1/5/2023 4:07 PM, Baolu Lu wrote: >>>> On 2023/1/5 18:27, Vasant Hegde wrote: >>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote: >>>>>> I built 6.2-rc2 with the patch applied. The same black screen >>>>>> problem happened with 6.2-rc2 with the patch. I tried to use early >>>>>> kdump with 6.2-rc2 with the patch twice by panicking the kernel with >>>>>> sysrq+alt+c after the black screen happened. The system rebooted >>>>>> after about 10-20 seconds both times, but no kdump and dmesg files >>>>>> were saved in /var/crash. I'm attaching the lspci -vvv output as >>> requested. >>>>> Thanks for testing. As mentioned earlier I was not expecting this >>>>> patch to fix the black screen issue. It should fix kernel warnings >>>>> and IOMMU page fault related call traces. By any chance do you have >>>>> the >>> kernel boot logs? >>>>> @Baolu, >>>>>     Looking into lspci output, it doesn't list ACS feature for >>>>> Graphics card. So with your fix it didn't enable PASID and hence it >>>>> failed to >>> boot. >>>> So do you mind telling why does the PASID need to be enabled for the >>>> graphic device? Or in another word, what does the graphic driver use >>>> the PASID for? >>> Honestly I don't know the complete details of how PASID works with >>> graphics >>> card. May be Alex or Joerg can explain it better. >> + Felix >> >> The GPU driver uses the pasid for shared virtual memory between the >> CPU and GPU.  I.e., so that the user apps can use the same virtual >> address space on the GPU and the CPU.  It also uses pasid to take >> advantage of recoverable device page faults using PRS. > > Agreed. This applies to GPU computing on some older AMD APUs that take > advantage of memory coherence and IOMMUv2 address translation to create > a shared virtual address space between the CPU and GPU. In this case it > seems to be a Carrizo APU. It is also true for Raven APUs. Thanks for the explanation. This is actually the problem that commit 201007ef707a was trying to fix. The PCIe fabric routes Memory Requests based on the TLP address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that should go upstream to the IOMMU may instead be routed as a P2P Request if its address falls in a bridge window. In SVA case, the IOMMU shares the address space of a user application. The user application side has no knowledge about the PCI bridge window. It is entirely possible that the device is programed with a P2P address and results in a disaster. -- Best regards, baolu