2023-01-04 07:54:00

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 2023/1/4 2:55, Matt Fagnani wrote:
> I reproduced the problem with 6.2-rc1 in a Fedora 37 installation with early kdump enabled as described athttps://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes https://github.com/k-hagio/fedora-kexec-tools/blob/master/early-kdump-howto.txt I panicked the kernel with sysrq+alt+c. The dmesg saved with the kdump showed warnings at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80 and at drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50 involving AMD IOMMU and amdgpu functions in the trace. Since those warnings' were
> if (WARN_ON(!pdev->pri_enabled)) and if (WARN_ON(!pdev->pasid_enabled)), pci_disable_pri and pci_disable_pasid looked like they were called when pdev->pri_enabled and pdev->pasid_enabled were both false.
> A null pointer dereference occurred right after that which made amdgpu crash.
>
> [ 13.132368] [drm] amdgpu kernel modesetting enabled.
> [ 13.133766] amdgpu: Topology: Add APU node [0x0:0x0]
> [ 13.137596] Console: switching to colour dummy device 80x25
> [ 13.143717] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
> [ 13.143970] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x103C:0x8332 0xCA).
> [ 13.144205] [drm] register mmio base: 0xF0400000
> [ 13.144209] [drm] register mmio size: 262144
> [ 13.144310] [drm] add ip block number 0 <vi_common>
> [ 13.144316] [drm] add ip block number 1 <gmc_v8_0>
> [ 13.144320] [drm] add ip block number 2 <cz_ih>
> [ 13.144324] [drm] add ip block number 3 <gfx_v8_0>
> [ 13.144328] [drm] add ip block number 4 <sdma_v3_0>
> [ 13.144332] [drm] add ip block number 5 <powerplay>
> [ 13.144336] [drm] add ip block number 6 <dm>
> [ 13.144340] [drm] add ip block number 7 <uvd_v6_0>
> [ 13.144343] [drm] add ip block number 8 <vce_v3_0>
> [ 13.144347] [drm] add ip block number 9 <acp_ip>
> [ 13.144388] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from VFCT
> [ 13.144397] amdgpu: ATOM BIOS: 113-C75100-031
> [ 13.144425] [drm] UVD is enabled in physical mode
> [ 13.144431] [drm] VCE enabled in physical mode
> [ 13.144435] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
> [ 13.144491] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
> [ 13.144503] amdgpu 0000:00:01.0: amdgpu: VRAM: 512M 0x000000F400000000 - 0x000000F41FFFFFFF (512M used)
> [ 13.144511] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
> [ 13.144524] [drm] Detected VRAM RAM=512M, BAR=512M
> [ 13.144529] [drm] RAM width 64bits UNKNOWN
> [ 13.144623] [drm] amdgpu: 512M of VRAM memory ready
> [ 13.144630] [drm] amdgpu: 3572M of GTT memory ready.
> [ 13.144653] [drm] GART: num cpu pages 262144, num gpu pages 262144
> [ 13.144705] [drm] PCIE GART of 1024M enabled (table at 0x000000F400600000).
> [ 13.158820] amdgpu: hwmgr_sw_init smu backed is smu8_smu
> [ 13.175036] [drm] Found UVD firmware Version: 1.91 Family ID: 11
> [ 13.175097] [drm] UVD ENC is disabled
> [ 13.186675] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
> [ 13.187879] amdgpu: smu version 27.18.00
> [ 13.193760] [drm] DM_PPLIB: values for Engine clock
> [ 13.193773] [drm] DM_PPLIB: 300000
> [ 13.193776] [drm] DM_PPLIB: 480000
> [ 13.193779] [drm] DM_PPLIB: 533340
> [ 13.193781] [drm] DM_PPLIB: 576000
> [ 13.193784] [drm] DM_PPLIB: 626090
> [ 13.193786] [drm] DM_PPLIB: 685720
> [ 13.193788] [drm] DM_PPLIB: 720000
> [ 13.193791] [drm] DM_PPLIB: 757900
> [ 13.193793] [drm] DM_PPLIB: Validation clocks:
> [ 13.193796] [drm] DM_PPLIB: engine_max_clock: 75790
> [ 13.193799] [drm] DM_PPLIB: memory_max_clock: 93300
> [ 13.193802] [drm] DM_PPLIB: level : 8
> [ 13.193806] [drm] DM_PPLIB: values for Display clock
> [ 13.193809] [drm] DM_PPLIB: 300000
> [ 13.193811] [drm] DM_PPLIB: 400000
> [ 13.193814] [drm] DM_PPLIB: 496560
> [ 13.193816] [drm] DM_PPLIB: 626090
> [ 13.193819] [drm] DM_PPLIB: 685720
> [ 13.193821] [drm] DM_PPLIB: 757900
> [ 13.193823] [drm] DM_PPLIB: 800000
> [ 13.193825] [drm] DM_PPLIB: 847060
> [ 13.193828] [drm] DM_PPLIB: Validation clocks:
> [ 13.193830] [drm] DM_PPLIB: engine_max_clock: 75790
> [ 13.193833] [drm] DM_PPLIB: memory_max_clock: 93300
> [ 13.193836] [drm] DM_PPLIB: level : 8
> [ 13.193839] [drm] DM_PPLIB: values for Memory clock
> [ 13.193842] [drm] DM_PPLIB: 667000
> [ 13.193844] [drm] DM_PPLIB: 933000
> [ 13.193847] [drm] DM_PPLIB: Validation clocks:
> [ 13.193849] [drm] DM_PPLIB: engine_max_clock: 75790
> [ 13.193852] [drm] DM_PPLIB: memory_max_clock: 93300
> [ 13.193854] [drm] DM_PPLIB: level : 8
> [ 13.193973] [drm] Display Core initialized with v3.2.215!
> [ 13.309967] [drm] UVD initialized successfully.
> [ 13.511031] [drm] VCE initialized successfully.
> [ 13.515217] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
> [ 13.515442] amdgpu: sdma_bitmap: f
> [ 13.515549] ------------[ cut here ]------------
> [ 13.515555] WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80
> [ 13.515571] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2 hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 drm_display_helper wdat_wdt serio_raw hid_multitouch sp5100_tco hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse dm_multipath
> [ 13.515620] CPU: 0 PID: 477 Comm: systemd-udevd Kdump: loaded Not tainted 6.2.0-0.rc1.14.fc38.x86_64 #1
> [ 13.515628] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
> [ 13.515634] RIP: 0010:pci_disable_pri+0x75/0x80
> [ 13.515642] Code: 54 24 06 89 ee 48 89 df 83 e2 fe 66 89 54 24 06 0f b7 d2 e8 1d e1 fc ff 80 a3 4b 08 00 00 fd 48 83 c4 08 5b 5d e9 2b 8b 69 00 <0f> 0b eb b6 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
> [ 13.515651] RSP: 0018:ffffbaf4407ab8e8 EFLAGS: 00010046
> [ 13.515658] RAX: 0000000000000000 RBX: ffff90aa00ac4000 RCX: 0000000000000009
> [ 13.515663] RDX: 0000000000000000 RSI: 0000000000000014 RDI: ffff90aa00ac4000
> [ 13.515668] RBP: ffff90aa0e0c3810 R08: 0000000000000002 R09: 0000000000000000
> [ 13.515673] R10: 0000000000000000 R11: ffffffffade4e430 R12: ffff90aa011a8800
> [ 13.515678] R13: ffff90aa0e0c3800 R14: ffff90aa011a8800 R15: ffff90aa0e0c3960
> [ 13.515683] FS: 00007fabd67feb40(0000) GS:ffff90aaf7400000(0000) knlGS:0000000000000000
> [ 13.515689] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 13.515695] CR2: 00007f5689ff54c0 CR3: 0000000100f16000 CR4: 00000000001506f0
> [ 13.515700] Call Trace:
> [ 13.515704] <TASK>
> [ 13.515710] amd_iommu_attach_device+0x2e0/0x300
> [ 13.515719] __iommu_attach_device+0x1b/0x90
> [ 13.515727] iommu_attach_group+0x65/0xa0
> [ 13.515735] amd_iommu_init_device+0x16b/0x250 [iommu_v2]
> [ 13.515747] kfd_iommu_resume+0x4c/0x1a0 [amdgpu]
> [ 13.517094] kgd2kfd_resume_iommu+0x12/0x30 [amdgpu]
> [ 13.518419] kgd2kfd_device_init.cold+0x346/0x49a [amdgpu]
> [ 13.519699] amdgpu_amdkfd_device_init+0x142/0x1d0 [amdgpu]
> [ 13.520877] amdgpu_device_init.cold+0x19f5/0x1e21 [amdgpu]
> [ 13.522118] ? _raw_spin_lock_irqsave+0x23/0x50
> [ 13.522126] amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
> [ 13.523386] amdgpu_pci_probe+0x161/0x370 [amdgpu]
> [ 13.524516] local_pci_probe+0x41/0x80
> [ 13.524525] pci_device_probe+0xb3/0x220
> [ 13.524533] really_probe+0xde/0x380
> [ 13.524540] ? pm_runtime_barrier+0x50/0x90
> [ 13.524546] __driver_probe_device+0x78/0x170
> [ 13.524555] driver_probe_device+0x1f/0x90
> [ 13.524560] __driver_attach+0xce/0x1c0
> [ 13.524565] ? __pfx___driver_attach+0x10/0x10
> [ 13.524570] bus_for_each_dev+0x73/0xa0
> [ 13.524575] bus_add_driver+0x1ae/0x200
> [ 13.524580] driver_register+0x89/0xe0
> [ 13.524586] ? __pfx_init_module+0x10/0x10 [amdgpu]
> [ 13.525819] do_one_initcall+0x59/0x230
> [ 13.525828] do_init_module+0x4a/0x200
> [ 13.525834] __do_sys_init_module+0x157/0x180
> [ 13.525839] do_syscall_64+0x5b/0x80
> [ 13.525845] ? handle_mm_fault+0xff/0x2f0
> [ 13.525850] ? do_user_addr_fault+0x1ef/0x690
> [ 13.525856] ? exc_page_fault+0x70/0x170
> [ 13.525860] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 13.525867] RIP: 0033:0x7fabd66cde4e
> [ 13.525872] Code: 48 8b 0d e5 5f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b2 5f 0c 00 f7 d8 64 89 01 48
> [ 13.525878] RSP: 002b:00007ffdd89bc6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [ 13.525884] RAX: ffffffffffffffda RBX: 0000563e4d23f0a0 RCX: 00007fabd66cde4e
> [ 13.525887] RDX: 00007fabd6817453 RSI: 000000000174fb66 RDI: 00007fabd3bd4010
> [ 13.525890] RBP: 00007fabd6817453 R08: 0000563e4d237c70 R09: 00007fabd672f900
> [ 13.525893] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
> [ 13.525896] R13: 0000563e4d239060 R14: 0000000000000000 R15: 0000563e4d23e450
> [ 13.525900] </TASK>
> [ 13.525902] ---[ end trace 0000000000000000 ]---
> [ 13.525964] ------------[ cut here ]------------

This (including the following) kernel traces are triggered by the
following code.

1698 static int pdev_pri_ats_enable(struct pci_dev *pdev)
1699 {
1700 int ret;
1701
1702 /* Only allow access to user-accessible pages */
1703 ret = pci_enable_pasid(pdev, 0);
1704 if (ret)
1705 goto out_err;

[--cut for short--]

1724 out_err:
1725 pci_disable_pri(pdev);
1726 pci_disable_pasid(pdev);
1727
1728 return ret;
1729 }

pci_disable_pri() and pci_disable_pasid() are called with PCI PASID and
PRI not enabled. There are WARN_ON()s in the pci code for such cases.

This happens in the domain attach device path. I haven't figured out why
the failure of PASID or PRI enabling will cause the domain attach device
to fail. And also why pci_pasid_features() and pci_pri_supported() are
not called before pci_enable_pasid/pri().

commit 201007ef707a ("PCI: Enable PASID only when ACS RR & UF enabled on
upstream path") requires ACS P2P Request Redirect and Upstream
Forwarding are enabled for the path leading to the device when enabling
PASID because PCIe fabric routes Memory Requests based on the TLP
address, ignoring any PASID. I guess this is the reason why
pci_enable_pasid() returns failure and discovers above buggy code.

--
Best regards,
baolu


2023-01-04 16:02:17

by Vasant Hegde

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 1/4/2023 12:24 PM, Baolu Lu wrote:
> On 2023/1/4 2:55, Matt Fagnani wrote:
>> I reproduced the problem with 6.2-rc1 in a Fedora 37 installation with early
>> kdump enabled as described
>> athttps://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes 
>> https://github.com/k-hagio/fedora-kexec-tools/blob/master/early-kdump-howto.txt 
>> I panicked the kernel with sysrq+alt+c. The dmesg saved with the kdump showed
>> warnings at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80 and at
>> drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50 involving AMD IOMMU and
>> amdgpu functions in the trace. Since those warnings' were
>> if (WARN_ON(!pdev->pri_enabled)) and if (WARN_ON(!pdev->pasid_enabled)),
>> pci_disable_pri and pci_disable_pasid looked like they were called when
>> pdev->pri_enabled and pdev->pasid_enabled were both false.
>> A null pointer dereference occurred right after that which made amdgpu crash.
>>
>> [   13.132368] [drm] amdgpu kernel modesetting enabled.
>> [   13.133766] amdgpu: Topology: Add APU node [0x0:0x0]
>> [   13.137596] Console: switching to colour dummy device 80x25
>> [   13.143717] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
>> [   13.143970] [drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874
>> 0x103C:0x8332 0xCA).
>> [   13.144205] [drm] register mmio base: 0xF0400000
>> [   13.144209] [drm] register mmio size: 262144
>> [   13.144310] [drm] add ip block number 0 <vi_common>
>> [   13.144316] [drm] add ip block number 1 <gmc_v8_0>
>> [   13.144320] [drm] add ip block number 2 <cz_ih>
>> [   13.144324] [drm] add ip block number 3 <gfx_v8_0>
>> [   13.144328] [drm] add ip block number 4 <sdma_v3_0>
>> [   13.144332] [drm] add ip block number 5 <powerplay>
>> [   13.144336] [drm] add ip block number 6 <dm>
>> [   13.144340] [drm] add ip block number 7 <uvd_v6_0>
>> [   13.144343] [drm] add ip block number 8 <vce_v3_0>
>> [   13.144347] [drm] add ip block number 9 <acp_ip>
>> [   13.144388] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from VFCT
>> [   13.144397] amdgpu: ATOM BIOS: 113-C75100-031
>> [   13.144425] [drm] UVD is enabled in physical mode
>> [   13.144431] [drm] VCE enabled in physical mode
>> [   13.144435] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature
>> not supported
>> [   13.144491] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
>> fragment size is 9-bit
>> [   13.144503] amdgpu 0000:00:01.0: amdgpu: VRAM: 512M 0x000000F400000000 -
>> 0x000000F41FFFFFFF (512M used)
>> [   13.144511] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 -
>> 0x000000FF3FFFFFFF
>> [   13.144524] [drm] Detected VRAM RAM=512M, BAR=512M
>> [   13.144529] [drm] RAM width 64bits UNKNOWN
>> [   13.144623] [drm] amdgpu: 512M of VRAM memory ready
>> [   13.144630] [drm] amdgpu: 3572M of GTT memory ready.
>> [   13.144653] [drm] GART: num cpu pages 262144, num gpu pages 262144
>> [   13.144705] [drm] PCIE GART of 1024M enabled (table at 0x000000F400600000).
>> [   13.158820] amdgpu: hwmgr_sw_init smu backed is smu8_smu
>> [   13.175036] [drm] Found UVD firmware Version: 1.91 Family ID: 11
>> [   13.175097] [drm] UVD ENC is disabled
>> [   13.186675] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
>> [   13.187879] amdgpu: smu version 27.18.00
>> [   13.193760] [drm] DM_PPLIB: values for Engine clock
>> [   13.193773] [drm] DM_PPLIB:     300000
>> [   13.193776] [drm] DM_PPLIB:     480000
>> [   13.193779] [drm] DM_PPLIB:     533340
>> [   13.193781] [drm] DM_PPLIB:     576000
>> [   13.193784] [drm] DM_PPLIB:     626090
>> [   13.193786] [drm] DM_PPLIB:     685720
>> [   13.193788] [drm] DM_PPLIB:     720000
>> [   13.193791] [drm] DM_PPLIB:     757900
>> [   13.193793] [drm] DM_PPLIB: Validation clocks:
>> [   13.193796] [drm] DM_PPLIB:    engine_max_clock: 75790
>> [   13.193799] [drm] DM_PPLIB:    memory_max_clock: 93300
>> [   13.193802] [drm] DM_PPLIB:    level           : 8
>> [   13.193806] [drm] DM_PPLIB: values for Display clock
>> [   13.193809] [drm] DM_PPLIB:     300000
>> [   13.193811] [drm] DM_PPLIB:     400000
>> [   13.193814] [drm] DM_PPLIB:     496560
>> [   13.193816] [drm] DM_PPLIB:     626090
>> [   13.193819] [drm] DM_PPLIB:     685720
>> [   13.193821] [drm] DM_PPLIB:     757900
>> [   13.193823] [drm] DM_PPLIB:     800000
>> [   13.193825] [drm] DM_PPLIB:     847060
>> [   13.193828] [drm] DM_PPLIB: Validation clocks:
>> [   13.193830] [drm] DM_PPLIB:    engine_max_clock: 75790
>> [   13.193833] [drm] DM_PPLIB:    memory_max_clock: 93300
>> [   13.193836] [drm] DM_PPLIB:    level           : 8
>> [   13.193839] [drm] DM_PPLIB: values for Memory clock
>> [   13.193842] [drm] DM_PPLIB:     667000
>> [   13.193844] [drm] DM_PPLIB:     933000
>> [   13.193847] [drm] DM_PPLIB: Validation clocks:
>> [   13.193849] [drm] DM_PPLIB:    engine_max_clock: 75790
>> [   13.193852] [drm] DM_PPLIB:    memory_max_clock: 93300
>> [   13.193854] [drm] DM_PPLIB:    level           : 8
>> [   13.193973] [drm] Display Core initialized with v3.2.215!
>> [   13.309967] [drm] UVD initialized successfully.
>> [   13.511031] [drm] VCE initialized successfully.
>> [   13.515217] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
>> [   13.515442] amdgpu: sdma_bitmap: f
>> [   13.515549] ------------[ cut here ]------------
>> [   13.515555] WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251
>> pci_disable_pri+0x75/0x80
>> [   13.515571] Modules linked in: amdgpu(+) drm_ttm_helper ttm iommu_v2
>> hid_logitech_hidpp crct10dif_pclmul drm_buddy crc32_pclmul gpu_sched
>> crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
>> drm_display_helper wdat_wdt serio_raw hid_multitouch sp5100_tco
>> hid_logitech_dj r8169 cec video wmi scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse
>> dm_multipath
>> [   13.515620] CPU: 0 PID: 477 Comm: systemd-udevd Kdump: loaded Not tainted
>> 6.2.0-0.rc1.14.fc38.x86_64 #1
>> [   13.515628] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
>> [   13.515634] RIP: 0010:pci_disable_pri+0x75/0x80
>> [   13.515642] Code: 54 24 06 89 ee 48 89 df 83 e2 fe 66 89 54 24 06 0f b7 d2
>> e8 1d e1 fc ff 80 a3 4b 08 00 00 fd 48 83 c4 08 5b 5d e9 2b 8b 69 00 <0f> 0b
>> eb b6 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90
>> [   13.515651] RSP: 0018:ffffbaf4407ab8e8 EFLAGS: 00010046
>> [   13.515658] RAX: 0000000000000000 RBX: ffff90aa00ac4000 RCX: 0000000000000009
>> [   13.515663] RDX: 0000000000000000 RSI: 0000000000000014 RDI: ffff90aa00ac4000
>> [   13.515668] RBP: ffff90aa0e0c3810 R08: 0000000000000002 R09: 0000000000000000
>> [   13.515673] R10: 0000000000000000 R11: ffffffffade4e430 R12: ffff90aa011a8800
>> [   13.515678] R13: ffff90aa0e0c3800 R14: ffff90aa011a8800 R15: ffff90aa0e0c3960
>> [   13.515683] FS:  00007fabd67feb40(0000) GS:ffff90aaf7400000(0000)
>> knlGS:0000000000000000
>> [   13.515689] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   13.515695] CR2: 00007f5689ff54c0 CR3: 0000000100f16000 CR4: 00000000001506f0
>> [   13.515700] Call Trace:
>> [   13.515704]  <TASK>
>> [   13.515710]  amd_iommu_attach_device+0x2e0/0x300
>> [   13.515719]  __iommu_attach_device+0x1b/0x90
>> [   13.515727]  iommu_attach_group+0x65/0xa0
>> [   13.515735]  amd_iommu_init_device+0x16b/0x250 [iommu_v2]
>> [   13.515747]  kfd_iommu_resume+0x4c/0x1a0 [amdgpu]
>> [   13.517094]  kgd2kfd_resume_iommu+0x12/0x30 [amdgpu]
>> [   13.518419]  kgd2kfd_device_init.cold+0x346/0x49a [amdgpu]
>> [   13.519699]  amdgpu_amdkfd_device_init+0x142/0x1d0 [amdgpu]
>> [   13.520877]  amdgpu_device_init.cold+0x19f5/0x1e21 [amdgpu]
>> [   13.522118]  ? _raw_spin_lock_irqsave+0x23/0x50
>> [   13.522126]  amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
>> [   13.523386]  amdgpu_pci_probe+0x161/0x370 [amdgpu]
>> [   13.524516]  local_pci_probe+0x41/0x80
>> [   13.524525]  pci_device_probe+0xb3/0x220
>> [   13.524533]  really_probe+0xde/0x380
>> [   13.524540]  ? pm_runtime_barrier+0x50/0x90
>> [   13.524546]  __driver_probe_device+0x78/0x170
>> [   13.524555]  driver_probe_device+0x1f/0x90
>> [   13.524560]  __driver_attach+0xce/0x1c0
>> [   13.524565]  ? __pfx___driver_attach+0x10/0x10
>> [   13.524570]  bus_for_each_dev+0x73/0xa0
>> [   13.524575]  bus_add_driver+0x1ae/0x200
>> [   13.524580]  driver_register+0x89/0xe0
>> [   13.524586]  ? __pfx_init_module+0x10/0x10 [amdgpu]
>> [   13.525819]  do_one_initcall+0x59/0x230
>> [   13.525828]  do_init_module+0x4a/0x200
>> [   13.525834]  __do_sys_init_module+0x157/0x180
>> [   13.525839]  do_syscall_64+0x5b/0x80
>> [   13.525845]  ? handle_mm_fault+0xff/0x2f0
>> [   13.525850]  ? do_user_addr_fault+0x1ef/0x690
>> [   13.525856]  ? exc_page_fault+0x70/0x170
>> [   13.525860]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>> [   13.525867] RIP: 0033:0x7fabd66cde4e
>> [   13.525872] Code: 48 8b 0d e5 5f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
>> 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d
>> 01 f0 ff ff 73 01 c3 48 8b 0d b2 5f 0c 00 f7 d8 64 89 01 48
>> [   13.525878] RSP: 002b:00007ffdd89bc6a8 EFLAGS: 00000246 ORIG_RAX:
>> 00000000000000af
>> [   13.525884] RAX: ffffffffffffffda RBX: 0000563e4d23f0a0 RCX: 00007fabd66cde4e
>> [   13.525887] RDX: 00007fabd6817453 RSI: 000000000174fb66 RDI: 00007fabd3bd4010
>> [   13.525890] RBP: 00007fabd6817453 R08: 0000563e4d237c70 R09: 00007fabd672f900
>> [   13.525893] R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000020000
>> [   13.525896] R13: 0000563e4d239060 R14: 0000000000000000 R15: 0000563e4d23e450
>> [   13.525900]  </TASK>
>> [   13.525902] ---[ end trace 0000000000000000 ]---
>> [   13.525964] ------------[ cut here ]------------
>
> This (including the following) kernel traces are triggered by the
> following code.
>
> 1698 static int pdev_pri_ats_enable(struct pci_dev *pdev)
> 1699 {
> 1700         int ret;
> 1701
> 1702         /* Only allow access to user-accessible pages */
> 1703         ret = pci_enable_pasid(pdev, 0);
> 1704         if (ret)
> 1705                 goto out_err;
>
> [--cut for short--]
>
> 1724 out_err:
> 1725         pci_disable_pri(pdev);
> 1726         pci_disable_pasid(pdev);
> 1727
> 1728         return ret;
> 1729 }
>
> pci_disable_pri() and pci_disable_pasid() are called with PCI PASID and
> PRI not enabled. There are WARN_ON()s in the pci code for such cases.

Yeah. Error path needs to be fixed.

>
> This happens in the domain attach device path. I haven't figured out why
> the failure of PASID or PRI enabling will cause the domain attach device
> to fail. And also why pci_pasid_features() and pci_pri_supported() are
> not called before pci_enable_pasid/pri().

PASID/PRI support is verified in amd_iommu_device_info().
For AMD GPUs (PASID/PRI supported devices)
- We allocate new domain called V2 domain and then attach device(s).
amd_iommu_init_device() - > iommu_attach_group()
In attach devices path :
amd_iommu_attach_device() -> attach_device()
If domain is v2 domain and device is PASID/PRI capable, then we try to
enable PASID/PRI. This is where we are hitting WARN_ON.

I think if attach device fails then we should put the device/group back to
default domain so that we don't hit these warnings.

Matt,

Can you please test below patch? (its not a fix to original issue, but to avoid
kernel warnings/traces).

>
> commit 201007ef707a ("PCI: Enable PASID only when ACS RR & UF enabled on
> upstream path") requires ACS P2P Request Redirect and Upstream
> Forwarding are enabled for the path leading to the device when enabling
> PASID because PCIe fabric routes Memory Requests based on the TLP
> address, ignoring any PASID. I guess this is the reason why
> pci_enable_pasid() returns failure and discovers above buggy code.

Can we get the lspci -vvv output. It will tell whether ACS request support.


-Vasant

-----
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cbeaab55c0db..f81ab787eee1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1702,27 +1702,26 @@ static int pdev_pri_ats_enable(struct pci_dev *pdev)
/* Only allow access to user-accessible pages */
ret = pci_enable_pasid(pdev, 0);
if (ret)
- goto out_err;
+ return ret;

/* First reset the PRI state of the device */
ret = pci_reset_pri(pdev);
if (ret)
- goto out_err;
+ goto out_pasid;

/* Enable PRI */
/* FIXME: Hardcode number of outstanding requests for now */
ret = pci_enable_pri(pdev, 32);
if (ret)
- goto out_err;
+ goto out_pasid;

ret = pci_enable_ats(pdev, PAGE_SHIFT);
- if (ret)
- goto out_err;
-
- return 0;
+ if (!ret)
+ return 0;

-out_err:
pci_disable_pri(pdev);
+
+out_pasid:
pci_disable_pasid(pdev);

return ret;
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index 864e4ffb6aa9..4228e44b4950 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -815,6 +815,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
return 0;

out_drop_group:
+ iommu_detach_group(dev_state->domain, group);
iommu_group_put(group);

out_free_domain:

2023-01-05 01:27:16

by Matt Fagnani

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

I built 6.2-rc2 with the patch applied. The same black screen problem
happened with 6.2-rc2 with the patch. I tried to use early kdump with
6.2-rc2 with the patch twice by panicking the kernel with sysrq+alt+c
after the black screen happened. The system rebooted after about 10-20
seconds both times, but no kdump and dmesg files were saved in
/var/crash. I'm attaching the lspci -vvv output as requested.


Attachments:
lspci-vvv-2.txt (33.08 kB)

2023-01-05 10:54:24

by Vasant Hegde

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

Matt,

On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> I built 6.2-rc2 with the patch applied. The same black screen problem happened
> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
> patch twice by panicking the kernel with sysrq+alt+c after the black screen
> happened. The system rebooted after about 10-20 seconds both times, but no kdump
> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
> requested.
>

Thanks for testing. As mentioned earlier I was not expecting this patch to fix
the black screen issue. It should fix kernel warnings and IOMMU page fault
related call traces. By any chance do you have the kernel boot logs?


@Baolu,
Looking into lspci output, it doesn't list ACS feature for Graphics card. So
with your fix it didn't enable PASID and hence it failed to boot.

-Vasant

2023-01-05 11:01:29

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 2023/1/5 18:27, Vasant Hegde wrote:
> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>> requested.
>>
> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
> the black screen issue. It should fix kernel warnings and IOMMU page fault
> related call traces. By any chance do you have the kernel boot logs?
>
>
> @Baolu,
> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
> with your fix it didn't enable PASID and hence it failed to boot.

So do you mind telling why does the PASID need to be enabled for the
graphic device? Or in another word, what does the graphic driver use the
PASID for?

--
Best regards,
baolu

2023-01-05 20:17:13

by Matt Fagnani

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

I booted 6.2-rc2 + the patch four times with early kdump enabled and
panicked the kernel. There weren't any kdump or dmesg files saved to
/var/crash though. Nothing showed up in the journal from boots where the
problem happened. The amdgpu crash happened before systemd-journald
started from what I could tell. I tried to rebuild
/boot/initramfs-6.2.0-rc2+kdump.img with amd_iommu=off added to the
kernel command line with dracut, but an error that the kdumpbase module
couldn't be found was shown. I read that a different dump capture kernel
could be used with kdump, but I haven't figured out how to use that with
early kdump yet. If anyone has ideas how to get the kdump and dmesg log,
let me know. Thanks.

2023-01-06 07:31:21

by Matt Fagnani

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

I booted 6.2-rc2 + patch with rd.driver.blacklist=amdgpu on the kernel
command line to prevent amdgpu from being started while the initramfs
was in use. The black screen problem happened later in the boot. I
pressed sysrq+alt+s,u,b to do an emergency sync, remount read-only, and
reboot. The journal for that boot was shown on the next boot. The two
warnings which I previously reported weren't shown in the journal, but
the same null pointer dereference which made amdgpu crash happened. I'm
attaching the kernel log from the journal of that boot.


Attachments:
6.2-rc2-patch-blacklist-amdgpu-journalctl-b-1-k-1.txt (106.00 kB)

2023-01-06 14:52:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [regression, bisected, pci /iommu] Bug 216865 - Black screen whe n amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
> Matt,
>
> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > I built 6.2-rc2 with the patch applied. The same black screen problem happened
> > with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
> > patch twice by panicking the kernel with sysrq+alt+c after the black screen
> > happened. The system rebooted after about 10-20 seconds both times, but no kdump
> > and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
> > requested.
> >
>
> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
> the black screen issue. It should fix kernel warnings and IOMMU page fault
> related call traces. By any chance do you have the kernel boot logs?
>
>
> @Baolu,
> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
> with your fix it didn't enable PASID and hence it failed to boot.

The ACS checks being done are feature of the path not the end point or
root port.

If we are expecting ACS on the end port then it is just a bug in how
the test was written.. The test should be a NOP because there are no
switches in this topology.

Looking at it, this seems to just be because pci_enable_pasid is
calling pci_acs_path_enabled wrong, the only other user is here:

for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
if (!bus->self)
continue;

if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
break;

pdev = bus->self;

group = iommu_group_get(&pdev->dev);
if (group)
return group;
}

And notice it is calling it on pdev->bus not on pdev itself which
naturally excludes the end point from the ACS validation.

So try something like:

if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))

(and probably need to check for null ?)

Jason

2023-01-07 02:55:22

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 1/6/2023 10:14 PM, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
>> Matt,
>>
>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>>> requested.
>>>
>>
>> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
>> the black screen issue. It should fix kernel warnings and IOMMU page fault
>> related call traces. By any chance do you have the kernel boot logs?
>>
>>
>> @Baolu,
>> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
>> with your fix it didn't enable PASID and hence it failed to boot.
>
> The ACS checks being done are feature of the path not the end point or
> root port.
>
> If we are expecting ACS on the end port then it is just a bug in how
> the test was written.. The test should be a NOP because there are no
> switches in this topology.
>
> Looking at it, this seems to just be because pci_enable_pasid is
> calling pci_acs_path_enabled wrong, the only other user is here:
>
> for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
> if (!bus->self)
> continue;
>
> if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
> break;
>
> pdev = bus->self;
>
> group = iommu_group_get(&pdev->dev);
> if (group)
> return group;
> }
>
> And notice it is calling it on pdev->bus not on pdev itself which
> naturally excludes the end point from the ACS validation.
>
> So try something like:
>
> if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
>
> (and probably need to check for null ?)

Yeah! This really is a misuse of pci_acs_path_enabled().

But if @pdev is an endpoint of a multiple function device, perhaps we
still need to check acs on it?

--
Best regards,
baolu

2023-01-09 13:47:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [regression, bisected, pci /iommu] Bug 216865 - Black screen whe n amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

On Sat, Jan 07, 2023 at 10:44:46AM +0800, Baolu Lu wrote:
> On 1/6/2023 10:14 PM, Jason Gunthorpe wrote:
> > On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
> > > Matt,
> > >
> > > On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > > > I built 6.2-rc2 with the patch applied. The same black screen problem happened
> > > > with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
> > > > patch twice by panicking the kernel with sysrq+alt+c after the black screen
> > > > happened. The system rebooted after about 10-20 seconds both times, but no kdump
> > > > and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
> > > > requested.
> > > >
> > >
> > > Thanks for testing. As mentioned earlier I was not expecting this patch to fix
> > > the black screen issue. It should fix kernel warnings and IOMMU page fault
> > > related call traces. By any chance do you have the kernel boot logs?
> > >
> > >
> > > @Baolu,
> > > Looking into lspci output, it doesn't list ACS feature for Graphics card. So
> > > with your fix it didn't enable PASID and hence it failed to boot.
> >
> > The ACS checks being done are feature of the path not the end point or
> > root port.
> >
> > If we are expecting ACS on the end port then it is just a bug in how
> > the test was written.. The test should be a NOP because there are no
> > switches in this topology.
> >
> > Looking at it, this seems to just be because pci_enable_pasid is
> > calling pci_acs_path_enabled wrong, the only other user is here:
> >
> > for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
> > if (!bus->self)
> > continue;
> >
> > if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
> > break;
> >
> > pdev = bus->self;
> >
> > group = iommu_group_get(&pdev->dev);
> > if (group)
> > return group;
> > }
> >
> > And notice it is calling it on pdev->bus not on pdev itself which
> > naturally excludes the end point from the ACS validation.
> >
> > So try something like:
> >
> > if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
> >
> > (and probably need to check for null ?)
>
> Yeah! This really is a misuse of pci_acs_path_enabled().
>
> But if @pdev is an endpoint of a multiple function device, perhaps we
> still need to check acs on it?

Ah, I don't know anything about what this means from a spec
perspective.

Certainly if a function can internalize MMIO and loop it back to
another function then it surely is not OK for PASID either, nor should
those functions be in different iommu groups.

So, either this never happens for some spec reason, or the test in the
iommu code forming groups is incorrect.

Jason

2023-01-10 05:36:58

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 2023/1/9 21:43, Jason Gunthorpe wrote:
> On Sat, Jan 07, 2023 at 10:44:46AM +0800, Baolu Lu wrote:
>> On 1/6/2023 10:14 PM, Jason Gunthorpe wrote:
>>> On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
>>>> Matt,
>>>>
>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>>>>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>>>>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>>>>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>>>>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>>>>> requested.
>>>>>
>>>>
>>>> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
>>>> the black screen issue. It should fix kernel warnings and IOMMU page fault
>>>> related call traces. By any chance do you have the kernel boot logs?
>>>>
>>>>
>>>> @Baolu,
>>>> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
>>>> with your fix it didn't enable PASID and hence it failed to boot.
>>>
>>> The ACS checks being done are feature of the path not the end point or
>>> root port.
>>>
>>> If we are expecting ACS on the end port then it is just a bug in how
>>> the test was written.. The test should be a NOP because there are no
>>> switches in this topology.
>>>
>>> Looking at it, this seems to just be because pci_enable_pasid is
>>> calling pci_acs_path_enabled wrong, the only other user is here:
>>>
>>> for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
>>> if (!bus->self)
>>> continue;
>>>
>>> if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
>>> break;
>>>
>>> pdev = bus->self;
>>>
>>> group = iommu_group_get(&pdev->dev);
>>> if (group)
>>> return group;
>>> }
>>>
>>> And notice it is calling it on pdev->bus not on pdev itself which
>>> naturally excludes the end point from the ACS validation.
>>>
>>> So try something like:
>>>
>>> if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
>>>
>>> (and probably need to check for null ?)
>>
>> Yeah! This really is a misuse of pci_acs_path_enabled().
>>
>> But if @pdev is an endpoint of a multiple function device, perhaps we
>> still need to check acs on it?
>
> Ah, I don't know anything about what this means from a spec
> perspective.
>
> Certainly if a function can internalize MMIO and loop it back to
> another function then it surely is not OK for PASID either, nor should
> those functions be in different iommu groups.
>
> So, either this never happens for some spec reason, or the test in the
> iommu code forming groups is incorrect.

The pci_device_group() path handles this like below:

/*
* For multifunction devices which are not isolated from each other, find
* all the other non-isolated functions and look for existing groups. For
* each function, we also need to look for aliases to or from other devices
* that may already have a group.
*/
static struct iommu_group *get_pci_function_alias_group(struct pci_dev
*pdev,
unsigned long
*devfns)
{
struct pci_dev *tmp = NULL;
struct iommu_group *group;

if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
return NULL;

It seems that all devices of an MFD shares a single iommu group if
there lacks ACS control.

--
Best regards,
baolu

2023-01-10 05:58:26

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 2023/1/6 22:14, Jason Gunthorpe wrote:
> On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
>> Matt,
>>
>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>>> requested.
>>>
>> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
>> the black screen issue. It should fix kernel warnings and IOMMU page fault
>> related call traces. By any chance do you have the kernel boot logs?
>>
>>
>> @Baolu,
>> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
>> with your fix it didn't enable PASID and hence it failed to boot.
> The ACS checks being done are feature of the path not the end point or
> root port.
>
> If we are expecting ACS on the end port then it is just a bug in how
> the test was written.. The test should be a NOP because there are no
> switches in this topology.
>
> Looking at it, this seems to just be because pci_enable_pasid is
> calling pci_acs_path_enabled wrong, the only other user is here:
>
> for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
> if (!bus->self)
> continue;
>
> if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
> break;
>
> pdev = bus->self;
>
> group = iommu_group_get(&pdev->dev);
> if (group)
> return group;
> }
>
> And notice it is calling it on pdev->bus not on pdev itself which
> naturally excludes the end point from the ACS validation.
>
> So try something like:
>
> if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
>
> (and probably need to check for null ?)

Hi Matt,

Do you mind helping to test below change? No other change needed.

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index f9cc2e10b676..48f34cc996e4 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -382,8 +382,15 @@ int pci_enable_pasid(struct pci_dev *pdev, int
features)
if (!pasid)
return -EINVAL;

- if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
- return -EINVAL;
+ if (pdev->multifunction) {
+ if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR |
PCI_ACS_UF))
+ return -EINVAL;
+ } else {
+ if (!pdev->bus->self ||
+ !pci_acs_path_enabled(pdev->bus->self, NULL,
+ PCI_ACS_RR | PCI_ACS_UF))
+ return -EINVAL;
+ }

pci_read_config_word(pdev, pasid + PCI_PASID_CAP, &supported);
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;

--
Best regards,
baolu

2023-01-10 08:34:37

by Matt Fagnani

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

Baolu,

I tried to apply your patch after checking out 6.2-rc3 and origin/master
but there were there the following errors.

git apply amd-iommu-amdgpu-boot-crash-2.patch
error: patch failed: drivers/pci/ats.c:382
error: drivers/pci/ats.c: patch does not apply

I manually changed drivers/pci/ats.c as shown in the patch. I built
6.2-rc3 + the patch. 6.2-rc3 with the patch had the same black screen
problem when booting. I added rd.driver.blacklist=amdgpu on the kernel
command line to prevent amdgpu from being started while the initramfs
was in use, and the black screen happened later in the boot as I
described in my previous email. The journal showed the same two warnings
and null pointer dereference which made amdgpu crash as I reported.

Thanks,

Matt


2023-01-10 13:52:27

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [regression, bisected, pci /iommu] Bug 216865 - Black screen whe n amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

On Tue, Jan 10, 2023 at 01:48:39PM +0800, Baolu Lu wrote:
> On 2023/1/6 22:14, Jason Gunthorpe wrote:
> > On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
> > > Matt,
> > >
> > > On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > > > I built 6.2-rc2 with the patch applied. The same black screen problem happened
> > > > with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
> > > > patch twice by panicking the kernel with sysrq+alt+c after the black screen
> > > > happened. The system rebooted after about 10-20 seconds both times, but no kdump
> > > > and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
> > > > requested.
> > > >
> > > Thanks for testing. As mentioned earlier I was not expecting this patch to fix
> > > the black screen issue. It should fix kernel warnings and IOMMU page fault
> > > related call traces. By any chance do you have the kernel boot logs?
> > >
> > >
> > > @Baolu,
> > > Looking into lspci output, it doesn't list ACS feature for Graphics card. So
> > > with your fix it didn't enable PASID and hence it failed to boot.
> > The ACS checks being done are feature of the path not the end point or
> > root port.
> >
> > If we are expecting ACS on the end port then it is just a bug in how
> > the test was written.. The test should be a NOP because there are no
> > switches in this topology.
> >
> > Looking at it, this seems to just be because pci_enable_pasid is
> > calling pci_acs_path_enabled wrong, the only other user is here:
> >
> > for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
> > if (!bus->self)
> > continue;
> >
> > if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
> > break;
> >
> > pdev = bus->self;
> >
> > group = iommu_group_get(&pdev->dev);
> > if (group)
> > return group;
> > }
> >
> > And notice it is calling it on pdev->bus not on pdev itself which
> > naturally excludes the end point from the ACS validation.
> >
> > So try something like:
> >
> > if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
> >
> > (and probably need to check for null ?)
>
> Hi Matt,
>
> Do you mind helping to test below change? No other change needed.
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index f9cc2e10b676..48f34cc996e4 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -382,8 +382,15 @@ int pci_enable_pasid(struct pci_dev *pdev, int
> features)
> if (!pasid)
> return -EINVAL;
>
> - if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
> - return -EINVAL;
> + if (pdev->multifunction) {
> + if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR |
> PCI_ACS_UF))
> + return -EINVAL;

The AMD device is multi-function according to the lspci, and we
already know that 'pci_acs_path_enabled' will fail on it because that
is the problem..

Actually, I remember it is supposed to be like this:

https://lore.kernel.org/linux-iommu/[email protected]/

The GPU and sound device are considered non-isolated by the group
code, presumably because of the missing ACS caps.

So, if I remember the issue, PCIe says that MemWr/Rd are routed
according to their address and ignore the PASID header.

A multifunction device is permitted to loop back DMAs one function
issues that match a MMIO BAR of another function. eg the GPU could DMA
to an MMIO address that overlaps the sound device and the function
will deliver the MMIO to the sound device not the host bridge even
though it is PASID tagged.

This is what get_pci_function_alias_group() is looking for.

Multifunction devices that do not do that are supposed to set the ACS
RR|UF bits and get_pci_function_alias_group()/etc are supposed to
succeed.

Thus - the PCI information is telling us that the AMD GPU device does
not support PASID because it may be looping back the MMIO to the other
functions on the device and thus creating an unacceptable hole in the
PASID address space.

So - we need AMD to comment on which of these describes their GPU device:

1) Is the issue that the PCI Caps are incorrect on this device and
there is no loopback? Thus we should fix it with a quirk to correct
the caps which will naturally split the iommu group too.

2) Is the device broken and loops back PASID DMAs and we are
legimiate and correct in blocking PASID? So far AMD just got lucky
that no user had a SVA that overlaps with MMIO? Seems unlikely

3) Is the device odd in that it doesn't loop back PASID tagged DMAs,
but does loop untagged? I would say this is non-compliant and PCI
provides no way to describe this. But we should again quirk it to
allow the PASID to be enabled but keep the group separated.

Alex/Christian/Pan - can you please find out? The HW is:

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev ca) (prog-if 00 [VGA controller])
DeviceName: ATI EG BROADWAY
Subsystem: Hewlett-Packard Company Device 8332
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
Subsystem: Hewlett-Packard Company Device 8332

https://lore.kernel.org/all/[email protected]/

> + } else {
> + if (!pdev->bus->self ||
> + !pci_acs_path_enabled(pdev->bus->self, NULL,
> + PCI_ACS_RR | PCI_ACS_UF))
> + return -EINVAL;
> + }

Why would these be exclusive? Both the path and endpoint needs to be
checked

Jason

2023-01-10 14:17:55

by Christian König

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

Am 10.01.23 um 14:25 schrieb Jason Gunthorpe:
> On Tue, Jan 10, 2023 at 01:48:39PM +0800, Baolu Lu wrote:
>> On 2023/1/6 22:14, Jason Gunthorpe wrote:
>>> On Thu, Jan 05, 2023 at 03:57:28PM +0530, Vasant Hegde wrote:
>>>> Matt,
>>>>
>>>> On 1/5/2023 6:39 AM, Matt Fagnani wrote:
>>>>> I built 6.2-rc2 with the patch applied. The same black screen problem happened
>>>>> with 6.2-rc2 with the patch. I tried to use early kdump with 6.2-rc2 with the
>>>>> patch twice by panicking the kernel with sysrq+alt+c after the black screen
>>>>> happened. The system rebooted after about 10-20 seconds both times, but no kdump
>>>>> and dmesg files were saved in /var/crash. I'm attaching the lspci -vvv output as
>>>>> requested.
>>>>>
>>>> Thanks for testing. As mentioned earlier I was not expecting this patch to fix
>>>> the black screen issue. It should fix kernel warnings and IOMMU page fault
>>>> related call traces. By any chance do you have the kernel boot logs?
>>>>
>>>>
>>>> @Baolu,
>>>> Looking into lspci output, it doesn't list ACS feature for Graphics card. So
>>>> with your fix it didn't enable PASID and hence it failed to boot.
>>> The ACS checks being done are feature of the path not the end point or
>>> root port.
>>>
>>> If we are expecting ACS on the end port then it is just a bug in how
>>> the test was written.. The test should be a NOP because there are no
>>> switches in this topology.
>>>
>>> Looking at it, this seems to just be because pci_enable_pasid is
>>> calling pci_acs_path_enabled wrong, the only other user is here:
>>>
>>> for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
>>> if (!bus->self)
>>> continue;
>>>
>>> if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
>>> break;
>>>
>>> pdev = bus->self;
>>>
>>> group = iommu_group_get(&pdev->dev);
>>> if (group)
>>> return group;
>>> }
>>>
>>> And notice it is calling it on pdev->bus not on pdev itself which
>>> naturally excludes the end point from the ACS validation.
>>>
>>> So try something like:
>>>
>>> if (!pci_acs_path_enabled(pdev->bus->self, NULL, PCI_ACS_RR | PCI_ACS_UF))
>>>
>>> (and probably need to check for null ?)
>> Hi Matt,
>>
>> Do you mind helping to test below change? No other change needed.
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index f9cc2e10b676..48f34cc996e4 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -382,8 +382,15 @@ int pci_enable_pasid(struct pci_dev *pdev, int
>> features)
>> if (!pasid)
>> return -EINVAL;
>>
>> - if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
>> - return -EINVAL;
>> + if (pdev->multifunction) {
>> + if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR |
>> PCI_ACS_UF))
>> + return -EINVAL;
> The AMD device is multi-function according to the lspci, and we
> already know that 'pci_acs_path_enabled' will fail on it because that
> is the problem..
>
> Actually, I remember it is supposed to be like this:
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flinux-iommu%2FYgpb6CxmTdUHiN50%408bytes.org%2F&data=05%7C01%7Cchristian.koenig%40amd.com%7Cb45e8c5a24394d66ae2908daf30e3802%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638089539666187724%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=vf9QsDFqp9s1NUxuP5iMsQJn1R0K9tVRTImTR6uZWAE%3D&reserved=0
>
> The GPU and sound device are considered non-isolated by the group
> code, presumably because of the missing ACS caps.
>
> So, if I remember the issue, PCIe says that MemWr/Rd are routed
> according to their address and ignore the PASID header.
>
> A multifunction device is permitted to loop back DMAs one function
> issues that match a MMIO BAR of another function. eg the GPU could DMA
> to an MMIO address that overlaps the sound device and the function
> will deliver the MMIO to the sound device not the host bridge even
> though it is PASID tagged.
>
> This is what get_pci_function_alias_group() is looking for.
>
> Multifunction devices that do not do that are supposed to set the ACS
> RR|UF bits and get_pci_function_alias_group()/etc are supposed to
> succeed.
>
> Thus - the PCI information is telling us that the AMD GPU device does
> not support PASID because it may be looping back the MMIO to the other
> functions on the device and thus creating an unacceptable hole in the
> PASID address space.
>
> So - we need AMD to comment on which of these describes their GPU device:
>
> 1) Is the issue that the PCI Caps are incorrect on this device and
> there is no loopback? Thus we should fix it with a quirk to correct
> the caps which will naturally split the iommu group too.
>
> 2) Is the device broken and loops back PASID DMAs and we are
> legimiate and correct in blocking PASID? So far AMD just got lucky
> that no user had a SVA that overlaps with MMIO? Seems unlikely
>
> 3) Is the device odd in that it doesn't loop back PASID tagged DMAs,
> but does loop untagged? I would say this is non-compliant and PCI
> provides no way to describe this. But we should again quirk it to
> allow the PASID to be enabled but keep the group separated.

Mhm, I don't have a Kabini at hand but I have a Raven and there I see on
the GPU:

    Capabilities: [2a0 v1] Access Control Services
        ACSCap:    SrcValid- TransBlk- ReqRedir- CmpltRedir-
UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl:    SrcValid- TransBlk- ReqRedir- CmpltRedir-
UpstreamFwd- EgressCtrl- DirectTrans-

    Capabilities: [2b0 v1] Address Translation Service (ATS)
        ATSCap:    Invalidate Queue Depth: 00
        ATSCtl:    Enable+, Smallest Translation Unit: 00

On the bridge:

    Capabilities: [2a0 v1] Access Control Services
        ACSCap:    SrcValid+ TransBlk+ ReqRedir- CmpltRedir-
UpstreamFwd- EgressCtrl- DirectTrans-
        ACSCtl:    SrcValid+ TransBlk- ReqRedir- CmpltRedir-
UpstreamFwd- EgressCtrl- DirectTrans-

And I'm like 99% sure that Kabini/Wani should be identical to that.

Since this is a device integrated in the CPU it could be that the
ACS/ATS functionalities are controlled by the BIOS and can be
enabled/disabled there. But this should always enable/disable both.

Regards,
Christian.

>
> Alex/Christian/Pan - can you please find out? The HW is:
>
> 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev ca) (prog-if 00 [VGA controller])
> DeviceName: ATI EG BROADWAY
> Subsystem: Hewlett-Packard Company Device 8332
> 00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
> Subsystem: Hewlett-Packard Company Device 8332
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F223ee6d6-70ea-1d53-8bc2-2d22201d8dde%40bell.net%2F&data=05%7C01%7Cchristian.koenig%40amd.com%7Cb45e8c5a24394d66ae2908daf30e3802%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638089539666187724%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=TMB3pXS0eUKPZhcRCvUIxzvJvPYosvv3ofrFKZx7b%2FI%3D&reserved=0
>
>> + } else {
>> + if (!pdev->bus->self ||
>> + !pci_acs_path_enabled(pdev->bus->self, NULL,
>> + PCI_ACS_RR | PCI_ACS_UF))
>> + return -EINVAL;
>> + }
> Why would these be exclusive? Both the path and endpoint needs to be
> checked
>
> Jason

2023-01-10 15:59:49

by Felix Kuehling

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

Am 2023-01-10 um 08:45 schrieb Christian König:
> And I'm like 99% sure that Kabini/Wani should be identical to that.

Kabini is not supported by KFD. There should be no calls to
amd_iommu_... functions on Kabini, at least not from kfd_iommu.c. And
I'm not aware of any other callers in amdgpu.ko.

Regards,
  Felix


2023-01-10 16:18:11

by Vasant Hegde

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

Matt,


On 1/6/2023 12:58 PM, Matt Fagnani wrote:
> I booted 6.2-rc2 + patch with rd.driver.blacklist=amdgpu on the kernel command
> line to prevent amdgpu from being started while the initramfs was in use. The
> black screen problem happened later in the boot. I pressed sysrq+alt+s,u,b to do
> an emergency sync, remount read-only, and reboot. The journal for that boot was
> shown on the next boot. The two warnings which I previously reported weren't
> shown in the journal, but the same null pointer dereference which made amdgpu
> crash happened. I'm attaching the kernel log from the journal of that boot.
>

Thanks for your effort to get boot log. This is helpful.

Looking into the code further,
iommu_detach_group() didn't attach devices back to default_domain. So IOMMU
point of view device group was left in inconsistent state. This resulted in
IOMMU throwing page fault errors and amd IOMMU event handler code always assumes
that domain is setup properly. That resulted in below NULL pointer dereference
issue.

Jan 06 02:07:52 kernel: BUG: kernel NULL pointer dereference, address:
0000000000000058
Jan 06 02:07:52 kernel: #PF: supervisor read access in kernel mode
Jan 06 02:07:53 kernel: #PF: error_code(0x0000) - not-present page
Jan 06 02:07:53 kernel: PGD 0 P4D 0
Jan 06 02:07:53 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jan 06 02:07:53 kernel: CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted
6.2.0-rc2+ #89
Jan 06 02:07:53 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52
12/03/2019
Jan 06 02:07:53 kernel: RIP: 0010:report_iommu_fault+0x11/0x90

Ideally if domain attach fails (in this case its because pasid capability check
returned error) we should put devices back to original domain.. so that it can
continue without PASID capability.

I have a patch to handle these error conditions (not the fix for original
issue). I will try to post it soon.

-Vasant

2023-01-10 17:07:51

by Vasant Hegde

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled



On 1/10/2023 9:38 PM, Vasant Hegde wrote:
> Matt,
>
>
> On 1/6/2023 12:58 PM, Matt Fagnani wrote:
>> I booted 6.2-rc2 + patch with rd.driver.blacklist=amdgpu on the kernel command
>> line to prevent amdgpu from being started while the initramfs was in use. The
>> black screen problem happened later in the boot. I pressed sysrq+alt+s,u,b to do
>> an emergency sync, remount read-only, and reboot. The journal for that boot was
>> shown on the next boot. The two warnings which I previously reported weren't
>> shown in the journal, but the same null pointer dereference which made amdgpu
>> crash happened. I'm attaching the kernel log from the journal of that boot.
>>
>
> Thanks for your effort to get boot log. This is helpful.
>
> Looking into the code further,
> iommu_detach_group() didn't attach devices back to default_domain.

... because iommu_detach_group() expects new domain should be different from
group->domain.

-Vasant


> So IOMMU
> point of view device group was left in inconsistent state. This resulted in
> IOMMU throwing page fault errors and amd IOMMU event handler code always assumes
> that domain is setup properly. That resulted in below NULL pointer dereference
> issue.
>
> Jan 06 02:07:52 kernel: BUG: kernel NULL pointer dereference, address:
> 0000000000000058
> Jan 06 02:07:52 kernel: #PF: supervisor read access in kernel mode
> Jan 06 02:07:53 kernel: #PF: error_code(0x0000) - not-present page
> Jan 06 02:07:53 kernel: PGD 0 P4D 0
> Jan 06 02:07:53 kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> Jan 06 02:07:53 kernel: CPU: 2 PID: 56 Comm: irq/24-AMD-Vi Not tainted
> 6.2.0-rc2+ #89
> Jan 06 02:07:53 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52
> 12/03/2019
> Jan 06 02:07:53 kernel: RIP: 0010:report_iommu_fault+0x11/0x90
>
> Ideally if domain attach fails (in this case its because pasid capability check
> returned error) we should put devices back to original domain.. so that it can
> continue without PASID capability.
>
> I have a patch to handle these error conditions (not the fix for original
> issue). I will try to post it soon.
>
> -Vasant

2023-01-11 03:36:30

by Lu Baolu

[permalink] [raw]
Subject: Re: [regression, bisected, pci/iommu] Bug  216865 - Black screen when amdgpu started during 6.2-rc1 boot w ith AMD IOMMU enabled

On 2023/1/10 21:25, Jason Gunthorpe wrote:
>> + } else {
>> + if (!pdev->bus->self ||
>> + !pci_acs_path_enabled(pdev->bus->self, NULL,
>> + PCI_ACS_RR | PCI_ACS_UF))
>> + return -EINVAL;
>> + }
> Why would these be exclusive? Both the path and endpoint needs to be
> checked

If the device is not an MFD, do we still need to check the ACS on it?
Perhaps I didn't get your point correctly.

--
Best regards,
baolu

2023-01-11 13:15:38

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [regression, bisected, pci /iommu] Bug 216865 - Black screen whe n amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled

On Wed, Jan 11, 2023 at 11:16:32AM +0800, Baolu Lu wrote:
> On 2023/1/10 21:25, Jason Gunthorpe wrote:
> > > + } else {
> > > + if (!pdev->bus->self ||
> > > + !pci_acs_path_enabled(pdev->bus->self, NULL,
> > > + PCI_ACS_RR | PCI_ACS_UF))
> > > + return -EINVAL;
> > > + }
> > Why would these be exclusive? Both the path and endpoint needs to be
> > checked
>
> If the device is not an MFD, do we still need to check the ACS on it?
> Perhaps I didn't get your point correctly.

It always needs to check the path

Jason