2024-04-13 22:04:41

by Eric Wagner

[permalink] [raw]
Subject: Kernel 6.7 regression doesn't boot if using AMD eGPU

On my Thinkpad T14s G3 AMD (Ryzen 7 6850U) laptop connected to an AMD RX
580 in Akitio Node Thunderbolt 3 eGPU. Booting with the eGPU connected
hangs on kernels 6.7 and 6.8, but worked on 6.6. For debugging, I find that
adding the kernel parameter amd_iommu=off seems to fix the issue and allows
booting with the eGPU on 6.7.

I tried bisecting the issue between 6.6 and 6.7 and ended up with:
"e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit" in the
attached. This seems to indicate an amd iommu issue.

Two others also reported the same issue on AMD Ryzen 7 7840 with AMD RX
6000 connected as eGPU (https://gitlab.freedesktop.org/drm/amd/-/issues/3182
).

Let me know if you need more information.


Attachments:
bisect.log (6.15 kB)

2024-04-14 00:02:07

by Bagas Sanjaya

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Sat, Apr 13, 2024 at 06:04:12PM -0400, Eric Wagner wrote:
> On my Thinkpad T14s G3 AMD (Ryzen 7 6850U) laptop connected to an AMD RX
> 580 in Akitio Node Thunderbolt 3 eGPU. Booting with the eGPU connected
> hangs on kernels 6.7 and 6.8, but worked on 6.6. For debugging, I find that
> adding the kernel parameter amd_iommu=off seems to fix the issue and allows
> booting with the eGPU on 6.7.
>
> I tried bisecting the issue between 6.6 and 6.7 and ended up with:
> "e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit" in the
> attached. This seems to indicate an amd iommu issue.
>
> Two others also reported the same issue on AMD Ryzen 7 7840 with AMD RX
> 6000 connected as eGPU (https://gitlab.freedesktop.org/drm/amd/-/issues/3182
> ).
>
> Let me know if you need more information.

> Bisecting: 366 revisions left to test after this (roughly 9 steps)
> [74e9347ebc5be452935fe4f3eddb150aa5a6f4fe] Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
> Bisecting: 182 revisions left to test after this (roughly 8 steps)
> [f6176471542d991137543af2ef1c18dae3286079] Merge tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> Bisecting: 87 revisions left to test after this (roughly 7 steps)
> [fe3cfe869d5e0453754cf2b4c75110276b5e8527] Merge tag 'phy-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
> Bisecting: 43 revisions left to test after this (roughly 6 steps)
> [c76c067e488ccd55734c3e750799caf2c5956db6] s390/pci: Use dma-iommu layer
> Bisecting: 27 revisions left to test after this (roughly 5 steps)
> [aa5cabc4ce8e6b45d170d162dc54b1bac1767c47] Merge tag 'arm-smmu-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
> Bisecting: 14 revisions left to test after this (roughly 4 steps)
> [bbc70e0aec287e164344b1a071bd46466a4f29b3] iommu/dart: Remove the force_bypass variable
> Bisecting: 9 revisions left to test after this (roughly 3 steps)
> [e82c175e63229ea495a0a0b5305a98b5b6ee5346] Revert "iommu/vt-d: Remove unused function"
> Bisecting: 5 revisions left to test after this (roughly 2 steps)
> [92bce97f0c341d3037b0f364b6839483f6a41cae] s390/pci: Fix reset of IOMMU software counters
> Bisecting: 3 revisions left to test after this (roughly 2 steps)
> [3613047280ec42a4e1350fdc1a6dd161ff4008cc] Merge tag 'v6.6-rc7' into core
> Bisecting: 2 revisions left to test after this (roughly 1 step)
> [f7da9c081517daba70f9f9342e09d7a6322ba323] iommu/tegra-smmu: Drop unnecessary error check for for debugfs_create_dir()
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [9e13ec61de2a51195b122a79461431d8cb99d7b5] iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3] iommu: Avoid unnecessary cache invalidations
> e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit
> commit e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> Merge: 6e6c6d6bc6 f7da9c0815 aa5cabc4ce 9e13ec61de e82c175e63 cedc811c76 3613047280 92bce97f0c
> Author: Joerg Roedel <[email protected]>
> Date: Fri Oct 27 09:13:40 2023 +0200
>
> Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd', 'core' and 's390' into next
>

Also Cc: regressions ML.

--
An old man doll... just what I always wanted! - Clara


Attachments:
(No filename) (3.43 kB)
signature.asc (235.00 B)
Download all attachments

2024-04-15 08:04:40

by Joerg Roedel

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

Also adding Vasant.
On Sat, Apr 13, 2024 at 06:04:12PM -0400, Eric Wagner wrote:
> On my Thinkpad T14s G3 AMD (Ryzen 7 6850U) laptop connected to an AMD RX 580 in
> Akitio Node Thunderbolt 3 eGPU. Booting with the eGPU connected hangs on
> kernels 6.7 and 6.8, but worked on 6.6. For debugging, I find that adding the
> kernel parameter amd_iommu=off seems to fix the issue and allows booting with
> the eGPU on 6.7.

Do you have any way of getting the boot log of the hang? It is hard to
debug this without further data on where it hang and what happens
before.

Regards,

Joerg

>
> I tried bisecting the issue between 6.6 and 6.7 and ended up with:
> "e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit" in the
> attached. This seems to indicate an amd iommu issue.
>
> Two others also reported the same issue on AMD Ryzen 7 7840 with AMD RX 6000
> connected as eGPU (https://gitlab.freedesktop.org/drm/amd/-/issues/3182).
>
> Let me know if you need more information.

> Bisecting: 366 revisions left to test after this (roughly 9 steps)
> [74e9347ebc5be452935fe4f3eddb150aa5a6f4fe] Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
> Bisecting: 182 revisions left to test after this (roughly 8 steps)
> [f6176471542d991137543af2ef1c18dae3286079] Merge tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> Bisecting: 87 revisions left to test after this (roughly 7 steps)
> [fe3cfe869d5e0453754cf2b4c75110276b5e8527] Merge tag 'phy-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
> Bisecting: 43 revisions left to test after this (roughly 6 steps)
> [c76c067e488ccd55734c3e750799caf2c5956db6] s390/pci: Use dma-iommu layer
> Bisecting: 27 revisions left to test after this (roughly 5 steps)
> [aa5cabc4ce8e6b45d170d162dc54b1bac1767c47] Merge tag 'arm-smmu-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
> Bisecting: 14 revisions left to test after this (roughly 4 steps)
> [bbc70e0aec287e164344b1a071bd46466a4f29b3] iommu/dart: Remove the force_bypass variable
> Bisecting: 9 revisions left to test after this (roughly 3 steps)
> [e82c175e63229ea495a0a0b5305a98b5b6ee5346] Revert "iommu/vt-d: Remove unused function"
> Bisecting: 5 revisions left to test after this (roughly 2 steps)
> [92bce97f0c341d3037b0f364b6839483f6a41cae] s390/pci: Fix reset of IOMMU software counters
> Bisecting: 3 revisions left to test after this (roughly 2 steps)
> [3613047280ec42a4e1350fdc1a6dd161ff4008cc] Merge tag 'v6.6-rc7' into core
> Bisecting: 2 revisions left to test after this (roughly 1 step)
> [f7da9c081517daba70f9f9342e09d7a6322ba323] iommu/tegra-smmu: Drop unnecessary error check for for debugfs_create_dir()
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [9e13ec61de2a51195b122a79461431d8cb99d7b5] iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3] iommu: Avoid unnecessary cache invalidations
> e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit
> commit e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> Merge: 6e6c6d6bc6 f7da9c0815 aa5cabc4ce 9e13ec61de e82c175e63 cedc811c76 3613047280 92bce97f0c
> Author: Joerg Roedel <[email protected]>
> Date: Fri Oct 27 09:13:40 2023 +0200
>
> Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd', 'core' and 's390' into next
>
> Documentation/admin-guide/kernel-parameters.txt | 9 +-
> .../devicetree/bindings/iommu/arm,smmu.yaml | 2 +
> arch/arm/configs/multi_v7_defconfig | 1 -
> arch/arm/configs/tegra_defconfig | 1 -
> arch/powerpc/kernel/iommu.c | 53 +-
> arch/s390/include/asm/pci.h | 11 -
> arch/s390/include/asm/pci_clp.h | 3 +
> arch/s390/include/asm/pci_dma.h | 121 +--
> arch/s390/pci/Makefile | 2 +-
> arch/s390/pci/pci.c | 35 +-
> arch/s390/pci/pci_bus.c | 5 -
> arch/s390/pci/pci_debug.c | 12 +-
> arch/s390/pci/pci_dma.c | 746 ---------------
> arch/s390/pci/pci_event.c | 17 +-
> arch/s390/pci/pci_sysfs.c | 19 +-
> drivers/iommu/Kconfig | 15 +-
> drivers/iommu/Makefile | 1 -
> drivers/iommu/amd/Kconfig | 9 -
> drivers/iommu/amd/Makefile | 1 -
> drivers/iommu/amd/amd_iommu.h | 35 +-
> drivers/iommu/amd/amd_iommu_types.h | 52 +-
> drivers/iommu/amd/init.c | 117 +--
> drivers/iommu/amd/io_pgtable_v2.c | 8 +-
> drivers/iommu/amd/iommu.c | 577 +++++-------
> drivers/iommu/amd/iommu_v2.c | 996 ---------------------
> drivers/iommu/apple-dart.c | 138 +--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 71 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 251 +++---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 17 +-
> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 2 +
> drivers/iommu/arm/arm-smmu/qcom_iommu.c | 45 +-
> drivers/iommu/dma-iommu.c | 200 ++++-
> drivers/iommu/exynos-iommu.c | 83 +-
> drivers/iommu/fsl_pamu_domain.c | 41 +-
> drivers/iommu/intel/debugfs.c | 215 ++++-
> drivers/iommu/intel/iommu.c | 19 +-
> drivers/iommu/intel/iommu.h | 14 +
> drivers/iommu/iommu.c | 455 +++++-----
> drivers/iommu/iommufd/selftest.c | 30 +-
> drivers/iommu/iova.c | 95 +-
> drivers/iommu/ipmmu-vmsa.c | 72 +-
> drivers/iommu/msm_iommu.c | 35 +-
> drivers/iommu/mtk_iommu.c | 35 +-
> drivers/iommu/mtk_iommu_v1.c | 28 +-
> drivers/iommu/omap-iommu.c | 69 +-
> drivers/iommu/omap-iommu.h | 2 +-
> drivers/iommu/rockchip-iommu.c | 59 +-
> drivers/iommu/s390-iommu.c | 424 ++++++++-
> drivers/iommu/sprd-iommu.c | 36 +-
> drivers/iommu/sun50i-iommu.c | 80 +-
> drivers/iommu/tegra-gart.c | 371 --------
> drivers/iommu/tegra-smmu.c | 58 +-
> drivers/iommu/virtio-iommu.c | 4 +-
> drivers/memory/tegra/mc.c | 34 -
> drivers/memory/tegra/tegra20.c | 28 -
> include/linux/amd-iommu.h | 120 ---
> include/linux/iommu.h | 38 +-
> include/soc/tegra/mc.h | 26 -
> 58 files changed, 2138 insertions(+), 3905 deletions(-)
> delete mode 100644 arch/s390/pci/pci_dma.c
> delete mode 100644 drivers/iommu/amd/iommu_v2.c
> delete mode 100644 drivers/iommu/tegra-gart.c


2024-04-15 16:31:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Sat, Apr 13, 2024 at 06:04:12PM -0400, Eric Wagner wrote:
> On my Thinkpad T14s G3 AMD (Ryzen 7 6850U) laptop connected to an AMD
> RX 580 in Akitio Node Thunderbolt 3 eGPU. Booting with the eGPU
> connected hangs on kernels 6.7 and 6.8, but worked on 6.6. For
> debugging, I find that adding the kernel parameter amd_iommu=off seems
> to fix the issue and allows booting with the eGPU on 6.7.
> I tried bisecting the issue between 6.6 and 6.7 and ended up with:
> "e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit" in
> the attached. This seems to indicate an amd iommu issue.
> Two others also reported the same issue on AMD Ryzen 7 7840 with AMD RX
> 6000 connected as eGPU
> ([1]https://gitlab.freedesktop.org/drm/amd/-/issues/3182).
> Let me know if you need more information.
>
> References
>
> 1. https://gitlab.freedesktop.org/drm/amd/-/issues/3182

> Bisecting: 366 revisions left to test after this (roughly 9 steps)
> [74e9347ebc5be452935fe4f3eddb150aa5a6f4fe] Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
> Bisecting: 182 revisions left to test after this (roughly 8 steps)
> [f6176471542d991137543af2ef1c18dae3286079] Merge tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> Bisecting: 87 revisions left to test after this (roughly 7 steps)
> [fe3cfe869d5e0453754cf2b4c75110276b5e8527] Merge tag 'phy-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
> Bisecting: 43 revisions left to test after this (roughly 6 steps)
> [c76c067e488ccd55734c3e750799caf2c5956db6] s390/pci: Use dma-iommu layer
> Bisecting: 27 revisions left to test after this (roughly 5 steps)
> [aa5cabc4ce8e6b45d170d162dc54b1bac1767c47] Merge tag 'arm-smmu-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
> Bisecting: 14 revisions left to test after this (roughly 4 steps)
> [bbc70e0aec287e164344b1a071bd46466a4f29b3] iommu/dart: Remove the force_bypass variable
> Bisecting: 9 revisions left to test after this (roughly 3 steps)
> [e82c175e63229ea495a0a0b5305a98b5b6ee5346] Revert "iommu/vt-d: Remove unused function"
> Bisecting: 5 revisions left to test after this (roughly 2 steps)
> [92bce97f0c341d3037b0f364b6839483f6a41cae] s390/pci: Fix reset of IOMMU software counters
> Bisecting: 3 revisions left to test after this (roughly 2 steps)
> [3613047280ec42a4e1350fdc1a6dd161ff4008cc] Merge tag 'v6.6-rc7' into core
> Bisecting: 2 revisions left to test after this (roughly 1 step)
> [f7da9c081517daba70f9f9342e09d7a6322ba323] iommu/tegra-smmu: Drop unnecessary error check for for debugfs_create_dir()
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [9e13ec61de2a51195b122a79461431d8cb99d7b5] iommu/virtio: Add __counted_by for struct viommu_request and use struct_size()
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3] iommu: Avoid unnecessary cache invalidations
> e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit
> commit e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> Merge: 6e6c6d6bc6 f7da9c0815 aa5cabc4ce 9e13ec61de e82c175e63 cedc811c76 3613047280 92bce97f0c
> Author: Joerg Roedel <[email protected]>
> Date: Fri Oct 27 09:13:40 2023 +0200
>
> Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd', 'core' and 's390' into next

Do you have the good/bad log on this? It doesn't look like bisect
tested enough stuff to really conclude the merge is the bad thing, at
a minimum it should be testing all the bases of the merge. Do you have
--first-parent set or something?

I would test cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly. Most likely
cedc will be bad problem.

If one of them is bad then restart the bisection with that as the bad
and 6e6c6d6bc6 as the good.

(or run bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 as
the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good
without --first-parent)

Jason

2024-04-15 19:00:37

by Eric Wagner

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

Apologies if I made a mistake in the first bisect, I'm new to kernel
debugging.

I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
bisect log is attached. It ended up at the same commit as before.

I've also attached a picture of the boot screen that occurs when it hangs.
0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
problem.

On Mon, Apr 15, 2024 at 12:30 PM Jason Gunthorpe <[email protected]> wrote:

> On Sat, Apr 13, 2024 at 06:04:12PM -0400, Eric Wagner wrote:
> > On my Thinkpad T14s G3 AMD (Ryzen 7 6850U) laptop connected to an AMD
> > RX 580 in Akitio Node Thunderbolt 3 eGPU. Booting with the eGPU
> > connected hangs on kernels 6.7 and 6.8, but worked on 6.6. For
> > debugging, I find that adding the kernel parameter amd_iommu=off seems
> > to fix the issue and allows booting with the eGPU on 6.7.
> > I tried bisecting the issue between 6.6 and 6.7 and ended up with:
> > "e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit" in
> > the attached. This seems to indicate an amd iommu issue.
> > Two others also reported the same issue on AMD Ryzen 7 7840 with AMD
> RX
> > 6000 connected as eGPU
> > ([1]https://gitlab.freedesktop.org/drm/amd/-/issues/3182).
> > Let me know if you need more information.
> >
> > References
> >
> > 1. https://gitlab.freedesktop.org/drm/amd/-/issues/3182
>
> > Bisecting: 366 revisions left to test after this (roughly 9 steps)
> > [74e9347ebc5be452935fe4f3eddb150aa5a6f4fe] Merge tag
> 'loongarch-fixes-6.6-3' of git://
> git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
> > Bisecting: 182 revisions left to test after this (roughly 8 steps)
> > [f6176471542d991137543af2ef1c18dae3286079] Merge tag
> 'mtd/fixes-for-6.6-rc7' of git://
> git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
> > Bisecting: 87 revisions left to test after this (roughly 7 steps)
> > [fe3cfe869d5e0453754cf2b4c75110276b5e8527] Merge tag 'phy-fixes-6.6' of
> git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
> > Bisecting: 43 revisions left to test after this (roughly 6 steps)
> > [c76c067e488ccd55734c3e750799caf2c5956db6] s390/pci: Use dma-iommu layer
> > Bisecting: 27 revisions left to test after this (roughly 5 steps)
> > [aa5cabc4ce8e6b45d170d162dc54b1bac1767c47] Merge tag 'arm-smmu-updates'
> of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
> > Bisecting: 14 revisions left to test after this (roughly 4 steps)
> > [bbc70e0aec287e164344b1a071bd46466a4f29b3] iommu/dart: Remove the
> force_bypass variable
> > Bisecting: 9 revisions left to test after this (roughly 3 steps)
> > [e82c175e63229ea495a0a0b5305a98b5b6ee5346] Revert "iommu/vt-d: Remove
> unused function"
> > Bisecting: 5 revisions left to test after this (roughly 2 steps)
> > [92bce97f0c341d3037b0f364b6839483f6a41cae] s390/pci: Fix reset of IOMMU
> software counters
> > Bisecting: 3 revisions left to test after this (roughly 2 steps)
> > [3613047280ec42a4e1350fdc1a6dd161ff4008cc] Merge tag 'v6.6-rc7' into core
> > Bisecting: 2 revisions left to test after this (roughly 1 step)
> > [f7da9c081517daba70f9f9342e09d7a6322ba323] iommu/tegra-smmu: Drop
> unnecessary error check for for debugfs_create_dir()
> > Bisecting: 1 revision left to test after this (roughly 1 step)
> > [9e13ec61de2a51195b122a79461431d8cb99d7b5] iommu/virtio: Add
> __counted_by for struct viommu_request and use struct_size()
> > Bisecting: 0 revisions left to test after this (roughly 0 steps)
> > [6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3] iommu: Avoid unnecessary
> cache invalidations
> > e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 is the first bad commit
> > commit e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > Merge: 6e6c6d6bc6 f7da9c0815 aa5cabc4ce 9e13ec61de e82c175e63 cedc811c76
> 3613047280 92bce97f0c
> > Author: Joerg Roedel <[email protected]>
> > Date: Fri Oct 27 09:13:40 2023 +0200
> >
> > Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio',
> 'x86/vt-d', 'x86/amd', 'core' and 's390' into next
>
> Do you have the good/bad log on this? It doesn't look like bisect
> tested enough stuff to really conclude the merge is the bad thing, at
> a minimum it should be testing all the bases of the merge. Do you have
> --first-parent set or something?
>
> I would test cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly. Most likely
> cedc will be bad problem.
>
> If one of them is bad then restart the bisection with that as the bad
> and 6e6c6d6bc6 as the good.
>
> (or run bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2 as
> the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good
> without --first-parent)
>
> Jason
>


Attachments:
bisect2.log (1.90 kB)
20240415_133212.jpg (1.99 MB)
Download all attachments

2024-04-15 21:44:51

by Robin Murphy

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On 2024-04-15 7:57 pm, Eric Wagner wrote:
> Apologies if I made a mistake in the first bisect, I'm new to kernel
> debugging.
>
> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
> Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> bisect log is attached. It ended up at the same commit as before.
>
> I've also attached a picture of the boot screen that occurs when it hangs.
> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> problem.

Looks like 59ddce4418da483 probably broke things most - prior to that,
the fact that it's behind a Thunderbolt port would have always taken
precedence and forced IOMMU_DOMAIN_DMA regardless of what the driver may
have wanted to say, whereas now we ask the driver first, then complain
that it conflicts with the untrusted status and ultimately don't
configure the IOMMU at all. Meanwhile the GPU driver presumably goes on
to believe it's using dma-direct with no IOMMU present, resulting in
fireworks when its traffic reaches the IOMMU. Great :(

However the other notable thing that also happened between 6.6 and 6.7
was the removal of the AMD iommu_v2 code, so there's some possibility
that the GPU driver still may have only been working before due to that
also subverting the default domain with its own identity domain, so
whether it would actually work again with
iommu_get_default_domain_type() sorted out is yet another question... As
a first step I'd test the quick hack below, but be prepared for things
to still break slightly differently.

Cheers,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 996e79dc582d..063e1eb32fbd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1774,7 +1774,7 @@ static int iommu_get_default_domain_type(struct
iommu_group *group,
untrusted,
"Device is not trusted, but driver is overriding group %u to %s,
refusing to probe.\n",
group->id, iommu_domain_type_str(driver_type));
- return -1;
+ //return -1;
}
driver_type = IOMMU_DOMAIN_DMA;
}

2024-04-16 00:39:14

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
> On 2024-04-15 7:57 pm, Eric Wagner wrote:
> > Apologies if I made a mistake in the first bisect, I'm new to kernel
> > debugging.
> >
> > I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> > 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
> > Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> > bisect log is attached. It ended up at the same commit as before.
> >
> > I've also attached a picture of the boot screen that occurs when it hangs.
> > 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> > problem.
>
> Looks like 59ddce4418da483 probably broke things most - prior to that, the
> fact that it's behind a Thunderbolt port would have always taken precedence
> and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
> saywhereas now we ask the driver first, then complain that it conflicts
> with the untrusted status and ultimately don't configure the IOMMU
> at all.

Yes, if the driver wants to force a domain type it should be
forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
is incapable of supporting otherwise.

> Meanwhile the GPU driver presumably goes on to believe it's using dma-direct
> with no IOMMU present, resulting in fireworks when its traffic reaches the
> IOMMU. Great :(

I wonder where is the missing error handling.. iommu probe failure
should not go on to allow driver attach, there is no guarentee any DMA
works now that many iommus are booting up in BLOCKED.

> However the other notable thing that also happened between 6.6 and 6.7 was
> the removal of the AMD iommu_v2 code, so there's some possibility that the
> GPU driver still may have only been working before due to that also

Most likely it is the above change interacting with this patch when
they are both combined in the merge:

commit 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
Author: Vasant Hegde <[email protected]>
Date: Thu Sep 21 09:21:45 2023 +0000

iommu/amd: Introduce iommu_dev_data.flags to track device capabilities

@@ -2471,7 +2481,7 @@ static int amd_iommu_def_domain_type(struct device *dev)
* and require remapping.
* - SNP is enabled, because it prohibits DTE[Mode]=0.
*/
- if (dev_data->iommu_v2 &&
+ if (pdev_pasid_supported(dev_data) &&
!cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
!amd_iommu_snp_en) {
return IOMMU_DOMAIN_IDENTITY;

Which, IIRC, was intended to be temporary to work around limitations
in the DTE programming logic within the driver. Previously iommu_v2 as
a module option that Eric probably doesn't set, I guess.

The below will probably make it boot, but Vasant should check what
happens if PASID is eventually attached too.

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d35c1b8c8e65ce..f3da6a5b6cb1cb 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2758,11 +2758,16 @@ static int amd_iommu_def_domain_type(struct device *dev)
* and require remapping.
* - SNP is enabled, because it prohibits DTE[Mode]=0.
*/
- if (pdev_pasid_supported(dev_data) &&
- !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
- !amd_iommu_snp_en) {
+ if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
+ return IOMMU_DOMAIN_IDENTITY;
+
+ /*
+ * For now driver limitations prevent enabling PASID as a paging domain
+ * on the RID together.
+ */
+ if (dev_is_pci(dev) && !to_pci_dev(dev)->untrusted &&
+ pdev_pasid_supported(dev_data))
return IOMMU_DOMAIN_IDENTITY;
- }

return 0;
}

Jason

2024-04-16 10:54:25

by Vasant Hegde

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

Hi Eric,


On 4/16/2024 7:42 AM, Eric Wagner wrote:
> On Mon, Apr 15, 2024 at 5:44 PM Robin Murphy <[email protected]
> <mailto:[email protected]>> wrote:
>
> As a first step I'd test the quick hack below, but be prepared for things
> to still break slightly differently.
>
> Cheers,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 996e79dc582d..063e1eb32fbd 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1774,7 +1774,7 @@ static int iommu_get_default_domain_type(struct
> iommu_group *group,
>                                 untrusted,
>                                 "Device is not trusted, but driver is
> overriding group %u to %s,
> refusing to probe.\n",
>                                 group->id, iommu_domain_type_str(driver_type));
> -                       return -1;
> +                       //return -1;
>                 }
>                 driver_type = IOMMU_DOMAIN_DMA;
>         }
>
> This worked and got the system booting for me.

Thanks for testing.

IIUC eGPU is behind Thunderbolt and hence IOMMU treated it as 'untrusted device'.

AMD driver tries to allocate "IDENTITY" domain for GPU devices.
iommu_get_default_domain_type() return -1 as it expects IOMMU_DOMAIN_DMA for
untrusted device.

Can you please attach lspci -vvv output? Also /proc/cmdline output.


>
> On Mon, Apr 15, 2024 at 8:39 PM Jason Gunthorpe <[email protected]
> <mailto:[email protected]>> wrote:
>
> On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
> > On 2024-04-15 7:57 pm, Eric Wagner wrote:
> > > Apologies if I made a mistake in the first bisect, I'm new to kernel
> > > debugging.
> > >
> > > I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> > > 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were
> good.
> > > Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > > as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> > > bisect log is attached. It ended up at the same commit as before.
> > >
> > > I've also attached a picture of the boot screen that occurs when it hangs.
> > > 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> > > problem.
> >
> > Looks like 59ddce4418da483 probably broke things most - prior to that, the
> > fact that it's behind a Thunderbolt port would have always taken precedence
> > and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
> > saywhereas now we ask the driver first, then complain that it conflicts
> > with the untrusted status and ultimately don't configure the IOMMU
> > at all.
>
> Yes, if the driver wants to force a domain type it should be
> forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
> is incapable of supporting otherwise.


@Jason,

Looks like before commit 59ddce4418da483, core IOMMU layer was enforcing
'IOMMU_DOMAIN_DMA' for untrusted device. If its trusted device then it was
letting HW IOMMU driver to pick best domain type.

Did we change that flow? Are we expecting HW IOMMU driver to handler untrusted
devices as well?

>
> > Meanwhile the GPU driver presumably goes on to believe it's using dma-direct
> > with no IOMMU present, resulting in fireworks when its traffic reaches the
> > IOMMU. Great :(
>
> I wonder where is the missing error handling.. iommu probe failure
> should not go on to allow driver attach, there is no guarentee any DMA
> works now that many iommus are booting up in BLOCKED.


Looking into code path, in failure path we cleanup device, not group. May be
that's causing issue? Not sure where it failed. If I manage to find some system
I will try to debug further.



>
> > However the other notable thing that also happened between 6.6 and 6.7 was
> > the removal of the AMD iommu_v2 code, so there's some possibility that the
> > GPU driver still may have only been working before due to that also
>
> Most likely it is the above change interacting with this patch when
> they are both combined in the merge:
>
> commit 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
> Author: Vasant Hegde <[email protected] <mailto:[email protected]>>
> Date:   Thu Sep 21 09:21:45 2023 +0000
>
>     iommu/amd: Introduce iommu_dev_data.flags to track device capabilities
>
> @@ -2471,7 +2481,7 @@ static int amd_iommu_def_domain_type(struct device *dev)
>          *    and require remapping.
>          *  - SNP is enabled, because it prohibits DTE[Mode]=0.
>          */
> -       if (dev_data->iommu_v2 &&
> +       if (pdev_pasid_supported(dev_data) &&
>             !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
>             !amd_iommu_snp_en) {
>                 return IOMMU_DOMAIN_IDENTITY;
>
> Which, IIRC, was intended to be temporary to work around limitations
> in the DTE programming logic within the driver. Previously iommu_v2 as
> a module option that Eric probably doesn't set, I guess.
>
> The below will probably make it boot, but Vasant should check what
> happens if PASID is eventually attached too.
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index d35c1b8c8e65ce..f3da6a5b6cb1cb 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -2758,11 +2758,16 @@ static int amd_iommu_def_domain_type(struct device *dev)
>          *    and require remapping.
>          *  - SNP is enabled, because it prohibits DTE[Mode]=0.
>          */
> -       if (pdev_pasid_supported(dev_data) &&
> -           !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
> -           !amd_iommu_snp_en) {
> +       if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
> +               return IOMMU_DOMAIN_IDENTITY;
> +
> +       /*
> +        * For now driver limitations prevent enabling PASID as a paging domain
> +        * on the RID together.
> +        */
> +       if (dev_is_pci(dev) && !to_pci_dev(dev)->untrusted &&
> +           pdev_pasid_supported(dev_data))
>                 return IOMMU_DOMAIN_IDENTITY;
> -       }
>
>         return 0;
>  }
>
> Jason
>
>
> As it booted ok with Robin's patch above, these changes to
> drivers/iommu/amd/iommu.c didn't seem to make a difference for me. I was testing
> with amd iommu v2 off in the kernel config and I also have TSME enabled in the
> BIOS if that matters.

TMSE is transparent to OS. So its fine.

-Vasant


2024-04-16 11:27:27

by Robin Murphy

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
> On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
>> On 2024-04-15 7:57 pm, Eric Wagner wrote:
>>> Apologies if I made a mistake in the first bisect, I'm new to kernel
>>> debugging.
>>>
>>> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
>>> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
>>> Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
>>> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
>>> bisect log is attached. It ended up at the same commit as before.
>>>
>>> I've also attached a picture of the boot screen that occurs when it hangs.
>>> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
>>> problem.
>>
>> Looks like 59ddce4418da483 probably broke things most - prior to that, the
>> fact that it's behind a Thunderbolt port would have always taken precedence
>> and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
>> saywhereas now we ask the driver first, then complain that it conflicts
>> with the untrusted status and ultimately don't configure the IOMMU
>> at all.
>
> Yes, if the driver wants to force a domain type it should be
> forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
> is incapable of supporting otherwise.

No, in the case of AMD it only forces identity if it thinks the device
might want to use PASIDs (because of the architectural limitation that
the RID always operates in GPA space so can't have its own independent
translation).

Either way, though, there's really little sense to that argument - if
enforcing strict translation *might* compromise the device's
functionality, we should instead go out of our way to ensure it's
definitely as broken as possible? I fail to see how that can be
justified as useful or desirable behaviour.

>> Meanwhile the GPU driver presumably goes on to believe it's using dma-direct
>> with no IOMMU present, resulting in fireworks when its traffic reaches the
>> IOMMU. Great :(
>
> I wonder where is the missing error handling.. iommu probe failure
> should not go on to allow driver attach, there is no guarentee any DMA
> works now that many iommus are booting up in BLOCKED.

What do you mean error handling? After you spent a year rewriting the
probing code to your own grand design, don't suggest you don't even know
how it fundamentally works...

"Failing" iommu_probe_device is merely how we tell ourselves that we're
not interested in a device, and consequently tell the rest of the kernel
it doesn't have an IOMMU (via device_iommu_mapped() returning false).
This is normal and expected for devices which legitimately have no IOMMU
in the first place; conversely we don't do a great deal for unexpected
failures since those typically represent system-fatal conditions
whatever we might try to do. We've never had much of a notion of
expected failures when an IOMMU *is* present, but even then, denying any
trace of the IOMMU and removing ourselves from the picture is clearly
not the ideal way to approach that. We're running off a bus notifier (or
even later), so ultimately our return value is meaningless; at that
point the device already exists and has been added to its bus, we can't
undo that.

However it looks to be even more fun if failure occurs in *deferred*
default domain creation via bus_iommu_probe(), since then we give up and
dismiss the entire IOMMU. Except the x86 drivers ignore the return from
iommu_device_register(), so further hilarity ensues...

I think I've now satisfied myself that a simple fix for the core code is
appropriate and will write that up now; one other thing I couldn't quite
figure out is whether the AMD driver somehow prevents PASIDs being used
while the group is attached to a non-identity (and non-nested) domain -
that's probably one for Vasant to confirm.

Thanks,
Robin.

2024-04-16 11:39:50

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Tue, Apr 16, 2024 at 04:23:38PM +0530, Vasant Hegde wrote:
> > Yes, if the driver wants to force a domain type it should be
> > forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
> > is incapable of supporting otherwise.
>
>
> @Jason,
>
> Looks like before commit 59ddce4418da483, core IOMMU layer was enforcing
> 'IOMMU_DOMAIN_DMA' for untrusted device. If its trusted device then it was
> letting HW IOMMU driver to pick best domain type.

If the driver wants to force identity because paging doesn't work then
yes it needs to figure something out..

Really the drivers should not be designed to do this, they need to
accommodate paging domains in all cases if things are going to work
correctly. The def_domain callback should be a last resort.

> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index d35c1b8c8e65ce..f3da6a5b6cb1cb 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -2758,11 +2758,16 @@ static int amd_iommu_def_domain_type(struct device *dev)
> >          *    and require remapping.
> >          *  - SNP is enabled, because it prohibits DTE[Mode]=0.
> >          */
> > -       if (pdev_pasid_supported(dev_data) &&
> > -           !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
> > -           !amd_iommu_snp_en) {
> > +       if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
> > +               return IOMMU_DOMAIN_IDENTITY;
> > +
> > +       /*
> > +        * For now driver limitations prevent enabling PASID as a paging domain
> > +        * on the RID together.
> > +        */
> > +       if (dev_is_pci(dev) && !to_pci_dev(dev)->untrusted &&
> > +           pdev_pasid_supported(dev_data))
> >                 return IOMMU_DOMAIN_IDENTITY;
> > -       }
> >
> >         return 0;
> >  }
> >
> >
> > As it booted ok with Robin's patch above, these changes to
> > drivers/iommu/amd/iommu.c didn't seem to make a difference for me. I was
> > testing with amd iommu v2 off in the kernel config and I also have TSME
> > enabled in the BIOS if that matters.

There must be a mistake in the above then, it would be good to sort it
out because something like that is the right fix.

Jason

2024-04-16 11:50:05

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Tue, Apr 16, 2024 at 12:25:52PM +0100, Robin Murphy wrote:
> On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
> > On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
> > > On 2024-04-15 7:57 pm, Eric Wagner wrote:
> > > > Apologies if I made a mistake in the first bisect, I'm new to kernel
> > > > debugging.
> > > >
> > > > I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> > > > 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
> > > > Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > > > as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> > > > bisect log is attached. It ended up at the same commit as before.
> > > >
> > > > I've also attached a picture of the boot screen that occurs when it hangs.
> > > > 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> > > > problem.
> > >
> > > Looks like 59ddce4418da483 probably broke things most - prior to that, the
> > > fact that it's behind a Thunderbolt port would have always taken precedence
> > > and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
> > > saywhereas now we ask the driver first, then complain that it conflicts
> > > with the untrusted status and ultimately don't configure the IOMMU
> > > at all.
> >
> > Yes, if the driver wants to force a domain type it should be
> > forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
> > is incapable of supporting otherwise.
>
> No, in the case of AMD it only forces identity if it thinks the device might
> want to use PASIDs (because of the architectural limitation that the RID
> always operates in GPA space so can't have its own independent translation).

AMD forces this because it doesn't yet have a way to automatically
choose it's v1/v2 page table format during alloc domain. It is just a
SW bug.

The CC/SNP limitation is also a SW bug but is more fatal as it can't
even attach a v1 page table in this mode.

> Either way, though, there's really little sense to that argument - if
> enforcing strict translation *might* compromise the device's functionality,
> we should instead go out of our way to ensure it's definitely as broken as
> possible? I fail to see how that can be justified as useful or desirable
> behaviour.

For SNP cases the attach of a DMA domain will fail, so yes, moving the
failure earlier and giving a clear message is desirable.

> "Failing" iommu_probe_device is merely how we tell ourselves that we're not
> interested in a device, and consequently tell the rest of the kernel it
> doesn't have an IOMMU (via device_iommu_mapped() returning false).

Probing failing with ENODEV means the device has no iommu and the rest
of the code should assume DMA direct will work.

Probing failing with any other error code means the device has an
iommu and it couldn't be setup. DMA direct probably won't work today.

If you want all failure codes to mean the device is safe for DMA
direct then we need to try and attach the IDENTITY domain on various
probe failure paths too.

> I think I've now satisfied myself that a simple fix for the core code is
> appropriate and will write that up now; one other thing I couldn't
> quite

It really doesn't match the design here.

Jason

2024-04-17 02:17:23

by Eric Wagner

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On Tue, Apr 16, 2024 at 6:53 AM Vasant Hegde <[email protected]> wrote:
>
> Hi Eric,
>
>
> On 4/16/2024 7:42 AM, Eric Wagner wrote:
> > On Mon, Apr 15, 2024 at 5:44 PM Robin Murphy <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> > As a first step I'd test the quick hack below, but be prepared for things
> > to still break slightly differently.
> >
> > Cheers,
> > Robin.
> >
> > ----->8-----
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 996e79dc582d..063e1eb32fbd 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1774,7 +1774,7 @@ static int iommu_get_default_domain_type(struct
> > iommu_group *group,
> > untrusted,
> > "Device is not trusted, but driver is
> > overriding group %u to %s,
> > refusing to probe.\n",
> > group->id, iommu_domain_type_str(driver_type));
> > - return -1;
> > + //return -1;
> > }
> > driver_type = IOMMU_DOMAIN_DMA;
> > }
> >
> > This worked and got the system booting for me.
>
> Thanks for testing.
>
> IIUC eGPU is behind Thunderbolt and hence IOMMU treated it as 'untrusted device'.
>
> AMD driver tries to allocate "IDENTITY" domain for GPU devices.
> iommu_get_default_domain_type() return -1 as it expects IOMMU_DOMAIN_DMA for
> untrusted device.
>
> Can you please attach lspci -vvv output? Also /proc/cmdline output.
>
>
> >
> > On Mon, Apr 15, 2024 at 8:39 PM Jason Gunthorpe <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> > On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
> > > On 2024-04-15 7:57 pm, Eric Wagner wrote:
> > > > Apologies if I made a mistake in the first bisect, I'm new to kernel
> > > > debugging.
> > > >
> > > > I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
> > > > 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were
> > good.
> > > > Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
> > > > as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
> > > > bisect log is attached. It ended up at the same commit as before.
> > > >
> > > > I've also attached a picture of the boot screen that occurs when it hangs.
> > > > 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
> > > > problem.
> > >
> > > Looks like 59ddce4418da483 probably broke things most - prior to that, the
> > > fact that it's behind a Thunderbolt port would have always taken precedence
> > > and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
> > > saywhereas now we ask the driver first, then complain that it conflicts
> > > with the untrusted status and ultimately don't configure the IOMMU
> > > at all.
> >
> > Yes, if the driver wants to force a domain type it should be
> > forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
> > is incapable of supporting otherwise.
>
>
> @Jason,
>
> Looks like before commit 59ddce4418da483, core IOMMU layer was enforcing
> 'IOMMU_DOMAIN_DMA' for untrusted device. If its trusted device then it was
> letting HW IOMMU driver to pick best domain type.
>
> Did we change that flow? Are we expecting HW IOMMU driver to handler untrusted
> devices as well?
>
> >
> > > Meanwhile the GPU driver presumably goes on to believe it's using dma-direct
> > > with no IOMMU present, resulting in fireworks when its traffic reaches the
> > > IOMMU. Great :(
> >
> > I wonder where is the missing error handling.. iommu probe failure
> > should not go on to allow driver attach, there is no guarentee any DMA
> > works now that many iommus are booting up in BLOCKED.
>
>
> Looking into code path, in failure path we cleanup device, not group. May be
> that's causing issue? Not sure where it failed. If I manage to find some system
> I will try to debug further.
>
>
>
> >
> > > However the other notable thing that also happened between 6.6 and 6.7 was
> > > the removal of the AMD iommu_v2 code, so there's some possibility that the
> > > GPU driver still may have only been working before due to that also
> >
> > Most likely it is the above change interacting with this patch when
> > they are both combined in the merge:
> >
> > commit 92e2bd56a5f9fc44313fda802a43a63cc2a9c8f6
> > Author: Vasant Hegde <[email protected] <mailto:vasant.hegde@amdcom>>
> > Date: Thu Sep 21 09:21:45 2023 +0000
> >
> > iommu/amd: Introduce iommu_dev_data.flags to track device capabilities
> >
> > @@ -2471,7 +2481,7 @@ static int amd_iommu_def_domain_type(struct device *dev)
> > * and require remapping.
> > * - SNP is enabled, because it prohibits DTE[Mode]=0.
> > */
> > - if (dev_data->iommu_v2 &&
> > + if (pdev_pasid_supported(dev_data) &&
> > !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
> > !amd_iommu_snp_en) {
> > return IOMMU_DOMAIN_IDENTITY;
> >
> > Which, IIRC, was intended to be temporary to work around limitations
> > in the DTE programming logic within the driver. Previously iommu_v2 as
> > a module option that Eric probably doesn't set, I guess.
> >
> > The below will probably make it boot, but Vasant should check what
> > happens if PASID is eventually attached too.
> >
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index d35c1b8c8e65ce..f3da6a5b6cb1cb 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -2758,11 +2758,16 @@ static int amd_iommu_def_domain_type(struct device *dev)
> > * and require remapping.
> > * - SNP is enabled, because it prohibits DTE[Mode]=0.
> > */
> > - if (pdev_pasid_supported(dev_data) &&
> > - !cc_platform_has(CC_ATTR_MEM_ENCRYPT) &&
> > - !amd_iommu_snp_en) {
> > + if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT) && !amd_iommu_snp_en)
> > + return IOMMU_DOMAIN_IDENTITY;
> > +
> > + /*
> > + * For now driver limitations prevent enabling PASID as a paging domain
> > + * on the RID together.
> > + */
> > + if (dev_is_pci(dev) && !to_pci_dev(dev)->untrusted &&
> > + pdev_pasid_supported(dev_data))
> > return IOMMU_DOMAIN_IDENTITY;
> > - }
> >
> > return 0;
> > }
> >
> > Jason
> >
> >
> > As it booted ok with Robin's patch above, these changes to
> > drivers/iommu/amd/iommu.c didn't seem to make a difference for me. I was testing
> > with amd iommu v2 off in the kernel config and I also have TSME enabled in the
> > BIOS if that matters.
>
> TMSE is transparent to OS. So its fine.
>
> -Vasant
>
Output of lspci -vvv and /proc/cmdline attached.

-Eric


Attachments:
cmdline.log (83.00 B)
lspci.log (82.21 kB)
Download all attachments

2024-04-17 08:48:38

by Vasant Hegde

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

Hi,


On 4/16/2024 5:19 PM, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 12:25:52PM +0100, Robin Murphy wrote:
>> On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
>>> On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
>>>> On 2024-04-15 7:57 pm, Eric Wagner wrote:
>>>>> Apologies if I made a mistake in the first bisect, I'm new to kernel
>>>>> debugging.
>>>>>
>>>>> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
>>>>> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
>>>>> Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
>>>>> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
>>>>> bisect log is attached. It ended up at the same commit as before.
>>>>>
>>>>> I've also attached a picture of the boot screen that occurs when it hangs.
>>>>> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
>>>>> problem.
>>>>
>>>> Looks like 59ddce4418da483 probably broke things most - prior to that, the
>>>> fact that it's behind a Thunderbolt port would have always taken precedence
>>>> and forced IOMMU_DOMAIN_DMA regardless of what the driver may have wanted to
>>>> saywhereas now we ask the driver first, then complain that it conflicts
>>>> with the untrusted status and ultimately don't configure the IOMMU
>>>> at all.
>>>
>>> Yes, if the driver wants to force a domain type it should be
>>> forced. Driver knows best. Eg AMD forces IDENTITY when the HW/driver
>>> is incapable of supporting otherwise.
>>
>> No, in the case of AMD it only forces identity if it thinks the device might
>> want to use PASIDs (because of the architectural limitation that the RID
>> always operates in GPA space so can't have its own independent translation).
>
> AMD forces this because it doesn't yet have a way to automatically
> choose it's v1/v2 page table format during alloc domain. It is just a
> SW bug.

Yes. This will be fixed.

>
> The CC/SNP limitation is also a SW bug but is more fatal as it can't
> even attach a v1 page table in this mode.

Memory Encryption needs Encryption bit support. So we enforce Paging mode (Both
AMD v1 and v2 page table format works fine).

SNP is hardware limitation. When SNP is enabled then IOMMU must be configured
with v1 page table. It won't support Identity mapping.

>
>> Either way, though, there's really little sense to that argument - if
>> enforcing strict translation *might* compromise the device's functionality,
>> we should instead go out of our way to ensure it's definitely as broken as
>> possible? I fail to see how that can be justified as useful or desirable
>> behaviour.
>
> For SNP cases the attach of a DMA domain will fail, so yes, moving the
> failure earlier and giving a clear message is desirable.

Its handled during initialization itself (iommu_snp_enable()). If IOMMU is not
configured with V1 page table we don't enable SNP.

-Vasant


>
>> "Failing" iommu_probe_device is merely how we tell ourselves that we're not
>> interested in a device, and consequently tell the rest of the kernel it
>> doesn't have an IOMMU (via device_iommu_mapped() returning false).
>
> Probing failing with ENODEV means the device has no iommu and the rest
> of the code should assume DMA direct will work.
>
> Probing failing with any other error code means the device has an
> iommu and it couldn't be setup. DMA direct probably won't work today.
>
> If you want all failure codes to mean the device is safe for DMA
> direct then we need to try and attach the IDENTITY domain on various
> probe failure paths too.
>
>> I think I've now satisfied myself that a simple fix for the core code is
>> appropriate and will write that up now; one other thing I couldn't
>> quite
>
> It really doesn't match the design here.
>
> Jason

2024-04-17 10:37:00

by Robin Murphy

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

On 2024-04-16 1:44 pm, Vasant Hegde wrote:
> Robin,
>
> On 4/16/2024 4:55 PM, Robin Murphy wrote:
>> On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
>>> On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
>>>> On 2024-04-15 7:57 pm, Eric Wagner wrote:
>>>>> Apologies if I made a mistake in the first bisect, I'm new to kernel
>>>>> debugging.
>>>>>
>>>>> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
>>>>> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both
>>>>> were good.
>>>>> Then I ran git bisect again with
>>>>> e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
>>>>> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good
>>>>> and the
>>>>> bisect log is attached. It ended up at the same commit as before.
>>>>>
>>>>> I've also attached a picture of the boot screen that occurs when it
>>>>> hangs.
>>>>> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's
>>>>> causing the
>>>>> problem.
>
> .../...
>
>>
>> "Failing" iommu_probe_device is merely how we tell ourselves that
>> we're not interested in a device, and consequently tell the rest of
>> the kernel it doesn't have an IOMMU (via device_iommu_mapped()
>> returning false). This is normal and expected for devices which
>> legitimately have no IOMMU in the first place; conversely we don't do
>> a great deal for unexpected failures since those typically represent
>> system-fatal conditions whatever we might try to do. We've never had
>> much of a notion of expected failures when an IOMMU *is* present, but
>> even then, denying any trace of the IOMMU and removing ourselves from
>> the picture is clearly not the ideal way to approach that. We're
>> running off a bus notifier (or even later), so ultimately our return
>> value is meaningless; at that point the device already exists and has
>> been added to its bus, we can't undo that.
>>
>> However it looks to be even more fun if failure occurs in *deferred*
>> default domain creation via bus_iommu_probe(), since then we give up
>> and dismiss the entire IOMMU. Except the x86 drivers ignore the return
>> from iommu_device_register(), so further hilarity ensues...
>>
>> I think I've now satisfied myself that a simple fix for the core code
>> is appropriate and will write that up now; one other thing I couldn't
>> quite figure out is whether the AMD driver somehow prevents PASIDs
>> being used while the group is attached to a non-identity (and
>> non-nested) domain - that's probably one for Vasant to confirm.
>
> AMD driver supports PASID with below domain type :
>   - Identity domain
>   - DMA translation mode (DMA and DMA_FQ) with AMD v2 page table
> (amd_iommu=pgtbl_v2).
>
>
> Currently amd_iommu_def_domain_type() tries to put PASID capable devices
> in identity domain mode. This is something to fix. Its in my TODO list.
> I will try to get into it soon.
>
> Hope this clarifies.

Ooh, I see you now have GIoV to allow that similarly to how SMMUv3 does
it - that wasn't in the older version of the spec that I've previously
been referring to :)

Can you confirm there's no hardware actually been made to the older
spec, supporting v2 and PASIDs but *not* having GIoV? Otherwise, I think
you'll still have the problem that if you use the GPA-SPA translation in
the DTE to implement IOMMU_DOMAIN_DMA for the RID, it makes all the
PASID GVA-GPA mappings useless for host SVA.

Cheers,
Robin.

2024-04-18 05:01:40

by Vasant Hegde

[permalink] [raw]
Subject: Re: Kernel 6.7 regression doesn't boot if using AMD eGPU

Hi Robin,

On 4/17/2024 4:06 PM, Robin Murphy wrote:
> On 2024-04-16 1:44 pm, Vasant Hegde wrote:
>> Robin,
>>
>> On 4/16/2024 4:55 PM, Robin Murphy wrote:
>>> On 2024-04-16 1:39 am, Jason Gunthorpe wrote:
>>>> On Mon, Apr 15, 2024 at 10:44:34PM +0100, Robin Murphy wrote:
>>>>> On 2024-04-15 7:57 pm, Eric Wagner wrote:
>>>>>> Apologies if I made a mistake in the first bisect, I'm new to kernel
>>>>>> debugging.
>>>>>>
>>>>>> I tested cedc811c76778bdef91d405717acee0de54d8db5 (x86/amd) and
>>>>>> 3613047280ec42a4e1350fdc1a6dd161ff4008cc (core) directly and both were good.
>>>>>> Then I ran git bisect again with e8cca466a84a75f8ff2a7a31173c99ee6d1c59d2
>>>>>> as the bad and 6e6c6d6bc6c96c2477ddfea24a121eb5ee12b7a3 as the good and the
>>>>>> bisect log is attached. It ended up at the same commit as before.
>>>>>>
>>>>>> I've also attached a picture of the boot screen that occurs when it hangs.
>>>>>> 0000:05:00.0 is the PCIe bus address of the RX 580 eGPU that's causing the
>>>>>> problem.
>>
>> .../...
>>
>>>
>>> "Failing" iommu_probe_device is merely how we tell ourselves that we're not
>>> interested in a device, and consequently tell the rest of the kernel it
>>> doesn't have an IOMMU (via device_iommu_mapped() returning false). This is
>>> normal and expected for devices which legitimately have no IOMMU in the first
>>> place; conversely we don't do a great deal for unexpected failures since
>>> those typically represent system-fatal conditions whatever we might try to
>>> do. We've never had much of a notion of expected failures when an IOMMU *is*
>>> present, but even then, denying any trace of the IOMMU and removing ourselves
>>> from the picture is clearly not the ideal way to approach that. We're running
>>> off a bus notifier (or even later), so ultimately our return value is
>>> meaningless; at that point the device already exists and has been added to
>>> its bus, we can't undo that.
>>>
>>> However it looks to be even more fun if failure occurs in *deferred* default
>>> domain creation via bus_iommu_probe(), since then we give up and dismiss the
>>> entire IOMMU. Except the x86 drivers ignore the return from
>>> iommu_device_register(), so further hilarity ensues...
>>>
>>> I think I've now satisfied myself that a simple fix for the core code is
>>> appropriate and will write that up now; one other thing I couldn't quite
>>> figure out is whether the AMD driver somehow prevents PASIDs being used while
>>> the group is attached to a non-identity (and non-nested) domain - that's
>>> probably one for Vasant to confirm.
>>
>> AMD driver supports PASID with below domain type :
>>    - Identity domain
>>    - DMA translation mode (DMA and DMA_FQ) with AMD v2 page table
>> (amd_iommu=pgtbl_v2).
>>
>>
>> Currently amd_iommu_def_domain_type() tries to put PASID capable devices in
>> identity domain mode. This is something to fix. Its in my TODO list. I will
>> try to get into it soon.
>>
>> Hope this clarifies.
>
> Ooh, I see you now have GIoV to allow that similarly to how SMMUv3 does it -
> that wasn't in the older version of the spec that I've previously been referring
> to :)

Right. This got added later.

>
> Can you confirm there's no hardware actually been made to the older spec,
> supporting v2 and PASIDs but *not* having GIoV? Otherwise, I think you'll still
> have the problem that if you use the GPA-SPA translation in the DTE to implement
> IOMMU_DOMAIN_DMA for the RID, it makes all the PASID GVA-GPA mappings useless
> for host SVA.

I believe we did made HW with old spec. Fortunately we have sufficient feature
bit to detect those feature support. I will have to carefully tweak the
amd_iommu_def_domain_type().

-Vasant