2021-04-14 14:51:09

by Jean-Philippe Brucker

Subject: [PATCH] x86/dma: Tear down DMA ops on driver unbind

Since commit 08a27c1c3ecf ("iommu: Add support to change default domain
of an iommu group") a user can switch a device between IOMMU and direct
DMA through sysfs. This doesn't work for AMD IOMMU at the moment because
dev->dma_ops is not cleared when switching from a DMA to an identity
IOMMU domain. The DMA layer thus attempts to use the dma-iommu ops on an
identity domain, causing an oops:

# echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/unbind
# echo identity > /sys/bus/pci/devices/0000:00:05.0/iommu_group/type
# echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/bind
...
[ 190.017587] BUG: kernel NULL pointer dereference, address: 0000000000000028
...
[ 190.027375] Call Trace:
[ 190.027561] iommu_dma_alloc+0xd0/0x100
[ 190.027840] e1000e_setup_tx_resources+0x56/0x90
[ 190.028173] e1000e_open+0x75/0x5b0

Implement arch_teardown_dma_ops() on x86 to clear the device's dma_ops
pointer during driver unbind.

Fixes: 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")
Signed-off-by: Jean-Philippe Brucker <[email protected]>
---
 arch/x86/Kconfig          | 1 +
 arch/x86/kernel/pci-dma.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..2c90f8de3e20 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -85,6 +85,7 @@ config X86
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
+	select ARCH_HAS_TEARDOWN_DMA_OPS	if IOMMU_DMA
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de234e7a8962..60a4ec22d849 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -154,3 +154,10 @@ static void via_no_dac(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
 				PCI_CLASS_BRIDGE_PCI, 8, via_no_dac);
 #endif
+
+#ifdef CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS
+void arch_teardown_dma_ops(struct device *dev)
+{
+	set_dma_ops(dev, NULL);
+}
+#endif
--
2.31.1
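
For readers following the thread, the failure mode described in the commit
message above can be sketched in user space. This is a toy model and not
kernel code: every model_* name, the has_iova_cookie flag and the -1 return
value (standing in for the NULL pointer dereference) are assumptions made up
for illustration. It only assumes what the commit message states, namely that
bind installs the dma-iommu ops for a DMA domain and that nothing clears them
when the group is later switched to an identity domain.

```c
#include <stddef.h>

struct device;

/* Hypothetical stand-in for the kernel's struct dma_map_ops. */
struct dma_map_ops {
	int (*alloc)(struct device *dev);
};

enum domain_type { DOMAIN_DMA, DOMAIN_IDENTITY };

/* Hypothetical stand-in for the DMA-related fields of struct device. */
struct device {
	const struct dma_map_ops *dma_ops;	/* NULL means direct DMA */
	enum domain_type domain;
	int has_iova_cookie;			/* only set up for a DMA domain */
};

/* Models the dma-iommu alloc path: it needs the domain's IOVA state. */
static int model_iommu_alloc(struct device *dev)
{
	return dev->has_iova_cookie ? 0 : -1;	/* -1 models the oops */
}

static const struct dma_map_ops model_iommu_ops = {
	.alloc = model_iommu_alloc,
};

/* Models writing "DMA" or "identity" to iommu_group/type. */
static void model_set_domain(struct device *dev, enum domain_type type)
{
	dev->domain = type;
	dev->has_iova_cookie = (type == DOMAIN_DMA);
}

/* Models driver bind: ops are installed for a DMA domain, never cleared. */
static void model_bind(struct device *dev)
{
	if (dev->domain == DOMAIN_DMA)
		dev->dma_ops = &model_iommu_ops;
}

/* Models driver unbind, with or without the teardown this patch adds. */
static void model_unbind(struct device *dev, int teardown)
{
	if (teardown)
		dev->dma_ops = NULL;
}

/* Models the DMA layer dispatch: per-device ops if set, else direct DMA. */
static int model_dma_alloc(struct device *dev)
{
	if (dev->dma_ops)
		return dev->dma_ops->alloc(dev);
	return 0;
}

/* Replays the unbind / switch to identity / bind sequence from the report. */
int model_rebind_as_identity(int teardown)
{
	struct device dev = { NULL, DOMAIN_DMA, 0 };

	model_set_domain(&dev, DOMAIN_DMA);
	model_bind(&dev);
	model_unbind(&dev, teardown);
	model_set_domain(&dev, DOMAIN_IDENTITY);
	model_bind(&dev);
	return model_dma_alloc(&dev);
}
```

In this model, model_rebind_as_identity(0) takes the stale dma-iommu path and
returns -1, while model_rebind_as_identity(1) falls back to the direct path
and returns 0. The sketch covers only the stale-pointer case from the commit
message and says nothing about other paths that may depend on dma_ops.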


Subject: [tip: x86/urgent] x86/dma: Tear down DMA ops on driver unbind

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 9f8614f5567eb4e38579422d38a1bdfeeb648ffc
Gitweb: https://git.kernel.org/tip/9f8614f5567eb4e38579422d38a1bdfeeb648ffc
Author: Jean-Philippe Brucker <[email protected]>
AuthorDate: Wed, 14 Apr 2021 10:26:34 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Thu, 15 Apr 2021 10:27:29 +02:00

x86/dma: Tear down DMA ops on driver unbind

Since

08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")

a user can switch a device between IOMMU and direct DMA through sysfs.
This doesn't work for AMD IOMMU at the moment because dev->dma_ops is
not cleared when switching from a DMA to an identity IOMMU domain. The
DMA layer thus attempts to use the dma-iommu ops on an identity domain,
causing an oops:

# echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/unbind
# echo identity > /sys/bus/pci/devices/0000:00:05.0/iommu_group/type
# echo 0000:00:05.0 > /sys/bus/pci/drivers/e1000e/bind
...
BUG: kernel NULL pointer dereference, address: 0000000000000028
...
Call Trace:
iommu_dma_alloc
e1000e_setup_tx_resources
e1000e_open

Implement arch_teardown_dma_ops() on x86 to clear the device's dma_ops
pointer during driver unbind.

[ bp: Massage commit message. ]

Fixes: 08a27c1c3ecf ("iommu: Add support to change default domain of an iommu group")
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
 arch/x86/Kconfig          | 1 +
 arch/x86/kernel/pci-dma.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..2c90f8d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -85,6 +85,7 @@ config X86
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_SYSCALL_WRAPPER
+	select ARCH_HAS_TEARDOWN_DMA_OPS	if IOMMU_DMA
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_DEBUG_WX
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de234e7..60a4ec2 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -154,3 +154,10 @@ static void via_no_dac(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_VIA, PCI_ANY_ID,
 				PCI_CLASS_BRIDGE_PCI, 8, via_no_dac);
 #endif
+
+#ifdef CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS
+void arch_teardown_dma_ops(struct device *dev)
+{
+	set_dma_ops(dev, NULL);
+}
+#endif

2021-04-17 12:08:05

by Borislav Petkov

Subject: Re: [tip: x86/urgent] x86/dma: Tear down DMA ops on driver unbind

On Thu, Apr 15, 2021 at 09:00:57AM -0000, tip-bot2 for Jean-Philippe Brucker wrote:
> The following commit has been merged into the x86/urgent branch of tip:
>
> Commit-ID: 9f8614f5567eb4e38579422d38a1bdfeeb648ffc
> Gitweb: https://git.kernel.org/tip/9f8614f5567eb4e38579422d38a1bdfeeb648ffc
> Author: Jean-Philippe Brucker <[email protected]>
> AuthorDate: Wed, 14 Apr 2021 10:26:34 +02:00
> Committer: Borislav Petkov <[email protected]>
> CommitterDate: Thu, 15 Apr 2021 10:27:29 +02:00
>
> x86/dma: Tear down DMA ops on driver unbind

Nope, sorry, no joy. Zapping it from tip.

With that patch, it fails booting on my test box with messages like
(typing up from video I took):

...
ata: softreset failed (1st FIS failed)
ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]
ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]
<--- EOF

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-04-19 11:19:00

by Jean-Philippe Brucker

Subject: Re: [tip: x86/urgent] x86/dma: Tear down DMA ops on driver unbind

On Sat, Apr 17, 2021 at 02:06:44PM +0200, Borislav Petkov wrote:
> Nope, sorry, no joy. Zapping it from tip.
>
> With that patch, it fails booting on my test box with messages like
> (typing up from video I took):
>
> ...
> ata: softreset failed (1st FIS failed)
> ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]
> ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=...]

Sorry about that, I only tested under QEMU. I'll try to reproduce this on
an AMD laptop.

Thanks,
Jean