2018-07-10 18:46:19

by Robin Murphy

[permalink] [raw]
Subject: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

Whilst the common firmware code invoked by dma_configure() initialises
devices' DMA masks according to limitations described by the respective
properties ("dma-ranges" for OF and _DMA/IORT for ACPI), the nature of
the dma_set_mask() API leads to that information getting lost when
well-behaved drivers probe and set a 64-bit mask, since in general
there's no way to tell the difference between a firmware-described mask
(which should be respected) and whatever default may have come from the
bus code (which should be replaced outright). This can break DMA on
systems with certain IOMMU topologies (e.g. [1]) where the IOMMU driver
only knows its maximum supported address size, not how many of those
address bits might actually be wired up between any of its input
interfaces and the associated DMA master devices. Similarly, some PCIe
root complexes only have a 32-bit native interface on their host bridge,
which leads to the same DMA-address-truncation problem in systems with a
larger physical memory map and RAM above 4GB (e.g. [2]).

These patches attempt to deal with this in the simplest way possible by
generalising the specific quirk for 32-bit bridges into an arbitrary
mask which can then also be plumbed into the firmware code. In the
interest of being minimally invasive, I've only included a point fix
for the IOMMU issue as seen on arm64 - there may be further tweaks
needed in DMA ops to catch all possible incarnations of this problem,
but this initial RFC is mostly about the impact beyond the dma-mapping
subsystem itself.

Robin.


[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/580804.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-December/474443.html

Robin Murphy (4):
dma-mapping: Generalise dma_32bit_limit flag
ACPI/IORT: Set bus DMA mask as appropriate
of/device: Set bus DMA mask as appropriate
iommu/dma: Respect bus DMA limit for IOVAs

arch/x86/kernel/pci-dma.c | 2 +-
drivers/acpi/arm64/iort.c | 1 +
drivers/iommu/dma-iommu.c | 3 +++
drivers/of/device.c | 1 +
include/linux/device.h | 6 +++---
kernel/dma/direct.c | 2 +-
6 files changed, 10 insertions(+), 5 deletions(-)

--
2.17.1.dirty



2018-07-10 18:19:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 2/4] ACPI/IORT: Set bus DMA mask as appropriate

On Tue, Jul 10, 2018 at 06:17:17PM +0100, Robin Murphy wrote:
> When an explicit DMA limit is described by firmware, we need to remember
> it regardless of how drivers might subsequently update their devices'
> masks. The new bus_dma_mask field does that.

Shouldn't we also stop presetting the dma mask after this?

2018-07-10 18:20:05

by Robin Murphy

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

On 10/07/18 19:02, Christoph Hellwig wrote:
>> These patches attempt to deal with this in the simplest way possible by
>> generalising the specific quirk for 32-bit bridges into an arbitrary
>> mask which can then also be plumbed into the firmware code. In the
>> interest of being minimally invasive, I've only included a point fix
>> for the IOMMU issue as seen on arm64 - there may be further tweaks
>> needed in DMA ops to catch all possible incarnations of this problem,
>> but this initial RFC is mostly about the impact beyond the dma-mapping
>> subsystem itself.
>
> Thanks, this looks very nice to me.
>
> In fact it probably solves the RISC-V/Xiling problem as well if we can
> just add the dma-ranges property to the device tree for the affected
> systems. Palmer, do you know how easily the DT could be updated for
> that case?

That would indeed be nice, but beware that the way PCI devices are
bodged through of_dma_configure() only actually works for dma-coherent
and is busted WRT parsing dma-ranges correctly. Fixing that is my next
step after getting these basics done ;)

Robin.

2018-07-10 18:23:43

by Atish Patra

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

On 7/10/18 11:01 AM, Christoph Hellwig wrote:
>> These patches attempt to deal with this in the simplest way possible by
>> generalising the specific quirk for 32-bit bridges into an arbitrary
>> mask which can then also be plumbed into the firmware code. In the
>> interest of being minimally invasive, I've only included a point fix
>> for the IOMMU issue as seen on arm64 - there may be further tweaks
>> needed in DMA ops to catch all possible incarnations of this problem,
>> but this initial RFC is mostly about the impact beyond the dma-mapping
>> subsystem itself.
>
> Thanks, this looks very nice to me.
>
> In fact it probably solves the RISC-V/Xiling problem as well if we can
> just add the dma-ranges property to the device tree for the affected
> systems. Palmer, do you know how easily the DT could be updated for
> that case?
>
Hi Chris,
I have a PR in riscv-pk that can modify the DT in bbl easily. In fact,
that's how I added the timer node for the interrupt patch.

https://github.com/riscv/riscv-pk/pull/112

Obviously, the best approach would be to update the firmware but that
may be time consuming sometime.

Regards,
Atish
>>
>> Robin.
>>
>>
>> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/580804.html
>> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-December/474443.html
>>
>> Robin Murphy (4):
>> dma-mapping: Generalise dma_32bit_limit flag
>> ACPI/IORT: Set bus DMA mask as appropriate
>> of/device: Set bus DMA mask as appropriate
>> iommu/dma: Respect bus DMA limit for IOVAs
>>
>> arch/x86/kernel/pci-dma.c | 2 +-
>> drivers/acpi/arm64/iort.c | 1 +
>> drivers/iommu/dma-iommu.c | 3 +++
>> drivers/of/device.c | 1 +
>> include/linux/device.h | 6 +++---
>> kernel/dma/direct.c | 2 +-
>> 6 files changed, 10 insertions(+), 5 deletions(-)
>>
>> --
>> 2.17.1.dirty
> ---end quoted text---
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>


2018-07-10 18:35:04

by Robin Murphy

[permalink] [raw]
Subject: [RFC PATCH 2/4] ACPI/IORT: Set bus DMA mask as appropriate

When an explicit DMA limit is described by firmware, we need to remember
it regardless of how drivers might subsequently update their devices'
masks. The new bus_dma_mask field does that.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/acpi/arm64/iort.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 4a66896e2aa3..bc51cff5505e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1014,6 +1014,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
* Limit coherent and dma mask based on size
* retrieved from firmware.
*/
+ dev->bus_dma_mask = mask;
dev->coherent_dma_mask = mask;
*dev->dma_mask = mask;
}
--
2.17.1.dirty


2018-07-10 18:45:43

by Robin Murphy

[permalink] [raw]
Subject: [RFC PATCH 3/4] of/device: Set bus DMA mask as appropriate

When an explicit DMA limit is described by firmware, we need to remember
it regardless of how drivers might subsequently update their devices'
masks. The new bus_dma_mask field does that.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/of/device.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 33d85511d790..0d39633e8545 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,6 +149,7 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
* set by the driver.
*/
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
+ dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;

--
2.17.1.dirty


2018-07-10 18:45:59

by Robin Murphy

[permalink] [raw]
Subject: [RFC PATCH 4/4] iommu/dma: Respect bus DMA limit for IOVAs

Take the new bus limit into account (when present) for IOVA allocations,
to accommodate those SoCs which integrate off-the-shelf IP blocks with
narrower interconnects such that the link between a device output and an
IOMMU input can truncate DMA addresses to even fewer bits than the
native size of either block's interface would imply.

Eventually it might make sense for the DMA core to apply this constraint
up-front in dma_set_mask() and friends, but for now this seems like the
least risky approach.

Signed-off-by: Robin Murphy <[email protected]>
---
drivers/iommu/dma-iommu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ddcbbdb5d658..511ff9a1d6d9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -367,6 +367,9 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
iova_len = roundup_pow_of_two(iova_len);

+ if (dev->bus_dma_mask)
+ dma_limit &= dev->bus_dma_mask;
+
if (domain->geometry.force_aperture)
dma_limit = min(dma_limit, domain->geometry.aperture_end);

--
2.17.1.dirty


2018-07-10 18:46:15

by Robin Murphy

[permalink] [raw]
Subject: [RFC PATCH 1/4] dma-mapping: Generalise dma_32bit_limit flag

Whilst the notion of an upstream DMA restriction is most commonly seen
via PCI host bridges saddled with a 32-bit native interface, a more
general version of the same issue can exist on complex SoCs where a bus
or point-to-point interconnect link carries fewer address bits between a
device and a downstream component (often an IOMMU) than both blocks
nominally support. In order to properly deal with this, first grow the
dma_32bit_limit flag into an arbitrary mask.

To minimise the impact on existing code, we'll only consider this new
mask valid if set. That makes sense anyway, since cases where DMA is not
wired up at all would be better handled by not giving the device valid
ops in the first place.

Signed-off-by: Robin Murphy <[email protected]>
---
arch/x86/kernel/pci-dma.c | 2 +-
include/linux/device.h | 6 +++---
kernel/dma/direct.c | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index ab5d9dd668d2..0b60f2a9dace 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -175,7 +175,7 @@ rootfs_initcall(pci_iommu_init);

static int via_no_dac_cb(struct pci_dev *pdev, void *data)
{
- pdev->dev.dma_32bit_limit = true;
+ pdev->dev_dma_mask = DMA_BIT_MASK(32);
return 0;
}

diff --git a/include/linux/device.h b/include/linux/device.h
index 055a69dbcd18..6d3b000be57e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -886,6 +886,8 @@ struct dev_links_info {
* @coherent_dma_mask: Like dma_mask, but for alloc_coherent mapping as not all
* hardware supports 64-bit addresses for consistent allocations
* such descriptors.
+ * @bus_dma_mask: Mask of an upstream bridge or bus which imposes a smaller DMA
+ * limit than the device itself supports.
* @dma_pfn_offset: offset of DMA memory range relatively of RAM
* @dma_parms: A low level driver may set these to teach IOMMU code about
* segment limitations.
@@ -912,8 +914,6 @@ struct dev_links_info {
* @offline: Set after successful invocation of bus type's .offline().
* @of_node_reused: Set if the device-tree node is shared with an ancestor
* device.
- * @dma_32bit_limit: bridge limited to 32bit DMA even if the device itself
- * indicates support for a higher limit in the dma_mask field.
*
* At the lowest level, every device in a Linux system is represented by an
* instance of struct device. The device structure contains the information
@@ -967,6 +967,7 @@ struct device {
not all hardware supports
64 bit addresses for consistent
allocations such descriptors. */
+ u64 bus_dma_mask; /* upstream dma_mask constraint */
unsigned long dma_pfn_offset;

struct device_dma_parameters *dma_parms;
@@ -1002,7 +1003,6 @@ struct device {
bool offline_disabled:1;
bool offline:1;
bool of_node_reused:1;
- bool dma_32bit_limit:1;
};

static inline struct device *kobj_to_dev(struct kobject *kobj)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8be8106270c2..95e185347e34 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -183,7 +183,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
* Various PCI/PCIe bridges have broken support for > 32bit DMA even
* if the device itself might support it.
*/
- if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
+ if (dev->bus_dma_mask && mask > dev->bus_dma_mask)
return 0;
return 1;
}
--
2.17.1.dirty


2018-07-10 19:48:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 1/4] dma-mapping: Generalise dma_32bit_limit flag

On Tue, Jul 10, 2018 at 06:17:16PM +0100, Robin Murphy wrote:
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 8be8106270c2..95e185347e34 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -183,7 +183,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
> * Various PCI/PCIe bridges have broken support for > 32bit DMA even
> * if the device itself might support it.
> */
> - if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
> + if (dev->bus_dma_mask && mask > dev->bus_dma_mask)
> return 0;

The comment above this check needs an updated (or just be removed).

Also we still have a few architectures not using dma-direct. I guess
most were doing fine without such limits anyway, but at least arm
will probably need an equivalent check.

2018-07-10 20:13:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

> These patches attempt to deal with this in the simplest way possible by
> generalising the specific quirk for 32-bit bridges into an arbitrary
> mask which can then also be plumbed into the firmware code. In the
> interest of being minimally invasive, I've only included a point fix
> for the IOMMU issue as seen on arm64 - there may be further tweaks
> needed in DMA ops to catch all possible incarnations of this problem,
> but this initial RFC is mostly about the impact beyond the dma-mapping
> subsystem itself.

Thanks, this looks very nice to me.

In fact it probably solves the RISC-V/Xiling problem as well if we can
just add the dma-ranges property to the device tree for the affected
systems. Palmer, do you know how easily the DT could be updated for
that case?

>
> Robin.
>
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/580804.html
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-December/474443.html
>
> Robin Murphy (4):
> dma-mapping: Generalise dma_32bit_limit flag
> ACPI/IORT: Set bus DMA mask as appropriate
> of/device: Set bus DMA mask as appropriate
> iommu/dma: Respect bus DMA limit for IOVAs
>
> arch/x86/kernel/pci-dma.c | 2 +-
> drivers/acpi/arm64/iort.c | 1 +
> drivers/iommu/dma-iommu.c | 3 +++
> drivers/of/device.c | 1 +
> include/linux/device.h | 6 +++---
> kernel/dma/direct.c | 2 +-
> 6 files changed, 10 insertions(+), 5 deletions(-)
>
> --
> 2.17.1.dirty
---end quoted text---

2018-07-11 14:42:37

by Rob Herring

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

On Tue, Jul 10, 2018 at 12:43 PM Robin Murphy <[email protected]> wrote:
>
> Whilst the common firmware code invoked by dma_configure() initialises
> devices' DMA masks according to limitations described by the respective
> properties ("dma-ranges" for OF and _DMA/IORT for ACPI), the nature of
> the dma_set_mask() API leads to that information getting lost when
> well-behaved drivers probe and set a 64-bit mask, since in general
> there's no way to tell the difference between a firmware-described mask
> (which should be respected) and whatever default may have come from the
> bus code (which should be replaced outright). This can break DMA on
> systems with certain IOMMU topologies (e.g. [1]) where the IOMMU driver
> only knows its maximum supported address size, not how many of those
> address bits might actually be wired up between any of its input
> interfaces and the associated DMA master devices. Similarly, some PCIe
> root complexes only have a 32-bit native interface on their host bridge,
> which leads to the same DMA-address-truncation problem in systems with a
> larger physical memory map and RAM above 4GB (e.g. [2]).
>
> These patches attempt to deal with this in the simplest way possible by
> generalising the specific quirk for 32-bit bridges into an arbitrary
> mask which can then also be plumbed into the firmware code. In the
> interest of being minimally invasive, I've only included a point fix
> for the IOMMU issue as seen on arm64 - there may be further tweaks
> needed in DMA ops to catch all possible incarnations of this problem,
> but this initial RFC is mostly about the impact beyond the dma-mapping
> subsystem itself.

Couldn't you set and use the device's parent's dma_mask instead. At
least for DT, we should always have a parent device representing the
bus. That would avoid further bloating of struct device.

Rob

2018-07-11 19:55:39

by Robin Murphy

[permalink] [raw]
Subject: Re: [RFC PATCH 2/4] ACPI/IORT: Set bus DMA mask as appropriate

On 10/07/18 19:04, Christoph Hellwig wrote:
> On Tue, Jul 10, 2018 at 06:17:17PM +0100, Robin Murphy wrote:
>> When an explicit DMA limit is described by firmware, we need to remember
>> it regardless of how drivers might subsequently update their devices'
>> masks. The new bus_dma_mask field does that.
>
> Shouldn't we also stop presetting the dma mask after this?

I guess initialising the device masks here only really has any effect if
drivers fail to set their own, so if we're getting stricter about that
then it would make sense to stop; I'll add a couple of patches on top to
clean that up.

Robin.

2018-07-11 21:08:15

by Robin Murphy

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Stop losing firmware-set DMA masks

On 11/07/18 15:40, Rob Herring wrote:
> On Tue, Jul 10, 2018 at 12:43 PM Robin Murphy <[email protected]> wrote:
>>
>> Whilst the common firmware code invoked by dma_configure() initialises
>> devices' DMA masks according to limitations described by the respective
>> properties ("dma-ranges" for OF and _DMA/IORT for ACPI), the nature of
>> the dma_set_mask() API leads to that information getting lost when
>> well-behaved drivers probe and set a 64-bit mask, since in general
>> there's no way to tell the difference between a firmware-described mask
>> (which should be respected) and whatever default may have come from the
>> bus code (which should be replaced outright). This can break DMA on
>> systems with certain IOMMU topologies (e.g. [1]) where the IOMMU driver
>> only knows its maximum supported address size, not how many of those
>> address bits might actually be wired up between any of its input
>> interfaces and the associated DMA master devices. Similarly, some PCIe
>> root complexes only have a 32-bit native interface on their host bridge,
>> which leads to the same DMA-address-truncation problem in systems with a
>> larger physical memory map and RAM above 4GB (e.g. [2]).
>>
>> These patches attempt to deal with this in the simplest way possible by
>> generalising the specific quirk for 32-bit bridges into an arbitrary
>> mask which can then also be plumbed into the firmware code. In the
>> interest of being minimally invasive, I've only included a point fix
>> for the IOMMU issue as seen on arm64 - there may be further tweaks
>> needed in DMA ops to catch all possible incarnations of this problem,
>> but this initial RFC is mostly about the impact beyond the dma-mapping
>> subsystem itself.
>
> Couldn't you set and use the device's parent's dma_mask instead. At
> least for DT, we should always have a parent device representing the
> bus. That would avoid further bloating of struct device.

But then if the parent device did have a non-trivial driver which calls
dma_set_mask(), we'd be back at square 1 :/

More realistically, I don't think that's viable for ACPI, at least with
IORT, since the memory address size limit belongs to the endpoint
itself, thus two devices with the same nominal parent in the Linux
device model could still have different limits (where in DT you'd have
to have to insert intermediate simple-bus nodes to model the same
topology with dma-ranges). Plus either way it seems somewhat fragile for
PCI where the host bridge may be some distance up the hierarchy.

Robin.

2018-07-11 23:32:17

by Robin Murphy

[permalink] [raw]
Subject: Re: [RFC PATCH 1/4] dma-mapping: Generalise dma_32bit_limit flag

On 10/07/18 19:04, Christoph Hellwig wrote:
> On Tue, Jul 10, 2018 at 06:17:16PM +0100, Robin Murphy wrote:
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 8be8106270c2..95e185347e34 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -183,7 +183,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>> * Various PCI/PCIe bridges have broken support for > 32bit DMA even
>> * if the device itself might support it.
>> */
>> - if (dev->dma_32bit_limit && mask > DMA_BIT_MASK(32))
>> + if (dev->bus_dma_mask && mask > dev->bus_dma_mask)
>> return 0;
>
> The comment above this check needs an updated (or just be removed).

Right, I'll give it a tweak. I could also do with actually getting the
field name correct in via_no_dac_cb()...

> Also we still have a few architectures not using dma-direct. I guess
> most were doing fine without such limits anyway, but at least arm
> will probably need an equivalent check.

Indeed, once we've found an approach that everyone's happy with we can
have a more thorough audit of exactly where else it needs to be applied.
FWIW I'm not aware of any 32-bit Arm systems affected by this*, but if
they do exist then at least there's no risk of regression since they've
always been busted.

Robin.


* Not counting the somewhat-similar StrongArm DMA controller bug where
one bit in the *middle* of the mask is unusable. Let's keep that
confined to the Arm dmabounce code and never speak of it...

2018-07-12 07:19:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 1/4] dma-mapping: Generalise dma_32bit_limit flag

On Wed, Jul 11, 2018 at 05:56:40PM +0100, Robin Murphy wrote:
> Indeed, once we've found an approach that everyone's happy with we can have
> a more thorough audit of exactly where else it needs to be applied. FWIW
> I'm not aware of any 32-bit Arm systems affected by this*, but if they do
> exist then at least there's no risk of regression since they've always been
> busted.

Ok, great. I just assumed arm might be affected due to the fact that
it currently parses dma-ranges DT properties.