2020-10-21 14:02:14

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v3:
- Drop patch adding define in dma-mapping
- Address small review changes
- Update Ard's patch
- Add new patch removing examples from mmzone.h

Changes since v2:
- Introduce Ard's patch
- Improve OF dma-ranges parsing function
- Add unit test for OF function
- Address small changes
- Move crashkernel reservation later in boot process

Changes since v1:
- Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
arm64: mm: Move reserve_crashkernel() into mem_init()
arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
of/address: Introduce of_dma_get_max_cpu_address()
of: unittest: Add test for of_dma_get_max_cpu_address()
arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
mm: Remove examples from enum zone_type comment

arch/arm64/mm/init.c | 16 ++++++------
drivers/acpi/arm64/iort.c | 52 +++++++++++++++++++++++++++++++++++++++
drivers/of/address.c | 42 +++++++++++++++++++++++++++++++
drivers/of/unittest.c | 18 ++++++++++++++
include/linux/acpi_iort.h | 4 +++
include/linux/mmzone.h | 20 ---------------
include/linux/of.h | 7 ++++++
7 files changed, 130 insertions(+), 29 deletions(-)

--
2.28.0


2020-10-21 14:02:31

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH v4 7/7] mm: Remove examples from enum zone_type comment

We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
include/linux/mmzone.h | 20 --------------------
1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..9d0c454d23cd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -354,26 +354,6 @@ enum zone_type {
* DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
* platforms may need both zones as they support peripherals with
* different DMA addressing limitations.
- *
- * Some examples:
- *
- * - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
- * rest of the lower 4G.
- *
- * - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
- * the specific device.
- *
- * - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
- * lower 4G.
- *
- * - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
- * depending on the specific device.
- *
- * - s390 uses ZONE_DMA fixed to the lower 2G.
- *
- * - ia64 and riscv only use ZONE_DMA32.
- *
- * - parisc uses neither.
*/
#ifdef CONFIG_ZONE_DMA
ZONE_DMA,
--
2.28.0

2020-10-22 00:05:30

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>

---

Changes since v3:
- Simplify code for readability.

Changes since v2:
- Updated commit log by shamelessly copying Ard's ACPI commit log

arch/arm64/mm/init.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 410721fc4fc0..94e38f99748b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
#include <asm/tlb.h>
#include <asm/alternative.h>

-#define ARM64_ZONE_DMA_BITS 30
-
/*
* We need to be able to catch inadvertent references to memstart_addr
* that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
static void __init zone_sizes_init(unsigned long min, unsigned long max)
{
unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
+ unsigned int __maybe_unused dt_zone_dma_bits;

#ifdef CONFIG_ZONE_DMA
- zone_dma_bits = ARM64_ZONE_DMA_BITS;
+ dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
+ zone_dma_bits = min(32U, dt_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
#endif
--
2.28.0

2020-10-23 03:35:31

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> static void __init zone_sizes_init(unsigned long min, unsigned long max)
> {
> unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> + unsigned int __maybe_unused dt_zone_dma_bits;
>
> #ifdef CONFIG_ZONE_DMA
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> + zone_dma_bits = min(32U, dt_zone_dma_bits);

A thought: can we remove the min here and expand ZONE_DMA to whatever
dt_zone_dma_bits says? More on this below.

> arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> #endif

I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
need for max_zone_phys(). This was rather theoretical, the Seattle
platform has all RAM starting above 4GB and that led to an empty
ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
ZONE_DMA32 into the bottom of the RAM, on the assumption that such
32-bit devices would have a DMA offset hardwired. We are not aware of
any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
only work if they are behind an SMMU (so no hardwired offset).

In hindsight, it would have made more sense on platforms with RAM above
4GB to expand ZONE_DMA32 to cover the whole memory (so empty
ZONE_NORMAL). Something like:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..7d5e3dd85617 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
*/
static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
- phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
- return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
+ phys_addr_t zone_mask = 1ULL << zone_bits;
+
+ if (!(memblock_start_of_DRAM() & zone_mask))
+ zone_mask = PHYS_ADDR_MAX;
+
+ return min(zone_mask, memblock_end_of_DRAM());
}

static void __init zone_sizes_init(unsigned long min, unsigned long max)

I don't think this makes any difference for ZONE_DMA unless a
broken DT or IORT reports the max CPU address below the start of DRAM.

There's a minor issue if of_dma_get_max_cpu_address() matches
memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.

--
Catalin

2020-10-23 13:41:49

by Christoph Hellwig

[permalink] [raw]

2020-10-23 18:46:47

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

Hi Catalin,

On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > {
> > unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> > + unsigned int __maybe_unused dt_zone_dma_bits;
> >
> > #ifdef CONFIG_ZONE_DMA
> > - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > + zone_dma_bits = min(32U, dt_zone_dma_bits);
>
> A thought: can we remove the min here and expand ZONE_DMA to whatever
> dt_zone_dma_bits says? More on this below.

On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
sight, I don't see it breaking dma-direct in any way.

On the other hand, there is a big amount of MMIO devices out there that can
only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
things worse, this limitation is often expressed in the driver, not FW (with
dma_set_mask() and friends). If those devices aren't behind an IOMMU we have be
able to provide at least 32-bit addressable memory. See this comment from
dma_direct_supported():

/*
* Because 32-bit DMA masks are so common we expect every architecture
* to be able to satisfy them - either by not supporting more physical
* memory, or by providing a ZONE_DMA32. If neither is the case, the
* architecture needs to use an IOMMU instead of the direct mapping.
*/

I think, for the common case, we're stuck with at least one zone spanning the
32-bit address space.

> > arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> > #endif
>
> I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> need for max_zone_phys(). This was rather theoretical, the Seattle
> platform has all RAM starting above 4GB and that led to an empty
> ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> 32-bit devices would have a DMA offset hardwired. We are not aware of
> any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> only work if they are behind an SMMU (so no hardwired offset).
>
> In hindsight, it would have made more sense on platforms with RAM above
> 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> ZONE_NORMAL). Something like:
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a53c1e0fb017..7d5e3dd85617 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
> */
> static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> {
> - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> + phys_addr_t zone_mask = 1ULL << zone_bits;
> +
> + if (!(memblock_start_of_DRAM() & zone_mask))
> + zone_mask = PHYS_ADDR_MAX;
> +
> + return min(zone_mask, memblock_end_of_DRAM());
> }
>
> static void __init zone_sizes_init(unsigned long min, unsigned long max)
>
> I don't think this makes any difference for ZONE_DMA unless a
> broken DT or IORT reports the max CPU address below the start of DRAM.
>
> There's a minor issue if of_dma_get_max_cpu_address() matches
> memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.

I agree it makes no sense to create more than one zone when the beginning of
RAM is located above the 32-bit address space. I'm all for disregarding the
possibility of hardwired offsets. As a bonus, as we already discussed some time
ago, this is something that never played well with current dma-direct code[1].

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/9/8/377


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-23 19:00:14

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

On Fri, Oct 23, 2020 at 05:27:49PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > > static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > > {
> > > unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> > > + unsigned int __maybe_unused dt_zone_dma_bits;
> > >
> > > #ifdef CONFIG_ZONE_DMA
> > > - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > > + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > > + zone_dma_bits = min(32U, dt_zone_dma_bits);
> >
> > A thought: can we remove the min here and expand ZONE_DMA to whatever
> > dt_zone_dma_bits says? More on this below.
>
> On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
> amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
> sight, I don't see it breaking dma-direct in any way.
>
> On the other hand, there is a big amount of MMIO devices out there that can
> only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
> things worse, this limitation is often expressed in the driver, not FW (with
> dma_set_mask() and friends). If those devices aren't behind an IOMMU we have be
> able to provide at least 32-bit addressable memory. See this comment from
> dma_direct_supported():
>
> /*
> * Because 32-bit DMA masks are so common we expect every architecture
> * to be able to satisfy them - either by not supporting more physical
> * memory, or by providing a ZONE_DMA32. If neither is the case, the
> * architecture needs to use an IOMMU instead of the direct mapping.
> */
>
> I think, for the common case, we're stuck with at least one zone spanning the
> 32-bit address space.

You are right, I guess it makes sense to keep a 32-bit zone as not all
devices would be described as such.

> > > arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > > max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> > > #endif
> >
> > I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> > need for max_zone_phys(). This was rather theoretical, the Seattle
> > platform has all RAM starting above 4GB and that led to an empty
> > ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> > ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> > 32-bit devices would have a DMA offset hardwired. We are not aware of
> > any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> > only work if they are behind an SMMU (so no hardwired offset).
> >
> > In hindsight, it would have made more sense on platforms with RAM above
> > 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> > ZONE_NORMAL). Something like:
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index a53c1e0fb017..7d5e3dd85617 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
> > */
> > static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > {
> > - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> > - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > + phys_addr_t zone_mask = 1ULL << zone_bits;
> > +
> > + if (!(memblock_start_of_DRAM() & zone_mask))
> > + zone_mask = PHYS_ADDR_MAX;
> > +
> > + return min(zone_mask, memblock_end_of_DRAM());
> > }
> >
> > static void __init zone_sizes_init(unsigned long min, unsigned long max)
> >
> > I don't think this makes any difference for ZONE_DMA unless a
> > broken DT or IORT reports the max CPU address below the start of DRAM.
> >
> > There's a minor issue if of_dma_get_max_cpu_address() matches
> > memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> > a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
>
> I agree it makes no sense to create more than one zone when the beginning of
> RAM is located above the 32-bit address space. I'm all for disregarding the
> possibility of hardwired offsets. As a bonus, as we already discussed some time
> ago, this is something that never played well with current dma-direct code[1].
>
> [1] https://lkml.org/lkml/2020/9/8/377

Maybe this one is still worth fixing, at least for consistency. But it's
not urgent.

My diff above has a side-effect that if dt_zone_dma_bits is below the
start of DRAM, ZONE_DMA gets expanded to PHYS_ADDR_MAX. If this was
32-bit, that's fine but if it was, say, 30-bit because of some firmware
misdescription with RAM starting at 2GB, we end up with no ZONE_DMA32. I
think max_zone_phys() could cap this at 32, as a safety mechanism:

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
phys_addr_t phys_start = memblock_start_of_DRAM();

if (!(phys_start & U32_MAX))
zone_mask = PHYS_ADDR_MAX;
else if (!(phys_start & zone_mask))
zone_mask = U32_MAX;

return min(zone_mask + 1, memblock_end_of_DRAM());
}

Assuming I got the shifting right, arm64_dma_phys_limit becomes:

arm64_dma_phys_limit = max_zone_phys(zone_dma_bits, 32);

--
Catalin

2020-10-23 19:25:25

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

Hi,

On 10/21/20 7:34 AM, Nicolas Saenz Julienne wrote:
> Using two distinct DMA zones turned out to be problematic. Here's an
> attempt go back to a saner default.
>
> I tested this on both a RPi4 and QEMU.

I've tested this in ACPI mode on the rpi4 (4+8G with/without the 3G
limiter) as well, with Ard's IORT patch. Nothing seems to have regressed.

Thanks,

Tested-by: Jeremy Linton <[email protected]>




>
> ---
>
> Changes since v3:
> - Drop patch adding define in dma-mapping
> - Address small review changes
> - Update Ard's patch
> - Add new patch removing examples from mmzone.h
>
> Changes since v2:
> - Introduce Ard's patch
> - Improve OF dma-ranges parsing function
> - Add unit test for OF function
> - Address small changes
> - Move crashkernel reservation later in boot process
>
> Changes since v1:
> - Parse dma-ranges instead of using machine compatible string
>
> Ard Biesheuvel (1):
> arm64: mm: Set ZONE_DMA size based on early IORT scan
>
> Nicolas Saenz Julienne (6):
> arm64: mm: Move reserve_crashkernel() into mem_init()
> arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
> of/address: Introduce of_dma_get_max_cpu_address()
> of: unittest: Add test for of_dma_get_max_cpu_address()
> arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
> mm: Remove examples from enum zone_type comment
>
> arch/arm64/mm/init.c | 16 ++++++------
> drivers/acpi/arm64/iort.c | 52 +++++++++++++++++++++++++++++++++++++++
> drivers/of/address.c | 42 +++++++++++++++++++++++++++++++
> drivers/of/unittest.c | 18 ++++++++++++++
> include/linux/acpi_iort.h | 4 +++
> include/linux/mmzone.h | 20 ---------------
> include/linux/of.h | 7 ++++++
> 7 files changed, 130 insertions(+), 29 deletions(-)
>

2020-10-27 14:42:45

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

On Fri, 2020-10-23 at 14:05 -0500, Jeremy Linton wrote:
> Hi,
>
> On 10/21/20 7:34 AM, Nicolas Saenz Julienne wrote:
> > Using two distinct DMA zones turned out to be problematic. Here's an
> > attempt go back to a saner default.
> >
> > I tested this on both a RPi4 and QEMU.
>
> I've tested this in ACPI mode on the rpi4 (4+8G with/without the 3G
> limiter) as well, with Ard's IORT patch. Nothing seems to have regressed.
>
> Thanks,
>
> Tested-by: Jeremy Linton <[email protected]>

Thanks!


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part