2020-10-01 16:19:25

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 0/4] arm64: Default to 32-bit wide ZONE_DMA

Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Nicolas Saenz Julienne (4):
of/fdt: Update zone_dma_bits when running in bcm2711
dma-direct: Turn zone_dma_bits default value into a define
arm64: Default to 32-bit ZONE_DMA
mm: Update DMA zones description with arm64 newer behavior

arch/arm64/mm/init.c | 12 ++++++++----
drivers/of/fdt.c | 10 ++++++++++
include/linux/dma-direct.h | 1 +
include/linux/mmzone.h | 5 +++--
kernel/dma/direct.c | 2 +-
5 files changed, 23 insertions(+), 7 deletions(-)

--
2.28.0


2020-10-01 16:19:53

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

arm64 wants to be able to set ZONE_DMA's size depending on the specific
platform its being run on. Ideally this could be achieved in a smart way
by parsing all dma-ranges and calculating the smaller DMA constraint in
the system. Easier said than done. We compromised on a simpler solution
as the only platform interested in using this is the Raspberry Pi 4.

So update zone_dma_bits if the machine's compatible string matches
Raspberry Pi 4's, otherwise let arm64's mm code deal with it.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
drivers/of/fdt.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4602e467ca8b..cd0d115ef329 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
#include <linux/serial_core.h>
#include <linux/sysfs.h>
#include <linux/random.h>
+#include <linux/dma-direct.h> /* for zone_dma_bits */

#include <asm/setup.h> /* for COMMAND_LINE_SIZE */
#include <asm/page.h>
@@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);
}

+void __init early_init_dt_update_zone_dma_bits(void)
+{
+ unsigned long dt_root = of_get_flat_dt_root();
+
+ if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
+ zone_dma_bits = 30;
+}
+
bool __init early_init_dt_scan(void *params)
{
bool status;
@@ -1207,6 +1216,7 @@ bool __init early_init_dt_scan(void *params)
return false;

early_init_dt_scan_nodes();
+ early_init_dt_update_zone_dma_bits();
return true;
}

--
2.28.0

2020-10-01 16:20:59

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 4/4] mm: Update DMA zones description

The default behavior for arm64 changed, so reflect that.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
include/linux/mmzone.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..d28ce77ccc2a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -363,8 +363,9 @@ enum zone_type {
* - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
* the specific device.
*
- * - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
- * lower 4G.
+ * - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
+ * in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
+ * the lower 4GB.
*
* - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
* depending on the specific device.
--
2.28.0

2020-10-01 16:21:45

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 3/4] arm64: Default to 32-bit ZONE_DMA

The Raspberry Pi 4 needs two DMA zones as some of its devices can only
DMA into the 30-bit physical address space. We solved that by creating
an extra ZONE_DMA covering the 30-bit. It turns out that creating extra
zones unnecessarily broke Kdump on large systems. So default to a single
32-bit wide ZONE_DMA and only define both zones if running on RPi4.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/arm64/mm/init.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e1a69a618832..3c3f462466eb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -43,8 +43,6 @@
#include <asm/tlb.h>
#include <asm/alternative.h>

-#define ARM64_ZONE_DMA_BITS 30
-
/*
* We need to be able to catch inadvertent references to memstart_addr
* that occur (potentially in generic code) before arm64_memblock_init()
@@ -388,8 +386,14 @@ void __init arm64_memblock_init(void)
early_init_fdt_scan_reserved_mem();

if (IS_ENABLED(CONFIG_ZONE_DMA)) {
- zone_dma_bits = ARM64_ZONE_DMA_BITS;
- arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
+ /*
+ * early_init_dt_scan() might alter zone_dma_bits based on the
+ * device's DT. Otherwise, have it cover the 32-bit address
+ * space.
+ */
+ if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
+ zone_dma_bits = 32;
+ arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
}

if (IS_ENABLED(CONFIG_ZONE_DMA32))
--
2.28.0

2020-10-01 16:22:44

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 2/4] dma-direct: Turn zone_dma_bits default value into a define

Other code might need to reference it.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
include/linux/dma-direct.h | 1 +
kernel/dma/direct.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 38ed3b55034d..ef19ff505d09 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,6 +12,7 @@
#include <linux/mem_encrypt.h>
#include <linux/swiotlb.h>

+#define ZONE_DMA_BITS_DEFAULT 24
extern unsigned int zone_dma_bits;

/*
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 121a9c1969dd..4b3cfa02532b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
* it for entirely different regions. In that case the arch code needs to
* override the variable below for dma-direct to work properly.
*/
-unsigned int zone_dma_bits __ro_after_init = 24;
+unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;

static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
--
2.28.0

2020-10-01 17:16:41

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

Hi Nicolas,

Thanks for putting this together.

On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4602e467ca8b..cd0d115ef329 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -25,6 +25,7 @@
> #include <linux/serial_core.h>
> #include <linux/sysfs.h>
> #include <linux/random.h>
> +#include <linux/dma-direct.h> /* for zone_dma_bits */
>
> #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> #include <asm/page.h>
> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> }
>
> +void __init early_init_dt_update_zone_dma_bits(void)
> +{
> + unsigned long dt_root = of_get_flat_dt_root();
> +
> + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> + zone_dma_bits = 30;
> +}

I think we could keep this entirely in the arm64 setup_machine_fdt() and
not pollute the core code with RPi4-specific code.

--
Catalin

2020-10-01 17:22:59

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 3/4] arm64: Default to 32-bit ZONE_DMA

On Thu, Oct 01, 2020 at 06:17:39PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index e1a69a618832..3c3f462466eb 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -43,8 +43,6 @@
> #include <asm/tlb.h>
> #include <asm/alternative.h>
>
> -#define ARM64_ZONE_DMA_BITS 30
> -
> /*
> * We need to be able to catch inadvertent references to memstart_addr
> * that occur (potentially in generic code) before arm64_memblock_init()
> @@ -388,8 +386,14 @@ void __init arm64_memblock_init(void)
> early_init_fdt_scan_reserved_mem();
>
> if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
> + /*
> + * early_init_dt_scan() might alter zone_dma_bits based on the
> + * device's DT. Otherwise, have it cover the 32-bit address
> + * space.
> + */
> + if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
> + zone_dma_bits = 32;
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);

So here we assume that if zone_dma_bits is 24, it wasn't initialised. I
think it may be simpler if we just set it in setup_machine_fdt() to 32
or 30 if RPi4. This way we don't have to depend on what the core kernel
sets.

--
Catalin

2020-10-01 17:23:39

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 4/4] mm: Update DMA zones description

On Thu, Oct 01, 2020 at 06:17:40PM +0200, Nicolas Saenz Julienne wrote:
> The default behavior for arm64 changed, so reflect that.
>
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>

Acked-by: Catalin Marinas <[email protected]>

2020-10-01 17:27:02

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> Hi Nicolas,
>
> Thanks for putting this together.
>
> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 4602e467ca8b..cd0d115ef329 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -25,6 +25,7 @@
> > #include <linux/serial_core.h>
> > #include <linux/sysfs.h>
> > #include <linux/random.h>
> > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> >
> > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > #include <asm/page.h>
> > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > }
> >
> > +void __init early_init_dt_update_zone_dma_bits(void)
> > +{
> > + unsigned long dt_root = of_get_flat_dt_root();
> > +
> > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > + zone_dma_bits = 30;
> > +}
>
> I think we could keep this entirely in the arm64 setup_machine_fdt() and
> not pollute the core code with RPi4-specific code.

Actually, even better, could we not move the check to
arm64_memblock_init() when we initialise zone_dma_bits?

--
Catalin

2020-10-01 17:34:12

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > Hi Nicolas,
> >
> > Thanks for putting this together.
> >
> > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > index 4602e467ca8b..cd0d115ef329 100644
> > > --- a/drivers/of/fdt.c
> > > +++ b/drivers/of/fdt.c
> > > @@ -25,6 +25,7 @@
> > > #include <linux/serial_core.h>
> > > #include <linux/sysfs.h>
> > > #include <linux/random.h>
> > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > >
> > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > #include <asm/page.h>
> > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > }
> > >
> > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > +{
> > > + unsigned long dt_root = of_get_flat_dt_root();
> > > +
> > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > + zone_dma_bits = 30;
> > > +}
> >
> > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > not pollute the core code with RPi4-specific code.
>
> Actually, even better, could we not move the check to
> arm64_memblock_init() when we initialise zone_dma_bits?

I did it this way as I vaguely remembered Rob saying he wanted to centralise
all early boot fdt code in one place. But I'll be happy to move it there.

Regards,
Nicolas


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-01 20:06:55

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, Oct 1, 2020 at 12:31 PM Nicolas Saenz Julienne
<[email protected]> wrote:
>
> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > Hi Nicolas,
> > >
> > > Thanks for putting this together.
> > >
> > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > > #include <linux/serial_core.h>
> > > > #include <linux/sysfs.h>
> > > > #include <linux/random.h>
> > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > >
> > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > #include <asm/page.h>
> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > }
> > > >
> > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > +{
> > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > +
> > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > + zone_dma_bits = 30;
> > > > +}
> > >
> > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > not pollute the core code with RPi4-specific code.
> >
> > Actually, even better, could we not move the check to
> > arm64_memblock_init() when we initialise zone_dma_bits?
>
> I did it this way as I vaguely remembered Rob saying he wanted to centralise
> all early boot fdt code in one place. But I'll be happy to move it there.

Right, unless zone_dma_bits is only an arm64 thing, then this doesn't
really have anything arch specific.

Reviewed-by: Rob Herring <[email protected]>

Rob

2020-10-02 09:10:25

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

Hi Nicolas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20201001]
[also build test WARNING on v5.9-rc7]
[cannot apply to robh/for-next arm64/for-next/core hnaz-linux-mm/master linus/master v5.9-rc7 v5.9-rc6 v5.9-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Nicolas-Saenz-Julienne/arm64-Default-to-32-bit-wide-ZONE_DMA/20201002-002007
base: d39294091fee6b89d9c4a683bb19441b25098330
config: arm64-randconfig-r005-20200930 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/0day-ci/linux/commit/7d073ab6c280772b1bcf9e337528be2138d0bc85
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Nicolas-Saenz-Julienne/arm64-Default-to-32-bit-wide-ZONE_DMA/20201002-002007
git checkout 7d073ab6c280772b1bcf9e337528be2138d0bc85
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> drivers/of/fdt.c:1202:13: warning: no previous prototype for function 'early_init_dt_update_zone_dma_bits' [-Wmissing-prototypes]
void __init early_init_dt_update_zone_dma_bits(void)
^
drivers/of/fdt.c:1202:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __init early_init_dt_update_zone_dma_bits(void)
^
static
1 warning generated.

vim +/early_init_dt_update_zone_dma_bits +1202 drivers/of/fdt.c

1201
> 1202 void __init early_init_dt_update_zone_dma_bits(void)
1203 {
1204 unsigned long dt_root = of_get_flat_dt_root();
1205
1206 if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
1207 zone_dma_bits = 30;
1208 }
1209

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (2.56 kB)
.config.gz (41.21 kB)
Download all attachments

2020-10-02 11:57:36

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > > #include <linux/serial_core.h>
> > > > #include <linux/sysfs.h>
> > > > #include <linux/random.h>
> > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > >
> > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > #include <asm/page.h>
> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > }
> > > >
> > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > +{
> > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > +
> > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > + zone_dma_bits = 30;
> > > > +}
> > >
> > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > not pollute the core code with RPi4-specific code.
> >
> > Actually, even better, could we not move the check to
> > arm64_memblock_init() when we initialise zone_dma_bits?
>
> I did it this way as I vaguely remembered Rob saying he wanted to centralise
> all early boot fdt code in one place. But I'll be happy to move it there.

I can see Rob replied and I'm fine if that's his preference. However,
what I don't particularly like is that in the arm64 code, if
zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
the early_init_dt_update_zone_dma_bits(). What if at some point we'll
get a platform that actually needs 24 here (I truly hope not, but just
the principle of relying on magic values)?

So rather than guessing, I'd prefer if the arch code can override
ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
need to explicitly touch the zone_dma_bits variable.

--
Catalin

2020-10-08 10:07:10

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

Hi Catalin, sorry for the late reply.

On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > --- a/drivers/of/fdt.c
> > > > > +++ b/drivers/of/fdt.c
> > > > > @@ -25,6 +25,7 @@
> > > > > #include <linux/serial_core.h>
> > > > > #include <linux/sysfs.h>
> > > > > #include <linux/random.h>
> > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > > >
> > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > > #include <asm/page.h>
> > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > }
> > > > >
> > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > +{
> > > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > > +
> > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > + zone_dma_bits = 30;
> > > > > +}
> > > >
> > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > not pollute the core code with RPi4-specific code.
> > >
> > > Actually, even better, could we not move the check to
> > > arm64_memblock_init() when we initialise zone_dma_bits?
> >
> > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > all early boot fdt code in one place. But I'll be happy to move it there.
>
> I can see Rob replied and I'm fine if that's his preference. However,
> what I don't particularly like is that in the arm64 code, if
> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> get a platform that actually needs 24 here (I truly hope not, but just
> the principle of relying on magic values)?
>
> So rather than guessing, I'd prefer if the arch code can override
> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> need to explicitly touch the zone_dma_bits variable.

Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
but couldn't think of a nicer alternative.

Sadly I just realised that the series is incomplete, we have RPi4 users that
want to boot unsing ACPI, and this series would break things for them. I'll
have a word with them to see what we can do for their use-case.

Regards,
Nicolas


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-08 10:16:08

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > --- a/drivers/of/fdt.c
> > > > > > +++ b/drivers/of/fdt.c
> > > > > > @@ -25,6 +25,7 @@
> > > > > > #include <linux/serial_core.h>
> > > > > > #include <linux/sysfs.h>
> > > > > > #include <linux/random.h>
> > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > > > >
> > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > > > #include <asm/page.h>
> > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > }
> > > > > >
> > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > +{
> > > > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > > > +
> > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > + zone_dma_bits = 30;
> > > > > > +}
> > > > >
> > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > not pollute the core code with RPi4-specific code.
> > > >
> > > > Actually, even better, could we not move the check to
> > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > >
> > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > all early boot fdt code in one place. But I'll be happy to move it there.
> >
> > I can see Rob replied and I'm fine if that's his preference. However,
> > what I don't particularly like is that in the arm64 code, if
> > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > get a platform that actually needs 24 here (I truly hope not, but just
> > the principle of relying on magic values)?
> >
> > So rather than guessing, I'd prefer if the arch code can override
> > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > need to explicitly touch the zone_dma_bits variable.
>
> Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> but couldn't think of a nicer alternative.
>
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Is there a way to get some SoC information from ACPI?

--
Catalin

2020-10-08 22:52:22

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

(+ Lorenzo)

On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <[email protected]> wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > --- a/drivers/of/fdt.c
> > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > @@ -25,6 +25,7 @@
> > > > > > > #include <linux/serial_core.h>
> > > > > > > #include <linux/sysfs.h>
> > > > > > > #include <linux/random.h>
> > > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > > > > >
> > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > > > > #include <asm/page.h>
> > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > > }
> > > > > > >
> > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > +{
> > > > > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > +
> > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > + zone_dma_bits = 30;
> > > > > > > +}
> > > > > >
> > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > > not pollute the core code with RPi4-specific code.
> > > > >
> > > > > Actually, even better, could we not move the check to
> > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > >
> > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > > all early boot fdt code in one place. But I'll be happy to move it there.
> > >
> > > I can see Rob replied and I'm fine if that's his preference. However,
> > > what I don't particularly like is that in the arm64 code, if
> > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > get a platform that actually needs 24 here (I truly hope not, but just
> > > the principle of relying on magic values)?
> > >
> > > So rather than guessing, I'd prefer if the arch code can override
> > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > need to explicitly touch the zone_dma_bits variable.
> >
> > Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> > but couldn't think of a nicer alternative.
> >
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot unsing ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Is there a way to get some SoC information from ACPI?
>

This is unfortunate. We used ACPI _DMA methods as they were designed
to communicate the DMA limit of the XHCI controller to the OS.

It shouldn't be too hard to match the OEM id field in the DSDT, and
switch to the smaller mask. But it sucks to have to add a quirk like
that.

2020-10-09 08:39:58

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
> > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > have a word with them to see what we can do for their use-case.
> >
> > Stupid question: why do these users insist on a totally unsuitable
> > interface? And why would we as Linux developers care to support such
> > a aims?
>
> The point is really whether we want to revert changes in Linux that
> made both DT and ACPI boot work without quirks on RPi4.

Well, and broke a big amount of devices that were otherwise fine.

> Having to check the RPi4 compatible string or OEM id in core init code is
> awful, regardless of whether you boot via ACPI or via DT.
>
> The problem with this hardware is that it uses a DMA mask which is
> narrower than 32, and the arm64 kernel is simply not set up to deal
> with that at all. On DT, we have DMA ranges properties and the likes
> to describe such limitations, on ACPI we have _DMA methods as well as
> DMA range attributes in the IORT, both of which are now handled
> correctly. So all the information is there, we just have to figure out
> how to consume it early on.

Is it worth the effort just for a single board? I don't know about ACPI but
parsing dma-ranges that early at boot time is not trivial. My intuition tells
me that it'd be even harder for ACPI, being a more complex data structure.

> Interestingly, this limitation always existed in the SoC, but it
> wasn't until they started shipping it with more than 1 GB of DRAM that
> it became a problem. This means issues like this could resurface in
> the future with existing SoCs when they get shipped with more memory,
> and so I would prefer fixing this in a generic way.

Actually what I proposed here is pretty generic. Specially from arm64's
perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
whatever it finds in DT. Both those operations are architecture independent.
arm64 arch code doesn't care about the logic involved in ascertaining
zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
to make it as such whenever it's worth the effort.

> Also, I assume papering over the issue like this does not fix the
> kdump issue fundamentally, it just works around it, and so we might
> run into this again in the future.

Any ideas? The way I understand it the kdump issue is just a shortcoming of
the memory zones design.

Regards,
Nicolas


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-09 09:27:14

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> (+ Lorenzo)
>
> On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <[email protected]> wrote:
>>
>> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
>>> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
>>>> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
>>>>> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
>>>>>> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
>>>>>>> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
>>>>>>>> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
>>>>>>>> index 4602e467ca8b..cd0d115ef329 100644
>>>>>>>> --- a/drivers/of/fdt.c
>>>>>>>> +++ b/drivers/of/fdt.c
>>>>>>>> @@ -25,6 +25,7 @@
>>>>>>>> #include <linux/serial_core.h>
>>>>>>>> #include <linux/sysfs.h>
>>>>>>>> #include <linux/random.h>
>>>>>>>> +#include <linux/dma-direct.h> /* for zone_dma_bits */
>>>>>>>>
>>>>>>>> #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
>>>>>>>> #include <asm/page.h>
>>>>>>>> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
>>>>>>>> of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +void __init early_init_dt_update_zone_dma_bits(void)
>>>>>>>> +{
>>>>>>>> + unsigned long dt_root = of_get_flat_dt_root();
>>>>>>>> +
>>>>>>>> + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
>>>>>>>> + zone_dma_bits = 30;
>>>>>>>> +}
>>>>>>>
>>>>>>> I think we could keep this entirely in the arm64 setup_machine_fdt() and
>>>>>>> not pollute the core code with RPi4-specific code.
>>>>>>
>>>>>> Actually, even better, could we not move the check to
>>>>>> arm64_memblock_init() when we initialise zone_dma_bits?
>>>>>
>>>>> I did it this way as I vaguely remembered Rob saying he wanted to centralise
>>>>> all early boot fdt code in one place. But I'll be happy to move it there.
>>>>
>>>> I can see Rob replied and I'm fine if that's his preference. However,
>>>> what I don't particularly like is that in the arm64 code, if
>>>> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
>>>> the early_init_dt_update_zone_dma_bits(). What if at some point we'll
>>>> get a platform that actually needs 24 here (I truly hope not, but just
>>>> the principle of relying on magic values)?
>>>>
>>>> So rather than guessing, I'd prefer if the arch code can override
>>>> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
>>>> need to explicitly touch the zone_dma_bits variable.
>>>
>>> Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
>>> but couldn't think of a nicer alternative.
>>>
>>> Sadly I just realised that the series is incomplete, we have RPi4 users that
>>> want to boot unsing ACPI, and this series would break things for them. I'll
>>> have a word with them to see what we can do for their use-case.
>>
>> Is there a way to get some SoC information from ACPI?
>>
>
> This is unfortunate. We used ACPI _DMA methods as they were designed
> to communicate the DMA limit of the XHCI controller to the OS.
>
> It shouldn't be too hard to match the OEM id field in the DSDT, and
> switch to the smaller mask. But it sucks to have to add a quirk like
> that.
> It also requires delaying setting the arm64_dma_phy_limit a bit, but
that doesn't appear to be causing a problem. I've been boot/compiling
with a patch like:

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index cada0b816c8a..9dfe776c1c75 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -14,6 +14,7 @@

#include <linux/acpi.h>
#include <linux/cpumask.h>
+#include <linux/dma-direct.h>
#include <linux/efi.h>
#include <linux/efi-bgrt.h>
#include <linux/init.h>
@@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void)
ret = -EINVAL;
}

+ if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) &&
+ !strncmp(table->oem_table_id, "RPI4 ",
ACPI_OEM_TABLE_ID_SIZE) &&
+ table->oem_revision <= 0x200) {
+ zone_dma_bits = 30;
+ }
out:
/*
* acpi_get_table() creates FADT table mapping that
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index cd5caca8a929..6c8aaf1570ce 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long
min, unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};

#ifdef CONFIG_ZONE_DMA
+ arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
#endif
#ifdef CONFIG_ZONE_DMA32
@@ -393,7 +394,6 @@ void __init arm64_memblock_init(void)
*/
if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
zone_dma_bits = 32;
- arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
}

if (IS_ENABLED(CONFIG_ZONE_DMA32))


2020-10-09 10:49:19

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > want to boot unsing ACPI, and this series would break things for them. I'll
> > have a word with them to see what we can do for their use-case.
>
> Stupid question: why do these users insist on a totally unsuitable
> interface? And why would we as Linux developers care to support such
> a aims?
>

The point is really whether we want to revert changes in Linux that
made both DT and ACPI boot work without quirks on RPi4. Having to
check the RPi4 compatible string or OEM id in core init code is awful,
regardless of whether you boot via ACPI or via DT.

The problem with this hardware is that it uses a DMA mask which is
narrower than 32, and the arm64 kernel is simply not set up to deal
with that at all. On DT, we have DMA ranges properties and the likes
to describe such limitations, on ACPI we have _DMA methods as well as
DMA range attributes in the IORT, both of which are now handled
correctly. So all the information is there, we just have to figure out
how to consume it early on.

Interestingly, this limitation always existed in the SoC, but it
wasn't until they started shipping it with more than 1 GB of DRAM that
it became a problem. This means issues like this could resurface in
the future with existing SoCs when they get shipped with more memory,
and so I would prefer fixing this in a generic way.

Also, I assume papering over the issue like this does not fix the
kdump issue fundamentally, it just works around it, and so we might
run into this again in the future.

2020-10-09 11:09:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Stupid question: why do these users insist on a totally unsuitable
interface? And why would we as Linux developers care to support such
a aims?

2020-10-09 12:03:14

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

Hi Jeremy.

On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote:
> On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> > (+ Lorenzo)
> >
> > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas <[email protected]> wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > > > --- a/drivers/of/fdt.c
> > > > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > > > @@ -25,6 +25,7 @@
> > > > > > > > > #include <linux/serial_core.h>
> > > > > > > > > #include <linux/sysfs.h>
> > > > > > > > > #include <linux/random.h>
> > > > > > > > > +#include <linux/dma-direct.h> /* for zone_dma_bits */
> > > > > > > > >
> > > > > > > > > #include <asm/setup.h> /* for COMMAND_LINE_SIZE */
> > > > > > > > > #include <asm/page.h>
> > > > > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > > > +{
> > > > > > > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > > > +
> > > > > > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > > > + zone_dma_bits = 30;
> > > > > > > > > +}
> > > > > > > >
> > > > > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > > > > > not pollute the core code with RPi4-specific code.
> > > > > > >
> > > > > > > Actually, even better, could we not move the check to
> > > > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > > > >
> > > > > > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > > > > > all early boot fdt code in one place. But I'll be happy to move it there.
> > > > >
> > > > > I can see Rob replied and I'm fine if that's his preference. However,
> > > > > what I don't particularly like is that in the arm64 code, if
> > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > > > get a platform that actually needs 24 here (I truly hope not, but just
> > > > > the principle of relying on magic values)?
> > > > >
> > > > > So rather than guessing, I'd prefer if the arch code can override
> > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > > > need to explicitly touch the zone_dma_bits variable.
> > > >
> > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> > > > but couldn't think of a nicer alternative.
> > > >
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > have a word with them to see what we can do for their use-case.
> > >
> > > Is there a way to get some SoC information from ACPI?
> > >
> >
> > This is unfortunate. We used ACPI _DMA methods as they were designed
> > to communicate the DMA limit of the XHCI controller to the OS.
> >
> > It shouldn't be too hard to match the OEM id field in the DSDT, and
> > switch to the smaller mask. But it sucks to have to add a quirk like
> > that.
> > It also requires delaying setting the arm64_dma_phy_limit a bit, but
> that doesn't appear to be causing a problem. I've been boot/compiling
> with a patch like:
>
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index cada0b816c8a..9dfe776c1c75 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -14,6 +14,7 @@
>
> #include <linux/acpi.h>
> #include <linux/cpumask.h>
> +#include <linux/dma-direct.h>
> #include <linux/efi.h>
> #include <linux/efi-bgrt.h>
> #include <linux/init.h>
> @@ -168,6 +169,11 @@ static int __init acpi_fadt_sanity_check(void)
> ret = -EINVAL;
> }
>
> + if (!strncmp(table->oem_id, "RPIFDN", ACPI_OEM_ID_SIZE) &&
> + !strncmp(table->oem_table_id, "RPI4 ",
> ACPI_OEM_TABLE_ID_SIZE) &&
> + table->oem_revision <= 0x200) {
> + zone_dma_bits = 30;
> + }
> out:
> /*
> * acpi_get_table() creates FADT table mapping that
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index cd5caca8a929..6c8aaf1570ce 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -195,6 +195,7 @@ static void __init zone_sizes_init(unsigned long
> min, unsigned long max)
> unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
>
> #ifdef CONFIG_ZONE_DMA
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> #endif
> #ifdef CONFIG_ZONE_DMA32
> @@ -393,7 +394,6 @@ void __init arm64_memblock_init(void)
> */
> if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
> zone_dma_bits = 32;
> - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> }
>
> if (IS_ENABLED(CONFIG_ZONE_DMA32))
>
>

Thanks for taking the time to look at this!

Regards,
Nicolas


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-09 12:18:37

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
<[email protected]> wrote:
>
> On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > have a word with them to see what we can do for their use-case.
> > >
> > > Stupid question: why do these users insist on a totally unsuitable
> > > interface? And why would we as Linux developers care to support such
> > > a aims?
> >
> > The point is really whether we want to revert changes in Linux that
> > made both DT and ACPI boot work without quirks on RPi4.
>
> Well, and broke a big amount of devices that were otherwise fine.
>

Yeah that was unfortunate.

> > Having to check the RPi4 compatible string or OEM id in core init code is
> > awful, regardless of whether you boot via ACPI or via DT.
> >
> > The problem with this hardware is that it uses a DMA mask which is
> > narrower than 32, and the arm64 kernel is simply not set up to deal
> > with that at all. On DT, we have DMA ranges properties and the likes
> > to describe such limitations, on ACPI we have _DMA methods as well as
> > DMA range attributes in the IORT, both of which are now handled
> > correctly. So all the information is there, we just have to figure out
> > how to consume it early on.
>
> Is it worth the effort just for a single board? I don't know about ACPI but
> parsing dma-ranges that early at boot time is not trivial. My intuition tells
> me that it'd be even harder for ACPI, being a more complex data structure.
>

Yes, it will be harder, especially for the _DMA methods.

> > Interestingly, this limitation always existed in the SoC, but it
> > wasn't until they started shipping it with more than 1 GB of DRAM that
> > it became a problem. This means issues like this could resurface in
> > the future with existing SoCs when they get shipped with more memory,
> > and so I would prefer fixing this in a generic way.
>
> Actually what I proposed here is pretty generic. Specially from arm64's
> perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> whatever it finds in DT. Both those operations are architecture independent.
> arm64 arch code doesn't care about the logic involved in ascertaining
> zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> to make it as such whenever it's worth the effort.
>

The problem is that, while we are providing a full description of the
SoC's capabilities, we short circuit this by inserting knowledge into
the code (that is shared between all DT architectures) that
"brcm,bcm2711" is special, and needs a DMA zone override.

I think for ACPI boot, we might be able to work around this by cold
plugging the memory above 1 GB, but I have to double check whether it
won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
for me here from the top of their head)

2020-10-09 14:05:19

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> <[email protected]> wrote:
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > >
> > > > Stupid question: why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > >
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> >
> > Well, and broke a big amount of devices that were otherwise fine.
> >
>
> Yeah that was unfortunate.
>
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > >
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> >
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> >
>
> Yes, it will be harder, especially for the _DMA methods.
>
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> >
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > to make it as such whenever it's worth the effort.
> >
>
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.

Yes I understand this and I sympathize with it, not the most beautiful thing
out there :). But that's only half the issue, as I said, implementing this
early at boot time is a tangible amount of work and a burden to maintain just
for one board. So this is the compromise we discussed with the DT maintainer
(RobH). The series sets things up so as to be able to implement the right
thing transparently to arm64's architecture when deemed worth the effort.

Ultimately, if you're worried about inserting knowledge into the code, aren't
we doing that, in a more extreme way, when imposing an extra unwarranted zone
to the whole arm64 ecosystem?

Note that I'm more that happy to work on alternative solutions, but let's first
settle on what would be acceptable to everyone.

> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Don't know much about what ACPI memory cold plugging involves, but we'll still
need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for
RPi4.

Regards,
Nicolas


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-09 16:24:41

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
<[email protected]> wrote:
>
> On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> > <[email protected]> wrote:
> > >
> > > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
> > > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > > have a word with them to see what we can do for their use-case.
> > > > >
> > > > > Stupid question: why do these users insist on a totally unsuitable
> > > > > interface? And why would we as Linux developers care to support such
> > > > > a aims?
> > > >
> > > > The point is really whether we want to revert changes in Linux that
> > > > made both DT and ACPI boot work without quirks on RPi4.
> > >
> > > Well, and broke a big amount of devices that were otherwise fine.
> > >
> >
> > Yeah that was unfortunate.
> >
> > > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > > awful, regardless of whether you boot via ACPI or via DT.
> > > >
> > > > The problem with this hardware is that it uses a DMA mask which is
> > > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > > with that at all. On DT, we have DMA ranges properties and the likes
> > > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > > DMA range attributes in the IORT, both of which are now handled
> > > > correctly. So all the information is there, we just have to figure out
> > > > how to consume it early on.
> > >
> > > Is it worth the effort just for a single board? I don't know about ACPI but
> > > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > > me that it'd be even harder for ACPI, being a more complex data structure.
> > >
> >
> > Yes, it will be harder, especially for the _DMA methods.
> >
> > > > Interestingly, this limitation always existed in the SoC, but it
> > > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > > it became a problem. This means issues like this could resurface in
> > > > the future with existing SoCs when they get shipped with more memory,
> > > > and so I would prefer fixing this in a generic way.
> > >
> > > Actually what I proposed here is pretty generic. Specially from arm64's
> > > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > > whatever it finds in DT. Both those operations are architecture independent.
> > > arm64 arch code doesn't care about the logic involved in ascertaining
> > > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > > to make it as such whenever it's worth the effort.
> > >
> >
> > The problem is that, while we are providing a full description of the
> > SoC's capabilities, we short circuit this by inserting knowledge into
> > the code (that is shared between all DT architectures) that
> > "brcm,bcm2711" is special, and needs a DMA zone override.
> >
> > I think for ACPI boot, we might be able to work around this by cold
> > plugging the memory above 1 GB, but I have to double check whether it
> > won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> > for me here from the top of their head)
>
> Is this information that we can infer from IORT nodes and make it
> generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?
>

Yes, that seems feasible.

> We can move this check to IORT code and call it from arm64 if it
> can be made to work.
>

Finding the smallest value in the IORT, and assigning it to
zone_dma_bits if it is < 32 should be easy. But as I understand it,
having these separate DMA and DMA32 zones is what breaks kdump, no? So
how is this going to fix the underlying issue?

Nicolas, were there any other reported regressions caused by the
introduction of ZONE_DMA?

2020-10-09 19:52:30

by Lorenzo Pieralisi

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, Oct 09, 2020 at 11:13:59AM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
> <[email protected]> wrote:
> >
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig <[email protected]> wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 users that
> > > > > want to boot unsing ACPI, and this series would break things for them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > >
> > > > Stupid question: why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > >
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> >
> > Well, and broke a big amount of devices that were otherwise fine.
> >
>
> Yeah that was unfortunate.
>
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > >
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> >
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> >
>
> Yes, it will be harder, especially for the _DMA methods.
>
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> >
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
> > to make it as such whenever it's worth the effort.
> >
>
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.
>
> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Is this information that we can infer from IORT nodes and make it
generic (ie make a DMA limit out of all IORT nodes allowed ranges) ?

We can move this check to IORT code and call it from arm64 if it
can be made to work.

Thanks,
Lorenzo

2020-10-09 22:41:19

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> <[email protected]> wrote:
> > We can move this check to IORT code and call it from arm64 if it
> > can be made to work.
>
> Finding the smallest value in the IORT, and assigning it to
> zone_dma_bits if it is < 32 should be easy. But as I understand it,
> having these separate DMA and DMA32 zones is what breaks kdump, no? So
> how is this going to fix the underlying issue?

If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
allocations fall back to ZONE_DMA).

kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
(it's been a while since I looked), the kdump allocation couldn't span
multiple zones.

In a separate thread, we try to fix kdump to use allocations above 4G as
a fallback but this only fixes platforms with enough RAM (and maybe it's
only those platforms that care about kdump).

--
Catalin

2020-10-10 23:10:55

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Sat, Oct 10, 2020 at 12:53:19PM +0200, Nicolas Saenz Julienne wrote:
> On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <[email protected]> wrote:
> > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > > > <[email protected]> wrote:
> > > > > We can move this check to IORT code and call it from arm64 if it
> > > > > can be made to work.
> > > >
> > > > Finding the smallest value in the IORT, and assigning it to
> > > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > > how is this going to fix the underlying issue?
> > >
> > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > > allocations fall back to ZONE_DMA).
> > >
> > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > > (it's been a while since I looked), the kdump allocation couldn't span
> > > multiple zones.
> > >
> > > In a separate thread, we try to fix kdump to use allocations above 4G as
> > > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > > only those platforms that care about kdump).
> > >
> >
> > One thing that strikes me as odd is that we are applying the same
> > shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> > DRAM starts outside of the zone, it is shifted upwards.
> >
> > On a typical ARM box, this gives me
> >
> > [ 0.000000] Zone ranges:
> > [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff]
> > [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff]
> > [ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff]
> >
> > i.e., the 30-bit addressable range has bit 31 set, which is weird.
>
> Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
> address this[1], which ultimately triggered the discussion we're having right
> now.

I don't mind assuming that ZONE_DMA is always from pfn 0 (i.e. no
dma_offset for such constrained devices). But if ZONE_DMA32 is squeezed
out with ZONE_DMA extended to 4GB, it should allow non-zero upper 32
bits. IIRC we do have SoCs with RAM starting above 4GB.

However, your patch didn't completely solve the problem for non-RPi4
platforms as there's hardware with RAM starting at 0 that doesn't need
the 1GB ZONE_DMA. We may end up with a combination of the two
approaches.

> Although with with your latest patch and the DT counterpart, we should be OK.
> It would be weird for a HW description to define DMA constraints that are
> impossible to reach on that system.

I don't remember the difficulties with parsing a DT early for inferring
the ZONE_DMA requirements. Could we not check the dma-ranges property in
the soc node? I can see bcm2711.dtsi defines a 30-bit address range. We
are not interested in the absolute physical/bus addresses, just the
size to check whether it's smaller than 32-bit.

> > I wonder if it wouldn't be better (and less problematic in the general
> > case) to drop this logic for ZONE_DMA, and simply let it remain empty
> > unless there is really some memory there.
>
> From my experience, you can't have empty ZONE_DMA when enabled. Empty
> ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Indeed, because we still have GFP_DMA around which can't fall back to
ZONE_DMA32 (well, unless CONFIG_ZONE_DMA is disabled).

--
Catalin

2020-10-10 23:10:58

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <[email protected]> wrote:
>
> On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > <[email protected]> wrote:
> > > We can move this check to IORT code and call it from arm64 if it
> > > can be made to work.
> >
> > Finding the smallest value in the IORT, and assigning it to
> > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > how is this going to fix the underlying issue?
>
> If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> allocations fall back to ZONE_DMA).
>
> kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> (it's been a while since I looked), the kdump allocation couldn't span
> multiple zones.
>
> In a separate thread, we try to fix kdump to use allocations above 4G as
> a fallback but this only fixes platforms with enough RAM (and maybe it's
> only those platforms that care about kdump).
>

One thing that strikes me as odd is that we are applying the same
shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
DRAM starts outside of the zone, it is shifted upwards.

On a typical ARM box, this gives me

[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff]
[ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff]

i.e., the 30-bit addressable range has bit 31 set, which is weird.

I wonder if it wouldn't be better (and less problematic in the general
case) to drop this logic for ZONE_DMA, and simply let it remain empty
unless there is really some memory there.

2020-10-11 14:27:20

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 19:10, Catalin Marinas <[email protected]> wrote:
> > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > > <[email protected]> wrote:
> > > > We can move this check to IORT code and call it from arm64 if it
> > > > can be made to work.
> > >
> > > Finding the smallest value in the IORT, and assigning it to
> > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > how is this going to fix the underlying issue?
> >
> > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > allocations fall back to ZONE_DMA).
> >
> > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > (it's been a while since I looked), the kdump allocation couldn't span
> > multiple zones.
> >
> > In a separate thread, we try to fix kdump to use allocations above 4G as
> > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > only those platforms that care about kdump).
> >
>
> One thing that strikes me as odd is that we are applying the same
> shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> DRAM starts outside of the zone, it is shifted upwards.
>
> On a typical ARM box, this gives me
>
> [ 0.000000] Zone ranges:
> [ 0.000000] DMA [mem 0x0000000080000000-0x00000000bfffffff]
> [ 0.000000] DMA32 [mem 0x00000000c0000000-0x00000000ffffffff]
> [ 0.000000] Normal [mem 0x0000000100000000-0x0000000fffffffff]
>
> i.e., the 30-bit addressable range has bit 31 set, which is weird.

Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
address this[1], which ultimately triggered the discussion we're having right
now.

Although with with your latest patch and the DT counterpart, we should be OK.
It would be weird for a HW description to define DMA constraints that are
impossible to reach on that system.

> I wonder if it wouldn't be better (and less problematic in the general
> case) to drop this logic for ZONE_DMA, and simply let it remain empty
> unless there is really some memory there.

From my experience, you can't have empty ZONE_DMA when enabled. Empty
ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/8/19/1022



Attachments:
signature.asc (499.00 B)
This is a digitally signed message part

2020-10-12 06:50:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> kdump wants DMA-able memory and,

DMAable by whom? The only way to guranteed DMAable memory is to use
the DMA memory allocator(s) and pass a specific device to them. Everyting
else is just fundamentally broken. Note that even when device is not
DMAable we can still use swiotlb to access it.

2020-10-12 08:49:40

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

On Mon, Oct 12, 2020 at 08:47:15AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> > kdump wants DMA-able memory and,
>
> DMAable by whom? The only way to guranteed DMAable memory is to use
> the DMA memory allocator(s) and pass a specific device to them. Everyting
> else is just fundamentally broken. Note that even when device is not
> DMAable we can still use swiotlb to access it.

What I meant is that the new kexec'ed kernel needs some memory in the
ZONE_DMA range, currently set to the bottom 30-bit even for platforms
that can cope with the whole 32-bit range (anything other than RPi4).
The memory range available to the kdump kernels is limited to what
reserve_crashkernel() allocated, which may not fit in the lower 1GB.

There are two ongoing threads (complementary):

1. Change the arm64 reserve_crashkernel() similar to x86 which allocates
memory above 4G with a small block in the ZONE_DMA range.

2. Allow zone_dma_bits to be 32 for arm64 platforms other than RPi4.

The second point also fixes some regressions with CMA reservations that
could no longer fit in the lower 1GB.

--
Catalin