The reserved_mem array is used to store the data of the different
reserved memory regions specified in the DT of a device.
The array stores information such as the name, node, starting address,
and size of a reserved memory region.
The array is currently statically allocated with a size of
MAX_RESERVED_REGIONS(64). This means that any system that specifies a
number of reserved memory regions greater than MAX_RESERVED_REGIONS(64)
will not have enough space to store the information for all the regions.
Therefore, this series extends the use of a static array for
reserved_mem, and introduces a dynamically allocated array using
memblock_alloc() based on the number of reserved memory regions
specified in the DT.
Memory gotten from memblock_alloc() is only writable after paging_init()
is called, but the reserved memory regions need to be reserved before
then so that the system does not create page table mappings for them.
Reserved memory regions can be divided into 2 groups.
i) Statically-placed reserved memory regions
i.e. regions defined in the DT using the @reg property.
ii) Dynamically-placed reserved memory regions.
i.e. regions specified in the DT using the @alloc_ranges
and @size properties.
It is possible to call memblock_reserve() and memblock_mark_nomap() on
the statically-placed reserved memory regions and not need to save them
to the array until after paging_init(), but this is not possible for the
dynamically-placed reserved memory because the starting address of these
regions need to be stored somewhere after they are allocated.
Therefore, this series achieves the allocation and population of the
reserved_mem array in two steps:
1. Before paging_init()
Before paging_init() is called, iterate through the reserved_mem
nodes in the DT and do the following:
- Allocate memory for dynamically-placed reserved memory regions and
store their starting address in the static allocated reserved_mem
array.
- Call memblock_reserve() and memblock_mark_nomap() on all the
reserved memory regions as needed.
- Count the total number of reserved_mem nodes in the DT.
2. After paging_init()
After paging_init() is called:
- Allocate new memory for the reserved_mem array based on the number
of reserved memory nodes in the DT.
- Transfer all the information that was stored in the static array
into the new array.
- Store the rest of the reserved_mem regions in the new array.
i.e. the statically-placed regions.
The static array is no longer needed after this point, but there is
currently no obvious way to free the memory. Therefore, the size of the
initial static array is now defined using a config option.
Because the array is used only before paging_init() to store the
dynamically-placed reserved memory regions, the required size can vary
from device to device. Therefore, scaling it can help get some memory
savings.
A possible solution to freeing the memory for the static array will be
to mark it as __initdata. This will automatically free the memory once
the init process is done running.
The reason why this is not pursued in this series is because of
the possibility of a use-after-free.
If the dynamic allocation of the reserved_mem array fails, then future
accesses of the reserved_mem array will still be referencing the static
array. When the init process ends and the memory is freed up, any
further attempts to use the reserved_mem array will result in a
use-after-free.
Note:
- The limitation to this approach is that there is still a limit of
64 for dynamically reserved memory regions.
- Upon further review, the series might need to be split up/duplicated
for other archs.
Oreoluwa Babatunde (6):
of: reserved_mem: Change the order that reserved_mem regions are
stored
of: reserved_mem: Swicth call to unflatten_device_tree() to after
paging_init()
of: resevred_mem: Delay allocation of memory for dynamic regions
of: reserved_mem: Add code to use unflattened DT for reserved_mem
nodes
of: reserved_mem: Add code to dynamically allocate reserved_mem array
of: reserved_mem: Make MAX_RESERVED_REGIONS a config option
arch/loongarch/kernel/setup.c | 2 +-
arch/mips/kernel/setup.c | 3 +-
arch/nios2/kernel/setup.c | 4 +-
arch/openrisc/kernel/setup.c | 4 +-
arch/powerpc/kernel/setup-common.c | 3 +
arch/sh/kernel/setup.c | 5 +-
arch/um/kernel/dtb.c | 1 -
arch/um/kernel/um_arch.c | 2 +
arch/xtensa/kernel/setup.c | 4 +-
drivers/of/Kconfig | 13 +++
drivers/of/fdt.c | 39 +++++--
drivers/of/of_private.h | 6 +-
drivers/of/of_reserved_mem.c | 175 +++++++++++++++++++++--------
include/linux/of_reserved_mem.h | 8 +-
kernel/dma/coherent.c | 4 +-
kernel/dma/contiguous.c | 8 +-
kernel/dma/swiotlb.c | 10 +-
17 files changed, 205 insertions(+), 86 deletions(-)
--
2.17.1
Switch call to unflatten_device_tree() to after paging_init() on other
archs to follow new order in which the reserved_mem regions are
processed.
Signed-off-by: Oreoluwa Babatunde <[email protected]>
---
arch/loongarch/kernel/setup.c | 2 +-
arch/mips/kernel/setup.c | 3 ++-
arch/nios2/kernel/setup.c | 4 ++--
arch/openrisc/kernel/setup.c | 4 ++--
arch/powerpc/kernel/setup-common.c | 3 +++
arch/sh/kernel/setup.c | 5 ++---
arch/um/kernel/dtb.c | 1 -
arch/um/kernel/um_arch.c | 2 ++
arch/xtensa/kernel/setup.c | 4 +++-
9 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index d183a745fb85..a01051b0f9e0 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -366,7 +366,6 @@ void __init platform_init(void)
acpi_gbl_use_default_register_widths = false;
acpi_boot_table_init();
#endif
- unflatten_and_copy_device_tree();
#ifdef CONFIG_NUMA
init_numa_memory();
@@ -626,6 +625,7 @@ void __init setup_arch(char **cmdline_p)
paging_init();
+ unflatten_and_copy_device_tree();
#ifdef CONFIG_KASAN
kasan_init();
#endif
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2d2ca024bd47..d3b6c86a8037 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -667,7 +667,6 @@ static void __init arch_mem_init(char **cmdline_p)
mips_reserve_vmcore();
mips_parse_crashkernel();
- device_tree_init();
/*
* In order to reduce the possibility of kernel panic when failed to
@@ -798,6 +797,8 @@ void __init setup_arch(char **cmdline_p)
cpu_cache_init();
paging_init();
+ device_tree_init();
+
memblock_dump_all();
setup_rng_seed();
diff --git a/arch/nios2/kernel/setup.c b/arch/nios2/kernel/setup.c
index da122a5fa43b..6f1a4232b8f0 100644
--- a/arch/nios2/kernel/setup.c
+++ b/arch/nios2/kernel/setup.c
@@ -169,8 +169,6 @@ void __init setup_arch(char **cmdline_p)
early_init_fdt_reserve_self();
early_init_fdt_scan_reserved_mem();
- unflatten_and_copy_device_tree();
-
setup_cpuinfo();
copy_exception_handler(cpuinfo.exception_addr);
@@ -189,4 +187,6 @@ void __init setup_arch(char **cmdline_p)
* get kmalloc into gear
*/
paging_init();
+
+ unflatten_and_copy_device_tree();
}
diff --git a/arch/openrisc/kernel/setup.c b/arch/openrisc/kernel/setup.c
index 9cf7fb60441f..fcda33bdca19 100644
--- a/arch/openrisc/kernel/setup.c
+++ b/arch/openrisc/kernel/setup.c
@@ -255,8 +255,6 @@ void calibrate_delay(void)
void __init setup_arch(char **cmdline_p)
{
- unflatten_and_copy_device_tree();
-
setup_cpuinfo();
#ifdef CONFIG_SMP
@@ -284,6 +282,8 @@ void __init setup_arch(char **cmdline_p)
/* paging_init() sets up the MMU and marks all pages as reserved */
paging_init();
+ unflatten_and_copy_device_tree();
+
*cmdline_p = boot_command_line;
printk(KERN_INFO "OpenRISC Linux -- http://openrisc.io\n");
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 9b142b9d5187..58da58d02652 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -986,6 +986,9 @@ void __init setup_arch(char **cmdline_p)
paging_init();
+ /* Unflatten the device-tree passed by prom_init or kexec */
+ unflatten_device_tree();
+
/* Initialize the MMU context management stuff. */
mmu_context_init();
diff --git a/arch/sh/kernel/setup.c b/arch/sh/kernel/setup.c
index 3d80515298d2..2553696af21b 100644
--- a/arch/sh/kernel/setup.c
+++ b/arch/sh/kernel/setup.c
@@ -322,6 +322,8 @@ void __init setup_arch(char **cmdline_p)
/* Let earlyprintk output early console messages */
sh_early_platform_driver_probe("earlyprintk", 1, 1);
+ paging_init();
+
#ifdef CONFIG_OF_EARLY_FLATTREE
#ifdef CONFIG_USE_BUILTIN_DTB
unflatten_and_copy_device_tree();
@@ -329,9 +331,6 @@ void __init setup_arch(char **cmdline_p)
unflatten_device_tree();
#endif
#endif
-
- paging_init();
-
/* Perform the machine specific initialisation */
if (likely(sh_mv.mv_setup))
sh_mv.mv_setup(cmdline_p);
diff --git a/arch/um/kernel/dtb.c b/arch/um/kernel/dtb.c
index 484141b06938..04b0ada3b929 100644
--- a/arch/um/kernel/dtb.c
+++ b/arch/um/kernel/dtb.c
@@ -26,7 +26,6 @@ void uml_dtb_init(void)
}
early_init_fdt_scan_reserved_mem();
- unflatten_device_tree();
}
static int __init uml_dtb_setup(char *line, int *add)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index b1bfed0c8528..fe6ecaa12ef2 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -7,6 +7,7 @@
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/mm.h>
+#include <linux/of_fdt.h>
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/panic_notifier.h>
@@ -421,6 +422,7 @@ void __init setup_arch(char **cmdline_p)
read_initrd();
paging_init();
+ unflatten_device_tree();
strscpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
setup_hostinfo(host_info, sizeof host_info);
diff --git a/arch/xtensa/kernel/setup.c b/arch/xtensa/kernel/setup.c
index bdec4a773af0..d20c56b4182e 100644
--- a/arch/xtensa/kernel/setup.c
+++ b/arch/xtensa/kernel/setup.c
@@ -355,13 +355,15 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
bootmem_init();
kasan_init();
- unflatten_and_copy_device_tree();
#ifdef CONFIG_SMP
smp_init_cpus();
#endif
paging_init();
+
+ unflatten_and_copy_device_tree();
+
zones_init();
#ifdef CONFIG_VT
--
2.17.1
On Mon, Dec 04, 2023 at 10:54:05AM -0800, Oreoluwa Babatunde wrote:
> Switch call to unflatten_device_tree() to after paging_init() on other
> archs to follow new order in which the reserved_mem regions are
> processed.
You did this so that you could allocate memory for the reserved regions.
But unflatten_device_tree() can already do allocations using memblock.
So the same thing should work for you.
I suspect that moving this will break any arch that called an of_* API
between the original call and the new call location.
Rob
On Mon, Dec 04, 2023 at 10:54:03AM -0800, Oreoluwa Babatunde wrote:
> The reserved_mem array is used to store the data of the different
> reserved memory regions specified in the DT of a device.
> The array stores information such as the name, node, starting address,
> and size of a reserved memory region.
>
> The array is currently statically allocated with a size of
> MAX_RESERVED_REGIONS(64). This means that any system that specifies a
> number of reserved memory regions greater than MAX_RESERVED_REGIONS(64)
> will not have enough space to store the information for all the regions.
>
> Therefore, this series extends the use of a static array for
> reserved_mem, and introduces a dynamically allocated array using
> memblock_alloc() based on the number of reserved memory regions
> specified in the DT.
>
> Memory gotten from memblock_alloc() is only writable after paging_init()
> is called, but the reserved memory regions need to be reserved before
> then so that the system does not create page table mappings for them.
>
> Reserved memory regions can be divided into 2 groups.
> i) Statically-placed reserved memory regions
> i.e. regions defined in the DT using the @reg property.
> ii) Dynamically-placed reserved memory regions.
> i.e. regions specified in the DT using the @alloc_ranges
> and @size properties.
>
> It is possible to call memblock_reserve() and memblock_mark_nomap() on
> the statically-placed reserved memory regions and not need to save them
> to the array until after paging_init(), but this is not possible for the
> dynamically-placed reserved memory because the starting address of these
> regions need to be stored somewhere after they are allocated.
>
> Therefore, this series achieves the allocation and population of the
> reserved_mem array in two steps:
>
> 1. Before paging_init()
> Before paging_init() is called, iterate through the reserved_mem
> nodes in the DT and do the following:
> - Allocate memory for dynamically-placed reserved memory regions and
> store their starting address in the static allocated reserved_mem
> array.
> - Call memblock_reserve() and memblock_mark_nomap() on all the
> reserved memory regions as needed.
> - Count the total number of reserved_mem nodes in the DT.
>
> 2. After paging_init()
> After paging_init() is called:
> - Allocate new memory for the reserved_mem array based on the number
> of reserved memory nodes in the DT.
> - Transfer all the information that was stored in the static array
> into the new array.
> - Store the rest of the reserved_mem regions in the new array.
> i.e. the statically-placed regions.
>
> The static array is no longer needed after this point, but there is
> currently no obvious way to free the memory. Therefore, the size of the
> initial static array is now defined using a config option.
A config option is not going to work here.
> Because the array is used only before paging_init() to store the
> dynamically-placed reserved memory regions, the required size can vary
> from device to device. Therefore, scaling it can help get some memory
> savings.
>
> A possible solution to freeing the memory for the static array will be
> to mark it as __initdata. This will automatically free the memory once
> the init process is done running.
> The reason why this is not pursued in this series is because of
> the possibility of a use-after-free.
> If the dynamic allocation of the reserved_mem array fails, then future
> accesses of the reserved_mem array will still be referencing the static
> array. When the init process ends and the memory is freed up, any
> further attempts to use the reserved_mem array will result in a
> use-after-free.
If memory allocation for the reserved_mem array fails so early in boot,
you've got much bigger problems. Use __initdata, and just WARN if
allocation fails and continue on (so hopefully the console is brought
up and someone can see the WARN).
Rob
On 12/6/2023 1:35 PM, Rob Herring wrote:
> On Mon, Dec 04, 2023 at 10:54:03AM -0800, Oreoluwa Babatunde wrote:
>> The reserved_mem array is used to store the data of the different
>> reserved memory regions specified in the DT of a device.
>> The array stores information such as the name, node, starting address,
>> and size of a reserved memory region.
>>
>> The array is currently statically allocated with a size of
>> MAX_RESERVED_REGIONS(64). This means that any system that specifies a
>> number of reserved memory regions greater than MAX_RESERVED_REGIONS(64)
>> will not have enough space to store the information for all the regions.
>>
>> Therefore, this series extends the use of a static array for
>> reserved_mem, and introduces a dynamically allocated array using
>> memblock_alloc() based on the number of reserved memory regions
>> specified in the DT.
>>
>> Memory gotten from memblock_alloc() is only writable after paging_init()
>> is called, but the reserved memory regions need to be reserved before
>> then so that the system does not create page table mappings for them.
>>
>> Reserved memory regions can be divided into 2 groups.
>> i) Statically-placed reserved memory regions
>> i.e. regions defined in the DT using the @reg property.
>> ii) Dynamically-placed reserved memory regions.
>> i.e. regions specified in the DT using the @alloc_ranges
>> and @size properties.
>>
>> It is possible to call memblock_reserve() and memblock_mark_nomap() on
>> the statically-placed reserved memory regions and not need to save them
>> to the array until after paging_init(), but this is not possible for the
>> dynamically-placed reserved memory because the starting address of these
>> regions need to be stored somewhere after they are allocated.
>>
>> Therefore, this series achieves the allocation and population of the
>> reserved_mem array in two steps:
>>
>> 1. Before paging_init()
>> Before paging_init() is called, iterate through the reserved_mem
>> nodes in the DT and do the following:
>> - Allocate memory for dynamically-placed reserved memory regions and
>> store their starting address in the static allocated reserved_mem
>> array.
>> - Call memblock_reserve() and memblock_mark_nomap() on all the
>> reserved memory regions as needed.
>> - Count the total number of reserved_mem nodes in the DT.
>>
>> 2. After paging_init()
>> After paging_init() is called:
>> - Allocate new memory for the reserved_mem array based on the number
>> of reserved memory nodes in the DT.
>> - Transfer all the information that was stored in the static array
>> into the new array.
>> - Store the rest of the reserved_mem regions in the new array.
>> i.e. the statically-placed regions.
>>
>> The static array is no longer needed after this point, but there is
>> currently no obvious way to free the memory. Therefore, the size of the
>> initial static array is now defined using a config option.
> A config option is not going to work here.
>
>> Because the array is used only before paging_init() to store the
>> dynamically-placed reserved memory regions, the required size can vary
>> from device to device. Therefore, scaling it can help get some memory
>> savings.
>>
>> A possible solution to freeing the memory for the static array will be
>> to mark it as __initdata. This will automatically free the memory once
>> the init process is done running.
>> The reason why this is not pursued in this series is because of
>> the possibility of a use-after-free.
>> If the dynamic allocation of the reserved_mem array fails, then future
>> accesses of the reserved_mem array will still be referencing the static
>> array. When the init process ends and the memory is freed up, any
>> further attempts to use the reserved_mem array will result in a
>> use-after-free.
> If memory allocation for the reserved_mem array fails so early in boot,
> you've got much bigger problems. Use __initdata, and just WARN if
> allocation fails and continue on (so hopefully the console is brought
> up and someone can see the WARN).
>
> Rob
Thanks for pointing that out! I'll move forward with implementing the use
of __initdata to delete the static array.
Regards,
Oreoluwa
On 12/6/2023 1:31 PM, Rob Herring wrote:
> On Mon, Dec 04, 2023 at 10:54:05AM -0800, Oreoluwa Babatunde wrote:
>> Switch call to unflatten_device_tree() to after paging_init() on other
>> archs to follow new order in which the reserved_mem regions are
>> processed.
> You did this so that you could allocate memory for the reserved regions.
> But unflatten_device_tree() can already do allocations using memblock.
> So the same thing should work for you.
>
> I suspect that moving this will break any arch that called an of_* API
> between the original call and the new call location.
>
> Rob
Hi Rob,
Thank you for your feedback. You are correct, I see that
unflatten_device_tree() is allocating memory from memblock and writing
to it before paging_init() is called on these other architectures.
This series will still require fdt_init_reserved_mem() to be called
after paging_init() on architectures such as arm64 which require
paging_init() to run before memblock allocated memory is writable.
In light of this, there seems to be no non-architecture-specific
universal calling point to stick in fdt_init_reserved_mem() that works
for all architectures. Therefore, I propose taking out the call to
fdt_init_reserved_mem() from the unflatten_device_tree() function and
placing it in the setup_arch() function for each architecture.
(similar to the way it was in v1 of this series).
Doing this will allow us to place fdt_init_reserved_mem() after
paging_init() on architectures that require paging_init() to be called
before memblock allocated memory is writable, and place it before
paging_init() on architectures that does not require paging_init() to be
called before writing to memblock allocated memory.
fdt_init_reserved_mem() can also be called after unflatten_device_tree()
on all architectures to ensure we get the speed benefits of using the
unflattened device tree APIs for the final pass.
Regards,
Oreoluwa