2024-05-28 22:37:39

by Oreoluwa Babatunde

Subject: [PATCH v6 0/4] Dynamic Allocation of the reserved_mem array

The reserved_mem array is used to store data for the different
reserved memory regions defined in the DT of a device. The array
stores information such as the region name, node reference, start address,
and size of each reserved memory region.
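For reference, each entry in the array is a struct reserved_mem
(include/linux/of_reserved_mem.h); a simplified view of the fields
involved, as they exist before this series, is:

	struct reserved_mem {
		const char *name;                    /* region name from the DT node */
		unsigned long fdt_node;              /* FDT offset of the region's node */
		const struct reserved_mem_ops *ops;  /* region-specific init hooks */
		phys_addr_t base;                    /* start address of the region */
		phys_addr_t size;                    /* size of the region */
	};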

The array is currently statically allocated with a size of
MAX_RESERVED_REGIONS (64). This means that any system that specifies
more than MAX_RESERVED_REGIONS reserved memory regions will not have
enough space to store the information for all of them.

This can be fixed by making the reserved_mem array a dynamically sized
array which is allocated using memblock_alloc() based on the exact
number of reserved memory regions defined in the DT.

On architectures such as arm64, memblock allocated memory is not
writable until after the page tables have been set up.
This is an issue because the current implementation initializes the
reserved memory regions and stores their information in the array before
the page tables are set up. Hence, dynamically allocating the
reserved_mem array and attempting to write information to it at this
point will fail.

Therefore, the allocation of the reserved_mem array will need to be done
after the page tables have been set up, which means that storing the
reserved memory regions in the array will also have to wait until then.
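As a minimal sketch of that allocation step (the helper name below is
illustrative; patch 2/4 contains the actual implementation):

	#include <linux/memblock.h>
	#include <linux/of_reserved_mem.h>
	#include <linux/overflow.h>

	static struct reserved_mem *reserved_mem;

	/*
	 * Hypothetical helper: size the array by the number of regions
	 * counted while scanning the DT, and only call it once the page
	 * tables are set up and memblock memory is writable.
	 */
	static void __init alloc_reserved_mem(int region_count)
	{
		reserved_mem = memblock_alloc(array_size(region_count,
							 sizeof(*reserved_mem)),
					      SMP_CACHE_BYTES);
		if (!reserved_mem)
			pr_err("Failed to allocate the reserved_mem array\n");
	}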

When the reserved memory regions defined in the DT are processed, each
region is marked as reserved by calling memblock_reserve(base, size),
where: base = base address of the reserved region.
size = the size of the reserved memory region.

If a region is defined with the "no-map" property,
memblock_mark_nomap(base, size) is also called.

The "no-map" property is used to indicate to the operating system that a
mapping of the specified region must NOT be created. This also means
that no access (including speculative accesses) is allowed on this
region of memory except when it is coming from the device driver that
this region of memory is being reserved for.[1]

Therefore, it is important to call memblock_reserve() and
memblock_mark_nomap() on all the reserved memory regions before the
system sets up the page tables so that the system does not unknowingly
include any of the no-map reserved memory regions in the memory map.
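As an illustrative sketch, the handling of a single region boils down to
roughly the following (this mirrors what early_init_dt_reserve_memory()
already does; the helper name here is made up):

	#include <linux/memblock.h>

	static int __init reserve_one_region(phys_addr_t base, phys_addr_t size,
					     bool nomap)
	{
		if (nomap)
			/* keep the region out of the kernel's linear mapping */
			return memblock_mark_nomap(base, size);

		/* keep the region mapped, but never hand it to the allocator */
		return memblock_reserve(base, size);
	}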

There are two ways to define how/where a reserved memory region is
placed in memory:
i) Statically-placed reserved memory regions
i.e. regions defined with a set start address and size using the
"reg" property in the DT.
ii) Dynamically-placed reserved memory regions.
i.e. regions defined by specifying a range of addresses where they can
be placed in memory using the "alloc-ranges" and "size" properties
in the DT (see the sketch below).
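A rough sketch of how the early scan tells these two kinds apart (the
helper names are illustrative; the real check is the "reg"/"size" test
in fdt_scan_reserved_mem()):

	#include <linux/of_fdt.h>

	/* Statically placed: the node carries a "reg" property with a
	 * fixed base address and size.
	 */
	static bool __init node_is_statically_placed(unsigned long node)
	{
		return of_get_flat_dt_prop(node, "reg", NULL) != NULL;
	}

	/* Dynamically placed: no "reg", but a "size" (usually together
	 * with "alloc-ranges" and/or "alignment"); the kernel picks the
	 * base address at boot time.
	 */
	static bool __init node_is_dynamically_placed(unsigned long node)
	{
		return !of_get_flat_dt_prop(node, "reg", NULL) &&
		       of_get_flat_dt_prop(node, "size", NULL);
	}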

The dynamically-placed reserved memory regions get assigned a start
address only at runtime, and this needs to happen before the page
tables are set up so that memblock_reserve() and memblock_mark_nomap()
can be called on the allocated region as explained above.
Since the dynamically allocated reserved_mem array only becomes
available after the page tables have been set up, the information for
the dynamically-placed reserved memory regions needs to be stored
somewhere temporarily until the reserved_mem array is available.

Therefore, this series makes use of a temporary static array to store
the information of the dynamically-placed reserved memory regions until
the reserved_mem array is allocated.
Once the reserved_mem array is available, the information is copied over
from the temporary array into the reserved_mem array, and the memory for
the temporary array is freed back to the system.
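A minimal sketch of that hand-over (the helper name below is made up;
alloc_reserved_mem_array() in patch 2/4 is the actual implementation):

	#include <linux/init.h>
	#include <linux/of_reserved_mem.h>
	#include <linux/string.h>

	#define MAX_RESERVED_REGIONS	64

	/* temporary array, discarded with the rest of __initdata */
	static struct reserved_mem reserved_mem_array[MAX_RESERVED_REGIONS] __initdata;
	static struct reserved_mem *reserved_mem __refdata = reserved_mem_array;
	static int reserved_mem_count;

	static void __init switch_to_dynamic_array(struct reserved_mem *new_array)
	{
		/* carry over the dynamically-placed regions recorded so far */
		memcpy(new_array, reserved_mem,
		       reserved_mem_count * sizeof(*new_array));
		reserved_mem = new_array;
	}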

The information for the statically-placed reserved memory regions does
not need to be stored in a temporary array because their start
addresses are already available in the devicetree.
Hence, the only thing that needs to be done for these regions before the
page tables are set up is to call memblock_reserve() and
memblock_mark_nomap().
Once the reserved_mem array is allocated, the information for the
statically-placed reserved memory regions is added to the array.

Note:
Because of the use of a temporary array to store the information of the
dynamically-placed reserved memory regions, there is still a limit of
64 for this particular kind of reserved memory region.
From my observation, these regions are typically few in number, so I do
not expect this to be an issue for now.

Dependency:
This series is dependent on the below patchset for proper behavior on
the sh architecture. The patch is currently being reviewed by the
relevant architecture maintainer and will hopefully be merged soon.
https://lore.kernel.org/all/[email protected]/

Patch Versions:
v6:
- Rebased patchset on top of v6.10-rc1.
- Addressed comments received in v5 such as:
1. Switched to using relevant typed functions such as
of_property_read_u32(), of_property_present(), etc.
2. Switched to using of_address_to_resource() to read the "reg"
property of nodes.
3. Renamed functions using "of_*" naming scheme instead of "dt_*".

v5:
https://lore.kernel.org/all/[email protected]/
- Rebased changes on top of v6.9-rc1.
- Addressed minor code comments from v4.

v4:
https://lore.kernel.org/all/[email protected]/
- Move fdt_init_reserved_mem() back into the unflatten_device_tree()
function.
- Fix warnings found by Kernel test robot:
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/

v3:
https://lore.kernel.org/all/[email protected]/
- Make use of __initdata to delete the temporary static array after
dynamically allocating memory for reserved_mem array using memblock.
- Move call to fdt_init_reserved_mem() out of the
unflatten_device_tree() function and into architecture specific setup
code.
- Break up the changes for the individual architectures into separate
patches.

v2:
https://lore.kernel.org/all/[email protected]/
- Extend changes to all other relevant architectures by moving
fdt_init_reserved_mem() into the unflatten_device_tree() function.
- Add code to use unflatten devicetree APIs to process the reserved
memory regions.

v1:
https://lore.kernel.org/all/[email protected]/

References:
[1]
https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/reserved-memory.yaml#L79

Oreoluwa Babatunde (4):
of: reserved_mem: Restructure how the reserved memory regions are
processed
of: reserved_mem: Add code to dynamically allocate reserved_mem array
of: reserved_mem: Use unflatten_devicetree APIs to scan reserved
memory nodes
of: reserved_mem: Rename fdt_* functions to reflect the change from
using fdt APIs

drivers/of/fdt.c | 5 +-
drivers/of/of_private.h | 3 +-
drivers/of/of_reserved_mem.c | 247 ++++++++++++++++++++++++--------
include/linux/of_reserved_mem.h | 2 +-
kernel/dma/coherent.c | 10 +-
kernel/dma/contiguous.c | 8 +-
kernel/dma/swiotlb.c | 10 +-
7 files changed, 208 insertions(+), 77 deletions(-)

--
2.34.1



2024-05-28 22:37:41

by Oreoluwa Babatunde

Subject: [PATCH v6 2/4] of: reserved_mem: Add code to dynamically allocate reserved_mem array

The reserved_mem array is statically allocated with a size of
MAX_RESERVED_REGIONS(64). Therefore, if the number of reserved_mem
regions exceeds this size, there will not be enough space to store
all the data.

Hence, extend the use of the static array by introducing a
dynamically allocated array based on the number of reserved memory
regions specified in the DT.

On architectures such as arm64, memblock allocated memory is not
writable until after the page tables have been set up. Hence, the
dynamic allocation of the reserved_mem array will need to be done only
after the page tables have been set up.

As a result, a temporary static array is still needed in the initial
stages to store the information of the dynamically-placed reserved
memory regions because the start address is selected only at run-time
and is not stored anywhere else.
It is not possible to wait until the reserved_mem array is allocated
because this is done after the page tables are set up, and the reserved
memory regions need to be initialized before then.

After the reserved_mem array is allocated, all entries from the static
array are copied over to the new array, and the rest of the information
for the statically-placed reserved memory regions is read in from the
DT and stored in the new array as well.

Once the init process is completed, the temporary static array is
released back to the system because it is no longer needed. This is
achieved by marking it as __initdata.

Signed-off-by: Oreoluwa Babatunde <[email protected]>
---
drivers/of/of_reserved_mem.c | 58 +++++++++++++++++++++++++++++++++---
1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 78ffc2635f11..113d593ea031 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -27,7 +27,9 @@

#include "of_private.h"

-static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS];
+static struct reserved_mem reserved_mem_array[MAX_RESERVED_REGIONS] __initdata;
+static struct reserved_mem *reserved_mem __refdata = reserved_mem_array;
+static int total_reserved_mem_cnt = MAX_RESERVED_REGIONS;
static int reserved_mem_count;

static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
@@ -55,6 +57,45 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
return err;
}

+/*
+ * alloc_reserved_mem_array() - allocate memory for the reserved_mem
+ * array using memblock
+ *
+ * This function is used to allocate memory for the reserved_mem array
+ * according to the total number of reserved memory regions defined in
+ * the DT.
+ * After the new array is allocated, the information stored in the
+ * initial static array is copied over to this new array and the new
+ * array is used from this point on.
+ */
+static void __init alloc_reserved_mem_array(void)
+{
+ struct reserved_mem *new_array;
+ size_t alloc_size, copy_size, memset_size;
+
+ alloc_size = array_size(total_reserved_mem_cnt, sizeof(*new_array));
+ if (alloc_size == SIZE_MAX) {
+ total_reserved_mem_cnt = MAX_RESERVED_REGIONS;
+ pr_err("Failed to allocate memory for reserved_mem array with err: %d", -EOVERFLOW);
+ return;
+ }
+
+ new_array = memblock_alloc(alloc_size, SMP_CACHE_BYTES);
+ if (!new_array) {
+ total_reserved_mem_cnt = MAX_RESERVED_REGIONS;
+ pr_err("Failed to allocate memory for reserved_mem array with err: %d", -ENOMEM);
+ return;
+ }
+
+ copy_size = array_size(reserved_mem_count, sizeof(*new_array));
+ if (copy_size == SIZE_MAX) {
+ memblock_free(new_array, alloc_size);
+ total_reserved_mem_cnt = MAX_RESERVED_REGIONS;
+ pr_err("Failed to allocate memory for reserved_mem array with err: %d", -EOVERFLOW);
+ }
+
+ memset_size = alloc_size - copy_size;
+
+ memcpy(new_array, reserved_mem, copy_size);
+ memset(new_array + reserved_mem_count, 0, memset_size);
+
+ reserved_mem = new_array;
+}
+
/*
* fdt_reserved_mem_save_node() - save fdt node for second pass initialization
*/
@@ -63,7 +104,7 @@ static void __init fdt_reserved_mem_save_node(unsigned long node, const char *un
{
struct reserved_mem *rmem = &reserved_mem[reserved_mem_count];

- if (reserved_mem_count == ARRAY_SIZE(reserved_mem)) {
+ if (reserved_mem_count == total_reserved_mem_cnt) {
pr_err("not enough space for all defined regions.\n");
return;
}
@@ -220,7 +261,7 @@ static int __init __reserved_mem_alloc_size(unsigned long node, const char *unam
int __init fdt_scan_reserved_mem(void)
{
int node, child;
- int dynamic_nodes_cnt = 0;
+ int dynamic_nodes_cnt = 0, count = 0;
int dynamic_nodes[MAX_RESERVED_REGIONS];
const void *fdt = initial_boot_params;

@@ -243,6 +284,8 @@ int __init fdt_scan_reserved_mem(void)
uname = fdt_get_name(fdt, child, NULL);

err = __reserved_mem_reserve_reg(child, uname);
+ if (!err)
+ count++;
/*
* Save the nodes for the dynamically-placed regions
* into an array which will be used for allocation right
@@ -257,11 +300,16 @@ int __init fdt_scan_reserved_mem(void)
}
for (int i = 0; i < dynamic_nodes_cnt; i++) {
const char *uname;
+ int err;

child = dynamic_nodes[i];
uname = fdt_get_name(fdt, child, NULL);
- __reserved_mem_alloc_size(child, uname);
+
+ err = __reserved_mem_alloc_size(child, uname);
+ if (!err)
+ count++;
}
+ total_reserved_mem_cnt = count;
return 0;
}

@@ -494,6 +542,8 @@ void __init fdt_init_reserved_mem(void)
{
int i;

+ alloc_reserved_mem_array();
+
fdt_scan_reserved_mem_reg_nodes();

/* check for overlapping reserved regions */
--
2.34.1


2024-05-28 22:38:09

by Oreoluwa Babatunde

Subject: [PATCH v6 1/4] of: reserved_mem: Restructure how the reserved memory regions are processed

The current implementation processes the reserved memory regions in two
stages which are done with two separate functions within the
early_init_fdt_scan_reserved_mem() function.

Within the two stages of processing, the reserved memory regions are
broken up into two groups which are processed differently:
i) Statically-placed reserved memory regions
i.e. regions defined with a static start address and size using the
"reg" property in the DT.
ii) Dynamically-placed reserved memory regions.
i.e. regions defined by specifying a range of addresses where they can
be placed in memory using the "alloc-ranges" and "size" properties
in the DT.

Stage 1: fdt_scan_reserved_mem()
This stage of the reserved memory processing is used to scan through the
reserved memory nodes defined in the devicetree and do the following on
each of the nodes:

1) If the node represents a statically-placed reserved memory region,
i.e. it is defined using the "reg" property:
- Call memblock_reserve() or memblock_mark_nomap() as needed.
- Add the information for the reserved region to the reserved_mem
array.
eg: fdt_reserved_mem_save_node(node, name, base, size);

2) If the node represents a dynamically-placed reserved memory region,
i.e. it is defined using "alloc-ranges" and "size" properties:
- Add the information for the region to the reserved_mem array with
the starting address and size set to 0.
eg: fdt_reserved_mem_save_node(node, name, 0, 0);

Stage 2: fdt_init_reserved_mem()
This stage of the reserved memory processing is used to iterate through
the reserved_mem array which was populated in stage 1 and do the
following on each of the entries:

1) If the entry represents a statically-placed reserved memory region:
- Call the region specific init function.
2) If the entry represents a dynamically-placed reserved memory region:
- Call __reserved_mem_alloc_size() which is used to allocate memory
for the region using memblock_phys_alloc_range(), and call
memblock_mark_nomap() on the allocated region if the region is
specified as a no-map region.
- Call the region specific init function.

On architectures such as arm64, the dynamic allocation of the
reserved_mem array needs to be done after the page tables have been
set up because memblock allocated memory is not writable until then. This
means that the reserved_mem array will not be available to store any
reserved memory information until after the page tables have been set up.

It is possible to call memblock_reserve() and memblock_mark_nomap() on
the statically-placed reserved memory regions and not need to save them
to the reserved_mem array until later. This is because all the
information we need is present in the devicetree.
Dynamically-placed reserved memory regions on the other hand get
assigned a start address only at runtime, and since memblock_reserve()
and memblock_mark_nomap() need to be called before the memory mappings
are created, the allocation needs to happen before the page tables are
set up.

To make it easier to handle dynamically-placed reserved memory regions
before the page tables are set up, this patch makes changes to the steps
above to process the reserved memory regions in the following ways:

Step 1: fdt_scan_reserved_mem()
This stage of the reserved memory processing is used to scan through the
reserved memory nodes defined in the devicetree and do the following on
each of the nodes:

1) If the node represents a statically-placed reserved memory region,
i.e. it is defined using the "reg" property:
- Call memblock_reserve() or memblock_mark_nomap() as needed.

2) If the node represents a dynamically-placed reserved memory region,
i.e. it is defined using "alloc-ranges" and "size" properties:
- Call __reserved_mem_alloc_size() which will:
i) Allocate memory for the reserved memory region.
ii) Call memblock_mark_nomap() as needed.
Note: There is no need to explicitly call memblock_reserve() here
because it is already called by memblock when the memory for the
region is being allocated.
iii) Save the information for the region in the reserved_mem array.

Step 2: fdt_init_reserved_mem()
This stage of the reserved memory processing is used to:

1) Add the information for the statically-placed reserved memory into
the reserved_mem array.

2) Iterate through all the entries in the array and call the region
specific init function for each of them.

fdt_init_reserved_mem() is also now called from within the
unflatten_device_tree() function so that this step happens after the
page tables have been set up.

Signed-off-by: Oreoluwa Babatunde <[email protected]>
---
drivers/of/fdt.c | 5 +-
drivers/of/of_private.h | 1 +
drivers/of/of_reserved_mem.c | 134 +++++++++++++++++++++++++----------
3 files changed, 100 insertions(+), 40 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index a8a04f27915b..527e6bc1c096 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -532,8 +532,6 @@ void __init early_init_fdt_scan_reserved_mem(void)
break;
memblock_reserve(base, size);
}
-
- fdt_init_reserved_mem();
}

/**
@@ -1259,6 +1257,9 @@ void __init unflatten_device_tree(void)
of_alias_scan(early_init_dt_alloc_memory_arch);

unittest_unflatten_overlay_base();
+
+ /* initialize the reserved memory regions */
+ fdt_init_reserved_mem();
}

/**
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 94fc0aa07af9..d889b8c17ca3 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -9,6 +9,7 @@
*/

#define FDT_ALIGN_SIZE 8
+#define MAX_RESERVED_REGIONS 64

/**
* struct alias_prop - Alias property in 'aliases' node
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 46e1c3fbc769..78ffc2635f11 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -27,7 +27,6 @@

#include "of_private.h"

-#define MAX_RESERVED_REGIONS 64
static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS];
static int reserved_mem_count;

@@ -106,7 +105,6 @@ static int __init __reserved_mem_reserve_reg(unsigned long node,
phys_addr_t base, size;
int len;
const __be32 *prop;
- int first = 1;
bool nomap;

prop = of_get_flat_dt_prop(node, "reg", &len);
@@ -134,10 +132,6 @@ static int __init __reserved_mem_reserve_reg(unsigned long node,
uname, &base, (unsigned long)(size / SZ_1M));

len -= t_len;
- if (first) {
- fdt_reserved_mem_save_node(node, uname, base, size);
- first = 0;
- }
}
return 0;
}
@@ -165,12 +159,69 @@ static int __init __reserved_mem_check_root(unsigned long node)
return 0;
}

+/**
+ * fdt_scan_reserved_mem_reg_nodes() - Store info for the "reg" defined
+ * reserved memory regions.
+ *
+ * This function is used to scan through the DT and store the
+ * information for the reserved memory regions that are defined using
+ * the "reg" property. The region node number, name, base address, and
+ * size are all stored in the reserved_mem array by calling the
+ * fdt_reserved_mem_save_node() function.
+ */
+static void __init fdt_scan_reserved_mem_reg_nodes(void)
+{
+ int t_len = (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32);
+ const void *fdt = initial_boot_params;
+ phys_addr_t base, size;
+ const __be32 *prop;
+ int node, child;
+ int len;
+
+ node = fdt_path_offset(fdt, "/reserved-memory");
+ if (node < 0) {
+ pr_info("Reserved memory: No reserved-memory node in the DT\n");
+ return;
+ }
+
+ if (__reserved_mem_check_root(node)) {
+ pr_err("Reserved memory: unsupported node format, ignoring\n");
+ return;
+ }
+
+ fdt_for_each_subnode(child, fdt, node) {
+ const char *uname;
+
+ prop = of_get_flat_dt_prop(child, "reg", &len);
+ if (!prop)
+ continue;
+ if (!of_fdt_device_is_available(fdt, child))
+ continue;
+
+ uname = fdt_get_name(fdt, child, NULL);
+ if (len && len % t_len != 0) {
+ pr_err("Reserved memory: invalid reg property in '%s', skipping node.\n",
+ uname);
+ continue;
+ }
+ base = dt_mem_next_cell(dt_root_addr_cells, &prop);
+ size = dt_mem_next_cell(dt_root_size_cells, &prop);
+
+ if (size)
+ fdt_reserved_mem_save_node(child, uname, base, size);
+ }
+}
+
+static int __init __reserved_mem_alloc_size(unsigned long node, const char *uname);
+
/*
* fdt_scan_reserved_mem() - scan a single FDT node for reserved memory
*/
int __init fdt_scan_reserved_mem(void)
{
int node, child;
+ int dynamic_nodes_cnt = 0;
+ int dynamic_nodes[MAX_RESERVED_REGIONS];
const void *fdt = initial_boot_params;

node = fdt_path_offset(fdt, "/reserved-memory");
@@ -192,8 +243,24 @@ int __init fdt_scan_reserved_mem(void)
uname = fdt_get_name(fdt, child, NULL);

err = __reserved_mem_reserve_reg(child, uname);
- if (err == -ENOENT && of_get_flat_dt_prop(child, "size", NULL))
- fdt_reserved_mem_save_node(child, uname, 0, 0);
+ /*
+ * Save the nodes for the dynamically-placed regions
+ * into an array which will be used for allocation right
+ * after all the statically-placed regions are reserved
+ * or marked as no-map. This is done to avoid dynamically
+ * allocating from one of the statically-placed regions.
+ */
+ if (err == -ENOENT && of_get_flat_dt_prop(child, "size", NULL)) {
+ dynamic_nodes[dynamic_nodes_cnt] = child;
+ dynamic_nodes_cnt++;
+ }
+ }
+ for (int i = 0; i < dynamic_nodes_cnt; i++) {
+ const char *uname;
+
+ child = dynamic_nodes[i];
+ uname = fdt_get_name(fdt, child, NULL);
+ __reserved_mem_alloc_size(child, uname);
}
return 0;
}
@@ -253,8 +320,7 @@ static int __init __reserved_mem_alloc_in_range(phys_addr_t size,
* __reserved_mem_alloc_size() - allocate reserved memory described by
* 'size', 'alignment' and 'alloc-ranges' properties.
*/
-static int __init __reserved_mem_alloc_size(unsigned long node,
- const char *uname, phys_addr_t *res_base, phys_addr_t *res_size)
+static int __init __reserved_mem_alloc_size(unsigned long node, const char *uname)
{
int t_len = (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32);
phys_addr_t start = 0, end = 0;
@@ -333,10 +399,7 @@ static int __init __reserved_mem_alloc_size(unsigned long node,
uname, (unsigned long)(size / SZ_1M));
return -ENOMEM;
}
-
- *res_base = base;
- *res_size = size;
-
+ fdt_reserved_mem_save_node(node, uname, base, size);
return 0;
}

@@ -431,6 +494,8 @@ void __init fdt_init_reserved_mem(void)
{
int i;

+ fdt_scan_reserved_mem_reg_nodes();
+
/* check for overlapping reserved regions */
__rmem_check_for_overlap();

@@ -442,30 +507,23 @@ void __init fdt_init_reserved_mem(void)

nomap = of_get_flat_dt_prop(node, "no-map", NULL) != NULL;

- if (rmem->size == 0)
- err = __reserved_mem_alloc_size(node, rmem->name,
- &rmem->base, &rmem->size);
- if (err == 0) {
- err = __reserved_mem_init_node(rmem);
- if (err != 0 && err != -ENOENT) {
- pr_info("node %s compatible matching fail\n",
- rmem->name);
- if (nomap)
- memblock_clear_nomap(rmem->base, rmem->size);
- else
- memblock_phys_free(rmem->base,
- rmem->size);
- } else {
- phys_addr_t end = rmem->base + rmem->size - 1;
- bool reusable =
- (of_get_flat_dt_prop(node, "reusable", NULL)) != NULL;
-
- pr_info("%pa..%pa (%lu KiB) %s %s %s\n",
- &rmem->base, &end, (unsigned long)(rmem->size / SZ_1K),
- nomap ? "nomap" : "map",
- reusable ? "reusable" : "non-reusable",
- rmem->name ? rmem->name : "unknown");
- }
+ err = __reserved_mem_init_node(rmem);
+ if (err != 0 && err != -ENOENT) {
+ pr_info("node %s compatible matching fail\n", rmem->name);
+ if (nomap)
+ memblock_clear_nomap(rmem->base, rmem->size);
+ else
+ memblock_phys_free(rmem->base, rmem->size);
+ } else {
+ phys_addr_t end = rmem->base + rmem->size - 1;
+ bool reusable =
+ (of_get_flat_dt_prop(node, "reusable", NULL)) != NULL;
+
+ pr_info("%pa..%pa (%lu KiB) %s %s %s\n",
+ &rmem->base, &end, (unsigned long)(rmem->size / SZ_1K),
+ nomap ? "nomap" : "map",
+ reusable ? "reusable" : "non-reusable",
+ rmem->name ? rmem->name : "unknown");
}
}
}
--
2.34.1


2024-05-28 22:39:16

by Oreoluwa Babatunde

Subject: [PATCH v6 3/4] of: reserved_mem: Use unflatten_devicetree APIs to scan reserved memory nodes

The unflatten_devicetree APIs have been set up and are available to be
used by the time the fdt_init_reserved_mem() function is called.
Since the unflatten_devicetree APIs are a more efficient way of scanning
through the DT nodes, switch to using these APIs to facilitate the rest
of the reserved memory processing.

Signed-off-by: Oreoluwa Babatunde <[email protected]>
---
drivers/of/of_reserved_mem.c | 93 ++++++++++++++++++++-------------
include/linux/of_reserved_mem.h | 2 +-
kernel/dma/coherent.c | 10 ++--
kernel/dma/contiguous.c | 8 +--
kernel/dma/swiotlb.c | 10 ++--
5 files changed, 72 insertions(+), 51 deletions(-)

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 113d593ea031..05283cd24c3b 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -14,6 +14,7 @@
#include <linux/err.h>
#include <linux/libfdt.h>
#include <linux/of.h>
+#include <linux/of_address.h>
#include <linux/of_fdt.h>
#include <linux/of_platform.h>
#include <linux/mm.h>
@@ -99,7 +100,7 @@ static void __init alloc_reserved_mem_array(void)
/*
* fdt_reserved_mem_save_node() - save fdt node for second pass initialization
*/
-static void __init fdt_reserved_mem_save_node(unsigned long node, const char *uname,
+static void __init fdt_reserved_mem_save_node(struct device_node *node, const char *uname,
phys_addr_t base, phys_addr_t size)
{
struct reserved_mem *rmem = &reserved_mem[reserved_mem_count];
@@ -109,7 +110,7 @@ static void __init fdt_reserved_mem_save_node(unsigned long node, const char *un
return;
}

- rmem->fdt_node = node;
+ rmem->dev_node = node;
rmem->name = uname;
rmem->base = base;
rmem->size = size;
@@ -178,11 +179,11 @@ static int __init __reserved_mem_reserve_reg(unsigned long node,
}

/*
- * __reserved_mem_check_root() - check if #size-cells, #address-cells provided
+ * __fdt_reserved_mem_check_root() - check if #size-cells, #address-cells provided
* in /reserved-memory matches the values supported by the current implementation,
* also check if ranges property has been provided
*/
-static int __init __reserved_mem_check_root(unsigned long node)
+static int __init __fdt_reserved_mem_check_root(unsigned long node)
{
const __be32 *prop;

@@ -200,6 +201,35 @@ static int __init __reserved_mem_check_root(unsigned long node)
return 0;
}

+/*
+ * of_reserved_mem_check_root() - check if #size-cells, #address-cells provided
+ * in /reserved-memory matches the values supported by the current implementation,
+ * also check if ranges property has been provided
+ */
+static int __init of_reserved_mem_check_root(struct device_node *node)
+{
+ u32 prop;
+ int ret;
+
+ ret = of_property_read_u32(node, "#size-cells", &prop);
+ if (ret)
+ return ret;
+
+ if (prop != dt_root_size_cells)
+ return -EINVAL;
+
+ ret = of_property_read_u32(node, "#address-cells", &prop);
+ if (ret)
+ return ret;
+
+ if (prop != dt_root_addr_cells)
+ return -EINVAL;
+
+ if (!of_property_present(node, "ranges"))
+ return -EINVAL;
+ return 0;
+}
+
/**
* fdt_scan_reserved_mem_reg_nodes() - Store info for the "reg" defined
* reserved memory regions.
@@ -212,41 +242,40 @@ static int __init __reserved_mem_check_root(unsigned long node)
*/
static void __init fdt_scan_reserved_mem_reg_nodes(void)
{
- int t_len = (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32);
- const void *fdt = initial_boot_params;
+ struct device_node *node, *child;
phys_addr_t base, size;
- const __be32 *prop;
- int node, child;
- int len;

- node = fdt_path_offset(fdt, "/reserved-memory");
- if (node < 0) {
+ node = of_find_node_by_path("/reserved-memory");
+ if (!node) {
pr_info("Reserved memory: No reserved-memory node in the DT\n");
return;
}

- if (__reserved_mem_check_root(node)) {
+ if (of_reserved_mem_check_root(node)) {
pr_err("Reserved memory: unsupported node format, ignoring\n");
return;
}

- fdt_for_each_subnode(child, fdt, node) {
+ for_each_child_of_node(node, child) {
+ int ret = 0;
const char *uname;
+ struct resource res;
+ struct reserved_mem *rmem;

- prop = of_get_flat_dt_prop(child, "reg", &len);
- if (!prop)
- continue;
- if (!of_fdt_device_is_available(fdt, child))
+ if (!of_device_is_available(child))
continue;

- uname = fdt_get_name(fdt, child, NULL);
- if (len && len % t_len != 0) {
- pr_err("Reserved memory: invalid reg property in '%s', skipping node.\n",
- uname);
+ ret = of_address_to_resource(child, 0, &res);
+ if (ret) {
+ rmem = of_reserved_mem_lookup(child);
+ if (rmem)
+ rmem->dev_node = child;
continue;
}
- base = dt_mem_next_cell(dt_root_addr_cells, &prop);
- size = dt_mem_next_cell(dt_root_size_cells, &prop);
+ uname = of_node_full_name(child);
+
+ base = res.start;
+ size = res.end - res.start + 1;

if (size)
fdt_reserved_mem_save_node(child, uname, base, size);
@@ -269,7 +298,7 @@ int __init fdt_scan_reserved_mem(void)
if (node < 0)
return -ENODEV;

- if (__reserved_mem_check_root(node) != 0) {
+ if (__fdt_reserved_mem_check_root(node) != 0) {
pr_err("Reserved memory: unsupported node format, ignoring\n");
return -EINVAL;
}
@@ -447,7 +476,7 @@ static int __init __reserved_mem_alloc_size(unsigned long node, const char *unam
uname, (unsigned long)(size / SZ_1M));
return -ENOMEM;
}
- fdt_reserved_mem_save_node(node, uname, base, size);
+ fdt_reserved_mem_save_node(NULL, uname, base, size);
return 0;
}

@@ -467,7 +496,7 @@ static int __init __reserved_mem_init_node(struct reserved_mem *rmem)
reservedmem_of_init_fn initfn = i->data;
const char *compat = i->compatible;

- if (!of_flat_dt_is_compatible(rmem->fdt_node, compat))
+ if (!of_device_is_compatible(rmem->dev_node, compat))
continue;

ret = initfn(rmem);
@@ -500,11 +529,6 @@ static int __init __rmem_cmp(const void *a, const void *b)
if (ra->size > rb->size)
return 1;

- if (ra->fdt_node < rb->fdt_node)
- return -1;
- if (ra->fdt_node > rb->fdt_node)
- return 1;
-
return 0;
}

@@ -551,11 +575,11 @@ void __init fdt_init_reserved_mem(void)

for (i = 0; i < reserved_mem_count; i++) {
struct reserved_mem *rmem = &reserved_mem[i];
- unsigned long node = rmem->fdt_node;
+ struct device_node *node = rmem->dev_node;
int err = 0;
bool nomap;

- nomap = of_get_flat_dt_prop(node, "no-map", NULL) != NULL;
+ nomap = of_property_present(node, "no-map");

err = __reserved_mem_init_node(rmem);
if (err != 0 && err != -ENOENT) {
@@ -566,8 +590,7 @@ void __init fdt_init_reserved_mem(void)
memblock_phys_free(rmem->base, rmem->size);
} else {
phys_addr_t end = rmem->base + rmem->size - 1;
- bool reusable =
- (of_get_flat_dt_prop(node, "reusable", NULL)) != NULL;
+ bool reusable = of_property_present(node, "reusable");

pr_info("%pa..%pa (%lu KiB) %s %s %s\n",
&rmem->base, &end, (unsigned long)(rmem->size / SZ_1K),
diff --git a/include/linux/of_reserved_mem.h b/include/linux/of_reserved_mem.h
index e338282da652..769b8f67c8d3 100644
--- a/include/linux/of_reserved_mem.h
+++ b/include/linux/of_reserved_mem.h
@@ -10,7 +10,7 @@ struct reserved_mem_ops;

struct reserved_mem {
const char *name;
- unsigned long fdt_node;
+ struct device_node *dev_node;
const struct reserved_mem_ops *ops;
phys_addr_t base;
phys_addr_t size;
diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index ff5683a57f77..8f99586204fb 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -362,20 +362,18 @@ static const struct reserved_mem_ops rmem_dma_ops = {

static int __init rmem_dma_setup(struct reserved_mem *rmem)
{
- unsigned long node = rmem->fdt_node;
+ struct device_node *node = rmem->dev_node;

- if (of_get_flat_dt_prop(node, "reusable", NULL))
+ if (of_property_present(node, "reusable"))
return -EINVAL;

-#ifdef CONFIG_ARM
- if (!of_get_flat_dt_prop(node, "no-map", NULL)) {
+ if (IS_ENABLED(CONFIG_ARM) && !of_property_present(node, "no-map")) {
pr_err("Reserved memory: regions without no-map are not yet supported\n");
return -EINVAL;
}
-#endif

#ifdef CONFIG_DMA_GLOBAL_POOL
- if (of_get_flat_dt_prop(node, "linux,dma-default", NULL)) {
+ if (of_property_present(node, "linux,dma-default")) {
WARN(dma_reserved_default_memory,
"Reserved memory: region for default DMA coherent area is redefined\n");
dma_reserved_default_memory = rmem;
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 055da410ac71..450e9e4be79c 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -456,8 +456,8 @@ static const struct reserved_mem_ops rmem_cma_ops = {

static int __init rmem_cma_setup(struct reserved_mem *rmem)
{
- unsigned long node = rmem->fdt_node;
- bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
+ struct device_node *node = rmem->dev_node;
+ bool default_cma = of_property_read_bool(node, "linux,cma-default");
struct cma *cma;
int err;

@@ -467,8 +467,8 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
return -EBUSY;
}

- if (!of_get_flat_dt_prop(node, "reusable", NULL) ||
- of_get_flat_dt_prop(node, "no-map", NULL))
+ if (!of_property_present(node, "reusable") ||
+ of_property_present(node, "no-map"))
return -EINVAL;

if (!IS_ALIGNED(rmem->base | rmem->size, CMA_MIN_ALIGNMENT_BYTES)) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fe1ccb53596f..9949ddc14272 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -1859,12 +1859,12 @@ static const struct reserved_mem_ops rmem_swiotlb_ops = {

static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
{
- unsigned long node = rmem->fdt_node;
+ struct device_node *node = rmem->dev_node;

- if (of_get_flat_dt_prop(node, "reusable", NULL) ||
- of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
- of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
- of_get_flat_dt_prop(node, "no-map", NULL))
+ if (of_property_present(node, "reusable") ||
+ of_property_present(node, "linux,cma-default") ||
+ of_property_present(node, "linux,dma-default") ||
+ of_property_present(node, "no-map"))
return -EINVAL;

rmem->ops = &rmem_swiotlb_ops;
--
2.34.1


2024-06-06 15:48:29

by Rob Herring (Arm)

Subject: Re: [PATCH v6 0/4] Dynamic Allocation of the reserved_mem array

On Tue, May 28, 2024 at 4:37 PM Oreoluwa Babatunde
<[email protected]> wrote:
>
> The reserved_mem array is used to store data for the different
> reserved memory regions defined in the DT of a device. The array
> stores information such as region name, node reference, start-address,
> and size of the different reserved memory regions.
>
> The array is currently statically allocated with a size of
> MAX_RESERVED_REGIONS(64). This means that any system that specifies a
> number of reserved memory regions greater than MAX_RESERVED_REGIONS(64)
> will not have enough space to store the information for all the regions.
>
> This can be fixed by making the reserved_mem array a dynamically sized
> array which is allocated using memblock_alloc() based on the exact
> number of reserved memory regions defined in the DT.
>
> On architectures such as arm64, memblock allocated memory is not
> writable until after the page tables have been setup.
> This is an issue because the current implementation initializes the
> reserved memory regions and stores their information in the array before
> the page tables are setup. Hence, dynamically allocating the
> reserved_mem array and attempting to write information to it at this
> point will fail.
>
> Therefore, the allocation of the reserved_mem array will need to be done
> after the page tables have been setup, which means that the reserved
> memory regions will also need to wait until after the page tables have
> been setup to be stored in the array.
>
> When processing the reserved memory regions defined in the DT, these
> regions are marked as reserved by calling memblock_reserve(base, size).
> Where: base = base address of the reserved region.
> size = the size of the reserved memory region.
>
> Depending on if that region is defined using the "no-map" property,
> memblock_mark_nomap(base, size) is also called.
>
> The "no-map" property is used to indicate to the operating system that a
> mapping of the specified region must NOT be created. This also means
> that no access (including speculative accesses) is allowed on this
> region of memory except when it is coming from the device driver that
> this region of memory is being reserved for.[1]
>
> Therefore, it is important to call memblock_reserve() and
> memblock_mark_nomap() on all the reserved memory regions before the
> system sets up the page tables so that the system does not unknowingly
> include any of the no-map reserved memory regions in the memory map.
>
> There are two ways to define how/where a reserved memory region is
> placed in memory:
> i) Statically-placed reserved memory regions
> i.e. regions defined with a set start address and size using the
> "reg" property in the DT.
> ii) Dynamically-placed reserved memory regions.
> i.e. regions defined by specifying a range of addresses where they can
> be placed in memory using the "alloc_ranges" and "size" properties
> in the DT.
>
> The dynamically-placed reserved memory regions get assigned a start
> address only at runtime. And this needs to be done before the page
> tables are setup so that memblock_reserve() and memblock_mark_nomap()
> can be called on the allocated region as explained above.
> Since the dynamically allocated reserved_mem array can only available
> after the page tables have been setup, the information for the
> dynamically-placed reserved memory regions needs to be stored somewhere
> temporarily until the reserved_mem array is available.
>
> Therefore, this series makes use of a temporary static array to store
> the information of the dynamically-placed reserved memory regions until
> the reserved_mem array is allocated.
> Once the reserved_mem array is available, the information is copied over
> from the temporary array into the reserved_mem array, and the memory for
> the temporary array is freed back to the system.
>
> The information for the statically-placed reserved memory regions does
> not need to be stored in a temporary array because their starting
> address is already stored in the devicetree.
> Hence, the only thing that needs to be done for these regions before the
> page tables are setup is to call memblock_reserve() and
> memblock_mark_nomap().
> Once the reserved_mem array is allocated, the information for the
> statically-placed reserved memory regions is added to the array.
>
> Note:
> Because of the use of a temporary array to store the information of the
> dynamically-placed reserved memory regions, there still exists a
> limitation of 64 for this particular kind of reserved memory regions.
> From my observation, these regions are typically small in number and
> hence I expect this to not be an issue for now.
>
> Dependency:
> This series is dependent on the below patchset for proper behavior on
> the sh architecture. The patch is currently being reviewed by the
> relevant architecture maintainer and will hopefully be merged soon.
> https://lore.kernel.org/all/[email protected]/

Is it really dependent? This series just adds one additional
allocation before all the reservations are done.

I've applied this series and ran it thru kernelCI. I'm going to add it
to next to get some more testing.

I made one change dropping of_reserved_mem_check_root(). That's still
done with the FDT function, so no need to do it twice. Plus, the
unflattened function doesn't have the restriction. (We could fix the
FDT version pretty easily because we have a translation function.)

Rob

2024-06-10 21:35:01

by Nathan Chancellor

Subject: Re: [PATCH v6 1/4] of: reserved_mem: Restructure how the reserved memory regions are processed

Hi Oreoluwa,

On Tue, May 28, 2024 at 03:36:47PM -0700, Oreoluwa Babatunde wrote:
> The current implementation processes the reserved memory regions in two
> stages which are done with two separate functions within the
> early_init_fdt_scan_reserved_mem() function.
>
> Within the two stages of processing, the reserved memory regions are
> broken up into two groups which are processed differently:
> i) Statically-placed reserved memory regions
> i.e. regions defined with a static start address and size using the
> "reg" property in the DT.
> ii) Dynamically-placed reserved memory regions.
> i.e. regions defined by specifying a range of addresses where they can
> be placed in memory using the "alloc_ranges" and "size" properties
> in the DT.
>
> Stage 1: fdt_scan_reserved_mem()
> This stage of the reserved memory processing is used to scan through the
> reserved memory nodes defined in the devicetree and do the following on
> each of the nodes:
>
> 1) If the node represents a statically-placed reserved memory region,
> i.e. it is defined using the "reg" property:
> - Call memblock_reserve() or memblock_mark_nomap() as needed.
> - Add the information for the reserved region to the reserved_mem
> array.
> eg: fdt_reserved_mem_save_node(node, name, base, size);
>
> 2) If the node represents a dynamically-placed reserved memory region,
> i.e. it is defined using "alloc-ranges" and "size" properties:
> - Add the information for the region to the reserved_mem array with
> the starting address and size set to 0.
> eg: fdt_reserved_mem_save_node(node, name, 0, 0);
>
> Stage 2: fdt_init_reserved_mem()
> This stage of the reserved memory processing is used to iterate through
> the reserved_mem array which was populated in stage 1 and do the
> following on each of the entries:
>
> 1) If the entry represents a statically-placed reserved memory region:
> - Call the region specific init function.
> 2) If the entry represents a dynamically-placed reserved memory region:
> - Call __reserved_mem_alloc_size() which is used to allocate memory
> for the region using memblock_phys_alloc_range(), and call
> memblock_mark_nomap() on the allocated region if the region is
> specified as a no-map region.
> - Call the region specific init function.
>
> On architectures such as arm64, the dynamic allocation of the
> reserved_mem array needs to be done after the page tables have been
> setup because memblock allocated memory is not writable until then. This
> means that the reserved_mem array will not be available to store any
> reserved memory information until after the page tables have been setup.
>
> It is possible to call memblock_reserve() and memblock_mark_nomap() on
> the statically-placed reserved memory regions and not need to save them
> to the reserved_mem array until later. This is because all the
> information we need is present in the devicetree.
> Dynamically-placed reserved memory regions on the other hand get
> assigned a start address only at runtime, and since memblock_reserve()
> and memblock_mark_nomap() need to be called before the memory mappings
> are created, the allocation needs to happen before the page tables are
> setup.
>
> To make it easier to handle dynamically-placed reserved memory regions
> before the page tables are setup, this patch makes changes to the steps
> above to process the reserved memory regions in the following ways:
>
> Step 1: fdt_scan_reserved_mem()
> This stage of the reserved memory processing is used to scan through the
> reserved memory nodes defined in the devicetree and do the following on
> each of the nodes:
>
> 1) If the node represents a statically-placed reserved memory region,
> i.e. it is defined using the "reg" property:
> - Call memblock_reserve() or memblock_mark_nomap() as needed.
>
> 2) If the node represents a dynamically-placed reserved memory region,
> i.e. it is defined using "alloc-ranges" and "size" properties:
> - Call __reserved_mem_alloc_size() which will:
> i) Allocate memory for the reserved memory region.
> ii) Call memblock_mark_nomap() as needed.
> Note: There is no need to explicitly call memblock_reserve() here
> because it is already called by memblock when the memory for the
> region is being allocated.
> iii) Save the information for the region in the reserved_mem array.
>
> Step 2: fdt_init_reserved_mem()
> This stage of the reserved memory processing is used to:
>
> 1) Add the information for the statically-placed reserved memory into
> the reserved_mem array.
>
> 2) Iterate through all the entries in the array and call the region
> specific init function for each of them.
>
> fdt_init_reserved_mem() is also now called from within the
> unflatten_device_tree() function so that this step happens after the
> page tables have been setup.
>
> Signed-off-by: Oreoluwa Babatunde <[email protected]>

I am seeing a warning when booting aspeed_g5_defconfig in QEMU that I
bisected to this change in -next as commit a46cccb0ee2d ("of:
reserved_mem: Restruture how the reserved memory regions are
processed").

$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- aspeed_g5_defconfig zImage aspeed/aspeed-bmc-opp-romulus.dtb

$ qemu-system-arm \
-display none \
-nodefaults \
-no-reboot \
-machine romulus-bmc \
-dtb arch/arm/boot/dts/aspeed/aspeed-bmc-opp-romulus.dtb \
-kernel arch/arm/boot/zImage \
-initrd rootfs.cpio \
-m 512m \
-serial mon:stdio
...
[ 0.000000] Linux version 6.10.0-rc1-00007-ga46cccb0ee2d (nathan@thelio-3990X) (arm-linux-gnueabi-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP Mon Jun 10 14:20:06 MST 2024
...
[ 0.623194] [drm] Initialized aspeed-gfx-drm 1.0.0 20180319 for 1e6e6000.display on minor 0
[ 0.636348] ------------[ cut here ]------------
[ 0.636613] WARNING: CPU: 0 PID: 1 at mm/memory.c:2789 __apply_to_page_range+0x360/0x380
[ 0.637601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc1-00007-ga46cccb0ee2d #1
[ 0.638001] Hardware name: Generic DT based system
[ 0.638298] Call trace:
[ 0.638743] unwind_backtrace from show_stack+0x18/0x1c
[ 0.639163] show_stack from dump_stack_lvl+0x54/0x68
[ 0.639374] dump_stack_lvl from __warn+0x74/0x114
[ 0.639590] __warn from warn_slowpath_fmt+0x13c/0x1c0
[ 0.639855] warn_slowpath_fmt from __apply_to_page_range+0x360/0x380
[ 0.640110] __apply_to_page_range from apply_to_page_range+0x24/0x2c
[ 0.640356] apply_to_page_range from __alloc_from_contiguous+0xc4/0x158
[ 0.640619] __alloc_from_contiguous from cma_allocator_alloc+0x3c/0x44
[ 0.640870] cma_allocator_alloc from arch_dma_alloc+0x128/0x2b4
[ 0.641100] arch_dma_alloc from dma_alloc_attrs+0x90/0x150
[ 0.641318] dma_alloc_attrs from drm_gem_dma_create+0xa4/0x13c
[ 0.641553] drm_gem_dma_create from drm_gem_dma_create_with_handle+0x24/0xac
[ 0.641833] drm_gem_dma_create_with_handle from drm_gem_dma_dumb_create+0x44/0x50
[ 0.642122] drm_gem_dma_dumb_create from drm_client_framebuffer_create+0x9c/0x164
[ 0.642410] drm_client_framebuffer_create from drm_fbdev_dma_helper_fb_probe+0x84/0x23c
[ 0.642724] drm_fbdev_dma_helper_fb_probe from __drm_fb_helper_initial_config_and_unlock+0x2e4/0x4f8
[ 0.643076] __drm_fb_helper_initial_config_and_unlock from drm_fbdev_dma_client_hotplug+0x74/0xb8
[ 0.643416] drm_fbdev_dma_client_hotplug from drm_client_register+0x5c/0x98
[ 0.643696] drm_client_register from aspeed_gfx_probe+0x278/0x3c0
[ 0.643935] aspeed_gfx_probe from platform_probe+0x60/0xb8
[ 0.644192] platform_probe from really_probe+0xd4/0x3b4
[ 0.644434] really_probe from __driver_probe_device+0x90/0x1dc
[ 0.644697] __driver_probe_device from driver_probe_device+0x38/0xd0
[ 0.644978] driver_probe_device from __driver_attach+0x118/0x1dc
[ 0.645246] __driver_attach from bus_for_each_dev+0x84/0xd4
[ 0.645498] bus_for_each_dev from bus_add_driver+0xec/0x1f0
[ 0.645755] bus_add_driver from driver_register+0x84/0x11c
[ 0.645999] driver_register from do_one_initcall+0x84/0x1c8
[ 0.646351] do_one_initcall from kernel_init_freeable+0x1a4/0x230
[ 0.646624] kernel_init_freeable from kernel_init+0x1c/0x138
[ 0.646885] kernel_init from ret_from_fork+0x14/0x28
[ 0.647154] Exception stack(0x9f015fb0 to 0x9f015ff8)
[ 0.647535] 5fa0: 00000000 00000000 00000000 00000000
[ 0.647895] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.648237] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 0.648730] ---[ end trace 0000000000000000 ]---
[ 0.653499] aspeed_gfx 1e6e6000.display: [drm] fb0: aspeed-gfx-drmd frame buffer device
...

The rootfs is from [1] if you should need it (arm-rootfs.cpio.zst,
decompress with zstd first). I am more than happy to provide any
additional information or test patches as necessary.

[1]: https://github.com/ClangBuiltLinux/boot-utils/releases

Cheers,
Nathan

# bad: [d35b2284e966c0bef3e2182a5c5ea02177dd32e4] Add linux-next specific files for 20240607
# good: [8a92980606e3585d72d510a03b59906e96755b8a] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect start 'd35b2284e966c0bef3e2182a5c5ea02177dd32e4' '8a92980606e3585d72d510a03b59906e96755b8a'
# good: [faef37a085e57f29479f853624948cdc7df6e366] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect good faef37a085e57f29479f853624948cdc7df6e366
# good: [822946749b5f02d9f49dde4b4cb8f2535c247ce5] Merge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel
git bisect good 822946749b5f02d9f49dde4b4cb8f2535c247ce5
# bad: [9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad 9ffebdc86bc23fdf0622eb0c38d395c2b99b7f32
# bad: [9a85e49be89a9150fd2ec9964f48013a00c261d1] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect bad 9a85e49be89a9150fd2ec9964f48013a00c261d1
# good: [bd8c07372f9f6b8e496b7f1f0dcc287dd81c0706] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect good bd8c07372f9f6b8e496b7f1f0dcc287dd81c0706
# good: [a93b9dea60d7c46dca8de8d5180f3ceb3696ab8a] Merge branch 'apparmor-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
git bisect good a93b9dea60d7c46dca8de8d5180f3ceb3696ab8a
# good: [c1b93986dfb2a31b0528fe929d574843801089f5] spi: pxa2xx: Remove hard coded number of chip select pins
git bisect good c1b93986dfb2a31b0528fe929d574843801089f5
# good: [76a621455f7be3d9c963c8a1a27b234d27d14f16] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
git bisect good 76a621455f7be3d9c963c8a1a27b234d27d14f16
# bad: [16227bf21dff0905bfa303f8cbd9a2f7ca51f883] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
git bisect bad 16227bf21dff0905bfa303f8cbd9a2f7ca51f883
# good: [77b023ba4f45ce2bf26edb095b4a81829c9e621b] media: dt-bindings: renesas,rzg2l-csi2: Document Renesas RZ/G2UL CSI-2 block
git bisect good 77b023ba4f45ce2bf26edb095b4a81829c9e621b
# bad: [b8ffe4a11001660522ccc5b2cc892cf8aa03a62e] of: reserved_mem: Add code to dynamically allocate reserved_mem array
git bisect bad b8ffe4a11001660522ccc5b2cc892cf8aa03a62e
# bad: [a46cccb0ee2d62206f6db5e3b85fdd9e2760ee1a] of: reserved_mem: Restruture how the reserved memory regions are processed
git bisect bad a46cccb0ee2d62206f6db5e3b85fdd9e2760ee1a
# good: [83138f8fb798627531be3b5627af4a6008a7bbd6] media: dt-bindings: renesas,rzg2l-cru: Document Renesas RZ/G2UL CRU block
git bisect good 83138f8fb798627531be3b5627af4a6008a7bbd6
# first bad commit: [a46cccb0ee2d62206f6db5e3b85fdd9e2760ee1a] of: reserved_mem: Restruture how the reserved memory regions are processed

2024-06-10 21:47:32

by Mark Brown

Subject: Re: [PATCH v6 1/4] of: reserved_mem: Restructure how the reserved memory regions are processed

On Mon, Jun 10, 2024 at 02:34:03PM -0700, Nathan Chancellor wrote:
> On Tue, May 28, 2024 at 03:36:47PM -0700, Oreoluwa Babatunde wrote:

> > fdt_init_reserved_mem() is also now called from within the
> > unflatten_device_tree() function so that this step happens after the
> > page tables have been setup.

> > Signed-off-by: Oreoluwa Babatunde <[email protected]>

> I am seeing a warning when booting aspeed_g5_defconfig in QEMU that I
> bisected to this change in -next as commit a46cccb0ee2d ("of:
> reserved_mem: Restruture how the reserved memory regions are
> processed").

I'm also seeing issues in -next which I bisected to this commit, on the
original Raspberry Pi the cpufreq driver fails to come up and I see
(potentially separate?) backtraces:

[ 0.100390] ------------[ cut here ]------------
[ 0.100476] WARNING: CPU: 0 PID: 1 at mm/memory.c:2835 __apply_to_page_range+0xd4/0x2c8
[ 0.100637] Modules linked in:
[ 0.100665] CPU: 0 PID: 1 Comm: swapper Not tainted 6.10.0-rc2-next-20240607 #1
[ 0.100692] Hardware name: BCM2835
[ 0.100705] Call trace:
[ 0.100727] unwind_backtrace from show_stack+0x18/0x1c
[ 0.100790] show_stack from dump_stack_lvl+0x38/0x48
[ 0.100833] dump_stack_lvl from __warn+0x8c/0xf4
[ 0.100888] __warn from warn_slowpath_fmt+0x80/0xbc
[ 0.100933] warn_slowpath_fmt from __apply_to_page_range+0xd4/0x2c8
[ 0.100983] __apply_to_page_range from apply_to_page_range+0x20/0x28
[ 0.101027] apply_to_page_range from __dma_remap+0x58/0x88
[ 0.101071] __dma_remap from __alloc_from_contiguous+0x6c/0xa8
[ 0.101106] __alloc_from_contiguous from atomic_pool_init+0x9c/0x1c4
[ 0.101169] atomic_pool_init from do_one_initcall+0x68/0x158
[ 0.101223] do_one_initcall from kernel_init_freeable+0x1ac/0x1f0
[ 0.101267] kernel_init_freeable from kernel_init+0x1c/0x140
[ 0.101309] kernel_init from ret_from_fork+0x14/0x28
[ 0.101344] Exception stack(0xdc80dfb0 to 0xdc80dff8)
[ 0.101369] dfa0: 00000000 00000000 00000000 00000000
[ 0.101393] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.101414] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 0.101428] ---[ end trace 0000000000000000 ]---

Full boot log at:

https://lava.sirena.org.uk/scheduler/job/374962

You can see the report of cpufreq not being loaded in the log.

NFS boots also fail, apparently due to slowness bringing up a Debian
userspace, which may well be due to cpufreq issues.



2024-06-13 16:06:00

by Oreoluwa Babatunde

Subject: Re: [PATCH v6 1/4] of: reserved_mem: Restructure how the reserved memory regions are processed


On 6/10/2024 2:47 PM, Mark Brown wrote:
> On Mon, Jun 10, 2024 at 02:34:03PM -0700, Nathan Chancellor wrote:
>> On Tue, May 28, 2024 at 03:36:47PM -0700, Oreoluwa Babatunde wrote:
>>> fdt_init_reserved_mem() is also now called from within the
>>> unflatten_device_tree() function so that this step happens after the
>>> page tables have been setup.
>>> Signed-off-by: Oreoluwa Babatunde <[email protected]>
>> I am seeing a warning when booting aspeed_g5_defconfig in QEMU that I
>> bisected to this change in -next as commit a46cccb0ee2d ("of:
>> reserved_mem: Restruture how the reserved memory regions are
>> processed").
> I'm also seeing issues in -next which I bisected to this commit, on the
> original Raspberry Pi the cpufreq driver fails to come up and I see
> (potentially separate?) backtraces:
>
> [ 0.100390] ------------[ cut here ]------------
> [ 0.100476] WARNING: CPU: 0 PID: 1 at mm/memory.c:2835 __apply_to_page_range+0xd4/0x2c8
> [ 0.100637] Modules linked in:
> [ 0.100665] CPU: 0 PID: 1 Comm: swapper Not tainted 6.10.0-rc2-next-20240607 #1
> [ 0.100692] Hardware name: BCM2835
> [ 0.100705] Call trace:
> [ 0.100727] unwind_backtrace from show_stack+0x18/0x1c
> [ 0.100790] show_stack from dump_stack_lvl+0x38/0x48
> [ 0.100833] dump_stack_lvl from __warn+0x8c/0xf4
> [ 0.100888] __warn from warn_slowpath_fmt+0x80/0xbc
> [ 0.100933] warn_slowpath_fmt from __apply_to_page_range+0xd4/0x2c8
> [ 0.100983] __apply_to_page_range from apply_to_page_range+0x20/0x28
> [ 0.101027] apply_to_page_range from __dma_remap+0x58/0x88
> [ 0.101071] __dma_remap from __alloc_from_contiguous+0x6c/0xa8
> [ 0.101106] __alloc_from_contiguous from atomic_pool_init+0x9c/0x1c4
> [ 0.101169] atomic_pool_init from do_one_initcall+0x68/0x158
> [ 0.101223] do_one_initcall from kernel_init_freeable+0x1ac/0x1f0
> [ 0.101267] kernel_init_freeable from kernel_init+0x1c/0x140
> [ 0.101309] kernel_init from ret_from_fork+0x14/0x28
> [ 0.101344] Exception stack(0xdc80dfb0 to 0xdc80dff8)
> [ 0.101369] dfa0: 00000000 00000000 00000000 00000000
> [ 0.101393] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 0.101414] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 0.101428] ---[ end trace 0000000000000000 ]---
>
> Full boot log at:
>
> https://lava.sirena.org.uk/scheduler/job/374962
>
> You can see the report of cpufreq not being loaded in the log.
>
> NFS boots also fail, apparently due to slowness bringing up a Debian
> userspace which may well be due to cpufreq issues:
Hi Mark & Nathan,

Taking a look at this now and will provide a fix soon if
needed.

At first glance, it looks like there are a couple of WARN_ON*
function calls in __apply_to_page_range(). Please could
you run faddr2line and tell me which of the WARN_ON*
cases we are hitting?

Thank you!

Oreoluwa
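
(For reference, faddr2line lives under scripts/ in the kernel tree and
takes the symbol+offset printed in the backtrace. An invocation for the
trace above would look something like the line below; the vmlinux path
is only an example and depends on where your build put it.)

    scripts/faddr2line vmlinux __apply_to_page_range+0xd4/0x2c8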


2024-06-13 16:17:16

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] of: reserved_mem: Restruture how the reserved memory regions are processed

On Thu, Jun 13, 2024 at 09:05:18AM -0700, Oreoluwa Babatunde wrote:
>
> On 6/10/2024 2:47 PM, Mark Brown wrote:
> > On Mon, Jun 10, 2024 at 02:34:03PM -0700, Nathan Chancellor wrote:
> >> On Tue, May 28, 2024 at 03:36:47PM -0700, Oreoluwa Babatunde wrote:
> >>> fdt_init_reserved_mem() is also now called from within the
> >>> unflatten_device_tree() function so that this step happens after the
> >>> page tables have been setup.
> >>> Signed-off-by: Oreoluwa Babatunde <[email protected]>
> >> I am seeing a warning when booting aspeed_g5_defconfig in QEMU that I
> >> bisected to this change in -next as commit a46cccb0ee2d ("of:
> >> reserved_mem: Restruture how the reserved memory regions are
> >> processed").
> > I'm also seeing issues in -next which I bisected to this commit, on the
> > original Raspberry Pi the cpufreq driver fails to come up and I see
> > (potentially separate?) backtraces:
> >
> > [ 0.100390] ------------[ cut here ]------------
> > [ 0.100476] WARNING: CPU: 0 PID: 1 at mm/memory.c:2835 __apply_to_page_range+0xd4/0x2c8
> > [ 0.100637] Modules linked in:
> > [ 0.100665] CPU: 0 PID: 1 Comm: swapper Not tainted 6.10.0-rc2-next-20240607 #1
> > [ 0.100692] Hardware name: BCM2835
> > [ 0.100705] Call trace:
> > [ 0.100727] unwind_backtrace from show_stack+0x18/0x1c
> > [ 0.100790] show_stack from dump_stack_lvl+0x38/0x48
> > [ 0.100833] dump_stack_lvl from __warn+0x8c/0xf4
> > [ 0.100888] __warn from warn_slowpath_fmt+0x80/0xbc
> > [ 0.100933] warn_slowpath_fmt from __apply_to_page_range+0xd4/0x2c8
> > [ 0.100983] __apply_to_page_range from apply_to_page_range+0x20/0x28
> > [ 0.101027] apply_to_page_range from __dma_remap+0x58/0x88
> > [ 0.101071] __dma_remap from __alloc_from_contiguous+0x6c/0xa8
> > [ 0.101106] __alloc_from_contiguous from atomic_pool_init+0x9c/0x1c4
> > [ 0.101169] atomic_pool_init from do_one_initcall+0x68/0x158
> > [ 0.101223] do_one_initcall from kernel_init_freeable+0x1ac/0x1f0
> > [ 0.101267] kernel_init_freeable from kernel_init+0x1c/0x140
> > [ 0.101309] kernel_init from ret_from_fork+0x14/0x28
> > [ 0.101344] Exception stack(0xdc80dfb0 to 0xdc80dff8)
> > [ 0.101369] dfa0: 00000000 00000000 00000000 00000000
> > [ 0.101393] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [ 0.101414] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 0.101428] ---[ end trace 0000000000000000 ]---
> >
> > Full boot log at:
> >
> > https://lava.sirena.org.uk/scheduler/job/374962
> >
> > You can see the report of cpufreq not being loaded in the log.
> >
> > NFS boots also fail, apparently due to slowness bringing up a Debian
> > userspace which may well be due to cpufreq issues:
> Hi Mark & Nathan,
>
> Taking a look at this now and will provide a fix soon if
> needed.
>
> At first glance, it looks like there are a couple of WARN_ON*
> function calls in __apply_to_page_range(). Please could
> you run faddr2line and tell me which of the WARN_ON*
> cases we are hitting?

That shouldn't be needed, right? The file and line are already in the
WARNING: mm/memory.c:2835, which, in next-20240607, is:
if (WARN_ON_ONCE(pmd_leaf(*pmd))).

Thanks,
Conor.
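
For context, the check Conor quotes sits in the pmd-level step of the
apply_to_page_range() walk. A simplified sketch of that step (it omits
the allocation and pmd_bad() handling the real mm/memory.c code has, and
the function name here is made up) looks roughly like this:

    static int apply_to_pmd_range_sketch(struct mm_struct *mm, pud_t *pud,
                                         unsigned long addr, unsigned long end,
                                         pte_fn_t fn, void *data, bool create,
                                         pgtbl_mod_mask *mask)
    {
            pmd_t *pmd = pmd_offset(pud, addr);
            unsigned long next;
            int err = 0;

            do {
                    next = pmd_addr_end(addr, end);
                    if (pmd_none(*pmd) && !create)
                            continue;
                    /*
                     * A section/huge mapping at this level cannot be
                     * walked down to individual PTEs, so the walk warns
                     * and bails out - this is the mm/memory.c:2835 WARN
                     * in next-20240607.
                     */
                    if (WARN_ON_ONCE(pmd_leaf(*pmd)))
                            return -EINVAL;
                    err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
                                             create, mask);
                    if (err)
                            break;
            } while (pmd++, addr = next, addr != end);

            return err;
    }

So __dma_remap() is asking for a page-granular remap of a region that the
linear map covers with a section (leaf PMD) entry, and the walk refuses,
which lines up with Robin's analysis below.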



2024-06-13 16:38:35

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] of: reserved_mem: Restruture how the reserved memory regions are processed

On 13/06/2024 5:17 pm, Conor Dooley wrote:
> On Thu, Jun 13, 2024 at 09:05:18AM -0700, Oreoluwa Babatunde wrote:
>>
>> On 6/10/2024 2:47 PM, Mark Brown wrote:
>>> On Mon, Jun 10, 2024 at 02:34:03PM -0700, Nathan Chancellor wrote:
>>>> On Tue, May 28, 2024 at 03:36:47PM -0700, Oreoluwa Babatunde wrote:
>>>>> fdt_init_reserved_mem() is also now called from within the
>>>>> unflatten_device_tree() function so that this step happens after the
>>>>> page tables have been setup.
>>>>> Signed-off-by: Oreoluwa Babatunde <[email protected]>
>>>> I am seeing a warning when booting aspeed_g5_defconfig in QEMU that I
>>>> bisected to this change in -next as commit a46cccb0ee2d ("of:
>>>> reserved_mem: Restruture how the reserved memory regions are
>>>> processed").
>>> I'm also seeing issues in -next which I bisected to this commit, on the
>>> original Raspberry Pi the cpufreq driver fails to come up and I see
>>> (potentially separate?) backtraces:
>>>
>>> [ 0.100390] ------------[ cut here ]------------
>>> [ 0.100476] WARNING: CPU: 0 PID: 1 at mm/memory.c:2835 __apply_to_page_range+0xd4/0x2c8
>>> [ 0.100637] Modules linked in:
>>> [ 0.100665] CPU: 0 PID: 1 Comm: swapper Not tainted 6.10.0-rc2-next-20240607 #1
>>> [ 0.100692] Hardware name: BCM2835
>>> [ 0.100705] Call trace:
>>> [ 0.100727] unwind_backtrace from show_stack+0x18/0x1c
>>> [ 0.100790] show_stack from dump_stack_lvl+0x38/0x48
>>> [ 0.100833] dump_stack_lvl from __warn+0x8c/0xf4
>>> [ 0.100888] __warn from warn_slowpath_fmt+0x80/0xbc
>>> [ 0.100933] warn_slowpath_fmt from __apply_to_page_range+0xd4/0x2c8
>>> [ 0.100983] __apply_to_page_range from apply_to_page_range+0x20/0x28
>>> [ 0.101027] apply_to_page_range from __dma_remap+0x58/0x88
>>> [ 0.101071] __dma_remap from __alloc_from_contiguous+0x6c/0xa8
>>> [ 0.101106] __alloc_from_contiguous from atomic_pool_init+0x9c/0x1c4
>>> [ 0.101169] atomic_pool_init from do_one_initcall+0x68/0x158
>>> [ 0.101223] do_one_initcall from kernel_init_freeable+0x1ac/0x1f0
>>> [ 0.101267] kernel_init_freeable from kernel_init+0x1c/0x140
>>> [ 0.101309] kernel_init from ret_from_fork+0x14/0x28
>>> [ 0.101344] Exception stack(0xdc80dfb0 to 0xdc80dff8)
>>> [ 0.101369] dfa0: 00000000 00000000 00000000 00000000
>>> [ 0.101393] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> [ 0.101414] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>> [ 0.101428] ---[ end trace 0000000000000000 ]---
>>>
>>> Full boot log at:
>>>
>>> https://lava.sirena.org.uk/scheduler/job/374962
>>>
>>> You can see the report of cpufreq not being loaded in the log.
>>>
>>> NFS boots also fail, apparently due to slowness bringing up a Debian
>>> userspace which may well be due to cpufreq issues:
>> Hi Mark & Nathan,
>>
>> Taking a look at this now and will provide a fix soon if
>> needed.
>>
>> At first glance, it looks like there are a couple of WARN_ON*
>> function calls in __apply_to_page_range(). Please could
>> you run faddr2line and tell me which of the WARN_ON*
>> cases we are hitting?
>
> That shouldn't be needed, right? The line is in the WARNING: mm/memory.c:2835
> which, in next-20240607, is: if (WARN_ON_ONCE(pmd_leaf(*pmd))).

Indeed, and the overall implication would seem to be that, because the
dynamic CMA region wasn't allocated and reserved before the page tables
were created, we haven't created the page tables in a shape which can
accommodate chopping it out again later. Note that on arm64 at least,
this is liable to be hidden by other options like rodata_full and
debug_pagealloc - see can_set_direct_map().

Thanks,
Robin.
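
A minimal sketch of the gating Robin is referring to, assuming the spirit
of arm64's can_set_direct_map() (the real function lives in
arch/arm64/mm/pageattr.c and has grown extra conditions over time; the
helper name here is made up):

    /*
     * If the linear map may need to be modified at page granularity
     * later (rodata_full, debug_pagealloc, ...), it has to be built from
     * page-sized entries in the first place.
     */
    static bool need_page_granular_linear_map(void)
    {
            return rodata_full || debug_pagealloc_enabled();
    }

When that is the case, arm64's map_mem() passes NO_BLOCK_MAPPINGS and
NO_CONT_MAPPINGS down to the page-table creation code, so the pmd_leaf()
case that triggered the WARN above never exists in the linear map - which
is why those options can hide this class of problem rather than fix it.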