2021-05-26 20:29:11

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH 0/4] arm64: Make kexec_file_load honor iomem reservations

This series is a complete departure from the approach I initially sent
almost a month ago[1]. Instead of trying to teach EFI, ACPI and other
subsystem to use memblock, I've decided to stick with the iomem
resource tree and use that exclusively for arm64.

This means that my current approach is (despite what I initially
replied to both Dave and Catalin) to provide an arm64-specific
implementation of arch_kexec_locate_mem_hole() which walks the
resource tree and excludes ranges of RAM that have been registered for
any odd purpose. This is exactly what the userspace implementation
does, and I don't really see a good reason to diverge from it.

Again, this allows my Synquacer board to reliably use kexec_file_load
with as little as 256M, something that would always fail before as it
would overwrite most of the reserved tables.

Obviously, this is now at least 5.14 material. Given how broken
kexec_file_load is for non-crash kernels on arm64 at the moment,
should we at least disable it in 5.13 and all previous stable kernels?

Thanks,

M.

[1] https://lore.kernel.org/r/[email protected]

Marc Zyngier (4):
kexec_file: Make locate_mem_hole_callback global
kernel/resource: Populate child pointer in find_next_iomem_res()
kernel/resource: Add walk_excluding_child_res() helper
arm64: kexec_image: Implement arch_kexec_locate_mem_hole()

arch/arm64/kernel/kexec_image.c | 45 ++++++++++++++++++
include/linux/ioport.h | 4 ++
include/linux/kexec.h | 1 +
kernel/kexec_file.c | 6 +--
kernel/resource.c | 81 +++++++++++++++++++++++++++++++++
5 files changed, 134 insertions(+), 3 deletions(-)

--
2.30.2


2021-05-26 20:30:22

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH 3/4] kernel/resource: Add walk_excluding_child_res() helper

Once we have obtained a resource of a certain type from
find_next_iomem_res(), it doesn't necessarily mean that the whole
resource is usable, and we have cases where a child resource
denotes an exclusion in the initial resource.

Provide a new walker that deals with this exact case, and calls
a callback on each resource fragment that doesn't have a child.

Signed-off-by: Marc Zyngier <[email protected]>
---
include/linux/ioport.h | 4 +++
kernel/resource.c | 80 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 84 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 8359c50f9988..526314a42ad2 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -323,6 +323,10 @@ extern int
walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
void *arg, int (*func)(struct resource *, void *));

+extern int
+walk_excluding_child_res(struct resource *res, void *arg,
+ int (*func)(struct resource *, void *));
+
struct resource *devm_request_free_mem_region(struct device *dev,
struct resource *base, unsigned long size);
struct resource *request_free_mem_region(struct resource *base,
diff --git a/kernel/resource.c b/kernel/resource.c
index 311b8d2c9957..1d9b5f653938 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -318,6 +318,86 @@ int release_resource(struct resource *old)

EXPORT_SYMBOL(release_resource);

+/**
+ * walk_excluding_child_res - call a callback function on each fragment of
+ * a resource that do not have a child resource
+ *
+ * @res: the root resource containing the initial range
+ * @arg: function argument for the callback @func
+ * @func: callback function that is called for each qualifying resource area
+ *
+ * For a given resource, remove all the child resources and feed the
+ * resulting fragments to kexec_locate_mem_hole_callback().
+ */
+int walk_excluding_child_res(struct resource *res, void *arg,
+ int (*func)(struct resource *, void *))
+{
+ struct resource *tmp, cursor;
+ int ret = 0;
+
+ cursor = *res;
+
+ /* Use .child for the head of the list, .sibling for the tail */
+ cursor.child = cursor.sibling = NULL;
+
+ read_lock(&resource_lock);
+
+ for (tmp = res->child; tmp; tmp = tmp->sibling) {
+ struct resource *new;
+
+ if (cursor.start < tmp->start) {
+ new = kmalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ goto cleanup;
+
+ *new = (struct resource) {
+ .start = cursor.start,
+ .end = tmp->start - 1,
+ .flags = res->flags,
+ .desc = res->desc,
+ .parent = res->parent,
+ };
+
+ if (!cursor.child)
+ cursor.child = new;
+ if (cursor.sibling)
+ cursor.sibling->sibling = new;
+ cursor.sibling = new;
+ }
+
+ /*
+ * This may result in a resource with a negative size
+ * at the very end of the loop.
+ */
+ cursor.start = tmp->end + 1;
+ }
+
+ read_unlock(&resource_lock);
+
+ /*
+ * At this stage, the list pointed to by cursor.child contains
+ * every non-reserved blocks, completed by 'cursor' which
+ * contains the potential last block (may be empty).
+ */
+ for (tmp = cursor.child; tmp; tmp = tmp->sibling) {
+ ret = func(tmp, arg);
+ if (ret)
+ break;
+ }
+
+ if (!ret && cursor.start <= cursor.end)
+ ret = func(&cursor, tmp);
+
+cleanup:
+ while (cursor.child) {
+ tmp = cursor.child;
+ cursor.child = cursor.child->sibling;
+ kfree(tmp);
+ }
+
+ return ret;
+}
+
/**
* find_next_iomem_res - Finds the lowest iomem resource that covers part of
* [@start..@end].
--
2.30.2

2021-05-27 19:04:10

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 0/4] arm64: Make kexec_file_load honor iomem reservations

On Wed, May 26, 2021 at 08:05:27PM +0100, Marc Zyngier wrote:
> This series is a complete departure from the approach I initially sent
> almost a month ago[1]. Instead of trying to teach EFI, ACPI and other
> subsystem to use memblock, I've decided to stick with the iomem
> resource tree and use that exclusively for arm64.
>
> This means that my current approach is (despite what I initially
> replied to both Dave and Catalin) to provide an arm64-specific
> implementation of arch_kexec_locate_mem_hole() which walks the
> resource tree and excludes ranges of RAM that have been registered for
> any odd purpose. This is exactly what the userspace implementation
> does, and I don't really see a good reason to diverge from it.
>
> Again, this allows my Synquacer board to reliably use kexec_file_load
> with as little as 256M, something that would always fail before as it
> would overwrite most of the reserved tables.
>
> Obviously, this is now at least 5.14 material. Given how broken
> kexec_file_load is for non-crash kernels on arm64 at the moment,
> should we at least disable it in 5.13 and all previous stable kernels?

I think it makes sense to disable it in the current and earlier kernels.

For this series:

Acked-by: Catalin Marinas <[email protected]>

2021-05-31 06:10:07

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 0/4] arm64: Make kexec_file_load honor iomem reservations

On Thu, 27 May 2021 at 19:39, Catalin Marinas <[email protected]> wrote:
>
> On Wed, May 26, 2021 at 08:05:27PM +0100, Marc Zyngier wrote:
> > This series is a complete departure from the approach I initially sent
> > almost a month ago[1]. Instead of trying to teach EFI, ACPI and other
> > subsystem to use memblock, I've decided to stick with the iomem
> > resource tree and use that exclusively for arm64.
> >
> > This means that my current approach is (despite what I initially
> > replied to both Dave and Catalin) to provide an arm64-specific
> > implementation of arch_kexec_locate_mem_hole() which walks the
> > resource tree and excludes ranges of RAM that have been registered for
> > any odd purpose. This is exactly what the userspace implementation
> > does, and I don't really see a good reason to diverge from it.
> >
> > Again, this allows my Synquacer board to reliably use kexec_file_load
> > with as little as 256M, something that would always fail before as it
> > would overwrite most of the reserved tables.
> >
> > Obviously, this is now at least 5.14 material. Given how broken
> > kexec_file_load is for non-crash kernels on arm64 at the moment,
> > should we at least disable it in 5.13 and all previous stable kernels?
>
> I think it makes sense to disable it in the current and earlier kernels.
>

Ack to that

> For this series:
>
> Acked-by: Catalin Marinas <[email protected]>

and likewise for the series

Reviewed-by: Ard Biesheuvel <[email protected]>