2024-03-22 05:15:15

by Dave Young

[permalink] [raw]
Subject: [PATCH V3] x86/kexec: do not update E820 kexec table for setup_data

crashkernel reservation failed on a Thinkpad t440s laptop recently.
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test steps:
kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
kexec reboot ->
dmesg|grep "crashkernel reserved";
crashkernel memory range like below reserved successfully:
0x00000000d0000000 - 0x00000000da000000
But no such "Crash kernel" region in /proc/iomem

The background story is like below:

Currently E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.
Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed.
Thus the old setup data memory is not used at all.

Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().

Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <[email protected]>
---
V3: Rebase to latest mainline [Jiri Bohac]
V2: changelog grammar fixes [suggestions from Huang Kai]
arch/x86/kernel/e820.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)

Index: linux/arch/x86/kernel/e820.c
===================================================================
--- linux.orig/arch/x86/kernel/e820.c
+++ linux/arch/x86/kernel/e820.c
@@ -1016,17 +1016,6 @@ void __init e820__reserve_setup_data(voi

e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);

- /*
- * SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED are supplied by
- * kexec and do not need to be reserved.
- */
- if (data->type != SETUP_EFI &&
- data->type != SETUP_IMA &&
- data->type != SETUP_RNG_SEED)
- e820__range_update_kexec(pa_data,
- sizeof(*data) + data->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
if (data->type == SETUP_INDIRECT) {
len += data->len;
early_memunmap(data, sizeof(*data));
@@ -1038,12 +1027,9 @@ void __init e820__reserve_setup_data(voi

indirect = (struct setup_indirect *)data->data;

- if (indirect->type != SETUP_INDIRECT) {
+ if (indirect->type != SETUP_INDIRECT)
e820__range_update(indirect->addr, indirect->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- e820__range_update_kexec(indirect->addr, indirect->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- }
}

pa_data = pa_next;
@@ -1051,7 +1037,6 @@ void __init e820__reserve_setup_data(voi
}

e820__update_table(e820_table);
- e820__update_table(e820_table_kexec);

pr_info("extended physical RAM map:\n");
e820__print_table("reserve setup_data");



Subject: [tip: x86/urgent] x86/kexec: Do not update E820 kexec table for setup_data

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: fc7f27cda843ce294c71767d42b9d8abd015d7cb
Gitweb: https://git.kernel.org/tip/fc7f27cda843ce294c71767d42b9d8abd015d7cb
Author: Dave Young <[email protected]>
AuthorDate: Fri, 22 Mar 2024 13:15:08 +08:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 22 Mar 2024 10:07:45 +01:00

x86/kexec: Do not update E820 kexec table for setup_data

crashkernel reservation failed on a Thinkpad t440s laptop recently.
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test steps:
kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
kexec reboot ->
dmesg|grep "crashkernel reserved";
crashkernel memory range like below reserved successfully:
0x00000000d0000000 - 0x00000000da000000
But no such "Crash kernel" region in /proc/iomem

The background story:

Currently the E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.

Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed.
Thus the old setup data memory is not used at all.

Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().

Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Jiri Bohac <[email protected]>
Cc: Eric DeVolder <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/e820.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index b66f540..6f1b379 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1016,17 +1016,6 @@ void __init e820__reserve_setup_data(void)

e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);

- /*
- * SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED are supplied by
- * kexec and do not need to be reserved.
- */
- if (data->type != SETUP_EFI &&
- data->type != SETUP_IMA &&
- data->type != SETUP_RNG_SEED)
- e820__range_update_kexec(pa_data,
- sizeof(*data) + data->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
if (data->type == SETUP_INDIRECT) {
len += data->len;
early_memunmap(data, sizeof(*data));
@@ -1038,12 +1027,9 @@ void __init e820__reserve_setup_data(void)

indirect = (struct setup_indirect *)data->data;

- if (indirect->type != SETUP_INDIRECT) {
+ if (indirect->type != SETUP_INDIRECT)
e820__range_update(indirect->addr, indirect->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- e820__range_update_kexec(indirect->addr, indirect->len,
- E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
- }
}

pa_data = pa_next;
@@ -1051,7 +1037,6 @@ void __init e820__reserve_setup_data(void)
}

e820__update_table(e820_table);
- e820__update_table(e820_table_kexec);

pr_info("extended physical RAM map:\n");
e820__print_table("reserve setup_data");