2022-10-31 20:31:57

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v13 0/7] crash: Kernel handling of CPU and memory hot un/plug

When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr, which describes the CPUs
and memory in the system, must also be updated, else the resulting
vmcore is inaccurate (eg. missing either CPU context or memory
regions).

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (eg. kernel, initrd, boot_params, puratory and
elfcorehdr) by the userspace kexec utility. In previous posts I have
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash hot un/plug handler that
registers with the CPU and memory notifiers. Upon CPU or memory
changes, this generic handler is invoked and performs important
housekeeping, for example obtaining the appropriate lock, and then
invokes an architecture specific handler to do the appropriate
updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no
involvement with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

- Prevent udev from updating kdump crash kernel on hot un/plug changes.
Add the following as the first lines to the RHEL udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:

# The kernel handles updates to crash elfcorehdr for cpu and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

These lines will cause cpu and memory hot un/plug events to be
skipped within this rule file, with this changset applied.

- Change to the kexec_file_load for loading the kdump kernel:
Eg. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

To use the kexec-tools patch, apply, build and install kexec-tools,
then change the kdumpctl's standard_kexec_args to replace the -s with
--hotplug. The removal of -s reverts to the kexec_load syscall and
the addition of --hotplug invokes the changes put forth in the
kexec-tools patch.

Regards,
eric
---
v13: 31oct2022
- Rebased onto 6.1.0-rc3, which means converting to use the new
kexec_trylock() away from mutex_lock(kexec_mutex).
- Moved arch_un/map_crash_pages() into kexec.h and default
implementation using k/unmap_local_pages().
- Changed more #ifdef's into IS_ENABLED()
- Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
into x86 crash.c as #define rather Kconfig item, per Boris.
- Check number of Phdrs against PN_XNUM, max possible.

v12: 9sep2022
https://lkml.org/lkml/2022/9/9/1358
- Rebased onto 6.0-rc4
- Addressed some minor formatting items, per Baoquan

v11: 26aug2022
https://lkml.org/lkml/2022/8/26/963
- Rebased onto 6.0-rc2
- Redid the rework of __weak to use asm/kexec.h, per Baoquan
- Reworked some comments and minor items, per Baoquan

v10: 21jul2022
https://lkml.org/lkml/2022/7/21/1007
- Rebased to 5.19.0-rc7
- Per Sourabh, corrected build issue with arch_un/map_crash_pages()
for architectures not supporting this feature.
- Per David Hildebrand, removed the WARN_ONCE() altogether.
- Per David Hansen, converted to use of kmap_local_page().
- Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
https://lkml.org/lkml/2022/6/13/3382
- Rebased to 5.18.0
- Per Sourabh, moved crash_prepare_elf64_headers() into common
crash_core.c to avoid compile issues with kexec_load only path.
- Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
- Changed the __weak arch_crash_handle_hotplug_event() to utilize
WARN_ONCE() instead of WARN(). Fix some formatting issues.
- Per Sourabh, introduced sysfs attribute crash_hotplug for memory
and CPUs; for use by userspace (udev) to determine if the kernel
performs crash hot un/plug support.
- Per Sourabh, moved the code detecting the elfcorehdr segment from
arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
and kexec_file_load can benefit.
- Updated userspace kexec-tools kexec utility to reflect change to
using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
- Updated the new proposed udev rules to reflect using the sysfs
attributes crash_hotplug.

v8: 5may2022
https://lkml.org/lkml/2022/5/5/1133
- Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
is not needed. Also use of IS_ENABLED() rather than #ifdef's.
Renamed crash_hotplug_handler() to handle_hotplug_event().
And other corrections.
- Per Baoquan, minimized the parameters to the arch_crash_
handle_hotplug_event() to hp_action and cpu.
- Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
- Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
by David Hildebrand. Folded this patch into the x86
kexec_file_load support patch.

v7: 13apr2022
https://lkml.org/lkml/2022/4/13/850
- Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
https://lkml.org/lkml/2022/4/1/1203
- Reword commit messages and some comment cleanup per Baoquan.
- Changed elf_index to elfcorehdr_index for clarity.
- Minor code changes per Baoquan.

v5: 3mar2022
https://lkml.org/lkml/2022/3/3/674
- Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
David Hildenbrand.
- Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
https://lkml.org/lkml/2022/2/9/1406
- Refactored patches per Baoquan suggestsions.
- A few corrections, per Baoquan.

v3: 10jan2022
https://lkml.org/lkml/2022/1/10/1212
- Rebasing per Baoquan He request.
- Changed memory notifier per David Hildenbrand.
- Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
https://lkml.org/lkml/2021/12/7/1088
- Acting upon Baoquan He suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from
patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---

Eric DeVolder (7):
crash: move crash_prepare_elf64_headers()
crash: prototype change for crash_prepare_elf64_headers()
crash: add generic infrastructure for crash hotplug support
kexec: exclude elfcorehdr from the segment digest
kexec: exclude hot remove cpu from elfcorehdr notes
crash: memory and cpu hotplug sysfs attributes
x86/crash: add x86 crash hotplug support

.../admin-guide/mm/memory-hotplug.rst | 8 +
Documentation/core-api/cpu_hotplug.rst | 18 ++
arch/arm64/kernel/machine_kexec_file.c | 6 +-
arch/powerpc/kexec/file_load_64.c | 2 +-
arch/riscv/kernel/elf_kexec.c | 7 +-
arch/x86/include/asm/kexec.h | 14 +
arch/x86/kernel/crash.c | 110 +++++++-
drivers/base/cpu.c | 14 +
drivers/base/memory.c | 13 +
include/linux/crash_core.h | 8 +
include/linux/kexec.h | 54 +++-
kernel/crash_core.c | 255 ++++++++++++++++++
kernel/kexec_file.c | 105 +-------
13 files changed, 502 insertions(+), 112 deletions(-)

--
2.31.1



2022-10-31 20:52:28

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v13 2/7] crash: prototype change for crash_prepare_elf64_headers()

From within crash_prepare_elf64_headers() there is a need to
reference the struct kimage hotplug members. As such, this
change passes the struct kimage as a parameter to the
crash_prepare_elf64_headers(). The hotplug members are added
in "crash: add generic infrastructure for crash hotplug support".

This is preparation for later patch, no functionality change.

Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
---
arch/arm64/kernel/machine_kexec_file.c | 6 +++---
arch/powerpc/kexec/file_load_64.c | 2 +-
arch/riscv/kernel/elf_kexec.c | 7 ++++---
arch/x86/kernel/crash.c | 2 +-
include/linux/kexec.h | 7 +++++--
kernel/crash_core.c | 4 ++--
6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index a11a6e14ba89..2f7b773a83bb 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -39,7 +39,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
return kexec_image_post_load_cleanup_default(image);
}

-static int prepare_elf_headers(void **addr, unsigned long *sz)
+static int prepare_elf_headers(struct kimage *image, void **addr, unsigned long *sz)
{
struct crash_mem *cmem;
unsigned int nr_ranges;
@@ -64,7 +64,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
}

/* Exclude crashkernel region */
- ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
+ ret = crash_exclude_mem_range(image, cmem, crashk_res.start, crashk_res.end);
if (ret)
goto out;

@@ -74,7 +74,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
goto out;
}

- ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem, true, addr, sz);

out:
kfree(cmem);
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 349a781cea0b..a0af9966a8f0 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -798,7 +798,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
goto out;

/* Setup elfcorehdr segment */
- ret = crash_prepare_elf64_headers(cmem, false, &headers, &headers_sz);
+ ret = crash_prepare_elf64_headers(image, cmem, false, &headers, &headers_sz);
if (ret) {
pr_err("Failed to prepare elf headers for the core\n");
goto out;
diff --git a/arch/riscv/kernel/elf_kexec.c b/arch/riscv/kernel/elf_kexec.c
index 0cb94992c15b..ffde73228108 100644
--- a/arch/riscv/kernel/elf_kexec.c
+++ b/arch/riscv/kernel/elf_kexec.c
@@ -118,7 +118,8 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
return 0;
}

-static int prepare_elf_headers(void **addr, unsigned long *sz)
+static int prepare_elf_headers(struct kimage *image,
+ void **addr, unsigned long *sz)
{
struct crash_mem *cmem;
unsigned int nr_ranges;
@@ -140,7 +141,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
/* Exclude crashkernel region */
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (!ret)
- ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem, true, addr, sz);

out:
kfree(cmem);
@@ -212,7 +213,7 @@ static void *elf_kexec_load(struct kimage *image, char *kernel_buf,

/* Add elfcorehdr */
if (image->type == KEXEC_TYPE_CRASH) {
- ret = prepare_elf_headers(&headers, &headers_sz);
+ ret = prepare_elf_headers(image, &headers, &headers_sz);
if (ret) {
pr_err("Preparing elf core header failed\n");
goto out;
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..9ceb93c176a6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -265,7 +265,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
goto out;

/* By default prepare 64bit headers */
- ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);

out:
vfree(cmem);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 41a686996aaa..ebf46c3b8f8b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -253,8 +253,11 @@ struct crash_mem {
extern int crash_exclude_mem_range(struct crash_mem *mem,
unsigned long long mstart,
unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
- void **addr, unsigned long *sz);
+extern int crash_prepare_elf64_headers(struct kimage *image,
+ struct crash_mem *mem,
+ int need_kernel_map,
+ void **addr,
+ unsigned long *sz);

#ifndef arch_kexec_apply_relocations_add
/*
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 46c160d14045..8c648fd5897a 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -315,8 +315,8 @@ static int __init parse_crashkernel_dummy(char *arg)
}
early_param("crashkernel", parse_crashkernel_dummy);

-int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
- void **addr, unsigned long *sz)
+int crash_prepare_elf64_headers(struct kimage *image, struct crash_mem *mem,
+ int need_kernel_map, void **addr, unsigned long *sz)
{
Elf64_Ehdr *ehdr;
Elf64_Phdr *phdr;
--
2.31.1