2023-04-04 18:07:58

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v21 0/7] crash: Kernel handling of CPU and memory hot un/plug

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also
be updated.

The elfcorehdr describes to kdump the CPUs and memory in the system,
and any inaccuracies can result in a vmcore with missing CPU context
or memory regions.

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (eg. kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility. In the original post I
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash handler that registers with
the CPU and memory notifiers. Upon CPU or memory changes, from either
hot un/plug or off/onlining, this generic handler is invoked and
performs important housekeeping, for example obtaining the appropriate
lock, and then invokes an architecture specific handler to do the
appropriate elfcorehdr update.

Note the description in patch 'crash: change crash_prepare_elf64_headers()
to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
enables further optimizations related to CPU plug/unplug/online/offline
performance of elfcorehdr updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory; thus no involvement
with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

- Prevent udev from updating kdump crash kernel on hot un/plug changes.
Add the following as the first lines to the RHEL udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:

# The kernel updates the crash elfcorehdr for CPU and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

With this changeset applied, the two rules evaluate to false for
CPU and memory change events and thus skip the userspace
unload-then-reload of kdump.

- Change to the kexec_file_load for loading the kdump kernel:
Eg. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

To use the kexec-tools patch, apply, build and install kexec-tools,
then change the kdumpctl's standard_kexec_args to replace the -s with
--hotplug. The removal of -s reverts to the kexec_load syscall and
the addition of --hotplug invokes the changes put forth in the
kexec-tools patch.

Regards,
eric
---
v21: 4apr2023
- Rebased onto 6.3.0-rc5
- Additional simplification of indentation in crash_handle_hotplug_event(),
per Baoquan.

v20: 17mar2023
https://lkml.org/lkml/2023/3/17/1169
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.3.0-rc2
- Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
- Explicitly initializing image->hp_action, per Baoquan.
- Simplified kexec_trylock() in crash_handle_hotplug_event(),
per Baoquan.
- Applied Sourabh's Reviewed-by to the series.

v19: 6mar2023
https://lkml.org/lkml/2023/3/6/1358
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0
- Did away with offlinecpu, per Thomas Gleixner.
- Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
- Did away with elfcorehdr_index_valid, per Sourabh.
- Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
per Sourabh.
- Small optimization for x86 cpu changes.

v18: 31jan2023
https://lkml.org/lkml/2023/1/31/1356
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc6
- Renamed struct kimage member hotplug_event to hp_action, and
re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
- Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
is not reflected in elfcorehdr.
- Reworked some of the comments and commit messages to offer
more of the why, than what, per Thomas Gleixner.

v17: 18jan2023
https://lkml.org/lkml/2023/1/18/1420
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc4
- Moved a bit of code around so that kexec_load()-only builds
work, per Sourabh.
- Corrected computation of number of memory region Phdrs needed
when x86 memory hotplug is not enabled, per Baoquan.

v16: 5jan2023
https://lkml.org/lkml/2023/1/5/673
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc2
- Corrected error identified by Baoquan.

v15: 9dec2022
https://lkml.org/lkml/2022/12/9/520
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc8
- Replaced arch_un/map_crash_pages() with direct use of
kun/map_local_pages(), per Boris.
- Some x86 changes, per Boris.

v14: 16nov2022
https://lkml.org/lkml/2022/11/16/1645
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc5
- Introduced CRASH_HOTPLUG Kconfig item to better fine tune
compilation of feature components, per Boris.
- Removed hp_action parameter to arch_crash_handle_hotplug_event()
as it is unused.

v13: 31oct2022
https://lkml.org/lkml/2022/10/31/854
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc3, which means converting to use the new
kexec_trylock() away from mutex_lock(kexec_mutex).
- Moved arch_un/map_crash_pages() into kexec.h and default
implementation using k/unmap_local_pages().
- Changed more #ifdef's into IS_ENABLED()
- Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and it moved
into x86 crash.c as #define rather Kconfig item, per Boris.
- Check number of Phdrs against PN_XNUM, max possible.

v12: 9sep2022
https://lkml.org/lkml/2022/9/9/1358
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.0-rc4
- Addressed some minor formatting items, per Baoquan

v11: 26aug2022
https://lkml.org/lkml/2022/8/26/963
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.0-rc2
- Redid the rework of __weak to use asm/kexec.h, per Baoquan
- Reworked some comments and minor items, per Baoquan

v10: 21jul2022
https://lkml.org/lkml/2022/7/21/1007
https://lore.kernel.org/lkml/[email protected]/
- Rebased to 5.19.0-rc7
- Per Sourabh, corrected build issue with arch_un/map_crash_pages()
for architectures not supporting this feature.
- Per David Hildebrand, removed the WARN_ONCE() altogether.
- Per David Hansen, converted to use of kmap_local_page().
- Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
https://lkml.org/lkml/2022/6/13/3382
https://lore.kernel.org/lkml/[email protected]/
- Rebased to 5.18.0
- Per Sourabh, moved crash_prepare_elf64_headers() into common
crash_core.c to avoid compile issues with kexec_load only path.
- Per David Hildebrand, replaced mutex_trylock() with mutex_lock().
- Changed the __weak arch_crash_handle_hotplug_event() to utilize
WARN_ONCE() instead of WARN(). Fix some formatting issues.
- Per Sourabh, introduced sysfs attribute crash_hotplug for memory
and CPUs; for use by userspace (udev) to determine if the kernel
performs crash hot un/plug support.
- Per Sourabh, moved the code detecting the elfcorehdr segment from
arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
and kexec_file_load can benefit.
- Updated userspace kexec-tools kexec utility to reflect change to
using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
- Updated the new proposed udev rules to reflect using the sysfs
attributes crash_hotplug.

v8: 5may2022
https://lkml.org/lkml/2022/5/5/1133
https://lore.kernel.org/lkml/[email protected]/
- Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, ie a new define
is not needed. Also use of IS_ENABLED() rather than #ifdef's.
Renamed crash_hotplug_handler() to handle_hotplug_event().
And other corrections.
- Per Baoquan, minimized the parameters to the arch_crash_
handle_hotplug_event() to hp_action and cpu.
- Introduce KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
- Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
by David Hildebrand. Folded this patch into the x86
kexec_file_load support patch.

v7: 13apr2022
https://lkml.org/lkml/2022/4/13/850
https://lore.kernel.org/lkml/[email protected]/
- Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
https://lkml.org/lkml/2022/4/1/1203
https://lore.kernel.org/lkml/[email protected]/
- Reword commit messages and some comment cleanup per Baoquan.
- Changed elf_index to elfcorehdr_index for clarity.
- Minor code changes per Baoquan.

v5: 3mar2022
https://lkml.org/lkml/2022/3/3/674
https://lore.kernel.org/lkml/[email protected]/
- Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
David Hildenbrand.
- Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
https://lkml.org/lkml/2022/2/9/1406
https://lore.kernel.org/lkml/[email protected]/
- Refactored patches per Baoquan suggestsions.
- A few corrections, per Baoquan.

v3: 10jan2022
https://lkml.org/lkml/2022/1/10/1212
https://lore.kernel.org/lkml/[email protected]/
- Rebasing per Baoquan He request.
- Changed memory notifier per David Hildenbrand.
- Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
https://lkml.org/lkml/2021/12/7/1088
https://lore.kernel.org/lkml/[email protected]/
- Acting upon Baoquan He suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from
patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
https://lore.kernel.org/lkml/[email protected]/
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
https://lore.kernel.org/lkml/[email protected]/
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---

Eric DeVolder (7):
crash: move a few code bits to setup support of crash hotplug
crash: add generic infrastructure for crash hotplug support
kexec: exclude elfcorehdr from the segment digest
crash: memory and CPU hotplug sysfs attributes
x86/crash: add x86 crash hotplug support
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
x86/crash: optimize CPU changes

.../admin-guide/mm/memory-hotplug.rst | 8 +
Documentation/core-api/cpu_hotplug.rst | 18 +
arch/x86/Kconfig | 13 +
arch/x86/include/asm/kexec.h | 15 +
arch/x86/kernel/crash.c | 129 ++++++-
drivers/base/cpu.c | 14 +
drivers/base/memory.c | 13 +
include/linux/crash_core.h | 9 +
include/linux/kexec.h | 49 ++-
kernel/crash_core.c | 324 ++++++++++++++++++
kernel/kexec_core.c | 6 +
kernel/kexec_file.c | 187 +---------
12 files changed, 582 insertions(+), 203 deletions(-)

--
2.31.1


2023-04-04 18:07:58

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

The segment containing the elfcorehdr is identified at run-time
in crash_core:crash_handle_hotplug_event(), which works for both
the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
is generated from the available CPUs and memory into a buffer,
and then installed over the top of the existing elfcorehdr.

In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to the change in elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
only on the kexec_file_load() syscall; for kexec_load() userspace
will need to size the segment similarly.

To accommodate kexec_load() syscall in the absence of
kexec_file_load() syscall support, and with CONFIG_CRASH_HOTPLUG
enabled, it is necessary to move prepare_elf_headers() and
dependents outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
---
arch/x86/Kconfig | 13 ++++
arch/x86/include/asm/kexec.h | 15 +++++
arch/x86/kernel/crash.c | 119 ++++++++++++++++++++++++++++++++---
3 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..316fdaa9a68e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2119,6 +2119,19 @@ config CRASH_DUMP
(CONFIG_RELOCATABLE=y).
For more details see Documentation/admin-guide/kdump/kdump.rst

+config CRASH_HOTPLUG
+ bool "Update the crash elfcorehdr on system configuration changes"
+ default y
+ depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+ help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is a much more advanced approach than userspace
+ attempting that.
+
+ If unsure, say Y.
+
config KEXEC_JUMP
bool "kexec jump"
depends on KEXEC && HIBERNATION
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index a3760ca796aa..1bc852ce347d 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -212,6 +212,21 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..0c9d496cf7ce 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -41,6 +41,21 @@
#include <asm/crash.h>
#include <asm/cmdline.h>

+/*
+ * For the kexec_file_load() syscall path, specify the maximum number of
+ * memory regions that the elfcorehdr buffer/segment can accommodate.
+ * These regions are obtained via walk_system_ram_res(); eg. the
+ * 'System RAM' entries in /proc/iomem.
+ * This value is combined with NR_CPUS_DEFAULT and multiplied by
+ * sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ * segment size.
+ * The value 8192, for example, covers a (sparsely populated) 1TiB system
+ * consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ * memory buffer/segment size under 1MiB. This represents a sane choice
+ * to accommodate both baremetal and virtual machine configurations.
+ */
+#define CRASH_MAX_MEMORY_RANGES 8192
+
/* Used while preparing memory map entries for second kernel */
struct crash_memmap_data {
struct boot_params *params;
@@ -158,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
}

-#ifdef CONFIG_KEXEC_FILE
-
static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
{
unsigned int *nr_ranges = arg;
@@ -231,7 +244,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)

/* Prepare elf headers. Return addr and size */
static int prepare_elf_headers(struct kimage *image, void **addr,
- unsigned long *sz)
+ unsigned long *sz, unsigned long *nr_mem_ranges)
{
struct crash_mem *cmem;
int ret;
@@ -249,6 +262,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;

+ /* Return the computed number of memory ranges, for hotplug usage */
+ *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);

@@ -257,6 +273,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
return ret;
}

+#ifdef CONFIG_KEXEC_FILE
static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
{
unsigned int nr_e820_entries;
@@ -371,18 +388,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
int crash_load_segments(struct kimage *image)
{
int ret;
+ unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
.buf_max = ULONG_MAX, .top_down = false };

/* Prepare elf headers and add a segment */
- ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+ ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;

- image->elf_headers = kbuf.buffer;
- image->elf_headers_sz = kbuf.bufsz;
+ image->elf_headers = kbuf.buffer;
+ image->elf_headers_sz = kbuf.bufsz;
+ kbuf.memsz = kbuf.bufsz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+ /*
+ * Ensure the elfcorehdr segment large enough for hotplug changes.
+ * Account for VMCOREINFO and kernel_map and maximum CPUs.
+ */
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ pnum = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
+ else
+ pnum += 2 + CONFIG_NR_CPUS_DEFAULT;
+
+ if (pnum < (unsigned long)PN_XNUM) {
+ kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+ kbuf.memsz += sizeof(Elf64_Ehdr);
+
+ image->elfcorehdr_index = image->nr_segments;
+
+ /* Mark as usable to crash kernel, else crash kernel fails on boot */
+ image->elf_headers_sz = kbuf.memsz;
+ } else {
+ pr_err("number of Phdrs %lu exceeds max\n", pnum);
+ }
+#endif

- kbuf.memsz = kbuf.bufsz;
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -395,3 +436,67 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ *
+ * The new elfcorehdr is prepared in a kernel buffer, and then it is
+ * written on top of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+ void *elfbuf = NULL, *old_elfcorehdr;
+ unsigned long nr_mem_ranges;
+ unsigned long mem, memsz;
+ unsigned long elfsz = 0;
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
+ pr_err("unable to prepare elfcore headers");
+ goto out;
+ }
+
+ /*
+ * Obtain address and size of the elfcorehdr segment, and
+ * check it against the new elfcorehdr buffer.
+ */
+ mem = image->segment[image->elfcorehdr_index].mem;
+ memsz = image->segment[image->elfcorehdr_index].memsz;
+ if (elfsz > memsz) {
+ pr_err("update elfcorehdr elfsz %lu > memsz %lu",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * Copy new elfcorehdr over the old elfcorehdr at destination.
+ */
+ old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+ if (!old_elfcorehdr) {
+ pr_err("updating elfcorehdr failed\n");
+ goto out;
+ }
+
+ /*
+ * Temporarily invalidate the crash image while the
+ * elfcorehdr is updated.
+ */
+ xchg(&kexec_crash_image, NULL);
+ memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);
+ xchg(&kexec_crash_image, image);
+ kunmap_local(old_elfcorehdr);
+ pr_debug("updated elfcorehdr\n");
+
+out:
+ vfree(elfbuf);
+}
+#endif
--
2.31.1

2023-04-04 18:08:04

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v21 7/7] x86/crash: optimize CPU changes

This patch is dependent upon the patch 'crash: change
crash_prepare_elf64_headers() to for_each_possible_cpu()'. With that
patch, crash_prepare_elf64_headers() writes out an ELF CPU PT_NOTE
for all possible CPUs, thus further CPU changes to the elfcorehdr
are not needed.

This change works for kexec_file_load() and kexec_load() syscalls.
For kexec_file_load(), crash_prepare_elf64_headers() is utilized
directly and thus all ELF CPU PT_NOTEs are in the elfcorehdr already.
This is the kimage->file_mode term.
For kexec_load() syscall, one CPU or memory change will cause the
elfcorehdr to be updated via crash_prepare_elf64_headers() and at
that point all ELF CPU PT_NOTEs are in the elfcorehdr. This is the
kimage->elfcorehdr_updated term.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event() as it would prevent the arch-specific
handler from running for CPU changes. This would break PPC, for
example, which needs to update other information besides the
elfcorehdr, on CPU changes.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
---
arch/x86/kernel/crash.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 0c9d496cf7ce..ead602636f3e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -456,6 +456,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;

+ /* As crash_prepare_elf64_headers() has already described all
+ * possible CPUs, there is no need to update the elfcorehdr
+ * for additional CPU changes. This works for both kexec_load()
+ * and kexec_file_load() syscalls.
+ */
+ if ((image->file_mode || image->elfcorehdr_updated) &&
+ ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+ (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+ return;
+
/*
* Create the new elfcorehdr reflecting the changes to CPU and/or
* memory resources.
--
2.31.1

2023-04-06 11:15:03

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v21 0/7] crash: Kernel handling of CPU and memory hot un/plug

On 04/04/23 at 02:03pm, Eric DeVolder wrote:
> Once the kdump service is loaded, if changes to CPUs or memory occur,
> either by hot un/plug or off/onlining, the crash elfcorehdr must also
> be updated.
>
> The elfcorehdr describes to kdump the CPUs and memory in the system,
> and any inaccuracies can result in a vmcore with missing CPU context
> or memory regions.
>
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
> elfcorehdr) by the userspace kexec utility. In the original post I
> outlined the significant performance problems related to offloading
> this activity to userspace.
>
> This patchset introduces a generic crash handler that registers with
> the CPU and memory notifiers. Upon CPU or memory changes, from either
> hot un/plug or off/onlining, this generic handler is invoked and
> performs important housekeeping, for example obtaining the appropriate
> lock, and then invokes an architecture specific handler to do the
> appropriate elfcorehdr update.
>
> Note the description in patch 'crash: change crash_prepare_elf64_headers()
> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
> enables further optimizations related to CPU plug/unplug/online/offline
> performance of elfcorehdr updates.
>
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory; thus no involvement
> with userspace needed.
>
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
>
> - Prevent udev from updating kdump crash kernel on hot un/plug changes.
> Add the following as the first lines to the RHEL udev rule file
> /usr/lib/udev/rules.d/98-kexec.rules:
>
> # The kernel updates the crash elfcorehdr for CPU and memory changes
> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>
> With this changeset applied, the two rules evaluate to false for
> CPU and memory change events and thus skip the userspace
> unload-then-reload of kdump.
>
> - Change to the kexec_file_load for loading the kdump kernel:
> Eg. on RHEL: in /usr/bin/kdumpctl, change to:
> standard_kexec_args="-p -d -s"
> which adds the -s to select kexec_file_load() syscall.
>
> This kernel patchset also supports kexec_load() with a modified kexec
> userspace utility. A working changeset to the kexec userspace utility
> is posted to the kexec-tools mailing list here:
>
> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>
> To use the kexec-tools patch, apply, build and install kexec-tools,
> then change the kdumpctl's standard_kexec_args to replace the -s with
> --hotplug. The removal of -s reverts to the kexec_load syscall and
> the addition of --hotplug invokes the changes put forth in the
> kexec-tools patch.

Other than the nitpick in patch 2, this series looks good to me.

Acked-by: Baoquan He <[email protected]>

2023-04-06 16:16:03

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v21 0/7] crash: Kernel handling of CPU and memory hot un/plug



On 4/6/23 06:06, Baoquan He wrote:
> On 04/04/23 at 02:03pm, Eric DeVolder wrote:
>> Once the kdump service is loaded, if changes to CPUs or memory occur,
>> either by hot un/plug or off/onlining, the crash elfcorehdr must also
>> be updated.
>>
>> The elfcorehdr describes to kdump the CPUs and memory in the system,
>> and any inaccuracies can result in a vmcore with missing CPU context
>> or memory regions.
>>
>> The current solution utilizes udev to initiate an unload-then-reload
>> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
>> elfcorehdr) by the userspace kexec utility. In the original post I
>> outlined the significant performance problems related to offloading
>> this activity to userspace.
>>
>> This patchset introduces a generic crash handler that registers with
>> the CPU and memory notifiers. Upon CPU or memory changes, from either
>> hot un/plug or off/onlining, this generic handler is invoked and
>> performs important housekeeping, for example obtaining the appropriate
>> lock, and then invokes an architecture specific handler to do the
>> appropriate elfcorehdr update.
>>
>> Note the description in patch 'crash: change crash_prepare_elf64_headers()
>> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
>> enables further optimizations related to CPU plug/unplug/online/offline
>> performance of elfcorehdr updates.
>>
>> In the case of x86_64, the arch specific handler generates a new
>> elfcorehdr, and overwrites the old one in memory; thus no involvement
>> with userspace needed.
>>
>> To realize the benefits/test this patchset, one must make a couple
>> of minor changes to userspace:
>>
>> - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>> Add the following as the first lines to the RHEL udev rule file
>> /usr/lib/udev/rules.d/98-kexec.rules:
>>
>> # The kernel updates the crash elfcorehdr for CPU and memory changes
>> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>> SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>>
>> With this changeset applied, the two rules evaluate to false for
>> CPU and memory change events and thus skip the userspace
>> unload-then-reload of kdump.
>>
>> - Change to the kexec_file_load for loading the kdump kernel:
>> Eg. on RHEL: in /usr/bin/kdumpctl, change to:
>> standard_kexec_args="-p -d -s"
>> which adds the -s to select kexec_file_load() syscall.
>>
>> This kernel patchset also supports kexec_load() with a modified kexec
>> userspace utility. A working changeset to the kexec userspace utility
>> is posted to the kexec-tools mailing list here:
>>
>> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>>
>> To use the kexec-tools patch, apply, build and install kexec-tools,
>> then change the kdumpctl's standard_kexec_args to replace the -s with
>> --hotplug. The removal of -s reverts to the kexec_load syscall and
>> the addition of --hotplug invokes the changes put forth in the
>> kexec-tools patch.
>
> Other than the nitpick in patch 2, this series looks good to me.
>
> Acked-by: Baoquan He <[email protected]>
>

Thank you! I'll await your preference on memory range handling.
eric

2023-04-27 07:16:09

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH v21 0/7] crash: Kernel handling of CPU and memory hot un/plug

Hi Eric,

On 04/04/23 11:33 pm, Eric DeVolder wrote:
> Once the kdump service is loaded, if changes to CPUs or memory occur,
> either by hot un/plug or off/onlining, the crash elfcorehdr must also
> be updated.
>
> The elfcorehdr describes to kdump the CPUs and memory in the system,
> and any inaccuracies can result in a vmcore with missing CPU context
> or memory regions.
>
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
> elfcorehdr) by the userspace kexec utility. In the original post I
> outlined the significant performance problems related to offloading
> this activity to userspace.
>
> This patchset introduces a generic crash handler that registers with
> the CPU and memory notifiers. Upon CPU or memory changes, from either
> hot un/plug or off/onlining, this generic handler is invoked and
> performs important housekeeping, for example obtaining the appropriate
> lock, and then invokes an architecture specific handler to do the
> appropriate elfcorehdr update.
>
> Note the description in patch 'crash: change crash_prepare_elf64_headers()
> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
> enables further optimizations related to CPU plug/unplug/online/offline
> performance of elfcorehdr updates.
>
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory; thus no involvement
> with userspace needed.
>
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
>
> - Prevent udev from updating kdump crash kernel on hot un/plug changes.
> Add the following as the first lines to the RHEL udev rule file
> /usr/lib/udev/rules.d/98-kexec.rules:
>
> # The kernel updates the crash elfcorehdr for CPU and memory changes
> SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
> SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>
> With this changeset applied, the two rules evaluate to false for
> CPU and memory change events and thus skip the userspace
> unload-then-reload of kdump.
>
> - Change to the kexec_file_load for loading the kdump kernel:
> Eg. on RHEL: in /usr/bin/kdumpctl, change to:
> standard_kexec_args="-p -d -s"
> which adds the -s to select kexec_file_load() syscall.
>
> This kernel patchset also supports kexec_load() with a modified kexec
> userspace utility. A working changeset to the kexec userspace utility
> is posted to the kexec-tools mailing list here:
>
> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

With the in-kernel update, the size is anyway getting calculated in
kernel for kexec_file_load case, how about passing the recommended size
for elfcorehdr segment and any other segment that needs an in-kernel
update as a sysfs attribute instead of kexec-tools having to do the
calculation again in the userspace (get_elfcorehdrsz()) for kexec_load
case. That makes segment size calculation less error prone.

Thanks
Hari

2023-04-27 08:09:35

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

Hi Eric,

On 04/04/23 11:33 pm, Eric DeVolder wrote:
> When CPU or memory is hot un/plugged, or off/onlined, the crash
> elfcorehdr, which describes the CPUs and memory in the system,
> must also be updated.
>
> The segment containing the elfcorehdr is identified at run-time
> in crash_core:crash_handle_hotplug_event(), which works for both
> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
> is generated from the available CPUs and memory into a buffer,
> and then installed over the top of the existing elfcorehdr.
>
> In the patch 'kexec: exclude elfcorehdr from the segment digest'
> the need to update purgatory due to the change in elfcorehdr was
> eliminated. As a result, no changes to purgatory or boot_params
> (as the elfcorehdr= kernel command line parameter pointer
> remains unchanged and correct) are needed, just elfcorehdr.
>
> To accommodate a growing number of resources via hotplug, the
> elfcorehdr segment must be sufficiently large enough to accommodate
> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
> only on the kexec_file_load() syscall; for kexec_load() userspace
> will need to size the segment similarly.
>
> To accommodate kexec_load() syscall in the absence of

Firstly, thanks! This series is a nice improvement to kdump support
in hotplug environment.

One concern though is that this change assumes corresponding support
in kexec-tools. Without that support kexec_load would fail to boot
with digest verification failure, iiuc.

I would suggest a flag to advertise to the kernel that kexec-tools/
userspace wants in-kernel update. Something like KEXEC_IN_KERNEL_UPDATE
on top of existing flags like KEXEC_ON_CRASH & KEXEC_PRESERVE_CONTEXT.
This flag can be used to decide whether in-kernel update needs to be
enforced or not. That should make transition to this change smoother
without having to break userspace.

Thanks
Hari

2023-04-27 08:55:19

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

On 04/27/23 at 12:39pm, Hari Bathini wrote:
> Hi Eric,
>
> On 04/04/23 11:33 pm, Eric DeVolder wrote:
> > When CPU or memory is hot un/plugged, or off/onlined, the crash
> > elfcorehdr, which describes the CPUs and memory in the system,
> > must also be updated.
> >
> > The segment containing the elfcorehdr is identified at run-time
> > in crash_core:crash_handle_hotplug_event(), which works for both
> > the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
> > is generated from the available CPUs and memory into a buffer,
> > and then installed over the top of the existing elfcorehdr.
> >
> > In the patch 'kexec: exclude elfcorehdr from the segment digest'
> > the need to update purgatory due to the change in elfcorehdr was
> > eliminated. As a result, no changes to purgatory or boot_params
> > (as the elfcorehdr= kernel command line parameter pointer
> > remains unchanged and correct) are needed, just elfcorehdr.
> >
> > To accommodate a growing number of resources via hotplug, the
> > elfcorehdr segment must be sufficiently large enough to accommodate
> > changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
> > only on the kexec_file_load() syscall; for kexec_load() userspace
> > will need to size the segment similarly.
> >
> > To accommodate kexec_load() syscall in the absence of
>
> Firstly, thanks! This series is a nice improvement to kdump support
> in hotplug environment.
>
> One concern though is that this change assumes corresponding support
> in kexec-tools. Without that support kexec_load would fail to boot
> with digest verification failure, iiuc.

Eric has posted patchset to modify kexec_tools to support that, please
see the link Eric pasted in the cover letter.

http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

2023-04-27 17:10:04

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

On 27/04/23 2:19 pm, Baoquan He wrote:
> On 04/27/23 at 12:39pm, Hari Bathini wrote:
>> Hi Eric,
>>
>> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>>> When CPU or memory is hot un/plugged, or off/onlined, the crash
>>> elfcorehdr, which describes the CPUs and memory in the system,
>>> must also be updated.
>>>
>>> The segment containing the elfcorehdr is identified at run-time
>>> in crash_core:crash_handle_hotplug_event(), which works for both
>>> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
>>> is generated from the available CPUs and memory into a buffer,
>>> and then installed over the top of the existing elfcorehdr.
>>>
>>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>>> the need to update purgatory due to the change in elfcorehdr was
>>> eliminated. As a result, no changes to purgatory or boot_params
>>> (as the elfcorehdr= kernel command line parameter pointer
>>> remains unchanged and correct) are needed, just elfcorehdr.
>>>
>>> To accommodate a growing number of resources via hotplug, the
>>> elfcorehdr segment must be sufficiently large enough to accommodate
>>> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
>>> only on the kexec_file_load() syscall; for kexec_load() userspace
>>> will need to size the segment similarly.
>>>
>>> To accommodate kexec_load() syscall in the absence of
>>
>> Firstly, thanks! This series is a nice improvement to kdump support
>> in hotplug environment.
>>
>> One concern though is that this change assumes corresponding support
>> in kexec-tools. Without that support kexec_load would fail to boot
>> with digest verification failure, iiuc.
>
> Eric has posted patchset to modify kexec_tools to support that, please
> see the link Eric pasted in the cover letter.
>
> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html

Right, Baoquan.

I did see that and if I read the code correctly, without that patchset
kexec_load would fail. Not with an explicit error that hotplug support
is missing or such but it would simply fail to boot into capture kernel
with digest verification failure.

My suggestion was to avoid that userspace tool breakage for older
kexec-tools version by introducing a new kexec flag that can tell
kernel that kexec-tools is ready to use this in-kernel update support.
So, if kexec_load happens without the flag, avoid doing an in-kernel
update on hotplug. I hope that clears the confusion.

Thanks
Hari

2023-04-28 09:27:36

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support

On 04/27/23 at 10:26pm, Hari Bathini wrote:
> On 27/04/23 2:19 pm, Baoquan He wrote:
> > On 04/27/23 at 12:39pm, Hari Bathini wrote:
> > > Hi Eric,
> > >
> > > On 04/04/23 11:33 pm, Eric DeVolder wrote:
> > > > When CPU or memory is hot un/plugged, or off/onlined, the crash
> > > > elfcorehdr, which describes the CPUs and memory in the system,
> > > > must also be updated.
> > > >
> > > > The segment containing the elfcorehdr is identified at run-time
> > > > in crash_core:crash_handle_hotplug_event(), which works for both
> > > > the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
> > > > is generated from the available CPUs and memory into a buffer,
> > > > and then installed over the top of the existing elfcorehdr.
> > > >
> > > > In the patch 'kexec: exclude elfcorehdr from the segment digest'
> > > > the need to update purgatory due to the change in elfcorehdr was
> > > > eliminated. As a result, no changes to purgatory or boot_params
> > > > (as the elfcorehdr= kernel command line parameter pointer
> > > > remains unchanged and correct) are needed, just elfcorehdr.
> > > >
> > > > To accommodate a growing number of resources via hotplug, the
> > > > elfcorehdr segment must be sufficiently large enough to accommodate
> > > > changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
> > > > only on the kexec_file_load() syscall; for kexec_load() userspace
> > > > will need to size the segment similarly.
> > > >
> > > > To accommodate kexec_load() syscall in the absence of
> > >
> > > Firstly, thanks! This series is a nice improvement to kdump support
> > > in hotplug environment.
> > >
> > > One concern though is that this change assumes corresponding support
> > > in kexec-tools. Without that support kexec_load would fail to boot
> > > with digest verification failure, iiuc.
> >
> > Eric has posted patchset to modify kexec_tools to support that, please
> > see the link Eric pasted in the cover letter.
> >
> > http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>
> Right, Baoquan.
>
> I did see that and if I read the code correctly, without that patchset
> kexec_load would fail. Not with an explicit error that hotplug support
> is missing or such but it would simply fail to boot into capture kernel
> with digest verification failure.
>
> My suggestion was to avoid that userspace tool breakage for older
> kexec-tools version by introducing a new kexec flag that can tell
> kernel that kexec-tools is ready to use this in-kernel update support.
> So, if kexec_load happens without the flag, avoid doing an in-kernel
> update on hotplug. I hope that clears the confusion.

Yeah, sounds like a good idea. It may be extended in later patch.

2023-04-28 18:57:06

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support


On 28/04/23 2:55 pm, Baoquan He wrote:
> On 04/27/23 at 10:26pm, Hari Bathini wrote:
>> On 27/04/23 2:19 pm, Baoquan He wrote:
>>> On 04/27/23 at 12:39pm, Hari Bathini wrote:
>>>> Hi Eric,
>>>>
>>>> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>>>>> When CPU or memory is hot un/plugged, or off/onlined, the crash
>>>>> elfcorehdr, which describes the CPUs and memory in the system,
>>>>> must also be updated.
>>>>>
>>>>> The segment containing the elfcorehdr is identified at run-time
>>>>> in crash_core:crash_handle_hotplug_event(), which works for both
>>>>> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
>>>>> is generated from the available CPUs and memory into a buffer,
>>>>> and then installed over the top of the existing elfcorehdr.
>>>>>
>>>>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>>>>> the need to update purgatory due to the change in elfcorehdr was
>>>>> eliminated. As a result, no changes to purgatory or boot_params
>>>>> (as the elfcorehdr= kernel command line parameter pointer
>>>>> remains unchanged and correct) are needed, just elfcorehdr.
>>>>>
>>>>> To accommodate a growing number of resources via hotplug, the
>>>>> elfcorehdr segment must be sufficiently large enough to accommodate
>>>>> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
>>>>> only on the kexec_file_load() syscall; for kexec_load() userspace
>>>>> will need to size the segment similarly.
>>>>>
>>>>> To accommodate kexec_load() syscall in the absence of
>>>>
>>>> Firstly, thanks! This series is a nice improvement to kdump support
>>>> in hotplug environment.
>>>>
>>>> One concern though is that this change assumes corresponding support
>>>> in kexec-tools. Without that support kexec_load would fail to boot
>>>> with digest verification failure, iiuc.
>>>
>>> Eric has posted patchset to modify kexec_tools to support that, please
>>> see the link Eric pasted in the cover letter.
>>>
>>> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>>
>> Right, Baoquan.
>>
>> I did see that and if I read the code correctly, without that patchset
>> kexec_load would fail. Not with an explicit error that hotplug support
>> is missing or such but it would simply fail to boot into capture kernel
>> with digest verification failure.
>>
>> My suggestion was to avoid that userspace tool breakage for older
>> kexec-tools version by introducing a new kexec flag that can tell
>> kernel that kexec-tools is ready to use this in-kernel update support.
>> So, if kexec_load happens without the flag, avoid doing an in-kernel
>> update on hotplug. I hope that clears the confusion.
>
> Yeah, sounds like a good idea. It may be extended in later patch.

Fixing it in this series itself would be a cleaner way, I guess.

Thanks
Hari

2023-05-01 18:41:17

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support



On 4/28/23 13:31, Hari Bathini wrote:
>
> On 28/04/23 2:55 pm, Baoquan He wrote:
>> On 04/27/23 at 10:26pm, Hari Bathini wrote:
>>> On 27/04/23 2:19 pm, Baoquan He wrote:
>>>> On 04/27/23 at 12:39pm, Hari Bathini wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>>>>>> When CPU or memory is hot un/plugged, or off/onlined, the crash
>>>>>> elfcorehdr, which describes the CPUs and memory in the system,
>>>>>> must also be updated.
>>>>>>
>>>>>> The segment containing the elfcorehdr is identified at run-time
>>>>>> in crash_core:crash_handle_hotplug_event(), which works for both
>>>>>> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
>>>>>> is generated from the available CPUs and memory into a buffer,
>>>>>> and then installed over the top of the existing elfcorehdr.
>>>>>>
>>>>>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>>>>>> the need to update purgatory due to the change in elfcorehdr was
>>>>>> eliminated.  As a result, no changes to purgatory or boot_params
>>>>>> (as the elfcorehdr= kernel command line parameter pointer
>>>>>> remains unchanged and correct) are needed, just elfcorehdr.
>>>>>>
>>>>>> To accommodate a growing number of resources via hotplug, the
>>>>>> elfcorehdr segment must be sufficiently large enough to accommodate
>>>>>> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
>>>>>> only on the kexec_file_load() syscall; for kexec_load() userspace
>>>>>> will need to size the segment similarly.
>>>>>>
>>>>>> To accommodate kexec_load() syscall in the absence of
>>>>>
>>>>> Firstly, thanks! This series is a nice improvement to kdump support
>>>>> in hotplug environment.
Thank you!

>>>>>
>>>>> One concern though is that this change assumes corresponding support
>>>>> in kexec-tools. Without that support kexec_load would fail to boot
>>>>> with digest verification failure, iiuc.

Yes, you've correctly identified that if a hotplug change occurs following kexec_load
(made with kexec-tools unaltered for hotplug), then a subsequent panic would in fact
fail the purgatory digest verification, and kdump would not happen.

>>>>
>>>> Eric has posted patchset to modify kexec_tools to support that, please
>>>> see the link Eric pasted in the cover letter.
>>>>
>>>> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>>>
>>> Right, Baoquan.
>>>
>>> I did see that and if I read the code correctly, without that patchset
>>> kexec_load would fail. Not with an explicit error that hotplug support
>>> is missing or such but it would simply fail to boot into capture kernel
>>> with digest verification failure.
This is correct.

>>>
>>> My suggestion was to avoid that userspace tool breakage for older
>>> kexec-tools version by introducing a new kexec flag that can tell
>>> kernel that kexec-tools is ready to use this in-kernel update support.
>>> So, if kexec_load happens without the flag, avoid doing an in-kernel
>>> update on hotplug. I hope that clears the confusion.
>>
>> Yeah, sounds like a good idea. It may be extended in later patch.
>
> Fixing it in this series itself would be a cleaner way, I guess.

You're suggestion of using a flag makes alot of sense; it is an indication
to the kernel that it is valid/okay to modify the kexec_load elfcorehdr.
Only kexec-tools that understands this (meaning the elfcorehdr buffer is
appropriately sized *and* excludes the elfcorehdr from the purgatory check)
would set that flag.

The roll-out of this feature needs to be coordinated, no doubt. There are three
pieces to this puzzle: this kernel series, the udev rule changes, and the changes
to kexec-tools for kexec_load.

I consider the udev rule changes critical to making this feature work efficiently.
I also think that deploying the udev rules immediately is doable since nothing
references them, yet; they would be NOPs. And they would be in place when the
kernel and/or kexec-tool changes deploy.

However, your point about supporting kexec_load with and without this new flag
means the sysfs nodes upon which the udev rule change rely need to be a bit
smarter now. (I'm assuming these udev rules will be generally accepted as-is,
as they are simple and efficient.)

The sysfs crash_hotplug nodes need to take into account kexec_file_load vs
(kexec_load && new_flag). Generally speaking these crash_hotplug sysfs nodes we
want to be 1 going forward, but where kexec_load/kexec-tools is older and/or no new_flag,
it needs to be 0. In this way the udev rules can remain as proposed and work properly
for kexec_file_load and both flavors of kexec_load.

Good catch! I'll post v22 soon.

Eric

>
> Thanks
> Hari

2023-05-01 18:46:02

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v21 0/7] crash: Kernel handling of CPU and memory hot un/plug



On 4/27/23 02:08, Hari Bathini wrote:
> Hi Eric,
>
> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>> Once the kdump service is loaded, if changes to CPUs or memory occur,
>> either by hot un/plug or off/onlining, the crash elfcorehdr must also
>> be updated.
>>
>> The elfcorehdr describes to kdump the CPUs and memory in the system,
>> and any inaccuracies can result in a vmcore with missing CPU context
>> or memory regions.
>>
>> The current solution utilizes udev to initiate an unload-then-reload
>> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
>> elfcorehdr) by the userspace kexec utility. In the original post I
>> outlined the significant performance problems related to offloading
>> this activity to userspace.
>>
>> This patchset introduces a generic crash handler that registers with
>> the CPU and memory notifiers. Upon CPU or memory changes, from either
>> hot un/plug or off/onlining, this generic handler is invoked and
>> performs important housekeeping, for example obtaining the appropriate
>> lock, and then invokes an architecture specific handler to do the
>> appropriate elfcorehdr update.
>>
>> Note the description in patch 'crash: change crash_prepare_elf64_headers()
>> to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
>> enables further optimizations related to CPU plug/unplug/online/offline
>> performance of elfcorehdr updates.
>>
>> In the case of x86_64, the arch specific handler generates a new
>> elfcorehdr, and overwrites the old one in memory; thus no involvement
>> with userspace needed.
>>
>> To realize the benefits/test this patchset, one must make a couple
>> of minor changes to userspace:
>>
>>   - Prevent udev from updating kdump crash kernel on hot un/plug changes.
>>     Add the following as the first lines to the RHEL udev rule file
>>     /usr/lib/udev/rules.d/98-kexec.rules:
>>
>>     # The kernel updates the crash elfcorehdr for CPU and memory changes
>>     SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>>     SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
>>
>>     With this changeset applied, the two rules evaluate to false for
>>     CPU and memory change events and thus skip the userspace
>>     unload-then-reload of kdump.
>>
>>   - Change to the kexec_file_load for loading the kdump kernel:
>>     Eg. on RHEL: in /usr/bin/kdumpctl, change to:
>>      standard_kexec_args="-p -d -s"
>>     which adds the -s to select kexec_file_load() syscall.
>>
>> This kernel patchset also supports kexec_load() with a modified kexec
>> userspace utility. A working changeset to the kexec userspace utility
>> is posted to the kexec-tools mailing list here:
>>
>>   http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>
> With the in-kernel update, the size is anyway getting calculated in
> kernel for kexec_file_load case, how about passing the recommended size
> for elfcorehdr segment and any other segment that needs an in-kernel update as a sysfs attribute
> instead of kexec-tools having to do the
> calculation again in the userspace  (get_elfcorehdrsz()) for kexec_load
> case. That makes segment size calculation less error prone.
>
> Thanks
> Hari
Makes sense, I'll add that to v22.
Thanks!
eric

2023-05-02 09:53:41

by Hari Bathini

[permalink] [raw]
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support



On 02/05/23 12:03 am, Eric DeVolder wrote:
>
>
> On 4/28/23 13:31, Hari Bathini wrote:
>>
>> On 28/04/23 2:55 pm, Baoquan He wrote:
>>> On 04/27/23 at 10:26pm, Hari Bathini wrote:
>>>> On 27/04/23 2:19 pm, Baoquan He wrote:
>>>>> On 04/27/23 at 12:39pm, Hari Bathini wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>>>>>>> When CPU or memory is hot un/plugged, or off/onlined, the crash
>>>>>>> elfcorehdr, which describes the CPUs and memory in the system,
>>>>>>> must also be updated.
>>>>>>>
>>>>>>> The segment containing the elfcorehdr is identified at run-time
>>>>>>> in crash_core:crash_handle_hotplug_event(), which works for both
>>>>>>> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
>>>>>>> is generated from the available CPUs and memory into a buffer,
>>>>>>> and then installed over the top of the existing elfcorehdr.
>>>>>>>
>>>>>>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>>>>>>> the need to update purgatory due to the change in elfcorehdr was
>>>>>>> eliminated.  As a result, no changes to purgatory or boot_params
>>>>>>> (as the elfcorehdr= kernel command line parameter pointer
>>>>>>> remains unchanged and correct) are needed, just elfcorehdr.
>>>>>>>
>>>>>>> To accommodate a growing number of resources via hotplug, the
>>>>>>> elfcorehdr segment must be sufficiently large enough to accommodate
>>>>>>> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
>>>>>>> only on the kexec_file_load() syscall; for kexec_load() userspace
>>>>>>> will need to size the segment similarly.
>>>>>>>
>>>>>>> To accommodate kexec_load() syscall in the absence of
>>>>>>
>>>>>> Firstly, thanks! This series is a nice improvement to kdump support
>>>>>> in hotplug environment.
> Thank you!
>
>>>>>>
>>>>>> One concern though is that this change assumes corresponding support
>>>>>> in kexec-tools. Without that support kexec_load would fail to boot
>>>>>> with digest verification failure, iiuc.
>
> Yes, you've correctly identified that if a hotplug change occurs
> following kexec_load
> (made with kexec-tools unaltered for hotplug), then a subsequent panic
> would in fact
> fail the purgatory digest verification, and  kdump would not happen.
>
>>>>>
>>>>> Eric has posted patchset to modify kexec_tools to support that, please
>>>>> see the link Eric pasted in the cover letter.
>>>>>
>>>>> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>>>>
>>>> Right, Baoquan.
>>>>
>>>> I did see that and if I read the code correctly, without that patchset
>>>> kexec_load would fail. Not with an explicit error that hotplug support
>>>> is missing or such but it would simply fail to boot into capture kernel
>>>> with digest verification failure.
> This is correct.
>
>>>>
>>>> My suggestion was to avoid that userspace tool breakage for older
>>>> kexec-tools version by introducing a new kexec flag that can tell
>>>> kernel that kexec-tools is ready to use this in-kernel update support.
>>>> So, if kexec_load happens without the flag, avoid doing an in-kernel
>>>> update on hotplug. I hope that clears the confusion.
>>>
>>> Yeah, sounds like a good idea. It may be extended in later patch.
>>
>> Fixing it in this series itself would be a cleaner way, I guess.
>
> You're suggestion of using a flag makes alot of sense; it is an indication
> to the kernel that it is valid/okay to modify the kexec_load elfcorehdr.
> Only kexec-tools that understands this (meaning the elfcorehdr buffer is
> appropriately sized *and* excludes the elfcorehdr from the purgatory check)
> would set that flag.
>
> The roll-out of this feature needs to be coordinated, no doubt. There
> are three
> pieces to this puzzle: this kernel series, the udev rule changes, and
> the changes
> to kexec-tools for kexec_load.
>
> I consider the udev rule changes critical to making this feature work
> efficiently.
> I also think that deploying the udev rules immediately is doable since
> nothing
> references them, yet; they would be NOPs. And they would be in place
> when the
> kernel and/or kexec-tool changes deploy.
>
> However, your point about supporting kexec_load with and without this
> new flag
> means the sysfs nodes upon which the udev rule change rely need to be a bit
> smarter now. (I'm assuming these udev rules will be generally accepted
> as-is,
> as they are simple and efficient.)
>

> The sysfs crash_hotplug nodes need to take into account kexec_file_load vs
> (kexec_load && new_flag). Generally speaking these crash_hotplug sysfs
> nodes we
> want to be 1 going forward, but where kexec_load/kexec-tools is older
> and/or no new_flag,
> it needs to be 0. In this way the udev rules can remain as proposed and
> work properly
> for kexec_file_load and both flavors of kexec_load.

Right. That is the tricky part. kdump scripts and kexec-tools have to
be in sync if udev rules have to just rely on crash_hotplug.

Thanks
Hari