2021-12-07 19:53:18

by Eric DeVolder

[permalink] [raw]
Subject: [RFC v2 0/6] crash: Kernel handling of CPU and memory hot un/plug

When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
and memory in the system, must also be updated, else the resulting
vmcore is inaccurate (eg. missing either CPU context or memory
regions).

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (e. kernel, initrd, boot_params, puratory and
elfcorehdr) by the userspace kexec utility. In previous posts I have
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash hot un/plug handler that
registers with the CPU and memory notifiers. Upon CPU or memory
changes, this generic handler is invoked and performs important
housekeeping, for example obtaining the appropriate lock, and then
invokes an architecture specific handler to do the appropriate
updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory. No involvement
with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

- Disable the udev rule for updating kdump on hot un/plug changes
Eg. on RHEL: rm -f /usr/lib/udev/rules.d/98-kexec.rules
or other technique to neuter the rule.

- Change to the kexec_file_load for loading the kdump kernel:
Eg. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load syscall.

This patchset supports kexec_load with a modified kexec userspace
utility, on which I am current working to provide separately.

Regards,
eric
---
RFC v2: 7dec2021
- Acting upon Baoquan He suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from
patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---


Eric DeVolder (6):
crash: fix minor typo/bug in debug message
crash hp: Introduce CRASH_HOTPLUG configuration options
crash hp: definitions and prototype changes
crash hp: generic crash hotplug support infrastructure
crash hp: kexec_file changes for crash hotplug support
crash hp: Add x86 crash hotplug support

arch/x86/Kconfig | 26 ++++++++
arch/x86/kernel/crash.c | 140 +++++++++++++++++++++++++++++++++++++++-
include/linux/kexec.h | 21 +++++-
kernel/crash_core.c | 118 +++++++++++++++++++++++++++++++++
kernel/kexec_file.c | 15 ++++-
5 files changed, 314 insertions(+), 6 deletions(-)

--
2.27.0



2021-12-07 19:53:24

by Eric DeVolder

[permalink] [raw]
Subject: [RFC v2 5/6] crash hp: kexec_file changes for crash hotplug support

Two important changes to note:

The kexec_calculate_store_digests() changed to specifically EXCLUDE
the elfcorehdr segment from its list of segments to check.
This is an important change as it allows, in a hotplug environment,
for the elfcorehdr segment (which contains the list of CPUs and
memory regions) to change dynamically without the need to update
purgatory (with the hash/digests of the segments it checks) as well.

The crash_prepare_elf64_headers() changed to look for the offline'd
CPU and exclude it. This due to the fact that the offline'd CPU is
still in the for_each_present_cpu() list at this point in time on
the cpu hotplug handler path.

Signed-off-by: Eric DeVolder <[email protected]>
---
kernel/kexec_file.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..339995d42169 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -765,6 +765,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;

+#ifdef CONFIG_CRASH_HOTPLUG
+ /* This segment excluded to allow future changes via hotplug */
+ if (image->elf_index_valid && (j == image->elf_index))
+ continue;
+#endif
+
ksegment = &image->segment[i];
/*
* Skip purgatory as it will be modified once we put digest
@@ -1260,8 +1266,8 @@ int crash_exclude_mem_range(struct crash_mem *mem,
return 0;
}

-int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
- void **addr, unsigned long *sz)
+int crash_prepare_elf64_headers(struct kimage *image, struct crash_mem *mem,
+ int kernel_map, void **addr, unsigned long *sz)
{
Elf64_Ehdr *ehdr;
Elf64_Phdr *phdr;
@@ -1308,6 +1314,11 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,

/* Prepare one phdr of type PT_NOTE for each present CPU */
for_each_present_cpu(cpu) {
+#ifdef CONFIG_CRASH_HOTPLUG
+ /* Skip the soon-to-be offlined cpu */
+ if (image->hotplug_event && (cpu == image->offlinecpu))
+ continue;
+#endif
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
--
2.27.0


2021-12-07 19:53:26

by Eric DeVolder

[permalink] [raw]
Subject: [RFC v2 1/6] crash: fix minor typo/bug in debug message

The pr_debug() intends to display the memsz member, but the
parameter is actually the bufsz member (which is already
displayed). Correct this to display memsz value.

Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/crash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..9730c88530fc 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -407,7 +407,7 @@ int crash_load_segments(struct kimage *image)
}
image->elf_load_addr = kbuf.mem;
pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
- image->elf_load_addr, kbuf.bufsz, kbuf.bufsz);
+ image->elf_load_addr, kbuf.bufsz, kbuf.memsz);

return ret;
}
--
2.27.0


2021-12-07 19:53:35

by Eric DeVolder

[permalink] [raw]
Subject: [RFC v2 6/6] crash hp: Add x86 crash hotplug support

For x86_64, when CPU or memory is hot un/plugged, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

To update the elfcorehdr for x86_64, a new elfcorehdr must be
generated from the available CPUs and memory. The new elfcorehdr
is prepared into a buffer, and if no errors occur, it is
installed over the top of the existing elfcorehdr.

In the patch 'crash hp: kexec_file changes for crash hotplug support'
the need to update purgatory due to the change in elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_HOTPLUG_ELFCOREHDR_SZ configure item.

NOTE that this supports both kexec_load and kexec_file_load. Support
for kexec_load is made possible by identifying the elfcorehdr segment
at load time and updating it as previously described. However, it is
the responsibility of the userspace kexec utility to ensure that:
- the elfcorehdr segment is sufficiently large enough to accommodate
hotplug changes, ala CRASH_HOTPLUG_ELFCOREHDR_SZ.
- provides a purgatory that excludes the elfcorehdr from its list of
run-time segments to check.
These changes to the userspace kexec utility are not yet available.

Signed-off-by: Eric DeVolder <[email protected]>
---
arch/x86/kernel/crash.c | 138 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 137 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..d185137b33d4 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/memblock.h>
+#include <linux/highmem.h>

#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -265,7 +266,8 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
goto out;

/* By default prepare 64bit headers */
- ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem,
+ IS_ENABLED(CONFIG_X86_64), addr, sz);

out:
vfree(cmem);
@@ -397,7 +399,17 @@ int crash_load_segments(struct kimage *image)
image->elf_headers = kbuf.buffer;
image->elf_headers_sz = kbuf.bufsz;

+#ifdef CONFIG_CRASH_HOTPLUG
+ /* Ensure elfcorehdr segment large enough for hotplug changes */
+ kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;
+ /* For marking as usable to crash kernel */
+ image->elf_headers_sz = kbuf.memsz;
+ /* Record the index of the elfcorehdr segment */
+ image->elf_index = image->nr_segments;
+ image->elf_index_valid = true;
+#else
kbuf.memsz = kbuf.bufsz;
+#endif
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -412,3 +424,127 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+void *map_crash_pages(unsigned long paddr, unsigned long size)
+{
+ /*
+ * NOTE: The addresses and sizes passed to this routine have
+ * already been fully aligned on page boundaries. There is no
+ * need for massaging the address or size.
+ */
+ void *ptr = NULL;
+
+ /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
+ if (size > 0) {
+ struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+
+ ptr = kmap(page);
+ }
+
+ return ptr;
+}
+
+void unmap_crash_pages(void **ptr)
+{
+ if (ptr) {
+ if (*ptr)
+ kunmap(*ptr);
+ *ptr = NULL;
+ }
+}
+
+void arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned long a, unsigned long b)
+{
+ /*
+ * To accurately reflect hot un/plug changes, the elfcorehdr (which
+ * is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories. The new
+ * elfcorehdr is prepared in a kernel buffer, and if no errors,
+ * then it is written on top of the existing/old elfcorehdr.
+ *
+ * Due to the change to the elfcorehdr, purgatory must explicitly
+ * exclude the elfcorehdr from the list of segments it checks.
+ */
+ struct kexec_segment *ksegment;
+ unsigned char *ptr = NULL;
+ unsigned long elfsz = 0;
+ void *elfbuf = NULL;
+ unsigned long mem, memsz;
+ unsigned int n;
+
+ /*
+ * When the struct kimage is alloced, it is wiped to zero, so
+ * the elf_index_valid defaults to false. It is set on the
+ * kexec_file_load path, or here for kexec_load.
+ */
+ if (!image->elf_index_valid) {
+ for (n = 0; n < image->nr_segments; n++) {
+ mem = image->segment[n].mem;
+ memsz = image->segment[n].memsz;
+ ptr = map_crash_pages(mem, memsz);
+ if (ptr) {
+ /* The segment containing elfcorehdr */
+ if ((ptr[0] == 0x7F) &&
+ (ptr[1] == 'E') &&
+ (ptr[2] == 'L') &&
+ (ptr[3] == 'F')) {
+ image->elf_index = (int)n;
+ image->elf_index_valid = true;
+ }
+ }
+ unmap_crash_pages((void **)&ptr);
+ }
+ }
+
+ /* Must have valid elfcorehdr index */
+ if (!image->elf_index_valid) {
+ pr_err("crash hp: unable to locate elfcorehdr segment");
+ goto out;
+ }
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources. The elfcorehdr segment memsz must be
+ * sufficiently large to accommodate increases due to hotplug
+ * activity. See CRASH_HOTPLUG_ELFCOREHDR_SZ.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
+ pr_err("crash hp: unable to prepare elfcore headers");
+ goto out;
+ }
+ ksegment = &image->segment[image->elf_index];
+ memsz = ksegment->memsz;
+ if (elfsz > memsz) {
+ pr_err("crash hp: update elfcorehdr elfsz %lu > memsz %lu",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * At this point, we are all but assured of success.
+ * Copy new elfcorehdr into destination.
+ */
+ ksegment = &image->segment[image->elf_index];
+ mem = ksegment->mem;
+ memsz = ksegment->memsz;
+ ptr = map_crash_pages(mem, memsz);
+ if (ptr) {
+ /* Temporarily invalidate the crash image while it is replaced */
+ xchg(&kexec_crash_image, NULL);
+ /* Write the new elfcorehdr into memory */
+ memcpy((void *)ptr, elfbuf, elfsz);
+ /* The crash image is now valid once again */
+ xchg(&kexec_crash_image, image);
+ }
+ unmap_crash_pages((void **)&ptr);
+ pr_debug("crash hp: re-loaded elfcorehdr at 0x%lx\n", mem);
+
+//FIX??? somekind of cache flush perhaps?
+
+out:
+ if (elfbuf)
+ vfree(elfbuf);
+}
+#endif /* CONFIG_CRASH_HOTPLUG */
--
2.27.0


2021-12-07 19:54:01

by Eric DeVolder

[permalink] [raw]
Subject: [RFC v2 4/6] crash hp: generic crash hotplug support infrastructure

This patch introduces a generic crash hot plug/unplug infrastructure
for CPU and memory changes. Upon CPU and memory changes, a generic
crash_hotplug_handler() obtains the appropriate lock, does some
important house keeping and then dispatches the hot plug/unplug event
to the architecture specific arch_crash_hotplug_handler(), and when
that handler returns, the lock is released.

This patch modifies crash_core.c to implement a subsys_initcall()
function that installs handlers for hot plug/unplug events. If CPU
hotplug is enabled, then cpuhp_setup_state() is invoked to register a
handler for CPU changes. Similarly, if memory hotplug is enabled, then
register_memory_notifier() is invoked to install a handler for memory
changes. These handlers in turn invoke the common generic handler
crash_hotplug_handler().

On the CPU side, cpuhp_setup_state_nocalls() is invoked with parameter
CPUHP_AP_ONLINE_DYN. While this works, when a CPU is being unplugged,
the CPU still shows up in foreach_present_cpu() during the regeneration
of the new CPU list, thus the need to explicitly check and exclude the
soon-to-be offlined CPU in crash_prepare_elf64_headers().

On the memory side, each un/plugged memory block passes through the
handler. For example, if a 1GiB DIMM is hotplugged, that generate 8
memory events, one for each 128MiB memblock.

Signed-off-by: Eric DeVolder <[email protected]>
---
kernel/crash_core.c | 118 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 118 insertions(+)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index eb53f5ec62c9..9a30a305b04d 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -8,12 +8,16 @@
#include <linux/crash_core.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>

#include <asm/page.h>
#include <asm/sections.h>

#include <crypto/sha1.h>

+#include "kexec_internal.h"
+
/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
size_t vmcoreinfo_size;
@@ -480,3 +484,117 @@ static int __init crash_save_vmcoreinfo_init(void)
}

subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+void __weak arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned long a, unsigned long b)
+{
+ pr_warn("crash hp: %s not implemented", __func__);
+}
+
+static void crash_hotplug_handler(unsigned int hp_action,
+ unsigned long a, unsigned long b)
+{
+ /* Obtain lock while changing crash information */
+ if (!mutex_trylock(&kexec_mutex))
+ return;
+
+ /* Check kdump is loaded */
+ if (kexec_crash_image) {
+ pr_debug("crash hp: hp_action %u, a %lu, b %lu", hp_action,
+ a, b);
+
+ /* Needed in order for the segments to be updated */
+ arch_kexec_unprotect_crashkres();
+
+ /* Flag to differentiate between normal load and hotplug */
+ kexec_crash_image->hotplug_event = true;
+
+ /*
+ * Due to use of CPUHP_AP_ONLINE_DYN, upon unplug and during
+ * this callback, the CPU is still in the for_each_present_cpu()
+ * list. Must explicitly look to exclude this CPU when building
+ * new list.
+ */
+ kexec_crash_image->offlinecpu =
+ (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ?
+ (unsigned int)a : ~0U;
+
+ /* Now invoke arch-specific update handler */
+ arch_crash_hotplug_handler(kexec_crash_image, hp_action, a, b);
+
+ /* No longer handling a hotplug event */
+ kexec_crash_image->hotplug_event = false;
+
+ /* Change back to read-only */
+ arch_kexec_protect_crashkres();
+ }
+
+ /* Release lock now that update complete */
+ mutex_unlock(&kexec_mutex);
+}
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+static int crash_memhp_notifier(struct notifier_block *nb,
+ unsigned long val, void *v)
+{
+ struct memory_notify *mhp = v;
+ unsigned long start, end;
+
+ start = mhp->start_pfn << PAGE_SHIFT;
+ end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
+
+ switch (val) {
+ case MEM_GOING_ONLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
+ start, end-start);
+ break;
+
+ case MEM_OFFLINE:
+ case MEM_CANCEL_ONLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
+ start, end-start);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+ .notifier_call = crash_memhp_notifier,
+ .priority = 0
+};
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+static int crash_cpuhp_online(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu, 0);
+ return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu, 0);
+ return 0;
+}
+#endif
+
+static int __init crash_hotplug_init(void)
+{
+ int result = 0;
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+ register_memory_notifier(&crash_memhp_nb);
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+ result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "crash/cpuhp",
+ crash_cpuhp_online, crash_cpuhp_offline);
+#endif
+
+ return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif /* CONFIG_CRASH_HOTPLUG */
--
2.27.0


2021-12-08 13:38:32

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC v2 4/6] crash hp: generic crash hotplug support infrastructure

> +#if defined(CONFIG_MEMORY_HOTPLUG)
> +static int crash_memhp_notifier(struct notifier_block *nb,
> + unsigned long val, void *v)
> +{
> + struct memory_notify *mhp = v;
> + unsigned long start, end;
> +
> + start = mhp->start_pfn << PAGE_SHIFT;
> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
> +
> + switch (val) {
> + case MEM_GOING_ONLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
> + start, end-start);
> + break;
> +
> + case MEM_OFFLINE:
> + case MEM_CANCEL_ONLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
> + start, end-start);

Any reason you don't handle this after the effects completely, meaning
MEM_ONLINE and MEM_OFFLINE?

--
Thanks,

David / dhildenb


2021-12-09 15:41:07

by Eric DeVolder

[permalink] [raw]
Subject: Re: [RFC v2 4/6] crash hp: generic crash hotplug support infrastructure



On 12/8/21 07:38, David Hildenbrand wrote:
>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>> +static int crash_memhp_notifier(struct notifier_block *nb,
>> + unsigned long val, void *v)
>> +{
>> + struct memory_notify *mhp = v;
>> + unsigned long start, end;
>> +
>> + start = mhp->start_pfn << PAGE_SHIFT;
>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>> +
>> + switch (val) {
>> + case MEM_GOING_ONLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
>> + start, end-start);
>> + break;
>> +
>> + case MEM_OFFLINE:
>> + case MEM_CANCEL_ONLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
>> + start, end-start);
>
> Any reason you don't handle this after the effects completely, meaning
> MEM_ONLINE and MEM_OFFLINE?
>

No, no reason. Great catch! I've changed it to use MEM_ONLINE/OFFLINE only.
Thanks!
eric

2022-01-05 14:26:50

by Eric DeVolder

[permalink] [raw]
Subject: Re: [RFC v2 0/6] crash: Kernel handling of CPU and memory hot un/plug

Nudge...

Fwiw, below is a working changeset to kexec userspace utility that allows the kexec_load
path to work similarly to the kexec_file_load path of this RFC. With both the following
userspace kexec patch and this RFC, both kexec_load and kexec_file_load work with changes
due to hotplug *without* unloading-then-reloading the kdump/capture kernel.

eric

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index 9826f6d..06adb7e 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -48,6 +48,7 @@
#include <x86/x86-linux.h>

extern struct arch_options_t arch_options;
+extern unsigned long long hotplug_size;

static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
struct crash_elf_info *elf_info)
@@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
} else {
memsz = bufsz;
}
+
+ /* If hotplug support enabled, use that size */
+ if (hotplug_size) {
+ memsz = hotplug_size;
+ }
+
+ info->elfcorehdr =
elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
max_addr, -1);
dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b..9569d9a 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -58,6 +58,7 @@

unsigned long long mem_min = 0;
unsigned long long mem_max = ULONG_MAX;
+unsigned long long hotplug_size = 0;
static unsigned long kexec_flags = 0;
/* Flags for kexec file (fd) based syscall */
static unsigned long kexec_file_flags = 0;
@@ -672,6 +673,12 @@ static void update_purgatory(struct kexec_info *info)
if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
continue;
}
+ /* Don't include elfcorehdr in the checksum, if hotplug
+ * support enabled.
+ */
+ if (hotplug_size && (info->segment[i].mem == (void *)info->elfcorehdr)) {
+ continue;
+ }
sha256_update(&ctx, info->segment[i].buf,
info->segment[i].bufsz);
nullsz = info->segment[i].memsz - info->segment[i].bufsz;
@@ -1504,6 +1511,17 @@ int main(int argc, char *argv[])
case OPT_PRINT_CKR_SIZE:
print_crashkernel_region_size();
return 0;
+ case OPT_HOTPLUG_SIZE:
+ /* Reserved the specified size for hotplug growth */
+ hotplug_size = strtoul(optarg, &endptr, 0);
+ if (*endptr) {
+ fprintf(stderr,
+ "Bad option value in --hotplug-size=%s\n",
+ optarg);
+ usage();
+ return 1;
+ }
+ break;
default:
break;
}
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 595dd68..b30dda4 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -169,6 +169,7 @@ struct kexec_info {
int command_line_len;

int skip_checks;
+ unsigned long elfcorehdr;
};

struct arch_map_entry {
@@ -231,7 +232,8 @@ extern int file_types;
#define OPT_PRINT_CKR_SIZE 262
#define OPT_LOAD_LIVE_UPDATE 263
#define OPT_EXEC_LIVE_UPDATE 264
-#define OPT_MAX 265
+#define OPT_HOTPLUG_SIZE 265
+#define OPT_MAX 266
#define KEXEC_OPTIONS \
{ "help", 0, 0, OPT_HELP }, \
{ "version", 0, 0, OPT_VERSION }, \
@@ -258,6 +260,7 @@ extern int file_types;
{ "debug", 0, 0, OPT_DEBUG }, \
{ "status", 0, 0, OPT_STATUS }, \
{ "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
+ { "hotplug-size", 2, 0, OPT_HOTPLUG_SIZE }, \

#define KEXEC_OPT_STR "h?vdfixyluet:pscaS"



On 12/7/21 13:51, Eric DeVolder wrote:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
> and memory in the system, must also be updated, else the resulting
> vmcore is inaccurate (eg. missing either CPU context or memory
> regions).
>
> The current solution utilizes udev to initiate an unload-then-reload
> of the kdump image (e. kernel, initrd, boot_params, puratory and
> elfcorehdr) by the userspace kexec utility. In previous posts I have
> outlined the significant performance problems related to offloading
> this activity to userspace.
>
> This patchset introduces a generic crash hot un/plug handler that
> registers with the CPU and memory notifiers. Upon CPU or memory
> changes, this generic handler is invoked and performs important
> housekeeping, for example obtaining the appropriate lock, and then
> invokes an architecture specific handler to do the appropriate
> updates.
>
> In the case of x86_64, the arch specific handler generates a new
> elfcorehdr, and overwrites the old one in memory. No involvement
> with userspace needed.
>
> To realize the benefits/test this patchset, one must make a couple
> of minor changes to userspace:
>
> - Disable the udev rule for updating kdump on hot un/plug changes
> Eg. on RHEL: rm -f /usr/lib/udev/rules.d/98-kexec.rules
> or other technique to neuter the rule.
>
> - Change to the kexec_file_load for loading the kdump kernel:
> Eg. on RHEL: in /usr/bin/kdumpctl, change to:
> standard_kexec_args="-p -d -s"
> which adds the -s to select kexec_file_load syscall.
>
> This patchset supports kexec_load with a modified kexec userspace
> utility, on which I am current working to provide separately.
>
> Regards,
> eric
> ---
> RFC v2: 7dec2021
> - Acting upon Baoquan He suggestion of removing elfcorehdr from
> the purgatory list of segments, removed purgatory code from
> patchset, and it is signficiantly simpler now.
>
> RFC v1: 18nov2021
> https://lkml.org/lkml/2021/11/18/845
> - working patchset demonstrating kernel handling of hotplug
> updates to x86 elfcorehdr for kexec_file_load
>
> RFC: 14dec2020
> https://lkml.org/lkml/2020/12/14/532
> - proposed concept of allowing kernel to handle hotplug update
> of elfcorehdr
> ---
>
>
> Eric DeVolder (6):
> crash: fix minor typo/bug in debug message
> crash hp: Introduce CRASH_HOTPLUG configuration options
> crash hp: definitions and prototype changes
> crash hp: generic crash hotplug support infrastructure
> crash hp: kexec_file changes for crash hotplug support
> crash hp: Add x86 crash hotplug support
>
> arch/x86/Kconfig | 26 ++++++++
> arch/x86/kernel/crash.c | 140 +++++++++++++++++++++++++++++++++++++++-
> include/linux/kexec.h | 21 +++++-
> kernel/crash_core.c | 118 +++++++++++++++++++++++++++++++++
> kernel/kexec_file.c | 15 ++++-
> 5 files changed, 314 insertions(+), 6 deletions(-)
>

2022-01-10 08:05:25

by Baoquan He

[permalink] [raw]
Subject: Re: [RFC v2 0/6] crash: Kernel handling of CPU and memory hot un/plug

Hi Eric,

On 01/05/22 at 08:25am, Eric DeVolder wrote:
> Nudge...
>
> Fwiw, below is a working changeset to kexec userspace utility that allows the kexec_load
> path to work similarly to the kexec_file_load path of this RFC. With both the following
> userspace kexec patch and this RFC, both kexec_load and kexec_file_load work with changes
> due to hotplug *without* unloading-then-reloading the kdump/capture kernel.

Thanks for taking a try on that, and sorry for late response because of
somethings at hand.

I will review this v2 round. When applying them, I encountered some
conflict, could you please rebase these on the ltest 5.16 and send me
a tar ball privately, or a github branch is also welcome. Thanks in
advance.


Thanks
Baoquan

>
> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
> index 9826f6d..06adb7e 100644
> --- a/kexec/arch/i386/crashdump-x86.c
> +++ b/kexec/arch/i386/crashdump-x86.c
> @@ -48,6 +48,7 @@
> #include <x86/x86-linux.h>
>
> extern struct arch_options_t arch_options;
> +extern unsigned long long hotplug_size;
>
> static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
> struct crash_elf_info *elf_info)
> @@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
> } else {
> memsz = bufsz;
> }
> +
> + /* If hotplug support enabled, use that size */
> + if (hotplug_size) {
> + memsz = hotplug_size;
> + }
> +
> + info->elfcorehdr =
> elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
> max_addr, -1);
> dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index f63b36b..9569d9a 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -58,6 +58,7 @@
>
> unsigned long long mem_min = 0;
> unsigned long long mem_max = ULONG_MAX;
> +unsigned long long hotplug_size = 0;
> static unsigned long kexec_flags = 0;
> /* Flags for kexec file (fd) based syscall */
> static unsigned long kexec_file_flags = 0;
> @@ -672,6 +673,12 @@ static void update_purgatory(struct kexec_info *info)
> if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
> continue;
> }
> + /* Don't include elfcorehdr in the checksum, if hotplug
> + * support enabled.
> + */
> + if (hotplug_size && (info->segment[i].mem == (void *)info->elfcorehdr)) {
> + continue;
> + }
> sha256_update(&ctx, info->segment[i].buf,
> info->segment[i].bufsz);
> nullsz = info->segment[i].memsz - info->segment[i].bufsz;
> @@ -1504,6 +1511,17 @@ int main(int argc, char *argv[])
> case OPT_PRINT_CKR_SIZE:
> print_crashkernel_region_size();
> return 0;
> + case OPT_HOTPLUG_SIZE:
> + /* Reserved the specified size for hotplug growth */
> + hotplug_size = strtoul(optarg, &endptr, 0);
> + if (*endptr) {
> + fprintf(stderr,
> + "Bad option value in --hotplug-size=%s\n",
> + optarg);
> + usage();
> + return 1;
> + }
> + break;
> default:
> break;
> }
> diff --git a/kexec/kexec.h b/kexec/kexec.h
> index 595dd68..b30dda4 100644
> --- a/kexec/kexec.h
> +++ b/kexec/kexec.h
> @@ -169,6 +169,7 @@ struct kexec_info {
> int command_line_len;
>
> int skip_checks;
> + unsigned long elfcorehdr;
> };
>
> struct arch_map_entry {
> @@ -231,7 +232,8 @@ extern int file_types;
> #define OPT_PRINT_CKR_SIZE 262
> #define OPT_LOAD_LIVE_UPDATE 263
> #define OPT_EXEC_LIVE_UPDATE 264
> -#define OPT_MAX 265
> +#define OPT_HOTPLUG_SIZE 265
> +#define OPT_MAX 266
> #define KEXEC_OPTIONS \
> { "help", 0, 0, OPT_HELP }, \
> { "version", 0, 0, OPT_VERSION }, \
> @@ -258,6 +260,7 @@ extern int file_types;
> { "debug", 0, 0, OPT_DEBUG }, \
> { "status", 0, 0, OPT_STATUS }, \
> { "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
> + { "hotplug-size", 2, 0, OPT_HOTPLUG_SIZE }, \
>
> #define KEXEC_OPT_STR "h?vdfixyluet:pscaS"
>
>
>
> On 12/7/21 13:51, Eric DeVolder wrote:
> > When the kdump service is loaded, if a CPU or memory is hot
> > un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
> > and memory in the system, must also be updated, else the resulting
> > vmcore is inaccurate (eg. missing either CPU context or memory
> > regions).
> >
> > The current solution utilizes udev to initiate an unload-then-reload
> > of the kdump image (e. kernel, initrd, boot_params, puratory and
> > elfcorehdr) by the userspace kexec utility. In previous posts I have
> > outlined the significant performance problems related to offloading
> > this activity to userspace.
> >
> > This patchset introduces a generic crash hot un/plug handler that
> > registers with the CPU and memory notifiers. Upon CPU or memory
> > changes, this generic handler is invoked and performs important
> > housekeeping, for example obtaining the appropriate lock, and then
> > invokes an architecture specific handler to do the appropriate
> > updates.
> >
> > In the case of x86_64, the arch specific handler generates a new
> > elfcorehdr, and overwrites the old one in memory. No involvement
> > with userspace needed.
> >
> > To realize the benefits/test this patchset, one must make a couple
> > of minor changes to userspace:
> >
> > - Disable the udev rule for updating kdump on hot un/plug changes
> > Eg. on RHEL: rm -f /usr/lib/udev/rules.d/98-kexec.rules
> > or other technique to neuter the rule.
> >
> > - Change to the kexec_file_load for loading the kdump kernel:
> > Eg. on RHEL: in /usr/bin/kdumpctl, change to:
> > standard_kexec_args="-p -d -s"
> > which adds the -s to select kexec_file_load syscall.
> >
> > This patchset supports kexec_load with a modified kexec userspace
> > utility, on which I am current working to provide separately.
> >
> > Regards,
> > eric
> > ---
> > RFC v2: 7dec2021
> > - Acting upon Baoquan He suggestion of removing elfcorehdr from
> > the purgatory list of segments, removed purgatory code from
> > patchset, and it is signficiantly simpler now.
> >
> > RFC v1: 18nov2021
> > https://lkml.org/lkml/2021/11/18/845
> > - working patchset demonstrating kernel handling of hotplug
> > updates to x86 elfcorehdr for kexec_file_load
> >
> > RFC: 14dec2020
> > https://lkml.org/lkml/2020/12/14/532
> > - proposed concept of allowing kernel to handle hotplug update
> > of elfcorehdr
> > ---
> >
> >
> > Eric DeVolder (6):
> > crash: fix minor typo/bug in debug message
> > crash hp: Introduce CRASH_HOTPLUG configuration options
> > crash hp: definitions and prototype changes
> > crash hp: generic crash hotplug support infrastructure
> > crash hp: kexec_file changes for crash hotplug support
> > crash hp: Add x86 crash hotplug support
> >
> > arch/x86/Kconfig | 26 ++++++++
> > arch/x86/kernel/crash.c | 140 +++++++++++++++++++++++++++++++++++++++-
> > include/linux/kexec.h | 21 +++++-
> > kernel/crash_core.c | 118 +++++++++++++++++++++++++++++++++
> > kernel/kexec_file.c | 15 ++++-
> > 5 files changed, 314 insertions(+), 6 deletions(-)
> >
>


2022-01-10 20:00:31

by Eric DeVolder

[permalink] [raw]
Subject: Re: [RFC v2 0/6] crash: Kernel handling of CPU and memory hot un/plug



On 1/10/22 02:04, Baoquan He wrote:
> Hi Eric,
>
> On 01/05/22 at 08:25am, Eric DeVolder wrote:
>> Nudge...
>>
>> Fwiw, below is a working changeset to kexec userspace utility that allows the kexec_load
>> path to work similarly to the kexec_file_load path of this RFC. With both the following
>> userspace kexec patch and this RFC, both kexec_load and kexec_file_load work with changes
>> due to hotplug *without* unloading-then-reloading the kdump/capture kernel.
>
> Thanks for taking a try on that, and sorry for late response because of
> somethings at hand.
>
> I will review this v2 round. When applying them, I encountered some
> conflict, could you please rebase these on the ltest 5.16 and send me
> a tar ball privately, or a github branch is also welcome. Thanks in
> advance.

Baoquan, thank you for your time and interest on this patch.
I posted as v3 as I incorporated changes from David Hildenbrand and some other minor tweaks.
Regards,
eric

>
>
> Thanks
> Baoquan
>
>>
>> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
>> index 9826f6d..06adb7e 100644
>> --- a/kexec/arch/i386/crashdump-x86.c
>> +++ b/kexec/arch/i386/crashdump-x86.c
>> @@ -48,6 +48,7 @@
>> #include <x86/x86-linux.h>
>>
>> extern struct arch_options_t arch_options;
>> +extern unsigned long long hotplug_size;
>>
>> static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
>> struct crash_elf_info *elf_info)
>> @@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
>> } else {
>> memsz = bufsz;
>> }
>> +
>> + /* If hotplug support enabled, use that size */
>> + if (hotplug_size) {
>> + memsz = hotplug_size;
>> + }
>> +
>> + info->elfcorehdr =
>> elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
>> max_addr, -1);
>> dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
>> diff --git a/kexec/kexec.c b/kexec/kexec.c
>> index f63b36b..9569d9a 100644
>> --- a/kexec/kexec.c
>> +++ b/kexec/kexec.c
>> @@ -58,6 +58,7 @@
>>
>> unsigned long long mem_min = 0;
>> unsigned long long mem_max = ULONG_MAX;
>> +unsigned long long hotplug_size = 0;
>> static unsigned long kexec_flags = 0;
>> /* Flags for kexec file (fd) based syscall */
>> static unsigned long kexec_file_flags = 0;
>> @@ -672,6 +673,12 @@ static void update_purgatory(struct kexec_info *info)
>> if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
>> continue;
>> }
>> + /* Don't include elfcorehdr in the checksum, if hotplug
>> + * support enabled.
>> + */
>> + if (hotplug_size && (info->segment[i].mem == (void *)info->elfcorehdr)) {
>> + continue;
>> + }
>> sha256_update(&ctx, info->segment[i].buf,
>> info->segment[i].bufsz);
>> nullsz = info->segment[i].memsz - info->segment[i].bufsz;
>> @@ -1504,6 +1511,17 @@ int main(int argc, char *argv[])
>> case OPT_PRINT_CKR_SIZE:
>> print_crashkernel_region_size();
>> return 0;
>> + case OPT_HOTPLUG_SIZE:
>> + /* Reserved the specified size for hotplug growth */
>> + hotplug_size = strtoul(optarg, &endptr, 0);
>> + if (*endptr) {
>> + fprintf(stderr,
>> + "Bad option value in --hotplug-size=%s\n",
>> + optarg);
>> + usage();
>> + return 1;
>> + }
>> + break;
>> default:
>> break;
>> }
>> diff --git a/kexec/kexec.h b/kexec/kexec.h
>> index 595dd68..b30dda4 100644
>> --- a/kexec/kexec.h
>> +++ b/kexec/kexec.h
>> @@ -169,6 +169,7 @@ struct kexec_info {
>> int command_line_len;
>>
>> int skip_checks;
>> + unsigned long elfcorehdr;
>> };
>>
>> struct arch_map_entry {
>> @@ -231,7 +232,8 @@ extern int file_types;
>> #define OPT_PRINT_CKR_SIZE 262
>> #define OPT_LOAD_LIVE_UPDATE 263
>> #define OPT_EXEC_LIVE_UPDATE 264
>> -#define OPT_MAX 265
>> +#define OPT_HOTPLUG_SIZE 265
>> +#define OPT_MAX 266
>> #define KEXEC_OPTIONS \
>> { "help", 0, 0, OPT_HELP }, \
>> { "version", 0, 0, OPT_VERSION }, \
>> @@ -258,6 +260,7 @@ extern int file_types;
>> { "debug", 0, 0, OPT_DEBUG }, \
>> { "status", 0, 0, OPT_STATUS }, \
>> { "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
>> + { "hotplug-size", 2, 0, OPT_HOTPLUG_SIZE }, \
>>
>> #define KEXEC_OPT_STR "h?vdfixyluet:pscaS"
>>
>>
>>
>> On 12/7/21 13:51, Eric DeVolder wrote:
>>> When the kdump service is loaded, if a CPU or memory is hot
>>> un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
>>> and memory in the system, must also be updated, else the resulting
>>> vmcore is inaccurate (eg. missing either CPU context or memory
>>> regions).
>>>
>>> The current solution utilizes udev to initiate an unload-then-reload
>>> of the kdump image (e. kernel, initrd, boot_params, puratory and
>>> elfcorehdr) by the userspace kexec utility. In previous posts I have
>>> outlined the significant performance problems related to offloading
>>> this activity to userspace.
>>>
>>> This patchset introduces a generic crash hot un/plug handler that
>>> registers with the CPU and memory notifiers. Upon CPU or memory
>>> changes, this generic handler is invoked and performs important
>>> housekeeping, for example obtaining the appropriate lock, and then
>>> invokes an architecture specific handler to do the appropriate
>>> updates.
>>>
>>> In the case of x86_64, the arch specific handler generates a new
>>> elfcorehdr, and overwrites the old one in memory. No involvement
>>> with userspace needed.
>>>
>>> To realize the benefits/test this patchset, one must make a couple
>>> of minor changes to userspace:
>>>
>>> - Disable the udev rule for updating kdump on hot un/plug changes
>>> Eg. on RHEL: rm -f /usr/lib/udev/rules.d/98-kexec.rules
>>> or other technique to neuter the rule.
>>>
>>> - Change to the kexec_file_load for loading the kdump kernel:
>>> Eg. on RHEL: in /usr/bin/kdumpctl, change to:
>>> standard_kexec_args="-p -d -s"
>>> which adds the -s to select kexec_file_load syscall.
>>>
>>> This patchset supports kexec_load with a modified kexec userspace
>>> utility, on which I am current working to provide separately.
>>>
>>> Regards,
>>> eric
>>> ---
>>> RFC v2: 7dec2021
>>> - Acting upon Baoquan He suggestion of removing elfcorehdr from
>>> the purgatory list of segments, removed purgatory code from
>>> patchset, and it is signficiantly simpler now.
>>>
>>> RFC v1: 18nov2021
>>> https://lkml.org/lkml/2021/11/18/845
>>> - working patchset demonstrating kernel handling of hotplug
>>> updates to x86 elfcorehdr for kexec_file_load
>>>
>>> RFC: 14dec2020
>>> https://lkml.org/lkml/2020/12/14/532
>>> - proposed concept of allowing kernel to handle hotplug update
>>> of elfcorehdr
>>> ---
>>>
>>>
>>> Eric DeVolder (6):
>>> crash: fix minor typo/bug in debug message
>>> crash hp: Introduce CRASH_HOTPLUG configuration options
>>> crash hp: definitions and prototype changes
>>> crash hp: generic crash hotplug support infrastructure
>>> crash hp: kexec_file changes for crash hotplug support
>>> crash hp: Add x86 crash hotplug support
>>>
>>> arch/x86/Kconfig | 26 ++++++++
>>> arch/x86/kernel/crash.c | 140 +++++++++++++++++++++++++++++++++++++++-
>>> include/linux/kexec.h | 21 +++++-
>>> kernel/crash_core.c | 118 +++++++++++++++++++++++++++++++++
>>> kernel/kexec_file.c | 15 ++++-
>>> 5 files changed, 314 insertions(+), 6 deletions(-)
>>>
>>
>