When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
and memory in the system, must also be updated, else the resulting
vmcore is inaccurate (eg. missing either CPU context or memory
regions).
The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (e. kernel, initrd, boot_params, puratory and
elfcorehdr) by the userspace kexec utility. In previous posts I have
outlined the significant performance problems related to offloading
this activity to userspace.
This patchset introduces a generic crash hot un/plug handler that
registers with the CPU and memory notifiers. Upon CPU or memory
changes, this generic handler is invoked and performs important
housekeeping, for example obtaining the appropriate lock, and then
invokes an architecture specific handler to do the appropriate
updates.
In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory. No involvement
with userspace needed.
To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:
- Disable the udev rule for updating kdump on hot un/plug changes.
Add the following as the first two lines to the udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:
# For x86_64, the kernel handles updates to crash elfcorehdr
CONST{arch}=="x86-64", GOTO="kdump_reload_end"
These two lines will cause cpu and memory hot un/plug events
to be skipped within this rule file.
- Change to the kexec_file_load for loading the kdump kernel:
Eg. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load syscall.
This patchset supports kexec_load with a modified kexec userspace
utility, and a working changeset to the kexec userspace utility
is provided here (and to use, the above change to standard_kexec_args
would be, for example, to append --hotplug-size=131072 instead of -s).
diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index 9826f6d..06adb7e 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -48,6 +48,7 @@
#include <x86/x86-linux.h>
extern struct arch_options_t arch_options;
+extern unsigned long long hotplug_size;
static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
struct crash_elf_info *elf_info)
@@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
} else {
memsz = bufsz;
}
+
+ /* If hotplug support enabled, use that size */
+ if (hotplug_size) {
+ memsz = hotplug_size;
+ }
+
+ info->elfcorehdr =
elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
max_addr, -1);
dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b..9569d9a 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -58,6 +58,7 @@
unsigned long long mem_min = 0;
unsigned long long mem_max = ULONG_MAX;
+unsigned long long hotplug_size = 0;
static unsigned long kexec_flags = 0;
/* Flags for kexec file (fd) based syscall */
static unsigned long kexec_file_flags = 0;
@@ -672,6 +673,12 @@ static void update_purgatory(struct kexec_info *info)
if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
continue;
}
+ /* Don't include elfcorehdr in the checksum, if hotplug
+ * support enabled.
+ */
+ if (hotplug_size && (info->segment[i].mem == (void *)info->elfcorehdr)) {
+ continue;
+ }
sha256_update(&ctx, info->segment[i].buf,
info->segment[i].bufsz);
nullsz = info->segment[i].memsz - info->segment[i].bufsz;
@@ -1504,6 +1511,17 @@ int main(int argc, char *argv[])
case OPT_PRINT_CKR_SIZE:
print_crashkernel_region_size();
return 0;
+ case OPT_HOTPLUG_SIZE:
+ /* Reserved the specified size for hotplug growth */
+ hotplug_size = strtoul(optarg, &endptr, 0);
+ if (*endptr) {
+ fprintf(stderr,
+ "Bad option value in --hotplug-size=%s\n",
+ optarg);
+ usage();
+ return 1;
+ }
+ break;
default:
break;
}
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 595dd68..b30dda4 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -169,6 +169,7 @@ struct kexec_info {
int command_line_len;
int skip_checks;
+ unsigned long elfcorehdr;
};
struct arch_map_entry {
@@ -231,7 +232,8 @@ extern int file_types;
#define OPT_PRINT_CKR_SIZE 262
#define OPT_LOAD_LIVE_UPDATE 263
#define OPT_EXEC_LIVE_UPDATE 264
-#define OPT_MAX 265
+#define OPT_HOTPLUG_SIZE 265
+#define OPT_MAX 266
#define KEXEC_OPTIONS \
{ "help", 0, 0, OPT_HELP }, \
{ "version", 0, 0, OPT_VERSION }, \
@@ -258,6 +260,7 @@ extern int file_types;
{ "debug", 0, 0, OPT_DEBUG }, \
{ "status", 0, 0, OPT_STATUS }, \
{ "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
+ { "hotplug-size", 2, 0, OPT_HOTPLUG_SIZE }, \
#define KEXEC_OPT_STR "h?vdfixyluet:pscaS"
Regards,
eric
---
v7: 13apr2022
- Resolved parameter usage to crash_hotplug_handler(), per Baoquan.
v6: 1apr2022
https://lkml.org/lkml/2022/4/1/1203
- Reword commit messages and some comment cleanup per Baoquan.
- Changed elf_index to elfcorehdr_index for clarity.
- Minor code changes per Baoquan.
v5: 3mar2022
https://lkml.org/lkml/2022/3/3/674
- Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
David Hildenbrand.
- Refactored slightly a few patches per Baoquan recommendation.
v4: 9feb2022
https://lkml.org/lkml/2022/2/9/1406
- Refactored patches per Baoquan suggestsions.
- A few corrections, per Baoquan.
v3: 10jan2022
https://lkml.org/lkml/2022/1/10/1212
- Rebasing per Baoquan He request.
- Changed memory notifier per David Hildenbrand.
- Providing example kexec userspace change in cover letter.
RFC v2: 7dec2021
https://lkml.org/lkml/2021/12/7/1088
- Acting upon Baoquan He suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from
patchset, and it is signficiantly simpler now.
RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load
RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---
Eric DeVolder (8):
x86/crash: fix minor typo/bug in debug message
x86/crash: Introduce new options to support cpu and memory hotplug
crash: prototype change for crash_prepare_elf64_headers
crash: add generic infrastructure for crash hotplug support
kexec: exclude elfcorehdr from the segment digest
kexec: exclude hot remove cpu from elfcorehdr notes
x86/crash: Add x86 crash hotplug support for kexec_file_load
x86/crash: Add x86 crash hotplug support for kexec_load
arch/arm64/kernel/machine_kexec_file.c | 6 +-
arch/powerpc/kexec/file_load_64.c | 2 +-
arch/x86/Kconfig | 26 +++++
arch/x86/kernel/crash.c | 146 ++++++++++++++++++++++++-
include/linux/kexec.h | 21 +++-
kernel/crash_core.c | 110 +++++++++++++++++++
kernel/kexec_file.c | 15 ++-
7 files changed, 316 insertions(+), 10 deletions(-)
--
2.27.0
When a crash kernel is loaded via the kexec_file_load syscall, the
kernel places the various segments (ie crash kernel, crash initrd,
boot_params, elfcorehdr, purgatory, etc) in memory. For those
architectures that utilize purgatory, a hash digest of the segments
is calculated for integrity checking. This digest is embedded into
the purgatory image prior to placing purgatory in memory.
Since hotplug events cause changes to the elfcorehdr, purgatory
integrity checking fails (at crash time, and no kdump created).
As a result, this change explicitly excludes the elfcorehdr segment
from the list of segments used to create the digest. By doing so,
this permits changes to the elfcorehdr in response to hotplug events,
without having to also reload purgatory due to the change to the
digest.
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
kernel/kexec_file.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 801d0d0a5012..94a459209111 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -765,6 +765,12 @@ static int kexec_calculate_store_digests(struct kimage *image)
for (j = i = 0; i < image->nr_segments; i++) {
struct kexec_segment *ksegment;
+#ifdef CONFIG_CRASH_HOTPLUG
+ /* This segment excluded to allow future changes via hotplug */
+ if (image->elfcorehdr_index_valid && (j == image->elfcorehdr_index))
+ continue;
+#endif
+
ksegment = &image->segment[i];
/*
* Skip purgatory as it will be modified once we put digest
--
2.27.0
Upon CPU and memory changes, a generic crash_hotplug_handler()
dispatches the hot plug/unplug event to the architecture specific
arch_crash_hotplug_handler(). During the process, the kexec_mutex
is held.
To support cpu hotplug, a callback is registered to capture the
CPUHP_AP_ONLINE_DYN online and ofline events via
cpuhp_setup_state_nocalls().
To support memory hotplug, a notifier is registered to capture the
MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
The cpu callback and memory notifier then call crash_hotplug_handler()
to handle the hot plug/unplug event.
Signed-off-by: Eric DeVolder <[email protected]>
---
include/linux/kexec.h | 16 +++++++
kernel/crash_core.c | 101 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 117 insertions(+)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index f93f2591fc1e..02daff1f47dd 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,13 @@ struct kimage {
/* Information for loading purgatory */
struct purgatory_info purgatory_info;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+ bool hotplug_event;
+ unsigned int offlinecpu;
+ bool elfcorehdr_index_valid;
+ int elfcorehdr_index;
+#endif
#endif
#ifdef CONFIG_IMA_KEXEC
@@ -322,6 +329,15 @@ struct kimage {
unsigned long elf_load_addr;
};
+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned int cpu);
+#define KEXEC_CRASH_HP_REMOVE_CPU 0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
+#define KEXEC_CRASH_HP_ADD_MEMORY 3
+#endif /* CONFIG_CRASH_HOTPLUG */
+
/* kexec interface functions */
extern void machine_kexec(struct kimage *image);
extern int machine_kexec_prepare(struct kimage *image);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 256cf6db573c..ecf746243ab2 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -9,12 +9,17 @@
#include <linux/init.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/highmem.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>
#include <asm/page.h>
#include <asm/sections.h>
#include <crypto/sha1.h>
+#include "kexec_internal.h"
+
/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
size_t vmcoreinfo_size;
@@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
}
subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+void __weak arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned int cpu)
+{
+ pr_warn("crash hp: %s not implemented", __func__);
+}
+
+static void crash_hotplug_handler(unsigned int hp_action,
+ unsigned int cpu)
+{
+ /* Obtain lock while changing crash information */
+ if (!mutex_trylock(&kexec_mutex))
+ return;
+
+ /* Check kdump is loaded */
+ if (kexec_crash_image) {
+ pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
+
+ /* Needed in order for the segments to be updated */
+ arch_kexec_unprotect_crashkres();
+
+ /* Flag to differentiate between normal load and hotplug */
+ kexec_crash_image->hotplug_event = true;
+
+ /* Now invoke arch-specific update handler */
+ arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
+
+ /* No longer handling a hotplug event */
+ kexec_crash_image->hotplug_event = false;
+
+ /* Change back to read-only */
+ arch_kexec_protect_crashkres();
+ }
+
+ /* Release lock now that update complete */
+ mutex_unlock(&kexec_mutex);
+}
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+static int crash_memhp_notifier(struct notifier_block *nb,
+ unsigned long val, void *v)
+{
+ struct memory_notify *mhp = v;
+
+ switch (val) {
+ case MEM_ONLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
+ break;
+
+ case MEM_OFFLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+ .notifier_call = crash_memhp_notifier,
+ .priority = 0
+};
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+static int crash_cpuhp_online(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
+ return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+ return 0;
+}
+#endif
+
+static int __init crash_hotplug_init(void)
+{
+ int result = 0;
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+ register_memory_notifier(&crash_memhp_nb);
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+ result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "crash/cpuhp",
+ crash_cpuhp_online, crash_cpuhp_offline);
+#endif
+
+ return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif /* CONFIG_CRASH_HOTPLUG */
--
2.27.0
From within crash_prepare_elf64_headers() there is a need to
reference the struct kimage hotplug members. As such, this
change passes the struct kimage as a parameter to the
crash_prepare_elf64_headers().
This is preparation for later patch, no functionality change.
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/arm64/kernel/machine_kexec_file.c | 6 +++---
arch/powerpc/kexec/file_load_64.c | 2 +-
arch/x86/kernel/crash.c | 3 ++-
include/linux/kexec.h | 5 +++--
kernel/kexec_file.c | 4 ++--
5 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 59c648d51848..7dbafb42ecf2 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -39,7 +39,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
return kexec_image_post_load_cleanup_default(image);
}
-static int prepare_elf_headers(void **addr, unsigned long *sz)
+static int prepare_elf_headers(struct kimage *image, void **addr, unsigned long *sz)
{
struct crash_mem *cmem;
unsigned int nr_ranges;
@@ -67,7 +67,7 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
if (!ret)
- ret = crash_prepare_elf64_headers(cmem, true, addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem, true, addr, sz);
kfree(cmem);
return ret;
@@ -96,7 +96,7 @@ int load_other_segments(struct kimage *image,
/* load elf core header */
if (image->type == KEXEC_TYPE_CRASH) {
- ret = prepare_elf_headers(&headers, &headers_sz);
+ ret = prepare_elf_headers(image, &headers, &headers_sz);
if (ret) {
pr_err("Preparing elf core header failed\n");
goto out_err;
diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index b4981b651d9a..07da6bf1cf24 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -797,7 +797,7 @@ static int load_elfcorehdr_segment(struct kimage *image, struct kexec_buf *kbuf)
goto out;
/* Setup elfcorehdr segment */
- ret = crash_prepare_elf64_headers(cmem, false, &headers, &headers_sz);
+ ret = crash_prepare_elf64_headers(image, cmem, false, &headers, &headers_sz);
if (ret) {
pr_err("Failed to prepare elf headers for the core\n");
goto out;
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9730c88530fc..9db41cce8d97 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -265,7 +265,8 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
goto out;
/* By default prepare 64bit headers */
- ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);
+ ret = crash_prepare_elf64_headers(image, cmem,
+ IS_ENABLED(CONFIG_X86_64), addr, sz);
out:
vfree(cmem);
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 58d1b58a971e..f93f2591fc1e 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -227,8 +227,9 @@ struct crash_mem {
extern int crash_exclude_mem_range(struct crash_mem *mem,
unsigned long long mstart,
unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
- void **addr, unsigned long *sz);
+extern int crash_prepare_elf64_headers(struct kimage *image,
+ struct crash_mem *mem, int kernel_map,
+ void **addr, unsigned long *sz);
#endif /* CONFIG_KEXEC_FILE */
#ifdef CONFIG_KEXEC_ELF
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8347fc158d2b..801d0d0a5012 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1260,8 +1260,8 @@ int crash_exclude_mem_range(struct crash_mem *mem,
return 0;
}
-int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
- void **addr, unsigned long *sz)
+int crash_prepare_elf64_headers(struct kimage *image, struct crash_mem *mem,
+ int kernel_map, void **addr, unsigned long *sz)
{
Elf64_Ehdr *ehdr;
Elf64_Phdr *phdr;
--
2.27.0
CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
the elfcorehdr buffer/segment.
This is a preparation for later usage.
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b0142e01002e..f7b92ee1bcc7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2072,6 +2072,32 @@ config CRASH_DUMP
(CONFIG_RELOCATABLE=y).
For more details see Documentation/admin-guide/kdump/kdump.rst
+config CRASH_HOTPLUG
+ bool "kernel updates of crash elfcorehdr"
+ depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) && KEXEC_FILE
+ help
+ Enable the kernel to update the crash elfcorehdr (which contains
+ the list of CPUs and memory regions) directly when hot plug/unplug
+ of CPUs or memory. Otherwise userspace must monitor these hot
+ plug/unplug change notifications via udev in order to
+ unload-then-reload the crash kernel so that the list of CPUs and
+ memory regions is kept up-to-date. Note that the udev CPU and
+ memory change notifications still occur (however, userspace is not
+ required to monitor for crash dump purposes).
+
+config CRASH_HOTPLUG_ELFCOREHDR_SZ
+ depends on CRASH_HOTPLUG
+ int
+ default 131072
+ help
+ Specify the maximum size of the elfcorehdr buffer/segment.
+ The 128KiB default is sized so that it can accommodate 2048
+ Elf64_Phdr, where each Phdr represents either a CPU or a
+ region of memory.
+ For example, this size can accommodate a machine with up to 1024
+ CPUs and up to 1024 memory regions, eg. as represented by the
+ 'System RAM' entries in /proc/iomem.
+
config KEXEC_JUMP
bool "kexec jump"
depends on KEXEC && HIBERNATION
--
2.27.0
For x86_64, when CPU or memory is hot un/plugged, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.
To update the elfcorehdr for x86_64, a new elfcorehdr must be
generated from the available CPUs and memory. The new elfcorehdr
is prepared into a buffer, and then installed over the top of
the existing elfcorehdr.
In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to the change in elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params
(as the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct) are needed, just elfcorehdr.
To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large enough to accommodate
changes, see the CRASH_HOTPLUG_ELFCOREHDR_SZ configure item.
With this change, crash hotplug for kexec_file_load syscall
is supported. When loading the crash kernel via kexec_file_load,
the elfcorehdr is identified at load time in crash_load_segments().
Signed-off-by: Eric DeVolder <[email protected]>
---
arch/x86/kernel/crash.c | 117 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 117 insertions(+)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9db41cce8d97..47adf69c9f71 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/memblock.h>
+#include <linux/highmem.h>
#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -398,7 +399,17 @@ int crash_load_segments(struct kimage *image)
image->elf_headers = kbuf.buffer;
image->elf_headers_sz = kbuf.bufsz;
+#ifdef CONFIG_CRASH_HOTPLUG
+ /* Ensure elfcorehdr segment large enough for hotplug changes */
+ kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;
+ /* For marking as usable to crash kernel */
+ image->elf_headers_sz = kbuf.memsz;
+ /* Record the index of the elfcorehdr segment */
+ image->elfcorehdr_index = image->nr_segments;
+ image->elfcorehdr_index_valid = true;
+#else
kbuf.memsz = kbuf.bufsz;
+#endif
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -413,3 +424,109 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+static void *map_crash_pages(unsigned long paddr, unsigned long size)
+{
+ /*
+ * NOTE: The addresses and sizes passed to this routine have
+ * already been fully aligned on page boundaries. There is no
+ * need for massaging the address or size.
+ */
+ void *ptr = NULL;
+
+ /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
+ if (size > 0) {
+ struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
+
+ ptr = kmap(page);
+ }
+
+ return ptr;
+}
+
+static void unmap_crash_pages(void **ptr)
+{
+ if (ptr) {
+ if (*ptr)
+ kunmap(*ptr);
+ *ptr = NULL;
+ }
+}
+
+/**
+ * arch_crash_hotplug_handler() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ * @hp_action: the hot un/plug action being handled
+ * @cpu: when hp_action is KEXEC_CRASH_HP_ADD|REMOVE_CPU, the affected cpu
+ *
+ * To accurately reflect hot un/plug changes, the elfcorehdr (which
+ * is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories. The new
+ * elfcorehdr is prepared in a kernel buffer, and then it is
+ * written on top of the existing/old elfcorehdr.
+ *
+ * For hotplug changes to elfcorehdr to work, two conditions are
+ * needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources. See
+ * CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ *
+ */
+void arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned int cpu)
+{
+ struct kexec_segment *ksegment;
+ unsigned char *ptr = NULL;
+ unsigned long elfsz = 0;
+ void *elfbuf = NULL;
+ unsigned long mem, memsz;
+
+ if (!image->elfcorehdr_index_valid) {
+ pr_err("crash hp: unable to locate elfcorehdr segment");
+ goto out;
+ }
+
+ ksegment = &image->segment[image->elfcorehdr_index];
+ mem = ksegment->mem;
+ memsz = ksegment->memsz;
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
+ pr_err("crash hp: unable to prepare elfcore headers");
+ goto out;
+ }
+ if (elfsz > memsz) {
+ pr_err("crash hp: update elfcorehdr elfsz %lu > memsz %lu",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * At this point, we are all but assured of success.
+ * Copy new elfcorehdr into destination.
+ */
+ ptr = map_crash_pages(mem, memsz);
+ if (ptr) {
+ /*
+ * Temporarily invalidate the crash image while the
+ * elfcorehdr is updated.
+ */
+ xchg(&kexec_crash_image, NULL);
+ memcpy_flushcache((void *)ptr, elfbuf, elfsz);
+ xchg(&kexec_crash_image, image);
+ }
+ unmap_crash_pages((void **)&ptr);
+ pr_debug("crash hp: re-loaded elfcorehdr at 0x%lx\n", mem);
+
+out:
+ if (elfbuf)
+ vfree(elfbuf);
+}
+#endif /* CONFIG_CRASH_HOTPLUG */
--
2.27.0
For kexec_file_load support, the loading of the crash kernel occurs
entirely within the kernel, and as such the elfcorehdr is readily
identified (so that it can be modified upon hotplug events).
This change enables support for kexec_load by identifying the
elfcorehdr segment in the arch_crash_hotplug_handler(), if it has
not already been identified.
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/crash.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 47adf69c9f71..aa2d9680431a 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -485,6 +485,30 @@ void arch_crash_hotplug_handler(struct kimage *image,
void *elfbuf = NULL;
unsigned long mem, memsz;
+ /*
+ * When the struct kimage is alloced, it is wiped to zero, so
+ * the elfcorehdr_index_valid defaults to false. It is set on the
+ * kexec_file_load path, or here for kexec_load, if not already
+ * identified.
+ */
+ if (!image->elfcorehdr_index_valid) {
+ unsigned int n;
+
+ for (n = 0; n < image->nr_segments; n++) {
+ mem = image->segment[n].mem;
+ memsz = image->segment[n].memsz;
+ ptr = map_crash_pages(mem, memsz);
+ if (ptr) {
+ /* The segment containing elfcorehdr */
+ if (memcmp(ptr, ELFMAG, SELFMAG) == 0) {
+ image->elfcorehdr_index = (int)n;
+ image->elfcorehdr_index_valid = true;
+ }
+ }
+ unmap_crash_pages((void **)&ptr);
+ }
+ }
+
if (!image->elfcorehdr_index_valid) {
pr_err("crash hp: unable to locate elfcorehdr segment");
goto out;
--
2.27.0
On 04/13/22 at 12:42pm, Eric DeVolder wrote:
> For x86_64, when CPU or memory is hot un/plugged, the crash
> elfcorehdr, which describes the CPUs and memory in the system,
> must also be updated.
>
> To update the elfcorehdr for x86_64, a new elfcorehdr must be
> generated from the available CPUs and memory. The new elfcorehdr
> is prepared into a buffer, and then installed over the top of
> the existing elfcorehdr.
>
> In the patch 'kexec: exclude elfcorehdr from the segment digest'
> the need to update purgatory due to the change in elfcorehdr was
> eliminated. As a result, no changes to purgatory or boot_params
> (as the elfcorehdr= kernel command line parameter pointer
> remains unchanged and correct) are needed, just elfcorehdr.
>
> To accommodate a growing number of resources via hotplug, the
> elfcorehdr segment must be sufficiently large enough to accommodate
> changes, see the CRASH_HOTPLUG_ELFCOREHDR_SZ configure item.
>
> With this change, crash hotplug for kexec_file_load syscall
> is supported. When loading the crash kernel via kexec_file_load,
> the elfcorehdr is identified at load time in crash_load_segments().
>
> Signed-off-by: Eric DeVolder <[email protected]>
> ---
> arch/x86/kernel/crash.c | 117 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 117 insertions(+)
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 9db41cce8d97..47adf69c9f71 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -25,6 +25,7 @@
> #include <linux/slab.h>
> #include <linux/vmalloc.h>
> #include <linux/memblock.h>
> +#include <linux/highmem.h>
>
> #include <asm/processor.h>
> #include <asm/hardirq.h>
> @@ -398,7 +399,17 @@ int crash_load_segments(struct kimage *image)
> image->elf_headers = kbuf.buffer;
> image->elf_headers_sz = kbuf.bufsz;
>
> +#ifdef CONFIG_CRASH_HOTPLUG
> + /* Ensure elfcorehdr segment large enough for hotplug changes */
> + kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;
> + /* For marking as usable to crash kernel */
> + image->elf_headers_sz = kbuf.memsz;
> + /* Record the index of the elfcorehdr segment */
> + image->elfcorehdr_index = image->nr_segments;
> + image->elfcorehdr_index_valid = true;
> +#else
> kbuf.memsz = kbuf.bufsz;
> +#endif
> kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
> ret = kexec_add_buffer(&kbuf);
> @@ -413,3 +424,109 @@ int crash_load_segments(struct kimage *image)
> return ret;
> }
> #endif /* CONFIG_KEXEC_FILE */
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> +static void *map_crash_pages(unsigned long paddr, unsigned long size)
> +{
> + /*
> + * NOTE: The addresses and sizes passed to this routine have
> + * already been fully aligned on page boundaries. There is no
> + * need for massaging the address or size.
> + */
> + void *ptr = NULL;
> +
> + /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
> + if (size > 0) {
> + struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
> +
> + ptr = kmap(page);
> + }
> +
> + return ptr;
> +}
> +
> +static void unmap_crash_pages(void **ptr)
> +{
> + if (ptr) {
> + if (*ptr)
> + kunmap(*ptr);
> + *ptr = NULL;
> + }
> +}
> +
> +/**
> + * arch_crash_hotplug_handler() - Handle hotplug elfcorehdr changes
> + * @image: the active struct kimage
> + * @hp_action: the hot un/plug action being handled
> + * @cpu: when hp_action is KEXEC_CRASH_HP_ADD|REMOVE_CPU, the affected cpu
> + *
> + * To accurately reflect hot un/plug changes, the elfcorehdr (which
> + * is passed to the crash kernel via the elfcorehdr= parameter)
> + * must be updated with the new list of CPUs and memories. The new
> + * elfcorehdr is prepared in a kernel buffer, and then it is
> + * written on top of the existing/old elfcorehdr.
> + *
> + * For hotplug changes to elfcorehdr to work, two conditions are
> + * needed:
> + * First, the segment containing the elfcorehdr must be large enough
> + * to permit a growing number of resources. See
> + * CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ.
> + * Second, purgatory must explicitly exclude the elfcorehdr from the
> + * list of segments it checks (since the elfcorehdr changes and thus
> + * would require an update to purgatory itself to update the digest).
> + *
> + */
> +void arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned int cpu)
We have stored the necessary information into kimage, e.g image->hotplug_event,
image->offlinecpu, do we still need to pass down the hp_action and cpu
in arch_crash_hotplug_handler()? Do you foresee it will be used in other
architectures?
> +{
> + struct kexec_segment *ksegment;
> + unsigned char *ptr = NULL;
> + unsigned long elfsz = 0;
> + void *elfbuf = NULL;
> + unsigned long mem, memsz;
> +
> + if (!image->elfcorehdr_index_valid) {
> + pr_err("crash hp: unable to locate elfcorehdr segment");
> + goto out;
> + }
> +
> + ksegment = &image->segment[image->elfcorehdr_index];
> + mem = ksegment->mem;
> + memsz = ksegment->memsz;
> +
> + /*
> + * Create the new elfcorehdr reflecting the changes to CPU and/or
> + * memory resources.
> + */
> + if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
> + pr_err("crash hp: unable to prepare elfcore headers");
> + goto out;
> + }
> + if (elfsz > memsz) {
> + pr_err("crash hp: update elfcorehdr elfsz %lu > memsz %lu",
> + elfsz, memsz);
> + goto out;
> + }
> +
> + /*
> + * At this point, we are all but assured of success.
> + * Copy new elfcorehdr into destination.
> + */
> + ptr = map_crash_pages(mem, memsz);
> + if (ptr) {
> + /*
> + * Temporarily invalidate the crash image while the
> + * elfcorehdr is updated.
> + */
> + xchg(&kexec_crash_image, NULL);
> + memcpy_flushcache((void *)ptr, elfbuf, elfsz);
> + xchg(&kexec_crash_image, image);
> + }
> + unmap_crash_pages((void **)&ptr);
> + pr_debug("crash hp: re-loaded elfcorehdr at 0x%lx\n", mem);
> +
> +out:
> + if (elfbuf)
> + vfree(elfbuf);
> +}
> +#endif /* CONFIG_CRASH_HOTPLUG */
> --
> 2.27.0
>
On 04/13/22 at 12:42pm, Eric DeVolder wrote:
> Upon CPU and memory changes, a generic crash_hotplug_handler()
> dispatches the hot plug/unplug event to the architecture specific
> arch_crash_hotplug_handler(). During the process, the kexec_mutex
> is held.
>
> To support cpu hotplug, a callback is registered to capture the
> CPUHP_AP_ONLINE_DYN online and ofline events via
> cpuhp_setup_state_nocalls().
>
> To support memory hotplug, a notifier is registered to capture the
> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>
> The cpu callback and memory notifier then call crash_hotplug_handler()
> to handle the hot plug/unplug event.
>
> Signed-off-by: Eric DeVolder <[email protected]>
> ---
> include/linux/kexec.h | 16 +++++++
> kernel/crash_core.c | 101 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 117 insertions(+)
>
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index f93f2591fc1e..02daff1f47dd 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -306,6 +306,13 @@ struct kimage {
>
> /* Information for loading purgatory */
> struct purgatory_info purgatory_info;
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> + bool hotplug_event;
> + unsigned int offlinecpu;
> + bool elfcorehdr_index_valid;
> + int elfcorehdr_index;
> +#endif
> #endif
>
> #ifdef CONFIG_IMA_KEXEC
> @@ -322,6 +329,15 @@ struct kimage {
> unsigned long elf_load_addr;
> };
>
> +#ifdef CONFIG_CRASH_HOTPLUG
> +void arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned int cpu);
> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
> +#define KEXEC_CRASH_HP_ADD_CPU 1
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
> +#endif /* CONFIG_CRASH_HOTPLUG */
> +
> /* kexec interface functions */
> extern void machine_kexec(struct kimage *image);
> extern int machine_kexec_prepare(struct kimage *image);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 256cf6db573c..ecf746243ab2 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -9,12 +9,17 @@
> #include <linux/init.h>
> #include <linux/utsname.h>
> #include <linux/vmalloc.h>
> +#include <linux/highmem.h>
> +#include <linux/memory.h>
> +#include <linux/cpuhotplug.h>
>
> #include <asm/page.h>
> #include <asm/sections.h>
>
> #include <crypto/sha1.h>
>
> +#include "kexec_internal.h"
> +
> /* vmcoreinfo stuff */
> unsigned char *vmcoreinfo_data;
> size_t vmcoreinfo_size;
> @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
> }
>
> subsys_initcall(crash_save_vmcoreinfo_init);
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> +void __weak arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned int cpu)
> +{
> + pr_warn("crash hp: %s not implemented", __func__);
> +}
> +
> +static void crash_hotplug_handler(unsigned int hp_action,
> + unsigned int cpu)
> +{
> + /* Obtain lock while changing crash information */
> + if (!mutex_trylock(&kexec_mutex))
> + return;
> +
> + /* Check kdump is loaded */
> + if (kexec_crash_image) {
> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
> +
> + /* Needed in order for the segments to be updated */
> + arch_kexec_unprotect_crashkres();
> +
> + /* Flag to differentiate between normal load and hotplug */
> + kexec_crash_image->hotplug_event = true;
> +
> + /* Now invoke arch-specific update handler */
> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
> +
> + /* No longer handling a hotplug event */
> + kexec_crash_image->hotplug_event = false;
> +
> + /* Change back to read-only */
> + arch_kexec_protect_crashkres();
> + }
> +
> + /* Release lock now that update complete */
> + mutex_unlock(&kexec_mutex);
> +}
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG)
> +static int crash_memhp_notifier(struct notifier_block *nb,
> + unsigned long val, void *v)
> +{
> + struct memory_notify *mhp = v;
> +
> + switch (val) {
> + case MEM_ONLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
We don't differentiate the memory add/remove, cpu add, except of cpu
remove. Means the hp_action only differentiate cpu remove from the other
action. Maybe only making two types?
#define KEXEC_CRASH_HP_REMOVE_CPU 0
#define KEXEC_CRASH_HP_UPDATE_OTHER 1
And define a new macro to replace the magic number?
#define KEXEC_CRASH_HP_INVALID_CPU -1U
> + break;
> +
> + case MEM_OFFLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block crash_memhp_nb = {
> + .notifier_call = crash_memhp_notifier,
> + .priority = 0
> +};
> +#endif
> +
> +#if defined(CONFIG_HOTPLUG_CPU)
> +static int crash_cpuhp_online(unsigned int cpu)
> +{
> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
Should making the cpu as -1U?
crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, -1U);
> + return 0;
> +}
> +
> +static int crash_cpuhp_offline(unsigned int cpu)
> +{
> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> + return 0;
> +}
> +#endif
> +
> +static int __init crash_hotplug_init(void)
> +{
> + int result = 0;
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG)
> + register_memory_notifier(&crash_memhp_nb);
> +#endif
> +
> +#if defined(CONFIG_HOTPLUG_CPU)
> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "crash/cpuhp",
> + crash_cpuhp_online, crash_cpuhp_offline);
> +#endif
> +
> + return result;
> +}
> +
> +subsys_initcall(crash_hotplug_init);
> +#endif /* CONFIG_CRASH_HOTPLUG */
> --
> 2.27.0
>
The pr_debug() intends to display the memsz member, but the
parameter is actually the bufsz member (which is already
displayed). Correct this to display memsz value.
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/crash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..9730c88530fc 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -407,7 +407,7 @@ int crash_load_segments(struct kimage *image)
}
image->elf_load_addr = kbuf.mem;
pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
- image->elf_load_addr, kbuf.bufsz, kbuf.bufsz);
+ image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
return ret;
}
--
2.27.0
Due to use of CPUHP_AP_ONLINE_DYN, upon CPU unplug, the CPU is
still in the for_each_present_cpu() list when within the
crash_hotplug_handler(). Thus the CPU must be explicitly excluded
when building the new list of CPUs.
This change identifies in crash_hotplug_handler() the CPU to be
excluded, and the check for excluding the CPU in
crash_prepare_elf64_headers().
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
kernel/crash_core.c | 9 +++++++++
kernel/kexec_file.c | 5 +++++
2 files changed, 14 insertions(+)
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index ecf746243ab2..036243b1f252 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -521,6 +521,15 @@ static void crash_hotplug_handler(unsigned int hp_action,
/* Flag to differentiate between normal load and hotplug */
kexec_crash_image->hotplug_event = true;
+ /*
+ * Due to use of CPUHP_AP_ONLINE_DYN, upon unplug and during
+ * this callback, the CPU is still in the for_each_present_cpu()
+ * list. Must explicitly look to exclude this CPU when building
+ * new list.
+ */
+ kexec_crash_image->offlinecpu =
+ (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ? cpu : ~0U;
+
/* Now invoke arch-specific update handler */
arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 94a459209111..9d5c4eea0179 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1314,6 +1314,11 @@ int crash_prepare_elf64_headers(struct kimage *image, struct crash_mem *mem,
/* Prepare one phdr of type PT_NOTE for each present CPU */
for_each_present_cpu(cpu) {
+#ifdef CONFIG_CRASH_HOTPLUG
+ /* Skip the soon-to-be offlined cpu */
+ if (image->hotplug_event && (cpu == image->offlinecpu))
+ continue;
+#endif
phdr->p_type = PT_NOTE;
notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
phdr->p_offset = phdr->p_paddr = notes_addr;
--
2.27.0
Hi Baoquan,
Inline comments below.
Thanks!
eric
On 4/13/22 21:45, Baoquan He wrote:
> On 04/13/22 at 12:42pm, Eric DeVolder wrote:
>> Upon CPU and memory changes, a generic crash_hotplug_handler()
>> dispatches the hot plug/unplug event to the architecture specific
>> arch_crash_hotplug_handler(). During the process, the kexec_mutex
>> is held.
>>
>> To support cpu hotplug, a callback is registered to capture the
>> CPUHP_AP_ONLINE_DYN online and ofline events via
>> cpuhp_setup_state_nocalls().
>>
>> To support memory hotplug, a notifier is registered to capture the
>> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>
>> The cpu callback and memory notifier then call crash_hotplug_handler()
>> to handle the hot plug/unplug event.
>>
>> Signed-off-by: Eric DeVolder <[email protected]>
>> ---
>> include/linux/kexec.h | 16 +++++++
>> kernel/crash_core.c | 101 ++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 117 insertions(+)
>>
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index f93f2591fc1e..02daff1f47dd 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -306,6 +306,13 @@ struct kimage {
>>
>> /* Information for loading purgatory */
>> struct purgatory_info purgatory_info;
>> +
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + bool hotplug_event;
>> + unsigned int offlinecpu;
>> + bool elfcorehdr_index_valid;
>> + int elfcorehdr_index;
>> +#endif
>> #endif
>>
>> #ifdef CONFIG_IMA_KEXEC
>> @@ -322,6 +329,15 @@ struct kimage {
>> unsigned long elf_load_addr;
>> };
>>
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +void arch_crash_hotplug_handler(struct kimage *image,
>> + unsigned int hp_action, unsigned int cpu);
>> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
>> +#define KEXEC_CRASH_HP_ADD_CPU 1
>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
>> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
>> +#endif /* CONFIG_CRASH_HOTPLUG */
>> +
>> /* kexec interface functions */
>> extern void machine_kexec(struct kimage *image);
>> extern int machine_kexec_prepare(struct kimage *image);
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 256cf6db573c..ecf746243ab2 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -9,12 +9,17 @@
>> #include <linux/init.h>
>> #include <linux/utsname.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/highmem.h>
>> +#include <linux/memory.h>
>> +#include <linux/cpuhotplug.h>
>>
>> #include <asm/page.h>
>> #include <asm/sections.h>
>>
>> #include <crypto/sha1.h>
>>
>> +#include "kexec_internal.h"
>> +
>> /* vmcoreinfo stuff */
>> unsigned char *vmcoreinfo_data;
>> size_t vmcoreinfo_size;
>> @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
>> }
>>
>> subsys_initcall(crash_save_vmcoreinfo_init);
>> +
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +void __weak arch_crash_hotplug_handler(struct kimage *image,
>> + unsigned int hp_action, unsigned int cpu)
>> +{
>> + pr_warn("crash hp: %s not implemented", __func__);
>> +}
>> +
>> +static void crash_hotplug_handler(unsigned int hp_action,
>> + unsigned int cpu)
>> +{
>> + /* Obtain lock while changing crash information */
>> + if (!mutex_trylock(&kexec_mutex))
>> + return;
>> +
>> + /* Check kdump is loaded */
>> + if (kexec_crash_image) {
>> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
>> +
>> + /* Needed in order for the segments to be updated */
>> + arch_kexec_unprotect_crashkres();
>> +
>> + /* Flag to differentiate between normal load and hotplug */
>> + kexec_crash_image->hotplug_event = true;
>> +
>> + /* Now invoke arch-specific update handler */
>> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
>> +
>> + /* No longer handling a hotplug event */
>> + kexec_crash_image->hotplug_event = false;
>> +
>> + /* Change back to read-only */
>> + arch_kexec_protect_crashkres();
>> + }
>> +
>> + /* Release lock now that update complete */
>> + mutex_unlock(&kexec_mutex);
>> +}
>> +
>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>> +static int crash_memhp_notifier(struct notifier_block *nb,
>> + unsigned long val, void *v)
>> +{
>> + struct memory_notify *mhp = v;
>> +
>> + switch (val) {
>> + case MEM_ONLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> We don't differentiate the memory add/remove, cpu add, except of cpu
> remove. Means the hp_action only differentiate cpu remove from the other
> action. Maybe only making two types?
>
> #define KEXEC_CRASH_HP_REMOVE_CPU 0
> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>
Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and ADD_MEMORY.
Do you still want to consolidate these?
> And define a new macro to replace the magic number?
Yes, I will do this.
>
> #define KEXEC_CRASH_HP_INVALID_CPU -1U
>> + break;
>> +
>> + case MEM_OFFLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block crash_memhp_nb = {
>> + .notifier_call = crash_memhp_notifier,
>> + .priority = 0
>> +};
>> +#endif
>> +
>> +#if defined(CONFIG_HOTPLUG_CPU)
>> +static int crash_cpuhp_online(unsigned int cpu)
>> +{
>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
> Should making the cpu as -1U?
Well, the cpu would be correct/valid cpu. Is there harm in reporting it?
> crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, -1U);
>> + return 0;
>> +}
>> +
>> +static int crash_cpuhp_offline(unsigned int cpu)
>> +{
>> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
>> + return 0;
>> +}
>> +#endif
>> +
>> +static int __init crash_hotplug_init(void)
>> +{
>> + int result = 0;
>> +
>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>> + register_memory_notifier(&crash_memhp_nb);
>> +#endif
>> +
>> +#if defined(CONFIG_HOTPLUG_CPU)
>> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>> + "crash/cpuhp",
>> + crash_cpuhp_online, crash_cpuhp_offline);
>> +#endif
>> +
>> + return result;
>> +}
>> +
>> +subsys_initcall(crash_hotplug_init);
>> +#endif /* CONFIG_CRASH_HOTPLUG */
>> --
>> 2.27.0
>>
>
On Wed, Apr 13, 2022 at 12:42:31PM -0400, Eric DeVolder wrote:
> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
What for?
Why don't you put all that new code you're adding under an
MEMORY_HOTPLUG ifdef? It seems you would need to do that when memory
hotplug is enabled, anyway.
Also, looking further into your patchset, you have ugly ifdeffery.
Instead of that, pls add stubs for the !MEMORY_HOTPLUG case so that
everything is abstracted away in the headers.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/13/22 21:52, Baoquan He wrote:
> On 04/13/22 at 12:42pm, Eric DeVolder wrote:
>> For x86_64, when CPU or memory is hot un/plugged, the crash
>> elfcorehdr, which describes the CPUs and memory in the system,
>> must also be updated.
>>
>> To update the elfcorehdr for x86_64, a new elfcorehdr must be
>> generated from the available CPUs and memory. The new elfcorehdr
>> is prepared into a buffer, and then installed over the top of
>> the existing elfcorehdr.
>>
>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>> the need to update purgatory due to the change in elfcorehdr was
>> eliminated. As a result, no changes to purgatory or boot_params
>> (as the elfcorehdr= kernel command line parameter pointer
>> remains unchanged and correct) are needed, just elfcorehdr.
>>
>> To accommodate a growing number of resources via hotplug, the
>> elfcorehdr segment must be sufficiently large enough to accommodate
>> changes, see the CRASH_HOTPLUG_ELFCOREHDR_SZ configure item.
>>
>> With this change, crash hotplug for kexec_file_load syscall
>> is supported. When loading the crash kernel via kexec_file_load,
>> the elfcorehdr is identified at load time in crash_load_segments().
>>
>> Signed-off-by: Eric DeVolder <[email protected]>
>> ---
>> arch/x86/kernel/crash.c | 117 ++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 117 insertions(+)
>>
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 9db41cce8d97..47adf69c9f71 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -25,6 +25,7 @@
>> #include <linux/slab.h>
>> #include <linux/vmalloc.h>
>> #include <linux/memblock.h>
>> +#include <linux/highmem.h>
>>
>> #include <asm/processor.h>
>> #include <asm/hardirq.h>
>> @@ -398,7 +399,17 @@ int crash_load_segments(struct kimage *image)
>> image->elf_headers = kbuf.buffer;
>> image->elf_headers_sz = kbuf.bufsz;
>>
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + /* Ensure elfcorehdr segment large enough for hotplug changes */
>> + kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;
>> + /* For marking as usable to crash kernel */
>> + image->elf_headers_sz = kbuf.memsz;
>> + /* Record the index of the elfcorehdr segment */
>> + image->elfcorehdr_index = image->nr_segments;
>> + image->elfcorehdr_index_valid = true;
>> +#else
>> kbuf.memsz = kbuf.bufsz;
>> +#endif
>> kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
>> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>> ret = kexec_add_buffer(&kbuf);
>> @@ -413,3 +424,109 @@ int crash_load_segments(struct kimage *image)
>> return ret;
>> }
>> #endif /* CONFIG_KEXEC_FILE */
>> +
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static void *map_crash_pages(unsigned long paddr, unsigned long size)
>> +{
>> + /*
>> + * NOTE: The addresses and sizes passed to this routine have
>> + * already been fully aligned on page boundaries. There is no
>> + * need for massaging the address or size.
>> + */
>> + void *ptr = NULL;
>> +
>> + /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
>> + if (size > 0) {
>> + struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
>> +
>> + ptr = kmap(page);
>> + }
>> +
>> + return ptr;
>> +}
>> +
>> +static void unmap_crash_pages(void **ptr)
>> +{
>> + if (ptr) {
>> + if (*ptr)
>> + kunmap(*ptr);
>> + *ptr = NULL;
>> + }
>> +}
>> +
>> +/**
>> + * arch_crash_hotplug_handler() - Handle hotplug elfcorehdr changes
>> + * @image: the active struct kimage
>> + * @hp_action: the hot un/plug action being handled
>> + * @cpu: when hp_action is KEXEC_CRASH_HP_ADD|REMOVE_CPU, the affected cpu
>> + *
>> + * To accurately reflect hot un/plug changes, the elfcorehdr (which
>> + * is passed to the crash kernel via the elfcorehdr= parameter)
>> + * must be updated with the new list of CPUs and memories. The new
>> + * elfcorehdr is prepared in a kernel buffer, and then it is
>> + * written on top of the existing/old elfcorehdr.
>> + *
>> + * For hotplug changes to elfcorehdr to work, two conditions are
>> + * needed:
>> + * First, the segment containing the elfcorehdr must be large enough
>> + * to permit a growing number of resources. See
>> + * CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ.
>> + * Second, purgatory must explicitly exclude the elfcorehdr from the
>> + * list of segments it checks (since the elfcorehdr changes and thus
>> + * would require an update to purgatory itself to update the digest).
>> + *
>> + */
>> +void arch_crash_hotplug_handler(struct kimage *image,
>> + unsigned int hp_action, unsigned int cpu)
>
> We have stored the necessary information into kimage, e.g image->hotplug_event,
> image->offlinecpu, do we still need to pass down the hp_action and cpu
> in arch_crash_hotplug_handler()? Do you foresee it will be used in other
> architectures?
Sourabh Jain's work with PPC is using in fact using hp_action and checking its value.
eric
>
>> +{
>> + struct kexec_segment *ksegment;
>> + unsigned char *ptr = NULL;
>> + unsigned long elfsz = 0;
>> + void *elfbuf = NULL;
>> + unsigned long mem, memsz;
>> +
>> + if (!image->elfcorehdr_index_valid) {
>> + pr_err("crash hp: unable to locate elfcorehdr segment");
>> + goto out;
>> + }
>> +
>> + ksegment = &image->segment[image->elfcorehdr_index];
>> + mem = ksegment->mem;
>> + memsz = ksegment->memsz;
>> +
>> + /*
>> + * Create the new elfcorehdr reflecting the changes to CPU and/or
>> + * memory resources.
>> + */
>> + if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
>> + pr_err("crash hp: unable to prepare elfcore headers");
>> + goto out;
>> + }
>> + if (elfsz > memsz) {
>> + pr_err("crash hp: update elfcorehdr elfsz %lu > memsz %lu",
>> + elfsz, memsz);
>> + goto out;
>> + }
>> +
>> + /*
>> + * At this point, we are all but assured of success.
>> + * Copy new elfcorehdr into destination.
>> + */
>> + ptr = map_crash_pages(mem, memsz);
>> + if (ptr) {
>> + /*
>> + * Temporarily invalidate the crash image while the
>> + * elfcorehdr is updated.
>> + */
>> + xchg(&kexec_crash_image, NULL);
>> + memcpy_flushcache((void *)ptr, elfbuf, elfsz);
>> + xchg(&kexec_crash_image, image);
>> + }
>> + unmap_crash_pages((void **)&ptr);
>> + pr_debug("crash hp: re-loaded elfcorehdr at 0x%lx\n", mem);
>> +
>> +out:
>> + if (elfbuf)
>> + vfree(elfbuf);
>> +}
>> +#endif /* CONFIG_CRASH_HOTPLUG */
>> --
>> 2.27.0
>>
>
On 04/14/22 at 05:29pm, Eric DeVolder wrote:
> Hi Baoquan,
> Inline comments below.
> Thanks!
> eric
>
> On 4/13/22 21:45, Baoquan He wrote:
> > On 04/13/22 at 12:42pm, Eric DeVolder wrote:
> > > Upon CPU and memory changes, a generic crash_hotplug_handler()
> > > dispatches the hot plug/unplug event to the architecture specific
> > > arch_crash_hotplug_handler(). During the process, the kexec_mutex
> > > is held.
> > >
> > > To support cpu hotplug, a callback is registered to capture the
> > > CPUHP_AP_ONLINE_DYN online and ofline events via
> > > cpuhp_setup_state_nocalls().
> > >
> > > To support memory hotplug, a notifier is registered to capture the
> > > MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
> > >
> > > The cpu callback and memory notifier then call crash_hotplug_handler()
> > > to handle the hot plug/unplug event.
> > >
> > > Signed-off-by: Eric DeVolder <[email protected]>
> > > ---
> > > include/linux/kexec.h | 16 +++++++
> > > kernel/crash_core.c | 101 ++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 117 insertions(+)
> > >
> > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > > index f93f2591fc1e..02daff1f47dd 100644
> > > --- a/include/linux/kexec.h
> > > +++ b/include/linux/kexec.h
> > > @@ -306,6 +306,13 @@ struct kimage {
> > > /* Information for loading purgatory */
> > > struct purgatory_info purgatory_info;
> > > +
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > + bool hotplug_event;
> > > + unsigned int offlinecpu;
> > > + bool elfcorehdr_index_valid;
> > > + int elfcorehdr_index;
> > > +#endif
> > > #endif
> > > #ifdef CONFIG_IMA_KEXEC
> > > @@ -322,6 +329,15 @@ struct kimage {
> > > unsigned long elf_load_addr;
> > > };
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > +void arch_crash_hotplug_handler(struct kimage *image,
> > > + unsigned int hp_action, unsigned int cpu);
> > > +#define KEXEC_CRASH_HP_REMOVE_CPU 0
> > > +#define KEXEC_CRASH_HP_ADD_CPU 1
> > > +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> > > +#define KEXEC_CRASH_HP_ADD_MEMORY 3
> > > +#endif /* CONFIG_CRASH_HOTPLUG */
> > > +
> > > /* kexec interface functions */
> > > extern void machine_kexec(struct kimage *image);
> > > extern int machine_kexec_prepare(struct kimage *image);
> > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > index 256cf6db573c..ecf746243ab2 100644
> > > --- a/kernel/crash_core.c
> > > +++ b/kernel/crash_core.c
> > > @@ -9,12 +9,17 @@
> > > #include <linux/init.h>
> > > #include <linux/utsname.h>
> > > #include <linux/vmalloc.h>
> > > +#include <linux/highmem.h>
> > > +#include <linux/memory.h>
> > > +#include <linux/cpuhotplug.h>
> > > #include <asm/page.h>
> > > #include <asm/sections.h>
> > > #include <crypto/sha1.h>
> > > +#include "kexec_internal.h"
> > > +
> > > /* vmcoreinfo stuff */
> > > unsigned char *vmcoreinfo_data;
> > > size_t vmcoreinfo_size;
> > > @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
> > > }
> > > subsys_initcall(crash_save_vmcoreinfo_init);
> > > +
> > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > +void __weak arch_crash_hotplug_handler(struct kimage *image,
> > > + unsigned int hp_action, unsigned int cpu)
> > > +{
> > > + pr_warn("crash hp: %s not implemented", __func__);
> > > +}
> > > +
> > > +static void crash_hotplug_handler(unsigned int hp_action,
> > > + unsigned int cpu)
> > > +{
> > > + /* Obtain lock while changing crash information */
> > > + if (!mutex_trylock(&kexec_mutex))
> > > + return;
> > > +
> > > + /* Check kdump is loaded */
> > > + if (kexec_crash_image) {
> > > + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
> > > +
> > > + /* Needed in order for the segments to be updated */
> > > + arch_kexec_unprotect_crashkres();
> > > +
> > > + /* Flag to differentiate between normal load and hotplug */
> > > + kexec_crash_image->hotplug_event = true;
> > > +
> > > + /* Now invoke arch-specific update handler */
> > > + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
> > > +
> > > + /* No longer handling a hotplug event */
> > > + kexec_crash_image->hotplug_event = false;
> > > +
> > > + /* Change back to read-only */
> > > + arch_kexec_protect_crashkres();
> > > + }
> > > +
> > > + /* Release lock now that update complete */
> > > + mutex_unlock(&kexec_mutex);
> > > +}
> > > +
> > > +#if defined(CONFIG_MEMORY_HOTPLUG)
> > > +static int crash_memhp_notifier(struct notifier_block *nb,
> > > + unsigned long val, void *v)
> > > +{
> > > + struct memory_notify *mhp = v;
> > > +
> > > + switch (val) {
> > > + case MEM_ONLINE:
> > > + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> > We don't differentiate the memory add/remove, cpu add, except of cpu
> > remove. Means the hp_action only differentiate cpu remove from the other
> > action. Maybe only making two types?
> >
> > #define KEXEC_CRASH_HP_REMOVE_CPU 0
> > #define KEXEC_CRASH_HP_UPDATE_OTHER 1
> >
> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and ADD_MEMORY.
> Do you still want to consolidate these?
I checked Sourabh's ppc patches, am wondering why memory hotplug is not
handled. Is it going to be that CONFIG_MEMORY_HOTPLUG is enabled on ppc,
while Sourabh is not willing to handle it for now?
Only passing KEXEC_CRASH_HP_ADD_MEMORY/KEXEC_CRASH_HP_REMOVE_MEMORY to
indicate that they are not supported sounds weird to me, honestly.
On Mon, Apr 18, 2022 at 05:03:39PM -0500, Eric DeVolder wrote:
> I've examined the code with this thought in mind, and I'm not exactly sure how
> this code should be restructured for !HOTPLUG stubs. I'd very much appreciate
> an example in order to facilitate accommodating the request!
For example, see init_intel_microcode() in arch/x86/include/asm/microcode.h:
#ifdef CONFIG_MICROCODE_INTEL
extern struct microcode_ops * __init init_intel_microcode(void);
#else
static inline struct microcode_ops * __init init_intel_microcode(void)
{
return NULL;
}
#endif /* CONFIG_MICROCODE_INTEL */
The actual definition then is in:
arch/x86/kernel/cpu/microcode/intel.c:
struct microcode_ops * __init init_intel_microcode(void)
and it gets enabled when CONFIG_MICROCODE_INTEL is enabled in the
.config. When CONFIG_MICROCODE_INTEL=n, the static inline stub gets used
and optimized away by the compiler.
HTH.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 4/19/22 05:32, Borislav Petkov wrote:
> On Mon, Apr 18, 2022 at 05:03:39PM -0500, Eric DeVolder wrote:
>> I've examined the code with this thought in mind, and I'm not exactly sure how
>> this code should be restructured for !HOTPLUG stubs. I'd very much appreciate
>> an example in order to facilitate accommodating the request!
>
> For example, see init_intel_microcode() in arch/x86/include/asm/microcode.h:
>
> #ifdef CONFIG_MICROCODE_INTEL
> extern struct microcode_ops * __init init_intel_microcode(void);
> #else
> static inline struct microcode_ops * __init init_intel_microcode(void)
> {
> return NULL;
> }
> #endif /* CONFIG_MICROCODE_INTEL */
>
> The actual definition then is in:
>
> arch/x86/kernel/cpu/microcode/intel.c:
> struct microcode_ops * __init init_intel_microcode(void)
>
> and it gets enabled when CONFIG_MICROCODE_INTEL is enabled in the
> .config. When CONFIG_MICROCODE_INTEL=n, the static inline stub gets used
> and optimized away by the compiler.
>
> HTH.
>
Thanks Boris. So in taking this concept and looking at, in particular, patch 4/8
"crash: add generic infrastructure for crash hotplug support", I'm not exactly
sure how to apply this technique.
For example, if the suggestion is to change crash_hotplug_init() to be the function
or a stub, then what is an appropriate place to the __init callout?
If instead the suggestion is to create !HOTPLUG stubs() for the registration
functions within crash_hotplug_init(), then that entails taking what are currently
static scope symbols in the callbacks and notifier_block and exposing (ie make non-
static) those, which doesn't seem beneficial.
Or if the suggestion is to create !HOTPLUG stubs() for the callbacks, then I thin that
is implying that crash_hotplug_init() is not #ifdef'd and always runs and always
sets-up these event handlers, regardless if we need them?
Perhaps said differently, I'm not seeing, yet, how this technique applies to this code.
Is there a specific function that you know you want this way?
Thanks,
eric
Boris,
Thanks for looking! I've inline comments below.
Eric
On 4/14/22 08:59, Borislav Petkov wrote:
> On Wed, Apr 13, 2022 at 12:42:31PM -0400, Eric DeVolder wrote:
>> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>
> What for?
Fair point, a new define isn't necessarily needed. I've now eliminated
CONFIG_CRASH_HOTPLUG and am using CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG
instead.
>
> Why don't you put all that new code you're adding under an
> MEMORY_HOTPLUG ifdef? It seems you would need to do that when memory
> hotplug is enabled, anyway.
I think I have done that, in patch "crash: add generic infrastructure for crash
hotplug support", I do have code under MEMORY_HOTPLUG as well as HOTPLUG_CPU.
But...
>
> Also, looking further into your patchset, you have ugly ifdeffery.
> Instead of that, pls add stubs for the !MEMORY_HOTPLUG case so that
> everything is abstracted away in the headers.
I've examined the code with this thought in mind, and I'm not exactly sure how
this code should be restructured for !HOTPLUG stubs. I'd very much appreciate
an example in order to facilitate accommodating the request!
Thanks!
eric
>
> Thx.
>
On 04/26/22 at 09:36am, Sourabh Jain wrote:
>
> On 15/04/22 03:59, Eric DeVolder wrote:
> > Hi Baoquan,
> > Inline comments below.
> > Thanks!
> > eric
> >
> > On 4/13/22 21:45, Baoquan He wrote:
> > > On 04/13/22 at 12:42pm, Eric DeVolder wrote:
> > > > Upon CPU and memory changes, a generic crash_hotplug_handler()
> > > > dispatches the hot plug/unplug event to the architecture specific
> > > > arch_crash_hotplug_handler(). During the process, the kexec_mutex
> > > > is held.
> > > >
> > > > To support cpu hotplug, a callback is registered to capture the
> > > > CPUHP_AP_ONLINE_DYN online and ofline events via
> > > > cpuhp_setup_state_nocalls().
> > > >
> > > > To support memory hotplug, a notifier is registered to capture the
> > > > MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
> > > >
> > > > The cpu callback and memory notifier then call crash_hotplug_handler()
> > > > to handle the hot plug/unplug event.
> > > >
> > > > Signed-off-by: Eric DeVolder <[email protected]>
> > > > ---
> > > > ? include/linux/kexec.h |? 16 +++++++
> > > > ? kernel/crash_core.c?? | 101
> > > > ++++++++++++++++++++++++++++++++++++++++++
> > > > ? 2 files changed, 117 insertions(+)
> > > >
> > > > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > > > index f93f2591fc1e..02daff1f47dd 100644
> > > > --- a/include/linux/kexec.h
> > > > +++ b/include/linux/kexec.h
> > > > @@ -306,6 +306,13 @@ struct kimage {
> > > > ? ????? /* Information for loading purgatory */
> > > > ????? struct purgatory_info purgatory_info;
> > > > +
> > > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > > +??? bool hotplug_event;
> > > > +??? unsigned int offlinecpu;
> > > > +??? bool elfcorehdr_index_valid;
> > > > +??? int elfcorehdr_index;
> > > > +#endif
> > > > ? #endif
> > > > ? ? #ifdef CONFIG_IMA_KEXEC
> > > > @@ -322,6 +329,15 @@ struct kimage {
> > > > ????? unsigned long elf_load_addr;
> > > > ? };
> > > > ? +#ifdef CONFIG_CRASH_HOTPLUG
> > > > +void arch_crash_hotplug_handler(struct kimage *image,
> > > > +??? unsigned int hp_action, unsigned int cpu);
> > > > +#define KEXEC_CRASH_HP_REMOVE_CPU?? 0
> > > > +#define KEXEC_CRASH_HP_ADD_CPU????? 1
> > > > +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> > > > +#define KEXEC_CRASH_HP_ADD_MEMORY?? 3
> > > > +#endif /* CONFIG_CRASH_HOTPLUG */
> > > > +
> > > > ? /* kexec interface functions */
> > > > ? extern void machine_kexec(struct kimage *image);
> > > > ? extern int machine_kexec_prepare(struct kimage *image);
> > > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > > > index 256cf6db573c..ecf746243ab2 100644
> > > > --- a/kernel/crash_core.c
> > > > +++ b/kernel/crash_core.c
> > > > @@ -9,12 +9,17 @@
> > > > ? #include <linux/init.h>
> > > > ? #include <linux/utsname.h>
> > > > ? #include <linux/vmalloc.h>
> > > > +#include <linux/highmem.h>
> > > > +#include <linux/memory.h>
> > > > +#include <linux/cpuhotplug.h>
> > > > ? ? #include <asm/page.h>
> > > > ? #include <asm/sections.h>
> > > > ? ? #include <crypto/sha1.h>
> > > > ? +#include "kexec_internal.h"
> > > > +
> > > > ? /* vmcoreinfo stuff */
> > > > ? unsigned char *vmcoreinfo_data;
> > > > ? size_t vmcoreinfo_size;
> > > > @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
> > > > ? }
> > > > ? ? subsys_initcall(crash_save_vmcoreinfo_init);
> > > > +
> > > > +#ifdef CONFIG_CRASH_HOTPLUG
> > > > +void __weak arch_crash_hotplug_handler(struct kimage *image,
> > > > +??? unsigned int hp_action, unsigned int cpu)
> > > > +{
> > > > +??? pr_warn("crash hp: %s not implemented", __func__);
> > > > +}
> > > > +
> > > > +static void crash_hotplug_handler(unsigned int hp_action,
> > > > +??? unsigned int cpu)
> > > > +{
> > > > +??? /* Obtain lock while changing crash information */
> > > > +??? if (!mutex_trylock(&kexec_mutex))
> > > > +??????? return;
> > > > +
> > > > +??? /* Check kdump is loaded */
> > > > +??? if (kexec_crash_image) {
> > > > +??????? pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
> > > > +
> > > > +??????? /* Needed in order for the segments to be updated */
> > > > +??????? arch_kexec_unprotect_crashkres();
> > > > +
> > > > +??????? /* Flag to differentiate between normal load and hotplug */
> > > > +??????? kexec_crash_image->hotplug_event = true;
> > > > +
> > > > +??????? /* Now invoke arch-specific update handler */
> > > > +??????? arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
> > > > +
> > > > +??????? /* No longer handling a hotplug event */
> > > > +??????? kexec_crash_image->hotplug_event = false;
> > > > +
> > > > +??????? /* Change back to read-only */
> > > > +??????? arch_kexec_protect_crashkres();
> > > > +??? }
> > > > +
> > > > +??? /* Release lock now that update complete */
> > > > +??? mutex_unlock(&kexec_mutex);
> > > > +}
> > > > +
> > > > +#if defined(CONFIG_MEMORY_HOTPLUG)
> > > > +static int crash_memhp_notifier(struct notifier_block *nb,
> > > > +??? unsigned long val, void *v)
> > > > +{
> > > > +??? struct memory_notify *mhp = v;
> > > > +
> > > > +??? switch (val) {
> > > > +??? case MEM_ONLINE:
> > > > +??????? crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> > > We don't differentiate the memory add/remove, cpu add, except of cpu
> > > remove. Means the hp_action only differentiate cpu remove from the other
> > > action. Maybe only making two types?
> > >
> > > #define KEXEC_CRASH_HP_REMOVE_CPU?? 0
> > > #define KEXEC_CRASH_HP_UPDATE_OTHER????? 1
> > >
> > Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
> > ADD_MEMORY.
> > Do you still want to consolidate these?
>
> On PowerPC different actions are needed for CPU add and memory add/remove.
> For CPU add case only FDT is updated whereas for the memory hotplug we will
> be
> updating FDT and elfcorehdr.
I don't understand. For elfcorehdr updating, we only need regenerate it.
Do you update them different for memory add/remove?
What I saw is the added action for memory hotplug is only for message
printing. Is this really needed? And memory hotplug is even not
supported. Please correct me if I missed anything.
+ /* crash update on memory hotplug is not support yet */
+ if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
+ pr_info_once("crash hp: crash update is not supported with memory hotplug\n");
+ return;
+ }
>
> Ideally, I would prefer the crash hotplug handler to report all four actions
> separately.
>
> Thanks,
> Sourabh Jain
>
On Tue, Apr 19, 2022 at 04:58:47PM -0500, Eric DeVolder wrote:
> So in taking this concept and looking at, in particular, patch 4/8
> "crash: add generic infrastructure for crash hotplug support", I'm not
> exactly sure how to apply this technique.
So I took your patch 4 and maimed into what I think it should look like,
see below.
Now there's a single ifdef there and the registration routines are
wrapped in IS_ENABLED() so that you register a callback only when the
respective stuff - HOTPLUG_CPU or MEMORY_HOTPLUG - is enabled. Otherwise
the couple of functions are unused but that's not that big of a deal.
I've also fixed up some other issues I've encountered along the way.
Holler if there are questions.
HTH.
---
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..b3c32e04d3f0 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,12 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+#define KEXEC_CRASH_HP_REMOVE_CPU 0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
+#define KEXEC_CRASH_HP_ADD_MEMORY 3
+
+struct kimage;
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
+ unsigned int cpu);
#endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 02daff1f47dd..d907a1f0d3da 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -307,13 +307,11 @@ struct kimage {
/* Information for loading purgatory */
struct purgatory_info purgatory_info;
-#ifdef CONFIG_CRASH_HOTPLUG
bool hotplug_event;
unsigned int offlinecpu;
bool elfcorehdr_index_valid;
int elfcorehdr_index;
#endif
-#endif
#ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
@@ -329,15 +327,6 @@ struct kimage {
unsigned long elf_load_addr;
};
-#ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_hotplug_handler(struct kimage *image,
- unsigned int hp_action, unsigned int cpu);
-#define KEXEC_CRASH_HP_REMOVE_CPU 0
-#define KEXEC_CRASH_HP_ADD_CPU 1
-#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
-#define KEXEC_CRASH_HP_ADD_MEMORY 3
-#endif /* CONFIG_CRASH_HOTPLUG */
-
/* kexec interface functions */
extern void machine_kexec(struct kimage *image);
extern int machine_kexec_prepare(struct kimage *image);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index ecf746243ab2..a6c4ee1d5c86 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -497,57 +497,50 @@ static int __init crash_save_vmcoreinfo_init(void)
subsys_initcall(crash_save_vmcoreinfo_init);
-#ifdef CONFIG_CRASH_HOTPLUG
-void __weak arch_crash_hotplug_handler(struct kimage *image,
- unsigned int hp_action, unsigned int cpu)
+void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
+ unsigned int cpu)
{
- pr_warn("crash hp: %s not implemented", __func__);
+ WARN(1, "crash hotplug handler not implemented");
}
-static void crash_hotplug_handler(unsigned int hp_action,
- unsigned int cpu)
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
{
- /* Obtain lock while changing crash information */
- if (!mutex_trylock(&kexec_mutex))
+ if (!kexec_crash_image)
return;
- /* Check kdump is loaded */
- if (kexec_crash_image) {
- pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
+ mutex_lock(&kexec_mutex);
- /* Needed in order for the segments to be updated */
- arch_kexec_unprotect_crashkres();
+ pr_debug("crash hotplug: hp_action %u, cpu %u", hp_action, cpu);
- /* Flag to differentiate between normal load and hotplug */
- kexec_crash_image->hotplug_event = true;
+ /* Needed in order for the segments to be updated */
+ arch_kexec_unprotect_crashkres();
- /* Now invoke arch-specific update handler */
- arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
+ /* Flag to differentiate between normal load and hotplug */
+ kexec_crash_image->hotplug_event = true;
- /* No longer handling a hotplug event */
- kexec_crash_image->hotplug_event = false;
+ /* Now invoke arch-specific update handler */
+ arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu);
- /* Change back to read-only */
- arch_kexec_protect_crashkres();
- }
+ /* No longer handling a hotplug event */
+ kexec_crash_image->hotplug_event = false;
+
+ /* Change back to read-only */
+ arch_kexec_protect_crashkres();
/* Release lock now that update complete */
mutex_unlock(&kexec_mutex);
}
-#if defined(CONFIG_MEMORY_HOTPLUG)
-static int crash_memhp_notifier(struct notifier_block *nb,
- unsigned long val, void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
{
- struct memory_notify *mhp = v;
-
switch (val) {
case MEM_ONLINE:
- crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
+ handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
break;
case MEM_OFFLINE:
- crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
+ handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
break;
}
return NOTIFY_OK;
@@ -557,38 +550,33 @@ static struct notifier_block crash_memhp_nb = {
.notifier_call = crash_memhp_notifier,
.priority = 0
};
-#endif
-#if defined(CONFIG_HOTPLUG_CPU)
static int crash_cpuhp_online(unsigned int cpu)
{
- crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
+ handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
return 0;
}
static int crash_cpuhp_offline(unsigned int cpu)
{
- crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+ handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
return 0;
}
-#endif
static int __init crash_hotplug_init(void)
{
int result = 0;
-#if defined(CONFIG_MEMORY_HOTPLUG)
- register_memory_notifier(&crash_memhp_nb);
-#endif
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ register_memory_notifier(&crash_memhp_nb);
-#if defined(CONFIG_HOTPLUG_CPU)
- result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
- "crash/cpuhp",
- crash_cpuhp_online, crash_cpuhp_offline);
-#endif
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "crash/cpuhp",
+ crash_cpuhp_online,
+ crash_cpuhp_offline);
return result;
}
-
subsys_initcall(crash_hotplug_init);
-#endif /* CONFIG_CRASH_HOTPLUG */
+#endif
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 13/04/22 22:12, Eric DeVolder wrote:
> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>
> CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
> the elfcorehdr buffer/segment.
>
> This is a preparation for later usage.
>
> Signed-off-by: Eric DeVolder <[email protected]>
> Acked-by: Baoquan He <[email protected]>
> ---
> arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b0142e01002e..f7b92ee1bcc7 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2072,6 +2072,32 @@ config CRASH_DUMP
> (CONFIG_RELOCATABLE=y).
> For more details see Documentation/admin-guide/kdump/kdump.rst
>
> +config CRASH_HOTPLUG
> + bool "kernel updates of crash elfcorehdr"
> + depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) && KEXEC_FILE
> + help
> + Enable the kernel to update the crash elfcorehdr (which contains
> + the list of CPUs and memory regions) directly when hot plug/unplug
> + of CPUs or memory. Otherwise userspace must monitor these hot
> + plug/unplug change notifications via udev in order to
> + unload-then-reload the crash kernel so that the list of CPUs and
> + memory regions is kept up-to-date. Note that the udev CPU and
> + memory change notifications still occur (however, userspace is not
> + required to monitor for crash dump purposes).
> +
> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
> + depends on CRASH_HOTPLUG
> + int
> + default 131072
> + help
> + Specify the maximum size of the elfcorehdr buffer/segment.
> + The 128KiB default is sized so that it can accommodate 2048
> + Elf64_Phdr, where each Phdr represents either a CPU or a
> + region of memory.
> + For example, this size can accommodate a machine with up to 1024
> + CPUs and up to 1024 memory regions, eg. as represented by the
> + 'System RAM' entries in /proc/iomem.
Is it possible to get rid of CRASH_HOTPLUG_ELFCOREHDR_SZ?
How about finding the additional buffer space needed for future CPU and memory
add during the kdump load? Not sure about the feasibility of doing this in
kexec tool (userspace).
Thanks,
Sourabh Jain
On 15/04/22 03:59, Eric DeVolder wrote:
> Hi Baoquan,
> Inline comments below.
> Thanks!
> eric
>
> On 4/13/22 21:45, Baoquan He wrote:
>> On 04/13/22 at 12:42pm, Eric DeVolder wrote:
>>> Upon CPU and memory changes, a generic crash_hotplug_handler()
>>> dispatches the hot plug/unplug event to the architecture specific
>>> arch_crash_hotplug_handler(). During the process, the kexec_mutex
>>> is held.
>>>
>>> To support cpu hotplug, a callback is registered to capture the
>>> CPUHP_AP_ONLINE_DYN online and ofline events via
>>> cpuhp_setup_state_nocalls().
>>>
>>> To support memory hotplug, a notifier is registered to capture the
>>> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>>
>>> The cpu callback and memory notifier then call crash_hotplug_handler()
>>> to handle the hot plug/unplug event.
>>>
>>> Signed-off-by: Eric DeVolder <[email protected]>
>>> ---
>>> include/linux/kexec.h | 16 +++++++
>>> kernel/crash_core.c | 101
>>> ++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 117 insertions(+)
>>>
>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>> index f93f2591fc1e..02daff1f47dd 100644
>>> --- a/include/linux/kexec.h
>>> +++ b/include/linux/kexec.h
>>> @@ -306,6 +306,13 @@ struct kimage {
>>> /* Information for loading purgatory */
>>> struct purgatory_info purgatory_info;
>>> +
>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>> + bool hotplug_event;
>>> + unsigned int offlinecpu;
>>> + bool elfcorehdr_index_valid;
>>> + int elfcorehdr_index;
>>> +#endif
>>> #endif
>>> #ifdef CONFIG_IMA_KEXEC
>>> @@ -322,6 +329,15 @@ struct kimage {
>>> unsigned long elf_load_addr;
>>> };
>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>> +void arch_crash_hotplug_handler(struct kimage *image,
>>> + unsigned int hp_action, unsigned int cpu);
>>> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
>>> +#define KEXEC_CRASH_HP_ADD_CPU 1
>>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
>>> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
>>> +#endif /* CONFIG_CRASH_HOTPLUG */
>>> +
>>> /* kexec interface functions */
>>> extern void machine_kexec(struct kimage *image);
>>> extern int machine_kexec_prepare(struct kimage *image);
>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>> index 256cf6db573c..ecf746243ab2 100644
>>> --- a/kernel/crash_core.c
>>> +++ b/kernel/crash_core.c
>>> @@ -9,12 +9,17 @@
>>> #include <linux/init.h>
>>> #include <linux/utsname.h>
>>> #include <linux/vmalloc.h>
>>> +#include <linux/highmem.h>
>>> +#include <linux/memory.h>
>>> +#include <linux/cpuhotplug.h>
>>> #include <asm/page.h>
>>> #include <asm/sections.h>
>>> #include <crypto/sha1.h>
>>> +#include "kexec_internal.h"
>>> +
>>> /* vmcoreinfo stuff */
>>> unsigned char *vmcoreinfo_data;
>>> size_t vmcoreinfo_size;
>>> @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
>>> }
>>> subsys_initcall(crash_save_vmcoreinfo_init);
>>> +
>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>> +void __weak arch_crash_hotplug_handler(struct kimage *image,
>>> + unsigned int hp_action, unsigned int cpu)
>>> +{
>>> + pr_warn("crash hp: %s not implemented", __func__);
>>> +}
>>> +
>>> +static void crash_hotplug_handler(unsigned int hp_action,
>>> + unsigned int cpu)
>>> +{
>>> + /* Obtain lock while changing crash information */
>>> + if (!mutex_trylock(&kexec_mutex))
>>> + return;
>>> +
>>> + /* Check kdump is loaded */
>>> + if (kexec_crash_image) {
>>> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
>>> +
>>> + /* Needed in order for the segments to be updated */
>>> + arch_kexec_unprotect_crashkres();
>>> +
>>> + /* Flag to differentiate between normal load and hotplug */
>>> + kexec_crash_image->hotplug_event = true;
>>> +
>>> + /* Now invoke arch-specific update handler */
>>> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
>>> +
>>> + /* No longer handling a hotplug event */
>>> + kexec_crash_image->hotplug_event = false;
>>> +
>>> + /* Change back to read-only */
>>> + arch_kexec_protect_crashkres();
>>> + }
>>> +
>>> + /* Release lock now that update complete */
>>> + mutex_unlock(&kexec_mutex);
>>> +}
>>> +
>>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>>> +static int crash_memhp_notifier(struct notifier_block *nb,
>>> + unsigned long val, void *v)
>>> +{
>>> + struct memory_notify *mhp = v;
>>> +
>>> + switch (val) {
>>> + case MEM_ONLINE:
>>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
>> We don't differentiate the memory add/remove, cpu add, except of cpu
>> remove. Means the hp_action only differentiate cpu remove from the other
>> action. Maybe only making two types?
>>
>> #define KEXEC_CRASH_HP_REMOVE_CPU 0
>> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>>
> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
> ADD_MEMORY.
> Do you still want to consolidate these?
On PowerPC different actions are needed for CPU add and memory add/remove.
For CPU add case only FDT is updated whereas for the memory hotplug we
will be
updating FDT and elfcorehdr.
Ideally, I would prefer the crash hotplug handler to report all four
actions separately.
Thanks,
Sourabh Jain
Boris,
Many thanks for the time taken to illustrate for me!
I've one question below.
Eric
On 4/25/22 14:25, Borislav Petkov wrote:
> On Tue, Apr 19, 2022 at 04:58:47PM -0500, Eric DeVolder wrote:
>> So in taking this concept and looking at, in particular, patch 4/8
>> "crash: add generic infrastructure for crash hotplug support", I'm not
>> exactly sure how to apply this technique.
>
> So I took your patch 4 and maimed into what I think it should look like,
> see below.
>
> Now there's a single ifdef there and the registration routines are
> wrapped in IS_ENABLED() so that you register a callback only when the
> respective stuff - HOTPLUG_CPU or MEMORY_HOTPLUG - is enabled. Otherwise
> the couple of functions are unused but that's not that big of a deal.
>
> I've also fixed up some other issues I've encountered along the way.
>
> Holler if there are questions.
>
> HTH.
>
> ---
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e..b3c32e04d3f0 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -84,4 +84,12 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
> unsigned long long *crash_size, unsigned long long *crash_base);
>
> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
> +#define KEXEC_CRASH_HP_ADD_CPU 1
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
> +
> +struct kimage;
> +void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
> + unsigned int cpu);
> #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 02daff1f47dd..d907a1f0d3da 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -307,13 +307,11 @@ struct kimage {
> /* Information for loading purgatory */
> struct purgatory_info purgatory_info;
>
> -#ifdef CONFIG_CRASH_HOTPLUG
> bool hotplug_event;
> unsigned int offlinecpu;
> bool elfcorehdr_index_valid;
> int elfcorehdr_index;
> #endif
> -#endif
>
> #ifdef CONFIG_IMA_KEXEC
> /* Virtual address of IMA measurement buffer for kexec syscall */
> @@ -329,15 +327,6 @@ struct kimage {
> unsigned long elf_load_addr;
> };
>
> -#ifdef CONFIG_CRASH_HOTPLUG
> -void arch_crash_hotplug_handler(struct kimage *image,
> - unsigned int hp_action, unsigned int cpu);
> -#define KEXEC_CRASH_HP_REMOVE_CPU 0
> -#define KEXEC_CRASH_HP_ADD_CPU 1
> -#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> -#define KEXEC_CRASH_HP_ADD_MEMORY 3
> -#endif /* CONFIG_CRASH_HOTPLUG */
> -
> /* kexec interface functions */
> extern void machine_kexec(struct kimage *image);
> extern int machine_kexec_prepare(struct kimage *image);
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index ecf746243ab2..a6c4ee1d5c86 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -497,57 +497,50 @@ static int __init crash_save_vmcoreinfo_init(void)
>
> subsys_initcall(crash_save_vmcoreinfo_init);
>
> -#ifdef CONFIG_CRASH_HOTPLUG
> -void __weak arch_crash_hotplug_handler(struct kimage *image,
> - unsigned int hp_action, unsigned int cpu)
> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
> + unsigned int cpu)
> {
> - pr_warn("crash hp: %s not implemented", __func__);
> + WARN(1, "crash hotplug handler not implemented");
> }
>
> -static void crash_hotplug_handler(unsigned int hp_action,
> - unsigned int cpu)
> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> +static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> {
> - /* Obtain lock while changing crash information */
> - if (!mutex_trylock(&kexec_mutex))
> + if (!kexec_crash_image)
> return;
Why is it safe to examine kexec_crash_image outside the mutex? As I understand it, there is still
the (very rare) opportunity for a kdump load/unload initiated via userland and this check to
collide. (Similarly, I believe the mutex entry is almost always assured/likely.)
>
> - /* Check kdump is loaded */
> - if (kexec_crash_image) {
> - pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
> + mutex_lock(&kexec_mutex);
>
> - /* Needed in order for the segments to be updated */
> - arch_kexec_unprotect_crashkres();
> + pr_debug("crash hotplug: hp_action %u, cpu %u", hp_action, cpu);
>
> - /* Flag to differentiate between normal load and hotplug */
> - kexec_crash_image->hotplug_event = true;
> + /* Needed in order for the segments to be updated */
> + arch_kexec_unprotect_crashkres();
>
> - /* Now invoke arch-specific update handler */
> - arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
> + /* Flag to differentiate between normal load and hotplug */
> + kexec_crash_image->hotplug_event = true;
>
> - /* No longer handling a hotplug event */
> - kexec_crash_image->hotplug_event = false;
> + /* Now invoke arch-specific update handler */
> + arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu);
>
> - /* Change back to read-only */
> - arch_kexec_protect_crashkres();
> - }
> + /* No longer handling a hotplug event */
> + kexec_crash_image->hotplug_event = false;
> +
> + /* Change back to read-only */
> + arch_kexec_protect_crashkres();
>
> /* Release lock now that update complete */
> mutex_unlock(&kexec_mutex);
> }
>
> -#if defined(CONFIG_MEMORY_HOTPLUG)
> -static int crash_memhp_notifier(struct notifier_block *nb,
> - unsigned long val, void *v)
> +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
> {
> - struct memory_notify *mhp = v;
> -
> switch (val) {
> case MEM_ONLINE:
> - crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> break;
>
> case MEM_OFFLINE:
> - crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, -1U);
> break;
> }
> return NOTIFY_OK;
> @@ -557,38 +550,33 @@ static struct notifier_block crash_memhp_nb = {
> .notifier_call = crash_memhp_notifier,
> .priority = 0
> };
> -#endif
>
> -#if defined(CONFIG_HOTPLUG_CPU)
> static int crash_cpuhp_online(unsigned int cpu)
> {
> - crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
> return 0;
> }
>
> static int crash_cpuhp_offline(unsigned int cpu)
> {
> - crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> return 0;
> }
> -#endif
>
> static int __init crash_hotplug_init(void)
> {
> int result = 0;
>
> -#if defined(CONFIG_MEMORY_HOTPLUG)
> - register_memory_notifier(&crash_memhp_nb);
> -#endif
> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> + register_memory_notifier(&crash_memhp_nb);
>
> -#if defined(CONFIG_HOTPLUG_CPU)
> - result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> - "crash/cpuhp",
> - crash_cpuhp_online, crash_cpuhp_offline);
> -#endif
> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "crash/cpuhp",
> + crash_cpuhp_online,
> + crash_cpuhp_offline);
>
> return result;
> }
> -
> subsys_initcall(crash_hotplug_init);
> -#endif /* CONFIG_CRASH_HOTPLUG */
> +#endif
>
On 4/25/22 23:21, Sourabh Jain wrote:
>
> On 13/04/22 22:12, Eric DeVolder wrote:
>> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>>
>> CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
>> the elfcorehdr buffer/segment.
>>
>> This is a preparation for later usage.
>>
>> Signed-off-by: Eric DeVolder <[email protected]>
>> Acked-by: Baoquan He <[email protected]>
>> ---
>> arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index b0142e01002e..f7b92ee1bcc7 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -2072,6 +2072,32 @@ config CRASH_DUMP
>> (CONFIG_RELOCATABLE=y).
>> For more details see Documentation/admin-guide/kdump/kdump.rst
>> +config CRASH_HOTPLUG
>> + bool "kernel updates of crash elfcorehdr"
>> + depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) && KEXEC_FILE
>> + help
>> + Enable the kernel to update the crash elfcorehdr (which contains
>> + the list of CPUs and memory regions) directly when hot plug/unplug
>> + of CPUs or memory. Otherwise userspace must monitor these hot
>> + plug/unplug change notifications via udev in order to
>> + unload-then-reload the crash kernel so that the list of CPUs and
>> + memory regions is kept up-to-date. Note that the udev CPU and
>> + memory change notifications still occur (however, userspace is not
>> + required to monitor for crash dump purposes).
>> +
>> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
>> + depends on CRASH_HOTPLUG
>> + int
>> + default 131072
>> + help
>> + Specify the maximum size of the elfcorehdr buffer/segment.
>> + The 128KiB default is sized so that it can accommodate 2048
>> + Elf64_Phdr, where each Phdr represents either a CPU or a
>> + region of memory.
>> + For example, this size can accommodate a machine with up to 1024
>> + CPUs and up to 1024 memory regions, eg. as represented by the
>> + 'System RAM' entries in /proc/iomem.
>
> Is it possible to get rid of CRASH_HOTPLUG_ELFCOREHDR_SZ?
At the moment, I do not think so. The idea behind this value is to represent the largest number of
CPUs and memory regions possible in the system. Today there is NR_CPUS which could be used for CPUs,
but there isn't a similar value for memory. I also am not aware of a kernel variable that could be
utilized to represent the maximum number of memory regions. If there is, please let me know!
>
> How about finding the additional buffer space needed for future CPU and memory
> add during the kdump load? Not sure about the feasibility of doing this in
> kexec tool (userspace).
I may not understand what you are asking, but the x86 code, for kexec_file_load, does in fact
allocate all the space needed (currently via CRASH_HOTPLUG_ELFCOREHDR_SZ) upon kdump load.
For kexec_load, I've had no problem asking the kexec tool to allocate a larger piece of memory for
the elfcorehdr. But it is the same problem as CRASH_HOTPLUG_ELFCOREHDR_SZ; how big? In my workspace
I tell kexec tool how big. If there are sysfs visible values for NR_CPU and memory, then we could
have kexec pull those and compute.
I do think the important thing is that this allocation needs to happen once (for either kexec_load
or kexec_file_load), so that the buffer is always in the same spot and thus the pointer to that
buffer does not change; else boot_params cmdline would need to change. I once had this coded this
way, but Baoquan pointed out this simpler way.
Regards,
eric
>
> Thanks,
> Sourabh Jain
>
>
>
On Tue, Apr 26, 2022 at 03:08:26PM -0500, Eric DeVolder wrote:
> Why is it safe to examine kexec_crash_image outside the mutex? As I
> understand it, there is still the (very rare) opportunity for a kdump
> load/unload initiated via userland and this check to collide. (Similarly, I
> believe the mutex entry is almost always assured/likely.)
Yeah, I guess. As the comment in do_kexec_load() says
"KISS: always take the mutex."
so ignore this part of the diff and let's always grab the mutex.
Sorry for the confusion.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Baoquan,
On 26/04/22 10:52, Baoquan He wrote:
> On 04/26/22 at 09:36am, Sourabh Jain wrote:
>> On 15/04/22 03:59, Eric DeVolder wrote:
>>> Hi Baoquan,
>>> Inline comments below.
>>> Thanks!
>>> eric
>>>
>>> On 4/13/22 21:45, Baoquan He wrote:
>>>> On 04/13/22 at 12:42pm, Eric DeVolder wrote:
>>>>> Upon CPU and memory changes, a generic crash_hotplug_handler()
>>>>> dispatches the hot plug/unplug event to the architecture specific
>>>>> arch_crash_hotplug_handler(). During the process, the kexec_mutex
>>>>> is held.
>>>>>
>>>>> To support cpu hotplug, a callback is registered to capture the
>>>>> CPUHP_AP_ONLINE_DYN online and ofline events via
>>>>> cpuhp_setup_state_nocalls().
>>>>>
>>>>> To support memory hotplug, a notifier is registered to capture the
>>>>> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>>>>
>>>>> The cpu callback and memory notifier then call crash_hotplug_handler()
>>>>> to handle the hot plug/unplug event.
>>>>>
>>>>> Signed-off-by: Eric DeVolder <[email protected]>
>>>>> ---
>>>>> include/linux/kexec.h | 16 +++++++
>>>>> kernel/crash_core.c | 101
>>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 117 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>>>> index f93f2591fc1e..02daff1f47dd 100644
>>>>> --- a/include/linux/kexec.h
>>>>> +++ b/include/linux/kexec.h
>>>>> @@ -306,6 +306,13 @@ struct kimage {
>>>>> /* Information for loading purgatory */
>>>>> struct purgatory_info purgatory_info;
>>>>> +
>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>> + bool hotplug_event;
>>>>> + unsigned int offlinecpu;
>>>>> + bool elfcorehdr_index_valid;
>>>>> + int elfcorehdr_index;
>>>>> +#endif
>>>>> #endif
>>>>> #ifdef CONFIG_IMA_KEXEC
>>>>> @@ -322,6 +329,15 @@ struct kimage {
>>>>> unsigned long elf_load_addr;
>>>>> };
>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>> +void arch_crash_hotplug_handler(struct kimage *image,
>>>>> + unsigned int hp_action, unsigned int cpu);
>>>>> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>>> +#define KEXEC_CRASH_HP_ADD_CPU 1
>>>>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
>>>>> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
>>>>> +#endif /* CONFIG_CRASH_HOTPLUG */
>>>>> +
>>>>> /* kexec interface functions */
>>>>> extern void machine_kexec(struct kimage *image);
>>>>> extern int machine_kexec_prepare(struct kimage *image);
>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>> index 256cf6db573c..ecf746243ab2 100644
>>>>> --- a/kernel/crash_core.c
>>>>> +++ b/kernel/crash_core.c
>>>>> @@ -9,12 +9,17 @@
>>>>> #include <linux/init.h>
>>>>> #include <linux/utsname.h>
>>>>> #include <linux/vmalloc.h>
>>>>> +#include <linux/highmem.h>
>>>>> +#include <linux/memory.h>
>>>>> +#include <linux/cpuhotplug.h>
>>>>> #include <asm/page.h>
>>>>> #include <asm/sections.h>
>>>>> #include <crypto/sha1.h>
>>>>> +#include "kexec_internal.h"
>>>>> +
>>>>> /* vmcoreinfo stuff */
>>>>> unsigned char *vmcoreinfo_data;
>>>>> size_t vmcoreinfo_size;
>>>>> @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
>>>>> }
>>>>> subsys_initcall(crash_save_vmcoreinfo_init);
>>>>> +
>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>> +void __weak arch_crash_hotplug_handler(struct kimage *image,
>>>>> + unsigned int hp_action, unsigned int cpu)
>>>>> +{
>>>>> + pr_warn("crash hp: %s not implemented", __func__);
>>>>> +}
>>>>> +
>>>>> +static void crash_hotplug_handler(unsigned int hp_action,
>>>>> + unsigned int cpu)
>>>>> +{
>>>>> + /* Obtain lock while changing crash information */
>>>>> + if (!mutex_trylock(&kexec_mutex))
>>>>> + return;
>>>>> +
>>>>> + /* Check kdump is loaded */
>>>>> + if (kexec_crash_image) {
>>>>> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
>>>>> +
>>>>> + /* Needed in order for the segments to be updated */
>>>>> + arch_kexec_unprotect_crashkres();
>>>>> +
>>>>> + /* Flag to differentiate between normal load and hotplug */
>>>>> + kexec_crash_image->hotplug_event = true;
>>>>> +
>>>>> + /* Now invoke arch-specific update handler */
>>>>> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
>>>>> +
>>>>> + /* No longer handling a hotplug event */
>>>>> + kexec_crash_image->hotplug_event = false;
>>>>> +
>>>>> + /* Change back to read-only */
>>>>> + arch_kexec_protect_crashkres();
>>>>> + }
>>>>> +
>>>>> + /* Release lock now that update complete */
>>>>> + mutex_unlock(&kexec_mutex);
>>>>> +}
>>>>> +
>>>>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>>>>> +static int crash_memhp_notifier(struct notifier_block *nb,
>>>>> + unsigned long val, void *v)
>>>>> +{
>>>>> + struct memory_notify *mhp = v;
>>>>> +
>>>>> + switch (val) {
>>>>> + case MEM_ONLINE:
>>>>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
>>>> We don't differentiate the memory add/remove, cpu add, except of cpu
>>>> remove. Means the hp_action only differentiate cpu remove from the other
>>>> action. Maybe only making two types?
>>>>
>>>> #define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>>>>
>>> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
>>> ADD_MEMORY.
>>> Do you still want to consolidate these?
>> On PowerPC different actions are needed for CPU add and memory add/remove.
>> For CPU add case only FDT is updated whereas for the memory hotplug we will
>> be
>> updating FDT and elfcorehdr.
> I don't understand. For elfcorehdr updating, we only need regenerate it.
> Do you update them different for memory add/remove?
We have different actions for cpu remove, CPU add and memory add/remove
case.
CPU remove: no action
CPU add: update flattened device tree (FDT)
memory add/remove: update FDT and regenerate/update elfcorehdr
Since memory add/remove action is same we can have common hp_action for
them.
>
> What I saw is the added action for memory hotplug is only for message
> printing. Is this really needed? And memory hotplug is even not
> supported. Please correct me if I missed anything.
I agree that currently memory hp_action is only used for printing
warning message but
eventually we will be handling memory hotplug case as well.
> + /* crash update on memory hotplug is not support yet */
> + if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
> + pr_info_once("crash hp: crash update is not supported with memory hotplug\n");
> + return;
> + }
Thanks,
Sourabh Jain
On 13.04.22 18:42, Eric DeVolder wrote:
> The pr_debug() intends to display the memsz member, but the
> parameter is actually the bufsz member (which is already
> displayed). Correct this to display memsz value.
>
> Signed-off-by: Eric DeVolder <[email protected]>
> Acked-by: Baoquan He <[email protected]>
> ---
Reviewed-by: David Hildenbrand <[email protected]>
--
Thanks,
David / dhildenb
On 26/04/22 20:09, Eric DeVolder wrote:
>
>
> On 4/25/22 23:21, Sourabh Jain wrote:
>>
>> On 13/04/22 22:12, Eric DeVolder wrote:
>>> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>>>
>>> CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
>>> the elfcorehdr buffer/segment.
>>>
>>> This is a preparation for later usage.
>>>
>>> Signed-off-by: Eric DeVolder <[email protected]>
>>> Acked-by: Baoquan He <[email protected]>
>>> ---
>>> arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
>>> 1 file changed, 26 insertions(+)
>>>
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index b0142e01002e..f7b92ee1bcc7 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -2072,6 +2072,32 @@ config CRASH_DUMP
>>> (CONFIG_RELOCATABLE=y).
>>> For more details see Documentation/admin-guide/kdump/kdump.rst
>>> +config CRASH_HOTPLUG
>>> + bool "kernel updates of crash elfcorehdr"
>>> + depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) &&
>>> KEXEC_FILE
>>> + help
>>> + Enable the kernel to update the crash elfcorehdr (which contains
>>> + the list of CPUs and memory regions) directly when hot
>>> plug/unplug
>>> + of CPUs or memory. Otherwise userspace must monitor these hot
>>> + plug/unplug change notifications via udev in order to
>>> + unload-then-reload the crash kernel so that the list of CPUs and
>>> + memory regions is kept up-to-date. Note that the udev CPU and
>>> + memory change notifications still occur (however, userspace
>>> is not
>>> + required to monitor for crash dump purposes).
>>> +
>>> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
>>> + depends on CRASH_HOTPLUG
>>> + int
>>> + default 131072
>>> + help
>>> + Specify the maximum size of the elfcorehdr buffer/segment.
>>> + The 128KiB default is sized so that it can accommodate 2048
>>> + Elf64_Phdr, where each Phdr represents either a CPU or a
>>> + region of memory.
>>> + For example, this size can accommodate a machine with up to 1024
>>> + CPUs and up to 1024 memory regions, eg. as represented by the
>>> + 'System RAM' entries in /proc/iomem.
>>
>> Is it possible to get rid of CRASH_HOTPLUG_ELFCOREHDR_SZ?
> At the moment, I do not think so. The idea behind this value is to
> represent the largest number of CPUs and memory regions possible in
> the system. Today there is NR_CPUS which could be used for CPUs, but
> there isn't a similar value for memory. I also am not aware of a
> kernel variable that could be utilized to represent the maximum number
> of memory regions. If there is, please let me know!
>>
>> How about finding the additional buffer space needed for future CPU
>> and memory
>> add during the kdump load? Not sure about the feasibility of doing
>> this in
>> kexec tool (userspace).
>
> I may not understand what you are asking, but the x86 code, for
> kexec_file_load, does in fact allocate all the space needed (currently
> via CRASH_HOTPLUG_ELFCOREHDR_SZ) upon kdump load.
>
> For kexec_load, I've had no problem asking the kexec tool to allocate
> a larger piece of memory for the elfcorehdr. But it is the same
> problem as CRASH_HOTPLUG_ELFCOREHDR_SZ; how big? In my workspace I
> tell kexec tool how big. If there are sysfs visible values for NR_CPU
> and memory, then we could have kexec pull those and compute.
Yeah dynamic calculation for PT_LOAD sections needed for possible memory
may not be straightforward. But still I did not get the rational for
limiting the possible PT_LOAD sections or memory ranges to only 1024.
Although in kexec tool the max memory ranges for x86 is 32K.
commit 1bc7bc7649fa29d95c98f6a6d8dd2f08734a865c
Author: David Hildenbrand <[email protected]>
Date: Tue Mar 23 11:01:10 2021 +0100
crashdump/x86: increase CRASH_MAX_MEMORY_RANGES to 32k
virtio-mem in Linux adds/removes individual memory blocks (e.g., 128 MB
each). Linux merges adjacent memory blocks added by virtio-mem
devices, but
we can still end up with a very sparse memory layout when unplugging
memory in corner cases.
Let's increase the maximum number of crash memory ranges from ~2k
to 32k.
32k should be sufficient for a very long time.
e_phnum field in the header is 16 bits wide, so we can fit a maximum of
~64k entries in there, shared with other entries (i.e., CPU).
Therefore,
using up to 32k memory ranges is fine. (if we ever need more than ~64k,
Do you see any issue if we increase the memory range count to 32K?
Thanks,
Sourabh Jain
On 4/28/22 00:18, Sourabh Jain wrote:
> Hi Baoquan,
>
> On 26/04/22 10:52, Baoquan He wrote:
>> On 04/26/22 at 09:36am, Sourabh Jain wrote:
>>> On 15/04/22 03:59, Eric DeVolder wrote:
>>>> Hi Baoquan,
>>>> Inline comments below.
>>>> Thanks!
>>>> eric
>>>>
>>>> On 4/13/22 21:45, Baoquan He wrote:
>>>>> On 04/13/22 at 12:42pm, Eric DeVolder wrote:
>>>>>> Upon CPU and memory changes, a generic crash_hotplug_handler()
>>>>>> dispatches the hot plug/unplug event to the architecture specific
>>>>>> arch_crash_hotplug_handler(). During the process, the kexec_mutex
>>>>>> is held.
>>>>>>
>>>>>> To support cpu hotplug, a callback is registered to capture the
>>>>>> CPUHP_AP_ONLINE_DYN online and ofline events via
>>>>>> cpuhp_setup_state_nocalls().
>>>>>>
>>>>>> To support memory hotplug, a notifier is registered to capture the
>>>>>> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>>>>>
>>>>>> The cpu callback and memory notifier then call crash_hotplug_handler()
>>>>>> to handle the hot plug/unplug event.
>>>>>>
>>>>>> Signed-off-by: Eric DeVolder <[email protected]>
>>>>>> ---
>>>>>> include/linux/kexec.h | 16 +++++++
>>>>>> kernel/crash_core.c | 101
>>>>>> ++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 117 insertions(+)
>>>>>>
>>>>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>>>>> index f93f2591fc1e..02daff1f47dd 100644
>>>>>> --- a/include/linux/kexec.h
>>>>>> +++ b/include/linux/kexec.h
>>>>>> @@ -306,6 +306,13 @@ struct kimage {
>>>>>> /* Information for loading purgatory */
>>>>>> struct purgatory_info purgatory_info;
>>>>>> +
>>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>>> + bool hotplug_event;
>>>>>> + unsigned int offlinecpu;
>>>>>> + bool elfcorehdr_index_valid;
>>>>>> + int elfcorehdr_index;
>>>>>> +#endif
>>>>>> #endif
>>>>>> #ifdef CONFIG_IMA_KEXEC
>>>>>> @@ -322,6 +329,15 @@ struct kimage {
>>>>>> unsigned long elf_load_addr;
>>>>>> };
>>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>>> +void arch_crash_hotplug_handler(struct kimage *image,
>>>>>> + unsigned int hp_action, unsigned int cpu);
>>>>>> +#define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>>>> +#define KEXEC_CRASH_HP_ADD_CPU 1
>>>>>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
>>>>>> +#define KEXEC_CRASH_HP_ADD_MEMORY 3
>>>>>> +#endif /* CONFIG_CRASH_HOTPLUG */
>>>>>> +
>>>>>> /* kexec interface functions */
>>>>>> extern void machine_kexec(struct kimage *image);
>>>>>> extern int machine_kexec_prepare(struct kimage *image);
>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>> index 256cf6db573c..ecf746243ab2 100644
>>>>>> --- a/kernel/crash_core.c
>>>>>> +++ b/kernel/crash_core.c
>>>>>> @@ -9,12 +9,17 @@
>>>>>> #include <linux/init.h>
>>>>>> #include <linux/utsname.h>
>>>>>> #include <linux/vmalloc.h>
>>>>>> +#include <linux/highmem.h>
>>>>>> +#include <linux/memory.h>
>>>>>> +#include <linux/cpuhotplug.h>
>>>>>> #include <asm/page.h>
>>>>>> #include <asm/sections.h>
>>>>>> #include <crypto/sha1.h>
>>>>>> +#include "kexec_internal.h"
>>>>>> +
>>>>>> /* vmcoreinfo stuff */
>>>>>> unsigned char *vmcoreinfo_data;
>>>>>> size_t vmcoreinfo_size;
>>>>>> @@ -491,3 +496,99 @@ static int __init crash_save_vmcoreinfo_init(void)
>>>>>> }
>>>>>> subsys_initcall(crash_save_vmcoreinfo_init);
>>>>>> +
>>>>>> +#ifdef CONFIG_CRASH_HOTPLUG
>>>>>> +void __weak arch_crash_hotplug_handler(struct kimage *image,
>>>>>> + unsigned int hp_action, unsigned int cpu)
>>>>>> +{
>>>>>> + pr_warn("crash hp: %s not implemented", __func__);
>>>>>> +}
>>>>>> +
>>>>>> +static void crash_hotplug_handler(unsigned int hp_action,
>>>>>> + unsigned int cpu)
>>>>>> +{
>>>>>> + /* Obtain lock while changing crash information */
>>>>>> + if (!mutex_trylock(&kexec_mutex))
>>>>>> + return;
>>>>>> +
>>>>>> + /* Check kdump is loaded */
>>>>>> + if (kexec_crash_image) {
>>>>>> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu);
>>>>>> +
>>>>>> + /* Needed in order for the segments to be updated */
>>>>>> + arch_kexec_unprotect_crashkres();
>>>>>> +
>>>>>> + /* Flag to differentiate between normal load and hotplug */
>>>>>> + kexec_crash_image->hotplug_event = true;
>>>>>> +
>>>>>> + /* Now invoke arch-specific update handler */
>>>>>> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, cpu);
>>>>>> +
>>>>>> + /* No longer handling a hotplug event */
>>>>>> + kexec_crash_image->hotplug_event = false;
>>>>>> +
>>>>>> + /* Change back to read-only */
>>>>>> + arch_kexec_protect_crashkres();
>>>>>> + }
>>>>>> +
>>>>>> + /* Release lock now that update complete */
>>>>>> + mutex_unlock(&kexec_mutex);
>>>>>> +}
>>>>>> +
>>>>>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>>>>>> +static int crash_memhp_notifier(struct notifier_block *nb,
>>>>>> + unsigned long val, void *v)
>>>>>> +{
>>>>>> + struct memory_notify *mhp = v;
>>>>>> +
>>>>>> + switch (val) {
>>>>>> + case MEM_ONLINE:
>>>>>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
>>>>> We don't differentiate the memory add/remove, cpu add, except of cpu
>>>>> remove. Means the hp_action only differentiate cpu remove from the other
>>>>> action. Maybe only making two types?
>>>>>
>>>>> #define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>>> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>>>>>
>>>> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
>>>> ADD_MEMORY.
>>>> Do you still want to consolidate these?
>>> On PowerPC different actions are needed for CPU add and memory add/remove.
>>> For CPU add case only FDT is updated whereas for the memory hotplug we will
>>> be
>>> updating FDT and elfcorehdr.
>> I don't understand. For elfcorehdr updating, we only need regenerate it.
>> Do you update them different for memory add/remove?
>
> We have different actions for cpu remove, CPU add and memory add/remove case.
>
> CPU remove: no action
> CPU add: update flattened device tree (FDT)
> memory add/remove: update FDT and regenerate/update elfcorehdr
>
> Since memory add/remove action is same we can have common hp_action for them.
>
>>
>> What I saw is the added action for memory hotplug is only for message
>> printing. Is this really needed? And memory hotplug is even not
>> supported. Please correct me if I missed anything.
>
> I agree that currently memory hp_action is only used for printing warning message but
> eventually we will be handling memory hotplug case as well.
Baoquan,
It appears the straight forward thing to do here is just to keep the 4 cpu/mem add/remove
combinations. It appears there is value in keeping them as currently defined. However, please
indicate if you agree or not.
Thanks!
Eric
>
>> + /* crash update on memory hotplug is not support yet */
>> + if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
>> + pr_info_once("crash hp: crash update is not supported with memory hotplug\n");
>> + return;
>> + }
>
> Thanks,
> Sourabh Jain
>
On 04/28/22 at 10:48am, Sourabh Jain wrote:
> Hi Baoquan,
>
> On 26/04/22 10:52, Baoquan He wrote:
> > On 04/26/22 at 09:36am, Sourabh Jain wrote:
> > > On 15/04/22 03:59, Eric DeVolder wrote:
......
> > > > > > +#if defined(CONFIG_MEMORY_HOTPLUG)
> > > > > > +static int crash_memhp_notifier(struct notifier_block *nb,
> > > > > > +??? unsigned long val, void *v)
> > > > > > +{
> > > > > > +??? struct memory_notify *mhp = v;
> > > > > > +
> > > > > > +??? switch (val) {
> > > > > > +??? case MEM_ONLINE:
> > > > > > +??????? crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> > > > > We don't differentiate the memory add/remove, cpu add, except of cpu
> > > > > remove. Means the hp_action only differentiate cpu remove from the other
> > > > > action. Maybe only making two types?
> > > > >
> > > > > #define KEXEC_CRASH_HP_REMOVE_CPU?? 0
> > > > > #define KEXEC_CRASH_HP_UPDATE_OTHER????? 1
> > > > >
> > > > Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
> > > > ADD_MEMORY.
> > > > Do you still want to consolidate these?
> > > On PowerPC different actions are needed for CPU add and memory add/remove.
> > > For CPU add case only FDT is updated whereas for the memory hotplug we will
> > > be
> > > updating FDT and elfcorehdr.
> > I don't understand. For elfcorehdr updating, we only need regenerate it.
> > Do you update them different for memory add/remove?
>
> We have different actions for cpu remove, CPU add and memory add/remove
> case.
>
> CPU remove: no action
> CPU add: update flattened device tree (FDT)
> memory add/remove: update FDT and regenerate/update elfcorehdr
>
> Since memory add/remove action is same we can have common hp_action for
> them.
For memory hot add/remove, we need rengereate elfcorehdr, and add the
new elfcorehdr into fdt. Except of this, FDT need to know the hp_action
and the hot added/removed memory region, namely the start and end, e.g
[start, end]?
I checked arm64 kexec code, seems we only need to know if mem hotplug
event happened, then regenerate elfcorehdr and embed the new elfcorehdr
into fdt. Then we don't know pass the [start, end] info into the
handler. Please tell if ppc is different or I missed anything.
If I am right, I would like the handler interface as Boris has made
in his draft patch.
void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
unsigned int cpu)
static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>
> >
> > What I saw is the added action for memory hotplug is only for message
> > printing. Is this really needed? And memory hotplug is even not
> > supported. Please correct me if I missed anything.
>
> I agree that currently memory hp_action is only used for printing warning
> message but
> eventually we will be handling memory hotplug case as well.
>
> > + /* crash update on memory hotplug is not support yet */
> > + if (hp_action == KEXEC_CRASH_HP_REMOVE_MEMORY || hp_action == KEXEC_CRASH_HP_ADD_MEMORY) {
> > + pr_info_once("crash hp: crash update is not supported with memory hotplug\n");
> > + return;
> > + }
>
> Thanks,
> Sourabh Jain
>
On 05/05/22 at 03:29pm, Sourabh Jain wrote:
>
> On 05/05/22 11:15, Baoquan He wrote:
> > On 04/28/22 at 10:48am, Sourabh Jain wrote:
> > > Hi Baoquan,
> > >
> > > On 26/04/22 10:52, Baoquan He wrote:
> > > > On 04/26/22 at 09:36am, Sourabh Jain wrote:
> > > > > On 15/04/22 03:59, Eric DeVolder wrote:
> > ......
> >
> > > > > > > > +#if defined(CONFIG_MEMORY_HOTPLUG)
> > > > > > > > +static int crash_memhp_notifier(struct notifier_block *nb,
> > > > > > > > +??? unsigned long val, void *v)
> > > > > > > > +{
> > > > > > > > +??? struct memory_notify *mhp = v;
> > > > > > > > +
> > > > > > > > +??? switch (val) {
> > > > > > > > +??? case MEM_ONLINE:
> > > > > > > > +??????? crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
> > > > > > > We don't differentiate the memory add/remove, cpu add, except of cpu
> > > > > > > remove. Means the hp_action only differentiate cpu remove from the other
> > > > > > > action. Maybe only making two types?
> > > > > > >
> > > > > > > #define KEXEC_CRASH_HP_REMOVE_CPU?? 0
> > > > > > > #define KEXEC_CRASH_HP_UPDATE_OTHER????? 1
> > > > > > >
> > > > > > Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
> > > > > > ADD_MEMORY.
> > > > > > Do you still want to consolidate these?
> > > > > On PowerPC different actions are needed for CPU add and memory add/remove.
> > > > > For CPU add case only FDT is updated whereas for the memory hotplug we will
> > > > > be
> > > > > updating FDT and elfcorehdr.
> > > > I don't understand. For elfcorehdr updating, we only need regenerate it.
> > > > Do you update them different for memory add/remove?
> > > We have different actions for cpu remove, CPU add and memory add/remove
> > > case.
> > >
> > > CPU remove: no action
> > > CPU add: update flattened device tree (FDT)
> > > memory add/remove: update FDT and regenerate/update elfcorehdr
> > >
> > > Since memory add/remove action is same we can have common hp_action for
> > > them.
> > For memory hot add/remove, we need rengereate elfcorehdr, and add the
> > new elfcorehdr into fdt. Except of this, FDT need to know the hp_action
> > and the hot added/removed memory region, namely the start and end, e.g
> > [start, end]?
> >
> > I checked arm64 kexec code, seems we only need to know if mem hotplug
> > event happened, then regenerate elfcorehdr and embed the new elfcorehdr
> > into fdt. Then we don't know pass the [start, end] info into the
> > handler. Please tell if ppc is different or I missed anything.
>
> Yes we don't need start and end info as such but we expect arch
> handler to have info about which hotplug action is performed.
> It is just that I don't see an significant advantage of consolidation of
> CPU ADD, memory ADD and Memory REMOVE in one hp_action as
> KEXEC_CRASH_HP_UPDATE_OTHER.
I see. I don't oppose all those passed info, just worried the
unnecessary info passed down to the handler.
>
> > If I am right, I would like the handler interface as Boris has made
> > in his draft patch.
> >
> > void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
> > unsigned int cpu)
> >
> > static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
> The above template works fine for PowerPC as long we have four different
> hp_action
> to indicate CPU add/remove and memory add/remove.
Cool. Then all things are clear. We can pass the needed hp_action and
cpu number only.
Hi Eric,
The consensus is reached, please proceed when it's convenient.
On 4/29/22 01:41, Sourabh Jain wrote:
>
> On 26/04/22 20:09, Eric DeVolder wrote:
>>
>>
>> On 4/25/22 23:21, Sourabh Jain wrote:
>>>
>>> On 13/04/22 22:12, Eric DeVolder wrote:
>>>> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>>>>
>>>> CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
>>>> the elfcorehdr buffer/segment.
>>>>
>>>> This is a preparation for later usage.
>>>>
>>>> Signed-off-by: Eric DeVolder <[email protected]>
>>>> Acked-by: Baoquan He <[email protected]>
>>>> ---
>>>> arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
>>>> 1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index b0142e01002e..f7b92ee1bcc7 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -2072,6 +2072,32 @@ config CRASH_DUMP
>>>> (CONFIG_RELOCATABLE=y).
>>>> For more details see Documentation/admin-guide/kdump/kdump.rst
>>>> +config CRASH_HOTPLUG
>>>> + bool "kernel updates of crash elfcorehdr"
>>>> + depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) && KEXEC_FILE
>>>> + help
>>>> + Enable the kernel to update the crash elfcorehdr (which contains
>>>> + the list of CPUs and memory regions) directly when hot plug/unplug
>>>> + of CPUs or memory. Otherwise userspace must monitor these hot
>>>> + plug/unplug change notifications via udev in order to
>>>> + unload-then-reload the crash kernel so that the list of CPUs and
>>>> + memory regions is kept up-to-date. Note that the udev CPU and
>>>> + memory change notifications still occur (however, userspace is not
>>>> + required to monitor for crash dump purposes).
>>>> +
>>>> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
>>>> + depends on CRASH_HOTPLUG
>>>> + int
>>>> + default 131072
>>>> + help
>>>> + Specify the maximum size of the elfcorehdr buffer/segment.
>>>> + The 128KiB default is sized so that it can accommodate 2048
>>>> + Elf64_Phdr, where each Phdr represents either a CPU or a
>>>> + region of memory.
>>>> + For example, this size can accommodate a machine with up to 1024
>>>> + CPUs and up to 1024 memory regions, eg. as represented by the
>>>> + 'System RAM' entries in /proc/iomem.
>>>
>>> Is it possible to get rid of CRASH_HOTPLUG_ELFCOREHDR_SZ?
>> At the moment, I do not think so. The idea behind this value is to represent the largest number of
>> CPUs and memory regions possible in the system. Today there is NR_CPUS which could be used for
>> CPUs, but there isn't a similar value for memory. I also am not aware of a kernel variable that
>> could be utilized to represent the maximum number of memory regions. If there is, please let me know!
>>>
>>> How about finding the additional buffer space needed for future CPU and memory
>>> add during the kdump load? Not sure about the feasibility of doing this in
>>> kexec tool (userspace).
>>
>> I may not understand what you are asking, but the x86 code, for kexec_file_load, does in fact
>> allocate all the space needed (currently via CRASH_HOTPLUG_ELFCOREHDR_SZ) upon kdump load.
>>
>> For kexec_load, I've had no problem asking the kexec tool to allocate a larger piece of memory for
>> the elfcorehdr. But it is the same problem as CRASH_HOTPLUG_ELFCOREHDR_SZ; how big? In my
>> workspace I tell kexec tool how big. If there are sysfs visible values for NR_CPU and memory, then
>> we could have kexec pull those and compute.
>
> Yeah dynamic calculation for PT_LOAD sections needed for possible memory may not be straightforward.
> But still I did not get the rational for limiting the possible PT_LOAD sections or memory ranges to
> only 1024. Although in kexec tool the max memory ranges for x86 is 32K.
>
> commit 1bc7bc7649fa29d95c98f6a6d8dd2f08734a865c
> Author: David Hildenbrand <[email protected]>
> Date: Tue Mar 23 11:01:10 2021 +0100
>
> crashdump/x86: increase CRASH_MAX_MEMORY_RANGES to 32k
>
> virtio-mem in Linux adds/removes individual memory blocks (e.g., 128 MB
> each). Linux merges adjacent memory blocks added by virtio-mem devices, but
> we can still end up with a very sparse memory layout when unplugging
> memory in corner cases.
>
> Let's increase the maximum number of crash memory ranges from ~2k to 32k.
> 32k should be sufficient for a very long time.
>
> e_phnum field in the header is 16 bits wide, so we can fit a maximum of
> ~64k entries in there, shared with other entries (i.e., CPU). Therefore,
> using up to 32k memory ranges is fine. (if we ever need more than ~64k,
>
> Do you see any issue if we increase the memory range count to 32K?
No, I do not. Allowing for 32K ranges means the elfcorehdr buffer is now 2MiB.
I'm thinking I'll redefine/rename this config option to mirror CRASH_MAX_MEMORY_RANGES
and default it to 32K. Then the buffer math will take into account NR_CPUS and this
value to compute the buffer size.
Thanks!
eric
>
> Thanks,
> Sourabh Jain
>
On 05/05/22 11:15, Baoquan He wrote:
> On 04/28/22 at 10:48am, Sourabh Jain wrote:
>> Hi Baoquan,
>>
>> On 26/04/22 10:52, Baoquan He wrote:
>>> On 04/26/22 at 09:36am, Sourabh Jain wrote:
>>>> On 15/04/22 03:59, Eric DeVolder wrote:
> ......
>
>>>>>>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>>>>>>> +static int crash_memhp_notifier(struct notifier_block *nb,
>>>>>>> + unsigned long val, void *v)
>>>>>>> +{
>>>>>>> + struct memory_notify *mhp = v;
>>>>>>> +
>>>>>>> + switch (val) {
>>>>>>> + case MEM_ONLINE:
>>>>>>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
>>>>>> We don't differentiate the memory add/remove, cpu add, except of cpu
>>>>>> remove. Means the hp_action only differentiate cpu remove from the other
>>>>>> action. Maybe only making two types?
>>>>>>
>>>>>> #define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>>>> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>>>>>>
>>>>> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
>>>>> ADD_MEMORY.
>>>>> Do you still want to consolidate these?
>>>> On PowerPC different actions are needed for CPU add and memory add/remove.
>>>> For CPU add case only FDT is updated whereas for the memory hotplug we will
>>>> be
>>>> updating FDT and elfcorehdr.
>>> I don't understand. For elfcorehdr updating, we only need regenerate it.
>>> Do you update them different for memory add/remove?
>> We have different actions for cpu remove, CPU add and memory add/remove
>> case.
>>
>> CPU remove: no action
>> CPU add: update flattened device tree (FDT)
>> memory add/remove: update FDT and regenerate/update elfcorehdr
>>
>> Since memory add/remove action is same we can have common hp_action for
>> them.
> For memory hot add/remove, we need rengereate elfcorehdr, and add the
> new elfcorehdr into fdt. Except of this, FDT need to know the hp_action
> and the hot added/removed memory region, namely the start and end, e.g
> [start, end]?
>
> I checked arm64 kexec code, seems we only need to know if mem hotplug
> event happened, then regenerate elfcorehdr and embed the new elfcorehdr
> into fdt. Then we don't know pass the [start, end] info into the
> handler. Please tell if ppc is different or I missed anything.
Yes we don't need start and end info as such but we expect arch
handler to have info about which hotplug action is performed.
It is just that I don't see an significant advantage of consolidation of
CPU ADD, memory ADD and Memory REMOVE in one hp_action as
KEXEC_CRASH_HP_UPDATE_OTHER.
> If I am right, I would like the handler interface as Boris has made
> in his draft patch.
>
> void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
> unsigned int cpu)
>
> static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
The above template works fine for PowerPC as long we have four different
hp_action
to indicate CPU add/remove and memory add/remove.
Thanks,
Sourabh Jain
On 5/5/22 06:04, Baoquan He wrote:
> On 05/05/22 at 03:29pm, Sourabh Jain wrote:
>>
>> On 05/05/22 11:15, Baoquan He wrote:
>>> On 04/28/22 at 10:48am, Sourabh Jain wrote:
>>>> Hi Baoquan,
>>>>
>>>> On 26/04/22 10:52, Baoquan He wrote:
>>>>> On 04/26/22 at 09:36am, Sourabh Jain wrote:
>>>>>> On 15/04/22 03:59, Eric DeVolder wrote:
>>> ......
>>>
>>>>>>>>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>>>>>>>>> +static int crash_memhp_notifier(struct notifier_block *nb,
>>>>>>>>> + unsigned long val, void *v)
>>>>>>>>> +{
>>>>>>>>> + struct memory_notify *mhp = v;
>>>>>>>>> +
>>>>>>>>> + switch (val) {
>>>>>>>>> + case MEM_ONLINE:
>>>>>>>>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY, -1U);
>>>>>>>> We don't differentiate the memory add/remove, cpu add, except of cpu
>>>>>>>> remove. Means the hp_action only differentiate cpu remove from the other
>>>>>>>> action. Maybe only making two types?
>>>>>>>>
>>>>>>>> #define KEXEC_CRASH_HP_REMOVE_CPU 0
>>>>>>>> #define KEXEC_CRASH_HP_UPDATE_OTHER 1
>>>>>>>>
>>>>>>> Sourabh Jain's work with PPC uses REMOVE_CPU, REMOVE_MEMORY, and
>>>>>>> ADD_MEMORY.
>>>>>>> Do you still want to consolidate these?
>>>>>> On PowerPC different actions are needed for CPU add and memory add/remove.
>>>>>> For CPU add case only FDT is updated whereas for the memory hotplug we will
>>>>>> be
>>>>>> updating FDT and elfcorehdr.
>>>>> I don't understand. For elfcorehdr updating, we only need regenerate it.
>>>>> Do you update them different for memory add/remove?
>>>> We have different actions for cpu remove, CPU add and memory add/remove
>>>> case.
>>>>
>>>> CPU remove: no action
>>>> CPU add: update flattened device tree (FDT)
>>>> memory add/remove: update FDT and regenerate/update elfcorehdr
>>>>
>>>> Since memory add/remove action is same we can have common hp_action for
>>>> them.
>>> For memory hot add/remove, we need rengereate elfcorehdr, and add the
>>> new elfcorehdr into fdt. Except of this, FDT need to know the hp_action
>>> and the hot added/removed memory region, namely the start and end, e.g
>>> [start, end]?
>>>
>>> I checked arm64 kexec code, seems we only need to know if mem hotplug
>>> event happened, then regenerate elfcorehdr and embed the new elfcorehdr
>>> into fdt. Then we don't know pass the [start, end] info into the
>>> handler. Please tell if ppc is different or I missed anything.
>>
>> Yes we don't need start and end info as such but we expect arch
>> handler to have info about which hotplug action is performed.
>> It is just that I don't see an significant advantage of consolidation of
>> CPU ADD, memory ADD and Memory REMOVE in one hp_action as
>> KEXEC_CRASH_HP_UPDATE_OTHER.
>
> I see. I don't oppose all those passed info, just worried the
> unnecessary info passed down to the handler.
>
>>
>>> If I am right, I would like the handler interface as Boris has made
>>> in his draft patch.
>>>
>>> void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action,
>>> unsigned int cpu)
>>>
>>> static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>> The above template works fine for PowerPC as long we have four different
>> hp_action
>> to indicate CPU add/remove and memory add/remove.
>
> Cool. Then all things are clear. We can pass the needed hp_action and
> cpu number only.
>
> Hi Eric,
>
> The consensus is reached, please proceed when it's convenient.
>
Excellent! I will post v8 soon!
Thanks!
eric