2022-04-05 02:18:48

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v6 0/8] crash: Kernel handling of CPU and memory hot un/plug

When the kdump service is loaded, if a CPU or memory is hot
un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
and memory in the system, must also be updated, else the resulting
vmcore is inaccurate (eg. missing either CPU context or memory
regions).

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (e. kernel, initrd, boot_params, puratory and
elfcorehdr) by the userspace kexec utility. In previous posts I have
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash hot un/plug handler that
registers with the CPU and memory notifiers. Upon CPU or memory
changes, this generic handler is invoked and performs important
housekeeping, for example obtaining the appropriate lock, and then
invokes an architecture specific handler to do the appropriate
updates.

In the case of x86_64, the arch specific handler generates a new
elfcorehdr, and overwrites the old one in memory. No involvement
with userspace needed.

To realize the benefits/test this patchset, one must make a couple
of minor changes to userspace:

- Disable the udev rule for updating kdump on hot un/plug changes.
Add the following as the first two lines to the udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:

# For x86_64, the kernel handles updates to crash elfcorehdr
CONST{arch}=="x86-64", GOTO="kdump_reload_end"

These two lines will cause cpu and memory hot un/plug events
to be skipped within this rule file.

- Change to the kexec_file_load for loading the kdump kernel:
Eg. on RHEL: in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds the -s to select kexec_file_load syscall.

This patchset supports kexec_load with a modified kexec userspace
utility, and a working changeset to the kexec userspace utility
is provided here (and to use, the above change to standard_kexec_args
would be, for example, to append --hotplug-size=131072 instead of -s).

diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index 9826f6d..06adb7e 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -48,6 +48,7 @@
#include <x86/x86-linux.h>

extern struct arch_options_t arch_options;
+extern unsigned long long hotplug_size;

static int get_kernel_page_offset(struct kexec_info *UNUSED(info),
struct crash_elf_info *elf_info)
@@ -975,6 +976,13 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
} else {
memsz = bufsz;
}
+
+ /* If hotplug support enabled, use that size */
+ if (hotplug_size) {
+ memsz = hotplug_size;
+ }
+
+ info->elfcorehdr =
elfcorehdr = add_buffer(info, tmp, bufsz, memsz, align, min_base,
max_addr, -1);
dbgprintf("Created elf header segment at 0x%lx\n", elfcorehdr);
diff --git a/kexec/kexec.c b/kexec/kexec.c
index f63b36b..9569d9a 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -58,6 +58,7 @@

unsigned long long mem_min = 0;
unsigned long long mem_max = ULONG_MAX;
+unsigned long long hotplug_size = 0;
static unsigned long kexec_flags = 0;
/* Flags for kexec file (fd) based syscall */
static unsigned long kexec_file_flags = 0;
@@ -672,6 +673,12 @@ static void update_purgatory(struct kexec_info *info)
if (info->segment[i].mem == (void *)info->rhdr.rel_addr) {
continue;
}
+ /* Don't include elfcorehdr in the checksum, if hotplug
+ * support enabled.
+ */
+ if (hotplug_size && (info->segment[i].mem == (void *)info->elfcorehdr)) {
+ continue;
+ }
sha256_update(&ctx, info->segment[i].buf,
info->segment[i].bufsz);
nullsz = info->segment[i].memsz - info->segment[i].bufsz;
@@ -1504,6 +1511,17 @@ int main(int argc, char *argv[])
case OPT_PRINT_CKR_SIZE:
print_crashkernel_region_size();
return 0;
+ case OPT_HOTPLUG_SIZE:
+ /* Reserved the specified size for hotplug growth */
+ hotplug_size = strtoul(optarg, &endptr, 0);
+ if (*endptr) {
+ fprintf(stderr,
+ "Bad option value in --hotplug-size=%s\n",
+ optarg);
+ usage();
+ return 1;
+ }
+ break;
default:
break;
}
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 595dd68..b30dda4 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -169,6 +169,7 @@ struct kexec_info {
int command_line_len;

int skip_checks;
+ unsigned long elfcorehdr;
};

struct arch_map_entry {
@@ -231,7 +232,8 @@ extern int file_types;
#define OPT_PRINT_CKR_SIZE 262
#define OPT_LOAD_LIVE_UPDATE 263
#define OPT_EXEC_LIVE_UPDATE 264
-#define OPT_MAX 265
+#define OPT_HOTPLUG_SIZE 265
+#define OPT_MAX 266
#define KEXEC_OPTIONS \
{ "help", 0, 0, OPT_HELP }, \
{ "version", 0, 0, OPT_VERSION }, \
@@ -258,6 +260,7 @@ extern int file_types;
{ "debug", 0, 0, OPT_DEBUG }, \
{ "status", 0, 0, OPT_STATUS }, \
{ "print-ckr-size", 0, 0, OPT_PRINT_CKR_SIZE }, \
+ { "hotplug-size", 2, 0, OPT_HOTPLUG_SIZE }, \

#define KEXEC_OPT_STR "h?vdfixyluet:pscaS"


Regards,
eric
---
v6: 1apr202
- Reword commit messages and some comment cleanup per Baoquan.
- Changed elf_index to elfcorehdr_index for clarity.
- Minor code changes per Baoquan.

v5: 3mar2022
https://lkml.org/lkml/2022/3/3/674
- Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
David Hildenbrand.
- Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
https://lkml.org/lkml/2022/2/9/1406
- Refactored patches per Baoquan suggestsions.
- A few corrections, per Baoquan.

v3: 10jan2022
https://lkml.org/lkml/2022/1/10/1212
- Rebasing per Baoquan He request.
- Changed memory notifier per David Hildenbrand.
- Providing example kexec userspace change in cover letter.

RFC v2: 7dec2021
https://lkml.org/lkml/2021/12/7/1088
- Acting upon Baoquan He suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from
patchset, and it is signficiantly simpler now.

RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---


Eric DeVolder (8):
x86/crash: fix minor typo/bug in debug message
x86/crash: Introduce new options to support cpu and memory hotplug
crash: prototype change for crash_prepare_elf64_headers
crash: add generic infrastructure for crash hotplug support
kexec: exclude elfcorehdr from the segment digest
kexec: exclude hot remove cpu from elfcorehdr notes
x86/crash: Add x86 crash hotplug support for kexec_file_load
x86/crash: Add x86 crash hotplug support for kexec_load

arch/arm64/kernel/machine_kexec_file.c | 6 +-
arch/powerpc/kexec/file_load_64.c | 2 +-
arch/x86/Kconfig | 26 +++++
arch/x86/kernel/crash.c | 147 ++++++++++++++++++++++++-
include/linux/kexec.h | 21 +++-
kernel/crash_core.c | 118 ++++++++++++++++++++
kernel/kexec_file.c | 15 ++-
7 files changed, 325 insertions(+), 10 deletions(-)

--
2.27.0


2022-04-05 02:22:28

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support

Upon CPU and memory changes, a generic crash_hotplug_handler()
dispatches the hot plug/unplug event to the architecture specific
arch_crash_hotplug_handler(). During the process, the kexec_mutex
is held.

To support cpu hotplug, a callback is registered to capture the
CPUHP_AP_ONLINE_DYN online and ofline events via
cpuhp_setup_state_nocalls().

To support memory hotplug, a notifier is registered to capture the
MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The cpu callback and memory notifier then call crash_hotplug_handler()
to handle the hot plug/unplug event.

Signed-off-by: Eric DeVolder <[email protected]>
---
include/linux/kexec.h | 16 +++++++
kernel/crash_core.c | 108 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 124 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index f93f2591fc1e..40e426cfd795 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,13 @@ struct kimage {

/* Information for loading purgatory */
struct purgatory_info purgatory_info;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+ bool hotplug_event;
+ int offlinecpu;
+ bool elfcorehdr_index_valid;
+ int elfcorehdr_index;
+#endif
#endif

#ifdef CONFIG_IMA_KEXEC
@@ -322,6 +329,15 @@ struct kimage {
unsigned long elf_load_addr;
};

+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned long a, unsigned long b);
+#define KEXEC_CRASH_HP_REMOVE_CPU 0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
+#define KEXEC_CRASH_HP_ADD_MEMORY 3
+#endif /* CONFIG_CRASH_HOTPLUG */
+
/* kexec interface functions */
extern void machine_kexec(struct kimage *image);
extern int machine_kexec_prepare(struct kimage *image);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 256cf6db573c..76959d440f71 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -9,12 +9,17 @@
#include <linux/init.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/highmem.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>

#include <asm/page.h>
#include <asm/sections.h>

#include <crypto/sha1.h>

+#include "kexec_internal.h"
+
/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
size_t vmcoreinfo_size;
@@ -491,3 +496,106 @@ static int __init crash_save_vmcoreinfo_init(void)
}

subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+void __weak arch_crash_hotplug_handler(struct kimage *image,
+ unsigned int hp_action, unsigned long a, unsigned long b)
+{
+ pr_warn("crash hp: %s not implemented", __func__);
+}
+
+static void crash_hotplug_handler(unsigned int hp_action,
+ unsigned long a, unsigned long b)
+{
+ /* Obtain lock while changing crash information */
+ if (!mutex_trylock(&kexec_mutex))
+ return;
+
+ /* Check kdump is loaded */
+ if (kexec_crash_image) {
+ pr_debug("crash hp: hp_action %u, a %lu, b %lu", hp_action,
+ a, b);
+
+ /* Needed in order for the segments to be updated */
+ arch_kexec_unprotect_crashkres();
+
+ /* Flag to differentiate between normal load and hotplug */
+ kexec_crash_image->hotplug_event = true;
+
+ /* Now invoke arch-specific update handler */
+ arch_crash_hotplug_handler(kexec_crash_image, hp_action, a, b);
+
+ /* No longer handling a hotplug event */
+ kexec_crash_image->hotplug_event = false;
+
+ /* Change back to read-only */
+ arch_kexec_protect_crashkres();
+ }
+
+ /* Release lock now that update complete */
+ mutex_unlock(&kexec_mutex);
+}
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+static int crash_memhp_notifier(struct notifier_block *nb,
+ unsigned long val, void *v)
+{
+ struct memory_notify *mhp = v;
+ unsigned long start, end;
+
+ start = mhp->start_pfn << PAGE_SHIFT;
+ end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
+
+ switch (val) {
+ case MEM_ONLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
+ start, end-start);
+ break;
+
+ case MEM_OFFLINE:
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
+ start, end-start);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+ .notifier_call = crash_memhp_notifier,
+ .priority = 0
+};
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+static int crash_cpuhp_online(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu, 0);
+ return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+ crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu, 0);
+ return 0;
+}
+#endif
+
+static int __init crash_hotplug_init(void)
+{
+ int result = 0;
+
+#if defined(CONFIG_MEMORY_HOTPLUG)
+ register_memory_notifier(&crash_memhp_nb);
+#endif
+
+#if defined(CONFIG_HOTPLUG_CPU)
+ result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "crash/cpuhp",
+ crash_cpuhp_online, crash_cpuhp_offline);
+#endif
+
+ return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif /* CONFIG_CRASH_HOTPLUG */
--
2.27.0

2022-04-12 04:56:37

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support



On 4/11/22 04:20, Baoquan He wrote:
> Hi Eric,
>
> On 04/01/22 at 02:30pm, Eric DeVolder wrote:
> ... ...
>
>> +static void crash_hotplug_handler(unsigned int hp_action,
>> + unsigned long a, unsigned long b)
>
> I am still struggling to consider if these unused parameters should be
> kept or removed. Do you foresee or feel on which ARCH they could be used?
>
> Considering our elfcorehdr updating method, once memory or cpu changed,
> we will update elfcorehdr and cpu notes to reflect all existing memory
> regions and cpu in the current system. We could end up with having them
> but never being used. Then we may finally need to clean them up.
>
> If you have investigated and foresee or feel they could be used on a
> certain architecture, we can keep them for the time being.

So 'hp_action' and 'a' are used within the existing patch series.
In crash_core.c, there is this bit of code:

+ kexec_crash_image->offlinecpu =
+ (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ?
+ (unsigned int)a : ~0U;

which is referencing both 'hp_action' and using 'a' from the cpu notifier handler.
I looked into removing 'a' and setting offlinecpu directly, but I thought
it better that offlinecpu be set within the safety of the kexec_mutex.
Also, Sourabh Jain's work with PowerPC utilizing this framework directly
references hp_action in the arch-specific handler.

The cpu and memory notifier handlers set hp_action accordingly. For cpu handler,
the 'a' is set with the impacted cpu. For memory handler, 'a' and 'b' form the
impacted memory range. I agree it looks like the memory range is currently
not useful.

Let me know what you think.
eric

>
>> +{
>> + /* Obtain lock while changing crash information */
>> + if (!mutex_trylock(&kexec_mutex))
>> + return;
>> +
>> + /* Check kdump is loaded */
>> + if (kexec_crash_image) {
>> + pr_debug("crash hp: hp_action %u, a %lu, b %lu", hp_action,
>> + a, b);
>> +
>> + /* Needed in order for the segments to be updated */
>> + arch_kexec_unprotect_crashkres();
>> +
>> + /* Flag to differentiate between normal load and hotplug */
>> + kexec_crash_image->hotplug_event = true;
>> +
>> + /* Now invoke arch-specific update handler */
>> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, a, b);
>> +
>> + /* No longer handling a hotplug event */
>> + kexec_crash_image->hotplug_event = false;
>> +
>> + /* Change back to read-only */
>> + arch_kexec_protect_crashkres();
>> + }
>> +
>> + /* Release lock now that update complete */
>> + mutex_unlock(&kexec_mutex);
>> +}
>> +
>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>> +static int crash_memhp_notifier(struct notifier_block *nb,
>> + unsigned long val, void *v)
>> +{
>> + struct memory_notify *mhp = v;
>> + unsigned long start, end;
>> +
>> + start = mhp->start_pfn << PAGE_SHIFT;
>> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
>> +
>> + switch (val) {
>> + case MEM_ONLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
>> + start, end-start);
>> + break;
>> +
>> + case MEM_OFFLINE:
>> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
>> + start, end-start);
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block crash_memhp_nb = {
>> + .notifier_call = crash_memhp_notifier,
>> + .priority = 0
>> +};
>> +#endif
>> +
>> +#if defined(CONFIG_HOTPLUG_CPU)
>> +static int crash_cpuhp_online(unsigned int cpu)
>> +{
>> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu, 0);
>> + return 0;
>> +}
>> +
>> +static int crash_cpuhp_offline(unsigned int cpu)
>> +{
>> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu, 0);
>> + return 0;
>> +}
>> +#endif
>> +
>> +static int __init crash_hotplug_init(void)
>> +{
>> + int result = 0;
>> +
>> +#if defined(CONFIG_MEMORY_HOTPLUG)
>> + register_memory_notifier(&crash_memhp_nb);
>> +#endif
>> +
>> +#if defined(CONFIG_HOTPLUG_CPU)
>> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>> + "crash/cpuhp",
>> + crash_cpuhp_online, crash_cpuhp_offline);
>> +#endif
>> +
>> + return result;
>> +}
>> +
>> +subsys_initcall(crash_hotplug_init);
>> +#endif /* CONFIG_CRASH_HOTPLUG */
>> --
>> 2.27.0
>>
>

2022-04-12 06:34:51

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support

Hi Eric,

On 04/01/22 at 02:30pm, Eric DeVolder wrote:
... ...

> +static void crash_hotplug_handler(unsigned int hp_action,
> + unsigned long a, unsigned long b)

I am still struggling to consider if these unused parameters should be
kept or removed. Do you foresee or feel on which ARCH they could be used?

Considering our elfcorehdr updating method, once memory or cpu changed,
we will update elfcorehdr and cpu notes to reflect all existing memory
regions and cpu in the current system. We could end up with having them
but never being used. Then we may finally need to clean them up.

If you have investigated and foresee or feel they could be used on a
certain architecture, we can keep them for the time being.

> +{
> + /* Obtain lock while changing crash information */
> + if (!mutex_trylock(&kexec_mutex))
> + return;
> +
> + /* Check kdump is loaded */
> + if (kexec_crash_image) {
> + pr_debug("crash hp: hp_action %u, a %lu, b %lu", hp_action,
> + a, b);
> +
> + /* Needed in order for the segments to be updated */
> + arch_kexec_unprotect_crashkres();
> +
> + /* Flag to differentiate between normal load and hotplug */
> + kexec_crash_image->hotplug_event = true;
> +
> + /* Now invoke arch-specific update handler */
> + arch_crash_hotplug_handler(kexec_crash_image, hp_action, a, b);
> +
> + /* No longer handling a hotplug event */
> + kexec_crash_image->hotplug_event = false;
> +
> + /* Change back to read-only */
> + arch_kexec_protect_crashkres();
> + }
> +
> + /* Release lock now that update complete */
> + mutex_unlock(&kexec_mutex);
> +}
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG)
> +static int crash_memhp_notifier(struct notifier_block *nb,
> + unsigned long val, void *v)
> +{
> + struct memory_notify *mhp = v;
> + unsigned long start, end;
> +
> + start = mhp->start_pfn << PAGE_SHIFT;
> + end = ((mhp->start_pfn + mhp->nr_pages) << PAGE_SHIFT) - 1;
> +
> + switch (val) {
> + case MEM_ONLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
> + start, end-start);
> + break;
> +
> + case MEM_OFFLINE:
> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
> + start, end-start);
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block crash_memhp_nb = {
> + .notifier_call = crash_memhp_notifier,
> + .priority = 0
> +};
> +#endif
> +
> +#if defined(CONFIG_HOTPLUG_CPU)
> +static int crash_cpuhp_online(unsigned int cpu)
> +{
> + crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu, 0);
> + return 0;
> +}
> +
> +static int crash_cpuhp_offline(unsigned int cpu)
> +{
> + crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu, 0);
> + return 0;
> +}
> +#endif
> +
> +static int __init crash_hotplug_init(void)
> +{
> + int result = 0;
> +
> +#if defined(CONFIG_MEMORY_HOTPLUG)
> + register_memory_notifier(&crash_memhp_nb);
> +#endif
> +
> +#if defined(CONFIG_HOTPLUG_CPU)
> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> + "crash/cpuhp",
> + crash_cpuhp_online, crash_cpuhp_offline);
> +#endif
> +
> + return result;
> +}
> +
> +subsys_initcall(crash_hotplug_init);
> +#endif /* CONFIG_CRASH_HOTPLUG */
> --
> 2.27.0
>

2022-04-13 03:04:17

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support

On 04/11/22 at 08:54am, Eric DeVolder wrote:
>
>
> On 4/11/22 04:20, Baoquan He wrote:
> > Hi Eric,
> >
> > On 04/01/22 at 02:30pm, Eric DeVolder wrote:
> > ... ...
> >
> > > +static void crash_hotplug_handler(unsigned int hp_action,
> > > + unsigned long a, unsigned long b)
> >
> > I am still struggling to consider if these unused parameters should be
> > kept or removed. Do you foresee or feel on which ARCH they could be used?
> >
> > Considering our elfcorehdr updating method, once memory or cpu changed,
> > we will update elfcorehdr and cpu notes to reflect all existing memory
> > regions and cpu in the current system. We could end up with having them
> > but never being used. Then we may finally need to clean them up.
> >
> > If you have investigated and foresee or feel they could be used on a
> > certain architecture, we can keep them for the time being.
>
> So 'hp_action' and 'a' are used within the existing patch series.
> In crash_core.c, there is this bit of code:
>
> + kexec_crash_image->offlinecpu =
> + (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ?
> + (unsigned int)a : ~0U;
>
> which is referencing both 'hp_action' and using 'a' from the cpu notifier handler.
> I looked into removing 'a' and setting offlinecpu directly, but I thought
> it better that offlinecpu be set within the safety of the kexec_mutex.
> Also, Sourabh Jain's work with PowerPC utilizing this framework directly
> references hp_action in the arch-specific handler.
>
> The cpu and memory notifier handlers set hp_action accordingly. For cpu handler,
> the 'a' is set with the impacted cpu. For memory handler, 'a' and 'b' form the
> impacted memory range. I agree it looks like the memory range is currently
> not useful.

OK, memory handler doesn't need the action, memory regions. While cpu
handler needs it to exclude the hot plugged cpu.

We could have two ways to acheive this as below. How do you think about
them?

static void crash_hotplug_handler(unsigned int hp_action,
unsigned long cpu)

static int crash_memhp_notifier(struct notifier_block *nb,
unsigned long val, void *v)
{
......
switch (val) {
case MEM_ONLINE:
crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
-1UL);
break;

case MEM_OFFLINE:
crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
-1UL);
break;
}
return NOTIFY_OK;
}

static int crash_cpuhp_online(unsigned int cpu)
{
crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
return 0;
}

static int crash_cpuhp_offline(unsigned int cpu)
{
crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
return 0;
}

OR,

static void crash_hotplug_handler(unsigned int hp_action,
int* cpu)

static int crash_cpuhp_online(unsigned int cpu)
{
crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, NULL);
return 0;
}

static int crash_cpuhp_offline(unsigned int cpu)
{
int dead_cpu = cpu;
crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, &cpu);
return 0;
}

2022-04-13 15:31:51

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support



On 4/12/22 21:41, Baoquan He wrote:
> On 04/11/22 at 08:54am, Eric DeVolder wrote:
>>
>>
>> On 4/11/22 04:20, Baoquan He wrote:
>>> Hi Eric,
>>>
>>> On 04/01/22 at 02:30pm, Eric DeVolder wrote:
>>> ... ...
>>>
>>>> +static void crash_hotplug_handler(unsigned int hp_action,
>>>> + unsigned long a, unsigned long b)
>>>
>>> I am still struggling to consider if these unused parameters should be
>>> kept or removed. Do you foresee or feel on which ARCH they could be used?
>>>
>>> Considering our elfcorehdr updating method, once memory or cpu changed,
>>> we will update elfcorehdr and cpu notes to reflect all existing memory
>>> regions and cpu in the current system. We could end up with having them
>>> but never being used. Then we may finally need to clean them up.
>>>
>>> If you have investigated and foresee or feel they could be used on a
>>> certain architecture, we can keep them for the time being.
>>
>> So 'hp_action' and 'a' are used within the existing patch series.
>> In crash_core.c, there is this bit of code:
>>
>> + kexec_crash_image->offlinecpu =
>> + (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ?
>> + (unsigned int)a : ~0U;
>>
>> which is referencing both 'hp_action' and using 'a' from the cpu notifier handler.
>> I looked into removing 'a' and setting offlinecpu directly, but I thought
>> it better that offlinecpu be set within the safety of the kexec_mutex.
>> Also, Sourabh Jain's work with PowerPC utilizing this framework directly
>> references hp_action in the arch-specific handler.
>>
>> The cpu and memory notifier handlers set hp_action accordingly. For cpu handler,
>> the 'a' is set with the impacted cpu. For memory handler, 'a' and 'b' form the
>> impacted memory range. I agree it looks like the memory range is currently
>> not useful.
>
> OK, memory handler doesn't need the action, memory regions. While cpu
> handler needs it to exclude the hot plugged cpu.
>
> We could have two ways to acheive this as below. How do you think about
> them?
>
> static void crash_hotplug_handler(unsigned int hp_action,
> unsigned long cpu)
>
> static int crash_memhp_notifier(struct notifier_block *nb,
> unsigned long val, void *v)
> {
> ......
> switch (val) {
> case MEM_ONLINE:
> crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
> -1UL);
> break;
>
> case MEM_OFFLINE:
> crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
> -1UL);
> break;
> }
> return NOTIFY_OK;
> }
>
> static int crash_cpuhp_online(unsigned int cpu)
> {
> crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
> return 0;
> }
>
> static int crash_cpuhp_offline(unsigned int cpu)
> {
> crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> return 0;
> }

I'm OK with the above. Shall I post v7 or are you still looking at patches 7 and 8?
Thanks!
Eric

>
> OR,
>
> static void crash_hotplug_handler(unsigned int hp_action,
> int* cpu)
>
> static int crash_cpuhp_online(unsigned int cpu)
> {
> crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, NULL);
> return 0;
> }
>
> static int crash_cpuhp_offline(unsigned int cpu)
> {
> int dead_cpu = cpu;
> crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, &cpu);
> return 0;
> }
>

2022-04-13 16:31:58

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH v6 4/8] crash: add generic infrastructure for crash hotplug support

On 04/13/22 at 07:37am, Eric DeVolder wrote:
>
>
> On 4/12/22 21:41, Baoquan He wrote:
> > On 04/11/22 at 08:54am, Eric DeVolder wrote:
> > >
> > >
> > > On 4/11/22 04:20, Baoquan He wrote:
> > > > Hi Eric,
> > > >
> > > > On 04/01/22 at 02:30pm, Eric DeVolder wrote:
> > > > ... ...
> > > >
> > > > > +static void crash_hotplug_handler(unsigned int hp_action,
> > > > > + unsigned long a, unsigned long b)
> > > >
> > > > I am still struggling to consider if these unused parameters should be
> > > > kept or removed. Do you foresee or feel on which ARCH they could be used?
> > > >
> > > > Considering our elfcorehdr updating method, once memory or cpu changed,
> > > > we will update elfcorehdr and cpu notes to reflect all existing memory
> > > > regions and cpu in the current system. We could end up with having them
> > > > but never being used. Then we may finally need to clean them up.
> > > >
> > > > If you have investigated and foresee or feel they could be used on a
> > > > certain architecture, we can keep them for the time being.
> > >
> > > So 'hp_action' and 'a' are used within the existing patch series.
> > > In crash_core.c, there is this bit of code:
> > >
> > > + kexec_crash_image->offlinecpu =
> > > + (hp_action == KEXEC_CRASH_HP_REMOVE_CPU) ?
> > > + (unsigned int)a : ~0U;
> > >
> > > which is referencing both 'hp_action' and using 'a' from the cpu notifier handler.
> > > I looked into removing 'a' and setting offlinecpu directly, but I thought
> > > it better that offlinecpu be set within the safety of the kexec_mutex.
> > > Also, Sourabh Jain's work with PowerPC utilizing this framework directly
> > > references hp_action in the arch-specific handler.
> > >
> > > The cpu and memory notifier handlers set hp_action accordingly. For cpu handler,
> > > the 'a' is set with the impacted cpu. For memory handler, 'a' and 'b' form the
> > > impacted memory range. I agree it looks like the memory range is currently
> > > not useful.
> >
> > OK, memory handler doesn't need the action, memory regions. While cpu
> > handler needs it to exclude the hot plugged cpu.
> >
> > We could have two ways to acheive this as below. How do you think about
> > them?
> >
> > static void crash_hotplug_handler(unsigned int hp_action,
> > unsigned long cpu)
> >
> > static int crash_memhp_notifier(struct notifier_block *nb,
> > unsigned long val, void *v)
> > {
> > ......
> > switch (val) {
> > case MEM_ONLINE:
> > crash_hotplug_handler(KEXEC_CRASH_HP_ADD_MEMORY,
> > -1UL);
> > break;
> >
> > case MEM_OFFLINE:
> > crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_MEMORY,
> > -1UL);
> > break;
> > }
> > return NOTIFY_OK;
> > }
> >
> > static int crash_cpuhp_online(unsigned int cpu)
> > {
> > crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, cpu);
> > return 0;
> > }
> >
> > static int crash_cpuhp_offline(unsigned int cpu)
> > {
> > crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
> > return 0;
> > }
>
> I'm OK with the above. Shall I post v7 or are you still looking at patches 7 and 8?
> Thanks!

Just acked patch 8. Patch 7 need be updated too, so will check in v7.

> >
> > OR,
> >
> > static void crash_hotplug_handler(unsigned int hp_action,
> > int* cpu)
> >
> > static int crash_cpuhp_online(unsigned int cpu)
> > {
> > crash_hotplug_handler(KEXEC_CRASH_HP_ADD_CPU, NULL);
> > return 0;
> > }
> >
> > static int crash_cpuhp_offline(unsigned int cpu)
> > {
> > int dead_cpu = cpu;
> > crash_hotplug_handler(KEXEC_CRASH_HP_REMOVE_CPU, &cpu);
> > return 0;
> > }
> >
>