2023-06-28 18:57:44

by Eric DeVolder

Subject: [PATCH v24 00/10] crash: Kernel handling of CPU and memory hot un/plug

This series is dependent upon "refactor Kconfig to consolidate
KEXEC and CRASH options".
https://lore.kernel.org/lkml/[email protected]/

Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr must also
be updated.

The elfcorehdr describes to kdump the CPUs and memory in the system,
and any inaccuracies can result in a vmcore with missing CPU context
or memory regions.

The current solution utilizes udev to initiate an unload-then-reload
of the kdump image (e.g., kernel, initrd, boot_params, purgatory and
elfcorehdr) by the userspace kexec utility. In the original post I
outlined the significant performance problems related to offloading
this activity to userspace.

This patchset introduces a generic crash handler that registers with
the CPU and memory notifiers. Upon CPU or memory changes, from either
hot un/plug or off/onlining, this generic handler is invoked and
performs important housekeeping, for example obtaining the appropriate
lock, and then invokes an architecture-specific handler to do the
appropriate elfcorehdr update.
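The generic flow described above can be modeled in user space as follows. This is a sketch, not kernel code. The model_* names, the atomic_flag standing in for the kexec lock, and the choice to simply skip the update when the lock is busy are assumptions of this sketch. The hp_action field and its _NONE value at 0 echo the v18 and v20 changelog entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical actions, loosely modeled on KEXEC_CRASH_HP_x (_NONE at 0). */
enum model_hp_action {
	MODEL_HP_NONE = 0,
	MODEL_HP_ADD_CPU,
	MODEL_HP_REMOVE_CPU,
	MODEL_HP_ADD_MEMORY,
	MODEL_HP_REMOVE_MEMORY,
};

/* Hypothetical stand-in for the loaded kdump image. */
struct model_image {
	enum model_hp_action hp_action;	/* action in progress, else MODEL_HP_NONE */
	int updates;			/* elfcorehdr rebuilds performed so far */
};

static atomic_flag model_lock = ATOMIC_FLAG_INIT;	/* models the kexec lock */
static struct model_image *model_crash_image;		/* NULL: kdump not loaded */

static bool model_trylock(void) { return !atomic_flag_test_and_set(&model_lock); }
static void model_unlock(void) { atomic_flag_clear(&model_lock); }

/* Architecture-specific part; a real handler would regenerate the elfcorehdr. */
static void model_arch_handle_hotplug_event(struct model_image *image)
{
	image->updates++;
}

/*
 * Generic part, invoked from the CPU and memory notifiers.
 * Returns true if the architecture-specific handler ran.
 */
static bool model_handle_hotplug_event(enum model_hp_action action)
{
	bool ran = false;

	assert(action != MODEL_HP_NONE);
	if (!model_trylock())
		return false;		/* lock busy: this sketch skips the update */
	if (model_crash_image) {	/* only update if kdump is loaded */
		model_crash_image->hp_action = action;
		model_arch_handle_hotplug_event(model_crash_image);
		model_crash_image->hp_action = MODEL_HP_NONE;
		ran = true;
	}
	model_unlock();
	return ran;
}
```

In this model, a call made with no image loaded, or while the lock is already held, returns false and leaves the update count unchanged; only a call with an image loaded and the lock free reaches the architecture-specific part.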

Note the descriptions in patches 'crash: change crash_prepare_elf64_headers()
to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes', which
enable further optimizations of elfcorehdr update performance for CPU
plug/unplug/online/offline events.

In the case of x86_64, the arch-specific handler generates a new
elfcorehdr and overwrites the old one in memory; thus no userspace
involvement is needed.
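Because the new elfcorehdr overwrites the old one in place, the reserved buffer presumably has to be large enough for the updated CPU and memory counts. The sketch below reproduces, in user space, the buffer-size arithmetic of crash_prepare_elf64_headers() from patch 03, using the Elf64_Ehdr and Elf64_Phdr types from <elf.h>. The model_* names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Mirrors ELF_CORE_HEADER_ALIGN from include/linux/kexec.h. */
#define MODEL_ELF_CORE_HEADER_ALIGN 4096UL

/* Round x up to a multiple of align (align must be a power of two). */
static size_t model_align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Program headers reserved by crash_prepare_elf64_headers(): one PT_NOTE
 * per possible CPU, one PT_NOTE for vmcoreinfo, one PT_LOAD per memory
 * range, plus one PT_LOAD slot for the kernel text mapping (reserved
 * whether or not need_kernel_map is set).
 */
static size_t model_nr_phdr(size_t nr_cpus, size_t nr_ranges)
{
	return nr_cpus + 1 + nr_ranges + 1;
}

/* Size of the elfcorehdr buffer, rounded up to ELF_CORE_HEADER_ALIGN. */
static size_t model_elfcorehdr_size(size_t nr_cpus, size_t nr_ranges)
{
	size_t sz = sizeof(Elf64_Ehdr) +
		    model_nr_phdr(nr_cpus, nr_ranges) * sizeof(Elf64_Phdr);

	return model_align_up(sz, MODEL_ELF_CORE_HEADER_ALIGN);
}
```

For example, 8 CPUs and 3 memory ranges reserve 13 program headers, i.e. 64 + 13 * 56 = 792 bytes, rounded up to a single 4096-byte page; 8192 CPUs plus the 8192-range limit from the v13 changelog reserve 16386 headers, i.e. 917680 bytes, rounded up to 921600.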

To realize the benefits of (or to test) this patchset, a couple of
minor changes to userspace are needed:

- Prevent udev from updating kdump crash kernel on hot un/plug changes.
Add the following as the first lines to the RHEL udev rule file
/usr/lib/udev/rules.d/98-kexec.rules:

# The kernel updates the crash elfcorehdr for CPU and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

With this changeset applied, the two rules evaluate to true for
CPU and memory change events and thus skip the userspace
unload-then-reload of kdump.

- Switch to kexec_file_load() for loading the kdump kernel.
E.g., on RHEL, in /usr/bin/kdumpctl, change to:
standard_kexec_args="-p -d -s"
which adds -s to select the kexec_file_load() syscall.

This kernel patchset also supports kexec_load() with a modified kexec
userspace utility. A working changeset to the kexec userspace utility
is posted to the kexec-tools mailing list here:

http://lists.infradead.org/pipermail/kexec/2023-May/027049.html

To use the kexec-tools patch, apply it, then build and install
kexec-tools, and change kdumpctl's standard_kexec_args to replace -s
with --hotplug. Removing -s reverts to the kexec_load() syscall, and
adding --hotplug invokes the changes put forth in the kexec-tools
patch.

Regards,
eric
---
v24: 28jun2023
- Rebased onto 6.4.0
- Included Documentation/ABI/testing entries for the new sysfs
crash_hotplug attributes, per Greg Kroah-Hartman.
- Refactored drivers/base/cpu|memory.c to use the .is_visible()
method for attributes, per Greg Kroah-Hartman.
- Retained all existing Acks and RBs, since the few changes resulting
from Greg's requests were trivial.

v23: 12jun2023
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.4.0-rc6
- Refactored Kconfig, per Thomas. See series:
https://lore.kernel.org/lkml/[email protected]/
- Reworked commit messages to conform to style, per Thomas.
- Applied Baoquan He's Acked-by to the kexec_load() patch.
- Applied Hari Bathini's Acked-by to the series.
- No code changes.

v22: 3may2023
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.3.0
- Improved support for kexec_load(), per Hari Bathini. See
"crash: hotplug support for kexec_load()" which is the only
change to this series.
- Applied Baoquan He's Acked-by for all other patches.

v21: 4apr2023
https://lkml.org/lkml/2023/4/4/1136
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.3.0-rc5
- Additional simplification of indentation in crash_handle_hotplug_event(),
per Baoquan.

v20: 17mar2023
https://lkml.org/lkml/2023/3/17/1169
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.3.0-rc2
- Defaulting CRASH_HOTPLUG for x86 to Y, per Sourabh.
- Explicitly initializing image->hp_action, per Baoquan.
- Simplified kexec_trylock() in crash_handle_hotplug_event(),
per Baoquan.
- Applied Sourabh's Reviewed-by to the series.

v19: 6mar2023
https://lkml.org/lkml/2023/3/6/1358
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0
- Did away with offlinecpu, per Thomas Gleixner.
- Changed to CPUHP_BP_PREPARE_DYN instead of CPUHP_AP_ONLINE_DYN.
- Did away with elfcorehdr_index_valid, per Sourabh.
- Convert to for_each_possible_cpu() in crash_prepare_elf64_headers()
per Sourabh.
- Small optimization for x86 cpu changes.

v18: 31jan2023
https://lkml.org/lkml/2023/1/31/1356
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc6
- Renamed struct kimage member hotplug_event to hp_action, and
re-enumerated the KEXEC_CRASH_HP_x items, adding _NONE at 0.
- Moved to cpuhp state CPUHP_BP_PREPARE_DYN instead of
CPUHP_AP_ONLINE_DYN in order to minimize window of time CPU
is not reflected in elfcorehdr.
- Reworked some of the comments and commit messages to explain
more of the why than the what, per Thomas Gleixner.

v17: 18jan2023
https://lkml.org/lkml/2023/1/18/1420
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc4
- Moved a bit of code around so that kexec_load()-only builds
work, per Sourabh.
- Corrected computation of number of memory region Phdrs needed
when x86 memory hotplug is not enabled, per Baoquan.

v16: 5jan2023
https://lkml.org/lkml/2023/1/5/673
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.2.0-rc2
- Corrected error identified by Baoquan.

v15: 9dec2022
https://lkml.org/lkml/2022/12/9/520
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc8
- Replaced arch_un/map_crash_pages() with direct use of
kun/map_local_pages(), per Boris.
- Some x86 changes, per Boris.

v14: 16nov2022
https://lkml.org/lkml/2022/11/16/1645
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc5
- Introduced CRASH_HOTPLUG Kconfig item to better fine tune
compilation of feature components, per Boris.
- Removed hp_action parameter to arch_crash_handle_hotplug_event()
as it is unused.

v13: 31oct2022
https://lkml.org/lkml/2022/10/31/854
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.1.0-rc3, which means converting from
mutex_lock(kexec_mutex) to the new kexec_trylock().
- Moved arch_un/map_crash_pages() into kexec.h and default
implementation using k/unmap_local_pages().
- Changed more #ifdef's into IS_ENABLED()
- Changed CRASH_MAX_MEMORY_RANGES to 8192 from 32768, and moved it
into x86 crash.c as a #define rather than a Kconfig item, per Boris.
- Check number of Phdrs against PN_XNUM, max possible.

v12: 9sep2022
https://lkml.org/lkml/2022/9/9/1358
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.0-rc4
- Addressed some minor formatting items, per Baoquan

v11: 26aug2022
https://lkml.org/lkml/2022/8/26/963
https://lore.kernel.org/lkml/[email protected]/
- Rebased onto 6.0-rc2
- Redid the rework of __weak to use asm/kexec.h, per Baoquan
- Reworked some comments and minor items, per Baoquan

v10: 21jul2022
https://lkml.org/lkml/2022/7/21/1007
https://lore.kernel.org/lkml/[email protected]/
- Rebased to 5.19.0-rc7
- Per Sourabh, corrected build issue with arch_un/map_crash_pages()
for architectures not supporting this feature.
- Per David Hildenbrand, removed the WARN_ONCE() altogether.
- Per David Hansen, converted to use of kmap_local_page().
- Per Baoquan He, replaced use of __weak with the kexec technique.

v9: 13jun2022
https://lkml.org/lkml/2022/6/13/3382
https://lore.kernel.org/lkml/[email protected]/
- Rebased to 5.18.0
- Per Sourabh, moved crash_prepare_elf64_headers() into common
crash_core.c to avoid compile issues with kexec_load only path.
- Per David Hildenbrand, replaced mutex_trylock() with mutex_lock().
- Changed the __weak arch_crash_handle_hotplug_event() to utilize
WARN_ONCE() instead of WARN(). Fixed some formatting issues.
- Per Sourabh, introduced sysfs attribute crash_hotplug for memory
and CPUs; for use by userspace (udev) to determine if the kernel
performs crash hot un/plug support.
- Per Sourabh, moved the code detecting the elfcorehdr segment from
arch/x86 into crash_core:handle_hotplug_event() so both kexec_load
and kexec_file_load can benefit.
- Updated userspace kexec-tools kexec utility to reflect change to
using CRASH_MAX_MEMORY_RANGES and get_nr_cpus().
- Updated the new proposed udev rules to reflect using the sysfs
attributes crash_hotplug.

v8: 5may2022
https://lkml.org/lkml/2022/5/5/1133
https://lore.kernel.org/lkml/[email protected]/
- Per Borislav Petkov, eliminated CONFIG_CRASH_HOTPLUG in favor
of CONFIG_HOTPLUG_CPU || CONFIG_MEMORY_HOTPLUG, i.e., a new define
is not needed. Also used IS_ENABLED() rather than #ifdef's.
Renamed crash_hotplug_handler() to handle_hotplug_event().
And other corrections.
- Per Baoquan, minimized the parameters to
arch_crash_handle_hotplug_event() to hp_action and cpu.
- Introduced KEXEC_CRASH_HP_INVALID_CPU definition, per Baoquan.
- Per Sourabh Jain, renamed and repurposed CRASH_HOTPLUG_ELFCOREHDR_SZ
to CONFIG_CRASH_MAX_MEMORY_RANGES, mirroring kexec-tools change
by David Hildenbrand. Folded this patch into the x86
kexec_file_load support patch.

v7: 13apr2022
https://lkml.org/lkml/2022/4/13/850
https://lore.kernel.org/lkml/[email protected]/
- Resolved parameter usage to crash_hotplug_handler(), per Baoquan.

v6: 1apr2022
https://lkml.org/lkml/2022/4/1/1203
https://lore.kernel.org/lkml/[email protected]/
- Reworded commit messages and cleaned up some comments, per Baoquan.
- Changed elf_index to elfcorehdr_index for clarity.
- Minor code changes per Baoquan.

v5: 3mar2022
https://lkml.org/lkml/2022/3/3/674
https://lore.kernel.org/lkml/[email protected]/
- Reworded description of CRASH_HOTPLUG_ELFCOREHDR_SZ, per
David Hildenbrand.
- Refactored slightly a few patches per Baoquan recommendation.

v4: 9feb2022
https://lkml.org/lkml/2022/2/9/1406
https://lore.kernel.org/lkml/[email protected]/
- Refactored patches per Baoquan's suggestions.
- A few corrections, per Baoquan.

v3: 10jan2022
https://lkml.org/lkml/2022/1/10/1212
https://lore.kernel.org/lkml/[email protected]/
- Rebased per Baoquan He's request.
- Changed memory notifier per David Hildenbrand.
- Provided example kexec userspace change in cover letter.

RFC v2: 7dec2021
https://lkml.org/lkml/2021/12/7/1088
https://lore.kernel.org/lkml/[email protected]/
- Acting upon Baoquan He's suggestion of removing elfcorehdr from
the purgatory list of segments, removed purgatory code from the
patchset, which is significantly simpler now.

RFC v1: 18nov2021
https://lkml.org/lkml/2021/11/18/845
https://lore.kernel.org/lkml/[email protected]/
- working patchset demonstrating kernel handling of hotplug
updates to x86 elfcorehdr for kexec_file_load

RFC: 14dec2020
https://lkml.org/lkml/2020/12/14/532
https://lore.kernel.org/lkml/[email protected]/
- proposed concept of allowing kernel to handle hotplug update
of elfcorehdr
---

Eric DeVolder (10):
drivers/base: refactor cpu.c to use .is_visible()
drivers/base: refactor memory.c to use .is_visible()
crash: move a few code bits to setup support of crash hotplug
crash: add generic infrastructure for crash hotplug support
kexec: exclude elfcorehdr from the segment digest
crash: memory and CPU hotplug sysfs attributes
x86/crash: add x86 crash hotplug support
crash: hotplug support for kexec_load()
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
x86/crash: optimize CPU changes

.../ABI/testing/sysfs-devices-memory | 8 +
.../ABI/testing/sysfs-devices-system-cpu | 8 +
.../admin-guide/mm/memory-hotplug.rst | 8 +
Documentation/core-api/cpu_hotplug.rst | 18 +
arch/x86/Kconfig | 3 +
arch/x86/include/asm/kexec.h | 18 +
arch/x86/kernel/crash.c | 140 ++++++-
drivers/base/cpu.c | 83 +++-
drivers/base/memory.c | 91 ++++-
include/linux/crash_core.h | 9 +
include/linux/kexec.h | 63 +++-
include/uapi/linux/kexec.h | 1 +
kernel/Kconfig.kexec | 35 ++
kernel/crash_core.c | 355 ++++++++++++++++++
kernel/kexec.c | 5 +
kernel/kexec_core.c | 6 +
kernel/kexec_file.c | 187 +--------
kernel/ksysfs.c | 15 +
18 files changed, 819 insertions(+), 234 deletions(-)

--
2.31.1



2023-06-28 18:57:46

by Eric DeVolder

Subject: [PATCH v24 03/10] crash: move a few code bits to setup support of crash hotplug

The crash hotplug support leans on the work for the kexec_file_load()
syscall. To also support the kexec_load() syscall, a few bits of code
need to be moved outside of CONFIG_KEXEC_FILE. As such, these bits are
moved out of kexec_file.c and into the common location, crash_core.c.

No functionality change intended.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
include/linux/kexec.h | 30 +++----
kernel/crash_core.c | 182 ++++++++++++++++++++++++++++++++++++++++++
kernel/kexec_file.c | 181 -----------------------------------------
3 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 22b5cd24f581..811a90e09698 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -105,6 +105,21 @@ struct compat_kexec_segment {
};
#endif

+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN 4096
+
+struct crash_mem {
+ unsigned int max_nr_ranges;
+ unsigned int nr_ranges;
+ struct range ranges[];
+};
+
+extern int crash_exclude_mem_range(struct crash_mem *mem,
+ unsigned long long mstart,
+ unsigned long long mend);
+extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz);
+
#ifdef CONFIG_KEXEC_FILE
struct purgatory_info {
/*
@@ -230,21 +245,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
}
#endif

-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN 4096
-
-struct crash_mem {
- unsigned int max_nr_ranges;
- unsigned int nr_ranges;
- struct range ranges[];
-};
-
-extern int crash_exclude_mem_range(struct crash_mem *mem,
- unsigned long long mstart,
- unsigned long long mend);
-extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
- void **addr, unsigned long *sz);
-
#ifndef arch_kexec_apply_relocations_add
/*
* arch_kexec_apply_relocations_add - apply relocations of type RELA
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..b7c30b748a16 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -10,6 +10,7 @@
#include <linux/utsname.h>
#include <linux/vmalloc.h>
#include <linux/sizes.h>
+#include <linux/kexec.h>

#include <asm/page.h>
#include <asm/sections.h>
@@ -314,6 +315,187 @@ static int __init parse_crashkernel_dummy(char *arg)
}
early_param("crashkernel", parse_crashkernel_dummy);

+int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
+ void **addr, unsigned long *sz)
+{
+ Elf64_Ehdr *ehdr;
+ Elf64_Phdr *phdr;
+ unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
+ unsigned char *buf;
+ unsigned int cpu, i;
+ unsigned long long notes_addr;
+ unsigned long mstart, mend;
+
+ /* extra phdr for vmcoreinfo ELF note */
+ nr_phdr = nr_cpus + 1;
+ nr_phdr += mem->nr_ranges;
+
+ /*
+ * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
+ * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
+ * I think this is required by tools like gdb. So same physical
+ * memory will be mapped in two ELF headers. One will contain kernel
+ * text virtual addresses and other will have __va(physical) addresses.
+ */
+
+ nr_phdr++;
+ elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
+ elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
+
+ buf = vzalloc(elf_sz);
+ if (!buf)
+ return -ENOMEM;
+
+ ehdr = (Elf64_Ehdr *)buf;
+ phdr = (Elf64_Phdr *)(ehdr + 1);
+ memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+ ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+ ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
+ ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+ ehdr->e_ident[EI_OSABI] = ELF_OSABI;
+ memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
+ ehdr->e_type = ET_CORE;
+ ehdr->e_machine = ELF_ARCH;
+ ehdr->e_version = EV_CURRENT;
+ ehdr->e_phoff = sizeof(Elf64_Ehdr);
+ ehdr->e_ehsize = sizeof(Elf64_Ehdr);
+ ehdr->e_phentsize = sizeof(Elf64_Phdr);
+
+ /* Prepare one phdr of type PT_NOTE for each present CPU */
+ for_each_present_cpu(cpu) {
+ phdr->p_type = PT_NOTE;
+ notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
+ phdr->p_offset = phdr->p_paddr = notes_addr;
+ phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
+ (ehdr->e_phnum)++;
+ phdr++;
+ }
+
+ /* Prepare one PT_NOTE header for vmcoreinfo */
+ phdr->p_type = PT_NOTE;
+ phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
+ phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
+ (ehdr->e_phnum)++;
+ phdr++;
+
+ /* Prepare PT_LOAD type program header for kernel text region */
+ if (need_kernel_map) {
+ phdr->p_type = PT_LOAD;
+ phdr->p_flags = PF_R|PF_W|PF_X;
+ phdr->p_vaddr = (unsigned long) _text;
+ phdr->p_filesz = phdr->p_memsz = _end - _text;
+ phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
+ ehdr->e_phnum++;
+ phdr++;
+ }
+
+ /* Go through all the ranges in mem->ranges[] and prepare phdr */
+ for (i = 0; i < mem->nr_ranges; i++) {
+ mstart = mem->ranges[i].start;
+ mend = mem->ranges[i].end;
+
+ phdr->p_type = PT_LOAD;
+ phdr->p_flags = PF_R|PF_W|PF_X;
+ phdr->p_offset = mstart;
+
+ phdr->p_paddr = mstart;
+ phdr->p_vaddr = (unsigned long) __va(mstart);
+ phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
+ phdr->p_align = 0;
+ ehdr->e_phnum++;
+ pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
+ phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
+ ehdr->e_phnum, phdr->p_offset);
+ phdr++;
+ }
+
+ *addr = buf;
+ *sz = elf_sz;
+ return 0;
+}
+
+int crash_exclude_mem_range(struct crash_mem *mem,
+ unsigned long long mstart, unsigned long long mend)
+{
+ int i, j;
+ unsigned long long start, end, p_start, p_end;
+ struct range temp_range = {0, 0};
+
+ for (i = 0; i < mem->nr_ranges; i++) {
+ start = mem->ranges[i].start;
+ end = mem->ranges[i].end;
+ p_start = mstart;
+ p_end = mend;
+
+ if (mstart > end || mend < start)
+ continue;
+
+ /* Truncate any area outside of range */
+ if (mstart < start)
+ p_start = start;
+ if (mend > end)
+ p_end = end;
+
+ /* Found completely overlapping range */
+ if (p_start == start && p_end == end) {
+ mem->ranges[i].start = 0;
+ mem->ranges[i].end = 0;
+ if (i < mem->nr_ranges - 1) {
+ /* Shift rest of the ranges to left */
+ for (j = i; j < mem->nr_ranges - 1; j++) {
+ mem->ranges[j].start =
+ mem->ranges[j+1].start;
+ mem->ranges[j].end =
+ mem->ranges[j+1].end;
+ }
+
+ /*
+ * Continue to check if there are another overlapping ranges
+ * from the current position because of shifting the above
+ * mem ranges.
+ */
+ i--;
+ mem->nr_ranges--;
+ continue;
+ }
+ mem->nr_ranges--;
+ return 0;
+ }
+
+ if (p_start > start && p_end < end) {
+ /* Split original range */
+ mem->ranges[i].end = p_start - 1;
+ temp_range.start = p_end + 1;
+ temp_range.end = end;
+ } else if (p_start != start)
+ mem->ranges[i].end = p_start - 1;
+ else
+ mem->ranges[i].start = p_end + 1;
+ break;
+ }
+
+ /* If a split happened, add the split to array */
+ if (!temp_range.end)
+ return 0;
+
+ /* Split happened */
+ if (i == mem->max_nr_ranges - 1)
+ return -ENOMEM;
+
+ /* Location where new range should go */
+ j = i + 1;
+ if (j < mem->nr_ranges) {
+ /* Move over all ranges one slot towards the end */
+ for (i = mem->nr_ranges - 1; i >= j; i--)
+ mem->ranges[i + 1] = mem->ranges[i];
+ }
+
+ mem->ranges[j].start = temp_range.start;
+ mem->ranges[j].end = temp_range.end;
+ mem->nr_ranges++;
+ return 0;
+}
+
Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
void *data, size_t data_len)
{
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 69ee4a29136f..e9cf9e8d8f01 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -1150,184 +1150,3 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name,
return 0;
}
#endif /* CONFIG_ARCH_HAS_KEXEC_PURGATORY */
-
-int crash_exclude_mem_range(struct crash_mem *mem,
- unsigned long long mstart, unsigned long long mend)
-{
- int i, j;
- unsigned long long start, end, p_start, p_end;
- struct range temp_range = {0, 0};
-
- for (i = 0; i < mem->nr_ranges; i++) {
- start = mem->ranges[i].start;
- end = mem->ranges[i].end;
- p_start = mstart;
- p_end = mend;
-
- if (mstart > end || mend < start)
- continue;
-
- /* Truncate any area outside of range */
- if (mstart < start)
- p_start = start;
- if (mend > end)
- p_end = end;
-
- /* Found completely overlapping range */
- if (p_start == start && p_end == end) {
- mem->ranges[i].start = 0;
- mem->ranges[i].end = 0;
- if (i < mem->nr_ranges - 1) {
- /* Shift rest of the ranges to left */
- for (j = i; j < mem->nr_ranges - 1; j++) {
- mem->ranges[j].start =
- mem->ranges[j+1].start;
- mem->ranges[j].end =
- mem->ranges[j+1].end;
- }
-
- /*
- * Continue to check if there are another overlapping ranges
- * from the current position because of shifting the above
- * mem ranges.
- */
- i--;
- mem->nr_ranges--;
- continue;
- }
- mem->nr_ranges--;
- return 0;
- }
-
- if (p_start > start && p_end < end) {
- /* Split original range */
- mem->ranges[i].end = p_start - 1;
- temp_range.start = p_end + 1;
- temp_range.end = end;
- } else if (p_start != start)
- mem->ranges[i].end = p_start - 1;
- else
- mem->ranges[i].start = p_end + 1;
- break;
- }
-
- /* If a split happened, add the split to array */
- if (!temp_range.end)
- return 0;
-
- /* Split happened */
- if (i == mem->max_nr_ranges - 1)
- return -ENOMEM;
-
- /* Location where new range should go */
- j = i + 1;
- if (j < mem->nr_ranges) {
- /* Move over all ranges one slot towards the end */
- for (i = mem->nr_ranges - 1; i >= j; i--)
- mem->ranges[i + 1] = mem->ranges[i];
- }
-
- mem->ranges[j].start = temp_range.start;
- mem->ranges[j].end = temp_range.end;
- mem->nr_ranges++;
- return 0;
-}
-
-int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
- void **addr, unsigned long *sz)
-{
- Elf64_Ehdr *ehdr;
- Elf64_Phdr *phdr;
- unsigned long nr_cpus = num_possible_cpus(), nr_phdr, elf_sz;
- unsigned char *buf;
- unsigned int cpu, i;
- unsigned long long notes_addr;
- unsigned long mstart, mend;
-
- /* extra phdr for vmcoreinfo ELF note */
- nr_phdr = nr_cpus + 1;
- nr_phdr += mem->nr_ranges;
-
- /*
- * kexec-tools creates an extra PT_LOAD phdr for kernel text mapping
- * area (for example, ffffffff80000000 - ffffffffa0000000 on x86_64).
- * I think this is required by tools like gdb. So same physical
- * memory will be mapped in two ELF headers. One will contain kernel
- * text virtual addresses and other will have __va(physical) addresses.
- */
-
- nr_phdr++;
- elf_sz = sizeof(Elf64_Ehdr) + nr_phdr * sizeof(Elf64_Phdr);
- elf_sz = ALIGN(elf_sz, ELF_CORE_HEADER_ALIGN);
-
- buf = vzalloc(elf_sz);
- if (!buf)
- return -ENOMEM;
-
- ehdr = (Elf64_Ehdr *)buf;
- phdr = (Elf64_Phdr *)(ehdr + 1);
- memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
- ehdr->e_ident[EI_CLASS] = ELFCLASS64;
- ehdr->e_ident[EI_DATA] = ELFDATA2LSB;
- ehdr->e_ident[EI_VERSION] = EV_CURRENT;
- ehdr->e_ident[EI_OSABI] = ELF_OSABI;
- memset(ehdr->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
- ehdr->e_type = ET_CORE;
- ehdr->e_machine = ELF_ARCH;
- ehdr->e_version = EV_CURRENT;
- ehdr->e_phoff = sizeof(Elf64_Ehdr);
- ehdr->e_ehsize = sizeof(Elf64_Ehdr);
- ehdr->e_phentsize = sizeof(Elf64_Phdr);
-
- /* Prepare one phdr of type PT_NOTE for each present CPU */
- for_each_present_cpu(cpu) {
- phdr->p_type = PT_NOTE;
- notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
- phdr->p_offset = phdr->p_paddr = notes_addr;
- phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
- (ehdr->e_phnum)++;
- phdr++;
- }
-
- /* Prepare one PT_NOTE header for vmcoreinfo */
- phdr->p_type = PT_NOTE;
- phdr->p_offset = phdr->p_paddr = paddr_vmcoreinfo_note();
- phdr->p_filesz = phdr->p_memsz = VMCOREINFO_NOTE_SIZE;
- (ehdr->e_phnum)++;
- phdr++;
-
- /* Prepare PT_LOAD type program header for kernel text region */
- if (need_kernel_map) {
- phdr->p_type = PT_LOAD;
- phdr->p_flags = PF_R|PF_W|PF_X;
- phdr->p_vaddr = (unsigned long) _text;
- phdr->p_filesz = phdr->p_memsz = _end - _text;
- phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
- ehdr->e_phnum++;
- phdr++;
- }
-
- /* Go through all the ranges in mem->ranges[] and prepare phdr */
- for (i = 0; i < mem->nr_ranges; i++) {
- mstart = mem->ranges[i].start;
- mend = mem->ranges[i].end;
-
- phdr->p_type = PT_LOAD;
- phdr->p_flags = PF_R|PF_W|PF_X;
- phdr->p_offset = mstart;
-
- phdr->p_paddr = mstart;
- phdr->p_vaddr = (unsigned long) __va(mstart);
- phdr->p_filesz = phdr->p_memsz = mend - mstart + 1;
- phdr->p_align = 0;
- ehdr->e_phnum++;
- pr_debug("Crash PT_LOAD ELF header. phdr=%p vaddr=0x%llx, paddr=0x%llx, sz=0x%llx e_phnum=%d p_offset=0x%llx\n",
- phdr, phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz,
- ehdr->e_phnum, phdr->p_offset);
- phdr++;
- }
-
- *addr = buf;
- *sz = elf_sz;
- return 0;
-}
--
2.31.1
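crash_exclude_mem_range(), moved into crash_core.c above, carves an excluded physical range out of a list of inclusive ranges. The user-space sketch below shows the same trim/remove/split bookkeeping with stand-in struct definitions; model_crash_exclude_mem_range() and model_crash_mem_alloc() are hypothetical names. It deliberately differs from the patch in two places: it keeps scanning after a partial overlap instead of breaking out of the loop, and it checks nr_ranges against max_nr_ranges before inserting a split, because checking the loop index against max_nr_ranges - 1 does not appear to guarantee a free slot when the split range is not the last one.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* User-space stand-ins mirroring struct range and struct crash_mem. */
struct range {
	unsigned long long start;
	unsigned long long end;		/* inclusive */
};

struct crash_mem {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical helper: allocate a crash_mem with room for max ranges. */
static struct crash_mem *model_crash_mem_alloc(unsigned int max)
{
	struct crash_mem *mem = calloc(1, sizeof(*mem) + max * sizeof(struct range));

	if (mem)
		mem->max_nr_ranges = max;
	return mem;
}

/*
 * Remove [mstart, mend] (inclusive) from the non-overlapping ranges in
 * mem. Returns 0, or -ENOMEM if a split needs a slot that is not free.
 */
static int model_crash_exclude_mem_range(struct crash_mem *mem,
					 unsigned long long mstart,
					 unsigned long long mend)
{
	unsigned long long start, end, p_start, p_end;
	int i, j;

	assert(mstart <= mend);
	for (i = 0; i < (int)mem->nr_ranges; i++) {
		start = mem->ranges[i].start;
		end = mem->ranges[i].end;
		if (mstart > end || mend < start)
			continue;		/* no overlap with this range */

		/* Clip the excluded area to this range */
		p_start = mstart < start ? start : mstart;
		p_end = mend > end ? end : mend;

		if (p_start == start && p_end == end) {
			/* Whole range excluded: shift the rest left, re-check slot i */
			for (j = i; j + 1 < (int)mem->nr_ranges; j++)
				mem->ranges[j] = mem->ranges[j + 1];
			mem->nr_ranges--;
			i--;
			continue;
		}
		if (p_start > start && p_end < end) {
			/* Strictly inside: split range i, upper half goes to i + 1 */
			if (mem->nr_ranges == mem->max_nr_ranges)
				return -ENOMEM;
			for (j = (int)mem->nr_ranges; j > i + 1; j--)
				mem->ranges[j] = mem->ranges[j - 1];
			mem->ranges[i + 1].start = p_end + 1;
			mem->ranges[i + 1].end = end;
			mem->ranges[i].end = p_start - 1;
			mem->nr_ranges++;
			return 0;		/* no other range can overlap */
		}
		if (p_start != start)
			mem->ranges[i].end = p_start - 1;	/* trim the tail */
		else
			mem->ranges[i].start = p_end + 1;	/* trim the head */
	}
	return 0;
}
```

For example, excluding [10, 19] from {[0, 99], [200, 299]} yields {[0, 9], [20, 99], [200, 299]}.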


2023-06-28 18:58:02

by Eric DeVolder

Subject: [PATCH v24 06/10] crash: memory and CPU hotplug sysfs attributes

Introduce the crash_hotplug attribute for memory and CPUs for
use by userspace. These attributes directly facilitate the udev
rule for managing userspace re-loading of the crash kernel upon
hot un/plug changes.

For memory, expose the crash_hotplug attribute in the
/sys/devices/system/memory directory. For example:

# udevadm info --attribute-walk /sys/devices/system/memory/memory81
looking at device '/devices/system/memory/memory81':
KERNEL=="memory81"
SUBSYSTEM=="memory"
DRIVER==""
ATTR{online}=="1"
ATTR{phys_device}=="0"
ATTR{phys_index}=="00000051"
ATTR{removable}=="1"
ATTR{state}=="online"
ATTR{valid_zones}=="Movable"

looking at parent device '/devices/system/memory':
KERNELS=="memory"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{auto_online_blocks}=="offline"
ATTRS{block_size_bytes}=="8000000"
ATTRS{crash_hotplug}=="1"

For CPUs, expose the crash_hotplug attribute in the
/sys/devices/system/cpu directory. For example:

# udevadm info --attribute-walk /sys/devices/system/cpu/cpu0
looking at device '/devices/system/cpu/cpu0':
KERNEL=="cpu0"
SUBSYSTEM=="cpu"
DRIVER=="processor"
ATTR{crash_notes}=="277c38600"
ATTR{crash_notes_size}=="368"
ATTR{online}=="1"

looking at parent device '/devices/system/cpu':
KERNELS=="cpu"
SUBSYSTEMS==""
DRIVERS==""
ATTRS{crash_hotplug}=="1"
ATTRS{isolated}==""
ATTRS{kernel_max}=="8191"
ATTRS{nohz_full}==" (null)"
ATTRS{offline}=="4-7"
ATTRS{online}=="0-3"
ATTRS{possible}=="0-7"
ATTRS{present}=="0-3"

With these sysfs attributes in place, it is possible to efficiently
instruct the udev rule to skip crash kernel reloading for kernels
configured with crash hotplug support.

For example, the following is the proposed udev rule change for the
RHEL system rule file 98-kexec.rules (as the first lines of the file):

# The kernel updates the crash elfcorehdr for CPU and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

When examined in the context of 98-kexec.rules, the above rules
test if crash_hotplug is set, and if so, the userspace initiated
unload-then-reload of the crash kernel is skipped.

CPU and memory checks are separated in accordance with
CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options.
If an architecture supports, for example, memory hotplug but not
CPU hotplug, then the /sys/devices/system/memory/crash_hotplug
attribute file is present, but the /sys/devices/system/cpu/crash_hotplug
attribute file will NOT be present. Thus the udev rule skips
userspace processing of memory hot un/plug events, but the udev
rule will evaluate false for CPU events, thus allowing userspace to
process CPU hot un/plug events (i.e., the unload-then-reload of the kdump
capture kernel).
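A udev helper following the same logic could read the attribute directly. This is a hypothetical sketch: the function names and the sysfs-root parameter are illustrative. It treats a missing file, as on a kernel without CPU hotplug support, the same as '0'. The CPU attribute is declared with DEVICE_ATTR_ADMIN_RO in the cpu.c hunk (mode 0400, root-only), so an unreadable file is also treated as '0'.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return 1 if the crash_hotplug attribute at path reads 1 (the kernel
 * updates the elfcorehdr itself), 0 otherwise. A missing or unreadable
 * file is treated like 0: userspace must unload-then-reload kdump.
 */
static int kernel_updates_elfcorehdr(const char *path)
{
	FILE *f;
	int val = 0;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return 0;	/* attribute absent, or caller is not root */
	if (fscanf(f, "%d", &val) != 1)
		val = 0;
	fclose(f);
	return val == 1;
}

/* Decide whether an event for subsystem ("cpu" or "memory") needs a reload. */
static int needs_kdump_reload(const char *sysfs_root, const char *subsystem)
{
	char path[256];

	snprintf(path, sizeof(path), "%s/devices/system/%s/crash_hotplug",
		 sysfs_root, subsystem);
	return !kernel_updates_elfcorehdr(path);
}
```

For example, needs_kdump_reload("/sys", "cpu") returns 0 on a kernel that updates the elfcorehdr for CPU changes, and 1 where the attribute is absent.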

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
Documentation/ABI/testing/sysfs-devices-memory | 8 ++++++++
.../ABI/testing/sysfs-devices-system-cpu | 8 ++++++++
.../admin-guide/mm/memory-hotplug.rst | 8 ++++++++
Documentation/core-api/cpu_hotplug.rst | 18 ++++++++++++++++++
drivers/base/cpu.c | 16 ++++++++++++++--
drivers/base/memory.c | 13 +++++++++++++
include/linux/kexec.h | 8 ++++++++
7 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index d8b0f80b9e33..c50725ebebb7 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -110,3 +110,11 @@ Description:
link is created for memory section 9 on node0.

/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What: /sys/devices/system/memory/crash_hotplug
+Date: Jun 2023
+Contact: Linux kernel mailing list <[email protected]>
+Description:
+ (RO) indicates whether or not the kernel directly supports
+ modifying the crash elfcorehdr for memory hot un/plug and/or
+ on/offline changes.
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index ecd585ca2d50..598b0fa67481 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -686,3 +686,11 @@ Description:
(RO) the list of CPUs that are isolated and don't
participate in load balancing. These CPUs are set by
boot parameter "isolcpus=".
+
+What: /sys/devices/system/cpu/crash_hotplug
+Date: Jun 2023
+Contact: Linux kernel mailing list <[email protected]>
+Description:
+ (RO) indicates whether or not the kernel directly supports
+ modifying the crash elfcorehdr for CPU hot un/plug and/or
+ on/offline changes.
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 1b02fe5807cc..eb99d79223a3 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -291,6 +291,14 @@ The following files are currently defined:
Availability depends on the CONFIG_ARCH_MEMORY_PROBE
kernel configuration option.
``uevent`` read-write: generic udev file for device subsystems.
+``crash_hotplug`` read-only: when changes to the system memory map
+ occur due to hot un/plug of memory, this file contains
+ '1' if the kernel updates the kdump capture kernel memory
+ map itself (via elfcorehdr), or '0' if userspace must update
+ the kdump capture kernel memory map.
+
+ Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+ configuration option.
====================== =========================================================

.. note::
diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst
index e6f5bc39cf5c..54581c501562 100644
--- a/Documentation/core-api/cpu_hotplug.rst
+++ b/Documentation/core-api/cpu_hotplug.rst
@@ -741,6 +741,24 @@ will receive all events. A script like::

can process the event further.

+When changes to the CPUs in the system occur, the sysfs file
+/sys/devices/system/cpu/crash_hotplug contains '1' if the kernel
+updates the kdump capture kernel list of CPUs itself (via elfcorehdr),
+or '0' if userspace must update the kdump capture kernel list of CPUs.
+
+The availability depends on the CONFIG_HOTPLUG_CPU kernel configuration
+option.
+
+To skip userspace processing of CPU hot un/plug events for kdump
+(ie the unload-then-reload to obtain a current list of CPUs), this sysfs
+file can be used in a udev rule as follows:
+
+ SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
+
+For a CPU hot un/plug event, if the architecture supports kernel updates
+of the elfcorehdr (which contains the list of CPUs), then the rule skips
+the unload-then-reload of the kdump capture kernel.
+
Kernel Inline Documentations Reference
======================================

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 75fa46a567a1..26c85f3c8193 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -20,6 +20,7 @@
#include <linux/tick.h>
#include <linux/pm_qos.h>
#include <linux/sched/isolation.h>
+#include <linux/kexec.h>

#include "base.h"

@@ -132,8 +133,6 @@ static DEVICE_ATTR(probe, S_IWUSR, NULL, cpu_probe_store);
static DEVICE_ATTR(release, S_IWUSR, NULL, cpu_release_store);

#ifdef CONFIG_KEXEC
-#include <linux/kexec.h>
-
static ssize_t crash_notes_show(struct device *dev,
struct device_attribute *attr,
char *buf)
@@ -290,6 +289,14 @@ static ssize_t print_cpus_nohz_full(struct device *dev,
}
static DEVICE_ATTR(nohz_full, 0444, print_cpus_nohz_full, NULL);

+static ssize_t crash_hotplug_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
+}
+static DEVICE_ATTR_ADMIN_RO(crash_hotplug);
+
static void cpu_device_release(struct device *dev)
{
/*
@@ -474,6 +481,7 @@ static struct attribute *cpu_root_attrs[] = {
&dev_attr_isolated.attr,
&dev_attr_nohz_full.attr,
&dev_attr_modalias.attr,
+ &dev_attr_crash_hotplug.attr,
NULL
};

@@ -509,6 +517,10 @@ cpu_root_attr_is_visible(struct kobject *kobj,
if (attr == &dev_attr_modalias.attr)
return mode;
}
+ if (IS_ENABLED(CONFIG_CRASH_HOTPLUG)) {
+ if (attr == &dev_attr_crash_hotplug.attr)
+ return mode;
+ }

return 0;
}
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f03eda7e1c9c..f1b9d8fccace 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -25,6 +25,7 @@

#include <linux/atomic.h>
#include <linux/uaccess.h>
+#include <linux/kexec.h>

#define MEMORY_CLASS_NAME "memory"

@@ -494,6 +495,13 @@ static ssize_t auto_online_blocks_store(struct device *dev,

static DEVICE_ATTR_RW(auto_online_blocks);

+static ssize_t crash_hotplug_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return sysfs_emit(buf, "%d\n", crash_hotplug_memory_support());
+}
+static DEVICE_ATTR_RO(crash_hotplug);
+
/*
* Some architectures will have custom drivers to do this, and
* will not need to do it from userspace. The fake hot-add code
@@ -916,6 +924,7 @@ static struct attribute *memory_root_attrs[] = {
&dev_attr_hard_offline_page.attr,
&dev_attr_block_size_bytes.attr,
&dev_attr_auto_online_blocks.attr,
+ &dev_attr_crash_hotplug.attr,
NULL
};

@@ -939,6 +948,10 @@ memory_root_attr_is_visible(struct kobject *kobj,
return mode;
if (attr == &dev_attr_auto_online_blocks.attr)
return mode;
+ if (IS_ENABLED(CONFIG_CRASH_HOTPLUG)) {
+ if (attr == &dev_attr_crash_hotplug.attr)
+ return mode;
+ }

return 0;
}
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index b9903dd48e24..6a8a724ac638 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -501,6 +501,14 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
#endif

+#ifndef crash_hotplug_cpu_support
+static inline int crash_hotplug_cpu_support(void) { return 0; }
+#endif
+
+#ifndef crash_hotplug_memory_support
+static inline int crash_hotplug_memory_support(void) { return 0; }
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
--
2.31.1


2023-06-28 18:58:03

by Eric DeVolder

Subject: [PATCH v24 04/10] crash: add generic infrastructure for crash hotplug support

To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/
onlining). The crash elfcorehdr describes the CPUs and memory to
be written into the vmcore.

To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), and so meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE
group, just prior to the STARTING group, which is very close to the
CPU starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate. Note that for a CPU being unplugged
or offlined, the CPU will still be present in the list of CPUs
generated by crash_prepare_elf64_headers(). However, there is no
need to explicitly omit the CPU; see the justification in
'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.

To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().

The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
include/linux/crash_core.h | 9 +++
include/linux/kexec.h | 11 +++
kernel/Kconfig.kexec | 31 ++++++++
kernel/crash_core.c | 142 +++++++++++++++++++++++++++++++++++++
kernel/kexec_core.c | 6 ++
5 files changed, 199 insertions(+)

diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..e14345cc7a22 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);

+#define KEXEC_CRASH_HP_NONE 0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_CPU 2
+#define KEXEC_CRASH_HP_ADD_MEMORY 3
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 4
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
#endif /* LINUX_CRASH_CORE_H */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 811a90e09698..b9903dd48e24 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
#include <linux/compat.h>
#include <linux/ioport.h>
#include <linux/module.h>
+#include <linux/highmem.h>
#include <asm/kexec.h>

/* Verify architecture specific macros are defined */
@@ -360,6 +361,12 @@ struct kimage {
struct purgatory_info purgatory_info;
#endif

+#ifdef CONFIG_CRASH_HOTPLUG
+ int hp_action;
+ int elfcorehdr_index;
+ bool elfcorehdr_updated;
+#endif
+
#ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -490,6 +497,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
#endif

+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 5d576ddfd999..7eb42a795176 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -107,4 +107,35 @@ config CRASH_DUMP
For s390, this option also enables zfcpdump.
See also <file:Documentation/s390/zfcpdump.rst>

+config CRASH_HOTPLUG
+ bool "Update the crash elfcorehdr on system configuration changes"
+ default y
+ depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
+ depends on ARCH_SUPPORTS_CRASH_HOTPLUG
+ help
+ Enable direct update to the crash elfcorehdr (which contains
+ the list of CPUs and memory regions to be dumped upon a crash)
+ in response to hot plug/unplug or online/offline of CPUs or
+ memory. This is much more efficient than having userspace
+ unload and reload the entire kdump image on each change.
+
+ If unsure, say Y.
+
+config CRASH_MAX_MEMORY_RANGES
+ int "Specify the maximum number of memory regions for the elfcorehdr"
+ default 8192
+ depends on CRASH_HOTPLUG
+ help
+ For the kexec_file_load() syscall path, specify the maximum number of
+ memory regions that the elfcorehdr buffer/segment can accommodate.
+ These regions are obtained via walk_system_ram_res(); eg. the
+ 'System RAM' entries in /proc/iomem.
+ This value is combined with NR_CPUS_DEFAULT and multiplied by
+ sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ segment size.
+ The value 8192, for example, covers a (sparsely populated) 1TiB system
+ consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ memory buffer/segment size under 1MiB. This represents a sane choice
+ to accommodate both baremetal and virtual machine configurations.
+
endmenu
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index b7c30b748a16..53d211c690a1 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -11,6 +11,8 @@
#include <linux/vmalloc.h>
#include <linux/sizes.h>
#include <linux/kexec.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>

#include <asm/page.h>
#include <asm/sections.h>
@@ -18,6 +20,7 @@
#include <crypto/sha1.h>

#include "kallsyms_internal.h"
+#include "kexec_internal.h"

/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
@@ -697,3 +700,142 @@ static int __init crash_save_vmcoreinfo_init(void)
}

subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+/*
+ * To accurately reflect hot un/plug changes of cpu and memory resources
+ * (including onlining and offlining of those resources), the elfcorehdr
+ * (which is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories.
+ *
+ * In order to make changes to elfcorehdr, two conditions are needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources; the elfcorehdr memory size
+ * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ */
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+ struct kimage *image;
+
+ /* Obtain lock while changing crash information */
+ if (!kexec_trylock()) {
+ pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+ return;
+ }
+
+ /* Nothing to update if kdump is not loaded */
+ if (!kexec_crash_image)
+ goto out;
+
+ image = kexec_crash_image;
+
+ if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+ hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+ pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+ else
+ pr_debug("hp_action %u\n", hp_action);
+
+ /*
+ * The elfcorehdr_index is set to -1 when the struct kimage
+ * is allocated. Find the segment containing the elfcorehdr,
+ * if not already found.
+ */
+ if (image->elfcorehdr_index < 0) {
+ unsigned long mem;
+ unsigned char *ptr;
+ unsigned int n;
+
+ for (n = 0; n < image->nr_segments; n++) {
+ mem = image->segment[n].mem;
+ ptr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+ if (ptr) {
+ /* The segment containing elfcorehdr */
+ if (memcmp(ptr, ELFMAG, SELFMAG) == 0)
+ image->elfcorehdr_index = (int)n;
+ kunmap_local(ptr);
+ }
+ }
+ }
+
+ if (image->elfcorehdr_index < 0) {
+ pr_err("unable to locate elfcorehdr segment\n");
+ goto out;
+ }
+
+ /* Needed in order for the segments to be updated */
+ arch_kexec_unprotect_crashkres();
+
+ /* Differentiate between normal load and hotplug update */
+ image->hp_action = hp_action;
+
+ /* Now invoke arch-specific update handler */
+ arch_crash_handle_hotplug_event(image);
+
+ /* No longer handling a hotplug event */
+ image->hp_action = KEXEC_CRASH_HP_NONE;
+ image->elfcorehdr_updated = true;
+
+ /* Change back to read-only */
+ arch_kexec_protect_crashkres();
+
+ /* Errors in the callback are not a reason to roll back state */
+out:
+ /* Release lock now that update complete */
+ kexec_unlock();
+}
+
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+{
+ switch (val) {
+ case MEM_ONLINE:
+ crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
+ KEXEC_CRASH_HP_INVALID_CPU);
+ break;
+
+ case MEM_OFFLINE:
+ crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
+ KEXEC_CRASH_HP_INVALID_CPU);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+ .notifier_call = crash_memhp_notifier,
+ .priority = 0
+};
+
+static int crash_cpuhp_online(unsigned int cpu)
+{
+ crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+ return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+ crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+ return 0;
+}
+
+static int __init crash_hotplug_init(void)
+{
+ int result = 0;
+
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ register_memory_notifier(&crash_memhp_nb);
+
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+ result = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+ "crash/cpuhp", crash_cpuhp_online, crash_cpuhp_offline);
+ }
+
+ return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 3d578c6fefee..8296d019737c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -277,6 +277,12 @@ struct kimage *do_kimage_alloc_init(void)
/* Initialize the list of unusable pages */
INIT_LIST_HEAD(&image->unusable_pages);

+#ifdef CONFIG_CRASH_HOTPLUG
+ image->hp_action = KEXEC_CRASH_HP_NONE;
+ image->elfcorehdr_index = -1;
+ image->elfcorehdr_updated = false;
+#endif
+
return image;
}

--
2.31.1


2023-06-28 18:58:24

by Eric DeVolder

Subject: [PATCH v24 07/10] x86/crash: add x86 crash hotplug support

When CPU or memory is hot un/plugged, or off/onlined, the crash
elfcorehdr, which describes the CPUs and memory in the system,
must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
and replaces the existing elfcorehdr. The segment containing the
elfcorehdr is identified at run-time in
crash_core:crash_handle_hotplug_event().

No modifications to purgatory (see 'kexec: exclude elfcorehdr
from the segment digest') or boot_params (as the elfcorehdr=
capture kernel command line parameter pointer remains unchanged
and correct) are needed, just elfcorehdr.

For kexec_file_load(), the elfcorehdr segment size is based on
NR_CPUS and CRASH_MAX_MEMORY_RANGES in order to accommodate a
growing number of CPU and memory resources.

For kexec_load(), the userspace kexec utility needs to size the
elfcorehdr segment in the same/similar manner.

To accommodate the kexec_load() syscall in the absence of
kexec_file_load() syscall support, prepare_elf_headers() and
dependents are moved outside of CONFIG_KEXEC_FILE.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/Kconfig | 3 +
arch/x86/include/asm/kexec.h | 15 +++++
arch/x86/kernel/crash.c | 103 ++++++++++++++++++++++++++++++++---
3 files changed, 114 insertions(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 06a4472d0fc0..42c083da7ce4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2058,6 +2058,9 @@ config ARCH_SUPPORTS_KEXEC_JUMP
config ARCH_SUPPORTS_CRASH_DUMP
def_bool X86_64 || (X86_32 && HIGHMEM)

+config ARCH_SUPPORTS_CRASH_HOTPLUG
+ def_bool y
+
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x1000000"
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5b77bbc28f96..9143100ea3ea 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -209,6 +209,21 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);

+#ifdef CONFIG_CRASH_HOTPLUG
+void arch_crash_handle_hotplug_event(struct kimage *image);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+#endif
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cdd92ab43cda..c70a111c44fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -158,8 +158,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
crash_save_cpu(regs, safe_smp_processor_id());
}

-#ifdef CONFIG_KEXEC_FILE
-
static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
{
unsigned int *nr_ranges = arg;
@@ -231,7 +229,7 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)

/* Prepare elf headers. Return addr and size */
static int prepare_elf_headers(struct kimage *image, void **addr,
- unsigned long *sz)
+ unsigned long *sz, unsigned long *nr_mem_ranges)
{
struct crash_mem *cmem;
int ret;
@@ -249,6 +247,9 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
if (ret)
goto out;

+ /* Return the computed number of memory ranges, for hotplug usage */
+ *nr_mem_ranges = cmem->nr_ranges;
+
/* By default prepare 64bit headers */
ret = crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), addr, sz);

@@ -257,6 +258,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
return ret;
}

+#ifdef CONFIG_KEXEC_FILE
static int add_e820_entry(struct boot_params *params, struct e820_entry *entry)
{
unsigned int nr_e820_entries;
@@ -371,18 +373,42 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
int crash_load_segments(struct kimage *image)
{
int ret;
+ unsigned long pnum = 0;
struct kexec_buf kbuf = { .image = image, .buf_min = 0,
.buf_max = ULONG_MAX, .top_down = false };

/* Prepare elf headers and add a segment */
- ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
+ ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz, &pnum);
if (ret)
return ret;

- image->elf_headers = kbuf.buffer;
- image->elf_headers_sz = kbuf.bufsz;
+ image->elf_headers = kbuf.buffer;
+ image->elf_headers_sz = kbuf.bufsz;
+ kbuf.memsz = kbuf.bufsz;
+
+#ifdef CONFIG_CRASH_HOTPLUG
+ /*
+ * The elfcorehdr segment size accounts for VMCOREINFO, kernel_map,
+ * maximum CPUs and maximum memory ranges.
+ */
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ pnum = 2 + CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES;
+ else
+ pnum += 2 + CONFIG_NR_CPUS_DEFAULT;
+
+ if (pnum < (unsigned long)PN_XNUM) {
+ kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+ kbuf.memsz += sizeof(Elf64_Ehdr);
+
+ image->elfcorehdr_index = image->nr_segments;
+
+ /* Mark as usable to crash kernel, else crash kernel fails on boot */
+ image->elf_headers_sz = kbuf.memsz;
+ } else {
+ pr_err("number of Phdrs %lu exceeds max\n", pnum);
+ }
+#endif

- kbuf.memsz = kbuf.bufsz;
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -395,3 +421,66 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#ifdef CONFIG_CRASH_HOTPLUG
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: a pointer to kexec_crash_image
+ *
+ * Prepare the new elfcorehdr and replace the existing elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image)
+{
+ void *elfbuf = NULL, *old_elfcorehdr;
+ unsigned long nr_mem_ranges;
+ unsigned long mem, memsz;
+ unsigned long elfsz = 0;
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz, &nr_mem_ranges)) {
+ pr_err("unable to create new elfcorehdr\n");
+ goto out;
+ }
+
+ /*
+ * Obtain address and size of the elfcorehdr segment, and
+ * check it against the new elfcorehdr buffer.
+ */
+ mem = image->segment[image->elfcorehdr_index].mem;
+ memsz = image->segment[image->elfcorehdr_index].memsz;
+ if (elfsz > memsz) {
+ pr_err("update elfcorehdr elfsz %lu > memsz %lu\n",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * Copy new elfcorehdr over the old elfcorehdr at destination.
+ */
+ old_elfcorehdr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+ if (!old_elfcorehdr) {
+ pr_err("mapping elfcorehdr segment failed\n");
+ goto out;
+ }
+
+ /*
+ * Temporarily invalidate the crash image while the
+ * elfcorehdr is updated.
+ */
+ xchg(&kexec_crash_image, NULL);
+ memcpy_flushcache(old_elfcorehdr, elfbuf, elfsz);
+ xchg(&kexec_crash_image, image);
+ kunmap_local(old_elfcorehdr);
+ pr_debug("updated elfcorehdr\n");
+
+out:
+ vfree(elfbuf);
+}
+#endif
--
2.31.1


2023-06-28 18:58:34

by Eric DeVolder

Subject: [PATCH v24 02/10] drivers/base: refactor memory.c to use .is_visible()

Greg Kroah-Hartman requested that this file use the .is_visible()
method instead of #ifdefs for the attributes in memory.c.

static struct attribute *memory_memblk_attrs[] = {
&dev_attr_phys_index.attr,
&dev_attr_state.attr,
&dev_attr_phys_device.attr,
&dev_attr_removable.attr,
#ifdef CONFIG_MEMORY_HOTREMOVE
&dev_attr_valid_zones.attr,
#endif
NULL
};

and

static struct attribute *memory_root_attrs[] = {
#ifdef CONFIG_ARCH_MEMORY_PROBE
&dev_attr_probe.attr,
#endif

#ifdef CONFIG_MEMORY_FAILURE
&dev_attr_soft_offline_page.attr,
&dev_attr_hard_offline_page.attr,
#endif

&dev_attr_block_size_bytes.attr,
&dev_attr_auto_online_blocks.attr,
NULL
};

To that end:
- the .is_visible() method is implemented, and IS_ENABLED(), rather
than #ifdef, is used to determine the visibility of the attribute.
- the DEVICE_ATTR_xx() attributes are moved outside of #ifdefs, so that
those structs are always present for the memory_memblk_attrs[] and
memory_root_attrs[].
- the #ifdefs guarding the attributes in memory_memblk_attrs[] and
memory_root_attrs[] are moved into the corresponding callback
functions, since those functions must now exist because the
attributes are always compiled in (though not necessarily visible).

No functionality change intended.

Signed-off-by: Eric DeVolder <[email protected]>
---
drivers/base/memory.c | 78 +++++++++++++++++++++++++++++++++++--------
1 file changed, 65 insertions(+), 13 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b456ac213610..f03eda7e1c9c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -405,10 +405,12 @@ static int print_allowed_zone(char *buf, int len, int nid,

return sysfs_emit_at(buf, len, " %s", zone->name);
}
+#endif

static ssize_t valid_zones_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
+#ifdef CONFIG_MEMORY_HOTREMOVE
struct memory_block *mem = to_memory_block(dev);
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -444,9 +446,11 @@ static ssize_t valid_zones_show(struct device *dev,
out:
len += sysfs_emit_at(buf, len, "\n");
return len;
+#else
+ return 0;
+#endif
}
static DEVICE_ATTR_RO(valid_zones);
-#endif

static DEVICE_ATTR_RO(phys_index);
static DEVICE_ATTR_RW(state);
@@ -496,10 +500,10 @@ static DEVICE_ATTR_RW(auto_online_blocks);
* as well as ppc64 will do all of their discovery in userspace
* and will require this interface.
*/
-#ifdef CONFIG_ARCH_MEMORY_PROBE
static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
+#ifdef CONFIG_ARCH_MEMORY_PROBE
u64 phys_addr;
int nid, ret;
unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
@@ -527,12 +531,13 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
out:
unlock_device_hotplug();
return ret;
+#else
+ return 0;
+#endif
}

static DEVICE_ATTR_WO(probe);
-#endif

-#ifdef CONFIG_MEMORY_FAILURE
/*
* Support for offlining pages of memory
*/
@@ -542,6 +547,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -551,6 +557,9 @@ static ssize_t soft_offline_page_store(struct device *dev,
pfn >>= PAGE_SHIFT;
ret = soft_offline_page(pfn, 0);
return ret == 0 ? count : ret;
+#else
+ return 0;
+#endif
}

/* Forcibly offline a page, including killing processes. */
@@ -558,6 +567,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
+#ifdef CONFIG_MEMORY_FAILURE
int ret;
u64 pfn;
if (!capable(CAP_SYS_ADMIN))
@@ -569,11 +579,13 @@ static ssize_t hard_offline_page_store(struct device *dev,
if (ret == -EOPNOTSUPP)
ret = 0;
return ret ? ret : count;
+#else
+ return 0;
+#endif
}

static DEVICE_ATTR_WO(soft_offline_page);
static DEVICE_ATTR_WO(hard_offline_page);
-#endif

/* See phys_device_show(). */
int __weak arch_get_memory_phys_device(unsigned long start_pfn)
@@ -611,14 +623,35 @@ static struct attribute *memory_memblk_attrs[] = {
&dev_attr_state.attr,
&dev_attr_phys_device.attr,
&dev_attr_removable.attr,
-#ifdef CONFIG_MEMORY_HOTREMOVE
&dev_attr_valid_zones.attr,
-#endif
NULL
};

+static umode_t
+memory_memblk_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int unused)
+{
+ umode_t mode = attr->mode;
+
+ if (attr == &dev_attr_phys_index.attr)
+ return mode;
+ if (attr == &dev_attr_state.attr)
+ return mode;
+ if (attr == &dev_attr_phys_device.attr)
+ return mode;
+ if (attr == &dev_attr_removable.attr)
+ return mode;
+ if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) {
+ if (attr == &dev_attr_valid_zones.attr)
+ return mode;
+ }
+
+ return 0;
+}
+
static const struct attribute_group memory_memblk_attr_group = {
.attrs = memory_memblk_attrs,
+ .is_visible = memory_memblk_attr_is_visible,
};

static const struct attribute_group *memory_memblk_attr_groups[] = {
@@ -878,22 +911,41 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
}

static struct attribute *memory_root_attrs[] = {
-#ifdef CONFIG_ARCH_MEMORY_PROBE
&dev_attr_probe.attr,
-#endif
-
-#ifdef CONFIG_MEMORY_FAILURE
&dev_attr_soft_offline_page.attr,
&dev_attr_hard_offline_page.attr,
-#endif
-
&dev_attr_block_size_bytes.attr,
&dev_attr_auto_online_blocks.attr,
NULL
};

+static umode_t
+memory_root_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int unused)
+{
+ umode_t mode = attr->mode;
+
+ if (IS_ENABLED(CONFIG_ARCH_MEMORY_PROBE)) {
+ if (attr == &dev_attr_probe.attr)
+ return mode;
+ }
+ if (IS_ENABLED(CONFIG_MEMORY_FAILURE)) {
+ if (attr == &dev_attr_soft_offline_page.attr)
+ return mode;
+ if (attr == &dev_attr_hard_offline_page.attr)
+ return mode;
+ }
+ if (attr == &dev_attr_block_size_bytes.attr)
+ return mode;
+ if (attr == &dev_attr_auto_online_blocks.attr)
+ return mode;
+
+ return 0;
+}
+
static const struct attribute_group memory_root_attr_group = {
.attrs = memory_root_attrs,
+ .is_visible = memory_root_attr_is_visible,
};

static const struct attribute_group *memory_root_attr_groups[] = {
--
2.31.1


2023-06-28 18:58:34

by Eric DeVolder

Subject: [PATCH v24 08/10] crash: hotplug support for kexec_load()

The hotplug support for kexec_load() requires changes to the
userspace kexec-tools and a little extra help from the kernel.

Given a kdump capture kernel loaded via kexec_load(), and a
subsequent hotplug event, the crash hotplug handler finds the
elfcorehdr and rewrites it to reflect the hotplug change.
That is the desired outcome; however, at kernel panic time,
the purgatory integrity check fails (because the elfcorehdr
changed), and the capture kernel does not boot and no vmcore
is generated.

Therefore, the userspace kexec-tools/kexec must indicate to the
kernel that the elfcorehdr can be modified (because kexec
excluded the elfcorehdr from the digest, and sized the elfcorehdr
memory buffer appropriately).

To facilitate hotplug support with kexec_load():
- a new kexec flag KEXEC_UPDATE_ELFCOREHDR indicates that it is
safe for the kernel to modify the kexec_load()'d elfcorehdr
- the /sys/kernel/crash_elfcorehdr_size node communicates the
preferred size of the elfcorehdr memory buffer
- The sysfs crash_hotplug nodes (ie.
/sys/devices/system/[cpu|memory]/crash_hotplug) dynamically
take into account kexec_file_load() vs kexec_load() and
KEXEC_UPDATE_ELFCOREHDR.
This is critical so that the udev rule processing of crash_hotplug
is all that is needed to determine if the userspace unload-then-load
of the kdump image is to be skipped, or not. The proposed udev
rule change looks like:
# The kernel updates the crash elfcorehdr for CPU and memory changes
SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

The table below indicates the behavior of kexec_load()'d kdump image
updates (with the new udev crash_hotplug rule in place):

       |   Kexec
Kernel | Old | New
-------+-----+-----
Old    |  a  |  a
-------+-----+-----
New    |  a  |  b
-------+-----+-----

where kexec 'old' and 'new' indicate whether kexec-tools has the needed
modifications for the crash hotplug feature, and kernel 'old' and
'new' indicate whether the kernel supports this crash hotplug feature.

Behavior 'a' indicates the unload-then-reload of the entire kdump
image. For the kexec 'old' column, the unload-then-reload occurs
due to the missing flag KEXEC_UPDATE_ELFCOREHDR. An 'old' kernel
(with 'new' kexec) does not present the crash_hotplug sysfs node,
which leads to the unload-then-reload of the kdump image.

Behavior 'b' indicates the desired optimized behavior of the kernel
directly modifying the elfcorehdr and avoiding the unload-then-reload
of the kdump image.

If the udev rule is not updated with the crash_hotplug node check,
then regardless of whether the kernel and kexec are new or old, the
kdump image continues to be unloaded-then-reloaded on hotplug changes.

To fully support the crash hotplug feature, there needs to be a rollout
of kernel, kexec-tools and udev rule changes. However, the order of
the rollout of these pieces does not matter; kexec_load()'d kdump
images still function for hotplug as-is.

Suggested-by: Hari Bathini <[email protected]>
Signed-off-by: Eric DeVolder <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/include/asm/kexec.h | 11 +++++++----
arch/x86/kernel/crash.c | 27 +++++++++++++++++++++++++++
include/linux/kexec.h | 14 ++++++++++++--
include/uapi/linux/kexec.h | 1 +
kernel/Kconfig.kexec | 4 ++++
kernel/crash_core.c | 31 +++++++++++++++++++++++++++++++
kernel/kexec.c | 5 +++++
kernel/ksysfs.c | 15 +++++++++++++++
8 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9143100ea3ea..3be6a98751f0 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event

#ifdef CONFIG_HOTPLUG_CPU
-static inline int crash_hotplug_cpu_support(void) { return 1; }
-#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(void);
+#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
#endif

#ifdef CONFIG_MEMORY_HOTPLUG
-static inline int crash_hotplug_memory_support(void) { return 1; }
-#define crash_hotplug_memory_support crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(void);
+#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
#endif

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c70a111c44fa..caf22bcb61af 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -427,6 +427,33 @@ int crash_load_segments(struct kimage *image)
#undef pr_fmt
#define pr_fmt(fmt) "crash hp: " fmt

+/* These functions provide the value for the sysfs crash_hotplug nodes */
+#ifdef CONFIG_HOTPLUG_CPU
+int arch_crash_hotplug_cpu_support(void)
+{
+ return crash_check_update_elfcorehdr();
+}
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_crash_hotplug_memory_support(void)
+{
+ return crash_check_update_elfcorehdr();
+}
+#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void)
+{
+ unsigned int sz;
+
+ /* kernel_map, VMCOREINFO and maximum CPUs */
+ sz = 2 + CONFIG_NR_CPUS_DEFAULT;
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ sz += CONFIG_CRASH_MAX_MEMORY_RANGES;
+ sz *= sizeof(Elf64_Phdr);
+ return sz;
+}
+
/**
* arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
* @image: a pointer to kexec_crash_image
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 6a8a724ac638..bb0e614f2a05 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -335,6 +335,10 @@ struct kimage {
unsigned int preserve_context : 1;
/* If set, we are using file mode kexec syscall */
unsigned int file_mode:1;
+#ifdef CONFIG_CRASH_HOTPLUG
+ /* If set, allow changes to elfcorehdr of kexec_load'd image */
+ unsigned int update_elfcorehdr:1;
+#endif

#ifdef ARCH_HAS_KIMAGE_ARCH
struct kimage_arch arch;
@@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);

/* List of defined/legal kexec flags */
#ifndef CONFIG_KEXEC_JUMP
-#define KEXEC_FLAGS KEXEC_ON_CRASH
+#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
#else
-#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
+#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
#endif

/* List of defined/legal kexec file flags */
@@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
#endif

+int crash_check_update_elfcorehdr(void);
+
#ifndef crash_hotplug_cpu_support
static inline int crash_hotplug_cpu_support(void) { return 0; }
#endif
@@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
static inline int crash_hotplug_memory_support(void) { return 0; }
#endif

+#ifndef crash_get_elfcorehdr_size
+static inline unsigned int crash_get_elfcorehdr_size(void) { return 0; }
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 981016e05cfa..01766dd839b0 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -12,6 +12,7 @@
/* kexec flags for different usage scenarios */
#define KEXEC_ON_CRASH 0x00000001
#define KEXEC_PRESERVE_CONTEXT 0x00000002
+#define KEXEC_UPDATE_ELFCOREHDR 0x00000004
#define KEXEC_ARCH_MASK 0xffff0000

/*
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 7eb42a795176..8a524b8ff6a2 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -138,4 +138,8 @@ config CRASH_MAX_MEMORY_RANGES
memory buffer/segment size under 1MiB. This represents a sane choice
to accommodate both baremetal and virtual machine configurations.

+ For the kexec_load() syscall path, CRASH_MAX_MEMORY_RANGES is part of
+ the computation behind the value provided through the
+ /sys/kernel/crash_elfcorehdr_size attribute.
+
endmenu
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 53d211c690a1..fa918176d46d 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
#ifdef CONFIG_CRASH_HOTPLUG
#undef pr_fmt
#define pr_fmt(fmt) "crash hp: " fmt
+
+/*
+ * This routine utilized when the crash_hotplug sysfs node is read.
+ * It reflects the kernel's ability/permission to update the crash
+ * elfcorehdr directly.
+ */
+int crash_check_update_elfcorehdr(void)
+{
+ int rc = 0;
+
+ /* Obtain lock while reading crash information */
+ if (!kexec_trylock()) {
+ pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
+ return 0;
+ }
+ if (kexec_crash_image) {
+ if (kexec_crash_image->file_mode)
+ rc = 1;
+ else
+ rc = kexec_crash_image->update_elfcorehdr;
+ }
+ /* Release lock now that update complete */
+ kexec_unlock();
+
+ return rc;
+}
+
/*
* To accurately reflect hot un/plug changes of cpu and memory resources
* (including onling and offlining of those resources), the elfcorehdr
@@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)

image = kexec_crash_image;

+ /* Check that updating elfcorehdr is permitted */
+ if (!(image->file_mode || image->update_elfcorehdr))
+ goto out;
+
if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 92d301f98776..107f355eac10 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -129,6 +129,11 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
if (flags & KEXEC_PRESERVE_CONTEXT)
image->preserve_context = 1;

+#ifdef CONFIG_CRASH_HOTPLUG
+ if (flags & KEXEC_UPDATE_ELFCOREHDR)
+ image->update_elfcorehdr = 1;
+#endif
+
ret = machine_kexec_prepare(image);
if (ret)
goto out;
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index aad7a3bfd846..1d4bc493b2f4 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
}
KERNEL_ATTR_RO(vmcoreinfo);

+#ifdef CONFIG_CRASH_HOTPLUG
+static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ unsigned int sz = crash_get_elfcorehdr_size();
+
+ return sysfs_emit(buf, "%u\n", sz);
+}
+KERNEL_ATTR_RO(crash_elfcorehdr_size);
+
+#endif
+
#endif /* CONFIG_CRASH_CORE */

/* whether file capabilities are enabled */
@@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
#endif
#ifdef CONFIG_CRASH_CORE
&vmcoreinfo_attr.attr,
+#ifdef CONFIG_CRASH_HOTPLUG
+ &crash_elfcorehdr_size_attr.attr,
+#endif
#endif
#ifndef CONFIG_TINY_RCU
&rcu_expedited_attr.attr,
--
2.31.1
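For a concrete sense of the size reported through
/sys/kernel/crash_elfcorehdr_size, here is a userspace C restatement
of the arithmetic in arch_crash_get_elfcorehdr_size() above.
CONFIG_NR_CPUS_DEFAULT, CONFIG_CRASH_MAX_MEMORY_RANGES and
CONFIG_MEMORY_HOTPLUG become parameters in this version. The function
name is hypothetical, and the values used below are examples, not the
Kconfig defaults of any particular build.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Each reserved entry is one ELF64 program header. */
static_assert(sizeof(Elf64_Phdr) == 56, "Elf64_Phdr is 56 bytes");

/*
 * Hypothetical userspace mirror of arch_crash_get_elfcorehdr_size(),
 * with the Kconfig values turned into parameters.
 */
size_t elfcorehdr_size(unsigned int nr_cpus, unsigned int max_mem_ranges,
		       bool memory_hotplug)
{
	/* kernel_map, VMCOREINFO and one entry per possible CPU */
	size_t nr_phdrs = 2 + (size_t)nr_cpus;

	/* Memory ranges are reserved only when memory hotplug is configured. */
	if (memory_hotplug)
		nr_phdrs += max_mem_ranges;

	return nr_phdrs * sizeof(Elf64_Phdr);
}
```

For example, 64 CPUs and 8192 memory ranges with memory hotplug
enabled give (2 + 64 + 8192) * 56 = 462448 bytes.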


2023-06-28 19:00:52

by Eric DeVolder

[permalink] [raw]
Subject: [PATCH v24 10/10] x86/crash: optimize CPU changes

crash_prepare_elf64_headers() writes into the elfcorehdr an ELF
PT_NOTE for all possible CPUs. As such, subsequent changes to CPUs
(i.e. hot un/plug, online/offline) do not need to rewrite the elfcorehdr.

The kimage->file_mode term covers kdump images loaded via the
kexec_file_load() syscall. Since crash_prepare_elf64_headers()
wrote the initial elfcorehdr, no update to the elfcorehdr is
needed for CPU changes.

The kimage->elfcorehdr_updated term covers kdump images loaded via
the kexec_load() syscall. At least one memory or CPU change must occur
to cause crash_prepare_elf64_headers() to rewrite the elfcorehdr.
Afterwards, no update to the elfcorehdr is needed for CPU changes.

This code is intentionally *NOT* hoisted into
crash_handle_hotplug_event(), because doing so would prevent the
arch-specific handler from running for CPU changes. That would break
PPC, for example, which needs to update information other than the
elfcorehdr on CPU changes.

Signed-off-by: Eric DeVolder <[email protected]>
Reviewed-by: Sourabh Jain <[email protected]>
Acked-by: Hari Bathini <[email protected]>
Acked-by: Baoquan He <[email protected]>
---
arch/x86/kernel/crash.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index caf22bcb61af..18d2a18d1073 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -467,6 +467,16 @@ void arch_crash_handle_hotplug_event(struct kimage *image)
unsigned long mem, memsz;
unsigned long elfsz = 0;

+ /*
+ * As crash_prepare_elf64_headers() has already described all
+ * possible CPUs, there is no need to update the elfcorehdr
+ * for additional CPU changes.
+ */
+ if ((image->file_mode || image->elfcorehdr_updated) &&
+ ((image->hp_action == KEXEC_CRASH_HP_ADD_CPU) ||
+ (image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU)))
+ return;
+
/*
* Create the new elfcorehdr reflecting the changes to CPU and/or
* memory resources.
--
2.31.1
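The early return added above can be restated as a small predicate.
This sketch uses hypothetical stand-ins for the kimage fields and the
KEXEC_CRASH_HP_* actions. It assumes that elfcorehdr_updated is set
once the generic handler has processed a hotplug event; that field is
introduced elsewhere in the series. The sketch also shows why a
kexec_load()'d image still needs one rewrite before CPU events can be
skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's hotplug action codes. */
enum hp_action {
	HP_NONE,
	HP_ADD_CPU,
	HP_REMOVE_CPU,
	HP_ADD_MEMORY,
	HP_REMOVE_MEMORY,
};

/* Only the kimage fields the x86 check reads (hypothetical mirror). */
struct image_state {
	bool file_mode;			/* loaded via kexec_file_load() */
	bool elfcorehdr_updated;	/* a hotplug event was already handled */
	enum hp_action hp_action;
};

/* True when the x86 handler can return early for this event. */
bool skip_elfcorehdr_update(const struct image_state *img)
{
	bool cpu_event = img->hp_action == HP_ADD_CPU ||
			 img->hp_action == HP_REMOVE_CPU;

	return (img->file_mode || img->elfcorehdr_updated) && cpu_event;
}

/* Simulated event handling; returns true if the elfcorehdr was rewritten. */
bool handle_event(struct image_state *img, enum hp_action action)
{
	bool rewritten;

	assert(action != HP_NONE);
	img->hp_action = action;
	rewritten = !skip_elfcorehdr_update(img);
	/* Assumed: afterwards the elfcorehdr covers all possible CPUs. */
	img->elfcorehdr_updated = true;
	img->hp_action = HP_NONE;
	return rewritten;
}
```

A kexec_file_load()'d image skips CPU events from the start. A
kexec_load()'d image rewrites the elfcorehdr on its first event and
skips CPU events after that. Memory events are never skipped.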


2023-06-29 13:36:10

by Eric DeVolder

[permalink] [raw]
Subject: Re: [PATCH v24 02/10] drivers/base: refactor memory.c to use .is_visible()

I still need to convert the ifdefs within the functions to IS_ENABLED(), my apologies.
eric
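The .is_visible() approach described in the quoted patch can be
illustrated outside the kernel. In this sketch, struct attribute,
umode_t and the CFG_* switches are simplified stand-ins. The real
callback also takes a struct kobject * and an index, and the real
switches are IS_ENABLED(CONFIG_...). Every attribute is always
defined, and the callback decides per attribute whether sysfs creates
the file. Returning 0 hides the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's umode_t and struct attribute. */
typedef unsigned short umode_t;
struct attribute {
	const char *name;
	umode_t mode;
};

/* Hypothetical config switches; in the kernel, IS_ENABLED(CONFIG_...). */
#define CFG_ARCH_MEMORY_PROBE 1
#define CFG_MEMORY_FAILURE    0

/* Attributes are always compiled in; no #ifdef around the definitions. */
static struct attribute attr_probe = { "probe", 0200 };
static struct attribute attr_soft_offline = { "soft_offline_page", 0200 };
static struct attribute attr_block_size = { "block_size_bytes", 0444 };

/* Returns the file mode to create the attribute, or 0 to hide it. */
umode_t root_attr_is_visible(const struct attribute *attr)
{
	if (attr == &attr_probe)
		return CFG_ARCH_MEMORY_PROBE ? attr->mode : 0;
	if (attr == &attr_soft_offline)
		return CFG_MEMORY_FAILURE ? attr->mode : 0;
	return attr->mode;
}

/* Counts the files a NULL-terminated group would create. */
size_t visible_count(struct attribute *const *attrs)
{
	size_t n = 0;

	assert(attrs != NULL);
	for (; *attrs != NULL; attrs++)
		if (root_attr_is_visible(*attrs))
			n++;
	return n;
}
```

The switches are ordinary constant expressions rather than #ifdefs, so
both branches are still compiled and type-checked, and the compiler
can discard the dead one. The same reasoning applies to converting the
remaining in-function #ifdefs to IS_ENABLED().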

On 6/28/23 13:52, Eric DeVolder wrote:
> Greg Kroah-Hartman requested that this file use the .is_visible()
> method instead of #ifdefs for the attributes in memory.c.
>
> static struct attribute *memory_memblk_attrs[] = {
> &dev_attr_phys_index.attr,
> &dev_attr_state.attr,
> &dev_attr_phys_device.attr,
> &dev_attr_removable.attr,
> #ifdef CONFIG_MEMORY_HOTREMOVE
> &dev_attr_valid_zones.attr,
> #endif
> NULL
> };
>
> and
>
> static struct attribute *memory_root_attrs[] = {
> #ifdef CONFIG_ARCH_MEMORY_PROBE
> &dev_attr_probe.attr,
> #endif
>
> #ifdef CONFIG_MEMORY_FAILURE
> &dev_attr_soft_offline_page.attr,
> &dev_attr_hard_offline_page.attr,
> #endif
>
> &dev_attr_block_size_bytes.attr,
> &dev_attr_auto_online_blocks.attr,
> NULL
> };
>
> To that end:
> - the .is_visible() method is implemented, and IS_ENABLED(), rather
> than #ifdef, is used to determine the visibility of the attribute.
> - the DEVICE_ATTR_xx() attributes are moved outside of #ifdefs, so that
> those structs are always present for the memory_memblk_attrs[] and
> memory_root_attrs[].
> - the #ifdefs guarding the attributes in memory_memblk_attrs[] and
> memory_root_attrs[] are moved into the corresponding callback functions,
> as each callback must now exist given that its attribute is always
> compiled in (though not necessarily visible).
>
> No functionality change intended.
>
> Signed-off-by: Eric DeVolder <[email protected]>
> ---
> drivers/base/memory.c | 78 +++++++++++++++++++++++++++++++++++--------
> 1 file changed, 65 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index b456ac213610..f03eda7e1c9c 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -405,10 +405,12 @@ static int print_allowed_zone(char *buf, int len, int nid,
>
> return sysfs_emit_at(buf, len, " %s", zone->name);
> }
> +#endif
>
> static ssize_t valid_zones_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> {
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> struct memory_block *mem = to_memory_block(dev);
> unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> @@ -444,9 +446,11 @@ static ssize_t valid_zones_show(struct device *dev,
> out:
> len += sysfs_emit_at(buf, len, "\n");
> return len;
> +#else
> + return 0;
> +#endif
> }
> static DEVICE_ATTR_RO(valid_zones);
> -#endif
>
> static DEVICE_ATTR_RO(phys_index);
> static DEVICE_ATTR_RW(state);
> @@ -496,10 +500,10 @@ static DEVICE_ATTR_RW(auto_online_blocks);
> * as well as ppc64 will do all of their discovery in userspace
> * and will require this interface.
> */
> -#ifdef CONFIG_ARCH_MEMORY_PROBE
> static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> +#ifdef CONFIG_ARCH_MEMORY_PROBE
> u64 phys_addr;
> int nid, ret;
> unsigned long pages_per_block = PAGES_PER_SECTION * sections_per_block;
> @@ -527,12 +531,13 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
> out:
> unlock_device_hotplug();
> return ret;
> +#else
> + return 0;
> +#endif
> }
>
> static DEVICE_ATTR_WO(probe);
> -#endif
>
> -#ifdef CONFIG_MEMORY_FAILURE
> /*
> * Support for offlining pages of memory
> */
> @@ -542,6 +547,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t count)
> {
> +#ifdef CONFIG_MEMORY_FAILURE
> int ret;
> u64 pfn;
> if (!capable(CAP_SYS_ADMIN))
> @@ -551,6 +557,9 @@ static ssize_t soft_offline_page_store(struct device *dev,
> pfn >>= PAGE_SHIFT;
> ret = soft_offline_page(pfn, 0);
> return ret == 0 ? count : ret;
> +#else
> + return 0;
> +#endif
> }
>
> /* Forcibly offline a page, including killing processes. */
> @@ -558,6 +567,7 @@ static ssize_t hard_offline_page_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t count)
> {
> +#ifdef CONFIG_MEMORY_FAILURE
> int ret;
> u64 pfn;
> if (!capable(CAP_SYS_ADMIN))
> @@ -569,11 +579,13 @@ static ssize_t hard_offline_page_store(struct device *dev,
> if (ret == -EOPNOTSUPP)
> ret = 0;
> return ret ? ret : count;
> +#else
> + return 0;
> +#endif
> }
>
> static DEVICE_ATTR_WO(soft_offline_page);
> static DEVICE_ATTR_WO(hard_offline_page);
> -#endif
>
> /* See phys_device_show(). */
> int __weak arch_get_memory_phys_device(unsigned long start_pfn)
> @@ -611,14 +623,35 @@ static struct attribute *memory_memblk_attrs[] = {
> &dev_attr_state.attr,
> &dev_attr_phys_device.attr,
> &dev_attr_removable.attr,
> -#ifdef CONFIG_MEMORY_HOTREMOVE
> &dev_attr_valid_zones.attr,
> -#endif
> NULL
> };
>
> +static umode_t
> +memory_memblk_attr_is_visible(struct kobject *kobj,
> + struct attribute *attr, int unused)
> +{
> + umode_t mode = attr->mode;
> +
> + if (attr == &dev_attr_phys_index.attr)
> + return mode;
> + if (attr == &dev_attr_state.attr)
> + return mode;
> + if (attr == &dev_attr_phys_device.attr)
> + return mode;
> + if (attr == &dev_attr_removable.attr)
> + return mode;
> + if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE)) {
> + if (attr == &dev_attr_valid_zones.attr)
> + return mode;
> + }
> +
> + return 0;
> +}
> +
> static const struct attribute_group memory_memblk_attr_group = {
> .attrs = memory_memblk_attrs,
> + .is_visible = memory_memblk_attr_is_visible,
> };
>
> static const struct attribute_group *memory_memblk_attr_groups[] = {
> @@ -878,22 +911,41 @@ void remove_memory_block_devices(unsigned long start, unsigned long size)
> }
>
> static struct attribute *memory_root_attrs[] = {
> -#ifdef CONFIG_ARCH_MEMORY_PROBE
> &dev_attr_probe.attr,
> -#endif
> -
> -#ifdef CONFIG_MEMORY_FAILURE
> &dev_attr_soft_offline_page.attr,
> &dev_attr_hard_offline_page.attr,
> -#endif
> -
> &dev_attr_block_size_bytes.attr,
> &dev_attr_auto_online_blocks.attr,
> NULL
> };
>
> +static umode_t
> +memory_root_attr_is_visible(struct kobject *kobj,
> + struct attribute *attr, int unused)
> +{
> + umode_t mode = attr->mode;
> +
> + if (IS_ENABLED(CONFIG_ARCH_MEMORY_PROBE)) {
> + if (attr == &dev_attr_probe.attr)
> + return mode;
> + }
> + if (IS_ENABLED(CONFIG_MEMORY_FAILURE)) {
> + if (attr == &dev_attr_soft_offline_page.attr)
> + return mode;
> + if (attr == &dev_attr_hard_offline_page.attr)
> + return mode;
> + }
> + if (attr == &dev_attr_block_size_bytes.attr)
> + return mode;
> + if (attr == &dev_attr_auto_online_blocks.attr)
> + return mode;
> +
> + return 0;
> +}
> +
> static const struct attribute_group memory_root_attr_group = {
> .attrs = memory_root_attrs,
> + .is_visible = memory_root_attr_is_visible,
> };
>
> static const struct attribute_group *memory_root_attr_groups[] = {