2021-04-19 00:56:48

by Nick Kossifidis

[permalink] [raw]
Subject: [PATCH v4 0/5] RISC-V: Add kexec/kdump support

This patch series adds kexec/kdump and crash kernel
support on RISC-V. For testing the patches a patched
version of kexec-tools is needed (still a work in
progress) which can be found at:

https://riscv.ics.forth.gr/kexec-tools-patched.tar.xz

v4:
* Rebase on top of "fixes" branch
* Resolve Alex's comments

v3:
* Rebase on newer kernel tree
* Minor cleanups
* Split UAPI changes to a separate patch
* Improve / cleanup init_resources
* Resolve Palmer's comments

v2:
* Rebase on newer kernel tree
* Minor cleanups
* Properly populate the ioresources tre, so that it
can be used later on for implementing strict /dev/mem
* Use linux,usable-memory on /memory instead of a new binding
* USe a reserved-memory node for ELF core header

Nick Kossifidis (5):
RISC-V: Add EM_RISCV to kexec UAPI header
RISC-V: Add kexec support
RISC-V: Improve init_resources
RISC-V: Add kdump support
RISC-V: Add crash kernel support

arch/riscv/Kconfig | 25 ++++
arch/riscv/include/asm/elf.h | 6 +
arch/riscv/include/asm/kexec.h | 56 +++++++
arch/riscv/kernel/Makefile | 6 +
arch/riscv/kernel/crash_dump.c | 46 ++++++
arch/riscv/kernel/crash_save_regs.S | 56 +++++++
arch/riscv/kernel/kexec_relocate.S | 223 ++++++++++++++++++++++++++++
arch/riscv/kernel/machine_kexec.c | 193 ++++++++++++++++++++++++
arch/riscv/kernel/setup.c | 114 ++++++++------
arch/riscv/mm/init.c | 104 +++++++++++++
include/uapi/linux/kexec.h | 1 +
11 files changed, 784 insertions(+), 46 deletions(-)
create mode 100644 arch/riscv/include/asm/kexec.h
create mode 100644 arch/riscv/kernel/crash_dump.c
create mode 100644 arch/riscv/kernel/crash_save_regs.S
create mode 100644 arch/riscv/kernel/kexec_relocate.S
create mode 100644 arch/riscv/kernel/machine_kexec.c

--
2.26.2


2021-04-19 00:56:57

by Nick Kossifidis

[permalink] [raw]
Subject: [PATCH v4 1/5] RISC-V: Add EM_RISCV to kexec UAPI header

Add RISC-V to the list of supported kexec architectures, we need to
add the definition early-on so that later patches can use it.

EM_RISCV is 243 as per ELF psABI specification here:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md

Signed-off-by: Nick Kossifidis <[email protected]>
---
include/uapi/linux/kexec.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 05669c87a..778dc191c 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -42,6 +42,7 @@
#define KEXEC_ARCH_MIPS_LE (10 << 16)
#define KEXEC_ARCH_MIPS ( 8 << 16)
#define KEXEC_ARCH_AARCH64 (183 << 16)
+#define KEXEC_ARCH_RISCV (243 << 16)

/* The artificial cap on the number of segments passed to kexec_load. */
#define KEXEC_SEGMENT_MAX 16
--
2.26.2

2021-04-19 00:57:09

by Nick Kossifidis

[permalink] [raw]
Subject: [PATCH v4 2/5] RISC-V: Add kexec support

This patch adds support for kexec on RISC-V. On SMP systems it depends
on HOTPLUG_CPU in order to be able to bring up all harts after kexec.
It also needs a recent OpenSBI version that supports the HSM extension.
I tested it on riscv64 QEMU on both an smp and a non-smp system.

v6:
* Re-based on top of "fixes" branch
* Use PAGE_SIZE and CSR_* macros where possible
* Small cleanups

v5:
* For now depend on MMU, further changes needed for NOMMU support
* Make sure stvec is aligned
* Cleanup some unneeded fences
* Verify control code's buffer size
* Compile kexec_relocate.S with medany and norelax

v4:
* No functional changes, just re-based

v3:
* Use the new smp_shutdown_nonboot_cpus() call.
* Move riscv_kexec_relocate to .rodata

v2:
* Pass needed parameters as arguments to riscv_kexec_relocate
instead of using global variables.
* Use kimage_arch to hold the fdt address of the included fdt.
* Use SYM_* macros on kexec_relocate.S.
* Compatibility with STRICT_KERNEL_RWX.
* Compatibility with HOTPLUG_CPU for SMP
* Small cleanups

Signed-off-by: Nick Kossifidis <[email protected]>
---
arch/riscv/Kconfig | 15 +++
arch/riscv/include/asm/kexec.h | 49 ++++++++
arch/riscv/kernel/Makefile | 5 +
arch/riscv/kernel/kexec_relocate.S | 157 ++++++++++++++++++++++++
arch/riscv/kernel/machine_kexec.c | 186 +++++++++++++++++++++++++++++
5 files changed, 412 insertions(+)
create mode 100644 arch/riscv/include/asm/kexec.h
create mode 100644 arch/riscv/kernel/kexec_relocate.S
create mode 100644 arch/riscv/kernel/machine_kexec.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 4515a10c5..10cc19be0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -386,6 +386,21 @@ config RISCV_SBI_V01
help
This config allows kernel to use SBI v0.1 APIs. This will be
deprecated in future once legacy M-mode software are no longer in use.
+
+config KEXEC
+ bool "Kexec system call"
+ select KEXEC_CORE
+ select HOTPLUG_CPU if SMP
+ depends on MMU
+ help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+
endmenu

menu "Boot options"
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
new file mode 100644
index 000000000..86e6e4922
--- /dev/null
+++ b/arch/riscv/include/asm/kexec.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <[email protected]>
+ */
+
+#ifndef _RISCV_KEXEC_H
+#define _RISCV_KEXEC_H
+
+#include <asm/page.h> /* For PAGE_SIZE */
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* Reserve a page for the control code buffer */
+#define KEXEC_CONTROL_PAGE_SIZE PAGE_SIZE
+
+#define KEXEC_ARCH KEXEC_ARCH_RISCV
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs,
+ struct pt_regs *oldregs)
+{
+ /* Dummy implementation for now */
+}
+
+
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+ unsigned long fdt_addr;
+};
+
+const extern unsigned char riscv_kexec_relocate[];
+const extern unsigned int riscv_kexec_relocate_size;
+
+typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
+ unsigned long jump_addr,
+ unsigned long fdt_addr,
+ unsigned long hartid,
+ unsigned long va_pa_off);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 647a47f54..211ef87de 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -10,6 +10,10 @@ CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
endif
CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)

+ifdef CONFIG_KEXEC
+AFLAGS_kexec_relocate.o := -mcmodel=medany -mno-relax
+endif
+
extra-y += head.o
extra-y += vmlinux.lds

@@ -55,6 +59,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
+obj-$(CONFIG_KEXEC) += kexec_relocate.o machine_kexec.o

obj-$(CONFIG_JUMP_LABEL) += jump_label.o

diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
new file mode 100644
index 000000000..84101d0a2
--- /dev/null
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <[email protected]>
+ */
+
+#include <asm/asm.h> /* For RISCV_* and REG_* macros */
+#include <asm/csr.h> /* For CSR_* macros */
+#include <asm/page.h> /* For PAGE_SIZE */
+#include <linux/linkage.h> /* For SYM_* macros */
+
+.section ".rodata"
+SYM_CODE_START(riscv_kexec_relocate)
+
+ /*
+ * s0: Pointer to the current entry
+ * s1: (const) Phys address to jump to after relocation
+ * s2: (const) Phys address of the FDT image
+ * s3: (const) The hartid of the current hart
+ * s4: Pointer to the destination address for the relocation
+ * s5: (const) Number of words per page
+ * s6: (const) 1, used for subtraction
+ * s7: (const) va_pa_offset, used when switching MMU off
+ * s8: (const) Physical address of the main loop
+ * s9: (debug) indirection page counter
+ * s10: (debug) entry counter
+ * s11: (debug) copied words counter
+ */
+ mv s0, a0
+ mv s1, a1
+ mv s2, a2
+ mv s3, a3
+ mv s4, zero
+ li s5, (PAGE_SIZE / RISCV_SZPTR)
+ li s6, 1
+ mv s7, a4
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ /* Disable / cleanup interrupts */
+ csrw CSR_SIE, zero
+ csrw CSR_SIP, zero
+
+ /*
+ * When we switch SATP.MODE to "Bare" we'll only
+ * play with physical addresses. However the first time
+ * we try to jump somewhere, the offset on the jump
+ * will be relative to pc which will still be on VA. To
+ * deal with this we set stvec to the physical address at
+ * the start of the loop below so that we jump there in
+ * any case.
+ */
+ la s8, 1f
+ sub s8, s8, s7
+ csrw CSR_STVEC, s8
+
+ /* Process entries in a loop */
+.align 2
+1:
+ addi s10, s10, 1
+ REG_L t0, 0(s0) /* t0 = *image->entry */
+ addi s0, s0, RISCV_SZPTR /* image->entry++ */
+
+ /* IND_DESTINATION entry ? -> save destination address */
+ andi t1, t0, 0x1
+ beqz t1, 2f
+ andi s4, t0, ~0x1
+ j 1b
+
+2:
+ /* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
+ andi t1, t0, 0x2
+ beqz t1, 2f
+ andi s0, t0, ~0x2
+ addi s9, s9, 1
+ csrw CSR_SATP, zero
+ jalr zero, s8, 0
+
+2:
+ /* IND_DONE entry ? -> jump to done label */
+ andi t1, t0, 0x4
+ beqz t1, 2f
+ j 4f
+
+2:
+ /*
+ * IND_SOURCE entry ? -> copy page word by word to the
+ * destination address we got from IND_DESTINATION
+ */
+ andi t1, t0, 0x8
+ beqz t1, 1b /* Unknown entry type, ignore it */
+ andi t0, t0, ~0x8
+ mv t3, s5 /* i = num words per page */
+3: /* copy loop */
+ REG_L t1, (t0) /* t1 = *src_ptr */
+ REG_S t1, (s4) /* *dst_ptr = *src_ptr */
+ addi t0, t0, RISCV_SZPTR /* stc_ptr++ */
+ addi s4, s4, RISCV_SZPTR /* dst_ptr++ */
+ sub t3, t3, s6 /* i-- */
+ addi s11, s11, 1 /* c++ */
+ beqz t3, 1b /* copy done ? */
+ j 3b
+
+4:
+ /* Pass the arguments to the next kernel / Cleanup*/
+ mv a0, s3
+ mv a1, s2
+ mv a2, s1
+
+ /* Cleanup */
+ mv a3, zero
+ mv a4, zero
+ mv a5, zero
+ mv a6, zero
+ mv a7, zero
+
+ mv s0, zero
+ mv s1, zero
+ mv s2, zero
+ mv s3, zero
+ mv s4, zero
+ mv s5, zero
+ mv s6, zero
+ mv s7, zero
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ mv t0, zero
+ mv t1, zero
+ mv t2, zero
+ mv t3, zero
+ mv t4, zero
+ mv t5, zero
+ mv t6, zero
+ csrw CSR_SEPC, zero
+ csrw CSR_SCAUSE, zero
+ csrw CSR_SSCRATCH, zero
+
+ /*
+ * Make sure the relocated code is visible
+ * and jump to the new kernel
+ */
+ fence.i
+
+ jalr zero, a2, 0
+
+SYM_CODE_END(riscv_kexec_relocate)
+riscv_kexec_relocate_end:
+
+ .section ".rodata"
+SYM_DATA(riscv_kexec_relocate_size,
+ .long riscv_kexec_relocate_end - riscv_kexec_relocate)
+
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
new file mode 100644
index 000000000..cd6803311
--- /dev/null
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <[email protected]>
+ */
+
+#include <linux/kexec.h>
+#include <asm/kexec.h> /* For riscv_kexec_* symbol defines */
+#include <linux/smp.h> /* For smp_send_stop () */
+#include <asm/cacheflush.h> /* For local_flush_icache_all() */
+#include <asm/barrier.h> /* For smp_wmb() */
+#include <asm/page.h> /* For PAGE_MASK */
+#include <linux/libfdt.h> /* For fdt_check_header() */
+#include <asm/set_memory.h> /* For set_memory_x() */
+#include <linux/compiler.h> /* For unreachable() */
+#include <linux/cpu.h> /* For cpu_down() */
+
+/**
+ * kexec_image_info - Print received image details
+ */
+static void
+kexec_image_info(const struct kimage *image)
+{
+ unsigned long i;
+
+ pr_debug("Kexec image info:\n");
+ pr_debug("\ttype: %d\n", image->type);
+ pr_debug("\tstart: %lx\n", image->start);
+ pr_debug("\thead: %lx\n", image->head);
+ pr_debug("\tnr_segments: %lu\n", image->nr_segments);
+
+ for (i = 0; i < image->nr_segments; i++) {
+ pr_debug("\t segment[%lu]: %016lx - %016lx", i,
+ image->segment[i].mem,
+ image->segment[i].mem + image->segment[i].memsz);
+ pr_debug("\t\t0x%lx bytes, %lu pages\n",
+ (unsigned long) image->segment[i].memsz,
+ (unsigned long) image->segment[i].memsz / PAGE_SIZE);
+ }
+}
+
+/**
+ * machine_kexec_prepare - Initialize kexec
+ *
+ * This function is called from do_kexec_load, when the user has
+ * provided us with an image to be loaded. Its goal is to validate
+ * the image and prepare the control code buffer as needed.
+ * Note that kimage_alloc_init has already been called and the
+ * control buffer has already been allocated.
+ */
+int
+machine_kexec_prepare(struct kimage *image)
+{
+ struct kimage_arch *internal = &image->arch;
+ struct fdt_header fdt = {0};
+ void *control_code_buffer = NULL;
+ unsigned int control_code_buffer_sz = 0;
+ int i = 0;
+
+ kexec_image_info(image);
+
+ if (image->type == KEXEC_TYPE_CRASH) {
+ pr_warn("Loading a crash kernel is unsupported for now.\n");
+ return -EINVAL;
+ }
+
+ /* Find the Flattened Device Tree and save its physical address */
+ for (i = 0; i < image->nr_segments; i++) {
+ if (image->segment[i].memsz <= sizeof(fdt))
+ continue;
+
+ if (copy_from_user(&fdt, image->segment[i].buf, sizeof(fdt)))
+ continue;
+
+ if (fdt_check_header(&fdt))
+ continue;
+
+ internal->fdt_addr = (unsigned long) image->segment[i].mem;
+ break;
+ }
+
+ if (!internal->fdt_addr) {
+ pr_err("Device tree not included in the provided image\n");
+ return -EINVAL;
+ }
+
+ /* Copy the assembler code for relocation to the control page */
+ control_code_buffer = page_address(image->control_code_page);
+ control_code_buffer_sz = page_size(image->control_code_page);
+ if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
+ pr_err("Relocation code doesn't fit within a control page\n");
+ return -EINVAL;
+ }
+ memcpy(control_code_buffer, riscv_kexec_relocate,
+ riscv_kexec_relocate_size);
+
+ /* Mark the control page executable */
+ set_memory_x((unsigned long) control_code_buffer, 1);
+
+ return 0;
+}
+
+
+/**
+ * machine_kexec_cleanup - Cleanup any leftovers from
+ * machine_kexec_prepare
+ *
+ * This function is called by kimage_free to handle any arch-specific
+ * allocations done on machine_kexec_prepare. Since we didn't do any
+ * allocations there, this is just an empty function. Note that the
+ * control buffer is freed by kimage_free.
+ */
+void
+machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+
+/*
+ * machine_shutdown - Prepare for a kexec reboot
+ *
+ * This function is called by kernel_kexec just before machine_kexec
+ * below. Its goal is to prepare the rest of the system (the other
+ * harts and possibly devices etc) for a kexec reboot.
+ */
+void machine_shutdown(void)
+{
+ /*
+ * No more interrupts on this hart
+ * until we are back up.
+ */
+ local_irq_disable();
+
+#if defined(CONFIG_HOTPLUG_CPU)
+ smp_shutdown_nonboot_cpus(smp_processor_id());
+#endif
+}
+
+/**
+ * machine_crash_shutdown - Prepare to kexec after a kernel crash
+ *
+ * This function is called by crash_kexec just before machine_kexec
+ * below and its goal is similar to machine_shutdown, but in case of
+ * a kernel crash. Since we don't handle such cases yet, this function
+ * is empty.
+ */
+void
+machine_crash_shutdown(struct pt_regs *regs)
+{
+}
+
+/**
+ * machine_kexec - Jump to the loaded kimage
+ *
+ * This function is called by kernel_kexec which is called by the
+ * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
+ * or by crash_kernel which is called by the kernel's arch-specific
+ * trap handler in case of a kernel panic. It's the final stage of
+ * the kexec process where the pre-loaded kimage is ready to be
+ * executed. We assume at this point that all other harts are
+ * suspended and this hart will be the new boot hart.
+ */
+void __noreturn
+machine_kexec(struct kimage *image)
+{
+ struct kimage_arch *internal = &image->arch;
+ unsigned long jump_addr = (unsigned long) image->start;
+ unsigned long first_ind_entry = (unsigned long) &image->head;
+ unsigned long this_hart_id = raw_smp_processor_id();
+ unsigned long fdt_addr = internal->fdt_addr;
+ void *control_code_buffer = page_address(image->control_code_page);
+ riscv_kexec_do_relocate do_relocate = control_code_buffer;
+
+ pr_notice("Will call new kernel at %08lx from hart id %lx\n",
+ jump_addr, this_hart_id);
+ pr_notice("FDT image at %08lx\n", fdt_addr);
+
+ /* Make sure the relocation code is visible to the hart */
+ local_flush_icache_all();
+
+ /* Jump to the relocation code */
+ pr_notice("Bye...\n");
+ do_relocate(first_ind_entry, jump_addr, fdt_addr,
+ this_hart_id, va_pa_offset);
+ unreachable();
+}
--
2.26.2

2021-04-19 00:58:27

by Nick Kossifidis

[permalink] [raw]
Subject: [PATCH v4 3/5] RISC-V: Improve init_resources

* Kernel region is always present and we know where it is, no
need to look for it inside the loop, just ignore it like the
rest of the reserved regions within system's memory.

* Don't call memblock_free inside the loop, if called it'll split
the region of pre-allocated resources in two parts, messing things
up, just re-use the previous pre-allocated resource and free any
unused resources after both loops finish.

Signed-off-by: Nick Kossifidis <[email protected]>
---
arch/riscv/kernel/setup.c | 91 ++++++++++++++++++++-------------------
1 file changed, 46 insertions(+), 45 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f8f15332c..030554bab 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -60,6 +60,7 @@ static DEFINE_PER_CPU(struct cpu, cpu_devices);
* also add "System RAM" regions for compatibility with other
* archs, and the rest of the known regions for completeness.
*/
+static struct resource kimage_res = { .name = "Kernel image", };
static struct resource code_res = { .name = "Kernel code", };
static struct resource data_res = { .name = "Kernel data", };
static struct resource rodata_res = { .name = "Kernel rodata", };
@@ -80,45 +81,54 @@ static int __init add_resource(struct resource *parent,
return 1;
}

-static int __init add_kernel_resources(struct resource *res)
+static int __init add_kernel_resources(void)
{
int ret = 0;

/*
* The memory region of the kernel image is continuous and
- * was reserved on setup_bootmem, find it here and register
- * it as a resource, then register the various segments of
- * the image as child nodes
+ * was reserved on setup_bootmem, register it here as a
+ * resource, with the various segments of the image as
+ * child nodes.
*/
- if (!(res->start <= code_res.start && res->end >= data_res.end))
- return 0;

- res->name = "Kernel image";
- res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ code_res.start = __pa_symbol(_text);
+ code_res.end = __pa_symbol(_etext) - 1;
+ code_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

- /*
- * We removed a part of this region on setup_bootmem so
- * we need to expand the resource for the bss to fit in.
- */
- res->end = bss_res.end;
+ rodata_res.start = __pa_symbol(__start_rodata);
+ rodata_res.end = __pa_symbol(__end_rodata) - 1;
+ rodata_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ data_res.start = __pa_symbol(_data);
+ data_res.end = __pa_symbol(_edata) - 1;
+ data_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ bss_res.start = __pa_symbol(__bss_start);
+ bss_res.end = __pa_symbol(__bss_stop) - 1;
+ bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ kimage_res.start = code_res.start;
+ kimage_res.end = bss_res.end;
+ kimage_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

- ret = add_resource(&iomem_resource, res);
+ ret = add_resource(&iomem_resource, &kimage_res);
if (ret < 0)
return ret;

- ret = add_resource(res, &code_res);
+ ret = add_resource(&kimage_res, &code_res);
if (ret < 0)
return ret;

- ret = add_resource(res, &rodata_res);
+ ret = add_resource(&kimage_res, &rodata_res);
if (ret < 0)
return ret;

- ret = add_resource(res, &data_res);
+ ret = add_resource(&kimage_res, &data_res);
if (ret < 0)
return ret;

- ret = add_resource(res, &bss_res);
+ ret = add_resource(&kimage_res, &bss_res);

return ret;
}
@@ -129,54 +139,42 @@ static void __init init_resources(void)
struct resource *res = NULL;
struct resource *mem_res = NULL;
size_t mem_res_sz = 0;
- int ret = 0, i = 0;
-
- code_res.start = __pa_symbol(_text);
- code_res.end = __pa_symbol(_etext) - 1;
- code_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-
- rodata_res.start = __pa_symbol(__start_rodata);
- rodata_res.end = __pa_symbol(__end_rodata) - 1;
- rodata_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-
- data_res.start = __pa_symbol(_data);
- data_res.end = __pa_symbol(_edata) - 1;
- data_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-
- bss_res.start = __pa_symbol(__bss_start);
- bss_res.end = __pa_symbol(__bss_stop) - 1;
- bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ int num_resources = 0, res_idx = 0;
+ int ret = 0;

/* + 1 as memblock_alloc() might increase memblock.reserved.cnt */
- mem_res_sz = (memblock.memory.cnt + memblock.reserved.cnt + 1) * sizeof(*mem_res);
+ num_resources = memblock.memory.cnt + memblock.reserved.cnt + 1;
+ res_idx = num_resources - 1;
+
+ mem_res_sz = num_resources * sizeof(*mem_res);
mem_res = memblock_alloc(mem_res_sz, SMP_CACHE_BYTES);
if (!mem_res)
panic("%s: Failed to allocate %zu bytes\n", __func__, mem_res_sz);
+
/*
* Start by adding the reserved regions, if they overlap
* with /memory regions, insert_resource later on will take
* care of it.
*/
+ ret = add_kernel_resources();
+ if (ret < 0)
+ goto error;
+
for_each_reserved_mem_region(region) {
- res = &mem_res[i++];
+ res = &mem_res[res_idx--];

res->name = "Reserved";
res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
res->start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
res->end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;

- ret = add_kernel_resources(res);
- if (ret < 0)
- goto error;
- else if (ret)
- continue;
-
/*
* Ignore any other reserved regions within
* system memory.
*/
if (memblock_is_memory(res->start)) {
- memblock_free((phys_addr_t) res, sizeof(struct resource));
+ /* Re-use this pre-allocated resource */
+ res_idx++;
continue;
}

@@ -187,7 +185,7 @@ static void __init init_resources(void)

/* Add /memory regions to the resource tree */
for_each_mem_region(region) {
- res = &mem_res[i++];
+ res = &mem_res[res_idx--];

if (unlikely(memblock_is_nomap(region))) {
res->name = "Reserved";
@@ -205,6 +203,9 @@ static void __init init_resources(void)
goto error;
}

+ /* Clean-up any unused pre-allocated resources */
+ mem_res_sz = (num_resources - res_idx + 1) * sizeof(*mem_res);
+ memblock_free((phys_addr_t) mem_res, mem_res_sz);
return;

error:
--
2.26.2

2021-04-19 00:58:46

by Nick Kossifidis

[permalink] [raw]
Subject: [PATCH v4 4/5] RISC-V: Add kdump support

This patch adds support for kdump, the kernel will reserve a
region for the crash kernel and jump there on panic. In order
for userspace tools (kexec-tools) to prepare the crash kernel
kexec image, we also need to expose some information on
/proc/iomem for the memory regions used by the kernel and for
the region reserved for crash kernel. Note that on userspace
the device tree is used to determine the system's memory
layout so the "System RAM" on /proc/iomem is ignored.

I tested this on riscv64 qemu and works as expected, you may
test it by triggering a crash through /proc/sysrq_trigger:

echo c > /proc/sysrq_trigger

v4:
* Re-base on top of "fixes" branch
* Use CSR_* macros and PMD_SIZE where possible

v3:
* Move ELF_CORE_COPY_REGS to asm/elf.h instead of uapi/asm/elf.h
* Set stvec when disabling MMU
* Minor cleanups and re-base

v2:
* Properly populate the ioresources tree, so that it can be
used later on for implementing strict /dev/mem.
* Minor cleanups and re-base

Signed-off-by: Nick Kossifidis <[email protected]>
---
arch/riscv/include/asm/elf.h | 6 +++
arch/riscv/include/asm/kexec.h | 19 +++++---
arch/riscv/kernel/Makefile | 2 +-
arch/riscv/kernel/crash_save_regs.S | 56 +++++++++++++++++++++++
arch/riscv/kernel/kexec_relocate.S | 68 ++++++++++++++++++++++++++-
arch/riscv/kernel/machine_kexec.c | 43 +++++++++--------
arch/riscv/kernel/setup.c | 11 ++++-
arch/riscv/mm/init.c | 71 +++++++++++++++++++++++++++++
8 files changed, 249 insertions(+), 27 deletions(-)
create mode 100644 arch/riscv/kernel/crash_save_regs.S

diff --git a/arch/riscv/include/asm/elf.h b/arch/riscv/include/asm/elf.h
index 5c725e1df..f4b490cd0 100644
--- a/arch/riscv/include/asm/elf.h
+++ b/arch/riscv/include/asm/elf.h
@@ -81,4 +81,10 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
int uses_interp);
#endif /* CONFIG_MMU */

+#define ELF_CORE_COPY_REGS(dest, regs) \
+do { \
+ *(struct user_regs_struct *)&(dest) = \
+ *(struct user_regs_struct *)regs; \
+} while (0);
+
#endif /* _ASM_RISCV_ELF_H */
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index 86e6e4922..1e9541019 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -23,11 +23,16 @@

#define KEXEC_ARCH KEXEC_ARCH_RISCV

+extern void riscv_crash_save_regs(struct pt_regs *newregs);
+
static inline void
crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Dummy implementation for now */
+ if (oldregs)
+ memcpy(newregs, oldregs, sizeof(struct pt_regs));
+ else
+ riscv_crash_save_regs(newregs);
}


@@ -40,10 +45,12 @@ struct kimage_arch {
const extern unsigned char riscv_kexec_relocate[];
const extern unsigned int riscv_kexec_relocate_size;

-typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
- unsigned long jump_addr,
- unsigned long fdt_addr,
- unsigned long hartid,
- unsigned long va_pa_off);
+typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
+ unsigned long jump_addr,
+ unsigned long fdt_addr,
+ unsigned long hartid,
+ unsigned long va_pa_off);
+
+extern riscv_kexec_method riscv_kexec_norelocate;

#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 211ef87de..41a1469b2 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -59,7 +59,7 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
-obj-$(CONFIG_KEXEC) += kexec_relocate.o machine_kexec.o
+obj-$(CONFIG_KEXEC) += kexec_relocate.o crash_save_regs.o machine_kexec.o

obj-$(CONFIG_JUMP_LABEL) += jump_label.o

diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S
new file mode 100644
index 000000000..7832fb763
--- /dev/null
+++ b/arch/riscv/kernel/crash_save_regs.S
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ * Nick Kossifidis <[email protected]>
+ */
+
+#include <asm/asm.h> /* For RISCV_* and REG_* macros */
+#include <asm/csr.h> /* For CSR_* macros */
+#include <asm/asm-offsets.h> /* For offsets on pt_regs */
+#include <linux/linkage.h> /* For SYM_* macros */
+
+.section ".text"
+SYM_CODE_START(riscv_crash_save_regs)
+ REG_S ra, PT_RA(a0) /* x1 */
+ REG_S sp, PT_SP(a0) /* x2 */
+ REG_S gp, PT_GP(a0) /* x3 */
+ REG_S tp, PT_TP(a0) /* x4 */
+ REG_S t0, PT_T0(a0) /* x5 */
+ REG_S t1, PT_T1(a0) /* x6 */
+ REG_S t2, PT_T2(a0) /* x7 */
+ REG_S s0, PT_S0(a0) /* x8/fp */
+ REG_S s1, PT_S1(a0) /* x9 */
+ REG_S a0, PT_A0(a0) /* x10 */
+ REG_S a1, PT_A1(a0) /* x11 */
+ REG_S a2, PT_A2(a0) /* x12 */
+ REG_S a3, PT_A3(a0) /* x13 */
+ REG_S a4, PT_A4(a0) /* x14 */
+ REG_S a5, PT_A5(a0) /* x15 */
+ REG_S a6, PT_A6(a0) /* x16 */
+ REG_S a7, PT_A7(a0) /* x17 */
+ REG_S s2, PT_S2(a0) /* x18 */
+ REG_S s3, PT_S3(a0) /* x19 */
+ REG_S s4, PT_S4(a0) /* x20 */
+ REG_S s5, PT_S5(a0) /* x21 */
+ REG_S s6, PT_S6(a0) /* x22 */
+ REG_S s7, PT_S7(a0) /* x23 */
+ REG_S s8, PT_S8(a0) /* x24 */
+ REG_S s9, PT_S9(a0) /* x25 */
+ REG_S s10, PT_S10(a0) /* x26 */
+ REG_S s11, PT_S11(a0) /* x27 */
+ REG_S t3, PT_T3(a0) /* x28 */
+ REG_S t4, PT_T4(a0) /* x29 */
+ REG_S t5, PT_T5(a0) /* x30 */
+ REG_S t6, PT_T6(a0) /* x31 */
+
+ csrr t1, CSR_STATUS
+ csrr t2, CSR_EPC
+ csrr t3, CSR_TVAL
+ csrr t4, CSR_CAUSE
+
+ REG_S t1, PT_STATUS(a0)
+ REG_S t2, PT_EPC(a0)
+ REG_S t3, PT_BADADDR(a0)
+ REG_S t4, PT_CAUSE(a0)
+ ret
+SYM_CODE_END(riscv_crash_save_regs)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index 84101d0a2..88c3beabe 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -151,7 +151,73 @@ SYM_CODE_START(riscv_kexec_relocate)
SYM_CODE_END(riscv_kexec_relocate)
riscv_kexec_relocate_end:

- .section ".rodata"
+
+/* Used for jumping to crashkernel */
+.section ".text"
+SYM_CODE_START(riscv_kexec_norelocate)
+ /*
+ * s0: (const) Phys address to jump to
+ * s1: (const) Phys address of the FDT image
+ * s2: (const) The hartid of the current hart
+ * s3: (const) va_pa_offset, used when switching MMU off
+ */
+ mv s0, a1
+ mv s1, a2
+ mv s2, a3
+ mv s3, a4
+
+ /* Disable / cleanup interrupts */
+ csrw CSR_SIE, zero
+ csrw CSR_SIP, zero
+
+ /* Switch to physical addressing */
+ la s4, 1f
+ sub s4, s4, s3
+ csrw CSR_STVEC, s4
+ csrw CSR_SATP, zero
+
+.align 2
+1:
+ /* Pass the arguments to the next kernel / Cleanup*/
+ mv a0, s2
+ mv a1, s1
+ mv a2, s0
+
+ /* Cleanup */
+ mv a3, zero
+ mv a4, zero
+ mv a5, zero
+ mv a6, zero
+ mv a7, zero
+
+ mv s0, zero
+ mv s1, zero
+ mv s2, zero
+ mv s3, zero
+ mv s4, zero
+ mv s5, zero
+ mv s6, zero
+ mv s7, zero
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ mv t0, zero
+ mv t1, zero
+ mv t2, zero
+ mv t3, zero
+ mv t4, zero
+ mv t5, zero
+ mv t6, zero
+ csrw CSR_SEPC, zero
+ csrw CSR_SCAUSE, zero
+ csrw CSR_SSCRATCH, zero
+
+ jalr zero, a2, 0
+SYM_CODE_END(riscv_kexec_norelocate)
+
+.section ".rodata"
SYM_DATA(riscv_kexec_relocate_size,
.long riscv_kexec_relocate_end - riscv_kexec_relocate)

diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index cd6803311..51e2671a3 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -59,11 +59,6 @@ machine_kexec_prepare(struct kimage *image)

kexec_image_info(image);

- if (image->type == KEXEC_TYPE_CRASH) {
- pr_warn("Loading a crash kernel is unsupported for now.\n");
- return -EINVAL;
- }
-
/* Find the Flattened Device Tree and save its physical address */
for (i = 0; i < image->nr_segments; i++) {
if (image->segment[i].memsz <= sizeof(fdt))
@@ -85,17 +80,21 @@ machine_kexec_prepare(struct kimage *image)
}

/* Copy the assembler code for relocation to the control page */
- control_code_buffer = page_address(image->control_code_page);
- control_code_buffer_sz = page_size(image->control_code_page);
- if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
- pr_err("Relocation code doesn't fit within a control page\n");
- return -EINVAL;
- }
- memcpy(control_code_buffer, riscv_kexec_relocate,
- riscv_kexec_relocate_size);
+ if (image->type != KEXEC_TYPE_CRASH) {
+ control_code_buffer = page_address(image->control_code_page);
+ control_code_buffer_sz = page_size(image->control_code_page);

- /* Mark the control page executable */
- set_memory_x((unsigned long) control_code_buffer, 1);
+ if (unlikely(riscv_kexec_relocate_size > control_code_buffer_sz)) {
+ pr_err("Relocation code doesn't fit within a control page\n");
+ return -EINVAL;
+ }
+
+ memcpy(control_code_buffer, riscv_kexec_relocate,
+ riscv_kexec_relocate_size);
+
+ /* Mark the control page executable */
+ set_memory_x((unsigned long) control_code_buffer, 1);
+ }

return 0;
}
@@ -147,6 +146,9 @@ void machine_shutdown(void)
void
machine_crash_shutdown(struct pt_regs *regs)
{
+ crash_save_cpu(regs, smp_processor_id());
+ machine_shutdown();
+ pr_info("Starting crashdump kernel...\n");
}

/**
@@ -169,7 +171,12 @@ machine_kexec(struct kimage *image)
unsigned long this_hart_id = raw_smp_processor_id();
unsigned long fdt_addr = internal->fdt_addr;
void *control_code_buffer = page_address(image->control_code_page);
- riscv_kexec_do_relocate do_relocate = control_code_buffer;
+ riscv_kexec_method kexec_method = NULL;
+
+ if (image->type != KEXEC_TYPE_CRASH)
+ kexec_method = control_code_buffer;
+ else
+ kexec_method = (riscv_kexec_method) &riscv_kexec_norelocate;

pr_notice("Will call new kernel at %08lx from hart id %lx\n",
jump_addr, this_hart_id);
@@ -180,7 +187,7 @@ machine_kexec(struct kimage *image)

/* Jump to the relocation code */
pr_notice("Bye...\n");
- do_relocate(first_ind_entry, jump_addr, fdt_addr,
- this_hart_id, va_pa_offset);
+ kexec_method(first_ind_entry, jump_addr, fdt_addr,
+ this_hart_id, va_pa_offset);
unreachable();
}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 030554bab..31866dce9 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -20,6 +20,7 @@
#include <linux/swiotlb.h>
#include <linux/smp.h>
#include <linux/efi.h>
+#include <linux/crash_dump.h>

#include <asm/cpu_ops.h>
#include <asm/early_ioremap.h>
@@ -160,6 +161,14 @@ static void __init init_resources(void)
if (ret < 0)
goto error;

+#ifdef CONFIG_KEXEC_CORE
+ if (crashk_res.start != crashk_res.end) {
+ ret = add_resource(&iomem_resource, &crashk_res);
+ if (ret < 0)
+ goto error;
+ }
+#endif
+
for_each_reserved_mem_region(region) {
res = &mem_res[res_idx--];

@@ -252,7 +261,6 @@ void __init setup_arch(char **cmdline_p)
efi_init();
setup_bootmem();
paging_init();
- init_resources();
#if IS_ENABLED(CONFIG_BUILTIN_DTB)
unflatten_and_copy_device_tree();
#else
@@ -263,6 +271,7 @@ void __init setup_arch(char **cmdline_p)
#endif
misc_mem_init();

+ init_resources();
sbi_init();

if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 067583ab1..8b2b85a57 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -2,6 +2,8 @@
/*
* Copyright (C) 2012 Regents of the University of California
* Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ * Nick Kossifidis <[email protected]>
*/

#include <linux/init.h>
@@ -14,6 +16,7 @@
#include <linux/libfdt.h>
#include <linux/set_memory.h>
#include <linux/dma-map-ops.h>
+#include <linux/crash_dump.h>

#include <asm/fixmap.h>
#include <asm/tlbflush.h>
@@ -585,6 +588,71 @@ void mark_rodata_ro(void)
}
#endif

+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base = 0;
+ unsigned long long crash_size = 0;
+ unsigned long search_start = memblock_start_of_DRAM();
+ unsigned long search_end = memblock_end_of_DRAM();
+
+ int ret = 0;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ if (ret || !crash_size)
+ return;
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ if (crash_base == 0) {
+ /*
+ * Current riscv boot protocol requires 2MB alignment for
+ * RV64 and 4MB alignment for RV32 (hugepage size)
+ */
+ crash_base = memblock_find_in_range(search_start, search_end,
+ crash_size, PMD_SIZE);
+
+ if (crash_base == 0) {
+ pr_warn("crashkernel: couldn't allocate %lldKB\n",
+ crash_size >> 10);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size)) {
+ pr_warn("crashkernel: requested region is not memory\n");
+ return;
+ }
+
+ if (memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel: requested region is reserved\n");
+ return;
+ }
+
+
+ if (!IS_ALIGNED(crash_base, PMD_SIZE)) {
+ pr_warn("crashkernel: requested region is misaligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#endif /* CONFIG_KEXEC_CORE */
+
void __init paging_init(void)
{
setup_vm_final();
@@ -596,6 +664,9 @@ void __init misc_mem_init(void)
arch_numa_init();
sparse_init();
zone_sizes_init();
+#ifdef CONFIG_KEXEC_CORE
+ reserve_crashkernel();
+#endif
memblock_dump_all();
}

--
2.26.2