2021-11-26 18:06:42

by Nick Kossifidis

Subject: [PATCH 1/3] riscv: Don't use va_pa_offset on kdump

On kdump, instead of using an intermediate step to relocate the kernel
(code that lives in a "control buffer" outside the current kernel's
mapping), we jump to the crash kernel directly by calling
riscv_kexec_norelocate(). The current implementation uses va_pa_offset
when switching to physical addressing. Since we moved the kernel outside
the linear mapping this no longer works: riscv_kexec_norelocate() is part
of the kernel mapping, so we would have to use
kernel_map.va_kernel_pa_offset instead, and also take XIP kernels into
account.
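
To illustrate why the linear-mapping offset gives the wrong result for
code in the kernel mapping, here is a minimal userspace sketch in C. All
constants are illustrative assumptions for an RV64 layout (they are not
taken from this patch), and the MODEL_* names and model_* helpers are
hypothetical.

```c
#include <stdint.h>

/* Illustrative RV64 values, assumed for this sketch only. */
#define MODEL_PAGE_OFFSET      0xffffffe000000000ULL /* VA: start of linear mapping */
#define MODEL_KERNEL_LINK_ADDR 0xffffffff80000000ULL /* VA: start of kernel mapping */
#define MODEL_PHYS_RAM_BASE    0x80000000ULL         /* PA: first byte of RAM */
#define MODEL_KERNEL_LOAD_PA   0x80200000ULL         /* PA: where the image is loaded */

/* Offset that is valid for addresses inside the linear mapping. */
static uint64_t model_va_pa_offset(void)
{
	return MODEL_PAGE_OFFSET - MODEL_PHYS_RAM_BASE;
}

/* Offset that is valid for addresses inside the kernel mapping. */
static uint64_t model_va_kernel_pa_offset(void)
{
	return MODEL_KERNEL_LINK_ADDR - MODEL_KERNEL_LOAD_PA;
}

/* Translate a VA using the linear-mapping offset. */
static uint64_t model_pa_linear(uint64_t va)
{
	return va - model_va_pa_offset();
}

/* Translate a VA using the kernel-mapping offset. */
static uint64_t model_pa_kernel(uint64_t va)
{
	return va - model_va_kernel_pa_offset();
}
```

With these assumed values, a label at MODEL_KERNEL_LINK_ADDR + 0x1000
translates to 0x80201000 with model_pa_kernel(), but to 0x2000001000 with
model_pa_linear(), which is not where the code resides.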

We don't really need va_pa_offset in riscv_kexec_norelocate() at all: we
can set STVEC to the physical address of the new kernel and let the hart
jump there on the next instruction after SATP is set to zero. This fixes
kdump and is also simpler and cleaner.

I tested this on the latest qemu and on HiFive Unmatched, and it works
as expected.

v2: I removed the direct jump after setting satp as suggested.

Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")

Signed-off-by: Nick Kossifidis <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Cc: <[email protected]> # 5.13
Cc: <[email protected]> # 5.14
---
arch/riscv/kernel/kexec_relocate.S | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index a80b52a74..059c5e216 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -159,25 +159,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
- * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
*/
mv s0, a1
mv s1, a2
mv s2, a3
- mv s3, a4

/* Disable / cleanup interrupts */
csrw CSR_SIE, zero
csrw CSR_SIP, zero

- /* Switch to physical addressing */
- la s4, 1f
- sub s4, s4, s3
- csrw CSR_STVEC, s4
- csrw CSR_SATP, zero
-
-.align 2
-1:
/* Pass the arguments to the next kernel / Cleanup*/
mv a0, s2
mv a1, s1
@@ -214,7 +204,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero

- jalr zero, a2, 0
+ /*
+ * Switch to physical addressing
+ * This will also trigger a jump to CSR_STVEC
+ * which in this case is the address of the new
+ * kernel.
+ */
+ csrw CSR_STVEC, a2
+ csrw CSR_SATP, zero
+
SYM_CODE_END(riscv_kexec_norelocate)

.section ".rodata"
--
2.32.0



2021-11-26 18:06:46

by Nick Kossifidis

Subject: [PATCH 2/3] riscv: use hart id instead of cpu id on machine_kexec

raw_smp_processor_id() doesn't return the hart id, despite what is
stated in arch/riscv/include/asm/smp.h. Use smp_processor_id() to get
the cpu id, and cpuid_to_hartid_map() to translate it to the hart id
that is passed to the next kernel. This fixes kexec on HiFive
Unleashed/Unmatched, where cpu ids and hart ids don't match (on
qemu-virt they do).
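
The difference between the two ids can be sketched with a small
userspace model in C. The table contents and the model_* names below are
hypothetical and only illustrate the idea of a logical cpu id indexing a
table of hardware hart ids; they are not the kernel's implementation.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

/*
 * Hypothetical mapping from logical cpu id (array index) to hart id.
 * Assumed for illustration: the logical cpus are backed by harts 1..4,
 * so no cpu id equals its hart id.
 */
static const unsigned long model_hartid_map[MODEL_NR_CPUS] = { 1, 2, 3, 4 };

/* Translate a logical cpu id to the hart id the next kernel expects. */
static unsigned long model_cpuid_to_hartid(size_t cpu)
{
	return model_hartid_map[cpu];
}
```

In this model, passing the logical id 0 to the next kernel would name a
hart that is not the one running the code; model_cpuid_to_hartid(0)
returns 1, the value that should be handed over instead.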

Fixes: fba8a8674f68 ("RISC-V: Add kexec support")

Signed-off-by: Nick Kossifidis <[email protected]>
Cc: [email protected]
---
arch/riscv/kernel/machine_kexec.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index e6eca271a..cbef0fc73 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -169,7 +169,8 @@ machine_kexec(struct kimage *image)
struct kimage_arch *internal = &image->arch;
unsigned long jump_addr = (unsigned long) image->start;
unsigned long first_ind_entry = (unsigned long) &image->head;
- unsigned long this_hart_id = raw_smp_processor_id();
+ unsigned long this_cpu_id = smp_processor_id();
+ unsigned long this_hart_id = cpuid_to_hartid_map(this_cpu_id);
unsigned long fdt_addr = internal->fdt_addr;
void *control_code_buffer = page_address(image->control_code_page);
riscv_kexec_method kexec_method = NULL;
--
2.32.0


2021-11-26 18:06:48

by Nick Kossifidis

Subject: [PATCH 3/3] riscv: try to allocate crashkern region from 32bit addressible memory

When allocating the crash kernel region without explicitly specifying
its base address/size, memblock_phys_alloc_range() will attempt to
allocate memory top-down (memblock.bottom_up is false), so the crash
kernel region will end up in high memory on 64bit systems. As a result
swiotlb can't work on the crash kernel, since there won't be any
32bit addressable memory available for the bounce buffers.

Try to allocate 32bit addressable memory for the crash kernel, if
available, by restricting the top search address to be less than
SZ_4G. If that fails, fall back to the previous behavior.
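
The "restricted search first, then fall back" strategy can be sketched
as a userspace model in C. The model_* functions and struct model_region
are hypothetical stand-ins (not the memblock API); the search only
reports a candidate base and does not track reservations.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SZ_4G 0x100000000ULL

/* A free physical memory range: [base, base + size). */
struct model_region {
	uint64_t base;
	uint64_t size;
};

/*
 * Top-down search: return the highest align-aligned base such that
 * [base, base + size) fits inside a free region clipped to
 * [start, end). align must be a power of two. Returns 0 on failure.
 */
static uint64_t model_alloc_range(const struct model_region *free, size_t n,
				  uint64_t size, uint64_t align,
				  uint64_t start, uint64_t end)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = free[i].base;
		uint64_t hi = free[i].base + free[i].size;

		if (lo < start)
			lo = start;
		if (hi > end)
			hi = end;
		if (hi <= lo || hi - lo < size)
			continue;

		uint64_t base = (hi - size) & ~(align - 1);

		if (base >= lo && base > best)
			best = base;
	}
	return best;
}

/* Prefer memory below 4G; fall back to the unrestricted search. */
static uint64_t model_reserve_crashkernel(const struct model_region *free,
					  size_t n, uint64_t size,
					  uint64_t align, uint64_t start,
					  uint64_t end)
{
	uint64_t low_end = end < MODEL_SZ_4G ? end : MODEL_SZ_4G;
	uint64_t base = model_alloc_range(free, n, size, align, start, low_end);

	if (base == 0)
		base = model_alloc_range(free, n, size, align, start, end);
	return base;
}
```

With free memory both below and above 4G, the plain top-down search in
this model picks a base above 4G, while model_reserve_crashkernel()
picks one below it; when nothing fits below 4G it returns the same
result as the unrestricted search.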

I tested this on HiFive Unmatched, where the pci-e controller needs
swiotlb to work. With this patch it's possible to access the pci-e
controller on the crash kernel and mount the rootfs from the nvme.

Signed-off-by: Nick Kossifidis <[email protected]>
---
arch/riscv/mm/init.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 24b2b8044..1963a517e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -812,13 +812,22 @@ static void __init reserve_crashkernel(void)
/*
* Current riscv boot protocol requires 2MB alignment for
* RV64 and 4MB alignment for RV32 (hugepage size)
+ *
+ * Try to alloc from 32bit addressible physical memory so that
+ * swiotlb can work on the crash kernel.
*/
crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
- search_start, search_end);
+ search_start,
+ min(search_end, (unsigned long) SZ_4G));
if (crash_base == 0) {
- pr_warn("crashkernel: couldn't allocate %lldKB\n",
- crash_size >> 10);
- return;
+ /* Try again without restricting region to 32bit addressible memory */
+ crash_base = memblock_phys_alloc_range(crash_size, PMD_SIZE,
+ search_start, search_end);
+ if (crash_base == 0) {
+ pr_warn("crashkernel: couldn't allocate %lldKB\n",
+ crash_size >> 10);
+ return;
+ }
}

pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
--
2.32.0


2022-01-07 18:21:01

by Nick Kossifidis

Subject: Re: [PATCH 1/3] riscv: Don't use va_pa_offset on kdump

Hello Palmer,

Any updates on these 3 patches?

Regards,
Nick



2022-01-09 18:56:59

by Palmer Dabbelt

Subject: Re: [PATCH 1/3] riscv: Don't use va_pa_offset on kdump

On Fri, 07 Jan 2022 10:03:59 PST (-0800), [email protected] wrote:
> Hello Palmer,
>
> Any updates on those 3 patches ?

Sorry, I hadn't realized these were fixes so they got stuck in the
queue. I do now remember you saying you had some fixes at the RISC-V
conference, but I guess that got lost as well. Including something like
"fix" or "-fixes" in a subject line always helps, but if I miss stuff
IRC's always a good bet as that'll at least make sure I see it when I'm
in front of the computer -- there's a lot of people who want things at
these conferences.

It's too late for fixes, but it looks like things have been broken for a
while so these will have to all get backported to stable regardless.

This is on for-next.

Thanks!
