LinuxLists.cc - [PATCH] riscv: Don't use va_pa_offset on kdump

2021-10-02 12:24:51

Subject: [PATCH] riscv: Don't use va_pa_offset on kdump

On kdump, instead of using an intermediate step to relocate the kernel
(code that lives in a "control buffer" outside the current kernel's
mapping), we jump to the crash kernel directly by calling
riscv_kexec_norelocate(). The current implementation uses va_pa_offset
while switching to physical addressing. However, since we moved the kernel
outside the linear mapping, this no longer works: riscv_kexec_norelocate()
is part of the kernel mapping, so we would need to use
kernel_map.va_kernel_pa_offset and also take the XIP kernel into account.
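
As a rough illustration of why the old computation breaks, the following C sketch models the two offsets. It is a simplified model, not the kernel's code: it assumes va_pa_offset = PAGE_OFFSET - phys_addr and va_kernel_pa_offset = virt_addr - phys_addr for a non-XIP kernel, and the struct, the helper names and the example addresses used with it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's kernel_map. */
struct kernel_map_model {
	uint64_t page_offset;         /* start of the linear mapping (VA) */
	uint64_t virt_addr;           /* start of the kernel mapping (VA) */
	uint64_t phys_addr;           /* physical load address of the kernel */
	uint64_t va_pa_offset;        /* linear-mapping VA minus PA */
	uint64_t va_kernel_pa_offset; /* kernel-mapping VA minus PA */
};

/* Assumed non-XIP relations, using 64-bit unsigned arithmetic. */
static struct kernel_map_model make_map(uint64_t page_offset,
					uint64_t virt_addr, uint64_t phys_addr)
{
	struct kernel_map_model m = {
		.page_offset = page_offset,
		.virt_addr = virt_addr,
		.phys_addr = phys_addr,
		.va_pa_offset = page_offset - phys_addr,
		.va_kernel_pa_offset = virt_addr - phys_addr,
	};
	return m;
}

/* Models the old "la s4, 1f; sub s4, s4, s3" with s3 = va_pa_offset.
 * Only correct for addresses inside the linear mapping. */
static uint64_t pa_via_va_pa_offset(const struct kernel_map_model *m,
				    uint64_t va)
{
	return va - m->va_pa_offset;
}

/* Correct translation for a symbol that lives in the kernel mapping. */
static uint64_t pa_via_va_kernel_pa_offset(const struct kernel_map_model *m,
					   uint64_t va)
{
	assert(va >= m->virt_addr); /* va must be inside the kernel mapping */
	return va - m->va_kernel_pa_offset;
}
```

For example, with a hypothetical linear mapping at 0xffffffe000000000, a kernel mapping at 0xffffffff80000000 and a load address of 0x80200000, a label 0x1000 bytes into the kernel image (VA 0xffffffff80001000) translates to 0x80201000 with va_kernel_pa_offset but to 0x2000201000 with va_pa_offset, i.e. an address where the kernel does not live.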

We don't really need to use va_pa_offset in riscv_kexec_norelocate(). We
can just set STVEC to the physical address of the new kernel instead and
let the hart jump to the new kernel on the next instruction after setting
SATP to zero. This fixes kdump and is also simpler and cleaner.
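
The exit path after this change can be pictured with a toy C model. This is a hedged sketch under stated assumptions, not a description of any real hart or of the spec: it assumes that once SATP is zero, fetching the next instruction from the old kernel-mapping address either traps (sending the hart to STVEC in Direct mode) or, if that instruction was already fetched, executes the trailing jalr. The struct and function names are hypothetical. Because STVEC and a2 hold the same physical entry point, both branches end up in the new kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified hart state (illustration only). */
struct hart_model {
	uint64_t pc;    /* current program counter */
	uint64_t stvec; /* trap vector: BASE | MODE (low two bits) */
	uint64_t satp;  /* 0 means Bare mode: no address translation */
	uint64_t a2;    /* physical entry point of the next kernel */
};

/*
 * Toy model of the tail of riscv_kexec_norelocate() after the patch:
 *     csrw CSR_STVEC, a2
 *     csrw CSR_SATP, zero
 *     jalr zero, a2, 0
 * 'next_fetch_faults' selects which assumed behaviour follows the SATP
 * write. Returns the pc the hart ends up at.
 */
static uint64_t exit_path(struct hart_model *h, bool next_fetch_faults)
{
	/* A 4-byte aligned entry point leaves MODE = 0 (Direct) in STVEC. */
	assert((h->a2 & 3) == 0);

	h->stvec = h->a2; /* csrw CSR_STVEC, a2 */
	h->satp = 0;      /* csrw CSR_SATP, zero */

	if (next_fetch_faults) {
		/* Fetching the jalr from the old virtual address traps;
		 * in Direct mode the hart jumps to BASE. */
		h->pc = h->stvec & ~(uint64_t)3;
	} else {
		/* The jalr was already fetched and executes: pc = a2 + 0. */
		h->pc = h->a2 + 0;
	}
	return h->pc;
}
```

Under these assumptions the jalr is redundant only in the faulting branch, which is the trade-off debated further down in this thread.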

Signed-off-by: Nick Kossifidis <[email protected]>
---
arch/riscv/kernel/kexec_relocate.S | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index a80b52a74..e2f34196e 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -159,25 +159,15 @@ SYM_CODE_START(riscv_kexec_norelocate)
* s0: (const) Phys address to jump to
* s1: (const) Phys address of the FDT image
* s2: (const) The hartid of the current hart
- * s3: (const) kernel_map.va_pa_offset, used when switching MMU off
*/
mv s0, a1
mv s1, a2
mv s2, a3
- mv s3, a4

/* Disable / cleanup interrupts */
csrw CSR_SIE, zero
csrw CSR_SIP, zero

- /* Switch to physical addressing */
- la s4, 1f
- sub s4, s4, s3
- csrw CSR_STVEC, s4
- csrw CSR_SATP, zero
-
-.align 2
-1:
/* Pass the arguments to the next kernel / Cleanup*/
mv a0, s2
mv a1, s1
@@ -214,6 +204,11 @@ SYM_CODE_START(riscv_kexec_norelocate)
csrw CSR_SCAUSE, zero
csrw CSR_SSCRATCH, zero

+ /* Switch to physical addressing */
+ csrw CSR_STVEC, a2
+ csrw CSR_SATP, zero
+
+ /* This will trigger a jump to CSR_STVEC anyway */
jalr zero, a2, 0
SYM_CODE_END(riscv_kexec_norelocate)

--
2.32.0

2021-10-06 11:15:35

by Alexandre Ghiti

Subject: Re: [PATCH] riscv: Don't use va_pa_offset on kdump

On Sat, Oct 2, 2021 at 2:23 PM Nick Kossifidis <[email protected]> wrote:
> [...]
> @@ -214,6 +204,11 @@ SYM_CODE_START(riscv_kexec_norelocate)
> csrw CSR_SCAUSE, zero
> csrw CSR_SSCRATCH, zero
>
> + /* Switch to physical addressing */
> + csrw CSR_STVEC, a2
> + csrw CSR_SATP, zero
> +
> + /* This will trigger a jump to CSR_STVEC anyway */
> jalr zero, a2, 0

The last jump to a2 can be removed since the fault will be triggered
before even reaching this instruction.

> SYM_CODE_END(riscv_kexec_norelocate)
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

This patch fixes a regression introduced when moving the kernel to the
end of the address space, so we should add:
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")

And it should be backported to 5.13 and 5.14. It seems that the
following tags should be enough:

Cc: <[email protected]> # 5.13
Cc: <[email protected]> # 5.14

And finally, you can add:

Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks,

Alex

2021-10-09 13:20:06

by Nick Kossifidis

Subject: Re: [PATCH] riscv: Don't use va_pa_offset on kdump

On 2021-10-06 14:13, Alexandre Ghiti wrote:
>> +
>> + /* This will trigger a jump to CSR_STVEC anyway */
>> jalr zero, a2, 0
>
> The last jump to a2 can be removed since the fault will be triggered
> before even reaching this instruction.
>

Just switching SATP to zero doesn't generate a trap unless mstatus.TVM
is set (for virtualization purposes). The hart will try to execute the
next instruction, but it's not clear in the spec what happens in case the
code is cached, so I don't want to rely solely on STVEC. I prefer having
this instruction there. Note that some earlier QEMU versions also had
this behavior (the original kdump patch didn't set STVEC and it worked
fine after setting SATP to zero).

>
> This patch fixes a regression introduced when moving the kernel to the
> end of the address space, so we should add:
> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
>
> And it should be backported to 5.13 and 5.14. It seems that the
> following tags should be enough:
>
> Cc: <[email protected]> # 5.13
> Cc: <[email protected]> # 5.14
>
> And finally, you can add:
>
> Reviewed-by: Alexandre Ghiti <[email protected]>
>

ACK, thanks! I'll resend the patch with the tags you mentioned.

Regards,
Nick

2021-10-23 20:15:20

by Palmer Dabbelt

Subject: Re: [PATCH] riscv: Don't use va_pa_offset on kdump

On Sat, 09 Oct 2021 06:18:48 PDT (-0700), [email protected] wrote:
> On 2021-10-06 14:13, Alexandre Ghiti wrote:
>>> +
>>> + /* This will trigger a jump to CSR_STVEC anyway */
>>> jalr zero, a2, 0
>>
>> The last jump to a2 can be removed since the fault will be triggered
>> before even reaching this instruction.
>>
>
> Just switching SATP to zero doesn't generate a trap unless mstatus.TVM
> is set (for virtualization purposes). The hart will try to execute the
> next instruction, but it's not clear in the spec what happens in case the
> code is cached, so I don't want to rely solely on STVEC. I prefer having
> this instruction there. Note that some earlier QEMU versions also had
> this behavior (the original kdump patch didn't set STVEC and it worked
> fine after setting SATP to zero).

IIRC this came down to some very specific wording in the spec.
Something along the lines of the 0 in SATP meaning "no translation",
SFENCE.VMA ordering translations, and the general "if the spec doesn't
mention it then it has to work" logic. I thought I opened a spec issue
about this for clarification, but I can't find it.

That said, I'm perfectly fine taking the safe approach here, as
performance doesn't matter on this path. It warrants a comment, though.

>
>>
>> This patch fixes a regression introduced when moving the kernel to the
>> end of the address space, so we should add:
>> Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
>>
>> And it should be backported to 5.13 and 5.14. It seems that the
>> following tags should be enough:
>>
>> Cc: <[email protected]> # 5.13
>> Cc: <[email protected]> # 5.14
>>
>> And finally, you can add:
>>
>> Reviewed-by: Alexandre Ghiti <[email protected]>
>>
>
> ACK, thanks! I'll resend the patch with the tags you mentioned.

I don't have a v2 in my inbox; did I miss something? Also, if it's just
the tags, then it's generally not necessary to re-send something. Adding
the comment does require a re-send, though.

LMK if you want me to deal with this, or if there's going to be a v2.

Thanks!

2021-10-25 01:21:23

by Nick Kossifidis

Subject: Re: [PATCH] riscv: Don't use va_pa_offset on kdump

On 2021-10-23 23:14, Palmer Dabbelt wrote:
> On Sat, 09 Oct 2021 06:18:48 PDT (-0700), [email protected] wrote:
> On 2021-10-06 14:13, Alexandre Ghiti wrote:
>>>> +
>>>> + /* This will trigger a jump to CSR_STVEC anyway */
>>>> jalr zero, a2, 0
>>>
>>> The last jump to a2 can be removed since the fault will be triggered
>>> before even reaching this instruction.
>>>
>>
>> Just switching SATP to zero doesn't generate a trap unless mstatus.TVM
>> is set (for virtualization purposes). The hart will try to execute the
>> next instruction, but it's not clear in the spec what happens in case the
>> code is cached, so I don't want to rely solely on STVEC. I prefer having
>> this instruction there. Note that some earlier QEMU versions also had
>> this behavior (the original kdump patch didn't set STVEC and it worked
>> fine after setting SATP to zero).
>
> IIRC this came down to some very specific wording in the spec.
> Something along the lines of the 0 in SATP meaning "no translation",
> SFENCE.VMA ordering translations, and the general "if the spec doesn't
> mention it then it has to work" logic. I thought I opened a spec
> issue about this for clarification, but I can't find it.
>

I guess you mean this one:
https://github.com/riscv/riscv-isa-manual/issues/538

I couldn't find anything regarding cached code, though. If the code has
already been cached, there isn't necessarily going to be a load after
setting satp to 0, so even if the translation is cached we have no
guarantee that the next instruction will result in a trap.

> That said, I'm perfectly fine taking the safe approach here as it's
> not like the performance matters here. Warrants a comment, though.
>

ACK

>
> I don't have a v2 in my inbox, did I miss something? Also, if it's
> just the tags then it's generally not necessary to re-send something.
> The comment does, though.
>
> LMK if you want me to deal with this, or if there's going to be a v2.
>
> Thanks!

I'll send a v2 with the tags and the comment.

Regards,
Nick