2022-08-16 21:03:07

by Conor Dooley

[permalink] [raw]
Subject: RISC-V reserved memory problems

Hey all,
We've run into a bit of a problem with reserved memory on PolarFire, or
more accurately a pair of problems that seem to have opposite fixes.

The first of these problems is triggered when trying to implement a
remoteproc driver. To get the reserved memory buffer, remoteproc
does an of_reserved_mem_lookup(), something like:

np = of_parse_phandle(pdev->of_node, "memory-region", 0);
if (!np)
return -EINVAL;

rmem = of_reserved_mem_lookup(np);
if (!rmem)
return -EINVAL;

of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
a match - but this was triggering kernel panics for us. We did some
debugging and found that the name string's pointer was pointing to an
address in the 0x4000_0000 range. The minimum reproduction for this
crash is attached - it hacks in some print_reserved_mem()s into
setup_vm_final() around a tlb flush so you can see the before/after.
(You'll need a reserved memory node in your dts to replicate)

The output is like so, with the same crash as in the remoteproc driver:

[ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
[ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
[ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
[ 0.000000] printk: bootconsole [ns16550a0] enabled
[ 0.000000] printk: debug: skip boot console de-registration.
[ 0.000000] efi: UEFI not found.
[ 0.000000] before flush
[ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
[ 0.000000] after flush
[ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
[ 0.000000] Oops [#1]
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
[ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
[ 0.000000] epc : string+0x4a/0xea
[ 0.000000] ra : vsnprintf+0x1e4/0x336
[ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
[ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
[ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
[ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
[ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
[ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
[ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
[ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
[ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
[ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
[ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
[ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
[ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
[ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
[ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
[ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
[ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
[ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
[ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
[ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
[ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
[ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

We traced this back to early_init_fdt_scan_reserved_mem() in
setup_bootmem() - moving it later back up the boot sequence to
after the dt has been remapped etc has fixed the problem for us.

The least movement to get it working is attached, and also pushed
here: git.kernel.org/conor/c/1735589baefc

The second problem is a bit more complicated to explain - but we
found the solution conflicted with the remoteproc fix as we had
to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
process to solve this one.

We want to have a node in our devicetree that contains some memory
that is non-cached & marked as reserved-memory. Maybe we have just
missed something, but from what we've seen:
- the really early setup looks at the dtb, picks the highest bit
of memory and puts the dtb etc there so it can start using it
- early_init_fdt_scan_reserved_mem() is then called, which figures
out if memory is reserved or not.

Unfortunately, the highest bit of memory is the non-cached bit so
everything falls over, but we can avoid this by moving the call to
early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
takes place right before it in setup_bootmem().

Obviously, both of these changes are moving the function call in
opposite directions and we can only really do one of them. We are not
sure if what we are doing with the non-cached reserved-memory section
is just not permitted & cannot work - or if this is something that
was overlooked for RISC-V specifically and works for other archs.

It does seem like the first issue is a real bug, and I am happy to
submit the patch for that whenever - but having two problems with
opposite fixes seemed as if there was something else lurking that we
just don't have enough understanding to detect.

Any help would be great!

Thanks,
Conor.




Attachments:
fix_reserved_mem.patch (1.80 kB)
fix_reserved_mem.patch
debug_reserved_memory.patch (2.41 kB)
debug_reserved_memory.patch
Download all attachments

2022-08-18 14:46:47

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On 8/16/22 22:41, [email protected] wrote:
> Hey all,
> We've run into a bit of a problem with reserved memory on PolarFire, or
> more accurately a pair of problems that seem to have opposite fixes.
>
> The first of these problems is triggered when trying to implement a
> remoteproc driver. To get the reserved memory buffer, remoteproc
> does an of_reserved_mem_lookup(), something like:
>
> np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> if (!np)
> return -EINVAL;
>
> rmem = of_reserved_mem_lookup(np);
> if (!rmem)
> return -EINVAL;
>
> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> a match - but this was triggering kernel panics for us. We did some
> debugging and found that the name string's pointer was pointing to an
> address in the 0x4000_0000 range. The minimum reproduction for this
> crash is attached - it hacks in some print_reserved_mem()s into
> setup_vm_final() around a tlb flush so you can see the before/after.
> (You'll need a reserved memory node in your dts to replicate)
>
> The output is like so, with the same crash as in the remoteproc driver:
>
> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
> [ 0.000000] printk: bootconsole [ns16550a0] enabled
> [ 0.000000] printk: debug: skip boot console de-registration.
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] before flush
> [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
> [ 0.000000] after flush
> [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
> [ 0.000000] Oops [#1]
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
> [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
> [ 0.000000] epc : string+0x4a/0xea
> [ 0.000000] ra : vsnprintf+0x1e4/0x336
> [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
> [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
> [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
> [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
> [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
> [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
> [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
> [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
> [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
> [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
> [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
> [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
> [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
> [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
> [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
> [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
> [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
> [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
> [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
> [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
> [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
> [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> We traced this back to early_init_fdt_scan_reserved_mem() in
> setup_bootmem() - moving it later back up the boot sequence to
> after the dt has been remapped etc has fixed the problem for us.
>
> The least movement to get it working is attached, and also pushed
> here: git.kernel.org/conor/c/1735589baefc
>
> The second problem is a bit more complicated to explain - but we
> found the solution conflicted with the remoteproc fix as we had
> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> process to solve this one.
>
> We want to have a node in our devicetree that contains some memory
> that is non-cached & marked as reserved-memory. Maybe we have just
> missed something, but from what we've seen:
> - the really early setup looks at the dtb, picks the highest bit
> of memory and puts the dtb etc there so it can start using it
> - early_init_fdt_scan_reserved_mem() is then called, which figures
> out if memory is reserved or not.
>
> Unfortunately, the highest bit of memory is the non-cached bit so
> everything falls over, but we can avoid this by moving the call to
> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> takes place right before it in setup_bootmem().
>
> Obviously, both of these changes are moving the function call in
> opposite directions and we can only really do one of them. We are not
> sure if what we are doing with the non-cached reserved-memory section
> is just not permitted & cannot work - or if this is something that
> was overlooked for RISC-V specifically and works for other archs.
>
> It does seem like the first issue is a real bug, and I am happy to
> submit the patch for that whenever - but having two problems with
> opposite fixes seemed as if there was something else lurking that we
> just don't have enough understanding to detect.
>
> Any help would be great!
>
> Thanks,
> Conor.
>
>
>

Hello Conor,

could you, please, provide the relevant device-tree sniplets.

Please, have a look at the no-map property in
Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.
It controls if the kernel in any way will access the memory outside of
your new driver.

Best regards

Heinrich

2022-08-23 12:06:21

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

Hey Heinrich,
Thanks for chiming in.

On 18/08/2022 15:32, Heinrich Schuchardt wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 8/16/22 22:41, [email protected] wrote:
>> Hey all,
>> We've run into a bit of a problem with reserved memory on PolarFire, or
>> more accurately a pair of problems that seem to have opposite fixes.
>>
>> The first of these problems is triggered when trying to implement a
>> remoteproc driver. To get the reserved memory buffer, remoteproc
>> does an of_reserved_mem_lookup(), something like:
>>
>>       np = of_parse_phandle(pdev->of_node, "memory-region", 0);
>>       if (!np)
>>               return -EINVAL;
>>
>>       rmem = of_reserved_mem_lookup(np);
>>       if (!rmem)
>>               return -EINVAL;
>>
>> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
>> a match - but this was triggering kernel panics for us. We did some
>> debugging and found that the name string's pointer was pointing to an
>> address in the 0x4000_0000 range. The minimum reproduction for this
>> crash is attached - it hacks in some print_reserved_mem()s into
>> setup_vm_final() around a tlb flush so you can see the before/after.
>> (You'll need a reserved memory node in your dts to replicate)
>>
>> The output is like so, with the same crash as in the remoteproc driver:
>>
>> [    0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
>> [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
>> [    0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>> [    0.000000] printk: debug: skip boot console de-registration.
>> [    0.000000] efi: UEFI not found.
>> [    0.000000] before flush
>> [    0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
>> [    0.000000] after flush
>> [    0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
>> [    0.000000] Oops [#1]
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
>> [    0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
>> [    0.000000] epc : string+0x4a/0xea
>> [    0.000000]  ra : vsnprintf+0x1e4/0x336
>> [    0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
>> [    0.000000]  gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
>> [    0.000000]  t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
>> [    0.000000]  s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
>> [    0.000000]  a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
>> [    0.000000]  a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
>> [    0.000000]  s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
>> [    0.000000]  s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
>> [    0.000000]  s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
>> [    0.000000]  s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
>> [    0.000000]  t5 : ffffffff812f3618 t6 : ffffffff81203d08
>> [    0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
>> [    0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
>> [    0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
>> [    0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
>> [    0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
>> [    0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
>> [    0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
>> [    0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
>> [    0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
>> [    0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
>> [    0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>
>> We traced this back to early_init_fdt_scan_reserved_mem() in
>> setup_bootmem() - moving it later back up the boot sequence to
>> after the dt has been remapped etc has fixed the problem for us.
>>
>> The least movement to get it working is attached, and also pushed
>> here: git.kernel.org/conor/c/1735589baefc
>>
>> The second problem is a bit more complicated to explain - but we
>> found the solution conflicted with the remoteproc fix as we had
>> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
>> process to solve this one.
>>
>> We want to have a node in our devicetree that contains some memory
>> that is non-cached & marked as reserved-memory. Maybe we have just
>> missed something, but from what we've seen:
>> - the really early setup looks at the dtb, picks the highest bit
>>     of memory and puts the dtb etc there so it can start using it
>> - early_init_fdt_scan_reserved_mem() is then called, which figures
>>     out if memory is reserved or not.
>>
>> Unfortunately, the highest bit of memory is the non-cached bit so
>> everything falls over, but we can avoid this by moving the call to
>> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
>> takes place right before it in setup_bootmem().
>>
>> Obviously, both of these changes are moving the function call in
>> opposite directions and we can only really do one of them. We are not
>> sure if what we are doing with the non-cached reserved-memory section
>> is just not permitted & cannot work - or if this is something that
>> was overlooked for RISC-V specifically and works for other archs.
>>
>> It does seem like the first issue is a real bug, and I am happy to
>> submit the patch for that whenever - but having two problems with
>> opposite fixes seemed as if there was something else lurking that we
>> just don't have enough understanding to detect.
>>
>> Any help would be great!

> could you, please, provide the relevant device-tree sniplets.

Sure. That "might" have been a good thing to do from the start..
For the first problem it is actually fairly straightforward,
something like the following triggered it for me:
reserved-memory {
ranges;
#size-cells = <2>;
#address-cells = <2>;

fabricbuf0: fabricbuf@0 {
compatible = "shared-dma-pool";
reg = <0x0 0xae000000 0x0 0x2000000>;
label = "fabricbuf0-ddr-c";
};
};

I was able to repro this with the stanza in u-boot's dt and
not in Linux's too.

>
> Please, have a look at the no-map property in
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt.

For those playing along at home, this has been moved to:
Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml

For this case, it is something *like* the following:
reserved-memory {
ranges;
#size-cells = <2>;
#address-cells = <2>;

dma_nc_hi: linux,dma {
compatible = "shared-dma-pool";
size = <0x0 0x1000000>;
no-map;
linux,dma-default;
alloc-ranges = <0x14 0x00000000 0x0 0x1000000>;
dma-ranges = <0x14 0x00000000 0x14 0x00000000 0x0 0x1000000>;
};
};

ddrc_cache_lo: memory@80000000 {
device_type = "memory";
reg = <0x0 0x80000000 0x0 0x2e000000>;
status = "okay";
label = "cache-lo";
};

ddrc_cache_hi: memory@1000000000 {
device_type = "memory";
reg = <0x10 0x0 0x0 0x20000000>;
status = "okay";
label = "cache-hi";
};

ddr_nc_hi: memory@1400000000 {
device_type = "memory";
reg = <0x14 0x00000000 0x0 0x1000000>;
status = "okay";
label = "non-cache-hi";
};

As you can see, that does in fact have a no-map in it. I have adapted
this slightly so that it would resemble the existing dts, so it not
the *exact* one the issue was found with but it is functionally the
same. Hope that helps explain things a little better.

Thanks,
Conor.

2022-11-03 12:38:07

by Evgenii Shatokhin

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

Hi,

On 16.08.2022 23:41, [email protected] wrote:
> Hey all,
> We've run into a bit of a problem with reserved memory on PolarFire, or
> more accurately a pair of problems that seem to have opposite fixes.
>
> The first of these problems is triggered when trying to implement a
> remoteproc driver. To get the reserved memory buffer, remoteproc
> does an of_reserved_mem_lookup(), something like:
>
> np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> if (!np)
> return -EINVAL;
>
> rmem = of_reserved_mem_lookup(np);
> if (!rmem)
> return -EINVAL;
>
> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> a match - but this was triggering kernel panics for us. We did some
> debugging and found that the name string's pointer was pointing to an
> address in the 0x4000_0000 range. The minimum reproduction for this
> crash is attached - it hacks in some print_reserved_mem()s into
> setup_vm_final() around a tlb flush so you can see the before/after.
> (You'll need a reserved memory node in your dts to replicate)
>
> The output is like so, with the same crash as in the remoteproc driver:
>
> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
> [ 0.000000] printk: bootconsole [ns16550a0] enabled
> [ 0.000000] printk: debug: skip boot console de-registration.
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] before flush
> [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
> [ 0.000000] after flush
> [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
> [ 0.000000] Oops [#1]
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
> [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
> [ 0.000000] epc : string+0x4a/0xea
> [ 0.000000] ra : vsnprintf+0x1e4/0x336
> [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
> [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
> [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
> [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
> [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
> [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
> [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
> [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
> [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
> [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
> [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
> [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
> [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
> [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
> [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
> [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
> [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
> [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
> [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
> [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
> [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
> [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> We traced this back to early_init_fdt_scan_reserved_mem() in
> setup_bootmem() - moving it later back up the boot sequence to
> after the dt has been remapped etc has fixed the problem for us.
>
> The least movement to get it working is attached, and also pushed
> here: git.kernel.org/conor/c/1735589baefc

Any updates on this?

I have encountered the same issue with invalid reserved_mem[i].name
pointers recently, while working on a remoteproc driver for our
RISC-V-based SoC.

I can confirm that "riscv: fix reserved memory setup"
(git.kernel.org/conor/c/1735589baefc) fixes the issue in our kernel
based on 5.15.x.

Your patch does not seem to have any adverse side-effects either, so:

Tested-by: Evgenii Shatokhin <[email protected]>

If there are newer versions or variants of the fix, I'll be glad to test
them too.

By the way, I wonder why arm and aarch64 do not seem to be affected by
the issue. As far as I can see, these architectures also populate
reserved_mem[] before switching to the final memory mapping during
kernel init. I have not dug deep into that though.

>
> The second problem is a bit more complicated to explain - but we
> found the solution conflicted with the remoteproc fix as we had
> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> process to solve this one.
>
> We want to have a node in our devicetree that contains some memory
> that is non-cached & marked as reserved-memory. Maybe we have just
> missed something, but from what we've seen:
> - the really early setup looks at the dtb, picks the highest bit
> of memory and puts the dtb etc there so it can start using it
> - early_init_fdt_scan_reserved_mem() is then called, which figures
> out if memory is reserved or not.
>
> Unfortunately, the highest bit of memory is the non-cached bit so
> everything falls over, but we can avoid this by moving the call to
> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> takes place right before it in setup_bootmem().
>
> Obviously, both of these changes are moving the function call in
> opposite directions and we can only really do one of them. We are not
> sure if what we are doing with the non-cached reserved-memory section
> is just not permitted & cannot work - or if this is something that
> was overlooked for RISC-V specifically and works for other archs.
>
> It does seem like the first issue is a real bug, and I am happy to
> submit the patch for that whenever - but having two problems with
> opposite fixes seemed as if there was something else lurking that we
> just don't have enough understanding to detect.
>
> Any help would be great!
>
> Thanks,
> Conor.
>
>
>

Regards,
Evgenii

2022-11-03 12:40:04

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On Thu, Nov 03, 2022 at 02:46:37PM +0300, Evgenii Shatokhin wrote:
> Hi,
>
> On 16.08.2022 23:41, [email protected] wrote:
> > Hey all,
> > We've run into a bit of a problem with reserved memory on PolarFire, or
> > more accurately a pair of problems that seem to have opposite fixes.
> >
> > The first of these problems is triggered when trying to implement a
> > remoteproc driver. To get the reserved memory buffer, remoteproc
> > does an of_reserved_mem_lookup(), something like:
> >
> > np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> > if (!np)
> > return -EINVAL;
> >
> > rmem = of_reserved_mem_lookup(np);
> > if (!rmem)
> > return -EINVAL;
> >
> > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> > a match - but this was triggering kernel panics for us. We did some
> > debugging and found that the name string's pointer was pointing to an
> > address in the 0x4000_0000 range. The minimum reproduction for this
> > crash is attached - it hacks in some print_reserved_mem()s into
> > setup_vm_final() around a tlb flush so you can see the before/after.
> > (You'll need a reserved memory node in your dts to replicate)
> >
> > The output is like so, with the same crash as in the remoteproc driver:
> >
> > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
> > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
> > [ 0.000000] printk: bootconsole [ns16550a0] enabled
> > [ 0.000000] printk: debug: skip boot console de-registration.
> > [ 0.000000] efi: UEFI not found.
> > [ 0.000000] before flush
> > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
> > [ 0.000000] after flush
> > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
> > [ 0.000000] Oops [#1]
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
> > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
> > [ 0.000000] epc : string+0x4a/0xea
> > [ 0.000000] ra : vsnprintf+0x1e4/0x336
> > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
> > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
> > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
> > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
> > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
> > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
> > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
> > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
> > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
> > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
> > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
> > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
> > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
> > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
> > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
> > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
> > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
> > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
> > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
> > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
> > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
> > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >
> > We traced this back to early_init_fdt_scan_reserved_mem() in
> > setup_bootmem() - moving it later back up the boot sequence to
> > after the dt has been remapped etc has fixed the problem for us.
> >
> > The least movement to get it working is attached, and also pushed
> > here: git.kernel.org/conor/c/1735589baefc
>
> Any updates on this?

"Yes". I /briefly/ chatted with Palmer about this at Plumbers. (see
below). Funny timing from you replying though, I'm planning on spending
the next days/week/weeks trying to sort this whole thing out - it's come
to a head for us.

> I have encountered the same issue with invalid reserved_mem[i].name pointers
> recently, while working on a remoteproc driver for our RISC-V-based SoC.
>
> I can confirm that "riscv: fix reserved memory setup"
> (git.kernel.org/conor/c/1735589baefc) fixes the issue in our kernel based on
> 5.15.x.

Aye, we're on 5.15 for our vendor tree too so that's where we found the
problem initally.

> Your patch does not seem to have any adverse side-effects either, so:

This patch itself, from what we can see, doesn't have any adverse
side-effects. The other issue that I pointed out in this mail with
reserved memory allocations requires an opposite fix, so there's
something else going on that needs digging into. When I spoke with
Palmer, I said I'd spend the time looking at it but just have not got
around to doing it until now.
I think it may make some sense to try and merge this patch as it at least
makes things better and unblocks people wanting to upstream remoteproc
drivers etc.
And then when I (eventually?) come up with something better maybe it can
build on that.

I'll resend this patch as standalone early next week unless I've somehow
made a breakthrough between now and then.

> Tested-by: Evgenii Shatokhin <[email protected]>

Thanks!

> If there are newer versions or variants of the fix, I'll be glad to test
> them too.

This patch is currently the most recent work I have done, but hopefully
that's going to change.

> By the way, I wonder why arm and aarch64 do not seem to be affected by the
> issue. As far as I can see, these architectures also populate reserved_mem[]
> before switching to the final memory mapping during kernel init. I have not
> dug deep into that though.

Aye, that's at least where I will be starting to do comparisons..

> > The second problem is a bit more complicated to explain - but we
> > found the solution conflicted with the remoteproc fix as we had
> > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> > process to solve this one.
> >
> > We want to have a node in our devicetree that contains some memory
> > that is non-cached & marked as reserved-memory. Maybe we have just
> > missed something, but from what we've seen:
> > - the really early setup looks at the dtb, picks the highest bit
> > of memory and puts the dtb etc there so it can start using it
> > - early_init_fdt_scan_reserved_mem() is then called, which figures
> > out if memory is reserved or not.
> >
> > Unfortunately, the highest bit of memory is the non-cached bit so
> > everything falls over, but we can avoid this by moving the call to
> > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> > takes place right before it in setup_bootmem().
> >
> > Obviously, both of these changes are moving the function call in
> > opposite directions and we can only really do one of them. We are not
> > sure if what we are doing with the non-cached reserved-memory section
> > is just not permitted & cannot work - or if this is something that
> > was overlooked for RISC-V specifically and works for other archs.
> >
> > It does seem like the first issue is a real bug, and I am happy to
> > submit the patch for that whenever - but having two problems with
> > opposite fixes seemed as if there was something else lurking that we
> > just don't have enough understanding to detect.
> >
> > Any help would be great!


2023-02-02 22:02:15

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

Hullo Palmer, Mike & whoever else may read this,

Just reviving this thread from a little while ago as I have been in the
area again recently...

On Tue, Aug 16, 2022 at 08:41:05PM +0000, [email protected] wrote:
> Hey all,
> We've run into a bit of a problem with reserved memory on PolarFire, or
> more accurately a pair of problems that seem to have opposite fixes.
>
> The first of these problems is triggered when trying to implement a
> remoteproc driver. To get the reserved memory buffer, remoteproc
> does an of_reserved_mem_lookup(), something like:
>
> np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> if (!np)
> return -EINVAL;
>
> rmem = of_reserved_mem_lookup(np);
> if (!rmem)
> return -EINVAL;
>
> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> a match - but this was triggering kernel panics for us. We did some
> debugging and found that the name string's pointer was pointing to an
> address in the 0x4000_0000 range. The minimum reproduction for this
> crash is attached - it hacks in some print_reserved_mem()s into
> setup_vm_final() around a tlb flush so you can see the before/after.
> (You'll need a reserved memory node in your dts to replicate)
>
> The output is like so, with the same crash as in the remoteproc driver:
>
> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022

[...]

> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> We traced this back to early_init_fdt_scan_reserved_mem() in
> setup_bootmem() - moving it later back up the boot sequence to
> after the dt has been remapped etc has fixed the problem for us.
>
> The least movement to get it working is attached, and also pushed
> here: git.kernel.org/conor/c/1735589baefc

This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved
memory setup").

> The second problem is a bit more complicated to explain - but we
> found the solution conflicted with the remoteproc fix as we had
> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> process to solve this one.
>
> We want to have a node in our devicetree that contains some memory
> that is non-cached & marked as reserved-memory. Maybe we have just
> missed something, but from what we've seen:
> - the really early setup looks at the dtb, picks the highest bit
> of memory and puts the dtb etc there so it can start using it
> - early_init_fdt_scan_reserved_mem() is then called, which figures
> out if memory is reserved or not.
>
> Unfortunately, the highest bit of memory is the non-cached bit so
> everything falls over, but we can avoid this by moving the call to
> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> takes place right before it in setup_bootmem().
>
> Obviously, both of these changes are moving the function call in
> opposite directions and we can only really do one of them. We are not
> sure if what we are doing with the non-cached reserved-memory section
> is just not permitted & cannot work - or if this is something that
> was overlooked for RISC-V specifically and works for other archs.

We ended up working around this one by making sure that U-Boot loaded
the dtb to somewhere that would be inside the kernel's memory map, thus
avoiding the remapping in the first place.

We did run into another problem recently though, and 50e63dd8ed92 is
kinda at fault for it.
This particular issue was encountered with a devicetree where the
top-most memory region was entirely reserved & was not observed prior
to my fix for the first issue.

On RISC-V, the boot sequence is something like:
setup_bootmem();
setup_vm_final();
unflatten_device_tree();
early_init_fdt_scan_reserved_mem();

Whereas, before my patch it used to be (give-or-take):
setup_bootmem();
early_init_fdt_scan_reserved_mem();
setup_vm_final();
unflatten_device_tree();

The difference being that we used to have scanned the reserved memory
regions before calling setup_vm_final() & therefore know which regions
we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem()
before we've got the dt in a proper virtual memory address will cause
the kernel to panic if it tries to read a reserved memory node's label.

As we are now calling setup_vm_final() *before* we know what the
reserved memory regions are & as RISC-V allocates memblocks from the top
down, the allocations in setup_vm_final() will be done in the highest
memory region.
When early_init_fdt_scan_reserved_mem() then tries to reserve the
entirety of that top-most memory region, the reservation fails as part
of this region has already been allocated.
In the scenario where I found this bug, that top-most region is non-
cached memory & the kernel ends up panicking.
The memblock debug code made this pretty easy to spot, otherwise I'd
probably have spent more than just a few hours trying to figure out why
it was panicking!

My "this needs to be fixed today" solution for this problem was calling
memblock_set_bottom_up(true) in setup_bootmem() & that's what we are
going to carry downstream for now.

I haven't tested it (yet) but I suspect that it would also fix our
problem of the dtb being remapped into a non-cached region of memory
that we would later go on to reserve too. Non-cached being an issue
mainly due to the panicking, but failing to reserve (and using!) memory
regions that are meant to be reserved is very far from ideal even when
they are memory that the kernel can actually use.

I have no idea if that is an acceptable solution for upstream though, so
I guess this is me putting out feelers as to whether this is something I
should send a patch to do *OR* if this is another sign of the issues
that you (Mike, Palmer) mentioned in the past.
If it isn't an acceptable solution, I'm not really too sure how to
proceed!

Cheers,
Conor.


Attachments:
(No filename) (5.85 kB)
signature.asc (228.00 B)
Download all attachments

2023-03-07 11:36:32

by Mike Rapoport

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

Hi Conor,

Sorry for the delay, somehow this slipped between the cracks.

On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
> Hullo Palmer, Mike & whoever else may read this,
>
> Just reviving this thread from a little while ago as I have been in the
> area again recently...

TBH, I didn't really dig deep into the issues, but the thought I had was
what if DT was mapped via fixmap until the setup_vm_final() and then it
would be possible to call DT methods early.

Could be I'm shooting in the dark :)

> On Tue, Aug 16, 2022 at 08:41:05PM +0000, [email protected] wrote:
> > Hey all,
> > We've run into a bit of a problem with reserved memory on PolarFire, or
> > more accurately a pair of problems that seem to have opposite fixes.
> >
> > The first of these problems is triggered when trying to implement a
> > remoteproc driver. To get the reserved memory buffer, remoteproc
> > does an of_reserved_mem_lookup(), something like:
> >
> > np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> > if (!np)
> > return -EINVAL;
> >
> > rmem = of_reserved_mem_lookup(np);
> > if (!rmem)
> > return -EINVAL;
> >
> > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> > a match - but this was triggering kernel panics for us. We did some
> > debugging and found that the name string's pointer was pointing to an
> > address in the 0x4000_0000 range. The minimum reproduction for this
> > crash is attached - it hacks in some print_reserved_mem()s into
> > setup_vm_final() around a tlb flush so you can see the before/after.
> > (You'll need a reserved memory node in your dts to replicate)
> >
> > The output is like so, with the same crash as in the remoteproc driver:
> >
> > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
>
> [...]
>
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >
> > We traced this back to early_init_fdt_scan_reserved_mem() in
> > setup_bootmem() - moving it later back up the boot sequence to
> > after the dt has been remapped etc has fixed the problem for us.
> >
> > The least movement to get it working is attached, and also pushed
> > here: git.kernel.org/conor/c/1735589baefc
>
> This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved
> memory setup").
>
> > The second problem is a bit more complicated to explain - but we
> > found the solution conflicted with the remoteproc fix as we had
> > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> > process to solve this one.
> >
> > We want to have a node in our devicetree that contains some memory
> > that is non-cached & marked as reserved-memory. Maybe we have just
> > missed something, but from what we've seen:
> > - the really early setup looks at the dtb, picks the highest bit
> > of memory and puts the dtb etc there so it can start using it
> > - early_init_fdt_scan_reserved_mem() is then called, which figures
> > out if memory is reserved or not.
> >
> > Unfortunately, the highest bit of memory is the non-cached bit so
> > everything falls over, but we can avoid this by moving the call to
> > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> > takes place right before it in setup_bootmem().
> >
> > Obviously, both of these changes are moving the function call in
> > opposite directions and we can only really do one of them. We are not
> > sure if what we are doing with the non-cached reserved-memory section
> > is just not permitted & cannot work - or if this is something that
> > was overlooked for RISC-V specifically and works for other archs.
>
> We ended up working around this one by making sure that U-Boot loaded
> the dtb to somewhere that would be inside the kernel's memory map, thus
> avoiding the remapping in the first place.
>
> We did run into another problem recently though, and 50e63dd8ed92 is
> kinda at fault for it.
> This particular issue was encountered with a devicetree where the
> top-most memory region was entirely reserved & was not observed prior
> to my fix for the first issue.
>
> On RISC-V, the boot sequence is something like:
> setup_bootmem();
> setup_vm_final();
> unflatten_device_tree();
> early_init_fdt_scan_reserved_mem();
>
> Whereas, before my patch it used to be (give-or-take):
> setup_bootmem();
> early_init_fdt_scan_reserved_mem();
> setup_vm_final();
> unflatten_device_tree();
>
> The difference being that we used to have scanned the reserved memory
> regions before calling setup_vm_final() & therefore know which regions
> we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem()
> before we've got the dt in a proper virtual memory address will cause
> the kernel to panic if it tries to read a reserved memory node's label.
>
> As we are now calling setup_vm_final() *before* we know what the
> reserved memory regions are & as RISC-V allocates memblocks from the top
> down, the allocations in setup_vm_final() will be done in the highest
> memory region.
> When early_init_fdt_scan_reserved_mem() then tries to reserve the
> entirety of that top-most memory region, the reservation fails as part
> of this region has already been allocated.
> In the scenario where I found this bug, that top-most region is non-
> cached memory & the kernel ends up panicking.
> The memblock debug code made this pretty easy to spot, otherwise I'd
> probably have spent more than just a few hours trying to figure out why
> it was panicking!
>
> My "this needs to be fixed today" solution for this problem was calling
> memblock_set_bottom_up(true) in setup_bootmem() & that's what we are
> going to carry downstream for now.
>
> I haven't tested it (yet) but I suspect that it would also fix our
> problem of the dtb being remapped into a non-cached region of memory
> that we would later go on to reserve too. Non-cached being an issue
> mainly due to the panicking, but failing to reserve (and using!) memory
> regions that are meant to be reserved is very far from ideal even when
> they are memory that the kernel can actually use.
>
> I have no idea if that is an acceptable solution for upstream though, so
> I guess this is me putting out feelers as to whether this is something I
> should send a patch to do *OR* if this is another sign of the issues
> that you (Mike, Palmer) mentioned in the past.
> If it isn't an acceptable solution, I'm not really too sure how to
> proceed!
>
> Cheers,
> Conor.
>



--
Sincerely yours,
Mike.

2023-03-09 09:15:50

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On Tue, Mar 07, 2023 at 01:35:11PM +0200, Mike Rapoport wrote:
> Hi Conor,
>
> Sorry for the delay, somehow this slipped between the cracks.

No worries.

> On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
> > Hullo Palmer, Mike & whoever else may read this,
> >
> > Just reviving this thread from a little while ago as I have been in the
> > area again recently...
>
> TBH, I didn't really dig deep into the issues,

I only preserved most of the context here to point out that it wasn't an
isolated issue, the top-down/bottom-up bit is the main part that I was
interested in. The others are fixed, or workaround-able without
"harming" anyone else.

> but the thought I had was
> what if DT was mapped via fixmap until the setup_vm_final() and then it
> would be possible to call DT methods early.

From my memory, this would be more along the lines of what arm64 does.
I'll give it a shot and see how it goes. I figure it'll take me some
time!

> Could be I'm shooting in the dark :)

A pointer on where to start is helpful, even if it is "rewrite a bunch
of stuff".

Cheers,
Conor.

> > On Tue, Aug 16, 2022 at 08:41:05PM +0000, [email protected] wrote:
> > > Hey all,
> > > We've run into a bit of a problem with reserved memory on PolarFire, or
> > > more accurately a pair of problems that seem to have opposite fixes.
> > >
> > > The first of these problems is triggered when trying to implement a
> > > remoteproc driver. To get the reserved memory buffer, remoteproc
> > > does an of_reserved_mem_lookup(), something like:
> > >
> > > np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> > > if (!np)
> > > return -EINVAL;
> > >
> > > rmem = of_reserved_mem_lookup(np);
> > > if (!rmem)
> > > return -EINVAL;
> > >
> > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> > > a match - but this was triggering kernel panics for us. We did some
> > > debugging and found that the name string's pointer was pointing to an
> > > address in the 0x4000_0000 range. The minimum reproduction for this
> > > crash is attached - it hacks in some print_reserved_mem()s into
> > > setup_vm_final() around a tlb flush so you can see the before/after.
> > > (You'll need a reserved memory node in your dts to replicate)
> > >
> > > The output is like so, with the same crash as in the remoteproc driver:
> > >
> > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> >
> > [...]
> >
> > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> > >
> > > We traced this back to early_init_fdt_scan_reserved_mem() in
> > > setup_bootmem() - moving it later back up the boot sequence to
> > > after the dt has been remapped etc has fixed the problem for us.
> > >
> > > The least movement to get it working is attached, and also pushed
> > > here: git.kernel.org/conor/c/1735589baefc
> >
> > This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved
> > memory setup").
> >
> > > The second problem is a bit more complicated to explain - but we
> > > found the solution conflicted with the remoteproc fix as we had
> > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> > > process to solve this one.
> > >
> > > We want to have a node in our devicetree that contains some memory
> > > that is non-cached & marked as reserved-memory. Maybe we have just
> > > missed something, but from what we've seen:
> > > - the really early setup looks at the dtb, picks the highest bit
> > > of memory and puts the dtb etc there so it can start using it
> > > - early_init_fdt_scan_reserved_mem() is then called, which figures
> > > out if memory is reserved or not.
> > >
> > > Unfortunately, the highest bit of memory is the non-cached bit so
> > > everything falls over, but we can avoid this by moving the call to
> > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> > > takes place right before it in setup_bootmem().
> > >
> > > Obviously, both of these changes are moving the function call in
> > > opposite directions and we can only really do one of them. We are not
> > > sure if what we are doing with the non-cached reserved-memory section
> > > is just not permitted & cannot work - or if this is something that
> > > was overlooked for RISC-V specifically and works for other archs.
> >
> > We ended up working around this one by making sure that U-Boot loaded
> > the dtb to somewhere that would be inside the kernel's memory map, thus
> > avoiding the remapping in the first place.
> >
> > We did run into another problem recently though, and 50e63dd8ed92 is
> > kinda at fault for it.
> > This particular issue was encountered with a devicetree where the
> > top-most memory region was entirely reserved & was not observed prior
> > to my fix for the first issue.
> >
> > On RISC-V, the boot sequence is something like:
> > setup_bootmem();
> > setup_vm_final();
> > unflatten_device_tree();
> > early_init_fdt_scan_reserved_mem();
> >
> > Whereas, before my patch it used to be (give-or-take):
> > setup_bootmem();
> > early_init_fdt_scan_reserved_mem();
> > setup_vm_final();
> > unflatten_device_tree();
> >
> > The difference being that we used to have scanned the reserved memory
> > regions before calling setup_vm_final() & therefore know which regions
> > we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem()
> > before we've got the dt in a proper virtual memory address will cause
> > the kernel to panic if it tries to read a reserved memory node's label.
> >
> > As we are now calling setup_vm_final() *before* we know what the
> > reserved memory regions are & as RISC-V allocates memblocks from the top
> > down, the allocations in setup_vm_final() will be done in the highest
> > memory region.
> > When early_init_fdt_scan_reserved_mem() then tries to reserve the
> > entirety of that top-most memory region, the reservation fails as part
> > of this region has already been allocated.
> > In the scenario where I found this bug, that top-most region is non-
> > cached memory & the kernel ends up panicking.
> > The memblock debug code made this pretty easy to spot, otherwise I'd
> > probably have spent more than just a few hours trying to figure out why
> > it was panicking!
> >
> > My "this needs to be fixed today" solution for this problem was calling
> > memblock_set_bottom_up(true) in setup_bootmem() & that's what we are
> > going to carry downstream for now.
> >
> > I haven't tested it (yet) but I suspect that it would also fix our
> > problem of the dtb being remapped into a non-cached region of memory
> > that we would later go on to reserve too. Non-cached being an issue
> > mainly due to the panicking, but failing to reserve (and using!) memory
> > regions that are meant to be reserved is very far from ideal even when
> > they are memory that the kernel can actually use.
> >
> > I have no idea if that is an acceptable solution for upstream though, so
> > I guess this is me putting out feelers as to whether this is something I
> > should send a patch to do *OR* if this is another sign of the issues
> > that you (Mike, Palmer) mentioned in the past.
> > If it isn't an acceptable solution, I'm not really too sure how to
> > proceed!
> >
> > Cheers,
> > Conor.
> >
>
>
>
> --
> Sincerely yours,
> Mike.
>


Attachments:
(No filename) (7.33 kB)
signature.asc (228.00 B)
Download all attachments

2023-03-09 10:30:20

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

Hi Conor,

On 8/16/22 22:41, [email protected] wrote:
> Hey all,
> We've run into a bit of a problem with reserved memory on PolarFire, or
> more accurately a pair of problems that seem to have opposite fixes.
>
> The first of these problems is triggered when trying to implement a
> remoteproc driver. To get the reserved memory buffer, remoteproc
> does an of_reserved_mem_lookup(), something like:
>
> np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> if (!np)
> return -EINVAL;
>
> rmem = of_reserved_mem_lookup(np);
> if (!rmem)
> return -EINVAL;
>
> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> a match - but this was triggering kernel panics for us. We did some
> debugging and found that the name string's pointer was pointing to an
> address in the 0x4000_0000 range. The minimum reproduction for this


0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that
is used to map the dtb before we can access it using the linear mapping.


> crash is attached - it hacks in some print_reserved_mem()s into
> setup_vm_final() around a tlb flush so you can see the before/after.
> (You'll need a reserved memory node in your dts to replicate)
>
> The output is like so, with the same crash as in the remoteproc driver:
>
> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
> [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
> [ 0.000000] printk: bootconsole [ns16550a0] enabled
> [ 0.000000] printk: debug: skip boot console de-registration.
> [ 0.000000] efi: UEFI not found.
> [ 0.000000] before flush
> [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
> [ 0.000000] after flush
> [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac


You take the trap here because the mapping for the dtb does not exist in
swapper_pg_dir, but you don't need this mapping anymore as you can
access the device tree through the linear mapping now.

I would say that: you build your kernel with CONFIG_BUILTIN_DTB and then
you don't call early_init_dt_verify which resets initial_boot_params to
the linear mapping address (it was initially set to 0x4000_0000 in
parse_dtb). If that's the case, does the following fix your issue?


diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 376d2827e736..2b09f0bd8432 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p)
        efi_init();
        paging_init();
 #if IS_ENABLED(CONFIG_BUILTIN_DTB)
+       initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
        unflatten_and_copy_device_tree();
 #else
        if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))


> [ 0.000000] Oops [#1]
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
> [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
> [ 0.000000] epc : string+0x4a/0xea
> [ 0.000000] ra : vsnprintf+0x1e4/0x336
> [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
> [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
> [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
> [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
> [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
> [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
> [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
> [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
> [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
> [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
> [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
> [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
> [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
> [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
> [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
> [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
> [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
> [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
> [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
> [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
> [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
> [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>
> We traced this back to early_init_fdt_scan_reserved_mem() in
> setup_bootmem() - moving it later back up the boot sequence to
> after the dt has been remapped etc has fixed the problem for us.
>
> The least movement to get it working is attached, and also pushed
> here: git.kernel.org/conor/c/1735589baefc
>
> The second problem is a bit more complicated to explain - but we
> found the solution conflicted with the remoteproc fix as we had
> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> process to solve this one.
>
> We want to have a node in our devicetree that contains some memory
> that is non-cached & marked as reserved-memory. Maybe we have just
> missed something, but from what we've seen:
> - the really early setup looks at the dtb, picks the highest bit
> of memory and puts the dtb etc there so it can start using it
> - early_init_fdt_scan_reserved_mem() is then called, which figures
> out if memory is reserved or not.
>
> Unfortunately, the highest bit of memory is the non-cached bit so
> everything falls over, but we can avoid this by moving the call to
> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> takes place right before it in setup_bootmem().


And then I suppose the allocations you are mentioning happen in
unflatten_XXX, so parsing the device tree for reserved memory nodes
before this should do the trick. Does the following fix your second issue?

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 2b09f0bd8432..94b3d049fe9d 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p)
        paging_init();
 #if IS_ENABLED(CONFIG_BUILTIN_DTB)
        initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
+       early_init_fdt_scan_reserved_mem();
        unflatten_and_copy_device_tree();
 #else
-       if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
+       if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) {
+               early_init_fdt_scan_reserved_mem();
                unflatten_device_tree();
-       else
+       } else
                pr_err("No DTB found in kernel mappings\n");
 #endif
-       early_init_fdt_scan_reserved_mem();
        misc_mem_init();

        init_resources();



>
> Obviously, both of these changes are moving the function call in
> opposite directions and we can only really do one of them. We are not
> sure if what we are doing with the non-cached reserved-memory section
> is just not permitted & cannot work - or if this is something that
> was overlooked for RISC-V specifically and works for other archs.
>
> It does seem like the first issue is a real bug, and I am happy to
> submit the patch for that whenever - but having two problems with
> opposite fixes seemed as if there was something else lurking that we
> just don't have enough understanding to detect.
>
> Any help would be great!
>
> Thanks,
> Conor.
>

Even if that does not fix your issue, the first patch is necessary as it
fixes initial_boot_params.


>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-03-09 12:31:41

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On 3/9/23 11:30, Alexandre Ghiti wrote:
> Hi Conor,
>
> On 8/16/22 22:41, [email protected] wrote:
>> Hey all,
>> We've run into a bit of a problem with reserved memory on PolarFire, or
>> more accurately a pair of problems that seem to have opposite fixes.
>>
>> The first of these problems is triggered when trying to implement a
>> remoteproc driver. To get the reserved memory buffer, remoteproc
>> does an of_reserved_mem_lookup(), something like:
>>
>>     np = of_parse_phandle(pdev->of_node, "memory-region", 0);
>>     if (!np)
>>         return -EINVAL;
>>
>>     rmem = of_reserved_mem_lookup(np);
>>     if (!rmem)
>>         return -EINVAL;
>>
>> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
>> a match - but this was triggering kernel panics for us. We did some
>> debugging and found that the name string's pointer was pointing to an
>> address in the 0x4000_0000 range. The minimum reproduction for this
>
>
> 0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that
> is used to map the dtb before we can access it using the linear mapping.
>
>
>> crash is attached - it hacks in some print_reserved_mem()s into
>> setup_vm_final() around a tlb flush so you can see the before/after.
>> (You'll need a reserved memory node in your dts to replicate)
>>
>> The output is like so, with the same crash as in the remoteproc driver:
>>
>> [    0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834
>> (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0,
>> GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
>> [    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
>> [    0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000
>> (options '115200n8')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>> [    0.000000] printk: debug: skip boot console de-registration.
>> [    0.000000] efi: UEFI not found.
>> [    0.000000] before flush
>> [    0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
>> [    0.000000] after flush
>> [    0.000000] Unable to handle kernel paging request at virtual
>> address 00000000401c31ac
>
>
> You take the trap here because the mapping for the dtb does not exist
> in swapper_pg_dir, but you don't need this mapping anymore as you can
> access the device tree through the linear mapping now.


You can forget everything below, I was completely wrong. I'll follow up
on Mike's answer.


>
> I would say that: you build your kernel with CONFIG_BUILTIN_DTB and
> then you don't call early_init_dt_verify which resets
> initial_boot_params to the linear mapping address (it was initially
> set to 0x4000_0000 in parse_dtb). If that's the case, does the
> following fix your issue?
>
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 376d2827e736..2b09f0bd8432 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p)
>         efi_init();
>         paging_init();
>  #if IS_ENABLED(CONFIG_BUILTIN_DTB)
> +       initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
>         unflatten_and_copy_device_tree();
>  #else
>         if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
>
>
>> [    0.000000] Oops [#1]
>> [    0.000000] Modules linked in:
>> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
>> 6.0.0-rc1-00001-g0d9d6953d834 #1
>> [    0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
>> [    0.000000] epc : string+0x4a/0xea
>> [    0.000000]  ra : vsnprintf+0x1e4/0x336
>> [    0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp :
>> ffffffff81203be0
>> [    0.000000]  gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 :
>> 0000000000000000
>> [    0.000000]  t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 :
>> ffffffff81203c20
>> [    0.000000]  s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 :
>> 0000000000000000
>> [    0.000000]  a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 :
>> ffffffffffffffff
>> [    0.000000]  a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 :
>> ffffffffffffffff
>> [    0.000000]  s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 :
>> 0000000000000008
>> [    0.000000]  s5 : ffffffff000000ff s6 : 0000000000ffffff s7 :
>> 00000000ffffff00
>> [    0.000000]  s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10:
>> 0000000000000002
>> [    0.000000]  s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 :
>> ffffffff812f3617
>> [    0.000000]  t5 : ffffffff812f3618 t6 : ffffffff81203d08
>> [    0.000000] status: 0000000200000100 badaddr: 00000000401c31ac
>> cause: 000000000000000d
>> [    0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
>> [    0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
>> [    0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
>> [    0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
>> [    0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
>> [    0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
>> [    0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
>> [    0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
>> [    0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
>> [    0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
>> [    0.000000] ---[ end trace 0000000000000000 ]---
>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle
>> task!
>> [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill
>> the idle task! ]---
>>
>> We traced this back to early_init_fdt_scan_reserved_mem() in
>> setup_bootmem() - moving it later back up the boot sequence to
>> after the dt has been remapped etc has fixed the problem for us.
>>
>> The least movement to get it working is attached, and also pushed
>> here: git.kernel.org/conor/c/1735589baefc
>>
>> The second problem is a bit more complicated to explain - but we
>> found the solution conflicted with the remoteproc fix as we had
>> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
>> process to solve this one.
>>
>> We want to have a node in our devicetree that contains some memory
>> that is non-cached & marked as reserved-memory. Maybe we have just
>> missed something, but from what we've seen:
>> - the really early setup looks at the dtb, picks the highest bit
>>     of memory and puts the dtb etc there so it can start using it
>> - early_init_fdt_scan_reserved_mem() is then called, which figures
>>     out if memory is reserved or not.
>>
>> Unfortunately, the highest bit of memory is the non-cached bit so
>> everything falls over, but we can avoid this by moving the call to
>> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
>> takes place right before it in setup_bootmem().
>
>
> And then I suppose the allocations you are mentioning happen in
> unflatten_XXX, so parsing the device tree for reserved memory nodes
> before this should do the trick. Does the following fix your second
> issue?
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 2b09f0bd8432..94b3d049fe9d 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p)
>         paging_init();
>  #if IS_ENABLED(CONFIG_BUILTIN_DTB)
>         initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
> +       early_init_fdt_scan_reserved_mem();
>         unflatten_and_copy_device_tree();
>  #else
> -       if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
> +       if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) {
> +               early_init_fdt_scan_reserved_mem();
>                 unflatten_device_tree();
> -       else
> +       } else
>                 pr_err("No DTB found in kernel mappings\n");
>  #endif
> -       early_init_fdt_scan_reserved_mem();
>         misc_mem_init();
>
>         init_resources();
>
>
>
>>
>> Obviously, both of these changes are moving the function call in
>> opposite directions and we can only really do one of them. We are not
>> sure if what we are doing with the non-cached reserved-memory section
>> is just not permitted & cannot work - or if this is something that
>> was overlooked for RISC-V specifically and works for other archs.
>>
>> It does seem like the first issue is a real bug, and I am happy to
>> submit the patch for that whenever - but having two problems with
>> opposite fixes seemed as if there was something else lurking that we
>> just don't have enough understanding to detect.
>>
>> Any help would be great!
>>
>> Thanks,
>> Conor.
>>
>
> Even if that does not fix your issue, the first patch is necessary as
> it fixes initial_boot_params.
>
>
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-03-09 12:33:15

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On Thu, Mar 09, 2023 at 11:30:07AM +0100, Alexandre Ghiti wrote:
> Hi Conor,

Hey Alex!

> On 8/16/22 22:41, [email protected] wrote:
> > Hey all,
> > We've run into a bit of a problem with reserved memory on PolarFire, or
> > more accurately a pair of problems that seem to have opposite fixes.
> >
> > The first of these problems is triggered when trying to implement a
> > remoteproc driver. To get the reserved memory buffer, remoteproc
> > does an of_reserved_mem_lookup(), something like:
> >
> > np = of_parse_phandle(pdev->of_node, "memory-region", 0);
> > if (!np)
> > return -EINVAL;
> >
> > rmem = of_reserved_mem_lookup(np);
> > if (!rmem)
> > return -EINVAL;
> >
> > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
> > a match - but this was triggering kernel panics for us. We did some
> > debugging and found that the name string's pointer was pointing to an
> > address in the 0x4000_0000 range. The minimum reproduction for this
>
>
> 0x4000_0000 corresponds to DTB_EARLY_BASE_VA: this is the address that is
> used to map the dtb before we can access it using the linear mapping.
>
>
> > crash is attached - it hacks in some print_reserved_mem()s into
> > setup_vm_final() around a tlb flush so you can see the before/after.
> > (You'll need a reserved memory node in your dts to replicate)
> >
> > The output is like so, with the same crash as in the remoteproc driver:
> >
> > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
> > [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
> > [ 0.000000] Machine model: Microchip PolarFire-SoC Icicle Kit
> > [ 0.000000] earlycon: ns16550a0 at MMIO32 0x0000000020100000 (options '115200n8')
> > [ 0.000000] printk: bootconsole [ns16550a0] enabled
> > [ 0.000000] printk: debug: skip boot console de-registration.
> > [ 0.000000] efi: UEFI not found.
> > [ 0.000000] before flush
> > [ 0.000000] OF: reserved mem: debug name is fabricbuf@ae000000
> > [ 0.000000] after flush
> > [ 0.000000] Unable to handle kernel paging request at virtual address 00000000401c31ac
>
>
> You take the trap here because the mapping for the dtb does not exist in
> swapper_pg_dir, but you don't need this mapping anymore as you can access
> the device tree through the linear mapping now.
>
> I would say that: you build your kernel with CONFIG_BUILTIN_DTB

I do not have any "NONPORTABLE" options selected.

> and then you
> don't call early_init_dt_verify which resets initial_boot_params to the
> linear mapping address (it was initially set to 0x4000_0000 in parse_dtb).
> If that's the case, does the following fix your issue?

I dunno, I don't hit this issue anymore as of 50e63dd8ed92 ("riscv: fix
reserved memory setup"). That might not be the right fix in the grand
scheme of things, but it solved problems that both I and others were
hitting.

> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 376d2827e736..2b09f0bd8432 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -276,6 +276,7 @@ void __init setup_arch(char **cmdline_p)
> ??????? efi_init();
> ??????? paging_init();
> ?#if IS_ENABLED(CONFIG_BUILTIN_DTB)
> +?????? initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
> ??????? unflatten_and_copy_device_tree();
> ?#else
> ??????? if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))

I'll try and allocate some time for looking at this next week, got some
items on my todo list first. I doubt it'll help me cos I have not get
that option set, but I can go and test reserved memory stuff for systems
that do use it (I have a k210).

> > [ 0.000000] Oops [#1]
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
> > [ 0.000000] Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
> > [ 0.000000] epc : string+0x4a/0xea
> > [ 0.000000] ra : vsnprintf+0x1e4/0x336
> > [ 0.000000] epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
> > [ 0.000000] gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
> > [ 0.000000] t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
> > [ 0.000000] s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
> > [ 0.000000] a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
> > [ 0.000000] a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
> > [ 0.000000] s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
> > [ 0.000000] s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
> > [ 0.000000] s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
> > [ 0.000000] s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
> > [ 0.000000] t5 : ffffffff812f3618 t6 : ffffffff81203d08
> > [ 0.000000] status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
> > [ 0.000000] [<ffffffff80338936>] vsnprintf+0x1e4/0x336
> > [ 0.000000] [<ffffffff80055ae2>] vprintk_store+0xf6/0x344
> > [ 0.000000] [<ffffffff80055d86>] vprintk_emit+0x56/0x192
> > [ 0.000000] [<ffffffff80055ed8>] vprintk_default+0x16/0x1e
> > [ 0.000000] [<ffffffff800563d2>] vprintk+0x72/0x80
> > [ 0.000000] [<ffffffff806813b2>] _printk+0x36/0x50
> > [ 0.000000] [<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
> > [ 0.000000] [<ffffffff808057ec>] paging_init+0x528/0x5bc
> > [ 0.000000] [<ffffffff808031ae>] setup_arch+0xd0/0x592
> > [ 0.000000] [<ffffffff8080070e>] start_kernel+0x82/0x73c
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
> >
> > We traced this back to early_init_fdt_scan_reserved_mem() in
> > setup_bootmem() - moving it later back up the boot sequence to
> > after the dt has been remapped etc has fixed the problem for us.
> >
> > The least movement to get it working is attached, and also pushed
> > here: git.kernel.org/conor/c/1735589baefc
> >
> > The second problem is a bit more complicated to explain - but we
> > found the solution conflicted with the remoteproc fix as we had
> > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
> > process to solve this one.
> >
> > We want to have a node in our devicetree that contains some memory
> > that is non-cached & marked as reserved-memory. Maybe we have just
> > missed something, but from what we've seen:
> > - the really early setup looks at the dtb, picks the highest bit
> > of memory and puts the dtb etc there so it can start using it
> > - early_init_fdt_scan_reserved_mem() is then called, which figures
> > out if memory is reserved or not.
> >
> > Unfortunately, the highest bit of memory is the non-cached bit so
> > everything falls over, but we can avoid this by moving the call to
> > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
> > takes place right before it in setup_bootmem().
>
>
> And then I suppose the allocations you are mentioning happen in
> unflatten_XXX, so parsing the device tree for reserved memory nodes before
> this should do the trick. Does the following fix your second issue?

Again, I'll have to look at this.
To give you a quick answer, I believe it is the create_pgd_mapping()
stuff in setup_vm_final() actually.
I'm not super sure about this particular bit of the issue, but in the
email I sent more recently that expands on these issues (that Mike
replied to the other day) the similar issue that I saw the allocations
in reserved memory were inside paging_init().

> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 2b09f0bd8432..94b3d049fe9d 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -277,14 +277,15 @@ void __init setup_arch(char **cmdline_p)
> ??????? paging_init();
> ?#if IS_ENABLED(CONFIG_BUILTIN_DTB)
> ??????? initial_boot_params = __va(XIP_FIXUP(dtb_early_pa));
> +?????? early_init_fdt_scan_reserved_mem();
> ??????? unflatten_and_copy_device_tree();
> ?#else
> -?????? if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa))))
> +?????? if (early_init_dt_verify(__va(XIP_FIXUP(dtb_early_pa)))) {
> +?????????????? early_init_fdt_scan_reserved_mem();
> ??????????????? unflatten_device_tree();

Previously, this was problematic (doing the scan before unflattening).
I moved the scan to the earliest possible place that I could call it
without issues.
I'll get back to you...

> -?????? else
> +?????? } else
> ??????????????? pr_err("No DTB found in kernel mappings\n");
> ?#endif
> -?????? early_init_fdt_scan_reserved_mem();
> ??????? misc_mem_init();
>
> ??????? init_resources();
>
>
>
> >
> > Obviously, both of these changes are moving the function call in
> > opposite directions and we can only really do one of them. We are not
> > sure if what we are doing with the non-cached reserved-memory section
> > is just not permitted & cannot work - or if this is something that
> > was overlooked for RISC-V specifically and works for other archs.
> >
> > It does seem like the first issue is a real bug, and I am happy to
> > submit the patch for that whenever - but having two problems with
> > opposite fixes seemed as if there was something else lurking that we
> > just don't have enough understanding to detect.
> >
> > Any help would be great!
> >
> > Thanks,
> > Conor.
> >
>
> Even if that does not fix your issue, the first patch is necessary as it
> fixes initial_boot_params.

Sure, send a patch (with a nice commit message please, so that I don't
have to remember the whole process to review it!!) as my brain has long
since swapped out the nitty gritty of the unflattening -> scanning
process.

Thanks Alex!

Conor.


Attachments:
(No filename) (9.82 kB)
signature.asc (228.00 B)
Download all attachments

2023-03-09 12:45:24

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems


On 3/7/23 12:35, Mike Rapoport wrote:
> Hi Conor,
>
> Sorry for the delay, somehow this slipped between the cracks.
>
> On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
>> Hullo Palmer, Mike & whoever else may read this,
>>
>> Just reviving this thread from a little while ago as I have been in the
>> area again recently...
> TBH, I didn't really dig deep into the issues, but the thought I had was
> what if DT was mapped via fixmap until the setup_vm_final() and then it
> would be possible to call DT methods early.
>
> Could be I'm shooting in the dark :)


I think I understand the issue now, it's because In riscv, we establish
2 different virtual mappings and we map the device tree at 2 different
virtual addresses, which is the problem.

So to me, the solution is:

- to revert your previous fix, that is calling
early_init_fdt_scan_reserved_mem() before any call to memblock_alloc()
(which could result in an allocation in the area you want to reserve)

- to map the device tree at the same virtual address, because
early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb
mapping established in setup_vm() and uses reserved_mem with the new
mapping from setup_vm_final (which is what Mike proposes, we should use
the fixmap region to have the same virtual addresses)

Hope that makes sense: I'll come up with something this afternoon for
you to test!

Sorry for the first completely wrong email,

Thanks,

Alex


>
>> On Tue, Aug 16, 2022 at 08:41:05PM +0000, [email protected] wrote:
>>> Hey all,
>>> We've run into a bit of a problem with reserved memory on PolarFire, or
>>> more accurately a pair of problems that seem to have opposite fixes.
>>>
>>> The first of these problems is triggered when trying to implement a
>>> remoteproc driver. To get the reserved memory buffer, remoteproc
>>> does an of_reserved_mem_lookup(), something like:
>>>
>>> np = of_parse_phandle(pdev->of_node, "memory-region", 0);
>>> if (!np)
>>> return -EINVAL;
>>>
>>> rmem = of_reserved_mem_lookup(np);
>>> if (!rmem)
>>> return -EINVAL;
>>>
>>> of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find
>>> a match - but this was triggering kernel panics for us. We did some
>>> debugging and found that the name string's pointer was pointing to an
>>> address in the 0x4000_0000 range. The minimum reproduction for this
>>> crash is attached - it hacks in some print_reserved_mem()s into
>>> setup_vm_final() around a tlb flush so you can see the before/after.
>>> (You'll need a reserved memory node in your dts to replicate)
>>>
>>> The output is like so, with the same crash as in the remoteproc driver:
>>>
>>> [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022
>> [...]
>>
>>> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>>
>>> We traced this back to early_init_fdt_scan_reserved_mem() in
>>> setup_bootmem() - moving it later back up the boot sequence to
>>> after the dt has been remapped etc has fixed the problem for us.
>>>
>>> The least movement to get it working is attached, and also pushed
>>> here: git.kernel.org/conor/c/1735589baefc
>> This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved
>> memory setup").
>>
>>> The second problem is a bit more complicated to explain - but we
>>> found the solution conflicted with the remoteproc fix as we had
>>> to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot
>>> process to solve this one.
>>>
>>> We want to have a node in our devicetree that contains some memory
>>> that is non-cached & marked as reserved-memory. Maybe we have just
>>> missed something, but from what we've seen:
>>> - the really early setup looks at the dtb, picks the highest bit
>>> of memory and puts the dtb etc there so it can start using it
>>> - early_init_fdt_scan_reserved_mem() is then called, which figures
>>> out if memory is reserved or not.
>>>
>>> Unfortunately, the highest bit of memory is the non-cached bit so
>>> everything falls over, but we can avoid this by moving the call to
>>> early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that
>>> takes place right before it in setup_bootmem().
>>>
>>> Obviously, both of these changes are moving the function call in
>>> opposite directions and we can only really do one of them. We are not
>>> sure if what we are doing with the non-cached reserved-memory section
>>> is just not permitted & cannot work - or if this is something that
>>> was overlooked for RISC-V specifically and works for other archs.
>> We ended up working around this one by making sure that U-Boot loaded
>> the dtb to somewhere that would be inside the kernel's memory map, thus
>> avoiding the remapping in the first place.
>>
>> We did run into another problem recently though, and 50e63dd8ed92 is
>> kinda at fault for it.
>> This particular issue was encountered with a devicetree where the
>> top-most memory region was entirely reserved & was not observed prior
>> to my fix for the first issue.
>>
>> On RISC-V, the boot sequence is something like:
>> setup_bootmem();
>> setup_vm_final();
>> unflatten_device_tree();
>> early_init_fdt_scan_reserved_mem();
>>
>> Whereas, before my patch it used to be (give-or-take):
>> setup_bootmem();
>> early_init_fdt_scan_reserved_mem();
>> setup_vm_final();
>> unflatten_device_tree();
>>
>> The difference being that we used to have scanned the reserved memory
>> regions before calling setup_vm_final() & therefore know which regions
>> we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem()
>> before we've got the dt in a proper virtual memory address will cause
>> the kernel to panic if it tries to read a reserved memory node's label.
>>
>> As we are now calling setup_vm_final() *before* we know what the
>> reserved memory regions are & as RISC-V allocates memblocks from the top
>> down, the allocations in setup_vm_final() will be done in the highest
>> memory region.
>> When early_init_fdt_scan_reserved_mem() then tries to reserve the
>> entirety of that top-most memory region, the reservation fails as part
>> of this region has already been allocated.
>> In the scenario where I found this bug, that top-most region is non-
>> cached memory & the kernel ends up panicking.
>> The memblock debug code made this pretty easy to spot, otherwise I'd
>> probably have spent more than just a few hours trying to figure out why
>> it was panicking!
>>
>> My "this needs to be fixed today" solution for this problem was calling
>> memblock_set_bottom_up(true) in setup_bootmem() & that's what we are
>> going to carry downstream for now.
>>
>> I haven't tested it (yet) but I suspect that it would also fix our
>> problem of the dtb being remapped into a non-cached region of memory
>> that we would later go on to reserve too. Non-cached being an issue
>> mainly due to the panicking, but failing to reserve (and using!) memory
>> regions that are meant to be reserved is very far from ideal even when
>> they are memory that the kernel can actually use.
>>
>> I have no idea if that is an acceptable solution for upstream though, so
>> I guess this is me putting out feelers as to whether this is something I
>> should send a patch to do *OR* if this is another sign of the issues
>> that you (Mike, Palmer) mentioned in the past.
>> If it isn't an acceptable solution, I'm not really too sure how to
>> proceed!
>>
>> Cheers,
>> Conor.
>>
>
>

2023-03-09 12:52:40

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote:
>
> On 3/7/23 12:35, Mike Rapoport wrote:
> > Hi Conor,
> >
> > Sorry for the delay, somehow this slipped between the cracks.
> >
> > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
> > > Hullo Palmer, Mike & whoever else may read this,
> > >
> > > Just reviving this thread from a little while ago as I have been in the
> > > area again recently...
> > TBH, I didn't really dig deep into the issues, but the thought I had was
> > what if DT was mapped via fixmap until the setup_vm_final() and then it
> > would be possible to call DT methods early.
> >
> > Could be I'm shooting in the dark :)
>
>
> I think I understand the issue now, it's because In riscv, we establish 2
> different virtual mappings and we map the device tree at 2 different virtual
> addresses, which is the problem.
>
> So to me, the solution is:
>
> - to revert your previous fix, that is calling
> early_init_fdt_scan_reserved_mem() before any call to memblock_alloc()
> (which could result in an allocation in the area you want to reserve)
>
> - to map the device tree at the same virtual address, because
> early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb
> mapping established in setup_vm() and uses reserved_mem with the new mapping
> from setup_vm_final (which is what Mike proposes, we should use the fixmap
> region to have the same virtual addresses)
>
> Hope that makes sense: I'll come up with something this afternoon for you to
> test!

Sounds good. Please give me some ELI5 commit messages if you can,
explanations for this stuff (which I found took a lot of archaeology to
understand) would be very welcome next time we need to go back looking
at this stuff.

Thanks Alex!
Conor.


Attachments:
(No filename) (1.74 kB)
signature.asc (228.00 B)
Download all attachments

2023-03-09 15:15:15

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On 3/9/23 13:51, Conor Dooley wrote:
> On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote:
>> On 3/7/23 12:35, Mike Rapoport wrote:
>>> Hi Conor,
>>>
>>> Sorry for the delay, somehow this slipped between the cracks.
>>>
>>> On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
>>>> Hullo Palmer, Mike & whoever else may read this,
>>>>
>>>> Just reviving this thread from a little while ago as I have been in the
>>>> area again recently...
>>> TBH, I didn't really dig deep into the issues, but the thought I had was
>>> what if DT was mapped via fixmap until the setup_vm_final() and then it
>>> would be possible to call DT methods early.
>>>
>>> Could be I'm shooting in the dark :)
>>
>> I think I understand the issue now, it's because In riscv, we establish 2
>> different virtual mappings and we map the device tree at 2 different virtual
>> addresses, which is the problem.
>>
>> So to me, the solution is:
>>
>> - to revert your previous fix, that is calling
>> early_init_fdt_scan_reserved_mem() before any call to memblock_alloc()
>> (which could result in an allocation in the area you want to reserve)
>>
>> - to map the device tree at the same virtual address, because
>> early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb
>> mapping established in setup_vm() and uses reserved_mem with the new mapping
>> from setup_vm_final (which is what Mike proposes, we should use the fixmap
>> region to have the same virtual addresses)
>>
>> Hope that makes sense: I'll come up with something this afternoon for you to
>> test!
> Sounds good. Please give me some ELI5 commit messages if you can,
> explanations for this stuff (which I found took a lot of archaeology to
> understand) would be very welcome next time we need to go back looking
> at this stuff.


Can you give it a try here:
https://github.com/AlexGhiti/riscv-linux/commits/dev/alex/conor_dtb_fixmap_v1
?

That works for me but I need to carefully explain and check that's
correct though, not upstreamable as is.

Thanks,

Alex


> Thanks Alex!
> Conor.
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-03-20 12:12:10

by Conor Dooley

[permalink] [raw]
Subject: Re: RISC-V reserved memory problems

On Thu, Mar 09, 2023 at 04:12:57PM +0100, Alexandre Ghiti wrote:
> On 3/9/23 13:51, Conor Dooley wrote:
> > On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote:
> > > On 3/7/23 12:35, Mike Rapoport wrote:
> > > > Hi Conor,
> > > >
> > > > Sorry for the delay, somehow this slipped between the cracks.
> > > >
> > > > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote:
> > > > > Hullo Palmer, Mike & whoever else may read this,
> > > > >
> > > > > Just reviving this thread from a little while ago as I have been in the
> > > > > area again recently...
> > > > TBH, I didn't really dig deep into the issues, but the thought I had was
> > > > what if DT was mapped via fixmap until the setup_vm_final() and then it
> > > > would be possible to call DT methods early.
> > > >
> > > > Could be I'm shooting in the dark :)
> > >
> > > I think I understand the issue now, it's because In riscv, we establish 2
> > > different virtual mappings and we map the device tree at 2 different virtual
> > > addresses, which is the problem.
> > >
> > > So to me, the solution is:
> > >
> > > - to revert your previous fix, that is calling
> > > early_init_fdt_scan_reserved_mem() before any call to memblock_alloc()
> > > (which could result in an allocation in the area you want to reserve)
> > >
> > > - to map the device tree at the same virtual address, because
> > > early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb
> > > mapping established in setup_vm() and uses reserved_mem with the new mapping
> > > from setup_vm_final (which is what Mike proposes, we should use the fixmap
> > > region to have the same virtual addresses)
> > >
> > > Hope that makes sense: I'll come up with something this afternoon for you to
> > > test!
> > Sounds good. Please give me some ELI5 commit messages if you can,
> > explanations for this stuff (which I found took a lot of archaeology to
> > understand) would be very welcome next time we need to go back looking
> > at this stuff.
>
>
> Can you give it a try here:
> https://github.com/AlexGhiti/riscv-linux/commits/dev/alex/conor_dtb_fixmap_v1
> ?
>
> That works for me but I need to carefully explain and check that's correct
> though, not upstreamable as is.

Hey Alex,

So I ended up being pretty sick & had to take a week off. I gave this an
initial spin today & it appears to work.
I'll take it for a longer test-drive when you send a "real" patch for
it, but I tested both the lookup by name & the situation that was
allocating in reserved memory and both were not an issue.

Thanks for working on this,
Conor.


Attachments:
(No filename) (2.54 kB)
signature.asc (228.00 B)
Download all attachments