Hi list(s),
it seems there is a regression in arm64 memory mapping in 5.14, since it
fails on Rockchip RK3328 when the pl330 dmac tries to map with:
[ 8.921909] ------------[ cut here ]------------
[ 8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
[ 8.921973] Modules linked in: spi_rockchip(+) fuse
[ 8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
[ 8.922004] Hardware name: Pine64 Rock64 (DT)
[ 8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 8.922018] pc : dma_map_resource+0x68/0xc0
[ 8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
[ 8.922040] sp : ffff800012102ae0
[ 8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
[ 8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
[ 8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
[ 8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
[ 8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
[ 8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
[ 8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
[ 8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
[ 8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
[ 8.922158] Call trace:
[ 8.922163] dma_map_resource+0x68/0xc0
[ 8.922173] pl330_prep_slave_sg+0x58/0x220
[ 8.922181] rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
[ 8.922208] rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
[ 8.922220] spi_transfer_one_message+0x284/0x57c
[ 8.922232] __spi_pump_messages+0x3dc/0x650
[ 8.922240] __spi_sync+0x3e4/0x48c
[ 8.922247] spi_sync+0x30/0x54
[ 8.922253] spi_mem_exec_op+0x264/0x444
[ 8.922260] spi_nor_spimem_read_data+0x148/0x160
[ 8.922269] spi_nor_read_data+0x30/0x40
[ 8.922276] spi_nor_read_sfdp+0x74/0xe4
[ 8.922285] spi_nor_parse_sfdp+0x1d0/0x1130
[ 8.922293] spi_nor_sfdp_init_params+0x3c/0x90
[ 8.922304] spi_nor_scan+0x7b4/0xacc
[ 8.922311] spi_nor_probe+0x94/0x2d0
[ 8.922317] spi_mem_probe+0x6c/0xb0
[ 8.922325] spi_probe+0x84/0xe4
[ 8.922335] really_probe+0xb4/0x45c
[ 8.922349] __driver_probe_device+0x114/0x190
[ 8.922358] driver_probe_device+0x40/0x100
[ 8.922367] __device_attach_driver+0x98/0x130
[ 8.922375] bus_for_each_drv+0x78/0xd0
[ 8.922383] __device_attach+0xdc/0x1c0
[ 8.922391] device_initial_probe+0x14/0x20
[ 8.922400] bus_probe_device+0x98/0xa0
[ 8.922408] device_add+0x36c/0x8c0
[ 8.922416] __spi_add_device+0x74/0x170
[ 8.922423] spi_add_device+0x64/0xa4
[ 8.922429] of_register_spi_device+0x21c/0x36c
[ 8.922436] spi_register_controller+0x5e0/0x834
[ 8.922443] devm_spi_register_controller+0x24/0x80
[ 8.922450] rockchip_spi_probe+0x434/0x5b0 [spi_rockchip]
[ 8.922468] platform_probe+0x68/0xe0
[ 8.922478] really_probe+0xb4/0x45c
[ 8.922487] __driver_probe_device+0x114/0x190
[ 8.922497] driver_probe_device+0x40/0x100
[ 8.922507] __driver_attach+0xcc/0x1e0
[ 8.922516] bus_for_each_dev+0x70/0xd0
[ 8.922524] driver_attach+0x24/0x30
[ 8.922533] bus_add_driver+0x140/0x234
[ 8.922540] driver_register+0x78/0x130
[ 8.922547] __platform_driver_register+0x28/0x34
[ 8.922554] rockchip_spi_driver_init+0x20/0x1000 [spi_rockchip]
[ 8.922566] do_one_initcall+0x50/0x1b0
[ 8.922579] do_init_module+0x54/0x250
[ 8.922589] load_module+0x2230/0x285c
[ 8.922597] __do_sys_finit_module+0xbc/0x12c
[ 8.922605] __arm64_sys_finit_module+0x24/0x30
[ 8.922613] invoke_syscall+0x48/0x114
[ 8.922625] el0_svc_common+0x40/0xfc
[ 8.922632] do_el0_svc_compat+0x20/0x50
[ 8.922640] el0_svc_compat+0x2c/0x54
[ 8.922652] el0t_32_sync_handler+0x90/0x140
[ 8.922660] el0t_32_sync+0x19c/0x1a0
[ 8.922669] ---[ end trace 2245d8ba23a1d75c ]---
Note: this does not relate to the SPI driver - when disabling this
device in the device tree, it fails for any other device that uses
DMA (i2s, for instance).
Commenting out the failing check at [1], however, helps and the mapping
works again.
I tried to follow the recent changes for arm64 mm which could relate to
the check failing at [1] and reverting
commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
helps and makes it work again, but I'm 100% uncertain if that commit is
really the culprit.
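(For reference, the revert was simply done on top of 5.14-rc7, roughly:

  git revert 16c9afc77660

followed by a rebuild.)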
Note that the firmware (legacy u-boot) injects the memory configuration
into the device tree as follows:
/memreserve/ 0x00000000fcefc000 0x000000000000d000;
/ {
..
compatible = "pine64,rock64\0rockchip,rk3328";
..
memory {
reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
device_type = "memory";
};
..
}
So: there is a "hole" in the mappable memory, and reading the commit
message of
  commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
  pfn_valid()")
suggests there was a change for that case recently.
I also noticed there is a diff in the kernel log regarding memory init:
up until 5.13.12 it says
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] On node 0 totalpages: 1043968
[ 0.000000] DMA zone: 16312 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 1043968 pages, LIFO batch:63
In contrast, in 5.14-rc7 it says:
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
[ 0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
(note the "unavailable ranges")
Again, I'm uncertain whether that diff is expected behavior because of
those recent mm changes for arm64.
After reverting
commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
the log changes to
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] DMA32 empty
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
(no DMA zones here)
As you might have noticed, I have _zero_ clue about the memory mapping
and DMA subsystems - so let me know if any more information is needed,
and thanks for your help.
Alex
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
Hi Alex,
Thanks for the report.
On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> it seems there is a regression in arm64 memory mapping in 5.14, since it
> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>
> [    8.921909] ------------[ cut here ]------------
> [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> [    8.921973] Modules linked in: spi_rockchip(+) fuse
> [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> [    8.922004] Hardware name: Pine64 Rock64 (DT)
> [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> [    8.922018] pc : dma_map_resource+0x68/0xc0
> [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
> [    8.922040] sp : ffff800012102ae0
> [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> [    8.922158] Call trace:
> [    8.922163]  dma_map_resource+0x68/0xc0
> [    8.922173]  pl330_prep_slave_sg+0x58/0x220
> [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
[...]
> Note: This does not relate to the spi driver - when disabling this device in
> the device tree it fails for any other (i2s, for instance) which uses dma.
> Commenting out the failing check at [1], however, helps and the mapping
> works again.
Do you know which address dma_map_resource() is trying to map (maybe
add some printk())? It's not supposed to map RAM, hence the warning.
Random guess, the address is 0xff190800 (based on the x1 above but the
regs might as well be mangled).
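Something like the below (untested) at the start of dma_map_resource()
would tell us:

	/* debug only: print the physical address we are asked to map */
	pr_warn("%s: phys_addr=%pa\n", __func__, &phys_addr);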
> I tried to follow the recent changes for arm64 mm which could relate to the
> check failing at [1] and reverting
>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
> helps and makes it work again, but I'm 100% uncertain if that commit is
> really the culprit.
>
> Note, that the firmware (legacy u-boot) injects memory configuration in the
> device tree as follows:
>
> /memreserve/    0x00000000fcefc000 0x000000000000d000;
> / {
> ..
>     compatible = "pine64,rock64\0rockchip,rk3328";
> ..
>     memory {
>         reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>         device_type = "memory";
>     };
>
> ..
> }
Either pfn_valid() gets confused in 5.14 or something is wrong with the
DT. I have a suspicion it's the former since reverting the above commit
makes it disappear.
> So: there is a "hole" in the mappable memory and reading the commit message
> of
>   commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
> pfn_valid()")
> suggests, there was a change for that case recently.
I think the change from the arm64 pfn_valid() to the generic one is
avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
returns true just because we have a struct page available but the memory
may have been reserved.
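IIRC the tail of the old arm64 pfn_valid() was roughly (simplified,
from memory, so take it with a grain of salt):

	/* former arch-specific pfn_valid(), tail end */
	ms = __pfn_to_section(pfn);
	if (!valid_section(ms))
		return 0;
	...
	return memblock_is_memory(PFN_PHYS(pfn));

while the generic sparsemem version stops after the section checks, so
any pfn that has a memmap now passes.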
Cc'ing Mike R.
> I also noticed there is a diff in the kernel log regarding memory init up
> until 5.13.12 it says
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000] On node 0 totalpages: 1043968
> [    0.000000]   DMA zone: 16312 pages used for memmap
> [    0.000000]   DMA zone: 0 pages reserved
> [    0.000000]   DMA zone: 1043968 pages, LIFO batch:63
>
> In contrary in 5.14-rc7 it says:
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>
> (note the "unavailable ranges")
> I'm uncertain again here, if that diff is expected behavior because of those
> recent mm changes for arm64.
>
> After reverting
>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
> the log changes to
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>
> (no DMA zones here)
>
> As you might have noticed I have _zero_ clue about memory mapping and dma
> subsystem - so let me know if there is any more information needed for that
> and thanks for your help.
Adding Robin as well, he has a better clue than us on DMA ;).
> Alex
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
--
Catalin
On 2021-08-24 18:37, Catalin Marinas wrote:
> Hi Alex,
>
> Thanks for the report.
>
> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>
>> [    8.921909] ------------[ cut here ]------------
>> [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>> [    8.921973] Modules linked in: spi_rockchip(+) fuse
>> [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>> [    8.922004] Hardware name: Pine64 Rock64 (DT)
>> [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>> [    8.922018] pc : dma_map_resource+0x68/0xc0
>> [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>> [    8.922040] sp : ffff800012102ae0
>> [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>> [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>> [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>> [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>> [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>> [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>> [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>> [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>> [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>> [    8.922158] Call trace:
>> [    8.922163]  dma_map_resource+0x68/0xc0
>> [    8.922173]  pl330_prep_slave_sg+0x58/0x220
>> [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>> [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> [...]
>> Note: This does not relate to the spi driver - when disabling this device in
>> the device tree it fails for any other (i2s, for instance) which uses dma.
>> Commenting out the failing check at [1], however, helps and the mapping
>> works again.
>
> Do you know which address dma_map_resource() is trying to map (maybe
> add some printk())? It's not supposed to map RAM, hence the warning.
> Random guess, the address is 0xff190800 (based on the x1 above but the
> regs might as well be mangled).
Yup, that fits the signature of dma_map_resource(), and would indeed be
right in the middle of the SPI peripheral on RK3328.
FWIW the comment about RAM there is a little inaccurate, but the point
remains that anything which *is* backed by a page should probably be
handled by dma_map_page() if at all.
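Roughly speaking (signatures from memory, so double-check before use):

	/* RAM, backed by a struct page */
	dma_addr = dma_map_page(dev, page, offset, size, dir);

	/* MMIO such as a device FIFO, with no struct page behind it */
	dma_addr = dma_map_resource(dev, phys_addr, size, dir, 0);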
>> I tried to follow the recent changes for arm64 mm which could relate to the
>> check failing at [1] and reverting
>>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>> helps and makes it work again, but I'm 100% uncertain if that commit is
>> really the culprit.
>>
>> Note, that the firmware (legacy u-boot) injects memory configuration in the
>> device tree as follows:
>>
>> /memreserve/    0x00000000fcefc000 0x000000000000d000;
>> / {
>> ..
>>     compatible = "pine64,rock64\0rockchip,rk3328";
>> ..
>>     memory {
>>         reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>>         device_type = "memory";
>>     };
>>
>> ..
>> }
>
> Either pfn_valid() gets confused in 5.14 or something is wrong with the
> DT. I have a suspicion it's the former since reverting the above commit
> makes it disappear.
>
>> So: there is a "hole" in the mappable memory and reading the commit message
>> of
>>   commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
>> pfn_valid()")
>> suggests, there was a change for that case recently.
>
> I think the change from the arm64 pfn_valid() to the generic one is
> avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
> returns true just because we have a struct page available but the memory
> may have been reserved.
Either way I think something's gone pretty badly wrong if Linux ends up
thinking that an MMIO region beyond the bounds of any possible RAM
should be page-backed :/
Robin.
>
> Cc'ing Mike R.
>
>> I also noticed there is a diff in the kernel log regarding memory init up
>> until 5.13.12 it says
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   empty
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000] On node 0 totalpages: 1043968
>> [    0.000000]   DMA zone: 16312 pages used for memmap
>> [    0.000000]   DMA zone: 0 pages reserved
>> [    0.000000]   DMA zone: 1043968 pages, LIFO batch:63
>>
>> In contrary in 5.14-rc7 it says:
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   empty
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>>
>> (note the "unavailable ranges")
>> I'm uncertain again here, if that diff is expected behavior because of those
>> recent mm changes for arm64.
>>
>> After reverting
>>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>> the log changes to
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   empty
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>
>> (no DMA zones here)
>>
>> As you might have noticed I have _zero_ clue about memory mapping and dma
>> subsystem - so let me know if there is any more information needed for that
>> and thanks for your help.
>
> Adding Robin as well, he has a better clue than us on DMA ;).
>
>> Alex
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
>
On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> Hi Alex,
>
> Thanks for the report.
>
> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> >
> > [    8.921909] ------------[ cut here ]------------
> > [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > [    8.921973] Modules linked in: spi_rockchip(+) fuse
> > [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > [    8.922004] Hardware name: Pine64 Rock64 (DT)
> > [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > [    8.922018] pc : dma_map_resource+0x68/0xc0
> > [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
> > [    8.922040] sp : ffff800012102ae0
> > [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> > [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> > [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> > [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> > [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> > [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> > [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> > [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> > [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > [    8.922158] Call trace:
> > [    8.922163]  dma_map_resource+0x68/0xc0
> > [    8.922173]  pl330_prep_slave_sg+0x58/0x220
> > [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> [...]
> > Note: This does not relate to the spi driver - when disabling this device in
> > the device tree it fails for any other (i2s, for instance) which uses dma.
> > Commenting out the failing check at [1], however, helps and the mapping
> > works again.
> Do you know which address dma_map_resource() is trying to map (maybe
> add some printk())? It's not supposed to map RAM, hence the warning.
> Random guess, the address is 0xff190800 (based on the x1 above but the
> regs might as well be mangled).
0xff190800 will cause this warning for sure. It has a memory map, but it
is not RAM, so the old version of pfn_valid() would return 0 and the new
one returns 1.
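(For illustration, assuming 4K pages and 128M sparsemem sections:

	0xff190800 >> 12 == pfn 0xff190

which falls into the same section as the end of RAM at 0xfeffffff, so it
now has a memmap.)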
> > I tried to follow the recent changes for arm64 mm which could relate to the
> > check failing at [1] and reverting
> >   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
> > helps and makes it work again, but I'm 100% uncertain if that commit is
> > really the culprit.
> >
> > Note, that the firmware (legacy u-boot) injects memory configuration in the
> > device tree as follows:
> >
> > /memreserve/    0x00000000fcefc000 0x000000000000d000;
> > / {
> > ..
> >     compatible = "pine64,rock64\0rockchip,rk3328";
> > ..
> >     memory {
> >         reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
> >         device_type = "memory";
> >     };
> >
> > ..
> > }
>
> Either pfn_valid() gets confused in 5.14 or something is wrong with the
> DT. I have a suspicion it's the former since reverting the above commit
> makes it disappear.
I think pfn_valid() actually behaves as expected but the caller is wrong
because pfn_valid != RAM (this applies btw to !arm64 as well).
	/* Don't allow RAM to be mapped */
	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
		return DMA_MAPPING_ERROR;
Alex, can you please try this patch:
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..4715e9641a29 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -232,7 +232,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 		return DMA_MAPPING_ERROR;
 
 	/* Don't allow RAM to be mapped */
-	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
+	if (WARN_ON_ONCE(!memblock_is_memory(phys_addr)))
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops))
> > So: there is a "hole" in the mappable memory and reading the commit message
> > of
> >   commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
> > pfn_valid()")
> > suggests, there was a change for that case recently.
>
> I think the change from the arm64 pfn_valid() to the generic one is
> avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
> returns true just because we have a struct page available but the memory
> may have been reserved.
>
> Cc'ing Mike R.
>
> > I also noticed there is a diff in the kernel log regarding memory init up
> > until 5.13.12 it says
> >
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000]   DMA32    empty
> > [    0.000000]   Normal   empty
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000] On node 0 totalpages: 1043968
> > [    0.000000]   DMA zone: 16312 pages used for memmap
> > [    0.000000]   DMA zone: 0 pages reserved
> > [    0.000000]   DMA zone: 1043968 pages, LIFO batch:63
> >
> > In contrary in 5.14-rc7 it says:
> >
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000]   DMA32    empty
> > [    0.000000]   Normal   empty
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> > [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> >
> > (note the "unavailable ranges")
> > I'm uncertain again here, if that diff is expected behavior because of those
> > recent mm changes for arm64.
> >
> > After reverting
> >   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
> > the log changes to
> >
> > [    0.000000] Zone ranges:
> > [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000]   DMA32    empty
> > [    0.000000]   Normal   empty
> > [    0.000000] Movable zone start for each node
> > [    0.000000] Early memory node ranges
> > [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
> > [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
> >
> > (no DMA zones here)
> >
> > As you might have noticed I have _zero_ clue about memory mapping and dma
> > subsystem - so let me know if there is any more information needed for that
> > and thanks for your help.
>
> Adding Robin as well, he has a better clue than us on DMA ;).
>
> > Alex
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
>
> --
> Catalin
--
Sincerely yours,
Mike.
On 2021-08-24 19:28, Mike Rapoport wrote:
> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>> Hi Alex,
>>
>> Thanks for the report.
>>
>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>
>>> [    8.921909] ------------[ cut here ]------------
>>> [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>> [    8.921973] Modules linked in: spi_rockchip(+) fuse
>>> [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>> [    8.922004] Hardware name: Pine64 Rock64 (DT)
>>> [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>> [    8.922018] pc : dma_map_resource+0x68/0xc0
>>> [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>>> [    8.922040] sp : ffff800012102ae0
>>> [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>> [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>> [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>> [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>> [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>> [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>> [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>> [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>> [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>> [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>> [    8.922158] Call trace:
>>> [    8.922163]  dma_map_resource+0x68/0xc0
>>> [    8.922173]  pl330_prep_slave_sg+0x58/0x220
>>> [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>> [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>> [...]
>>> Note: This does not relate to the spi driver - when disabling this device in
>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>> Commenting out the failing check at [1], however, helps and the mapping
>>> works again.
>
>> Do you know which address dma_map_resource() is trying to map (maybe
>> add some printk())? It's not supposed to map RAM, hence the warning.
>> Random guess, the address is 0xff190800 (based on the x1 above but the
>> regs might as well be mangled).
>
> 0xff190800 will cause this warning for sure. It has a memory map, but it is
> not RAM so old version of pfn_valid() would return 0 and the new one
> returns 1.
How does that happen, though? It's not a memory address, and it's not
even within the bounds of anywhere there should or could be memory. This
SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
the DRAM controller (which may not all be populated, and may have pieces
carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
do we have pages (or at least the assumption of pages) for somewhere
which by all rights should not have them?
>>> I tried to follow the recent changes for arm64 mm which could relate to the
>>> check failing at [1] and reverting
>>>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>>> helps and makes it work again, but I'm 100% uncertain if that commit is
>>> really the culprit.
>>>
>>> Note, that the firmware (legacy u-boot) injects memory configuration in the
>>> device tree as follows:
>>>
>>> /memreserve/    0x00000000fcefc000 0x000000000000d000;
>>> / {
>>> ..
>>>     compatible = "pine64,rock64\0rockchip,rk3328";
>>> ..
>>>     memory {
>>>         reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>>>         device_type = "memory";
>>>     };
>>>
>>> ..
>>> }
>>
>> Either pfn_valid() gets confused in 5.14 or something is wrong with the
>> DT. I have a suspicion it's the former since reverting the above commit
>> makes it disappear.
>
> I think pfn_valid() actually behaves as expected but the caller is wrong
> because pfn_valid != RAM (this applies btw to !arm64 as well).
>
> 	/* Don't allow RAM to be mapped */
> 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> 		return DMA_MAPPING_ERROR;
>
> Alex, can you please try this patch:
That will certainly paper over the issue, but it's avoiding the question
of what went wrong with the memory map in the first place. The comment
is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
addresses that would be wrong to pass to dma_map_page(), so I believe
pfn_valid() is still the correct check.
Robin.
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..4715e9641a29 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -232,7 +232,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
>  		return DMA_MAPPING_ERROR;
>  
>  	/* Don't allow RAM to be mapped */
> -	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> +	if (WARN_ON_ONCE(!memblock_is_memory(phys_addr)))
>  		return DMA_MAPPING_ERROR;
>  
>  	if (dma_map_direct(dev, ops))
>
>>> So: there is a "hole" in the mappable memory and reading the commit message
>>> of
>>>   commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
>>> pfn_valid()")
>>> suggests, there was a change for that case recently.
>>
>> I think the change from the arm64 pfn_valid() to the generic one is
>> avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
>> returns true just because we have a struct page available but the memory
>> may have been reserved.
>>
>> Cc'ing Mike R.
>>
>>> I also noticed there is a diff in the kernel log regarding memory init up
>>> until 5.13.12 it says
>>>
>>> [    0.000000] Zone ranges:
>>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000]   DMA32    empty
>>> [    0.000000]   Normal   empty
>>> [    0.000000] Movable zone start for each node
>>> [    0.000000] Early memory node ranges
>>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000] On node 0 totalpages: 1043968
>>> [    0.000000]   DMA zone: 16312 pages used for memmap
>>> [    0.000000]   DMA zone: 0 pages reserved
>>> [    0.000000]   DMA zone: 1043968 pages, LIFO batch:63
>>>
>>> In contrary in 5.14-rc7 it says:
>>>
>>> [    0.000000] Zone ranges:
>>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000]   DMA32    empty
>>> [    0.000000]   Normal   empty
>>> [    0.000000] Movable zone start for each node
>>> [    0.000000] Early memory node ranges
>>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>>> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>>>
>>> (note the "unavailable ranges")
>>> I'm uncertain again here, if that diff is expected behavior because of those
>>> recent mm changes for arm64.
>>>
>>> After reverting
>>>   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>>> the log changes to
>>>
>>> [    0.000000] Zone ranges:
>>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000]   DMA32    empty
>>> [    0.000000]   Normal   empty
>>> [    0.000000] Movable zone start for each node
>>> [    0.000000] Early memory node ranges
>>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>>
>>> (no DMA zones here)
>>>
>>> As you might have noticed I have _zero_ clue about memory mapping and dma
>>> subsystem - so let me know if there is any more information needed for that
>>> and thanks for your help.
>>
>> Adding Robin as well, he has a better clue than us on DMA ;).
>>
>>> Alex
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
>>
>> --
>> Catalin
>
On 24.08.21 20:46, Robin Murphy wrote:
> On 2021-08-24 19:28, Mike Rapoport wrote:
>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>> Hi Alex,
>>>
>>> Thanks for the report.
>>>
>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>
>>>> [    8.921909] ------------[ cut here ]------------
>>>> [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>>> [    8.921973] Modules linked in: spi_rockchip(+) fuse
>>>> [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>> [    8.922004] Hardware name: Pine64 Rock64 (DT)
>>>> [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>> [    8.922018] pc : dma_map_resource+0x68/0xc0
>>>> [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>>>> [    8.922040] sp : ffff800012102ae0
>>>> [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>>> [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>>> [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>>> [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>>> [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>> [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>>> [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>>> [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>>> [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>>> [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>> [    8.922158] Call trace:
>>>> [    8.922163]  dma_map_resource+0x68/0xc0
>>>> [    8.922173]  pl330_prep_slave_sg+0x58/0x220
>>>> [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>> [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>> [...]
>>>> Note: This does not relate to the spi driver - when disabling this device in
>>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>>> Commenting out the failing check at [1], however, helps and the mapping
>>>> works again.
>>
>>> Do you know which address dma_map_resource() is trying to map (maybe
>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>> Random guess, the address is 0xff190800 (based on the x1 above but the
>>> regs might as well be mangled).
>>
>> 0xff190800 will cause this warning for sure. It has a memory map, but it is
>> not RAM so old version of pfn_valid() would return 0 and the new one
>> returns 1.
>
> How does that happen, though? It's not a memory address, and it's not
> even within the bounds of anywhere there should or could be memory. This
> SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
> the DRAM controller (which may not all be populated, and may have pieces
> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
> do we have pages (or at least the assumption of pages) for somewhere
> which by all rights should not have them?
Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to
avoid any such hacks. If there is a memory hole, it gets a memmap as well.
Tricking pfn_valid() into returning "false" where we actually have a
memmap only makes it look like there is no memmap; but there is one, and
it's PG_reserved.
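A quick way to see that from a test patch would be something like
(a sketch, using the address from this report):

	struct page *page = pfn_to_page(PHYS_PFN(0xff190800));

	/* the memmap entry exists, but it was marked reserved at init */
	WARN_ON(!PageReserved(page));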
[...]
>>> Either pfn_valid() gets confused in 5.14 or something is wrong with the
>>> DT. I have a suspicion it's the former since reverting the above commit
>>> makes it disappear.
>>
>> I think pfn_valid() actually behaves as expected but the caller is wrong
>> because pfn_valid != RAM (this applies btw to !arm64 as well).
>>
>> 	/* Don't allow RAM to be mapped */
>> 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>> 		return DMA_MAPPING_ERROR;
>>
>> Alex, can you please try this patch:
>
> That will certainly paper over the issue, but it's avoiding the question
> of what went wrong with the memory map in the first place. The comment
> is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
> addresses that would be wrong to pass to dma_map_page(), so I believe
> pfn_valid() is still the correct check.
If we want to check for RAM, pfn_valid() would be wrong. If we want to
check for "is there a memmap, for whatever lives or does not live
there", pfn_valid() is the right check.
--
Thanks,
David / dhildenb
Hi Catalin,
thanks for your quick reply.
On 24.08.21 at 19:37, Catalin Marinas wrote:
> Hi Alex,
>
> Thanks for the report.
>
> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>
>> [ 8.921909] ------------[ cut here ]------------
>> [ 8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>> [ 8.921973] Modules linked in: spi_rockchip(+) fuse
>> [ 8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>> [ 8.922004] Hardware name: Pine64 Rock64 (DT)
>> [ 8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>> [ 8.922018] pc : dma_map_resource+0x68/0xc0
>> [ 8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>> [ 8.922040] sp : ffff800012102ae0
>> [ 8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>> [ 8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>> [ 8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>> [ 8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>> [ 8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> [ 8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>> [ 8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>> [ 8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>> [ 8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>> [ 8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>> [ 8.922158] Call trace:
>> [ 8.922163] dma_map_resource+0x68/0xc0
>> [ 8.922173] pl330_prep_slave_sg+0x58/0x220
>> [ 8.922181] rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>> [ 8.922208] rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> [...]
>> Note: This does not relate to the spi driver - when disabling this device in
>> the device tree it fails for any other (i2s, for instance) which uses dma.
>> Commenting out the failing check at [1], however, helps and the mapping
>> works again.
> Do you know which address dma_map_resource() is trying to map (maybe
> add some printk())? It's not supposed to map RAM, hence the warning.
> Random guess, the address is 0xff190800 (based on the x1 above but the
> regs might as well be mangled).
Confirmed:
[..]
[ 8.353879] dma_map_resource Failed to map address 0xff190800
[..]
Alex
>
>> I tried to follow the recent changes for arm64 mm which could relate to the
>> check failing at [1] and reverting
>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>> helps and makes it work again, but I'm 100% uncertain if that commit is
>> really the culprit.
>>
>> Note, that the firmware (legacy u-boot) injects memory configuration in the
>> device tree as follows:
>>
>> /memreserve/ 0x00000000fcefc000 0x000000000000d000;
>> / {
>> ..
>> compatible = "pine64,rock64\0rockchip,rk3328";
>> ..
>> memory {
>> reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>> device_type = "memory";
>> };
>>
>> ..
>> }
> Either pfn_valid() gets confused in 5.14 or something is wrong with the
> DT. I have a suspicion it's the former since reverting the above commit
> makes it disappear.
>
>> So: there is a "hole" in the mappable memory and reading the commit message
>> of
>> commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
>> pfn_valid()")
>> suggests, there was a change for that case recently.
> I think the change from the arm64 pfn_valid() to the generic one is
> avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
> returns true just because we have a struct page available but the memory
> may have been reserved.
>
> Cc'ing Mike R.
>
>> I also noticed there is a diff in the kernel log regarding memory init up
>> until 5.13.12 it says
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] DMA32 empty
>> [ 0.000000] Normal empty
>> [ 0.000000] Movable zone start for each node
>> [ 0.000000] Early memory node ranges
>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] On node 0 totalpages: 1043968
>> [ 0.000000] DMA zone: 16312 pages used for memmap
>> [ 0.000000] DMA zone: 0 pages reserved
>> [ 0.000000] DMA zone: 1043968 pages, LIFO batch:63
>>
>> In contrary in 5.14-rc7 it says:
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] DMA32 empty
>> [ 0.000000] Normal empty
>> [ 0.000000] Movable zone start for each node
>> [ 0.000000] Early memory node ranges
>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>> [ 0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>>
>> (note the "unavailable ranges")
>> I'm uncertain again here, if that diff is expected behavior because of those
>> recent mm changes for arm64.
>>
>> After reverting
>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>> the log changes to
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] DMA32 empty
>> [ 0.000000] Normal empty
>> [ 0.000000] Movable zone start for each node
>> [ 0.000000] Early memory node ranges
>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>> [ 0.000000] Initmem setup node 0 [mem
>> 0x0000000000200000-0x00000000feffffff]
>>
>> (no DMA zones here)
>>
>> As you might have noticed I have _zero_ clue about memory mapping and dma
>> subsystem - so let me know if there is any more information needed for that
>> and thanks for your help.
> Adding Robin as well, he has a better clue than us on DMA ;).
>
>> Alex
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
Hi Mike,
thanks for your reply
On 24.08.21 at 20:28, Mike Rapoport wrote:
> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>> Hi Alex,
>>
>> Thanks for the report.
>>
>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>
>>> [ 8.921909] ------------[ cut here ]------------
>>> [ 8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>> [ 8.921973] Modules linked in: spi_rockchip(+) fuse
>>> [ 8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>> [ 8.922004] Hardware name: Pine64 Rock64 (DT)
>>> [ 8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>> [ 8.922018] pc : dma_map_resource+0x68/0xc0
>>> [ 8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>>> [ 8.922040] sp : ffff800012102ae0
>>> [ 8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>> [ 8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>> [ 8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>> [ 8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>> [ 8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>> [ 8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>> [ 8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>> [ 8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>> [ 8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>> [ 8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>> [ 8.922158] Call trace:
>>> [ 8.922163] dma_map_resource+0x68/0xc0
>>> [ 8.922173] pl330_prep_slave_sg+0x58/0x220
>>> [ 8.922181] rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>> [ 8.922208] rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>> [...]
>>> Note: This does not relate to the spi driver - when disabling this device in
>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>> Commenting out the failing check at [1], however, helps and the mapping
>>> works again.
>> Do you know which address dma_map_resource() is trying to map (maybe
>> add some printk())? It's not supposed to map RAM, hence the warning.
>> Random guess, the address is 0xff190800 (based on the x1 above but the
>> regs might as well be mangled).
> 0xff190800 will cause this warning for sure. It has a memory map, but it is
> not RAM so old version of pfn_valid() would return 0 and the new one
> returns 1.
>
>>> I tried to follow the recent changes for arm64 mm which could relate to the
>>> check failing at [1] and reverting
>>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>>> helps and makes it work again, but I'm 100% uncertain if that commit is
>>> really the culprit.
>>>
>>> Note, that the firmware (legacy u-boot) injects memory configuration in the
>>> device tree as follows:
>>>
>>> /memreserve/ 0x00000000fcefc000 0x000000000000d000;
>>> / {
>>> ..
>>> compatible = "pine64,rock64\0rockchip,rk3328";
>>> ..
>>> memory {
>>> reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>>> device_type = "memory";
>>> };
>>>
>>> ..
>>> }
>> Either pfn_valid() gets confused in 5.14 or something is wrong with the
>> DT. I have a suspicion it's the former since reverting the above commit
>> makes it disappear.
> I think pfn_valid() actually behaves as expected but the caller is wrong
> because pfn_valid != RAM (this applies btw to !arm64 as well).
>
> 	/* Don't allow RAM to be mapped */
> 	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> 		return DMA_MAPPING_ERROR;
>
> Alex, can you please try this patch:
>
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 2b06a809d0b9..4715e9641a29 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -232,7 +232,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
>  		return DMA_MAPPING_ERROR;
>  
>  	/* Don't allow RAM to be mapped */
> -	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> +	if (WARN_ON_ONCE(!memblock_is_memory(phys_addr)))
>  		return DMA_MAPPING_ERROR;
>  
>  	if (dma_map_direct(dev, ops))
>
Nope, doesn't help:
[ 8.353879] dma_map_resource Failed to map address 0xff190800
[ 8.353886] dma_map_resource pfn_valid(PHYS_PFN(0xff190800)): 1
[ 8.353892] dma_map_resource memblock_is_memory(0xff190800): 0
If I understand the comment for that check correctly - that we _don't_
want RAM to be mapped - shouldn't that be:
+ if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
?
Alex
>>> So: there is a "hole" in the mappable memory and reading the commit message
>>> of
>>> commit a7d9f306ba70 ("arm64: drop pfn_valid_within() and simplify
>>> pfn_valid()")
>>> suggests, there was a change for that case recently.
>> I think the change from the arm64 pfn_valid() to the generic one is
>> avoiding the call to memblock_is_memory(). I wonder whether pfn_valid()
>> returns true just because we have a struct page available but the memory
>> may have been reserved.
>>
>> Cc'ing Mike R.
>>
>>> I also noticed there is a diff in the kernel log regarding memory init up
>>> until 5.13.12 it says
>>>
>>> [ 0.000000] Zone ranges:
>>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] DMA32 empty
>>> [ 0.000000] Normal empty
>>> [ 0.000000] Movable zone start for each node
>>> [ 0.000000] Early memory node ranges
>>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] On node 0 totalpages: 1043968
>>> [ 0.000000] DMA zone: 16312 pages used for memmap
>>> [ 0.000000] DMA zone: 0 pages reserved
>>> [ 0.000000] DMA zone: 1043968 pages, LIFO batch:63
>>>
>>> In contrary in 5.14-rc7 it says:
>>>
>>> [ 0.000000] Zone ranges:
>>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] DMA32 empty
>>> [ 0.000000] Normal empty
>>> [ 0.000000] Movable zone start for each node
>>> [ 0.000000] Early memory node ranges
>>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>>> [ 0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>>>
>>> (note the "unavailable ranges")
>>> I'm uncertain again here, if that diff is expected behavior because of those
>>> recent mm changes for arm64.
>>>
>>> After reverting
>>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>>> the log changes to
>>>
>>> [ 0.000000] Zone ranges:
>>> [ 0.000000] DMA [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] DMA32 empty
>>> [ 0.000000] Normal empty
>>> [ 0.000000] Movable zone start for each node
>>> [ 0.000000] Early memory node ranges
>>> [ 0.000000] node 0: [mem 0x0000000000200000-0x00000000feffffff]
>>> [ 0.000000] Initmem setup node 0 [mem
>>> 0x0000000000200000-0x00000000feffffff]
>>>
>>> (no DMA zones here)
>>>
>>> As you might have noticed I have _zero_ clue about memory mapping and dma
>>> subsystem - so let me know if there is any more information needed for that
>>> and thanks for your help.
>> Adding Robin as well, he has a better clue than us on DMA ;).
>>
>>> Alex
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/mapping.c?id=e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93#n235
>> --
>> Catalin
On Tue, Aug 24, 2021 at 10:14:01PM +0200, Alex Bee wrote:
> Hi Mike,
>
> thanks for your reply
>
> Am 24.08.21 um 20:28 schrieb Mike Rapoport:
> > On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> > > Hi Alex,
> > >
> > > Thanks for the report.
> > >
> > > On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > > > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > > > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> > > >
> > > > [    8.921909] ------------[ cut here ]------------
> > > > [    8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > > > [    8.921973] Modules linked in: spi_rockchip(+) fuse
> > > > [    8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > > > [    8.922004] Hardware name: Pine64 Rock64 (DT)
> > > > [    8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > > > [    8.922018] pc : dma_map_resource+0x68/0xc0
> > > > [    8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
> > > > [    8.922040] sp : ffff800012102ae0
> > > > [    8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> > > > [    8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> > > > [    8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> > > > [    8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> > > > [    8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > [    8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> > > > [    8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> > > > [    8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> > > > [    8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> > > > [    8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > > > [    8.922158] Call trace:
> > > > [    8.922163]  dma_map_resource+0x68/0xc0
> > > > [    8.922173]  pl330_prep_slave_sg+0x58/0x220
> > > > [    8.922181]  rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > > > [    8.922208]  rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> > > [...]
> > > > Note: This does not relate to the spi driver - when disabling this device in
> > > > the device tree it fails for any other (i2s, for instance) which uses dma.
> > > > Commenting out the failing check at [1], however, helps and the mapping
> > > > works again.
> > > Do you know which address dma_map_resource() is trying to map (maybe
> > > add some printk())? It's not supposed to map RAM, hence the warning.
> > > Random guess, the address is 0xff190800 (based on the x1 above but the
> > > regs might as well be mangled).
> > 0xff190800 will cause this warning for sure. It has a memory map, but it is
> > not RAM so old version of pfn_valid() would return 0 and the new one
> > returns 1.
> > > > I tried to follow the recent changes for arm64 mm which could relate to the
> > > > check failing at [1] and reverting
> > > >   commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
> > > > helps and makes it work again, but I'm 100% uncertain if that commit is
> > > > really the culprit.
> > > >
> > > > Note, that the firmware (legacy u-boot) injects memory configuration in the
> > > > device tree as follows:
> > > >
> > > > /memreserve/ 0x00000000fcefc000 0x000000000000d000;
> > > > / {
> > > > ..
> > > > 	compatible = "pine64,rock64\0rockchip,rk3328";
> > > > ..
> > > > 	memory {
> > > > 		reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
> > > > 		device_type = "memory";
> > > > 	};
> > > >
> > > > ..
> > > > }
> > > Either pfn_valid() gets confused in 5.14 or something is wrong with the
> > > DT. I have a suspicion it's the former since reverting the above commit
> > > makes it disappear.
> > I think pfn_valid() actually behaves as expected but the caller is wrong
> > because pfn_valid != RAM (this applies btw to !arm64 as well).
> >
> > /* Don't allow RAM to be mapped */
> > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > return DMA_MAPPING_ERROR;
> >
> > Alex, can you please try this patch:
> >
> > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > index 2b06a809d0b9..4715e9641a29 100644
> > --- a/kernel/dma/mapping.c
> > +++ b/kernel/dma/mapping.c
> > @@ -232,7 +232,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
> > return DMA_MAPPING_ERROR;
> > /* Don't allow RAM to be mapped */
> > - if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > + if (WARN_ON_ONCE(!memblock_is_memory(phys_addr)))
> > return DMA_MAPPING_ERROR;
> > if (dma_map_direct(dev, ops))
>
> Nope, doesn't help:
>
> [ 8.353879] dma_map_resource Failed to map address 0xff190800
> [ 8.353886] dma_map_resource pfn_valid(PHYS_PFN(0xff190800)): 1
> [ 8.353892] dma_map_resource memblock_is_memory(0xff190800): 0
>
> If I understand the comment for that check correctly - that we _don't_ want RAM
> to be mapped - shouldn't that be:
>
> + if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
>
> ?
Right, we don't want RAM to be mapped, the negation was wrong and it should
be
if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
--
Sincerely yours,
Mike.
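For reference, with Mike's correction applied, the check in kernel/dma/mapping.c
would read as follows (a minimal sketch of the fixed hunk; the surrounding code
of dma_map_resource() is omitted):

	/* Don't allow RAM to be mapped */
	if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
		return DMA_MAPPING_ERROR;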
On 25.08.21 06:39, Mike Rapoport wrote:
> On Tue, Aug 24, 2021 at 10:14:01PM +0200, Alex Bee wrote:
>> Hi Mike,
>>
>> thanks for your reply
>>
>> On 24.08.21 20:28, Mike Rapoport wrote:
>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>>> Hi Alex,
>>>>
>>>> Thanks for the report.
>>>>
>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>>
>>>>> [ 8.921909] ------------[ cut here ]------------
>>>>> [ 8.921940] WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>>>> [ 8.921973] Modules linked in: spi_rockchip(+) fuse
>>>>> [ 8.921996] CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>>> [ 8.922004] Hardware name: Pine64 Rock64 (DT)
>>>>> [ 8.922011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>> [ 8.922018] pc : dma_map_resource+0x68/0xc0
>>>>> [ 8.922026] lr : pl330_prep_slave_fifo+0x78/0xd0
>>>>> [ 8.922040] sp : ffff800012102ae0
>>>>> [ 8.922043] x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>>>> [ 8.922056] x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>>>> [ 8.922067] x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>>>> [ 8.922078] x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>>>> [ 8.922089] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>> [ 8.922100] x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>>>> [ 8.922111] x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>>>> [ 8.922123] x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>>>> [ 8.922134] x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>>>> [ 8.922145] x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>>> [ 8.922158] Call trace:
>>>>> [ 8.922163] dma_map_resource+0x68/0xc0
>>>>> [ 8.922173] pl330_prep_slave_sg+0x58/0x220
>>>>> [ 8.922181] rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>>> [ 8.922208] rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>>> [...]
>>>>> Note: This does not relate to the spi driver - when disabling this device in
>>>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>>>> Commenting out the failing check at [1], however, helps and the mapping
>>>>> works again.
>>>> Do you know which address dma_map_resource() is trying to map (maybe
>>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>>> Random guess, the address is 0xff190800 (based on the x1 above but the
>>>> regs might as well be mangled).
>>> 0xff190800 will cause this warning for sure. It has a memory map, but it is
>>> not RAM so old version of pfn_valid() would return 0 and the new one
>>> returns 1.
>>>>> I tried to follow the recent changes for arm64 mm which could relate to the
>>>>> check failing at [1] and reverting
>>>>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
>>>>> helps and makes it work again, but I'm 100% uncertain if that commit is
>>>>> really the culprit.
>>>>>
>>>>> Note, that the firmware (legacy u-boot) injects memory configuration in the
>>>>> device tree as follows:
>>>>>
>>>>> /memreserve/ 0x00000000fcefc000 0x000000000000d000;
>>>>> / {
>>>>> ..
>>>>> compatible = "pine64,rock64\0rockchip,rk3328";
>>>>> ..
>>>>> memory {
>>>>> reg = <0x00 0x200000 0x00 0xfee00000 0x00 0x00 0x00 0x00>;
>>>>> device_type = "memory";
>>>>> };
>>>>>
>>>>> ..
>>>>> }
>>>> Either pfn_valid() gets confused in 5.14 or something is wrong with the
>>>> DT. I have a suspicion it's the former since reverting the above commit
>>>> makes it disappear.
>>> I think pfn_valid() actually behaves as expected but the caller is wrong
>>> because pfn_valid != RAM (this applies btw to !arm64 as well).
>>>
>>> /* Don't allow RAM to be mapped */
>>> if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>>> return DMA_MAPPING_ERROR;
>>>
>>> Alex, can you please try this patch:
>>>
>>> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
>>> index 2b06a809d0b9..4715e9641a29 100644
>>> --- a/kernel/dma/mapping.c
>>> +++ b/kernel/dma/mapping.c
>>> @@ -232,7 +232,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
>>> return DMA_MAPPING_ERROR;
>>> /* Don't allow RAM to be mapped */
>>> - if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>>> + if (WARN_ON_ONCE(!memblock_is_memory(phys_addr)))
>>> return DMA_MAPPING_ERROR;
>>> if (dma_map_direct(dev, ops))
>> Nope, doesn't help:
>>
>> [ 8.353879] dma_map_resource Failed to map address 0xff190800
>> [ 8.353886] dma_map_resource pfn_valid(PHYS_PFN(0xff190800)): 1
>> [ 8.353892] dma_map_resource memblock_is_memory(0xff190800): 0
>>
>> If I understand the comment for that check correctly - that we _don't_ want RAM
>> to be mapped - shouldn't that be:
>>
>> + if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
>>
>> ?
> Right, we don't want RAM to be mapped, the negation was wrong and it should
> be
>
> if (WARN_ON_ONCE(memblock_is_memory(phys_addr)))
>
OK, great: that helps, with no other complaints - at least on this
(arm64) SoC.
Since this affects all other archs, I doubt this will make it into 5.14 -
but it would be great if there could be a fix for that in 5.14, since it
is a system-breaking bug. I'm not sure why it wasn't reported yet - is
there any chance that at least
commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID")
can be reverted in 5.14?
Alex
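The debug output Alex quotes above could come from instrumentation along these
lines (a hypothetical sketch - the actual printk() calls are not shown in the
thread, so the format strings and pr_info() choice here are assumptions):

	/* hypothetical debug printks added to dma_map_resource() */
	pr_info("dma_map_resource Failed to map address %#llx\n",
		(unsigned long long)phys_addr);
	pr_info("dma_map_resource pfn_valid(PHYS_PFN(%#llx)): %d\n",
		(unsigned long long)phys_addr, pfn_valid(PHYS_PFN(phys_addr)));
	pr_info("dma_map_resource memblock_is_memory(%#llx): %d\n",
		(unsigned long long)phys_addr, memblock_is_memory(phys_addr));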
+ hch
On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
> On 24.08.21 20:46, Robin Murphy wrote:
> > On 2021-08-24 19:28, Mike Rapoport wrote:
> > > On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> > > > On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > > > > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > > > > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > > > > Modules linked in: spi_rockchip(+) fuse
> > > > > CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > > > > Hardware name: Pine64 Rock64 (DT)
> > > > > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > pc : dma_map_resource+0x68/0xc0
> > > > > lr : pl330_prep_slave_fifo+0x78/0xd0
> > > > > sp : ffff800012102ae0
> > > > > x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> > > > > x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> > > > > x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> > > > > x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> > > > > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > > x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> > > > > x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> > > > > x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> > > > > x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> > > > > x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > > > > Call trace:
> > > > > dma_map_resource+0x68/0xc0
> > > > > pl330_prep_slave_sg+0x58/0x220
> > > > > rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > > > > rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> > > > [...]
> > > > > Note: This does not relate to the spi driver - when disabling this device in
> > > > > the device tree it fails for any other (i2s, for instance) which uses dma.
> > > > > Commenting out the failing check at [1], however, helps and the mapping
> > > > > works again.
> > >
> > > > Do you know which address dma_map_resource() is trying to map (maybe
> > > > add some printk())? It's not supposed to map RAM, hence the warning.
> > > > Random guess, the address is 0xff190800 (based on the x1 above but the
> > > > regs might as well be mangled).
> > >
> > > 0xff190800 will cause this warning for sure. It has a memory map, but it is
> > > not RAM so old version of pfn_valid() would return 0 and the new one
> > > returns 1.
> >
> > How does that happen, though? It's not a memory address, and it's not
> > even within the bounds of anywhere there should or could be memory. This
> > SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
> > the DRAM controller (which may not all be populated, and may have pieces
> > carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
> > do we have pages (or at least the assumption of pages) for somewhere
> > which by all rights should not have them?
>
> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
> any such hacks. If there is a memory hole, it gets a memmap as well.
>
> Tricking pfn_valid() into returning "false" where we actually have a memmap
> only makes it look like there is no memmap; but there is one, and
> it's PG_reserved.
I can see the documentation for pfn_valid() does not claim anything more
than the presence of a memmap entry. But I wonder whether the confusion
is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
assumes __va() can be used on pfn_valid(), though I suspect it relies on
the calling function to check that the resource was RAM. The arm64
kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
standard memcpy on it, which wouldn't work for I/O (should we change
this check to pfn_is_map_memory() for arm64?).
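The arm64 change being floated here would be roughly the following - a sketch
against the final check of kern_addr_valid() in arch/arm64/mm/mmu.c, not a
posted patch (the preceding page-table walk is elided):

	-	return pfn_valid(pte_pfn(pte));
	+	return pfn_is_map_memory(pte_pfn(pte));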
> > > > Either pfn_valid() gets confused in 5.14 or something is wrong with the
> > > > DT. I have a suspicion it's the former since reverting the above commit
> > > > makes it disappear.
> > >
> > > I think pfn_valid() actually behaves as expected but the caller is wrong
> > > because pfn_valid != RAM (this applies btw to !arm64 as well).
> > >
> > > /* Don't allow RAM to be mapped */
> > > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > > return DMA_MAPPING_ERROR;
> > >
> > > Alex, can you please try this patch:
> >
> > That will certainly paper over the issue, but it's avoiding the question
> > of what went wrong with the memory map in the first place. The comment
> > is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
> > addresses that would be wrong to pass to dma_map_page(), so I believe
> > pfn_valid() is still the correct check.
>
> If we want to check for RAM, pfn_valid() would be wrong. If we want to check
> for "is there a memmap, for whatever lives or does not live there",
> pfn_valid() is the right check.
So what should the DMA code use instead? Last time we needed something
similar, the recommendation was to use pfn_to_online_page(). Mike is
suggesting memblock_is_memory().
Given how late we are in the -rc cycle, I suggest we revert Anshuman's
commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
assess the implications in 5.15 (the patch doesn't seem to have the
arm64 maintainers' ack anyway ;)).
Thanks.
--
Catalin
On Wed, Aug 25, 2021 at 11:20:46AM +0100, Catalin Marinas wrote:
> Given how late we are in the -rc cycle, I suggest we revert Anshuman's
> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
> assess the implications in 5.15 (the patch doesn't seem to have the
> arm64 maintainers' ack anyway ;)).
I'll stick the revert (below) into kernelci now so we can get some coverage
in case it breaks something else.
Will
--->8
From e97ba0e39e486c20d8f76f3e632e4b7d933602cd Mon Sep 17 00:00:00 2001
From: Will Deacon <[email protected]>
Date: Wed, 25 Aug 2021 11:10:07 +0100
Subject: [PATCH] Revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
This reverts commit 16c9afc776608324ca71c0bc354987bab532f51d.
Alex Bee reports a regression in 5.14 on their RK3328 SoC when
configuring the PL330 DMA controller:
| ------------[ cut here ]------------
| WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
| Modules linked in: spi_rockchip(+) fuse
| CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
| Hardware name: Pine64 Rock64 (DT)
| pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
| pc : dma_map_resource+0x68/0xc0
| lr : pl330_prep_slave_fifo+0x78/0xd0
This appears to be because dma_map_resource() is being called for a
physical address which does not correspond to a memory address yet does
have a valid 'struct page' due to the way in which the vmemmap is
constructed.
Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
implementation of pfn_valid() called memblock_is_memory() to return
'false' for such regions and the DMA mapping request would proceed.
However, now that we are using the generic implementation where only the
presence of the memory map entry is considered, we return 'true' and
erroneously fail with DMA_MAPPING_ERROR because we identify the region
as DRAM.
Although fixing this in the DMA mapping code is arguably the right fix,
it is a risky, cross-architecture change at this stage in the cycle. So
just revert arm64 back to its old pfn_valid() implementation for v5.14.
Cc: Catalin Marinas <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Reported-by: Alex Bee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/page.h | 1 +
arch/arm64/mm/init.c | 37 +++++++++++++++++++++++++++++++++++
include/linux/mmzone.h | 9 ---------
4 files changed, 39 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fdcd54d39c1e..62c3c1d2190f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -156,6 +156,7 @@ config ARM64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
+ select HAVE_ARCH_PFN_VALID
select HAVE_ARCH_PREL32_RELOCATIONS
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 993a27ea6f54..f98c91bbd7c1 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -41,6 +41,7 @@ void tag_clear_highpage(struct page *to);
typedef struct page *pgtable_t;
+int pfn_valid(unsigned long pfn);
int pfn_is_map_memory(unsigned long pfn);
#include <asm/memory.h>
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 8490ed2917ff..1fdb7bb7c198 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -219,6 +219,43 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
free_area_init(max_zone_pfns);
}
+int pfn_valid(unsigned long pfn)
+{
+ phys_addr_t addr = PFN_PHYS(pfn);
+ struct mem_section *ms;
+
+ /*
+ * Ensure the upper PAGE_SHIFT bits are clear in the
+ * pfn. Else it might lead to false positives when
+ * some of the upper bits are set, but the lower bits
+ * match a valid pfn.
+ */
+ if (PHYS_PFN(addr) != pfn)
+ return 0;
+
+ if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+ return 0;
+
+ ms = __pfn_to_section(pfn);
+ if (!valid_section(ms))
+ return 0;
+
+ /*
+ * ZONE_DEVICE memory does not have the memblock entries.
+ * memblock_is_map_memory() check for ZONE_DEVICE based
+ * addresses will always fail. Even the normal hotplugged
+ * memory will never have MEMBLOCK_NOMAP flag set in their
+ * memblock entries. Skip memblock search for all non early
+ * memory sections covering all of hotplug memory including
+ * both normal and ZONE_DEVICE based.
+ */
+ if (!early_section(ms))
+ return pfn_section_valid(ms, pfn);
+
+ return memblock_is_memory(addr);
+}
+EXPORT_SYMBOL(pfn_valid);
+
int pfn_is_map_memory(unsigned long pfn)
{
phys_addr_t addr = PFN_PHYS(pfn);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fcb535560028..ee70f21a79d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1463,15 +1463,6 @@ static inline int pfn_valid(unsigned long pfn)
{
struct mem_section *ms;
- /*
- * Ensure the upper PAGE_SHIFT bits are clear in the
- * pfn. Else it might lead to false positives when
- * some of the upper bits are set, but the lower bits
- * match a valid pfn.
- */
- if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
- return 0;
-
if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
return 0;
ms = __nr_to_section(pfn_to_section_nr(pfn));
--
2.33.0.rc2.250.ged5fa647cd-goog
On Wed, Aug 25, 2021 at 11:28:56AM +0100, Will Deacon wrote:
> On Wed, Aug 25, 2021 at 11:20:46AM +0100, Catalin Marinas wrote:
> > Given how late we are in the -rc cycle, I suggest we revert Anshuman's
> > commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
> > assess the implications in 5.15 (the patch doesn't seem to have the
> > arm64 maintainers' ack anyway ;)).
>
> I'll stick the revert (below) into kernelci now so we can get some coverage
> in case it breaks something else.
Bah, having said that...
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fcb535560028..ee70f21a79d5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1463,15 +1463,6 @@ static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
>
> - /*
> - * Ensure the upper PAGE_SHIFT bits are clear in the
> - * pfn. Else it might lead to false positives when
> - * some of the upper bits are set, but the lower bits
> - * match a valid pfn.
> - */
> - if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
> - return 0;
> -
I suppose we should leave this bit as-is, since the whole point here is
trying to minimise the impact on other architectures.
Will
On 2021-08-25 11:28, Will Deacon wrote:
> On Wed, Aug 25, 2021 at 11:20:46AM +0100, Catalin Marinas wrote:
>> Given how late we are in the -rc cycle, I suggest we revert Anshuman's
>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
>> assess the implications in 5.15 (the patch doesn't seem to have the
>> arm64 maintainers' ack anyway ;)).
>
> I'll stick the revert (below) into kernelci now so we can get some coverage
> in case it breaks something else.
>
> Will
>
> --->8
>
> From e97ba0e39e486c20d8f76f3e632e4b7d933602cd Mon Sep 17 00:00:00 2001
> From: Will Deacon <[email protected]>
> Date: Wed, 25 Aug 2021 11:10:07 +0100
> Subject: [PATCH] Revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"
>
> This reverts commit 16c9afc776608324ca71c0bc354987bab532f51d.
>
> Alex Bee reports a regression in 5.14 on their RK3328 SoC when
> configuring the PL330 DMA controller:
>
> | ------------[ cut here ]------------
> | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> | Modules linked in: spi_rockchip(+) fuse
> | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> | Hardware name: Pine64 Rock64 (DT)
> | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> | pc : dma_map_resource+0x68/0xc0
> | lr : pl330_prep_slave_fifo+0x78/0xd0
>
> This appears to be because dma_map_resource() is being called for a
> physical address which does not correspond to a memory address yet does
> have a valid 'struct page' due to the way in which the vmemmap is
> constructed.
>
> Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64
> implementation of pfn_valid() called memblock_is_memory() to return
> 'false' for such regions and the DMA mapping request would proceed.
> However, now that we are using the generic implementation where only the
> presence of the memory map entry is considered, we return 'true' and
> erroneously fail with DMA_MAPPING_ERROR because we identify the region
> as DRAM.
>
> Although fixing this in the DMA mapping code is arguably the right fix,
> it is a risky, cross-architecture change at this stage in the cycle. So
> just revert arm64 back to its old pfn_valid() implementation for v5.14.
TBH the offending warning is only meant to be a quick sanity check, so I
don't think there should be much impact from changing the DMA code; it's
just a question of figuring out what change to make. I'm digging in now...
Thanks,
Robin.
> Cc: Catalin Marinas <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Reported-by: Alex Bee <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/page.h | 1 +
> arch/arm64/mm/init.c | 37 +++++++++++++++++++++++++++++++++++
> include/linux/mmzone.h | 9 ---------
> 4 files changed, 39 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fdcd54d39c1e..62c3c1d2190f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -156,6 +156,7 @@ config ARM64
> select HAVE_ARCH_KGDB
> select HAVE_ARCH_MMAP_RND_BITS
> select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> + select HAVE_ARCH_PFN_VALID
> select HAVE_ARCH_PREL32_RELOCATIONS
> select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
> select HAVE_ARCH_SECCOMP_FILTER
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 993a27ea6f54..f98c91bbd7c1 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -41,6 +41,7 @@ void tag_clear_highpage(struct page *to);
>
> typedef struct page *pgtable_t;
>
> +int pfn_valid(unsigned long pfn);
> int pfn_is_map_memory(unsigned long pfn);
>
> #include <asm/memory.h>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 8490ed2917ff..1fdb7bb7c198 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -219,6 +219,43 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
> free_area_init(max_zone_pfns);
> }
>
> +int pfn_valid(unsigned long pfn)
> +{
> + phys_addr_t addr = PFN_PHYS(pfn);
> + struct mem_section *ms;
> +
> + /*
> + * Ensure the upper PAGE_SHIFT bits are clear in the
> + * pfn. Else it might lead to false positives when
> + * some of the upper bits are set, but the lower bits
> + * match a valid pfn.
> + */
> + if (PHYS_PFN(addr) != pfn)
> + return 0;
> +
> + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> + return 0;
> +
> + ms = __pfn_to_section(pfn);
> + if (!valid_section(ms))
> + return 0;
> +
> + /*
> + * ZONE_DEVICE memory does not have the memblock entries.
> + * memblock_is_map_memory() check for ZONE_DEVICE based
> + * addresses will always fail. Even the normal hotplugged
> + * memory will never have MEMBLOCK_NOMAP flag set in their
> + * memblock entries. Skip memblock search for all non early
> + * memory sections covering all of hotplug memory including
> + * both normal and ZONE_DEVICE based.
> + */
> + if (!early_section(ms))
> + return pfn_section_valid(ms, pfn);
> +
> + return memblock_is_memory(addr);
> +}
> +EXPORT_SYMBOL(pfn_valid);
> +
> int pfn_is_map_memory(unsigned long pfn)
> {
> phys_addr_t addr = PFN_PHYS(pfn);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fcb535560028..ee70f21a79d5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1463,15 +1463,6 @@ static inline int pfn_valid(unsigned long pfn)
> {
> struct mem_section *ms;
>
> - /*
> - * Ensure the upper PAGE_SHIFT bits are clear in the
> - * pfn. Else it might lead to false positives when
> - * some of the upper bits are set, but the lower bits
> - * match a valid pfn.
> - */
> - if (PHYS_PFN(PFN_PHYS(pfn)) != pfn)
> - return 0;
> -
> if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
> return 0;
> ms = __nr_to_section(pfn_to_section_nr(pfn));
>
On 25.08.21 12:20, Catalin Marinas wrote:
> + hch
>
> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
>> On 24.08.21 20:46, Robin Murphy wrote:
>>> On 2021-08-24 19:28, Mike Rapoport wrote:
>>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>>>
>>>>>> ------------[ cut here ]------------
>>>>>> WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>>>>> Modules linked in: spi_rockchip(+) fuse
>>>>>> CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>>>> Hardware name: Pine64 Rock64 (DT)
>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>> pc : dma_map_resource+0x68/0xc0
>>>>>> lr : pl330_prep_slave_fifo+0x78/0xd0
>>>>>> sp : ffff800012102ae0
>>>>>> x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>>>>> x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>>>>> x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>>>>> x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>>> x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>>>>> x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>>>>> x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>>>>> x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>>>>> x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>>>> Call trace:
>>>>>> dma_map_resource+0x68/0xc0
>>>>>> pl330_prep_slave_sg+0x58/0x220
>>>>>> rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>>>> rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>>>> [...]
>>>>>> Note: This does not relate to the spi driver - when disabling this device in
>>>>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>>>>> Commenting out the failing check at [1], however, helps and the mapping
>>>>>> works again.
>>>>
>>>>> Do you know which address dma_map_resource() is trying to map (maybe
>>>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>>>> Random guess, the address is 0xff190800 (based on the x1 above but the
>>>>> regs might as well be mangled).
>>>>
>>>> 0xff190800 will cause this warning for sure. It has a memory map, but it is
>>>> not RAM so old version of pfn_valid() would return 0 and the new one
>>>> returns 1.
>>>
>>> How does that happen, though? It's not a memory address, and it's not
>>> even within the bounds of anywhere there should or could be memory. This
>>> SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
>>> the DRAM controller (which may not all be populated, and may have pieces
>>> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
>>> do we have pages (or at least the assumption of pages) for somewhere
>>> which by all rights should not have them?
>>
>> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
>> any such hacks. If there is a memory hole, it gets a memmap as well.
>>
>> Tricking pfn_valid() into returning "false" where we actually have a memmap
>> only makes it look like there is no memmap; but there is one, and
>> it's PG_reserved.
>
> I can see the documentation for pfn_valid() does not claim anything more
> than the presence of a memmap entry. But I wonder whether the confusion
> is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
> assumes __va() can be used on pfn_valid(), though I suspect it relies on
> the calling function to check that the resource was RAM. The arm64
> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> standard memcpy on it, which wouldn't work for I/O (should we change
> this check to pfn_is_map_memory() for arm64?).
>
kern_addr_valid() checks that there is a direct map entry, and that the
mapped address has a valid memmap. (copied from x86-64)
Would you expect to have a direct map for memory holes and similar (IOW,
!System RAM)?
>>>>> Either pfn_valid() gets confused in 5.14 or something is wrong with the
>>>>> DT. I have a suspicion it's the former since reverting the above commit
>>>>> makes it disappear.
>>>>
>>>> I think pfn_valid() actually behaves as expected but the caller is wrong
>>>> because pfn_valid != RAM (this applies btw to !arm64 as well).
>>>>
>>>> /* Don't allow RAM to be mapped */
>>>> if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>>>> return DMA_MAPPING_ERROR;
>>>>
>>>> Alex, can you please try this patch:
>>>
>>> That will certainly paper over the issue, but it's avoiding the question
>>> of what went wrong with the memory map in the first place. The comment
>>> is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
>>> addresses that would be wrong to pass to dma_map_page(), so I believe
>>> pfn_valid() is still the correct check.
>>
>> If we want to check for RAM, pfn_valid() would be wrong. If we want to check
>> for "is there a memmap, for whatever lives or does not live there",
>> pfn_valid() is the right check.
>
> So what should the DMA code use instead? Last time we needed something
> similar, the recommendation was to use pfn_to_online_page(). Mike is
> suggesting memblock_is_memory().
We use pfn_to_online_page() when we want to know if it's system RAM and
that the memmap actually contains something sane (-> memmap content has
a well defined state).
You can have offline memory blocks where pfn_to_online_page() would
return "false", memblock_is_memory() would return "true". IOW, there is
a memmap, it's System RAM, but the memmap is stale and not trustworthy.
If you want to make sure no System RAM (online/offline/...) will get
mapped, memblock_is_memory() should be the right thing to use. I recall
that x86 traverses the resource tree instead to exclude system ram
regions similarly.
>
> Given how late we are in the -rc cycle, I suggest we revert Anshuman's
> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
> assess the implications in 5.15 (the patch doesn't seem to have the
> arm64 maintainers' ack anyway ;)).
--
Thanks,
David / dhildenb
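To make the distinction David draws concrete, the three candidate predicates
answer different questions (an illustrative sketch, not a patch; phys_addr as
in dma_map_resource()):

	unsigned long pfn = PHYS_PFN(phys_addr);

	pfn_valid(pfn);                    /* a struct page (memmap) exists, whatever lives there */
	pfn_to_online_page(pfn);           /* non-NULL: online System RAM with a trustworthy memmap */
	memblock_is_memory(PFN_PHYS(pfn)); /* System RAM, whether online or offline */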
On Wed, Aug 25, 2021 at 11:20:46AM +0100, Catalin Marinas wrote:
> + hch
>
> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
> > On 24.08.21 20:46, Robin Murphy wrote:
> > > On 2021-08-24 19:28, Mike Rapoport wrote:
> > > > On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> > > > > On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > > > > > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > > > > > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > > > > > Modules linked in: spi_rockchip(+) fuse
> > > > > > CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > > > > > Hardware name: Pine64 Rock64 (DT)
> > > > > > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > > pc : dma_map_resource+0x68/0xc0
> > > > > > lr : pl330_prep_slave_fifo+0x78/0xd0
> > > > > > sp : ffff800012102ae0
> > > > > > x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> > > > > > x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> > > > > > x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> > > > > > x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> > > > > > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > > > x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> > > > > > x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> > > > > > x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> > > > > > x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> > > > > > x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > > > > > Call trace:
> > > > > > dma_map_resource+0x68/0xc0
> > > > > > pl330_prep_slave_sg+0x58/0x220
> > > > > > rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > > > > > rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> > > > > [...]
> > > > > > Note: This does not relate to the spi driver - when disabling this device in
> > > > > > the device tree it fails for any other (i2s, for instance) which uses dma.
> > > > > > Commenting out the failing check at [1], however, helps and the mapping
> > > > > > works again.
> > > >
> > > > > Do you know which address dma_map_resource() is trying to map (maybe
> > > > > add some printk())? It's not supposed to map RAM, hence the warning.
> > > > > Random guess, the address is 0xff190800 (based on the x1 above but the
> > > > > regs might as well be mangled).
> > > >
> > > > 0xff190800 will cause this warning for sure. It has a memory map, but it is
> > > > not RAM so old version of pfn_valid() would return 0 and the new one
> > > > returns 1.
> > >
> > > How does that happen, though? It's not a memory address, and it's not
> > > even within the bounds of anywhere there should or could be memory. This
> > > SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
> > > the DRAM controller (which may not all be populated, and may have pieces
> > > carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
> > > do we have pages (or at least the assumption of pages) for somewhere
> > > which by all rights should not have them?
> >
> > Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
> > any such hacks. If there is a memory hole, it gets a memmap as well.
> >
> > Tricking pfn_valid() into returning "false" where we actually have a memmap
> > only makes it look like there is no memmap; but there is one, and
> > it's PG_reserved.
>
> I can see the documentation for pfn_valid() does not claim anything more
> than the presence of a memmap entry. But I wonder whether the confusion
> is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
> assumes __va() can be used on pfn_valid(), though I suspect it relies on
> the calling function to check that the resource was RAM. The arm64
> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> standard memcpy on it, which wouldn't work for I/O (should we change
> this check to pfn_is_map_memory() for arm64?).
Since in most cases pfn_valid() does actually mean RAM, the confusion
is really widespread :(
Using pfn_is_map_memory() in kern_addr_valid() seems to me a better choice
than pfn_valid().
> > > > > Either pfn_valid() gets confused in 5.14 or something is wrong with the
> > > > > DT. I have a suspicion it's the former since reverting the above commit
> > > > > makes it disappear.
> > > >
> > > > I think pfn_valid() actually behaves as expected but the caller is wrong
> > > > because pfn_valid != RAM (this applies btw to !arm64 as well).
> > > >
> > > > /* Don't allow RAM to be mapped */
> > > > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > > > return DMA_MAPPING_ERROR;
> > > >
> > > > Alex, can you please try this patch:
> > >
> > > That will certainly paper over the issue, but it's avoiding the question
> > > of what went wrong with the memory map in the first place. The comment
> > > is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
> > > addresses that would be wrong to pass to dma_map_page(), so I believe
> > > pfn_valid() is still the correct check.
> >
> > If we want to check for RAM, pfn_valid() would be wrong. If we want to check
> > for "is there a memmap, for whatever lives or does not live there",
> > pfn_valid() is the right check.
>
> So what should the DMA code use instead? Last time we needed something
> similar, the recommendation was to use pfn_to_online_page(). Mike is
> suggesting memblock_is_memory().
This was for testing purposes :)
I considered some ugly fix for v5.14 with a __weak check in dma/mapping.c for
backward compatibility and an override in arm64 and then a proper audit for
v5.15.
But as Will went for the revert, this is not really relevant.
> Given how late we are in the -rc cycle, I suggest we revert Anshuman's
> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
> assess the implications in 5.15 (the patch doesn't seem to have the
> arm64 maintainers' ack anyway ;)).
>
> Thanks.
>
> --
> Catalin
--
Sincerely yours,
Mike.
On Wed, Aug 25, 2021 at 12:38:31PM +0200, David Hildenbrand wrote:
> On 25.08.21 12:20, Catalin Marinas wrote:
>
> > I can see the documentation for pfn_valid() does not claim anything more
> > than the presence of a memmap entry. But I wonder whether the confusion
> > is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
> > assumes __va() can be used on pfn_valid(), though I suspect it relies on
> > the calling function to check that the resource was RAM. The arm64
> > kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> > standard memcpy on it, which wouldn't work for I/O (should we change
> > this check to pfn_is_map_memory() for arm64?).
>
> kern_addr_valid() checks that there is a direct map entry, and that the
> mapped address has a valid memmap. (copied from x86-64)
>
> Would you expect to have a direct map for memory holes and similar (IOW,
> !System RAM)?
I don't see where it will bail out for an IOMEM mapping before doing the
pfn_valid() check...
--
Sincerely yours,
Mike.
On 2021-08-25 11:38, David Hildenbrand wrote:
> On 25.08.21 12:20, Catalin Marinas wrote:
>> + hch
>>
>> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
>>> On 24.08.21 20:46, Robin Murphy wrote:
>>>> On 2021-08-24 19:28, Mike Rapoport wrote:
>>>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>>>>> it seems there is a regression in arm64 memory mapping in 5.14,
>>>>>>> since it
>>>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>>>>
>>>>>>> ------------[ cut here ]------------
>>>>>>> WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235
>>>>>>> dma_map_resource+0x68/0xc0
>>>>>>> Modules linked in: spi_rockchip(+) fuse
>>>>>>> CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>>>>> Hardware name: Pine64 Rock64 (DT)
>>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>>> pc : dma_map_resource+0x68/0xc0
>>>>>>> lr : pl330_prep_slave_fifo+0x78/0xd0
>>>>>>> sp : ffff800012102ae0
>>>>>>> x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>>>>>> x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>>>>>> x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>>>>>> x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>>>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>>>> x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>>>>>> x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>>>>>> x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>>>>>> x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>>>>>> x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>>>>> Call trace:
>>>>>>> dma_map_resource+0x68/0xc0
>>>>>>> pl330_prep_slave_sg+0x58/0x220
>>>>>>> rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>>>>> rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>>>>> [...]
>>>>>>> Note: This does not relate to the spi driver - when disabling
>>>>>>> this device in
>>>>>>> the device tree it fails for any other (i2s, for instance) which
>>>>>>> uses dma.
>>>>>>> Commenting out the failing check at [1], however, helps and the
>>>>>>> mapping
>>>>>>> works again.
>>>>>
>>>>>> Do you know which address dma_map_resource() is trying to map (maybe
>>>>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>>>>> Random guess, the address is 0xff190800 (based on the x1 above but
>>>>>> the
>>>>>> regs might as well be mangled).
>>>>>
>>>>> 0xff190800 will cause this warning for sure. It has a memory map,
>>>>> but it is
>>>>> not RAM so old version of pfn_valid() would return 0 and the new one
>>>>> returns 1.
>>>>
>>>> How does that happen, though? It's not a memory address, and it's not
>>>> even within the bounds of anywhere there should or could be memory.
>>>> This
>>>> SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
>>>> the DRAM controller (which may not all be populated, and may have
>>>> pieces
>>>> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO.
>>>> Why
>>>> do we have pages (or at least the assumption of pages) for somewhere
>>>> which by all rights should not have them?
>>>
>>> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to
>>> avoid
>>> any such hacks. If there is a memory hole, it gets a memmap as well.
Urgh, apologies for being slow. This case isn't a memory hole as such,
but I failed to consider the *ends* of memory not being section-aligned
and leading to an overlap anyway.
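Concretely, assuming 128 MiB sections as David describes, the numbers from
Alex's DT line up (a worked sketch):

	end of DRAM:       0x00200000 + 0xfee00000 = 0xff000000
	section size:      128 MiB = 0x08000000
	section covering the end of RAM: [0xf8000000, 0xffffffff]

The vmemmap is allocated for that entire section, so the MMIO window at
0xff000000-0xffffffff - including the FIFO at 0xff190800 - gets a
(PG_reserved) memmap, and the generic pfn_valid() returns true for it even
though memblock_is_memory() does not.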
>>> Tricking pfn_valid() into returning "false" where we actually have a
>>> memmap
>>> only makes it look like there is no memmap; but there is one, and
>>> it's PG_reserved.
>>
>> I can see the documentation for pfn_valid() does not claim anything more
>> than the presence of a memmap entry. But I wonder whether the confusion
>> is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
>> assumes __va() can be used on pfn_valid(), though I suspect it relies on
>> the calling function to check that the resource was RAM. The arm64
>> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
>> standard memcpy on it, which wouldn't work for I/O (should we change
>> this check to pfn_is_map_memory() for arm64?).
>>
>
> kern_addr_valid() checks that there is a direct map entry, and that the
> mapped address has a valid memmap. (copied from x86-64)
>
> Would you expect to have a direct map for memory holes and similar (IOW,
> !System RAM)?
Probably - we can have no-map regions for firmware-reserved RAM which I
believe end up as PG_reserved if they're small enough, for the same
reasoning as this case.
>>>>>> Either pfn_valid() gets confused in 5.14 or something is wrong
>>>>>> with the
>>>>>> DT. I have a suspicion it's the former since reverting the above
>>>>>> commit
>>>>>> makes it disappear.
>>>>>
>>>>> I think pfn_valid() actually behaves as expected but the caller is
>>>>> wrong
>>>>> because pfn_valid != RAM (this applies btw to !arm64 as well).
>>>>>
>>>>> /* Don't allow RAM to be mapped */
>>>>> if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>>>>> return DMA_MAPPING_ERROR;
>>>>>
>>>>> Alex, can you please try this patch:
>>>>
>>>> That will certainly paper over the issue, but it's avoiding the
>>>> question
>>>> of what went wrong with the memory map in the first place. The comment
>>>> is indeed a bit inaccurate, but ultimately dma_map_resource() exists
>>>> for
>>>> addresses that would be wrong to pass to dma_map_page(), so I believe
>>>> pfn_valid() is still the correct check.
>>>
>>> If we want to check for RAM, pfn_valid() would be wrong. If we want
>>> to check
>>> for "is there a memmap, for whatever lives or does not live there",
>>> pfn_valid() is the right check.
>>
>> So what should the DMA code use instead? Last time we needed something
>> similar, the recommendation was to use pfn_to_online_page(). Mike is
>> suggesting memblock_is_memory().
>
> We use pfn_to_online_page() when we want to know if it's system RAM and
> that the memmap actually contains something sane (-> memmap content has
> a well defined state).
>
> You can have offline memory blocks where pfn_to_online_page() would
> return "false", memblock_is_memory() would return "true". IOW, there is
> a memmap, it's System RAM, but the memmap is stale and not trustworthy.
That's fine - offline memory is doubly-invalid to map as an MMIO resource :)
> If you want to make sure no System RAM (online/offline/...) will get
> mapped, memblock_is_memory() should be the right thing to use. I recall
> that x86 traverses the resource tree instead to exclude system ram
> regions similarly.
I'm thinking that "pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))"
might be the closest thing to what I'd like to express - does that seem
sensible at all? The one thing I'm not quite sure about is the
interaction with P2P mappings of ZONE_DEVICE, but that's also true of
the previous behaviour, and I'm not aware that the usage model has been
fully nailed down yet, so I suspect we have some wiggle room. At worst,
false negatives in certain situations wouldn't be the end of the world,
since this is merely a sanity check for something which is expected to
be a no-op the vast majority of the time, so being unobtrusive is more
important than being exhaustive.
Thanks,
Robin.
>> Given how late we are in the -rc cycle, I suggest we revert Anshuman's
>> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
>> assess the implications in 5.15 (the patch doesn't seem to have the
>> arm64 maintainers' ack anyway ;)).
>
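A minimal sketch of the check Robin is proposing above, in dma_map_resource()
terms (hedged - this is a sketch of the suggestion, not a posted patch):

	unsigned long pfn = PHYS_PFN(phys_addr);

	/* Don't allow RAM (pages not marked reserved) to be mapped */
	if (WARN_ON_ONCE(pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))))
		return DMA_MAPPING_ERROR;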
On 25.08.21 12:55, Catalin Marinas wrote:
> On Wed, Aug 25, 2021 at 12:38:31PM +0200, David Hildenbrand wrote:
>> On 25.08.21 12:20, Catalin Marinas wrote:
>>> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
>>>> On 24.08.21 20:46, Robin Murphy wrote:
>>>>> On 2021-08-24 19:28, Mike Rapoport wrote:
>>>>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>>>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>>>>>> it seems there is a regression in arm64 memory mapping in 5.14, since it
>>>>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>>>>>
>>>>>>>> ------------[ cut here ]------------
>>>>>>>> WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
>>>>>>>> Modules linked in: spi_rockchip(+) fuse
>>>>>>>> CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>>>>>> Hardware name: Pine64 Rock64 (DT)
>>>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>>>> pc : dma_map_resource+0x68/0xc0
>>>>>>>> lr : pl330_prep_slave_fifo+0x78/0xd0
>>>>>>>> sp : ffff800012102ae0
>>>>>>>> x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
>>>>>>>> x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
>>>>>>>> x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
>>>>>>>> x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
>>>>>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>>>>>> x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
>>>>>>>> x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
>>>>>>>> x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
>>>>>>>> x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
>>>>>>>> x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>>>>>> Call trace:
>>>>>>>> dma_map_resource+0x68/0xc0
>>>>>>>> pl330_prep_slave_sg+0x58/0x220
>>>>>>>> rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>>>>>> rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>>>>>> [...]
>>>>>>>> Note: This does not relate to the spi driver - when disabling this device in
>>>>>>>> the device tree it fails for any other (i2s, for instance) which uses dma.
>>>>>>>> Commenting out the failing check at [1], however, helps and the mapping
>>>>>>>> works again.
>>>>>>
>>>>>>> Do you know which address dma_map_resource() is trying to map (maybe
>>>>>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>>>>>> Random guess, the address is 0xff190800 (based on the x1 above but the
>>>>>>> regs might as well be mangled).
>>>>>>
>>>>>> 0xff190800 will cause this warning for sure. It has a memory map, but it is
>>>>>> not RAM so old version of pfn_valid() would return 0 and the new one
>>>>>> returns 1.
>>>>>
>>>>> How does that happen, though? It's not a memory address, and it's not
>>>>> even within the bounds of anywhere there should or could be memory. This
>>>>> SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
>>>>> the DRAM controller (which may not all be populated, and may have pieces
>>>>> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
>>>>> do we have pages (or at least the assumption of pages) for somewhere
>>>>> which by all rights should not have them?
>>>>
>>>> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
>>>> any such hacks. If there is a memory hole, it gets a memmap as well.
>>>>
>>>> Tricking pfn_valid() into returning "false" where we actually have a memmap
>>>> only makes it look like there is no memmap; but there is one, and
>>>> it's PG_reserved.
>>>
>>> I can see the documentation for pfn_valid() does not claim anything more
>>> than the presence of a memmap entry. But I wonder whether the confusion
>>> is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
>>> assumes __va() can be used on pfn_valid(), though I suspect it relies on
>>> the calling function to check that the resource was RAM. The arm64
>>> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
>>> standard memcpy on it, which wouldn't work for I/O (should we change
>>> this check to pfn_is_map_memory() for arm64?).
>>
>> kern_addr_valid() checks that there is a direct map entry, and that the
>> mapped address has a valid memmap. (copied from x86-64)
>
> It checks that there is a va->pa mapping, not necessarily in the linear
> map as it walks the page tables. So for some I/O range that happens to
> be mapped but which was in close proximity to RAM so that pfn_valid() is
> true, kern_addr_valid() would return true. I don't think that was the
> intention.
>
>> Would you expect to have a direct map for memory holes and similar (IOW,
>> !System RAM)?
>
> No, but with the generic pfn_valid(), it may return true for mapped
> MMIO (with different attributes than the direct map).
Ah, right. But can we actually run into that via kcore?
kcore builds the RAM list via walk_system_ram_range(), IOW the resource
tree. And we end up calling kern_addr_valid() only on KCORE_RAM,
KCORE_VMEMMAP and KCORE_TEXT.
Not saying that kern_addr_valid() shouldn't be improved.
--
Thanks,
David / dhildenb
On Wed, Aug 25, 2021 at 12:38:31PM +0200, David Hildenbrand wrote:
> On 25.08.21 12:20, Catalin Marinas wrote:
> > On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
> > > On 24.08.21 20:46, Robin Murphy wrote:
> > > > On 2021-08-24 19:28, Mike Rapoport wrote:
> > > > > On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> > > > > > On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > > > > > > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > > > > > > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> > > > > > >
> > > > > > > ------------[ cut here ]------------
> > > > > > > WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > > > > > > Modules linked in: spi_rockchip(+) fuse
> > > > > > > CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > > > > > > Hardware name: Pine64 Rock64 (DT)
> > > > > > > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > > > pc : dma_map_resource+0x68/0xc0
> > > > > > > lr : pl330_prep_slave_fifo+0x78/0xd0
> > > > > > > sp : ffff800012102ae0
> > > > > > > x29: ffff800012102ae0 x28: ffff000005c94800 x27: 0000000000000000
> > > > > > > x26: ffff000000566bd0 x25: 0000000000000001 x24: 0000000000000001
> > > > > > > x23: 0000000000000002 x22: ffff000000628c00 x21: 0000000000000001
> > > > > > > x20: ffff000000566bd0 x19: 0000000000000001 x18: 0000000000000000
> > > > > > > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > > > > > > x14: 0000000000000277 x13: 0000000000000001 x12: 0000000000000000
> > > > > > > x11: 0000000000000001 x10: 00000000000008e0 x9 : ffff800012102a80
> > > > > > > x8 : ffff000000d14b80 x7 : ffff0000fe7b12f0 x6 : ffff0000fe7b1100
> > > > > > > x5 : fffffc000000000f x4 : 0000000000000000 x3 : 0000000000000001
> > > > > > > x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > > > > > > Call trace:
> > > > > > > dma_map_resource+0x68/0xc0
> > > > > > > pl330_prep_slave_sg+0x58/0x220
> > > > > > > rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > > > > > > rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> > > > > > [...]
> > > > > > > Note: This does not relate to the spi driver - when disabling this device in
> > > > > > > the device tree it fails for any other (i2s, for instance) which uses dma.
> > > > > > > Commenting out the failing check at [1], however, helps and the mapping
> > > > > > > works again.
> > > > >
> > > > > > Do you know which address dma_map_resource() is trying to map (maybe
> > > > > > add some printk())? It's not supposed to map RAM, hence the warning.
> > > > > > Random guess, the address is 0xff190800 (based on the x1 above but the
> > > > > > regs might as well be mangled).
> > > > >
> > > > > 0xff190800 will cause this warning for sure. It has a memory map, but it is
> > > > > not RAM so old version of pfn_valid() would return 0 and the new one
> > > > > returns 1.
> > > >
> > > > How does that happen, though? It's not a memory address, and it's not
> > > > even within the bounds of anywhere there should or could be memory. This
> > > > SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
> > > > the DRAM controller (which may not all be populated, and may have pieces
> > > > carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
> > > > do we have pages (or at least the assumption of pages) for somewhere
> > > > which by all rights should not have them?
> > >
> > > Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
> > > any such hacks. If there is a memory hole, it gets a memmap as well.
> > >
> > > Tricking pfn_valid() into returning "false" where we actually have a memmap
> > > only makes it look like there is no memmap; but there is one, and
> > > it's PG_reserved.
> >
> > I can see the documentation for pfn_valid() does not claim anything more
> > than the presence of a memmap entry. But I wonder whether the confusion
> > is wider-spread than just the DMA code. At a quick grep, try_ram_remap()
> > assumes __va() can be used on pfn_valid(), though I suspect it relies on
> > the calling function to check that the resource was RAM. The arm64
> > kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> > standard memcpy on it, which wouldn't work for I/O (should we change
> > this check to pfn_is_map_memory() for arm64?).
>
> kern_addr_valid() checks that there is a direct map entry, and that the
> mapped address has a valid memmap. (copied from x86-64)
It checks that there is a va->pa mapping, not necessarily in the linear
map, as it walks the page tables. So for some I/O range that happens to
be mapped but which was in close proximity to RAM, so that pfn_valid() is
true, kern_addr_valid() would return true. I don't think that was the
intention.
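Condensed, the tail of the arm64 walk is something like this (sketch from
memory, with the pgd/p4d/pud/pmd levels elided, so take the details with
a grain of salt):

int kern_addr_valid(unsigned long addr)
{
	pmd_t *pmdp;
	pte_t *ptep, pte;

	/* ... walk pgd/p4d/pud down to pmdp, returning 0 on none/bad ... */
	ptep = pte_offset_kernel(pmdp, addr);
	pte = READ_ONCE(*ptep);
	if (pte_none(pte))
		return 0;

	/* any mapped pte passes, MMIO included, as long as pfn_valid() */
	return pfn_valid(pte_pfn(pte));
}

i.e. the final check is just pfn_valid() on whatever the pte points at.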
> Would you expect to have a direct map for memory holes and similar (IOW,
> !System RAM)?
No, but with the generic pfn_valid(), it may return true for mapped
MMIO (with different attributes than the direct map).
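FWIW the old arm64 behaviour is still available as pfn_is_map_memory();
from memory it boils down to (details may be slightly off):

int pfn_is_map_memory(unsigned long pfn)
{
	phys_addr_t addr = PFN_PHYS(pfn);

	/* avoid false positives for bogus PFNs, see comment in pfn_valid() */
	if (PHYS_PFN(addr) != pfn)
		return 0;

	return memblock_is_map_memory(addr);
}

so MMIO would never pass, no matter how close to RAM it sits.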
> > > > > > Either pfn_valid() gets confused in 5.14 or something is wrong with the
> > > > > > DT. I have a suspicion it's the former since reverting the above commit
> > > > > > makes it disappear.
> > > > >
> > > > > I think pfn_valid() actually behaves as expected but the caller is wrong
> > > > > because pfn_valid != RAM (this applies btw to !arm64 as well).
> > > > >
> > > > > /* Don't allow RAM to be mapped */
> > > > > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > > > > return DMA_MAPPING_ERROR;
> > > > >
> > > > > Alex, can you please try this patch:
> > > >
> > > > That will certainly paper over the issue, but it's avoiding the question
> > > > of what went wrong with the memory map in the first place. The comment
> > > > is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
> > > > addresses that would be wrong to pass to dma_map_page(), so I believe
> > > > pfn_valid() is still the correct check.
> > >
> > > If we want to check for RAM, pfn_valid() would be wrong. If we want to check
> > > for "is there a memmap, for whatever lives or does not live there",
> > > pfn_valid() is the right check.
> >
> > So what should the DMA code use instead? Last time we needed something
> > similar, the recommendation was to use pfn_to_online_page(). Mike is
> > suggesting memblock_is_memory().
>
> We use pfn_to_online_page() when we want to know if it's System RAM and
> that the memmap actually contains something sane (-> the memmap content
> has a well-defined state).
>
> You can have offline memory blocks where pfn_to_online_page() would return
> "false" while memblock_is_memory() would return "true". IOW, there is a
> memmap, it's System RAM, but the memmap is stale and not trustworthy.
>
> If you want to make sure no System RAM (online/offline/...) will get mapped,
> memblock_is_memory() should be the right thing to use. I recall that x86
> traverses the resource tree instead to exclude System RAM regions similarly.
Thanks, this makes sense.
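For completeness, memblock_is_memory() boils down to a search of the
memblock.memory ranges, i.e. (quoting mm/memblock.c from memory):

bool memblock_is_memory(phys_addr_t addr)
{
	return memblock_search(&memblock.memory, addr) != -1;
}

so it covers all registered memory, online or not.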
--
Catalin
On 25.08.21 12:58, Robin Murphy wrote:
> On 2021-08-25 11:38, David Hildenbrand wrote:
>> On 25.08.21 12:20, Catalin Marinas wrote:
>>> + hch
>>>
>>> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
>>>> On 24.08.21 20:46, Robin Murphy wrote:
>>>>> On 2021-08-24 19:28, Mike Rapoport wrote:
>>>>>> On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
>>>>>>> On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
>>>>>>>> it seems there is a regression in arm64 memory mapping in 5.14,
>>>>>>>> since it
>>>>>>>> fails on Rockchip RK3328 when the pl330 dmac tries to map with:
>>>>>>>>
>>>>>>>> ------------[ cut here ]------------
>>>>>>>> WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235
>>>>>>>> dma_map_resource+0x68/0xc0
>>>>>>>> Modules linked in: spi_rockchip(+) fuse
>>>>>>>> CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
>>>>>>>> Hardware name: Pine64 Rock64 (DT)
>>>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>>>> pc : dma_map_resource+0x68/0xc0
>>>>>>>> lr : pl330_prep_slave_fifo+0x78/0xd0
>>>>>>>> sp : ffff800012102ae0
>>>>>>>> [...]
>>>>>>>> x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
>>>>>>>> Call trace:
>>>>>>>> dma_map_resource+0x68/0xc0
>>>>>>>> pl330_prep_slave_sg+0x58/0x220
>>>>>>>> rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
>>>>>>>> rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
>>>>>>> [...]
>>>>>>>> Note: This does not relate to the spi driver - when disabling
>>>>>>>> this device in
>>>>>>>> the device tree it fails for any other (i2s, for instance) which
>>>>>>>> uses dma.
>>>>>>>> Commenting out the failing check at [1], however, helps and the
>>>>>>>> mapping
>>>>>>>> works again.
>>>>>>
>>>>>>> Do you know which address dma_map_resource() is trying to map (maybe
>>>>>>> add some printk())? It's not supposed to map RAM, hence the warning.
>>>>>>> Random guess, the address is 0xff190800 (based on the x1 above but
>>>>>>> the
>>>>>>> regs might as well be mangled).
>>>>>>
>>>>>> 0xff190800 will cause this warning for sure. It has a memory map,
>>>>>> but it is not RAM, so the old version of pfn_valid() would return 0
>>>>>> and the new one returns 1.
>>>>>
>>>>> How does that happen, though? It's not a memory address, and it's not
>>>>> even within the bounds of anywhere there should or could be memory.
>>>>> This
>>>>> SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
>>>>> the DRAM controller (which may not all be populated, and may have
>>>>> pieces
>>>>> carved out by secure firmware), while 0xff000000-0xffffffff is MMIO.
>>>>> Why
>>>>> do we have pages (or at least the assumption of pages) for somewhere
>>>>> which by all rights should not have them?
>>>>
>>>> Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to
>>>> avoid
>>>> any such hacks. If there is a memory hole, it gets a memmap as well.
>
> Urgh, apologies for being slow. This case isn't a memory hole as such,
> but I failed to consider the *ends* of memory not being section-aligned
> and leading to an overlap anyway.
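(Concretely, for this case: with 128 MiB sections, RAM ending at
0xff000000 means the last present section covers 0xf8000000-0xffffffff,
so the MMIO range around 0xff190800 lands in a section that has a memmap
- assuming I have the section arithmetic right.)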
>
>>>> Tricking pfn_valid() into returning "false" where we actually have a
>>>> memmap
>>>> only makes it look like there is no memmap; but there is one, and
>>>> it's PG_reserved.
>>>
>>> I can see the documentation for pfn_valid() does not claim anything more
>>> than the presence of a memmap entry. But I wonder whether the confusion
>>> is more widespread than just the DMA code. At a quick grep, try_ram_remap()
>>> assumes __va() can be used on pfn_valid(), though I suspect it relies on
>>> the calling function to check that the resource was RAM. The arm64
>>> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
>>> standard memcpy on it, which wouldn't work for I/O (should we change
>>> this check to pfn_is_map_memory() for arm64?).
>>>
>>
>> kern_addr_valid() checks that there is a direct map entry, and that the
>> mapped address has a valid memmap. (copied from x86-64)
>>
>> Would you expect to have a direct map for memory holes and similar (IOW,
>> !System RAM)?
>
> Probably - we can have no-map regions for firmware-reserved RAM which I
> believe end up as PG_reserved if they're small enough, for the same
> reasoning as this case.
>
>>>>>>> Either pfn_valid() gets confused in 5.14 or something is wrong
>>>>>>> with the
>>>>>>> DT. I have a suspicion it's the former since reverting the above
>>>>>>> commit
>>>>>>> makes it disappear.
>>>>>>
>>>>>> I think pfn_valid() actually behaves as expected but the caller is
>>>>>> wrong
>>>>>> because pfn_valid != RAM (this applies btw to !arm64 as well).
>>>>>>
>>>>>> /* Don't allow RAM to be mapped */
>>>>>> if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
>>>>>> return DMA_MAPPING_ERROR;
>>>>>>
>>>>>> Alex, can you please try this patch:
>>>>>
>>>>> That will certainly paper over the issue, but it's avoiding the
>>>>> question
>>>>> of what went wrong with the memory map in the first place. The comment
>>>>> is indeed a bit inaccurate, but ultimately dma_map_resource() exists
>>>>> for
>>>>> addresses that would be wrong to pass to dma_map_page(), so I believe
>>>>> pfn_valid() is still the correct check.
>>>>
>>>> If we want to check for RAM, pfn_valid() would be wrong. If we want
>>>> to check
>>>> for "is there a memmap, for whatever lives or does not live there",
>>>> pfn_valid() is the right check.
>>>
>>> So what should the DMA code use instead? Last time we needed something
>>> similar, the recommendation was to use pfn_to_online_page(). Mike is
>>> suggesting memblock_is_memory().
>>
>> We use pfn_to_online_page() when we want to know if it's System RAM and
>> that the memmap actually contains something sane (-> the memmap content
>> has a well-defined state).
Sorry, that was only partially right: to be more precise,
pfn_to_online_page() might also succeed on memory holes and similar
(essentially everywhere where we don't have CONFIG_HAVE_ARCH_PFN_VALID);
it just means that there is a memmap and that the memmap has a well
defined state. If it's System RAM, it's online and either getting used or
lying around as free in the buddy.
For example, pfn_to_online_page() won't succeed on ZONE_DEVICE memory,
while pfn_valid() will.
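Roughly (condensed sketch; sub-section and other corner-case checks
elided, so don't rely on the details):

struct page *pfn_to_online_page(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);
	struct mem_section *ms;

	if (nr >= NR_MEM_SECTIONS)
		return NULL;

	ms = __nr_to_section(nr);
	if (!online_section(ms))
		return NULL;

	/* ... pfn_section_valid() and similar checks elided ... */
	return pfn_to_page(pfn);
}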
>>
>> You can have offline memory blocks where pfn_to_online_page() would
>> return "false" while memblock_is_memory() would return "true". IOW, there
>> is a memmap, it's System RAM, but the memmap is stale and not trustworthy.
>
> That's fine - offline memory is doubly-invalid to map as an MMIO resource :)
>
>> If you want to make sure no System RAM (online/offline/...) will get
>> mapped, memblock_is_memory() should be the right thing to use. I recall
>> that x86 traverses the resource tree instead to exclude System RAM
>> regions similarly.
>
> I'm thinking that "pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn))"
For offline memory blocks or for ZONE_DEVICE memory where pfn_valid()
succeeds, the memmap might be stale and not contain anything reasonable.
So the PageReserved() check can be problematic.
Also note that PageReserved() has plenty of different semantics (see
PG_reserved in include/linux/page-flags.h).
> might be the closest thing to what I'd like to express - does that seem
> sensible at all? The one thing I'm not quite sure about is the
> interaction with P2P mappings of ZONE_DEVICE, but that's also true of
> the previous behaviour, and I'm not aware that the usage model has been
> fully nailed down yet, so I suspect we have some wiggle room. At worst,
Yes.
> false negatives in certain situations wouldn't be the end of the world,
> since this is merely a sanity check for something which is expected to
> be a no-op the vast majority of the time, so being unobtrusive is more
> important than being exhaustive.
Right.
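So, spelled out, the check you're proposing would be something like this
(sketch only; the helper name is made up here):

/* "has a memmap and is not a reserved hole" - see the caveats above */
static bool phys_addr_looks_like_ram(phys_addr_t phys_addr)
{
	unsigned long pfn = PHYS_PFN(phys_addr);

	return pfn_valid(pfn) && !PageReserved(pfn_to_page(pfn));
}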
--
Thanks,
David / dhildenb
On Wed, Aug 25, 2021 at 01:12:37PM +0200, David Hildenbrand wrote:
> On 25.08.21 12:55, Catalin Marinas wrote:
> > On Wed, Aug 25, 2021 at 12:38:31PM +0200, David Hildenbrand wrote:
> > > On 25.08.21 12:20, Catalin Marinas wrote:
> > > > I can see the documentation for pfn_valid() does not claim anything more
> > > > than the presence of a memmap entry. But I wonder whether the confusion
> > > > is more widespread than just the DMA code. At a quick grep, try_ram_remap()
> > > > assumes __va() can be used on pfn_valid(), though I suspect it relies on
> > > > the calling function to check that the resource was RAM. The arm64
> > > > kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> > > > standard memcpy on it, which wouldn't work for I/O (should we change
> > > > this check to pfn_is_map_memory() for arm64?).
> > >
> > > kern_addr_valid() checks that there is a direct map entry, and that the
> > > mapped address has a valid memmap. (copied from x86-64)
> >
> > It checks that there is a va->pa mapping, not necessarily in the linear
> > map, as it walks the page tables. So for some I/O range that happens to
> > be mapped but which was in close proximity to RAM, so that pfn_valid() is
> > true, kern_addr_valid() would return true. I don't think that was the
> > intention.
> >
> > > Would you expect to have a direct map for memory holes and similar (IOW,
> > > !System RAM)?
> >
> > No, but with the generic pfn_valid(), it may return true for mapped
> > MMIO (with different attributes than the direct map).
>
> Ah, right. But can we actually run into that via kcore?
>
> kcore builds the RAM list via walk_system_ram_range(), IOW the resource
> tree. And we end up calling kern_addr_valid() only on KCORE_RAM,
> KCORE_VMEMMAP and KCORE_TEXT.
It's probably fine, but I'd rather check the other call sites before
attempting to move arm64 to the generic pfn_valid() again.
--
Catalin
On Wed, Aug 25, 2021 at 11:20:46AM +0100, Catalin Marinas wrote:
> + hch
>
> On Tue, Aug 24, 2021 at 08:59:22PM +0200, David Hildenbrand wrote:
> > On 24.08.21 20:46, Robin Murphy wrote:
> > > On 2021-08-24 19:28, Mike Rapoport wrote:
> > > > On Tue, Aug 24, 2021 at 06:37:41PM +0100, Catalin Marinas wrote:
> > > > > On Tue, Aug 24, 2021 at 03:40:47PM +0200, Alex Bee wrote:
> > > > > > it seems there is a regression in arm64 memory mapping in 5.14, since it
> > > > > > fails on Rockchip RK3328 when the pl330 dmac tries to map with:
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0
> > > > > > Modules linked in: spi_rockchip(+) fuse
> > > > > > CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1
> > > > > > Hardware name: Pine64 Rock64 (DT)
> > > > > > pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> > > > > > pc : dma_map_resource+0x68/0xc0
> > > > > > lr : pl330_prep_slave_fifo+0x78/0xd0
> > > > > > sp : ffff800012102ae0
> > > > > > [...]
> > > > > > x2 : 0000000000000001 x1 : 00000000ff190800 x0 : ffff000000628c00
> > > > > > Call trace:
> > > > > > dma_map_resource+0x68/0xc0
> > > > > > pl330_prep_slave_sg+0x58/0x220
> > > > > > rockchip_spi_prepare_dma+0xd8/0x2c0 [spi_rockchip]
> > > > > > rockchip_spi_transfer_one+0x294/0x3d8 [spi_rockchip]
> > > > > [...]
> > > > > > Note: This does not relate to the spi driver - when disabling this device in
> > > > > > the device tree it fails for any other (i2s, for instance) which uses dma.
> > > > > > Commenting out the failing check at [1], however, helps and the mapping
> > > > > > works again.
> > > >
> > > > > Do you know which address dma_map_resource() is trying to map (maybe
> > > > > add some printk())? It's not supposed to map RAM, hence the warning.
> > > > > Random guess, the address is 0xff190800 (based on the x1 above but the
> > > > > regs might as well be mangled).
> > > >
> > > > 0xff190800 will cause this warning for sure. It has a memory map, but it is
> > > > not RAM, so the old version of pfn_valid() would return 0 and the new one
> > > > returns 1.
> > >
> > > How does that happen, though? It's not a memory address, and it's not
> > > even within the bounds of anywhere there should or could be memory. This
> > > SoC has a simple memory map - everything from 0 to 0xfeffffff goes to
> > > the DRAM controller (which may not all be populated, and may have pieces
> > > carved out by secure firmware), while 0xff000000-0xffffffff is MMIO. Why
> > > do we have pages (or at least the assumption of pages) for somewhere
> > > which by all rights should not have them?
> >
> > Simple: we allocate the vmemmap for whole sections (e.g., 128 MiB) to avoid
> > any such hacks. If there is a memory hole, it gets a memmap as well.
> >
> > Tricking pfn_valid() into returning "false" where we actually have a memmap
> > only makes it look like there is no memmap; but there is one, and
> > it's PG_reserved.
>
> I can see the documentation for pfn_valid() does not claim anything more
> than the presence of a memmap entry. But I wonder whether the confusion
> is more widespread than just the DMA code. At a quick grep, try_ram_remap()
> assumes __va() can be used on pfn_valid(), though I suspect it relies on
> the calling function to check that the resource was RAM. The arm64
> kern_addr_valid() returns true based on pfn_valid() and kcore.c uses
> standard memcpy on it, which wouldn't work for I/O (should we change
> this check to pfn_is_map_memory() for arm64?).
>
> > > > > Either pfn_valid() gets confused in 5.14 or something is wrong with the
> > > > > DT. I have a suspicion it's the former since reverting the above commit
> > > > > makes it disappear.
> > > >
> > > > I think pfn_valid() actually behaves as expected but the caller is wrong
> > > > because pfn_valid != RAM (this applies btw to !arm64 as well).
> > > >
> > > > /* Don't allow RAM to be mapped */
> > > > if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> > > > return DMA_MAPPING_ERROR;
> > > >
> > > > Alex, can you please try this patch:
> > >
> > > That will certainly paper over the issue, but it's avoiding the question
> > > of what went wrong with the memory map in the first place. The comment
> > > is indeed a bit inaccurate, but ultimately dma_map_resource() exists for
> > > addresses that would be wrong to pass to dma_map_page(), so I believe
> > > pfn_valid() is still the correct check.
> >
> > If we want to check for RAM, pfn_valid() would be wrong. If we want to check
> > for "is there a memmap, for whatever lives or does not live there",
> > pfn_valid() is the right check.
>
> So what should the DMA code use instead? Last time we needed something
> similar, the recommendation was to use pfn_to_online_page(). Mike is
> suggesting memblock_is_memory().
I did some digging and it seems that the most "generic" way to check if a
page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
false negatives for architectures that do not register "System RAM", but
those are not using dma_map_resource() anyway and, apparently, never would.
Alex, can you test this patch please?
From 35586e9dbbdb47962cb0f3803bc650d77867905a Mon Sep 17 00:00:00 2001
From: Mike Rapoport <[email protected]>
Date: Sat, 18 Sep 2021 00:01:08 +0300
Subject: [PATCH] dma-mapping: use page_is_ram to ensure RAM is not mapped
dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
However, pfn_valid() only checks for availability of the memory map for a
PFN, but it does not ensure that the PFN is actually backed by RAM.
Replace pfn_valid() with page_is_ram(), which does verify whether a PFN is
in RAM or not.
Signed-off-by: Mike Rapoport <[email protected]>
---
kernel/dma/mapping.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 7ee5284bff58..dc6064a213a2 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -296,7 +296,7 @@ dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
 		return DMA_MAPPING_ERROR;
 
 	/* Don't allow RAM to be mapped */
-	if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
+	if (WARN_ON_ONCE(page_is_ram(PHYS_PFN(phys_addr))))
 		return DMA_MAPPING_ERROR;
 
 	if (dma_map_direct(dev, ops))
--
2.28.0
> Given how later we are in the -rc cycle, I suggest we revert Anshuman's
> commit 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID") and try to
> assess the implications in 5.15 (the patch doesn't seem to have the
> arm64 maintainers' ack anyway ;)).
>
> Thanks.
>
> --
> Catalin
--
Sincerely yours,
Mike.
On Sat, Sep 18, 2021 at 12:22:47AM +0300, Mike Rapoport wrote:
> I did some digging and it seems that the most "generic" way to check if a
> page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
> false negatives for architectures that do not register "System RAM", but
> those are not using dma_map_resource() anyway and, apparently, never would.
The downside of page_is_ram() is that it looks really expensive for
something done at DMA mapping time.
On Sat, Sep 18, 2021 at 07:18:43AM +0200, Christoph Hellwig wrote:
> On Sat, Sep 18, 2021 at 12:22:47AM +0300, Mike Rapoport wrote:
> > I did some digging and it seems that the most "generic" way to check if a
> > page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
> > false negatives for architectures that do not register "System RAM", but
> > those are not using dma_map_resource() anyway and, apparently, never would.
>
> The downside of page_is_ram() is that it looks really expensive for
> something done at DMA mapping time.
Indeed :(
But pfn_valid is plain wrong...
I'll keep digging.
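(For reference, the expense Christoph mentions: page_is_ram() walks the
resource tree on every call. From kernel/resource.c, roughly:

static int __is_ram(unsigned long pfn, unsigned long nr_pages, void *arg)
{
	return 1;
}

int __weak page_is_ram(unsigned long pfn)
{
	return walk_system_ram_range(pfn, 1, NULL, __is_ram) == 1;
}

and walk_system_ram_range() takes the resource lock and does a linear
search of the "System RAM" resources.)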
--
Sincerely yours,
Mike.
On Sat, Sep 18, 2021 at 11:37:22AM +0300, Mike Rapoport wrote:
> On Sat, Sep 18, 2021 at 07:18:43AM +0200, Christoph Hellwig wrote:
> > On Sat, Sep 18, 2021 at 12:22:47AM +0300, Mike Rapoport wrote:
> > > I did some digging and it seems that the most "generic" way to check if a
> > > page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
> > > false negatives for architectures that do not register "System RAM", but
> > > those are not using dma_map_resource() anyway and, apparently, never would.
> >
> > The downside of page_is_ram() is that it looks really expensive for
> > something done at DMA mapping time.
>
> Indeed :(
> But pfn_valid is plain wrong...
> I'll keep digging.
I did some more archaeology and it seems that check for pfn_valid() was
requested by the arm folks because their MMU may have trouble with alias
mappings with different attributes, so they made the check based on the
false assumption that pfn_valid() == "RAM".
As this WARN_ON(pfn_valid()) is only present in dma_map_resource() it's
probably safe to drop it entirely.
Otherwise the simplest way would be to hide it behind something like
ARCH_WANTS_DMA_NOT_RAM and make arm/arm64 select it.
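Something like this in dma_map_resource() (untested sketch, config name
as above):

	/* Don't allow RAM to be mapped, on architectures that care */
	if (IS_ENABLED(CONFIG_ARCH_WANTS_DMA_NOT_RAM) &&
	    WARN_ON_ONCE(page_is_ram(PHYS_PFN(phys_addr))))
		return DMA_MAPPING_ERROR;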
Thoughts?
--
Sincerely yours,
Mike.
On Sat, Sep 18, 2021 at 02:39:49PM +0300, Mike Rapoport wrote:
> On Sat, Sep 18, 2021 at 11:37:22AM +0300, Mike Rapoport wrote:
> > On Sat, Sep 18, 2021 at 07:18:43AM +0200, Christoph Hellwig wrote:
> > > On Sat, Sep 18, 2021 at 12:22:47AM +0300, Mike Rapoport wrote:
> > > > I did some digging and it seems that the most "generic" way to check if a
> > > > page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
> > > > false negatives for architectures that do not register "System RAM", but
> > > > those are not using dma_map_resource() anyway and, apparently, never would.
> > >
> > > The downside of page_is_ram() is that it looks really expensive for
> > > something done at DMA mapping time.
> >
> > Indeed :(
> > But pfn_valid is plain wrong...
> > I'll keep digging.
>
> I did some more archaeology and it seems that check for pfn_valid() was
> requested by the arm folks because their MMU may have trouble with alias
> mappings with different attributes, so they made the check based on the
> false assumption that pfn_valid() == "RAM".
>
> As this WARN_ON(pfn_valid()) is only present in dma_map_resource() it's
> probably safe to drop it entirely.
I agree, we should drop it. IIUC dma_map_resource() does not create any
kernel mapping to cause problems with attribute aliasing. You'd need a
prior devm_ioremap_resource() if you want access to that range from the
CPU side. For arm64 at least, the latter ends up with a
pfn_is_map_memory() check.
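If I remember the code correctly, __ioremap_caller() on arm64 already has:

	/*
	 * Don't allow RAM to be mapped.
	 */
	if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr))))
		return NULL;

so the attribute aliasing is caught at ioremap time rather than at DMA
map time.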
--
Catalin
On 18.09.21 07:18, Christoph Hellwig wrote:
> On Sat, Sep 18, 2021 at 12:22:47AM +0300, Mike Rapoport wrote:
>> I did some digging and it seems that the most "generic" way to check if a
>> page is in RAM is page_is_ram(). It's not 100% bullet proof as it'll give
>> false negatives for architectures that do not register "System RAM", but
>> those are not using dma_map_resource() anyway and, apparently, never would.
>
> The downside of page_is_ram() is that it looks really expensive for
> something done at DMA mapping time.
>
There would be ways to speed it up, similar to
https://lkml.kernel.org/r/[email protected]
but the end result is still walking a list. The question is how much
that overhead matters in practice.
--
Thanks,
David / dhildenb
On Mon, Sep 20, 2021 at 11:57:58AM +0100, Catalin Marinas wrote:
> > As this WARN_ON(pfn_valid()) is only present in dma_map_resource() it's
> > probably safe to drop it entirely.
>
> I agree, we should drop it. IIUC dma_map_resource() does not create any
> kernel mapping to cause problems with attribute aliasing. You'd need a
> prior devm_ioremap_resource() if you want access to that range from the
> CPU side. For arm64 at least, the latter ends up with a
> pfn_is_map_memory() check.
It doesn't create any new mappings. The only real issue is that it
does the wrong thing for RAM in a way that might not be noticed on
simple (x86/PC) platforms.
On Tue, Sep 21, 2021 at 10:20:07AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 20, 2021 at 11:57:58AM +0100, Catalin Marinas wrote:
> > > As this WARN_ON(pfn_valid()) is only present in dma_map_resource() it's
> > > probably safe to drop it entirely.
> >
> > I agree, we should drop it. IIUC dma_map_resource() does not create any
> > kernel mapping to cause problems with attribute aliasing. You'd need a
> > prior devm_ioremap_resource() if you want access to that range from the
> > CPU side. For arm64 at least, the latter ends up with a
> > pfn_is_map_memory() check.
>
> It doesn't create any new mappings. The only real issue is that it
> does the wrong thing for RAM in a way that might not be noticed on
> simple (x86/PC) platforms.
But if the mapping request was rejected by devm_ioremap_resource() because
of an attempt to map RAM, why would we get to dma_map_resource() at all?
--
Sincerely yours,
Mike.
On Tue, Sep 21, 2021 at 12:34:10PM +0300, Mike Rapoport wrote:
> > It doesn't create any new mappings. The only real issue is that it
> > does the wrong thing for RAM in a way that might not be noticed on
> > simple (x86/PC) platforms.
>
> But if the mapping request was rejected by devm_ioremap_resource() because
> of an attempt to map RAM, why would we get to dma_map_resource() at all?
dma_map_resource takes a phys_addr_t that could come from anywhere.
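For reference, the prototype:

dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
		size_t size, enum dma_data_direction dir, unsigned long attrs);

No struct page in sight.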
On Tue, Sep 21, 2021 at 05:38:05PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 21, 2021 at 12:34:10PM +0300, Mike Rapoport wrote:
> > > It doesn't create any new mappings. The only real issue is that it
> > > does the wrong thing for RAM in a way that might not be noticed on
> > > simple (x86/PC) platforms.
> >
> > But if the mapping request was rejected by devm_ioremap_resource() because
> > of an attempt to map RAM, why would we get to dma_map_resource() at all?
>
> dma_map_resource takes a phys_addr_t that could come from anywhere.
Right, but it's no different from, say, dma_map_page_attrs(), which can get
a struct page from anywhere, with no actual memory behind that struct page
at all. Do you suggest adding a check that the struct page is backed by
memory in dma_map_page_attrs() as well?
--
Sincerely yours,
Mike.