2023-01-23 11:28:28

by Alexandre Ghiti

Subject: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

During the early page table creation, we used to set the mapping for
PAGE_OFFSET to the kernel load address: but the kernel load address is
always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD
pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
PAGE_OFFSET is).

But actually we don't have to establish this mapping (i.e. set
va_pa_offset) that early in the boot process because:

- first, setup_vm installs a temporary kernel mapping and, among other
things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes
advantage of the discovered system memory to create the linear
mapping.

During the first phase, we don't know the start of the system memory, so
until the second phase is finished, we can't use the linear mapping at
all and phys_to_virt/virt_to_phys translations must not be used, because
they would produce a different translation from the 'real' one once the
final mapping is installed.

So we simply delay the initialization of va_pa_offset until after the
system memory discovery. But to make sure no one uses the linear mapping
before then, we add guards under the DEBUG_VIRTUAL config option.

Finally, we can use PUD/P4D/PGD hugepages when possible, which results
in better TLB utilization.

Note that we rely on the firmware to protect itself using PMP.

Acked-by: Rob Herring <[email protected]> # DT bits
Signed-off-by: Alexandre Ghiti <[email protected]>
---

v4:
- Rebase on top of v6.2-rc3, as noted by Conor
- Add Acked-by Rob

v3:
- Change the comment about initrd_start VA conversion so that it fits
ARM64 and RISCV64 (and others in the future if needed), as suggested
by Rob

v2:
- Add a comment on why RISCV64 does not need to set initrd_start/end that
early in the boot process, as asked by Rob

arch/riscv/include/asm/page.h | 16 ++++++++++++++++
arch/riscv/mm/init.c | 25 +++++++++++++++++++------
arch/riscv/mm/physaddr.c | 16 ++++++++++++++++
drivers/of/fdt.c | 11 ++++++-----
4 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 9f432c1b5289..7fe84c89e572 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
#define PTE_FMT "%08lx"
#endif

+#ifdef CONFIG_64BIT
+/*
+ * We override this value as its generic definition uses __pa too early in
+ * the boot process (before kernel_map.va_pa_offset is set).
+ */
+#define MIN_MEMBLOCK_ADDR 0
+#endif
+
#ifdef CONFIG_MMU
extern unsigned long riscv_pfn_base;
#define ARCH_PFN_OFFSET (riscv_pfn_base)
@@ -122,7 +130,11 @@ extern phys_addr_t phys_ram_base;
#define is_linear_mapping(x) \
((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))

+#ifndef CONFIG_DEBUG_VIRTUAL
#define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
+#else
+void *linear_mapping_pa_to_va(unsigned long x);
+#endif
#define kernel_mapping_pa_to_va(y) ({ \
unsigned long _y = (unsigned long)(y); \
(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ? \
@@ -131,7 +143,11 @@ extern phys_addr_t phys_ram_base;
})
#define __pa_to_va_nodebug(x) linear_mapping_pa_to_va(x)

+#ifndef CONFIG_DEBUG_VIRTUAL
#define linear_mapping_va_to_pa(x) ((unsigned long)(x) - kernel_map.va_pa_offset)
+#else
+phys_addr_t linear_mapping_va_to_pa(unsigned long x);
+#endif
#define kernel_mapping_va_to_pa(y) ({ \
unsigned long _y = (unsigned long)(y); \
(IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 478d6763a01a..cc892ba9f787 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
phys_ram_end = memblock_end_of_DRAM();
if (!IS_ENABLED(CONFIG_XIP_KERNEL))
phys_ram_base = memblock_start_of_DRAM();
+
+ /*
+ * Any use of __va/__pa before this point is wrong as we did not know the
+ * start of DRAM before.
+ */
+ kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
+ riscv_pfn_base = PFN_DOWN(phys_ram_base);
+
/*
* memblock allocator is not aware of the fact that last 4K bytes of
* the addressable memory can not be mapped because of IS_ERR_VALUE
@@ -671,9 +679,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,

static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
{
- /* Upgrade to PMD_SIZE mappings whenever possible */
- base &= PMD_SIZE - 1;
- if (!base && size >= PMD_SIZE)
+ if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
+ return PGDIR_SIZE;
+
+ if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
+ return P4D_SIZE;
+
+ if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
+ return PUD_SIZE;
+
+ if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
return PMD_SIZE;

return PAGE_SIZE;
@@ -982,11 +997,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
set_satp_mode();
#endif

- kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
+ kernel_map.va_pa_offset = 0UL;
kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;

- riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
-
/*
* The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
* kernel, whereas for 64-bit kernel, the end of the virtual address
diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
index 9b18bda74154..18706f457da7 100644
--- a/arch/riscv/mm/physaddr.c
+++ b/arch/riscv/mm/physaddr.c
@@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
return __va_to_pa_nodebug(x);
}
EXPORT_SYMBOL(__phys_addr_symbol);
+
+phys_addr_t linear_mapping_va_to_pa(unsigned long x)
+{
+ BUG_ON(!kernel_map.va_pa_offset);
+
+ return ((unsigned long)(x) - kernel_map.va_pa_offset);
+}
+EXPORT_SYMBOL(linear_mapping_va_to_pa);
+
+void *linear_mapping_pa_to_va(unsigned long x)
+{
+ BUG_ON(!kernel_map.va_pa_offset);
+
+ return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
+}
+EXPORT_SYMBOL(linear_mapping_pa_to_va);
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index f08b25195ae7..58107bd56f8f 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
static void __early_init_dt_declare_initrd(unsigned long start,
unsigned long end)
{
- /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
- * enabled since __va() is called too early. ARM64 does make use
- * of phys_initrd_start/phys_initrd_size so we can skip this
- * conversion.
+ /*
+ * __va() is not yet available this early on some platforms. In that
+ * case, the platform uses phys_initrd_start/phys_initrd_size instead
+ * and does the VA conversion itself.
*/
- if (!IS_ENABLED(CONFIG_ARM64)) {
+ if (!IS_ENABLED(CONFIG_ARM64) &&
+ !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
initrd_start = (unsigned long)__va(start);
initrd_end = (unsigned long)__va(end);
initrd_below_start_ok = 1;
--
2.37.2
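[Editor's note: for readers less familiar with the alignment checks in the
best_map_size() hunk above, here is a standalone sketch of the same
size-selection logic. The size constants are made-up sv57-style example
values; the real kernel derives PMD_SIZE/PUD_SIZE/P4D_SIZE/PGDIR_SIZE from
the configured paging mode.]

```c
#include <assert.h>
#include <stdint.h>

/* Example sv57-style sizes, for illustration only. */
#define PAGE_SIZE_  0x1000UL            /* 4 KiB   */
#define PMD_SIZE_   0x200000UL          /* 2 MiB   */
#define PUD_SIZE_   0x40000000UL        /* 1 GiB   */
#define P4D_SIZE_   0x8000000000UL      /* 512 GiB */
#define PGDIR_SIZE_ 0x1000000000000UL   /* 256 TiB */

/*
 * Mirror of the patch's best_map_size(): pick the largest mapping size
 * for which the base address is aligned AND the remaining region is at
 * least that large, falling back to a 4K page otherwise.
 */
static uint64_t best_map_size(uint64_t base, uint64_t size)
{
	if (!(base & (PGDIR_SIZE_ - 1)) && size >= PGDIR_SIZE_)
		return PGDIR_SIZE_;
	if (!(base & (P4D_SIZE_ - 1)) && size >= P4D_SIZE_)
		return P4D_SIZE_;
	if (!(base & (PUD_SIZE_ - 1)) && size >= PUD_SIZE_)
		return PUD_SIZE_;
	if (!(base & (PMD_SIZE_ - 1)) && size >= PMD_SIZE_)
		return PMD_SIZE_;
	return PAGE_SIZE_;
}
```

With PAGE_OFFSET now mapped to the PUD/P4D/PGD-aligned DRAM base rather
than the PMD-aligned kernel load address, the first branches can actually
be taken for large memory regions.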



2023-01-23 14:26:05

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> During the early page table creation, we used to set the mapping for
> PAGE_OFFSET to the kernel load address: but the kernel load address is
> always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> PAGE_OFFSET is).
>
> But actually we don't have to establish this mapping (ie set va_pa_offset)
> that early in the boot process because:
>
> - first, setup_vm installs a temporary kernel mapping and among other
> things, discovers the system memory,
> - then, setup_vm_final creates the final kernel mapping and takes
> advantage of the discovered system memory to create the linear
> mapping.
>
> During the first phase, we don't know the start of the system memory and
> then until the second phase is finished, we can't use the linear mapping at
> all and phys_to_virt/virt_to_phys translations must not be used because it
> would result in a different translation from the 'real' one once the final
> mapping is installed.
>
> So here we simply delay the initialization of va_pa_offset to after the
> system memory discovery. But to make sure noone uses the linear mapping
> before, we add some guard in the DEBUG_VIRTUAL config.
>
> Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> in a better TLB utilization.
>
> Note that we rely on the firmware to protect itself using PMP.
>
> Acked-by: Rob Herring <[email protected]> # DT bits
> Signed-off-by: Alexandre Ghiti <[email protected]>
> ---
>
> v4:
> - Rebase on top of v6.2-rc3, as noted by Conor
> - Add Acked-by Rob
>
> v3:
> - Change the comment about initrd_start VA conversion so that it fits
> ARM64 and RISCV64 (and others in the future if needed), as suggested
> by Rob
>
> v2:
> - Add a comment on why RISCV64 does not need to set initrd_start/end that
> early in the boot process, as asked by Rob
>
> arch/riscv/include/asm/page.h | 16 ++++++++++++++++
> arch/riscv/mm/init.c | 25 +++++++++++++++++++------
> arch/riscv/mm/physaddr.c | 16 ++++++++++++++++
> drivers/of/fdt.c | 11 ++++++-----
> 4 files changed, 57 insertions(+), 11 deletions(-)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 9f432c1b5289..7fe84c89e572 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
> #define PTE_FMT "%08lx"
> #endif
>
> +#ifdef CONFIG_64BIT
> +/*
> + * We override this value as its generic definition uses __pa too early in
> + * the boot process (before kernel_map.va_pa_offset is set).
> + */
> +#define MIN_MEMBLOCK_ADDR 0
> +#endif
> +
> #ifdef CONFIG_MMU
> extern unsigned long riscv_pfn_base;
> #define ARCH_PFN_OFFSET (riscv_pfn_base)
> @@ -122,7 +130,11 @@ extern phys_addr_t phys_ram_base;
> #define is_linear_mapping(x) \
> ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
>
> +#ifndef CONFIG_DEBUG_VIRTUAL
> #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> +#else
> +void *linear_mapping_pa_to_va(unsigned long x);
> +#endif
> #define kernel_mapping_pa_to_va(y) ({ \
> unsigned long _y = (unsigned long)(y); \
> (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ? \
> @@ -131,7 +143,11 @@ extern phys_addr_t phys_ram_base;
> })
> #define __pa_to_va_nodebug(x) linear_mapping_pa_to_va(x)
>
> +#ifndef CONFIG_DEBUG_VIRTUAL
> #define linear_mapping_va_to_pa(x) ((unsigned long)(x) - kernel_map.va_pa_offset)
> +#else
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> +#endif
> #define kernel_mapping_va_to_pa(y) ({ \
> unsigned long _y = (unsigned long)(y); \
> (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 478d6763a01a..cc892ba9f787 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
> phys_ram_end = memblock_end_of_DRAM();
> if (!IS_ENABLED(CONFIG_XIP_KERNEL))
> phys_ram_base = memblock_start_of_DRAM();
> +
> + /*
> + * Any use of __va/__pa before this point is wrong as we did not know the
> + * start of DRAM before.
> + */
> + kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> + riscv_pfn_base = PFN_DOWN(phys_ram_base);
> +
> /*
> * memblock allocator is not aware of the fact that last 4K bytes of
> * the addressable memory can not be mapped because of IS_ERR_VALUE
> @@ -671,9 +679,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
>
> static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
> {
> - /* Upgrade to PMD_SIZE mappings whenever possible */
> - base &= PMD_SIZE - 1;
> - if (!base && size >= PMD_SIZE)
> + if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> + return PGDIR_SIZE;
> +
> + if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> + return P4D_SIZE;
> +
> + if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> + return PUD_SIZE;
> +
> + if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
> return PMD_SIZE;
>
> return PAGE_SIZE;
> @@ -982,11 +997,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> set_satp_mode();
> #endif
>
> - kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> + kernel_map.va_pa_offset = 0UL;
> kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
>
> - riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
> -
> /*
> * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
> * kernel, whereas for 64-bit kernel, the end of the virtual address
> diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> index 9b18bda74154..18706f457da7 100644
> --- a/arch/riscv/mm/physaddr.c
> +++ b/arch/riscv/mm/physaddr.c
> @@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
> return __va_to_pa_nodebug(x);
> }
> EXPORT_SYMBOL(__phys_addr_symbol);
> +
> +phys_addr_t linear_mapping_va_to_pa(unsigned long x)
> +{
> + BUG_ON(!kernel_map.va_pa_offset);
> +
> + return ((unsigned long)(x) - kernel_map.va_pa_offset);
> +}
> +EXPORT_SYMBOL(linear_mapping_va_to_pa);
> +
> +void *linear_mapping_pa_to_va(unsigned long x)
> +{
> + BUG_ON(!kernel_map.va_pa_offset);
> +
> + return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
> +}
> +EXPORT_SYMBOL(linear_mapping_pa_to_va);
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index f08b25195ae7..58107bd56f8f 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> static void __early_init_dt_declare_initrd(unsigned long start,
> unsigned long end)
> {
> - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> - * enabled since __va() is called too early. ARM64 does make use
> - * of phys_initrd_start/phys_initrd_size so we can skip this
> - * conversion.
> + /*
> + * __va() is not yet available this early on some platforms. In that
> + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> + * and does the VA conversion itself.
> */
> - if (!IS_ENABLED(CONFIG_ARM64)) {
> + if (!IS_ENABLED(CONFIG_ARM64) &&
> + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {

There are now two architectures, so maybe it's time for a new config
symbol which would be selected by arm64 and riscv64 and then used here,
e.g.

if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {

> initrd_start = (unsigned long)__va(start);
> initrd_end = (unsigned long)__va(end);
> initrd_below_start_ok = 1;
> --
> 2.37.2
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew
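[Editor's note: the delayed va_pa_offset initialization and the
DEBUG_VIRTUAL guard discussed above can be modelled outside the kernel.
The sketch below uses example addresses only and an error return in place
of the kernel's BUG_ON(); it shows why a linear-mapping translation
attempted before setup_bootmem() has run must be rejected.]

```c
#include <assert.h>
#include <stdint.h>

/* Example value; the real PAGE_OFFSET depends on the paging mode. */
#define PAGE_OFFSET_ 0xffffffd800000000UL

/* Stays 0 until the DRAM base has been discovered, as in the patch. */
static uint64_t va_pa_offset;

/* Models the assignment the patch moves into setup_bootmem(). */
static void setup_bootmem_model(uint64_t phys_ram_base)
{
	va_pa_offset = PAGE_OFFSET_ - phys_ram_base;
}

/*
 * Models the DEBUG_VIRTUAL out-of-line helper: a translation before
 * setup_bootmem() would be computed against the wrong offset, so it
 * is refused (the kernel BUG()s instead of returning an error).
 */
static int linear_mapping_va_to_pa(uint64_t va, uint64_t *pa)
{
	if (!va_pa_offset)
		return -1;
	*pa = va - va_pa_offset;
	return 0;
}
```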

2023-01-23 22:11:51

by Conor Dooley

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> During the early page table creation, we used to set the mapping for
> PAGE_OFFSET to the kernel load address: but the kernel load address is
> always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> PAGE_OFFSET is).
>
> But actually we don't have to establish this mapping (ie set va_pa_offset)
> that early in the boot process because:
>
> - first, setup_vm installs a temporary kernel mapping and among other
> things, discovers the system memory,
> - then, setup_vm_final creates the final kernel mapping and takes
> advantage of the discovered system memory to create the linear
> mapping.
>
> During the first phase, we don't know the start of the system memory and
> then until the second phase is finished, we can't use the linear mapping at
> all and phys_to_virt/virt_to_phys translations must not be used because it
> would result in a different translation from the 'real' one once the final
> mapping is installed.
>
> So here we simply delay the initialization of va_pa_offset to after the
> system memory discovery. But to make sure noone uses the linear mapping
> before, we add some guard in the DEBUG_VIRTUAL config.
>
> Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> in a better TLB utilization.
>
> Note that we rely on the firmware to protect itself using PMP.
>
> Acked-by: Rob Herring <[email protected]> # DT bits
> Signed-off-by: Alexandre Ghiti <[email protected]>

No good on !MMU unfortunately Alex:
../arch/riscv/mm/init.c:222:2: error: use of undeclared identifier 'riscv_pfn_base'
riscv_pfn_base = PFN_DOWN(phys_ram_base);
^

Reproduces with nommu_virt_defconfig.

Thanks,
Conor.



2023-01-24 00:19:39

by kernel test robot

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

Hi Alexandre,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on robh/for-next]
[also build test ERROR on linus/master v6.2-rc5 next-20230123]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Alexandre-Ghiti/riscv-Use-PUD-P4D-PGD-pages-for-the-linear-mapping/20230123-192952
base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
patch link: https://lore.kernel.org/r/20230123112803.817534-1-alexghiti%40rivosinc.com
patch subject: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping
config: riscv-nommu_virt_defconfig (https://download.01.org/0day-ci/archive/20230124/[email protected]/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/b0cdf20b21efcc85ec67bffa6a10894dedd0d12e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Alexandre-Ghiti/riscv-Use-PUD-P4D-PGD-pages-for-the-linear-mapping/20230123-192952
git checkout b0cdf20b21efcc85ec67bffa6a10894dedd0d12e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash arch/riscv/mm/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

arch/riscv/mm/init.c: In function 'setup_bootmem':
>> arch/riscv/mm/init.c:222:9: error: 'riscv_pfn_base' undeclared (first use in this function)
222 | riscv_pfn_base = PFN_DOWN(phys_ram_base);
| ^~~~~~~~~~~~~~
arch/riscv/mm/init.c:222:9: note: each undeclared identifier is reported only once for each function it appears in


vim +/riscv_pfn_base +222 arch/riscv/mm/init.c

187
188 static void __init setup_bootmem(void)
189 {
190 phys_addr_t vmlinux_end = __pa_symbol(&_end);
191 phys_addr_t max_mapped_addr;
192 phys_addr_t phys_ram_end, vmlinux_start;
193
194 if (IS_ENABLED(CONFIG_XIP_KERNEL))
195 vmlinux_start = __pa_symbol(&_sdata);
196 else
197 vmlinux_start = __pa_symbol(&_start);
198
199 memblock_enforce_memory_limit(memory_limit);
200
201 /*
202 * Make sure we align the reservation on PMD_SIZE since we will
203 * map the kernel in the linear mapping as read-only: we do not want
204 * any allocation to happen between _end and the next pmd aligned page.
205 */
206 if (IS_ENABLED(CONFIG_64BIT) && IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
207 vmlinux_end = (vmlinux_end + PMD_SIZE - 1) & PMD_MASK;
208 /*
209 * Reserve from the start of the kernel to the end of the kernel
210 */
211 memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start);
212
213 phys_ram_end = memblock_end_of_DRAM();
214 if (!IS_ENABLED(CONFIG_XIP_KERNEL))
215 phys_ram_base = memblock_start_of_DRAM();
216
217 /*
218 * Any use of __va/__pa before this point is wrong as we did not know the
219 * start of DRAM before.
220 */
221 kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> 222 riscv_pfn_base = PFN_DOWN(phys_ram_base);
223
224 /*
225 * memblock allocator is not aware of the fact that last 4K bytes of
226 * the addressable memory can not be mapped because of IS_ERR_VALUE
227 * macro. Make sure that last 4k bytes are not usable by memblock
228 * if end of dram is equal to maximum addressable memory. For 64-bit
229 * kernel, this problem can't happen here as the end of the virtual
230 * address space is occupied by the kernel mapping then this check must
231 * be done as soon as the kernel mapping base address is determined.
232 */
233 if (!IS_ENABLED(CONFIG_64BIT)) {
234 max_mapped_addr = __pa(~(ulong)0);
235 if (max_mapped_addr == (phys_ram_end - 1))
236 memblock_set_current_limit(max_mapped_addr - 4096);
237 }
238
239 min_low_pfn = PFN_UP(phys_ram_base);
240 max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
241 high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));
242
243 dma32_phys_limit = min(4UL * SZ_1G, (unsigned long)PFN_PHYS(max_low_pfn));
244 set_max_mapnr(max_low_pfn - ARCH_PFN_OFFSET);
245
246 reserve_initrd_mem();
247 /*
248 * If DTB is built in, no need to reserve its memblock.
249 * Otherwise, do reserve it but avoid using
250 * early_init_fdt_reserve_self() since __pa() does
251 * not work for DTB pointers that are fixmap addresses
252 */
253 if (!IS_ENABLED(CONFIG_BUILTIN_DTB)) {
254 /*
255 * In case the DTB is not located in a memory region we won't
256 * be able to locate it later on via the linear mapping and
257 * get a segfault when accessing it via __va(dtb_early_pa).
258 * To avoid this situation copy DTB to a memory region.
259 * Note that memblock_phys_alloc will also reserve DTB region.
260 */
261 if (!memblock_is_memory(dtb_early_pa)) {
262 size_t fdt_size = fdt_totalsize(dtb_early_va);
263 phys_addr_t new_dtb_early_pa = memblock_phys_alloc(fdt_size, PAGE_SIZE);
264 void *new_dtb_early_va = early_memremap(new_dtb_early_pa, fdt_size);
265
266 memcpy(new_dtb_early_va, dtb_early_va, fdt_size);
267 early_memunmap(new_dtb_early_va, fdt_size);
268 _dtb_early_pa = new_dtb_early_pa;
269 } else
270 memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
271 }
272
273 dma_contiguous_reserve(dma32_phys_limit);
274 if (IS_ENABLED(CONFIG_64BIT))
275 hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
276 memblock_allow_resize();
277 }
278

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
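[Editor's note: the undeclared-identifier error above arises because the
extern declaration of riscv_pfn_base in page.h sits under CONFIG_MMU
(see the first hunk of the patch), while the assignment the patch adds to
setup_bootmem() is compiled unconditionally. For reference, the value
being assigned is simply the page frame number of the DRAM base; a sketch
with example values (PAGE_SHIFT is 12 on RISC-V):]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ 12
#define PFN_DOWN_(x) ((uint64_t)(x) >> PAGE_SHIFT_)

/*
 * riscv_pfn_base becomes the PFN of the first byte of DRAM, so that
 * ARCH_PFN_OFFSET relates struct page indices to physical frames.
 */
static uint64_t riscv_pfn_base_for(uint64_t phys_ram_base)
{
	return PFN_DOWN_(phys_ram_base);
}
```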

2023-01-24 09:05:01

by Alexandre Ghiti

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

Hi Conor,

On Mon, Jan 23, 2023 at 11:11 PM Conor Dooley <[email protected]> wrote:
>
> On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > During the early page table creation, we used to set the mapping for
> > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > PAGE_OFFSET is).
> >
> > But actually we don't have to establish this mapping (ie set va_pa_offset)
> > that early in the boot process because:
> >
> > - first, setup_vm installs a temporary kernel mapping and among other
> > things, discovers the system memory,
> > - then, setup_vm_final creates the final kernel mapping and takes
> > advantage of the discovered system memory to create the linear
> > mapping.
> >
> > During the first phase, we don't know the start of the system memory and
> > then until the second phase is finished, we can't use the linear mapping at
> > all and phys_to_virt/virt_to_phys translations must not be used because it
> > would result in a different translation from the 'real' one once the final
> > mapping is installed.
> >
> > So here we simply delay the initialization of va_pa_offset to after the
> > system memory discovery. But to make sure noone uses the linear mapping
> > before, we add some guard in the DEBUG_VIRTUAL config.
> >
> > Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> > in a better TLB utilization.
> >
> > Note that we rely on the firmware to protect itself using PMP.
> >
> > Acked-by: Rob Herring <[email protected]> # DT bits
> > Signed-off-by: Alexandre Ghiti <[email protected]>
>
> No good on !MMU unfortunately Alex:
> ../arch/riscv/mm/init.c:222:2: error: use of undeclared identifier 'riscv_pfn_base'
> riscv_pfn_base = PFN_DOWN(phys_ram_base);
> ^
>
> Reproduces with nommu_virt_defconfig.

Thanks, fixed locally, I'll push the v5 soon.

Thanks again,

Alex

>
> Thanks,
> Conor.

2023-01-25 10:41:12

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > During the early page table creation, we used to set the mapping for
> > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > always offseted by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > PAGE_OFFSET is).
> >
> > But actually we don't have to establish this mapping (ie set va_pa_offset)
> > that early in the boot process because:
> >
> > - first, setup_vm installs a temporary kernel mapping and among other
> > things, discovers the system memory,
> > - then, setup_vm_final creates the final kernel mapping and takes
> > advantage of the discovered system memory to create the linear
> > mapping.
> >
> > During the first phase, we don't know the start of the system memory and
> > then until the second phase is finished, we can't use the linear mapping at
> > all and phys_to_virt/virt_to_phys translations must not be used because it
> > would result in a different translation from the 'real' one once the final
> > mapping is installed.
> >
> > So here we simply delay the initialization of va_pa_offset to after the
> > system memory discovery. But to make sure noone uses the linear mapping
> > before, we add some guard in the DEBUG_VIRTUAL config.
> >
> > Finally we can use PUD/P4D/PGD hugepages when possible, which will result
> > in a better TLB utilization.
> >
> > Note that we rely on the firmware to protect itself using PMP.
> >
> > Acked-by: Rob Herring <[email protected]> # DT bits
> > Signed-off-by: Alexandre Ghiti <[email protected]>
> > ---
> >
> > v4:
> > - Rebase on top of v6.2-rc3, as noted by Conor
> > - Add Acked-by Rob
> >
> > v3:
> > - Change the comment about initrd_start VA conversion so that it fits
> > ARM64 and RISCV64 (and others in the future if needed), as suggested
> > by Rob
> >
> > v2:
> > - Add a comment on why RISCV64 does not need to set initrd_start/end that
> > early in the boot process, as asked by Rob
> >
> > arch/riscv/include/asm/page.h | 16 ++++++++++++++++
> > arch/riscv/mm/init.c | 25 +++++++++++++++++++------
> > arch/riscv/mm/physaddr.c | 16 ++++++++++++++++
> > drivers/of/fdt.c | 11 ++++++-----
> > 4 files changed, 57 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 9f432c1b5289..7fe84c89e572 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -90,6 +90,14 @@ typedef struct page *pgtable_t;
> > #define PTE_FMT "%08lx"
> > #endif
> >
> > +#ifdef CONFIG_64BIT
> > +/*
> > + * We override this value as its generic definition uses __pa too early in
> > + * the boot process (before kernel_map.va_pa_offset is set).
> > + */
> > +#define MIN_MEMBLOCK_ADDR 0
> > +#endif
> > +
> > #ifdef CONFIG_MMU
> > extern unsigned long riscv_pfn_base;
> > #define ARCH_PFN_OFFSET (riscv_pfn_base)
> > @@ -122,7 +130,11 @@ extern phys_addr_t phys_ram_base;
> > #define is_linear_mapping(x) \
> > ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
> >
> > +#ifndef CONFIG_DEBUG_VIRTUAL
> > #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
> > +#else
> > +void *linear_mapping_pa_to_va(unsigned long x);
> > +#endif
> > #define kernel_mapping_pa_to_va(y) ({ \
> > unsigned long _y = (unsigned long)(y); \
> > (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < phys_ram_base) ? \
> > @@ -131,7 +143,11 @@ extern phys_addr_t phys_ram_base;
> > })
> > #define __pa_to_va_nodebug(x) linear_mapping_pa_to_va(x)
> >
> > +#ifndef CONFIG_DEBUG_VIRTUAL
> > #define linear_mapping_va_to_pa(x) ((unsigned long)(x) - kernel_map.va_pa_offset)
> > +#else
> > +phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> > +#endif
> > #define kernel_mapping_va_to_pa(y) ({ \
> > unsigned long _y = (unsigned long)(y); \
> > (IS_ENABLED(CONFIG_XIP_KERNEL) && _y < kernel_map.virt_addr + XIP_OFFSET) ? \
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 478d6763a01a..cc892ba9f787 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -213,6 +213,14 @@ static void __init setup_bootmem(void)
> > phys_ram_end = memblock_end_of_DRAM();
> > if (!IS_ENABLED(CONFIG_XIP_KERNEL))
> > phys_ram_base = memblock_start_of_DRAM();
> > +
> > + /*
> > + * Any use of __va/__pa before this point is wrong as we did not know the
> > + * start of DRAM before.
> > + */
> > + kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
> > + riscv_pfn_base = PFN_DOWN(phys_ram_base);
> > +
> > /*
> > * memblock allocator is not aware of the fact that last 4K bytes of
> > * the addressable memory can not be mapped because of IS_ERR_VALUE
> > @@ -671,9 +679,16 @@ void __init create_pgd_mapping(pgd_t *pgdp,
> >
> > static uintptr_t __init best_map_size(phys_addr_t base, phys_addr_t size)
> > {
> > - /* Upgrade to PMD_SIZE mappings whenever possible */
> > - base &= PMD_SIZE - 1;
> > - if (!base && size >= PMD_SIZE)
> > + if (!(base & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
> > + return PGDIR_SIZE;
> > +
> > + if (!(base & (P4D_SIZE - 1)) && size >= P4D_SIZE)
> > + return P4D_SIZE;
> > +
> > + if (!(base & (PUD_SIZE - 1)) && size >= PUD_SIZE)
> > + return PUD_SIZE;
> > +
> > + if (!(base & (PMD_SIZE - 1)) && size >= PMD_SIZE)
> > return PMD_SIZE;
> >
> > return PAGE_SIZE;
> > @@ -982,11 +997,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > set_satp_mode();
> > #endif
> >
> > - kernel_map.va_pa_offset = PAGE_OFFSET - kernel_map.phys_addr;
> > + kernel_map.va_pa_offset = 0UL;
> > kernel_map.va_kernel_pa_offset = kernel_map.virt_addr - kernel_map.phys_addr;
> >
> > - riscv_pfn_base = PFN_DOWN(kernel_map.phys_addr);
> > -
> > /*
> > * The default maximal physical memory size is KERN_VIRT_SIZE for 32-bit
> > * kernel, whereas for 64-bit kernel, the end of the virtual address
> > diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
> > index 9b18bda74154..18706f457da7 100644
> > --- a/arch/riscv/mm/physaddr.c
> > +++ b/arch/riscv/mm/physaddr.c
> > @@ -33,3 +33,19 @@ phys_addr_t __phys_addr_symbol(unsigned long x)
> > return __va_to_pa_nodebug(x);
> > }
> > EXPORT_SYMBOL(__phys_addr_symbol);
> > +
> > +phys_addr_t linear_mapping_va_to_pa(unsigned long x)
> > +{
> > + BUG_ON(!kernel_map.va_pa_offset);
> > +
> > + return ((unsigned long)(x) - kernel_map.va_pa_offset);
> > +}
> > +EXPORT_SYMBOL(linear_mapping_va_to_pa);
> > +
> > +void *linear_mapping_pa_to_va(unsigned long x)
> > +{
> > + BUG_ON(!kernel_map.va_pa_offset);
> > +
> > + return ((void *)((unsigned long)(x) + kernel_map.va_pa_offset));
> > +}
> > +EXPORT_SYMBOL(linear_mapping_pa_to_va);
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index f08b25195ae7..58107bd56f8f 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > static void __early_init_dt_declare_initrd(unsigned long start,
> > unsigned long end)
> > {
> > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > - * enabled since __va() is called too early. ARM64 does make use
> > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > - * conversion.
> > + /*
> > + * __va() is not yet available this early on some platforms. In that
> > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > + * and does the VA conversion itself.
> > */
> > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
>
> There are now two architectures, so maybe it's time for a new config
> symbol which would be selected by arm64 and riscv64 and then used here,
> e.g.
>
> if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {

I see v5 left this as it was. Any comment on this suggestion?

Thanks,
drew

>
> > initrd_start = (unsigned long)__va(start);
> > initrd_end = (unsigned long)__va(end);
> > initrd_below_start_ok = 1;
> > --
> > 2.37.2
> >
>
> Otherwise,
>
> Reviewed-by: Andrew Jones <[email protected]>
>
> Thanks,
> drew

2023-01-25 12:13:09

by Alexandre Ghiti

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
>
> On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > ...
> > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > index f08b25195ae7..58107bd56f8f 100644
> > > --- a/drivers/of/fdt.c
> > > +++ b/drivers/of/fdt.c
> > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > unsigned long end)
> > > {
> > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > - * enabled since __va() is called too early. ARM64 does make use
> > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > - * conversion.
> > > + /*
> > > + * __va() is not yet available this early on some platforms. In that
> > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > + * and does the VA conversion itself.
> > > */
> > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> >
> > There are now two architectures, so maybe it's time for a new config
> > symbol which would be selected by arm64 and riscv64 and then used here,
> > e.g.
> >
> > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
>
> I see v5 left this as it was. Any comment on this suggestion?

Introducing a config symbol for this single use case sounds excessive to me,
but I'll let Rob decide what he wants to see here.

>
> Thanks,
> drew
>
> >
> > > initrd_start = (unsigned long)__va(start);
> > > initrd_end = (unsigned long)__va(end);
> > > initrd_below_start_ok = 1;
> > > --
> > > 2.37.2
> > >
> >
> > Otherwise,
> >
> > Reviewed-by: Andrew Jones <[email protected]>
> >
> > Thanks,
> > drew

2023-01-25 15:10:51

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Wed, Jan 25, 2023 at 01:12:49PM +0100, Alexandre Ghiti wrote:
> On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
> >
> > On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > > ...
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index f08b25195ae7..58107bd56f8f 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > unsigned long end)
> > > > {
> > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > - * conversion.
> > > > + /*
> > > > + * __va() is not yet available this early on some platforms. In that
> > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > + * and does the VA conversion itself.
> > > > */
> > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > >
> > > There are now two architectures, so maybe it's time for a new config
> > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > e.g.
> > >
> > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> >
> > I see v5 left this as it was. Any comment on this suggestion?
>
> > Introducing a config symbol for this single use case sounds excessive to me,
> > but I'll let Rob decide what he wants to see here.

To me, the suggestion is less about trying to tidy up DT code and more
about bringing this comment about arm64 and riscv64 not being able to
use the linear map as early as other architectures up out of the
depths of DT code. Seeing an architecture select something like
NO_EARLY_LINEAR_MAP, which has a paragraph explaining what that
means, may help avoid other early uses of __va() which may or may
not fail quickly and cleanly with a BUG.

Thanks,
drew
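[Editor's note: as a concrete illustration of what drew is proposing — the symbol name and help text below are hypothetical, taken from his example and never merged in this form — the arch-selected Kconfig entry might look like:]

```
# Hypothetical sketch of drew's suggestion; not a symbol in the kernel
# tree. arm64 and riscv64 would "select NO_EARLY_LINEAR_MAP", and
# drivers/of/fdt.c would test IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP).
config NO_EARLY_LINEAR_MAP
	bool
	help
	  Selected by architectures whose linear mapping cannot be used at
	  the time the flattened device tree is scanned, because the start
	  of system memory (and hence the VA/PA offset) is not yet known.
	  Such architectures rely on phys_initrd_start/phys_initrd_size
	  and perform the __va() conversion themselves later in boot.
```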

2023-01-27 08:45:39

by Alexandre Ghiti

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

Hey Andrew,

Sorry about the delay.

On Wed, Jan 25, 2023 at 4:10 PM Andrew Jones <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 01:12:49PM +0100, Alexandre Ghiti wrote:
> > On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > > > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > > > ...
> > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > index f08b25195ae7..58107bd56f8f 100644
> > > > > --- a/drivers/of/fdt.c
> > > > > +++ b/drivers/of/fdt.c
> > > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > > unsigned long end)
> > > > > {
> > > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > > - * conversion.
> > > > > + /*
> > > > > + * __va() is not yet available this early on some platforms. In that
> > > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > > + * and does the VA conversion itself.
> > > > > */
> > > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > > >
> > > > There are now two architectures, so maybe it's time for a new config
> > > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > > e.g.
> > > >
> > > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> > >
> > > I see v5 left this as it was. Any comment on this suggestion?
> >
> > Introducing a config symbol for this single use case sounds excessive to me,
> > but I'll let Rob decide what he wants to see here.
>
> To me, the suggestion is less about trying to tidy up DT code and more
> about bringing this comment about arm64 and riscv64 not being able to
> use the linear map as early as other architectures up out of the
> depths of DT code. Seeing an architecture select something like
> NO_EARLY_LINEAR_MAP, which has a paragraph explaining what that
> means, may help avoid other early uses of __va() which may or may
> not fail quickly and cleanly with a BUG.
>

You're right, do you have some bandwidth for doing that?

Thanks,

Alex

> Thanks,
> drew

2023-01-27 08:58:10

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Fri, Jan 27, 2023 at 09:45:21AM +0100, Alexandre Ghiti wrote:
...
> > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > index f08b25195ae7..58107bd56f8f 100644
> > > > > > --- a/drivers/of/fdt.c
> > > > > > +++ b/drivers/of/fdt.c
> > > > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > > > unsigned long end)
> > > > > > {
> > > > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > > > - * conversion.
> > > > > > + /*
> > > > > > + * __va() is not yet available this early on some platforms. In that
> > > > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > > > + * and does the VA conversion itself.
> > > > > > */
> > > > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > > > >
> > > > > There are now two architectures, so maybe it's time for a new config
> > > > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > > > e.g.
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> > > >
> > > > I see v5 left this as it was. Any comment on this suggestion?
> > >
> > > Introducing a config symbol for this single use case sounds excessive to me,
> > > but I'll let Rob decide what he wants to see here.
> >
> > To me, the suggestion is less about trying to tidy up DT code and more
> > about bringing this comment about arm64 and riscv64 not being able to
> > use the linear map as early as other architectures up out of the
> > depths of DT code. Seeing an architecture select something like
> > NO_EARLY_LINEAR_MAP, which has a paragraph explaining what that
> > means, may help avoid other early uses of __va() which may or may
> > not fail quickly and cleanly with a BUG.
> >
>
> You're right, do you have some bandwidth for doing that?
>

Sure, I'll post something today.

Thanks,
drew

2023-01-28 10:13:29

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Fri, Jan 27, 2023 at 09:58:03AM +0100, Andrew Jones wrote:
> On Fri, Jan 27, 2023 at 09:45:21AM +0100, Alexandre Ghiti wrote:
> ...
> > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > index f08b25195ae7..58107bd56f8f 100644
> > > > > > > --- a/drivers/of/fdt.c
> > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > > > > unsigned long end)
> > > > > > > {
> > > > > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > > > > - * conversion.
> > > > > > > + /*
> > > > > > > + * __va() is not yet available this early on some platforms. In that
> > > > > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > > > > + * and does the VA conversion itself.
> > > > > > > */
> > > > > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > > > > >
> > > > > > There are now two architectures, so maybe it's time for a new config
> > > > > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > > > > e.g.
> > > > > >
> > > > > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> > > > >
> > > > > I see v5 left this as it was. Any comment on this suggestion?
> > > >
> > > > Introducing a config for only this use case sounds excessive to me,
> > > > but I'll let Rob decide what he wants to see here.
> > >
> > > To me, the suggestion is less about trying to tidy up DT code and more
> > > about bringing this comment about arm64 and riscv64 not being able to
> > > use the linear map as early as other architectures up out of the
> > > depths of DT code. Seeing an architecture select something like
> > > NO_EARLY_LINEAR_MAP, which has a paragraph explaining what that
> > > means, may help avoid other early uses of __va() which may or may
> > > not fail quickly and cleanly with a BUG.
> > >
> >
> > You're right, do you have some bandwidth for doing that?
> >
>
> Sure, I'll post something today.
>

Hi Alex,

So on second thought, and after a bunch of grepping, I don't think we need
to try and give architectures a way to declare that they have incomplete
early linear maps. It would actually be difficult to determine when and
why it should be selected, and, afaict, this is currently the one and
only place it would be helpful. So, rather than pulling this comment about
early __va() all the way up to the Kconfig level, I think pulling it into
arch-specific code should be sufficient, and actually better.

The situation we have here is that OF code is providing a convenience
by doing early setup of initrd_start and initrd_end, which is nice for the
large majority of architectures, but, as we see, it can't be used by all.
My new suggestion is that we expose this initrd setup function as a weak
function which architectures may override if needed. If those
architectures want to prepare a mapping in their function, they can, or,
if they know it's safe to defer the setup until later, then they can just
provide a stub. I've drafted a patch where arm64 just provides a stub.
I'll post it soon.

Thanks,
drew

2023-01-30 13:48:26

by Rob Herring

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Wed, Jan 25, 2023 at 6:13 AM Alexandre Ghiti <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
> >
> > On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > > During the early page table creation, we used to set the mapping for
> > > > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > > > always offset by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> > > > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > > > PAGE_OFFSET is).

[...]

> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index f08b25195ae7..58107bd56f8f 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > unsigned long end)
> > > > {
> > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > - * conversion.
> > > > + /*
> > > > + * __va() is not yet available this early on some platforms. In that
> > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > + * and does the VA conversion itself.
> > > > */
> > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > >
> > > There are now two architectures, so maybe it's time for a new config
> > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > e.g.
> > >
> > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> >
> > I see v5 left this as it was. Any comment on this suggestion?
>
> Introducing a config for only this use case sounds excessive to me,
> but I'll let Rob decide what he wants to see here.

Agreed. Can we just keep it as is here?

> > > > initrd_start = (unsigned long)__va(start);
> > > > initrd_end = (unsigned long)__va(end);

I think long term, we should just get rid of needing to do this part
in the DT code and let the initrd code do this.

Rob

2023-01-30 14:21:46

by Andrew Jones

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Mon, Jan 30, 2023 at 07:48:04AM -0600, Rob Herring wrote:
> On Wed, Jan 25, 2023 at 6:13 AM Alexandre Ghiti <[email protected]> wrote:
> >
> > On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > > > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > > > During the early page table creation, we used to set the mapping for
> > > > > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > > > > always offset by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> > > > > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > > > > PAGE_OFFSET is).
>
> [...]
>
> > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > index f08b25195ae7..58107bd56f8f 100644
> > > > > --- a/drivers/of/fdt.c
> > > > > +++ b/drivers/of/fdt.c
> > > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > > unsigned long end)
> > > > > {
> > > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > > - * conversion.
> > > > > + /*
> > > > > + * __va() is not yet available this early on some platforms. In that
> > > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > > + * and does the VA conversion itself.
> > > > > */
> > > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > > >
> > > > There are now two architectures, so maybe it's time for a new config
> > > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > > e.g.
> > > >
> > > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> > >
> > > I see v5 left this as it was. Any comment on this suggestion?
> >
> > Introducing a config for only this use case sounds excessive to me,
> > but I'll let Rob decide what he wants to see here.
>
> Agreed. Can we just keep it as is here?
>
> > > > > initrd_start = (unsigned long)__va(start);
> > > > > initrd_end = (unsigned long)__va(end);
>
> I think long term, we should just get rid of needing to do this part
> in the DT code and let the initrd code do this.

The initrd code provides reserve_initrd_mem() for this and riscv calls
it later on. AFAICT, this early setting in OF code is a convenience
which architectures could be taught not to depend on, and then it
could be removed. But, until then, some architectures will need to
avoid it. As I commented downthread, I also don't want to go with
a config anymore, but it'd be nice to keep arch-specifics out of
here, so I've posted a patch changing __early_init_dt_declare_initrd
to be a weak function.

Thanks,
drew

2023-01-30 14:58:17

by Rob Herring

Subject: Re: [PATCH v4] riscv: Use PUD/P4D/PGD pages for the linear mapping

On Mon, Jan 30, 2023 at 8:19 AM Andrew Jones <[email protected]> wrote:
>
> On Mon, Jan 30, 2023 at 07:48:04AM -0600, Rob Herring wrote:
> > On Wed, Jan 25, 2023 at 6:13 AM Alexandre Ghiti <[email protected]> wrote:
> > >
> > > On Wed, Jan 25, 2023 at 11:41 AM Andrew Jones <[email protected]> wrote:
> > > >
> > > > On Mon, Jan 23, 2023 at 03:25:54PM +0100, Andrew Jones wrote:
> > > > > On Mon, Jan 23, 2023 at 12:28:02PM +0100, Alexandre Ghiti wrote:
> > > > > > During the early page table creation, we used to set the mapping for
> > > > > > PAGE_OFFSET to the kernel load address: but the kernel load address is
> > > > > > always offset by PMD_SIZE which makes it impossible to use PUD/P4D/PGD
> > > > > > pages as this physical address is not aligned on PUD/P4D/PGD size (whereas
> > > > > > PAGE_OFFSET is).
> >
> > [...]
> >
> > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > index f08b25195ae7..58107bd56f8f 100644
> > > > > > --- a/drivers/of/fdt.c
> > > > > > +++ b/drivers/of/fdt.c
> > > > > > @@ -891,12 +891,13 @@ const void * __init of_flat_dt_match_machine(const void *default_match,
> > > > > > static void __early_init_dt_declare_initrd(unsigned long start,
> > > > > > unsigned long end)
> > > > > > {
> > > > > > - /* ARM64 would cause a BUG to occur here when CONFIG_DEBUG_VM is
> > > > > > - * enabled since __va() is called too early. ARM64 does make use
> > > > > > - * of phys_initrd_start/phys_initrd_size so we can skip this
> > > > > > - * conversion.
> > > > > > + /*
> > > > > > + * __va() is not yet available this early on some platforms. In that
> > > > > > + * case, the platform uses phys_initrd_start/phys_initrd_size instead
> > > > > > + * and does the VA conversion itself.
> > > > > > */
> > > > > > - if (!IS_ENABLED(CONFIG_ARM64)) {
> > > > > > + if (!IS_ENABLED(CONFIG_ARM64) &&
> > > > > > + !(IS_ENABLED(CONFIG_RISCV) && IS_ENABLED(CONFIG_64BIT))) {
> > > > >
> > > > > There are now two architectures, so maybe it's time for a new config
> > > > > symbol which would be selected by arm64 and riscv64 and then used here,
> > > > > e.g.
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_NO_EARLY_LINEAR_MAP)) {
> > > >
> > > > I see v5 left this as it was. Any comment on this suggestion?
> > >
> > > Introducing a config for only this use case sounds excessive to me,
> > > but I'll let Rob decide what he wants to see here.
> >
> > Agreed. Can we just keep it as is here?
> >
> > > > > > initrd_start = (unsigned long)__va(start);
> > > > > > initrd_end = (unsigned long)__va(end);
> >
> > I think long term, we should just get rid of needing to do this part
> > in the DT code and let the initrd code do this.
>
> initrd code provides reserve_initrd_mem() for this and riscv calls
> it later on. afaict, this early setting in OF code is a convenience
> which architectures could be taught not to depend on, and then it
> could be removed. But, until then, some architectures will need to
> avoid it. As I commented downthread, I also don't want to go with
> a config anymore, but it'd be nice to keep arch-specifics out of
> here, so I've posted a patch changing __early_init_dt_declare_initrd
> to be a weak function.

If this were *always* going to be architecture-specific, then I'd
agree. But I think there are better paths to refactor this. I don't
want to go back to a weak function and encourage more implementations
of __early_init_dt_declare_initrd().

Rob