2024-05-10 06:29:24

by Nam Cao

[permalink] [raw]
Subject: [PATCH 0/7] remove size limit on XIP kernel

Hi,

For XIP kernel, the writable data section is always at offset specified in
XIP_OFFSET, which is hard-coded to 32MB.

Unfortunately, this means the read-only section (placed before the
writable section) is restricted in size. This causes build failure if the
kernel gets too large.

This series remove the use of XIP_OFFSET one by one, then remove this
macro entirely at the end, with the goal of lifting this size restriction.

Also some cleanup and documentation along the way.

This series depends on
https://lore.kernel.org/linux-riscv/[email protected]/
to apply cleanly, and also depends on
https://lore.kernel.org/linux-riscv/[email protected]/
which fixes a boot issue.

Best regards,
Nam

Nam Cao (7):
riscv: cleanup XIP_FIXUP macro
riscv: replace va_kernel_pa_offset with va_kernel_data_pa_offset on
XIP
riscv: drop the use of XIP_OFFSET in XIP_FIXUP_OFFSET
riscv: drop the use of XIP_OFFSET in XIP_FIXUP_FLASH_OFFSET
riscv: drop the use of XIP_OFFSET in kernel_mapping_va_to_pa()
riscv: drop the use of XIP_OFFSET in create_kernel_page_table()
riscv: remove limit on the size of read-only section for XIP kernel

arch/riscv/include/asm/page.h | 25 ++++++++++++++++++------
arch/riscv/include/asm/pgtable.h | 18 +++++++----------
arch/riscv/include/asm/xip_fixup.h | 30 +++++++++++++++++++++++------
arch/riscv/kernel/vmlinux-xip.lds.S | 4 ++--
arch/riscv/mm/init.c | 11 +++++++----
5 files changed, 59 insertions(+), 29 deletions(-)

--
2.39.2



2024-05-10 06:30:10

by Nam Cao

[permalink] [raw]
Subject: [PATCH 5/7] riscv: drop the use of XIP_OFFSET in kernel_mapping_va_to_pa()

XIP_OFFSET is the hard-coded offset of writable data section within the
kernel.

By hard-coding this value, the read-only section of the kernel (which is
placed before the writable data section) is restricted in size.

As a preparation to remove this hard-coded macro XIP_OFFSET entirely,
remove the use of XIP_OFFSET in kernel_mapping_va_to_pa(). The macro
XIP_OFFSET is used in this case to check if the virtual address is mapped
to Flash or to RAM. The same check can be done with kernel_map.xiprom_sz.

Signed-off-by: Nam Cao <[email protected]>
---
arch/riscv/include/asm/page.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 14d0de928f9b..bcd77df15835 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -159,7 +159,7 @@ phys_addr_t linear_mapping_va_to_pa(unsigned long x);
#ifdef CONFIG_XIP_KERNEL
#define kernel_mapping_va_to_pa(y) ({ \
unsigned long _y = (unsigned long)(y); \
- (_y < kernel_map.virt_addr + XIP_OFFSET) ? \
+ (_y < kernel_map.virt_addr + kernel_map.xiprom_sz) ? \
(_y - kernel_map.va_kernel_xip_pa_offset) : \
(_y - kernel_map.va_kernel_data_pa_offset); \
})
--
2.39.2


2024-05-10 06:30:12

by Nam Cao

[permalink] [raw]
Subject: [PATCH 6/7] riscv: drop the use of XIP_OFFSET in create_kernel_page_table()

XIP_OFFSET is the hard-coded offset of writable data section within the
kernel.

By hard-coding this value, the read-only section of the kernel (which is
placed before the writable data section) is restricted in size.

As a preparation to remove this hard-coded value entirely, stop using
XIP_OFFSET in create_kernel_page_table(). Instead use _sdata and _start to
do the same thing.

Signed-off-by: Nam Cao <[email protected]>
---
arch/riscv/mm/init.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 9846c6924509..62ff4aa2be96 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -907,7 +907,7 @@ static void __init relocate_kernel(void)
static void __init create_kernel_page_table(pgd_t *pgdir,
__always_unused bool early)
{
- uintptr_t va, end_va;
+ uintptr_t va, start_va, end_va;

/* Map the flash resident part */
end_va = kernel_map.virt_addr + kernel_map.xiprom_sz;
@@ -917,10 +917,11 @@ static void __init create_kernel_page_table(pgd_t *pgdir,
PMD_SIZE, PAGE_KERNEL_EXEC);

/* Map the data in RAM */
+ start_va = kernel_map.virt_addr + (uintptr_t)&_sdata - (uintptr_t)&_start;
end_va = kernel_map.virt_addr + kernel_map.size;
- for (va = kernel_map.virt_addr + XIP_OFFSET; va < end_va; va += PMD_SIZE)
+ for (va = start_va; va < end_va; va += PMD_SIZE)
create_pgd_mapping(pgdir, va,
- kernel_map.phys_addr + (va - (kernel_map.virt_addr + XIP_OFFSET)),
+ kernel_map.phys_addr + (va - start_va),
PMD_SIZE, PAGE_KERNEL);
}
#else
--
2.39.2


2024-05-10 06:30:15

by Nam Cao

[permalink] [raw]
Subject: [PATCH 7/7] riscv: remove limit on the size of read-only section for XIP kernel

XIP_OFFSET is the hard-coded offset of writable data section within the
kernel.

By hard-coding this value, the read-only section of the kernel (which is
placed before the writable data section) is restricted in size. This causes
build failures if the kernel get too big (an example is in Closes:).

Remove this limit.

Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Nam Cao <[email protected]>
---
arch/riscv/include/asm/pgtable.h | 7 -------
arch/riscv/kernel/vmlinux-xip.lds.S | 4 ++--
2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index fbf342f4afee..75f4a92ea5bb 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -104,13 +104,6 @@

#endif

-#ifdef CONFIG_XIP_KERNEL
-#define XIP_OFFSET SZ_32M
-#define XIP_OFFSET_MASK (SZ_32M - 1)
-#else
-#define XIP_OFFSET 0
-#endif
-
#ifndef __ASSEMBLY__

#include <asm/page.h>
diff --git a/arch/riscv/kernel/vmlinux-xip.lds.S b/arch/riscv/kernel/vmlinux-xip.lds.S
index 8c3daa1b0531..01f73f2ffecc 100644
--- a/arch/riscv/kernel/vmlinux-xip.lds.S
+++ b/arch/riscv/kernel/vmlinux-xip.lds.S
@@ -65,10 +65,10 @@ SECTIONS
* From this point, stuff is considered writable and will be copied to RAM
*/
__data_loc = ALIGN(PAGE_SIZE); /* location in file */
- . = KERNEL_LINK_ADDR + XIP_OFFSET; /* location in memory */
+ . = ALIGN(SZ_2M); /* location in memory */

#undef LOAD_OFFSET
-#define LOAD_OFFSET (KERNEL_LINK_ADDR + XIP_OFFSET - (__data_loc & XIP_OFFSET_MASK))
+#define LOAD_OFFSET (KERNEL_LINK_ADDR + _sdata - __data_loc)

_sdata = .; /* Start of data section */
_data = .;
--
2.39.2


2024-05-12 15:47:55

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH 0/7] remove size limit on XIP kernel

Hi Nam,

On 10/05/2024 08:28, Nam Cao wrote:
> Hi,
>
> For XIP kernel, the writable data section is always at offset specified in
> XIP_OFFSET, which is hard-coded to 32MB.
>
> Unfortunately, this means the read-only section (placed before the
> writable section) is restricted in size. This causes build failure if the
> kernel gets too large.
>
> This series remove the use of XIP_OFFSET one by one, then remove this
> macro entirely at the end, with the goal of lifting this size restriction.
>
> Also some cleanup and documentation along the way.
>
> This series depends on
> https://lore.kernel.org/linux-riscv/[email protected]/
> to apply cleanly, and also depends on
> https://lore.kernel.org/linux-riscv/[email protected]/
> which fixes a boot issue.
>
> Best regards,
> Nam
>
> Nam Cao (7):
> riscv: cleanup XIP_FIXUP macro
> riscv: replace va_kernel_pa_offset with va_kernel_data_pa_offset on
> XIP
> riscv: drop the use of XIP_OFFSET in XIP_FIXUP_OFFSET
> riscv: drop the use of XIP_OFFSET in XIP_FIXUP_FLASH_OFFSET
> riscv: drop the use of XIP_OFFSET in kernel_mapping_va_to_pa()
> riscv: drop the use of XIP_OFFSET in create_kernel_page_table()
> riscv: remove limit on the size of read-only section for XIP kernel
>
> arch/riscv/include/asm/page.h | 25 ++++++++++++++++++------
> arch/riscv/include/asm/pgtable.h | 18 +++++++----------
> arch/riscv/include/asm/xip_fixup.h | 30 +++++++++++++++++++++++------
> arch/riscv/kernel/vmlinux-xip.lds.S | 4 ++--
> arch/riscv/mm/init.c | 11 +++++++----
> 5 files changed, 59 insertions(+), 29 deletions(-)
>

XIP kernels are intended for use with flash devices so the XIP_OFFSET
restriction actually represents the size of the flash device: IIRC this
32MB was chosen because it would fit "most devices". I think it would be
good to come up with a mechanism that allows to restrict the size at
build time: a config? XIP kernels are custom kernels so the user could
enter its flash size so that if kernel ends up being too large, it
fails. And by default, we could a very large size to avoid kernel test
robot build failures.

Thanks,

Alex


2024-05-13 08:24:25

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH 0/7] remove size limit on XIP kernel

On Sun, May 12, 2024 at 05:47:24PM +0200, Alexandre Ghiti wrote:
> XIP kernels are intended for use with flash devices so the XIP_OFFSET
> restriction actually represents the size of the flash device: IIRC this 32MB
> was chosen because it would fit "most devices". I think it would be good to
> come up with a mechanism that allows to restrict the size at build time: a
> config? XIP kernels are custom kernels so the user could enter its flash
> size so that if kernel ends up being too large, it fails. And by default, we
> could a very large size to avoid kernel test robot build failures.

I'm not sure if it is beneficial to do that. Users' flash tool would
complain about the kernel not fitting in flash anyway. I think this would
only make it (a bit more) complicated to build the kernel.

Furthermore XIP_OFFSET at the moment is not the flash size, it is size of
text (and some other read-only sections). Kernel's data sections are in
flash too, which is copied to RAM during boot.

With the current linker setting, the first 32MB of virtual memory is
occupied by .text. Then at 32MB offset, .data section starts. So if
XIP_OFFSET limit is exceeded, the kernel's size is already way more than
32MB.

Instead, this series moves .data and .text right next to each other (well,
almost, because we need PMD_SIZE alignment). So that if .text size exceed
32MB, the offset that .data resides would increase as well.

If we really want to set flash size during build time (which, again, I
doubt the benefit of), this series would still be relevant: it is not just
removing the size limit, it is also removing the fixed position of the
data section in virtual address space (in other words, removing the
useless "hole" between .text section and .data section).

Best regards,
Nam


2024-05-27 12:50:08

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH 5/7] riscv: drop the use of XIP_OFFSET in kernel_mapping_va_to_pa()


On 10/05/2024 08:28, Nam Cao wrote:
> XIP_OFFSET is the hard-coded offset of writable data section within the
> kernel.
>
> By hard-coding this value, the read-only section of the kernel (which is
> placed before the writable data section) is restricted in size.
>
> As a preparation to remove this hard-coded macro XIP_OFFSET entirely,
> remove the use of XIP_OFFSET in kernel_mapping_va_to_pa(). The macro
> XIP_OFFSET is used in this case to check if the virtual address is mapped
> to Flash or to RAM. The same check can be done with kernel_map.xiprom_sz.
>
> Signed-off-by: Nam Cao <[email protected]>
> ---
> arch/riscv/include/asm/page.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 14d0de928f9b..bcd77df15835 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -159,7 +159,7 @@ phys_addr_t linear_mapping_va_to_pa(unsigned long x);
> #ifdef CONFIG_XIP_KERNEL
> #define kernel_mapping_va_to_pa(y) ({ \
> unsigned long _y = (unsigned long)(y); \
> - (_y < kernel_map.virt_addr + XIP_OFFSET) ? \
> + (_y < kernel_map.virt_addr + kernel_map.xiprom_sz) ? \
> (_y - kernel_map.va_kernel_xip_pa_offset) : \
> (_y - kernel_map.va_kernel_data_pa_offset); \
> })


Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks,

Alex


2024-05-27 12:53:41

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH 6/7] riscv: drop the use of XIP_OFFSET in create_kernel_page_table()


On 10/05/2024 08:28, Nam Cao wrote:
> XIP_OFFSET is the hard-coded offset of writable data section within the
> kernel.
>
> By hard-coding this value, the read-only section of the kernel (which is
> placed before the writable data section) is restricted in size.
>
> As a preparation to remove this hard-coded value entirely, stop using
> XIP_OFFSET in create_kernel_page_table(). Instead use _sdata and _start to
> do the same thing.
>
> Signed-off-by: Nam Cao <[email protected]>
> ---
> arch/riscv/mm/init.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 9846c6924509..62ff4aa2be96 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -907,7 +907,7 @@ static void __init relocate_kernel(void)
> static void __init create_kernel_page_table(pgd_t *pgdir,
> __always_unused bool early)
> {
> - uintptr_t va, end_va;
> + uintptr_t va, start_va, end_va;
>
> /* Map the flash resident part */
> end_va = kernel_map.virt_addr + kernel_map.xiprom_sz;
> @@ -917,10 +917,11 @@ static void __init create_kernel_page_table(pgd_t *pgdir,
> PMD_SIZE, PAGE_KERNEL_EXEC);
>
> /* Map the data in RAM */
> + start_va = kernel_map.virt_addr + (uintptr_t)&_sdata - (uintptr_t)&_start;
> end_va = kernel_map.virt_addr + kernel_map.size;
> - for (va = kernel_map.virt_addr + XIP_OFFSET; va < end_va; va += PMD_SIZE)
> + for (va = start_va; va < end_va; va += PMD_SIZE)
> create_pgd_mapping(pgdir, va,
> - kernel_map.phys_addr + (va - (kernel_map.virt_addr + XIP_OFFSET)),
> + kernel_map.phys_addr + (va - start_va),
> PMD_SIZE, PAGE_KERNEL);
> }
> #else


Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks,

Alex


2024-05-27 13:22:22

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH 7/7] riscv: remove limit on the size of read-only section for XIP kernel


On 10/05/2024 08:28, Nam Cao wrote:
> XIP_OFFSET is the hard-coded offset of writable data section within the
> kernel.
>
> By hard-coding this value, the read-only section of the kernel (which is
> placed before the writable data section) is restricted in size. This causes
> build failures if the kernel get too big (an example is in Closes:).

s/get/gets

I think you can use:

Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ [1]

And instead use:

s/an example is in Closes:/[1]


>
> Remove this limit.
>
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
> Signed-off-by: Nam Cao <[email protected]>
> ---
> arch/riscv/include/asm/pgtable.h | 7 -------
> arch/riscv/kernel/vmlinux-xip.lds.S | 4 ++--
> 2 files changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index fbf342f4afee..75f4a92ea5bb 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -104,13 +104,6 @@
>
> #endif
>
> -#ifdef CONFIG_XIP_KERNEL
> -#define XIP_OFFSET SZ_32M
> -#define XIP_OFFSET_MASK (SZ_32M - 1)
> -#else
> -#define XIP_OFFSET 0
> -#endif
> -
> #ifndef __ASSEMBLY__
>
> #include <asm/page.h>
> diff --git a/arch/riscv/kernel/vmlinux-xip.lds.S b/arch/riscv/kernel/vmlinux-xip.lds.S
> index 8c3daa1b0531..01f73f2ffecc 100644
> --- a/arch/riscv/kernel/vmlinux-xip.lds.S
> +++ b/arch/riscv/kernel/vmlinux-xip.lds.S
> @@ -65,10 +65,10 @@ SECTIONS
> * From this point, stuff is considered writable and will be copied to RAM
> */
> __data_loc = ALIGN(PAGE_SIZE); /* location in file */
> - . = KERNEL_LINK_ADDR + XIP_OFFSET; /* location in memory */
> + . = ALIGN(SZ_2M); /* location in memory */


You can't use SZ_2M here since it corresponds to PMD_SIZE for rv64 but
on rv32 (which is allowed to use xip kernels), it's 4MB. Use
SECTION_ALIGN instead.


>
> #undef LOAD_OFFSET
> -#define LOAD_OFFSET (KERNEL_LINK_ADDR + XIP_OFFSET - (__data_loc & XIP_OFFSET_MASK))
> +#define LOAD_OFFSET (KERNEL_LINK_ADDR + _sdata - __data_loc)
>
> _sdata = .; /* Start of data section */
> _data = .;

When the above comment is fixed, you can add:

Reviewed-by: Alexandre Ghiti <[email protected]>

And many thanks for this big cleanup.

Alex


2024-06-02 07:32:28

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH 7/7] riscv: remove limit on the size of read-only section for XIP kernel

On Mon, May 27, 2024 at 02:58:14PM +0200, Alexandre Ghiti wrote:
> > diff --git a/arch/riscv/kernel/vmlinux-xip.lds.S b/arch/riscv/kernel/vmlinux-xip.lds.S
> > index 8c3daa1b0531..01f73f2ffecc 100644
> > --- a/arch/riscv/kernel/vmlinux-xip.lds.S
> > +++ b/arch/riscv/kernel/vmlinux-xip.lds.S
> > @@ -65,10 +65,10 @@ SECTIONS
> > * From this point, stuff is considered writable and will be copied to RAM
> > */
> > __data_loc = ALIGN(PAGE_SIZE); /* location in file */
> > - . = KERNEL_LINK_ADDR + XIP_OFFSET; /* location in memory */
> > + . = ALIGN(SZ_2M); /* location in memory */
>
> You can't use SZ_2M here since it corresponds to PMD_SIZE for rv64 but on
> rv32 (which is allowed to use xip kernels), it's 4MB. Use SECTION_ALIGN
> instead.

SECTION_ALIGN doesn't work unfortunately. For XIP, SECTION_ALIGN is
L1_CACHE_BYTES which is 64 bytes, but we need at least PMD_SIZE alignment
to setup virtual mapping.

Ideally we use PMD_SIZE here, but I can't #include that header file.
Probably we can refactor the header files so that we can #include the
header file that PMD_SIZE is in. But I am not sure if it's worth it.

I'm thinking just go for:
ifdef CONFIG_64_BIT
. = ALIGN(SZ_2M);
#else
. = ALIGN(SZ_4M);
#endif

Or even simpler, just:
. = ALIGN(SZ_4M);

As much as I hate magic numbers, I think we can give linker script some
leeway. Perhaps with an explanation why this alignment is chosen?

Or do you have a better idea?

Best regards,
Nam

2024-06-07 19:18:07

by Nam Cao

[permalink] [raw]
Subject: Re: [PATCH 7/7] riscv: remove limit on the size of read-only section for XIP kernel

On Sun, Jun 02, 2024 at 09:32:17AM +0200, Nam Cao wrote:
> On Mon, May 27, 2024 at 02:58:14PM +0200, Alexandre Ghiti wrote:
> > > diff --git a/arch/riscv/kernel/vmlinux-xip.lds.S b/arch/riscv/kernel/vmlinux-xip.lds.S
> > > index 8c3daa1b0531..01f73f2ffecc 100644
> > > --- a/arch/riscv/kernel/vmlinux-xip.lds.S
> > > +++ b/arch/riscv/kernel/vmlinux-xip.lds.S
> > > @@ -65,10 +65,10 @@ SECTIONS
> > > * From this point, stuff is considered writable and will be copied to RAM
> > > */
> > > __data_loc = ALIGN(PAGE_SIZE); /* location in file */
> > > - . = KERNEL_LINK_ADDR + XIP_OFFSET; /* location in memory */
> > > + . = ALIGN(SZ_2M); /* location in memory */
> >
> > You can't use SZ_2M here since it corresponds to PMD_SIZE for rv64 but on
> > rv32 (which is allowed to use xip kernels), it's 4MB. Use SECTION_ALIGN
> > instead.
>
> SECTION_ALIGN doesn't work unfortunately. For XIP, SECTION_ALIGN is
> L1_CACHE_BYTES which is 64 bytes, but we need at least PMD_SIZE alignment
> to setup virtual mapping.

Sorry, I think I had tunnel vision. The solution is so obvious.

I will send v2 shortly. Thanks so much for spending time reviewing.

Best regards,
Nam
>
> Ideally we use PMD_SIZE here, but I can't #include that header file.
> Probably we can refactor the header files so that we can #include the
> header file that PMD_SIZE is in. But I am not sure if it's worth it.
>
> I'm thinking just go for:
> ifdef CONFIG_64_BIT
> . = ALIGN(SZ_2M);
> #else
> . = ALIGN(SZ_4M);
> #endif
>
> Or even simpler, just:
> . = ALIGN(SZ_4M);
>
> As much as I hate magic numbers, I think we can give linker script some
> leeway. Perhaps with an explanation why this alignment is chosen?
>
> Or do you have a better idea?
>
> Best regards,
> Nam