2024-02-20 08:39:34

by Yuntao Liu

[permalink] [raw]
Subject: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

The current arm32 architecture does not yet support the
HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
embedded scenarios, and enabling this feature would be beneficial for
reducing the size of the kernel image.

In order to make this work, we keep the necessary tables by annotating
them with KEEP, also it requires further changes to linker script to KEEP
some tables and wildcard compiler generated sections into the right place.

It boots normally with defconfig, vexpress_defconfig and tinyconfig.

The size comparison of zImage is as follows:
defconfig vexpress_defconfig tinyconfig
5137712 5138024 424192 no dce
5032560 4997824 298384 dce
2.0% 2.7% 29.7% shrink

When using smaller config file, there is a significant reduction in the
size of the zImage.

We also tested this patch on a commercially available single-board
computer, and the comparison is as follows:
a15eb_config
2161384 no dce
2092240 dce
3.2% shrink

The zImage size has been reduced by approximately 3.2%, which is 70KB on
2.1M.

Signed-off-by: Yuntao Liu <[email protected]>
---
arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/vmlinux.lds.S | 4 ++--
arch/arm/include/asm/vmlinux.lds.h | 8 ++++----
arch/arm/kernel/vmlinux.lds.S | 6 +++---
4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0af6709570d1..de78ceb821df 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -113,6 +113,7 @@ config ARM
select HAVE_KERNEL_XZ
select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
select HAVE_KRETPROBES if HAVE_KPROBES
+ select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI
select HAVE_OPTPROBES if !THUMB2_KERNEL
diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
index 3fcb3e62dc56..da21244aa892 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.S
+++ b/arch/arm/boot/compressed/vmlinux.lds.S
@@ -89,7 +89,7 @@ SECTIONS
* The EFI stub always executes from RAM, and runs strictly before the
* decompressor, so we can make an exception for its r/w data, and keep it
*/
- *(.data.efistub .bss.efistub)
+ *(.data.* .bss.*)
__pecoff_data_end = .;

/*
@@ -125,7 +125,7 @@ SECTIONS

. = BSS_START;
__bss_start = .;
- .bss : { *(.bss) }
+ .bss : { *(.bss .bss.*) }
_end = .;

. = ALIGN(8); /* the stack must be 64-bit aligned */
diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
index 4c8632d5c432..f2ff79f740ab 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -42,7 +42,7 @@
#define PROC_INFO \
. = ALIGN(4); \
__proc_info_begin = .; \
- *(.proc.info.init) \
+ KEEP(*(.proc.info.init)) \
__proc_info_end = .;

#define IDMAP_TEXT \
@@ -125,13 +125,13 @@
__vectors_lma = .; \
OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) { \
.vectors { \
- *(.vectors) \
+ KEEP(*(.vectors)) \
} \
.vectors.bhb.loop8 { \
- *(.vectors.bhb.loop8) \
+ KEEP(*(.vectors.bhb.loop8)) \
} \
.vectors.bhb.bpiall { \
- *(.vectors.bhb.bpiall) \
+ KEEP(*(.vectors.bhb.bpiall)) \
} \
} \
ARM_LMA(__vectors, .vectors); \
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index bd9127c4b451..de373c6c2ae8 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -74,7 +74,7 @@ SECTIONS
. = ALIGN(4);
__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
__start___ex_table = .;
- ARM_MMU_KEEP(*(__ex_table))
+ ARM_MMU_KEEP(KEEP(*(__ex_table)))
__stop___ex_table = .;
}

@@ -99,7 +99,7 @@ SECTIONS
}
.init.arch.info : {
__arch_info_begin = .;
- *(.arch.info.init)
+ KEEP(*(.arch.info.init))
__arch_info_end = .;
}
.init.tagtable : {
@@ -116,7 +116,7 @@ SECTIONS
#endif
.init.pv_table : {
__pv_table_begin = .;
- *(.pv_table)
+ KEEP(*(.pv_table))
__pv_table_end = .;
}

--
2.34.1



2024-02-20 08:41:56

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
> The current arm32 architecture does not yet support the
> HAVE_LD_DEAD_CODE_DATA_ELIMINATION feature. arm32 is widely used in
> embedded scenarios, and enabling this feature would be beneficial for
> reducing the size of the kernel image.
>
> In order to make this work, we keep the necessary tables by annotating
> them with KEEP, also it requires further changes to linker script to KEEP
> some tables and wildcard compiler generated sections into the right place.

Thanks for the patch, I think this is a very useful feature
and we should get this upstream, especially if we can combine
it with CONFIG_LTO_CLANG (which is supported on arm64 at the
moment, but not on arm).

> It boots normally with defconfig, vexpress_defconfig and tinyconfig.
>
> The size comparison of zImage is as follows:
> defconfig vexpress_defconfig tinyconfig
> 5137712 5138024 424192 no dce
> 5032560 4997824 298384 dce
> 2.0% 2.7% 29.7% shrink
>
> When using smaller config file, there is a significant reduction in the
> size of the zImage.
>
> We also tested this patch on a commercially available single-board
> computer, and the comparison is as follows:
> a15eb_config
> 2161384 no dce
> 2092240 dce
> 3.2% shrink
>
> The zImage size has been reduced by approximately 3.2%, which is 70KB on
> 2.1M.

Nice description! I do suspect that there will be additional
bugs that we run into with some corner cases.

> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S
> b/arch/arm/boot/compressed/vmlinux.lds.S
> index 3fcb3e62dc56..da21244aa892 100644
> --- a/arch/arm/boot/compressed/vmlinux.lds.S
> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> @@ -89,7 +89,7 @@ SECTIONS
> * The EFI stub always executes from RAM, and runs strictly before
> the
> * decompressor, so we can make an exception for its r/w data, and
> keep it
> */
> - *(.data.efistub .bss.efistub)
> + *(.data.* .bss.*)
> __pecoff_data_end = .;
>
> /*

This doesn't seem right to me, or maybe I misunderstand what
the original version does. Have you tested with both
CONFIG_EFI_STUB on and off, and booting with and without
UEFI?

If I read this right, you would move all .data and .bss
into the stub here, not just the parts we actually want?

> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
> index bd9127c4b451..de373c6c2ae8 100644
> --- a/arch/arm/kernel/vmlinux.lds.S
> +++ b/arch/arm/kernel/vmlinux.lds.S
> @@ -74,7 +74,7 @@ SECTIONS
> . = ALIGN(4);
> __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
> __start___ex_table = .;
> - ARM_MMU_KEEP(*(__ex_table))
> + ARM_MMU_KEEP(KEEP(*(__ex_table)))
> __stop___ex_table = .;
> }
>
> @@ -116,7 +116,7 @@ SECTIONS
> #endif
> .init.pv_table : {
> __pv_table_begin = .;
> - *(.pv_table)
> + KEEP(*(.pv_table))
> __pv_table_end = .;
> }

I guess this prevents discarding any function that has a reference
from pv_table or ex_table, even if there are no other references,
right?

I don't know how to solve this other than forcing all the
uaccess and virt_to_phys functions to be out of line
helpers. For uaccess, there are probably very few functions
that need this, so it should make little difference.

You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT
into a method that just always adds an offset from C code
instead of the boot time patching. That way the code would
be a bit less efficient but you might be able to get
a larger size reduction by dropping additional unused code.

Maybe test your patch both with and without
ARM_PATCH_PHYS_VIRT to see what the best-case impact would
be.

Arnd

2024-02-20 09:54:21

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



在 2024/2/20 16:40, Arnd Bergmann 写道:
> On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
>> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S
>> b/arch/arm/boot/compressed/vmlinux.lds.S
>> index 3fcb3e62dc56..da21244aa892 100644
>> --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> @@ -89,7 +89,7 @@ SECTIONS
>> * The EFI stub always executes from RAM, and runs strictly before
>> the
>> * decompressor, so we can make an exception for its r/w data, and
>> keep it
>> */
>> - *(.data.efistub .bss.efistub)
>> + *(.data.* .bss.*)
>> __pecoff_data_end = .;
>>
>> /*
>
> This doesn't seem right to me, or maybe I misunderstand what
> the original version does. Have you tested with both
> CONFIG_EFI_STUB on and off, and booting with and without
> UEFI?

Yes, I have tested with CONFIG_EFI_STUB on and off, and booting with
UEFI on a single-board computer, and it boots well.


>
> If I read this right, you would move all .data and .bss
> into the stub here, not just the parts we actually want?

In the file "drivers/firmware/efi/libstub/Makefile", it is written:

---

#
# ARM discards the .data section because it disallows r/w data in the
# decompressor. So move our .data to .data.efistub and .bss to .bss.efistub,
# which are preserved explicitly by the decompressor linker script.
#
STUBCOPY_FLAGS-$(CONFIG_ARM) += --rename-section .data=.data.efistub \
--rename-section .bss=.bss.efistub,load,alloc

---

I think that .data.efistub represents the entire .data section, the same
applies to .bss as well,

so i move all .data and .bss into the stub here.


>
>> diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
>> index bd9127c4b451..de373c6c2ae8 100644
>> --- a/arch/arm/kernel/vmlinux.lds.S
>> +++ b/arch/arm/kernel/vmlinux.lds.S
>> @@ -74,7 +74,7 @@ SECTIONS
>> . = ALIGN(4);
>> __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
>> __start___ex_table = .;
>> - ARM_MMU_KEEP(*(__ex_table))
>> + ARM_MMU_KEEP(KEEP(*(__ex_table)))
>> __stop___ex_table = .;
>> }
>>
>> @@ -116,7 +116,7 @@ SECTIONS
>> #endif
>> .init.pv_table : {
>> __pv_table_begin = .;
>> - *(.pv_table)
>> + KEEP(*(.pv_table))
>> __pv_table_end = .;
>> }
>
> I guess this prevents discarding any function that has a reference
> from pv_table or ex_table, even if there are no other references,
> right?

Indeed so, if not keep ex_table, the compilation process will result in

an error:

no __ex_table in file: vmlinux

Failed to sort kernel tables

and if not keep pv_table, It can be compiled successfully, but the QEMU
boots will fail.

>
> I don't know how to solve this other than forcing all the
> uaccess and virt_to_phys functions to be out of line
> helpers. For uaccess, there are probably very few functions
> that need this, so it should make little difference.
>
> You might want to try changing CONFIG_ARM_PATCH_PHYS_VIRT
> into a method that just always adds an offset from C code
> instead of the boot time patching. That way the code would
> be a bit less efficient but you might be able to get
> a larger size reduction by dropping additional unused code.
>
> Maybe test your patch both with and without
> ARM_PATCH_PHYS_VIRT to see what the best-case impact would
> be.
>

This is a very good idea, I will give it a try.

> Arnd

2024-02-21 15:52:17

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Tue, Feb 20, 2024, at 10:53, liuyuntao (F) wrote:
> 在 2024/2/20 16:40, Arnd Bergmann 写道:
>> On Tue, Feb 20, 2024, at 09:15, Yuntao Liu wrote:
> #
> # ARM discards the .data section because it disallows r/w data in the
> # decompressor. So move our .data to .data.efistub and .bss to .bss.efistub,
> # which are preserved explicitly by the decompressor linker script.
> #
> STUBCOPY_FLAGS-$(CONFIG_ARM) += --rename-section .data=.data.efistub \
> --rename-section .bss=.bss.efistub,load,alloc
>
> ---
>
> I think that .data.efistub represents the entire .data section, the same
> applies to .bss as well,
>
> so i move all .data and .bss into the stub here.
>

Ok, I see.

>>
>> I guess this prevents discarding any function that has a reference
>> from pv_table or ex_table, even if there are no other references,
>> right?
>
> Indeed so, if not keep ex_table, the compilation process will result in
>
> an error:
>
> no __ex_table in file: vmlinux
>
> Failed to sort kernel tables

Sure, and without the ex_table contents, it would not be able
to recover from a failed uaccess either.

> and if not keep pv_table, It can be compiled successfully, but the QEMU
> boots will fail.

Right. The pv_table isn't technically necessary since it can
be disabled. I think it was originally introduced in order
to avoid performance regressions when we introduced multiplatform
kernels that can run at arbitrary physical addresses rather than
having it as a build-time constant.

I don't know how much difference that actually makes for performance,
so turning it into a normal runtime lookup may or may not be
a good compromise when building with HAVE_LD_DEAD_CODE_DATA_ELIMINATION.

I have given your patch some build testing with random
configurations in my build setup and it seems to work
fine with gcc/binutils, but unfortunately I came across
a link failure using clang/lld:

ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got }
>>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors));
>>> ^

I don't immediately see what the problem is here, I hope you
can address it before you send a v2.

Arnd

2024-02-22 09:52:42

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote:
> I have given your patch some build testing with random
> configurations in my build setup and it seems to work
> fine with gcc/binutils, but unfortunately I came across
> a link failure using clang/lld:

I ran into another bug now, this time with CONFIG_XIP_KERNEL=y:

no __ex_table in file: vmlinux
Failed to sort kernel tables
make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1

Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S
the same way as vmlinux.lds.S:

--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -63,7 +63,7 @@ SECTIONS
. = ALIGN(4);
__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
__start___ex_table = .;
- ARM_MMU_KEEP(*(__ex_table))
+ ARM_MMU_KEEP(KEEP(*(__ex_table)))
__stop___ex_table = .;
}

@@ -83,7 +83,7 @@ SECTIONS
}
.init.arch.info : {
__arch_info_begin = .;
- *(.arch.info.init)
+ KEEP(*(.arch.info.init))
__arch_info_end = .;
}
.init.tagtable : {


The pv_table is not needed for XIP_KERNEL=y because that
requires not patching the kernel.

Arnd

2024-02-22 11:24:42

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



On 2024/2/21 23:51, Arnd Bergmann wrote:
> I have given your patch some build testing with random
> configurations in my build setup and it seems to work
> fine with gcc/binutils, but unfortunately I came across
> a link failure using clang/lld:
>
> ld.lld: error: ./arch/arm/kernel/vmlinux.lds:35: ( expected, but got }
>>>> __vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors { KEEP(*(.vectors)) } .vectors.bhb.loop8 { KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall { KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors); __vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors); __vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8); __vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start = LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end = LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . = __vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) + SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) + 0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs); __stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma + SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq - ADDR(.vectors));
>>>> ^
>
> I don't immediately see what the problem is here, I hope you
> can address it before you send a v2.
>
> Arnd

I reproduced this issue with make LLVM=1 ARCH=arm -j320 > ../make.txt.
Based on the position of the caret, I speculate that the issue arises from
lld's inability to recognize the KEEP() command preceding the OUTPUT
section in the OVERLAY command.
>
The full syntax of the OVERLAY command is as follows: OVERLAY [start] :
[NOCROSSREFS] [AT ( ldaddr )] { secname1 { output-section-command
output-section-command … } [:phdr…] [=fill] secname2 {
output-section-command output-section-command … } [:phdr…] [=fill] … }
[>region] [:phdr…] [=fill] [,]
>
I attempted to modify KEEP(*(.vectors)) to KEEP(*(.vectors))(.vectors),
and received the following error message:
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:36: ( expected, but got } >
__vectors_lma = .; OVERLAY 0xffff0000 : AT(__vectors_lma) { .vectors {
KEEP(*(.vectors))(.vectors) } .vectors.bhb.loop8 {
KEEP(*(.vectors.bhb.loop8)) } .vectors.bhb.bpiall {
KEEP(*(.vectors.bhb.bpiall)) } } __vectors_start = LOADADDR(.vectors);
__vectors_end = LOADADDR(.vectors) + SIZEOF(.vectors);
__vectors_bhb_loop8_start = LOADADDR(.vectors.bhb.loop8);
__vectors_bhb_loop8_end = LOADADDR(.vectors.bhb.loop8) +
SIZEOF(.vectors.bhb.loop8); __vectors_bhb_bpiall_start =
LOADADDR(.vectors.bhb.bpiall); __vectors_bhb_bpiall_end =
LOADADDR(.vectors.bhb.bpiall) + SIZEOF(.vectors.bhb.bpiall); . =
__vectors_lma + SIZEOF(.vectors) + SIZEOF(.vectors.bhb.loop8) +
SIZEOF(.vectors.bhb.bpiall); __stubs_lma = .; .stubs ADDR(.vectors) +
0x1000 : AT(__stubs_lma) { *(.stubs) } __stubs_start = LOADADDR(.stubs);
__stubs_end = LOADADDR(.stubs) + SIZEOF(.stubs); . = __stubs_lma +
SIZEOF(.stubs); PROVIDE(vector_fiq_offset = vector_fiq -
ADDR(.vectors)); > ^
The position of the caret has been moved below the right brace
of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
the entire `KEEP(*(.vectors))` as a file name. This could potentially be
a bug in lld. Perhaps we can temporarily
enable the DCE option only when option LD_IS_LLD is disabled,
like risc-v:
`select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.

2024-02-22 11:32:36

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



On 2024/2/22 17:52, Arnd Bergmann wrote:
> On Wed, Feb 21, 2024, at 16:51, Arnd Bergmann wrote:
>> I have given your patch some build testing with random
>> configurations in my build setup and it seems to work
>> fine with gcc/binutils, but unfortunately I came across
>> a link failure using clang/lld:
>
> I ran into another bug now, this time with CONFIG_XIP_KERNEL=y:
>
> no __ex_table in file: vmlinux
> Failed to sort kernel tables
> make[4]: *** [scripts/Makefile.vmlinux:37: vmlinux] Error 1
>
> Essentially you have to modify arch/arm/kernel/vmlinux-xip.lds.S
> the same way as vmlinux.lds.S:
>
Thanks a lot. I didn't consider this situation. I will take your advice.
Thanks again.

> --- a/arch/arm/kernel/vmlinux-xip.lds.S
> +++ b/arch/arm/kernel/vmlinux-xip.lds.S
> @@ -63,7 +63,7 @@ SECTIONS
> . = ALIGN(4);
> __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
> __start___ex_table = .;
> - ARM_MMU_KEEP(*(__ex_table))
> + ARM_MMU_KEEP(KEEP(*(__ex_table)))
> __stop___ex_table = .;
> }
>
> @@ -83,7 +83,7 @@ SECTIONS
> }
> .init.arch.info : {
> __arch_info_begin = .;
> - *(.arch.info.init)
> + KEEP(*(.arch.info.init))
> __arch_info_end = .;
> }
> .init.tagtable : {
>
>
> The pv_table is not needed for XIP_KERNEL=y because that
> requires not patching the kernel.
>
> Arnd

2024-02-22 16:07:12

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>
> The position of the caret has been moved below the right brace
> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
> a bug in lld. Perhaps we can temporarily
> enable the DCE option only when option LD_IS_LLD is disabled,
> like risc-v:
>
> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.

I would really like to see this working with lld if at all
possible, as it allows the combination of gc-sections with
lto and CONFIG_TRIM_UNUSED_KSYMS.

I experimented with lld myself now and I did get a booting
kernel even without the the KEEP() on the vectors. I also
see that this is the only use of OVERLAY in the kernel, so
I hope that we can find a way to make it work with existing
lld after all, either without the KEEP or without the OVERLAY.

Did you see any problems without the KEEP() on the vectors?

Arnd

2024-02-23 01:40:02

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



On 2024/2/23 0:04, Arnd Bergmann wrote:
> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>
>> The position of the caret has been moved below the right brace
>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>> a bug in lld. Perhaps we can temporarily
>> enable the DCE option only when option LD_IS_LLD is disabled,
>> like risc-v:
>>
>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>
> I would really like to see this working with lld if at all
> possible, as it allows the combination of gc-sections with
> lto and CONFIG_TRIM_UNUSED_KSYMS.
>
> I experimented with lld myself now and I did get a booting
> kernel even without the the KEEP() on the vectors. I also

When the kernel booted up successfully, was config DCE enabled?

> see that this is the only use of OVERLAY in the kernel, so
> I hope that we can find a way to make it work with existing
> lld after all, either without the KEEP or without the OVERLAY.

Yeah, i will try other way to make it work.

>
> Did you see any problems without the KEEP() on the vectors?
>
> Arnd

Without the KEEP(),disable DCE, qemu boots well,
enable DCE,qemu hangs on startup.
I experimented with lld using vexpress_defconfig.

2024-02-23 06:33:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Fri, Feb 23, 2024, at 02:39, liuyuntao (F) wrote:
> On 2024/2/23 0:04, Arnd Bergmann wrote:
>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>
>>> The position of the caret has been moved below the right brace
>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>> a bug in lld. Perhaps we can temporarily
>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>> like risc-v:
>>>
>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>>
>> I would really like to see this working with lld if at all
>> possible, as it allows the combination of gc-sections with
>> lto and CONFIG_TRIM_UNUSED_KSYMS.
>>
>> I experimented with lld myself now and I did get a booting
>> kernel even without the the KEEP() on the vectors. I also
>
> When the kernel booted up successfully, was config DCE enabled?

My mistake. I did have DCE enabled in the kernel I built,
but ended up passing an older image to it in the end,
and that did not boot.

Arnd

2024-02-27 08:07:10

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



On 2024/2/23 0:04, Arnd Bergmann wrote:
> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>
>> The position of the caret has been moved below the right brace
>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>> a bug in lld. Perhaps we can temporarily
>> enable the DCE option only when option LD_IS_LLD is disabled,
>> like risc-v:
>>
>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>
> I would really like to see this working with lld if at all
> possible, as it allows the combination of gc-sections with
> lto and CONFIG_TRIM_UNUSED_KSYMS.
>
> I experimented with lld myself now and I did get a booting
> kernel even without the the KEEP() on the vectors. I also
> see that this is the only use of OVERLAY in the kernel, so
> I hope that we can find a way to make it work with existing
> lld after all, either without the KEEP or without the OVERLAY.
>
> Did you see any problems without the KEEP() on the vectors?
>
> Arnd

Hi, Arnd. I have added a global symbol g_keep1 in .vectors, g_keep2 in
vectors.bhb.loop8 and g_keep3 in .vectors.bhb.bpiall respectively. I
also added another section to reference these three global symbols, to
prevent these sections from being removed during linking with --gc-sections.

It worked,but there should be a better way to achieve it. the patch:

diff --git a/arch/arm/include/asm/vmlinux.lds.h
b/arch/arm/include/asm/vmlinux.lds.h
index f2ff79f740ab..d60f6e83a9f7 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -125,13 +125,13 @@
__vectors_lma = .; \
OVERLAY 0xffff0000 : NOCROSSREFS AT(__vectors_lma) { \
.vectors { \
- KEEP(*(.vectors)) \
+ *(.vectors) \
} \
.vectors.bhb.loop8 { \
- KEEP(*(.vectors.bhb.loop8)) \
+ *(.vectors.bhb.loop8) \
} \
.vectors.bhb.bpiall { \
- KEEP(*(.vectors.bhb.bpiall)) \
+ *(.vectors.bhb.bpiall) \
} \
} \
ARM_LMA(__vectors, .vectors); \
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 6150a716828c..84536e805da0 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -1075,6 +1075,9 @@ THUMB( .reloc ., R_ARM_THM_PC12,
L__vector_swi )
W(b) vector_addrexcptn
W(b) vector_irq
W(b) vector_fiq
+ .global g_keep1
+ g_keep1:
+ .word 0

#ifdef CONFIG_HARDEN_BRANCH_HISTORY
.section .vectors.bhb.loop8, "ax", %progbits
@@ -1088,6 +1091,9 @@ THUMB( .reloc ., R_ARM_THM_PC12,
L__vector_bhb_loop8_swi )
W(b) vector_addrexcptn
W(b) vector_bhb_loop8_irq
W(b) vector_bhb_loop8_fiq
+ .global g_keep2
+ g_keep2:
+ .word 0

.section .vectors.bhb.bpiall, "ax", %progbits
W(b) vector_rst
@@ -1100,6 +1106,9 @@ THUMB( .reloc ., R_ARM_THM_PC12,
L__vector_bhb_bpiall_swi )
W(b) vector_addrexcptn
W(b) vector_bhb_bpiall_irq
W(b) vector_bhb_bpiall_fiq
+ .global g_keep3
+ g_keep3:
+ .word 0
#endif

.data
@@ -1108,3 +1117,8 @@ THUMB( .reloc ., R_ARM_THM_PC12,
L__vector_bhb_bpiall_swi )
.globl cr_alignment
cr_alignment:
.space 4
+
+.section .keep_vectors, "ax", %progbits
+ LDR r0, =g_keep1
+ LDR r1, =g_keep2
+ LDR r2, =g_keep3
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 01a887c1141a..5cdfb4ba3ac4 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -62,6 +62,7 @@ SECTIONS
.text : { /* Real text segment */
_stext = .; /* Text and read-only data */
ARM_TEXT
+ KEEP(*(.keep_vectors))
}

#ifdef CONFIG_DEBUG_ALIGN_RODATA
--
2.34.1

> I would really like to see this working with lld if at all
> possible, as it allows the combination of gc-sections with
> lto and CONFIG_TRIM_UNUSED_KSYMS.
Then, I enabled config CONFIG_LTO_CLANG_THIN in arm32, but
came across a link failure using clang/lld:

following symbols must have non local/private scope:
free_mem_end_ptr
free_mem_ptr
malloc_count
malloc_ptr
output_data

in file arch/arm/boot/compressed/Makefile:
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
# ensuring that no private .bss symbols exist, as global symbols
# always have a GOT entry which is what we need.
# The .data section is already discarded by the linker script so no need
# to bother about it here.
check_for_bad_syms = \
bad_syms=$$($(NM) $@ | sed -n 's/^.\{8\} [bc] \(.*\)/\1/p') && \
[ -z "$$bad_syms" ] || \
( echo "following symbols must have non local/private scope:" >&2; \
echo "$$bad_syms" >&2; false )

Turn on the config CONFIG_LTO_CLANG_THIN , use nm to check the type of
these symbols as 'b', turn off the config type to 'B'. I tried to
explicitly declare these variables using
__attribute__((visibility("default")), but it didn't take effect on the
variable `output_data`.
ld.lld: warning:
thinlto-cache/llvmcache-285C4672B80361F1DA67C743A5C8350FDDEEC8E3:(.data.malloc_ptr)
is being placed in '.data.':

ld.lld: warning:
thinlto-cache/llvmcache-285C4672B80361F1DA67C743A5C8350FDDEEC8E3:(.data.malloc_ptr)
is being placed in '.data.'
ld.lld: warning:
thinlto-cache/llvmcache-285C4672B80361F1DA67C743A5C8350FDDEEC8E3:(.data.malloc_count)
is being placed in '.dat'
following symbols must have non local/private scope:
output_data

2024-03-07 03:10:05

by Yuntao Liu

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION



On 2024/2/27 16:06, liuyuntao (F) wrote:
>
>
> On 2024/2/23 0:04, Arnd Bergmann wrote:
>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>
>>> The position of the caret has been moved below the right brace
>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>> a bug in lld. Perhaps we can temporarily
>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>> like risc-v:
>>>
>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>>
>> I would really like to see this working with lld if at all
>> possible, as it allows the combination of gc-sections with
>> lto and CONFIG_TRIM_UNUSED_KSYMS.
>
Another way to preserve .vector sections without using KEEP annotation.
It boots well. How do you feel about this approach? and, could I submit
a v2 patch?

1: Define ARM_VECTORS_TEXT to explicitly preserve .vectors section.

> --- a/arch/arm/include/asm/vmlinux.lds.h
> +++ b/arch/arm/include/asm/vmlinux.lds.h
> @@ -87,6 +87,13 @@
> *(.vfp11_veneer) \
> *(.v4_bx)
>
> +#define ARM_VECTORS_TEXT \
> + .vectors.text : { \
> + KEEP(*(.vectors)) \
> + KEEP(*(.vectors.bhb.loop8)) \
> + KEEP(*(.vectors.bhb.bpiall)) \
> + }
> +
> #define ARM_TEXT \
> IDMAP_TEXT \
> __entry_text_start = .; \

2: Ref ARM_VECTORS_TEXT if config HAVE_LD_DEAD_CODE_DATA_ELIMINATION is
enabled, and the same to arch/arm/kernel/vmlinux.lds.S

> --- a/arch/arm/kernel/vmlinux-xip.lds.S

> +++ b/arch/arm/kernel/vmlinux-xip.lds.S

> @@ -135,6 +135,10 @@ SECTIONS

> ARM_TCM

> #endif

>

> +#ifdef HAVE_LD_DEAD_CODE_DATA_ELIMINATION

> + ARM_VECTORS_TEXT

> +#endif

> +

> /*

> * End of copied data. We need a dummy section to get its
LMA.
> * Also located before final ALIGN() as trailing padding is
> not stored


2024-03-07 07:29:42

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH -next] arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

On Thu, Mar 7, 2024, at 04:09, liuyuntao (F) wrote:
> On 2024/2/27 16:06, liuyuntao (F) wrote:
>>
>>
>> On 2024/2/23 0:04, Arnd Bergmann wrote:
>>> On Thu, Feb 22, 2024, at 12:24, liuyuntao (F) wrote:
>>>>
>>>> The position of the caret has been moved below the right brace
>>>> of { KEEP(*(.vectors.bhb.loop8)) }, indicating that lld is treating
>>>> the entire `KEEP(*(.vectors))` as a file name. This could potentially be
>>>> a bug in lld. Perhaps we can temporarily
>>>> enable the DCE option only when option LD_IS_LLD is disabled,
>>>> like risc-v:
>>>>
>>>> `select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !LD_IS_LLD`.
>>>
>>> I would really like to see this working with lld if at all
>>> possible, as it allows the combination of gc-sections with
>>> lto and CONFIG_TRIM_UNUSED_KSYMS.
>>
> Another way to preserve .vector sections without using KEEP annotation.
> It boots well. How do you feel about this approach? and, could I submit
> a v2 patch?

Yes, if that works, please submit it as v2, we'll see if anyone
has concerns about the new version then.

Arnd