2023-07-17 08:51:17

by Kefeng Wang

Subject: [PATCH] arm64: enable dead code elimination

Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
user to enable dead code elimination. In order for this to work, ensure
that we keep the necessary tables by annotating them with KEEP in the
linker script, and wildcard compiler-generated sections into the right
place.

The following comparison is based on 6.5-rc2 with defconfig:

$ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
Function old new delta
...
Total: Before=17888959, After=17824683, chg -0.36%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
Data old new delta
...
Total: Before=4820808, After=4820764, chg -0.00%

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
RO Data old new delta
...
Total: Before=5179123, After=5178027, chg -0.02%

$ size vmlinux-base vmlinux-new
    text      data     bss       dec      hex  filename
25433734  15385766  630656  41450156  2787aac  vmlinux-base
24756738  15360870  629888  40747496  26dc1e8  vmlinux-new

Memory available after booting on qemu (704K saved):
base: 8084532K/8388608K
new:  8085236K/8388608K

Signed-off-by: Kefeng Wang <[email protected]>
---
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/vmlinux.lds.S | 5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a2511b30d0f6..73bb908ec62f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,6 +148,7 @@ config ARM64
select GENERIC_VDSO_TIME_NS
select HARDIRQS_SW_RESEND
select HAS_IOPORT
+ select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_PCI
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 3cd7e76cc562..bb4ce6cd6896 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -238,7 +238,7 @@ SECTIONS
. = ALIGN(4);
.altinstructions : {
__alt_instructions = .;
- *(.altinstructions)
+ KEEP(*(.altinstructions))
__alt_instructions_end = .;
}

@@ -258,8 +258,9 @@ SECTIONS
INIT_CALLS
CON_INITCALL
INIT_RAM_FS
- *(.init.altinstructions .init.bss) /* from the EFI stub */
+ KEEP(*(.init.altinstructions .init.bss*)) /* from the EFI stub */
}
+
.exit.data : {
EXIT_DATA
}
--
2.27.0



2023-07-17 10:03:29

by Will Deacon

Subject: Re: [PATCH] arm64: enable dead code elimination

On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and wildcard
> compiler generated sections into the right place.
>
> The following comparison is based 6.5-rc2 with defconfig,
>
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> Function old new delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data old new delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data old new delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
>
> $ size vmlinux-base vmlinux
> text data bss dec hex filename
> 25433734 15385766 630656 41450156 2787aac vmlinux-base
> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new: 8085236K/8388608K

Is that a 0.009% improvement? Is it really worth the hassle?

x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
like we're just creating a rod for our own back by selecting it.

Will

> Signed-off-by: Kefeng Wang <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/kernel/vmlinux.lds.S | 5 +++--
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2511b30d0f6..73bb908ec62f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -148,6 +148,7 @@ config ARM64
> select GENERIC_VDSO_TIME_NS
> select HARDIRQS_SW_RESEND
> select HAS_IOPORT
> + select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> select HAVE_MOVE_PMD
> select HAVE_MOVE_PUD
> select HAVE_PCI
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 3cd7e76cc562..bb4ce6cd6896 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -238,7 +238,7 @@ SECTIONS
> . = ALIGN(4);
> .altinstructions : {
> __alt_instructions = .;
> - *(.altinstructions)
> + KEEP(*(.altinstructions))
> __alt_instructions_end = .;
> }
>
> @@ -258,8 +258,9 @@ SECTIONS
> INIT_CALLS
> CON_INITCALL
> INIT_RAM_FS
> - *(.init.altinstructions .init.bss) /* from the EFI stub */
> + KEEP(*(.init.altinstructions .init.bss*)) /* from the EFI stub */
> }
> +
> .exit.data : {
> EXIT_DATA
> }
> --
> 2.27.0
>
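
(Editorial aside, not part of the original thread: the percentage Will asks about can be checked from the figures quoted above. The following is a minimal Python sketch; the variable names are illustrative, and the `size` totals are included only for comparison with the boot-time memory figure.)

```python
# Figures copied from the patch description quoted above.
base_free_kib = 8084532   # "base: 8084532K/8388608K"
new_free_kib = 8085236    # "new:  8085236K/8388608K"

# Memory gained after boot, absolute and relative to the baseline.
saved_kib = new_free_kib - base_free_kib
mem_pct = 100.0 * saved_kib / base_free_kib

# Total image size ("dec" column of the `size` output), for comparison.
base_dec = 41450156
new_dec = 40747496
image_saved = base_dec - new_dec
image_pct = 100.0 * image_saved / base_dec

print(f"free memory: +{saved_kib}K ({mem_pct:.4f}%)")
print(f"image size:  -{image_saved} bytes ({image_pct:.2f}%)")
```

The relative gain in free memory rounds to the 0.009% Will mentions, while the `size` totals shrink by roughly 1.7%; the two numbers use different denominators (8 GiB of guest RAM versus the image itself).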

2023-07-17 10:34:00

by Marc Zyngier

Subject: Re: [PATCH] arm64: enable dead code elimination

On 2023-07-17 09:07, Kefeng Wang wrote:
> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing
> the
> user to enable dead code elimination. In order for this to work, ensure
> that we keep the necessary tables by annotating them with KEEP, also it
> requires further changes to linker script to KEEP some tables and
> wildcard
> compiler generated sections into the right place.
>
> The following comparison is based 6.5-rc2 with defconfig,
>
> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980
> (-64276)
> Function old new delta
> ...
> Total: Before=17888959, After=17824683, chg -0.36%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> Data old new delta
> ...
> Total: Before=4820808, After=4820764, chg -0.00%
>
> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> RO Data old new delta
> ...
> Total: Before=5179123, After=5178027, chg -0.02%
>
> $ size vmlinux-base vmlinux
> text data bss dec hex filename
> 25433734 15385766 630656 41450156 2787aac vmlinux-base
> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>
> Memory available after booting, saving 704k on qemu,
> base: 8084532K/8388608K
> new: 8085236K/8388608K
>
> Signed-off-by: Kefeng Wang <[email protected]>

I took this patch for a spin in my tree, and ended up with:

CC .vmlinux.export.o
UPD include/generated/utsversion.h
CC init/version-timestamp.o
LD .tmp_vmlinux.kallsyms1
ld: init/main.o(__patchable_function_entries): error: need linked-to
section for --gc-sections
make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux]
Error 2
make: *** [Makefile:234: __sub-make] Error 2

so it's probably not ready for prime time.

M.
--
Jazz is not dead. It just smells funny...

2023-07-17 11:41:49

by Kefeng Wang

Subject: Re: [PATCH] arm64: enable dead code elimination



On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function old new delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data old new delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data old new delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>> text data bss dec hex filename
>> 25433734 15385766 630656 41450156 2787aac vmlinux-base
>> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new: 8085236K/8388608K
>
> Is that a 0.009% improvement? Is it really worth the hassle?
>
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.


LD_DEAD_CODE_DATA_ELIMINATION is mainly useful for small configs on small
systems. risc-v targets resource-limited boards, while x86 may have no
strong requirement for it. We will try to use it on some embedded boards;
if no one tries it, this feature will never become stable :)

>
> Will
>

2023-07-17 12:00:05

by Kefeng Wang

Subject: Re: [PATCH] arm64: enable dead code elimination



On 2023/7/17 17:42, Marc Zyngier wrote:
> On 2023-07-17 09:07, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and
>> wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function                                     old     new   delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data                                         old     new   delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data                                      old     new   delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>>    text       data         bss      dec       hex    filename
>> 25433734  15385766  630656  41450156  2787aac    vmlinux-base
>> 24756738  15360870  629888  40747496  26dc1e8    vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new:  8085236K/8388608K
>>
>> Signed-off-by: Kefeng Wang <[email protected]>
>
> I took this patch for a spin in my tree, and ended up with:
>
>   CC      .vmlinux.export.o
>   UPD     include/generated/utsversion.h
>   CC      init/version-timestamp.o
>   LD      .tmp_vmlinux.kallsyms1
> ld: init/main.o(__patchable_function_entries): error: need linked-to
> section for --gc-sections
> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238: vmlinux]
> Error 2
> make: *** [Makefile:234: __sub-make] Error 2

I don't see this error with CONFIG_FTRACE_MCOUNT_RECORD or
allyesconfig. Does it need a special config or gcc version?
>
> so it's probably not ready for prime time.
>
>         M.

2023-07-17 12:32:54

by Marc Zyngier

Subject: Re: [PATCH] arm64: enable dead code elimination

On Mon, 17 Jul 2023 12:56:39 +0100,
Kefeng Wang <[email protected]> wrote:
>
>
>
> On 2023/7/17 17:42, Marc Zyngier wrote:
> > On 2023-07-17 09:07, Kefeng Wang wrote:
> >> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
> >> user to enable dead code elimination. In order for this to work, ensure
> >> that we keep the necessary tables by annotating them with KEEP, also it
> >> requires further changes to linker script to KEEP some tables and
> >> wildcard
> >> compiler generated sections into the right place.
> >>
> >> The following comparison is based 6.5-rc2 with defconfig,
> >>
> >> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
> >> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
> >> Function                                     old     new   delta
> >> ...
> >> Total: Before=17888959, After=17824683, chg -0.36%
> >>
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
> >> Data                                         old     new   delta
> >> ...
> >> Total: Before=4820808, After=4820764, chg -0.00%
> >>
> >> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
> >> RO Data                                      old     new   delta
> >> ...
> >> Total: Before=5179123, After=5178027, chg -0.02%
> >>
> >> $ size vmlinux-base vmlinux
> >>    text       data         bss      dec       hex    filename
> >> 25433734  15385766  630656  41450156  2787aac    vmlinux-base
> >> 24756738  15360870  629888  40747496  26dc1e8    vmlinux-new
> >>
> >> Memory available after booting, saving 704k on qemu,
> >> base: 8084532K/8388608K
> >> new:  8085236K/8388608K
> >>
> >> Signed-off-by: Kefeng Wang <[email protected]>
> >
> > I took this patch for a spin in my tree, and ended up with:
> >
> >   CC      .vmlinux.export.o
> >   UPD     include/generated/utsversion.h
> >   CC      init/version-timestamp.o
> >   LD      .tmp_vmlinux.kallsyms1
> > ld: init/main.o(__patchable_function_entries): error: need linked-to
> > section for --gc-sections
> > make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
> > make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238:
> > vmlinux] Error 2
> > make: *** [Makefile:234: __sub-make] Error 2
>
> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
> allyesconfig, does it need special config or gcc version?

You tell me!

gcc (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

so hardly something special. This is built with the current state of
my NV tree, available here[1]. As for the configuration, have a look
here[2].

M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
[2] https://paste.debian.net/1286106/

--
Without deviation from the norm, progress is not possible.

2023-07-18 11:24:35

by Kefeng Wang

Subject: Re: [PATCH] arm64: enable dead code elimination



On 2023/7/17 20:15, Marc Zyngier wrote:
> On Mon, 17 Jul 2023 12:56:39 +0100,
> Kefeng Wang <[email protected]> wrote:
>>
>>
>>
>> On 2023/7/17 17:42, Marc Zyngier wrote:
>>> On 2023-07-17 09:07, Kefeng Wang wrote:
>>>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>>>> user to enable dead code elimination. In order for this to work, ensure
>>>> that we keep the necessary tables by annotating them with KEEP, also it
>>>> requires further changes to linker script to KEEP some tables and
>>>> wildcard
>>>> compiler generated sections into the right place.
>>>>
>>>> The following comparison is based 6.5-rc2 with defconfig,
>>>>
...
>>>
>>> I took this patch for a spin in my tree, and ended up with:
>>>
>>>   CC      .vmlinux.export.o
>>>   UPD     include/generated/utsversion.h
>>>   CC      init/version-timestamp.o
>>>   LD      .tmp_vmlinux.kallsyms1
>>> ld: init/main.o(__patchable_function_entries): error: need linked-to
>>> section for --gc-sections
>>> make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
>>> make[1]: *** [/home/maz/hot-poop/arm-platforms/Makefile:1238:
>>> vmlinux] Error 2
>>> make: *** [Makefile:234: __sub-make] Error 2
>>
>> I don't find this error with CONFIG_FTRACE_MCOUNT_RECORD or
>> allyesconfig, does it need special config or gcc version?
>
> You tell me!
>
> gcc (Debian 10.2.1-6) 10.2.1 20210110
> Copyright (C) 2020 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> so hardly something special. This is built with the current state of
> my NV tree, available here[1] As for the configuration, have a look
> here[2].

1) With gcc 10.3.1/ld (GNU Binutils) 2.37, the error can be reproduced,
but there is no issue with the cross-compiler gcc 9.3/ld (GNU Binutils for
Ubuntu) 2.34.

2) With allyesconfig on arm64 there is the same issue that commit
f7584322e4fe ("riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD")
describes: the link takes too long, with most of the time spent in
bfd_flavour_name():

Samples: 257K of event 'cycles', Event count (approx.): 203974259359

  Overhead  Shared Object         Symbol                 IPC  [IPC Coverage]
-   61.11%  libbfd-2.34-arm64.so  [.] bfd_flavour_name   -    -
     bfd_flavour_name

-    6.55%  libbfd-2.34-arm64.so  [.] bfd_hash_traverse  -    -


Just like you said, it is not ready for prime time, so please ignore
this patch :(


>
> M.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-6.6-WIP
> [2] https://paste.debian.net/1286106/
>

2024-01-25 13:50:34

by Yuntao Liu

Subject: Re: [PATCH] arm64: enable dead code elimination

On 2023/7/17 17:24, Will Deacon wrote:
> On Mon, Jul 17, 2023 at 04:07:39PM +0800, Kefeng Wang wrote:
>> Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for arm64, allowing the
>> user to enable dead code elimination. In order for this to work, ensure
>> that we keep the necessary tables by annotating them with KEEP, also it
>> requires further changes to linker script to KEEP some tables and wildcard
>> compiler generated sections into the right place.
>>
>> The following comparison is based 6.5-rc2 with defconfig,
>>
>> $ ./scripts/bloat-o-meter vmlinux-base vmlinux-new
>> add/remove: 3/1106 grow/shrink: 4102/6964 up/down: 35704/-99980 (-64276)
>> Function old new delta
>> ...
>> Total: Before=17888959, After=17824683, chg -0.36%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-44 (-44)
>> Data old new delta
>> ...
>> Total: Before=4820808, After=4820764, chg -0.00%
>>
>> add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-1096 (-1096)
>> RO Data old new delta
>> ...
>> Total: Before=5179123, After=5178027, chg -0.02%
>>
>> $ size vmlinux-base vmlinux
>> text data bss dec hex filename
>> 25433734 15385766 630656 41450156 2787aac vmlinux-base
>> 24756738 15360870 629888 40747496 26dc1e8 vmlinux-new
>>
>> Memory available after booting, saving 704k on qemu,
>> base: 8084532K/8388608K
>> new: 8085236K/8388608K
>
> Is that a 0.009% improvement? Is it really worth the hassle?
>
> x86 doesn't select this and risc-v had to turn it off for LLD, so it feels
> like we're just creating a rod for our own back by selecting it.

I tested this patch and found that the smaller the config, the more
significant the reduction in build size. This may be useful in scenarios
such as embedded systems, where size is particularly critical.

As with selecting CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V,
this boots well on qemu. With defconfig it shrinks the build by ~1.6%,
and with tinyconfig it shrinks the build by ~18.7%.

defconfig:
    text      data     bss       dec      hex
26839348  16695234  629456  44164038  2a1e3c6  before
26140556  16667058  628880  43436494  296c9ce  after

tinyconfig:
   text    data     bss      dec     hex
1259568  272100  104312  1635980  18f68c  before
 967056  258716  103824  1329596  1449bc  after

        | tinyconfig       | defconfig
--------|------------------|-----------------
No DCE  | 1635980          | 44164038
DCE     | 1329596          | 43436494
Shrink  | 306384 (~18.7%)  | 727544 (~1.6%)

>
> Will
>
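
(Editorial aside, not part of the original thread: the Shrink row above can be re-derived from the `size` rows. The following is a minimal Python sketch; the helper names `parse_size_row` and `shrink` are hypothetical, and each row is assumed to follow the "text data bss dec hex label" layout shown in the message.)

```python
def parse_size_row(row):
    """Parse one `size` row laid out as: text data bss dec hex label."""
    text, data, bss, dec, hex_total, label = row.split()
    total = int(dec)
    # Cross-check the row: dec must equal text+data+bss and match the hex column.
    if total != int(text) + int(data) + int(bss) or total != int(hex_total, 16):
        raise ValueError(f"inconsistent size row: {row!r}")
    return label, total


def shrink(before_row, after_row):
    """Return (bytes saved, percent saved) between two `size` rows."""
    _, before = parse_size_row(before_row)
    _, after = parse_size_row(after_row)
    saved = before - after
    return saved, 100.0 * saved / before


# Rows copied from the message above.
tiny = shrink("1259568 272100 104312 1635980 18f68c before",
              "967056 258716 103824 1329596 1449bc after")
defc = shrink("26839348 16695234 629456 44164038 2a1e3c6 before",
              "26140556 16667058 628880 43436494 296c9ce after")

for name, (saved, pct) in (("tinyconfig", tiny), ("defconfig", defc)):
    print(f"{name}: {saved} bytes saved (~{pct:.1f}%)")
```

The consistency check in `parse_size_row` is a design choice for this sketch: it catches copy-and-paste slips in the table before any percentage is computed.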