Hi Linus,
Please pull this Clang Link Time Optimization series for v5.12-rc1.
This has been in linux-next for the entire last development cycle,
and is built on the work done preparing[0] for LTO by arm64 folks,
tracing folks, etc. This series includes the core changes as well as
the remaining pieces for arm64 (LTO has been the default build method on
Android for about 3 years now, as it is the prerequisite for the Control
Flow Integrity protections).
While x86 LTO enablement is done[1], it depends on some objtool
clean-ups[2], though it appears those actually have been in linux-next
(via tip/objtool/core), so it's possible that if that tree lands, I'll
send a "part 2" pull request for LTO that includes x86 support (though
I guess that depends on the length of the merge window).
For merge log posterity, and as detailed in commit dc5723b02e52 ("kbuild:
add support for Clang LTO"), here is the tl;dr to do an LTO build:
make LLVM=1 LLVM_IAS=1 defconfig
scripts/config -e LTO_CLANG_THIN
make LLVM=1 LLVM_IAS=1
(To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-"
and "ARCH=arm64" to the "make" command lines.)
Thanks!
-Kees
[0] https://git.kernel.org/linus/3c09ec59cdea5b132212d97154d625fd34e436dd
[1] https://github.com/samitolvanen/linux/commits/clang-lto
[2] https://lore.kernel.org/lkml/[email protected]/
The following changes since commit e71ba9452f0b5b2e8dc8aa5445198cd9214a6a62:
Linux 5.11-rc2 (2021-01-03 15:55:30 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/clang-lto-v5.12-rc1
for you to fetch changes up to 2b8689520520175075ca97bc4eaf51ff3f7253aa:
kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds (2021-02-17 10:10:37 -0800)
----------------------------------------------------------------
clang-lto series for v5.12-rc1
- Clang LTO build infrastructure and arm64-specific enablement (Sami Tolvanen)
- Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)
----------------------------------------------------------------
Alexander Lobakin (1):
kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds
Sami Tolvanen (16):
tracing: move function tracer options to Kconfig
kbuild: add support for Clang LTO
kbuild: lto: fix module versioning
kbuild: lto: limit inlining
kbuild: lto: merge module sections
kbuild: lto: add a default list of used symbols
init: lto: ensure initcall ordering
init: lto: fix PREL32 relocations
PCI: Fix PREL32 relocations for LTO
modpost: lto: strip .lto from module names
scripts/mod: disable LTO for empty.c
efi/libstub: disable LTO
drivers/misc/lkdtm: disable LTO for rodata.o
arm64: vdso: disable LTO
arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
arm64: allow LTO to be selected
.gitignore | 1 +
Makefile | 45 ++++--
arch/Kconfig | 90 ++++++++++++
arch/arm64/Kconfig | 4 +
arch/arm64/kernel/vdso/Makefile | 3 +-
drivers/firmware/efi/libstub/Makefile | 2 +
drivers/misc/lkdtm/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 11 +-
include/linux/init.h | 79 ++++++++--
include/linux/pci.h | 27 +++-
init/Kconfig | 1 +
kernel/trace/Kconfig | 16 ++
scripts/Makefile.build | 48 +++++-
scripts/Makefile.lib | 6 +-
scripts/Makefile.modfinal | 9 +-
scripts/Makefile.modpost | 25 +++-
scripts/generate_initcall_order.pl | 270 ++++++++++++++++++++++++++++++++++
scripts/link-vmlinux.sh | 70 +++++++--
scripts/lto-used-symbollist.txt | 5 +
scripts/mod/Makefile | 1 +
scripts/mod/modpost.c | 16 +-
scripts/mod/modpost.h | 9 ++
scripts/mod/sumversion.c | 6 +-
scripts/module.lds.S | 24 +++
24 files changed, 707 insertions(+), 62 deletions(-)
create mode 100755 scripts/generate_initcall_order.pl
create mode 100644 scripts/lto-used-symbollist.txt
--
Kees Cook
On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <[email protected]> wrote:
>
> While x86 LTO enablement is done[1], it depends on some objtool
> clean-ups[2], though it appears those actually have been in linux-next
> (via tip/objtool/core), so it's possible that if that tree lands [..]
That tree is actually next on my list of things to merge after this
one, so it should be out soonish.
Linus
The pull request you sent on Mon, 22 Feb 2021 15:11:19 -0800:
> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git tags/clang-lto-v5.12-rc1
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/79db4d2293eba2ce6265a341bedf6caecad5eeb3
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
On Tue, Feb 23, 2021 at 9:49 AM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <[email protected]> wrote:
> >
> > While x86 LTO enablement is done[1], it depends on some objtool
> > clean-ups[2], though it appears those actually have been in linux-next
> > (via tip/objtool/core), so it's possible that if that tree lands [..]
>
> That tree is actually next on my list of things to merge after this
> one, so it should be out soonish.
"soonish" turned out to be later than I thought, because my "build
changes" set of pulls included the module change that I then wasted a
lot of time on trying to figure out why it slowed down my build so
much.
But it's out now, as pr-tracker-bot already noted.
Linus
On Tue, Feb 23, 2021 at 12:33:05PM -0800, Linus Torvalds wrote:
> On Tue, Feb 23, 2021 at 9:49 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <[email protected]> wrote:
> > >
> > > While x86 LTO enablement is done[1], it depends on some objtool
> > > clean-ups[2], though it appears those actually have been in linux-next
> > > (via tip/objtool/core), so it's possible that if that tree lands [..]
> >
> > That tree is actually next on my list of things to merge after this
> > one, so it should be out soonish.
>
> "soonish" turned out to be later than I thought, because my "build
> changes" set of pulls included the module change that I then wasted a
> lot of time on trying to figure out why it slowed down my build so
> much.
>
> But it's out now, as pr-tracker-bot already noted.
Great! Thanks for the updates; I'll prepare "part 2" right away. :)
--
Kees Cook
From: Linus Torvalds <[email protected]>
Date: Tue, 23 Feb 2021 12:33:05 -0800
> On Tue, Feb 23, 2021 at 9:49 AM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <[email protected]> wrote:
> > >
> > > While x86 LTO enablement is done[1], it depends on some objtool
> > > clean-ups[2], though it appears those actually have been in linux-next
> > > (via tip/objtool/core), so it's possible that if that tree lands [..]
> >
> > That tree is actually next on my list of things to merge after this
> > one, so it should be out soonish.
>
> "soonish" turned out to be later than I thought, because my "build
> changes" set of pulls included the module change that I then wasted a
> lot of time on trying to figure out why it slowed down my build so
> much.
I guess it's about CONFIG_TRIM_UNUSED_KSYMS you disabled in your tree.
Well, it's actually widely used, mostly in the embedded world where
there are often no out-of-tree modules, but a need to save as much
space as possible.
For full-blown systems and distributions it's almost needless, right.
> But it's out now, as pr-tracker-bot already noted.
>
> Linus
Thanks,
Al
On Wed, Feb 24, 2021 at 1:10 AM Alexander Lobakin <[email protected]> wrote:
> From: Linus Torvalds <[email protected]>
> Date: Tue, 23 Feb 2021 12:33:05 -0800
>
> > On Tue, Feb 23, 2021 at 9:49 AM Linus Torvalds <[email protected]> wrote:
> > > On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <[email protected]> wrote:
> > > >
> > > > While x86 LTO enablement is done[1], it depends on some objtool
> > > > clean-ups[2], though it appears those actually have been in linux-next
> > > > (via tip/objtool/core), so it's possible that if that tree lands [..]
> > >
> > > That tree is actually next on my list of things to merge after this
> > > one, so it should be out soonish.
> >
> > "soonish" turned out to be later than I thought, because my "build
> > changes" set of pulls included the module change that I then wasted a
> > lot of time on trying to figure out why it slowed down my build so
> > much.
>
> I guess it's about CONFIG_TRIM_UNUSED_KSYMS you disabled in your tree.
> Well, it's actually widely used, mostly in the embedded world where
> there are often no out-of-tree modules, but a need to save as much
> space as possible.
> For full-blown systems and distributions it's almost needless, right.
Generally, CONFIG_TRIM_UNUSED_KSYMS helps mostly
when combined with either LTO or --gc-sections
(CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION), though
the effect seems to be smaller than I expected. For example on m68k:
   text    data    bss     dec    hex filename
4005135 1374302 167108 5546545 54a231 vmlinux-normal
3916254 1378078 167108 5461440 5355c0 vmlinux+trim
4012933 1362514 164280 5539727 54878f vmlinux+gcsection
3797884 1334194 164640 5296718 50d24e vmlinux+gcsection+trim
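To put the m68k numbers in relative terms, here is a small Python sketch. It is not from the thread; it only reuses the figures above and assumes the columns are size(1)'s Berkeley format (text, data, bss, dec, hex). The names `M68K_SIZES` and `savings_vs` are hypothetical, chosen just for this illustration.

```python
# Sizes from the m68k size(1) output above: (text, data, bss, dec) per build.
M68K_SIZES = {
    "vmlinux-normal": (4005135, 1374302, 167108, 5546545),
    "vmlinux+trim": (3916254, 1378078, 167108, 5461440),
    "vmlinux+gcsection": (4012933, 1362514, 164280, 5539727),
    "vmlinux+gcsection+trim": (3797884, 1334194, 164640, 5296718),
}


def savings_vs(baseline: str, sizes: dict) -> dict:
    """Return {build: (bytes saved, percent saved)} relative to `baseline`,
    using the dec column (total of text + data + bss)."""
    base_dec = sizes[baseline][3]
    result = {}
    for name, (text, data, bss, dec) in sizes.items():
        # Sanity check: dec should equal text + data + bss, as size(1) prints it.
        assert text + data + bss == dec, name
        saved = base_dec - dec
        result[name] = (saved, 100.0 * saved / base_dec)
    return result


if __name__ == "__main__":
    for name, (saved, pct) in savings_vs("vmlinux-normal", M68K_SIZES).items():
        print(f"{name:24} {saved:8d} bytes  {pct:5.2f}%")
```

With these figures, trimming alone saves about 1.5% of the total image and --gc-sections alone about 0.1%, while the two together save about 4.5%, which is the "helps mostly when combined" effect described above.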
For arm64 defconfig, CONFIG_TRIM_UNUSED_KSYMS saves around
700KB by itself, or when combined with either gc-sections or LTO,
but saves a full megabyte when all three are combined:
    text     data    bss      dec     hex filename
16570322 10998617 506468 28075407 1ac658f defconfig/vmlinux
16318793 10569913 506468 27395174 1a20466 trim_defconfig/vmlinux
16281234 10984848 504291 27770373 1a7be05 gc_defconfig/vmlinux
16029705 10556880 504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
17040142 11102945 504196 28647283 1b51f73 thinlto_defconfig/vmlinux
16788613 10663201 504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
16347062 11043384 502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
15759453 10532792 502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
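A similar sketch for the arm64 table pairs each build with its "+trim" variant and subtracts the dec totals, as a way to double-check the ~700KB and one-megabyte figures quoted above. Again, `ARM64_DEC` and `trim_savings` are hypothetical names used only here, and the pairing relies on nothing more than the file names in the table.

```python
# Decimal totals (the "dec" column) from the arm64 defconfig table above, in bytes.
ARM64_DEC = {
    "defconfig": 28075407,
    "trim_defconfig": 27395174,
    "gc_defconfig": 27770373,
    "gc+trim_defconfig": 27090940,
    "thinlto_defconfig": 28647283,
    "thinlto+trim_defconfig": 27956010,
    "gc+thinlto_defconfig": 27892945,
    "gc+thinlto+trim_defconfig": 26794640,
}


def trim_savings(dec: dict) -> dict:
    """For each build without trim, return the bytes saved by its +trim variant."""
    savings = {}
    for name, size in dec.items():
        if "trim" in name:
            continue
        # Derive the matching variant name, e.g. gc_defconfig -> gc+trim_defconfig
        prefix = name[: -len("defconfig")].rstrip("_")
        trimmed = f"{prefix}+trim_defconfig" if prefix else "trim_defconfig"
        savings[name] = size - dec[trimmed]
    return savings


if __name__ == "__main__":
    for name, saved in trim_savings(ARM64_DEC).items():
        print(f"{name:22} +trim saves {saved:8d} bytes ({saved / 1024:.0f} KiB)")
```

On these numbers the trim saving is roughly 680 to 690 kB (decimal) for the plain, gc-sections-only and ThinLTO-only builds, and about 1.1 MB when gc-sections, ThinLTO and trimming are all enabled.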
However, the combination of thinlto and trim indeed has a steep
cost in compile time, taking almost twice as long as a normal
defconfig (gc-sections makes it slightly faster).
==== defconfig ====
332.001786355 seconds time elapsed
8599.464163000 seconds user
676.919635000 seconds sys
==== trim_defconfig ====
448.378576012 seconds time elapsed
10735.489271000 seconds user
964.006504000 seconds sys
==== gc_defconfig ====
324.347492236 seconds time elapsed
8465.785800000 seconds user
614.899797000 seconds sys
==== gc+trim_defconfig ====
429.188875620 seconds time elapsed
10203.759658000 seconds user
871.307973000 seconds sys
==== thinlto_defconfig ====
389.793540200 seconds time elapsed
9491.665320000 seconds user
664.858109000 seconds sys
==== thinlto+trim_defconfig ====
580.431820561 seconds time elapsed
11429.515538000 seconds user
1056.985745000 seconds sys
==== gc+thinlto_defconfig ====
389.484364525 seconds time elapsed
9473.831980000 seconds user
675.057675000 seconds sys
==== gc+thinlto+trim_defconfig ====
580.824912807 seconds time elapsed
11433.650337000 seconds user
1049.845569000 seconds sys
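The elapsed times can also be normalized against the plain defconfig build. This is only a sketch that reuses the "seconds time elapsed" lines above; the user and sys figures are ignored, and `ELAPSED` and `relative_build_time` are made-up names for the example.

```python
# Wall-clock "seconds time elapsed" figures from the timings above.
ELAPSED = {
    "defconfig": 332.001786355,
    "trim_defconfig": 448.378576012,
    "gc_defconfig": 324.347492236,
    "gc+trim_defconfig": 429.188875620,
    "thinlto_defconfig": 389.793540200,
    "thinlto+trim_defconfig": 580.431820561,
    "gc+thinlto_defconfig": 389.484364525,
    "gc+thinlto+trim_defconfig": 580.824912807,
}


def relative_build_time(elapsed: dict, baseline: str = "defconfig") -> dict:
    """Return each build's elapsed time as a multiple of the baseline build."""
    base = elapsed[baseline]
    return {name: t / base for name, t in elapsed.items()}


if __name__ == "__main__":
    for name, ratio in relative_build_time(ELAPSED).items():
        print(f"{name:26} {ratio:.2f}x")
```

This puts thinlto+trim at roughly 1.75x the defconfig build time, i.e. the "almost twice as long" above. gc-sections shortens the non-LTO builds a little (about 2% for defconfig, about 4% for trim), while with ThinLTO enabled the gc and non-gc timings differ by well under a second.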
Arnd