2020-11-18 22:09:32

by Sami Tolvanen

Subject: [PATCH v7 00/17] Add support for Clang LTO

This patch series adds support for building the kernel with Clang's
Link Time Optimization (LTO). In addition to performance, the primary
motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
be used in the kernel. Google has shipped millions of Pixel devices
running three major kernel versions with LTO+CFI since 2018.

Most of the patches are build system changes for handling LLVM bitcode,
which Clang produces with LTO instead of ELF object files, postponing
ELF processing until a later stage, and ensuring initcall ordering.

Note that v7 brings back arm64 support as Will has now staged the
prerequisite memory ordering patches [1], and drops x86_64 while we work
on fixing the remaining objtool warnings [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
[2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/

You can also pull this series from

https://github.com/samitolvanen/linux.git lto-v7

---
Changes in v7:

- Rebased to master again.

- Added back arm64 patches as the prerequisites are now staged,
and dropped x86_64 support until the remaining objtool issues
are resolved.

- Dropped ifdefs from module.lds.S.

Changes in v6:

- Added the missing --mcount flag to patch 5.

- Dropped the arm64 patches from this series and will repost them
later.

Changes in v5:

- Rebased on top of tip/master.

- Changed the command line for objtool to use --vmlinux --duplicate
to disable warnings about retpoline thunks and to fix .orc_unwind
generation for vmlinux.o.

- Added --noinstr flag to objtool, so we can use --vmlinux without
also enabling noinstr validation.

- Disabled objtool's unreachable instruction warnings with LTO to
avoid false positives for the int3 padding in vmlinux.o.

- Added ANNOTATE_RETPOLINE_SAFE annotations to the indirect jumps
in x86 assembly code to fix objtool warnings with retpoline.

- Fixed modpost warnings about missing version information with
CONFIG_MODVERSIONS.

- Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
to Sedat for pointing this out.

- Updated the help text for ThinLTO to better explain the trade-offs.

- Updated commit messages with better explanations.

Changes in v4:

- Fixed a typo in Makefile.lib to correctly pass --no-fp to objtool.

- Moved ftrace configs related to generating __mcount_loc to Kconfig,
so they are available also in Makefile.modfinal.

- Dropped two prerequisite patches that were merged to Linus' tree.

Changes in v3:

- Added a separate patch to remove the unused DISABLE_LTO treewide,
as filtering out CC_FLAGS_LTO instead is preferred.

- Updated the Kconfig help to explain why LTO is behind a choice
and disabled by default.

- Dropped CC_FLAGS_LTO_CLANG, compiler-specific LTO flags are now
appended directly to CC_FLAGS_LTO.

- Updated $(AR) flags as KBUILD_ARFLAGS was removed earlier.

- Fixed ThinLTO cache handling for external module builds.

- Rebased on top of Masahiro's patch for preprocessing module.lds,
and moved the contents of module-lto.lds to module.lds.S.

- Moved objtool_args to Makefile.lib to avoid duplication of the
command line parameters in Makefile.modfinal.

- Clarified in the commit message for the initcall ordering patch
that the initcall order remains the same as without LTO.

- Changed link-vmlinux.sh to use jobserver-exec to control the
number of jobs started by generate_initcall_order.pl.

- Dropped the x86/relocs patch to whitelist L4_PAGE_OFFSET as it's
no longer needed with ToT kernel.

- Disabled LTO for arch/x86/power/cpu.c to work around a Clang bug
with stack protector attributes.

Changes in v2:

- Fixed -Wmissing-prototypes warnings with W=1.

- Dropped cc-option from -fsplit-lto-unit and added .thinlto-cache
scrubbing to make distclean.

- Added a comment about Clang >=11 being required.

- Added a patch to disable LTO for the arm64 KVM nVHE code.

- Disabled objtool's noinstr validation with LTO unless enabled.

- Included Peter's proposed objtool mcount patch in the series
and replaced recordmcount with the objtool pass to avoid
whitelisting relocations that are not calls.

- Updated several commit messages with better explanations.


Sami Tolvanen (17):
tracing: move function tracer options to Kconfig
kbuild: add support for Clang LTO
kbuild: lto: fix module versioning
kbuild: lto: limit inlining
kbuild: lto: merge module sections
kbuild: lto: remove duplicate dependencies from .mod files
init: lto: ensure initcall ordering
init: lto: fix PREL32 relocations
PCI: Fix PREL32 relocations for LTO
modpost: lto: strip .lto from module names
scripts/mod: disable LTO for empty.c
efi/libstub: disable LTO
drivers/misc/lkdtm: disable LTO for rodata.o
arm64: vdso: disable LTO
KVM: arm64: disable LTO for the nVHE directory
arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
arm64: allow LTO_CLANG and THINLTO to be selected

.gitignore | 1 +
Makefile | 45 +++--
arch/Kconfig | 74 +++++++
arch/arm64/Kconfig | 4 +
arch/arm64/kernel/vdso/Makefile | 3 +-
arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
drivers/firmware/efi/libstub/Makefile | 2 +
drivers/misc/lkdtm/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 11 +-
include/linux/init.h | 79 +++++++-
include/linux/pci.h | 19 +-
kernel/trace/Kconfig | 16 ++
scripts/Makefile.build | 50 ++++-
scripts/Makefile.lib | 6 +-
scripts/Makefile.modfinal | 9 +-
scripts/Makefile.modpost | 25 ++-
scripts/generate_initcall_order.pl | 270 ++++++++++++++++++++++++++
scripts/link-vmlinux.sh | 70 ++++++-
scripts/mod/Makefile | 1 +
scripts/mod/modpost.c | 16 +-
scripts/mod/modpost.h | 9 +
scripts/mod/sumversion.c | 6 +-
scripts/module.lds.S | 24 +++
23 files changed, 677 insertions(+), 68 deletions(-)
create mode 100755 scripts/generate_initcall_order.pl


base-commit: 0fa8ee0d9ab95c9350b8b84574824d9a384a9f7d
--
2.29.2.299.gdc1121823c-goog


2020-11-18 22:09:56

by Sami Tolvanen

Subject: [PATCH v7 07/17] init: lto: ensure initcall ordering

With LTO, the compiler doesn't necessarily obey the link order for
initcalls, and initcall variables need globally unique names to avoid
collisions at link time.

This change exports __KBUILD_MODNAME and adds the initcall_id() macro,
which uses it together with __COUNTER__ and __LINE__ to help ensure
these variables have unique names, and moves each variable to its own
section when LTO is enabled, so the correct order can be specified using
a linker script.
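
For illustration only (this is not part of the patch), the symbol names
produced by these macros can be taken apart with a small shell sketch. It
assumes the layout __initcall__<modname>__<counter>_<line>_<fn><level>;
parse_initcall is a hypothetical helper, and the module name kmod_foo, the
function foo_init and the numbers are made up:

```shell
#!/bin/sh
# Sketch: split an initcall variable name into its parts.
# Prints "<modname> <counter> <line> <fn> <level>", or nothing if the
# argument does not look like an initcall symbol.
parse_initcall() {
    printf '%s\n' "$1" | sed -nE \
        's/^__initcall__(.*)__([0-9]+)_([0-9]+)_(.*)(early|rootfs|con|[0-9])(s?)$/\1 \2 \3 \4 \5\6/p'
}

# Hypothetical symbol: module "foo", __COUNTER__ 12, __LINE__ 345, level 6.
parse_initcall __initcall__kmod_foo__12_345_foo_init6
# prints: kmod_foo 12 345 foo_init 6
```

The counter field is what allows initcalls from the same object file to be
put back into source order later on.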

The generate_initcall_order.pl script uses nm to find initcalls from
the object files passed to the linker, and generates a linker script
that specifies the same order for initcalls that we would have without
LTO. With LTO enabled, the script is called in link-vmlinux.sh through
jobserver-exec to limit the number of jobs spawned.
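
A rough shell sketch of the final step, again for illustration only: given
records of the form "<file index> <level> <section name>", emit a linker
script that lists the per-initcall sections in link order for each level.
gen_initcall_lds and the sample records are hypothetical, and the sketch
relies on "sort -s" (stable sort), which GNU and BSD sort provide but POSIX
does not require:

```shell
#!/bin/sh
# Sketch: build an initcall-ordering linker script from records on stdin.
gen_initcall_lds() {
    # A stable numeric sort on the file index keeps link order; records
    # from the same file keep their incoming order.
    records=$(sort -s -k1,1n)

    echo "SECTIONS {"
    for level in $(printf '%s\n' "$records" | awk '{ print $2 }' | LC_ALL=C sort -u); do
        if [ "$level" = "con" ]; then
            section=".con_initcall.init"
        else
            section=".initcall${level}.init"
        fi
        printf '\t%s : {\n' "$section"
        # Appending "" forces a string comparison in awk.
        printf '%s\n' "$records" |
            awk -v lvl="$level" -v sec="$section" \
                '$2 "" == lvl "" { printf "\t\t*(%s..%s) ;\n", sec, $3 }'
        printf '\t}\n'
    done
    echo "}"
}

# Made-up input: file 1 is listed first but must come after file 0.
printf '%s\n' \
    '1 6 kmod_b__3_20_b_init' \
    '0 6 kmod_a__1_10_a_init' \
    '0 early kmod_a__0_5_a_early' | gen_initcall_lds
```

The real script additionally collects the records from parallel nm jobs;
the sketch only covers the ordering and output format.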

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/init.h | 52 +++++-
scripts/Makefile.lib | 6 +-
scripts/generate_initcall_order.pl | 270 +++++++++++++++++++++++++++++
scripts/link-vmlinux.sh | 15 ++
4 files changed, 334 insertions(+), 9 deletions(-)
create mode 100755 scripts/generate_initcall_order.pl

diff --git a/include/linux/init.h b/include/linux/init.h
index 7b53cb3092ee..d466bea7ecba 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -184,19 +184,57 @@ extern bool initcall_debug;
* as KEEP() in the linker script.
*/

+/* Format: <modname>__<counter>_<line>_<fn> */
+#define __initcall_id(fn) \
+ __PASTE(__KBUILD_MODNAME, \
+ __PASTE(__, \
+ __PASTE(__COUNTER__, \
+ __PASTE(_, \
+ __PASTE(__LINE__, \
+ __PASTE(_, fn))))))
+
+/* Format: __<prefix>__<iid><id> */
+#define __initcall_name(prefix, __iid, id) \
+ __PASTE(__, \
+ __PASTE(prefix, \
+ __PASTE(__, \
+ __PASTE(__iid, id))))
+
+#ifdef CONFIG_LTO_CLANG
+/*
+ * With LTO, the compiler doesn't necessarily obey link order for
+ * initcalls. In order to preserve the correct order, we add each
+ * variable into its own section and generate a linker script (in
+ * scripts/link-vmlinux.sh) to specify the order of the sections.
+ */
+#define __initcall_section(__sec, __iid) \
+ #__sec ".init.." #__iid
+#else
+#define __initcall_section(__sec, __iid) \
+ #__sec ".init"
+#endif
+
#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define ___define_initcall(fn, id, __sec) \
+#define ____define_initcall(fn, __name, __sec) \
__ADDRESSABLE(fn) \
- asm(".section \"" #__sec ".init\", \"a\" \n" \
- "__initcall_" #fn #id ": \n" \
+ asm(".section \"" __sec "\", \"a\" \n" \
+ __stringify(__name) ": \n" \
".long " #fn " - . \n" \
".previous \n");
#else
-#define ___define_initcall(fn, id, __sec) \
- static initcall_t __initcall_##fn##id __used \
- __attribute__((__section__(#__sec ".init"))) = fn;
+#define ____define_initcall(fn, __name, __sec) \
+ static initcall_t __name __used \
+ __attribute__((__section__(__sec))) = fn;
#endif

+#define __unique_initcall(fn, id, __sec, __iid) \
+ ____define_initcall(fn, \
+ __initcall_name(initcall, __iid, id), \
+ __initcall_section(__sec, __iid))
+
+#define ___define_initcall(fn, id, __sec) \
+ __unique_initcall(fn, id, __sec, __initcall_id(fn))
+
#define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)

/*
@@ -236,7 +274,7 @@ extern bool initcall_debug;
#define __exitcall(fn) \
static exitcall_t __exitcall_##fn __exit_call = fn

-#define console_initcall(fn) ___define_initcall(fn,, .con_initcall)
+#define console_initcall(fn) ___define_initcall(fn, con, .con_initcall)

struct obs_kernel_param {
const char *str;
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 94133708889d..53aa3e18ce8a 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -117,9 +117,11 @@ target-stem = $(basename $(patsubst $(obj)/%,%,$@))
# These flags are needed for modversions and compiling, so we define them here
# $(modname_flags) defines KBUILD_MODNAME as the name of the module it will
# end up in (or would, if it gets compiled in)
-name-fix = $(call stringify,$(subst $(comma),_,$(subst -,_,$1)))
+name-fix-token = $(subst $(comma),_,$(subst -,_,$1))
+name-fix = $(call stringify,$(call name-fix-token,$1))
basename_flags = -DKBUILD_BASENAME=$(call name-fix,$(basetarget))
-modname_flags = -DKBUILD_MODNAME=$(call name-fix,$(modname))
+modname_flags = -DKBUILD_MODNAME=$(call name-fix,$(modname)) \
+ -D__KBUILD_MODNAME=kmod_$(call name-fix-token,$(modname))
modfile_flags = -DKBUILD_MODFILE=$(call stringify,$(modfile))

_c_flags = $(filter-out $(CFLAGS_REMOVE_$(target-stem).o), \
diff --git a/scripts/generate_initcall_order.pl b/scripts/generate_initcall_order.pl
new file mode 100755
index 000000000000..1a88d3f1b913
--- /dev/null
+++ b/scripts/generate_initcall_order.pl
@@ -0,0 +1,270 @@
+#!/usr/bin/env perl
+# SPDX-License-Identifier: GPL-2.0
+#
+# Generates a linker script that specifies the correct initcall order.
+#
+# Copyright (C) 2019 Google LLC
+
+use strict;
+use warnings;
+use IO::Handle;
+use IO::Select;
+use POSIX ":sys_wait_h";
+
+my $nm = $ENV{'NM'} || die "$0: ERROR: NM not set?";
+my $objtree = $ENV{'objtree'} || '.';
+
+## currently active child processes
+my $jobs = {}; # child process pid -> file handle
+## results from child processes
+my $results = {}; # object index -> [ { level, secname }, ... ]
+
+## reads _NPROCESSORS_ONLN to determine the maximum number of processes to
+## start
+sub get_online_processors {
+ open(my $fh, "getconf _NPROCESSORS_ONLN 2>/dev/null |")
+ or die "$0: ERROR: failed to execute getconf: $!";
+ my $procs = <$fh>;
+ close($fh);
+
+ if (!($procs =~ /^\d+$/)) {
+ return 1;
+ }
+
+ return int($procs);
+}
+
+## writes results to the parent process
+## format: <file index> <initcall level> <base initcall section name>
+sub write_results {
+ my ($index, $initcalls) = @_;
+
+ # sort by the counter value to ensure the order of initcalls within
+ # each object file is correct
+ foreach my $counter (sort { $a <=> $b } keys(%{$initcalls})) {
+ my $level = $initcalls->{$counter}->{'level'};
+
+ # section name for the initcall function
+ my $secname = $initcalls->{$counter}->{'module'} . '__' .
+ $counter . '_' .
+ $initcalls->{$counter}->{'line'} . '_' .
+ $initcalls->{$counter}->{'function'};
+
+ print "$index $level $secname\n";
+ }
+}
+
+## reads a result line from a child process and adds it to the $results array
+sub read_results{
+ my ($fh) = @_;
+
+ # each child prints out a full line w/ autoflush and exits after the
+ # last line, so even if buffered I/O blocks here, it shouldn't block
+ # very long
+ my $data = <$fh>;
+
+ if (!defined($data)) {
+ return 0;
+ }
+
+ chomp($data);
+
+ my ($index, $level, $secname) = $data =~
+ /^(\d+)\ ([^\ ]+)\ (.*)$/;
+
+ if (!defined($index) ||
+ !defined($level) ||
+ !defined($secname)) {
+ die "$0: ERROR: child process returned invalid data: $data\n";
+ }
+
+ $index = int($index);
+
+ if (!exists($results->{$index})) {
+ $results->{$index} = [];
+ }
+
+ push (@{$results->{$index}}, {
+ 'level' => $level,
+ 'secname' => $secname
+ });
+
+ return 1;
+}
+
+## finds initcalls from an object file or all object files in an archive, and
+## writes results back to the parent process
+sub find_initcalls {
+ my ($index, $file) = @_;
+
+ die "$0: ERROR: file $file doesn't exist?" if (! -f $file);
+
+ open(my $fh, "\"$nm\" --defined-only \"$file\" 2>/dev/null |")
+ or die "$0: ERROR: failed to execute \"$nm\": $!";
+
+ my $initcalls = {};
+
+ while (<$fh>) {
+ chomp;
+
+ # check for the start of a new object file (if processing an
+ # archive)
+ my ($path)= $_ =~ /^(.+)\:$/;
+
+ if (defined($path)) {
+ write_results($index, $initcalls);
+ $initcalls = {};
+ next;
+ }
+
+ # look for an initcall
+ my ($module, $counter, $line, $symbol) = $_ =~
+ /[a-z]\s+__initcall__(\S*)__(\d+)_(\d+)_(.*)$/;
+
+ if (!defined($module)) {
+ $module = ''
+ }
+
+ if (!defined($counter) ||
+ !defined($line) ||
+ !defined($symbol)) {
+ next;
+ }
+
+ # parse initcall level
+ my ($function, $level) = $symbol =~
+ /^(.*)((early|rootfs|con|[0-9])s?)$/;
+
+ die "$0: ERROR: invalid initcall name $symbol in $file($path)"
+ if (!defined($function) || !defined($level));
+
+ $initcalls->{$counter} = {
+ 'module' => $module,
+ 'line' => $line,
+ 'function' => $function,
+ 'level' => $level,
+ };
+ }
+
+ close($fh);
+ write_results($index, $initcalls);
+}
+
+## waits for any child process to complete, reads the results, and adds them to
+## the $results array for later processing
+sub wait_for_results {
+ my ($select) = @_;
+
+ my $pid = 0;
+ do {
+ # unblock children that may have a full write buffer
+ foreach my $fh ($select->can_read(0)) {
+ read_results($fh);
+ }
+
+ # check for children that have exited, read the remaining data
+ # from them, and clean up
+ $pid = waitpid(-1, WNOHANG);
+ if ($pid > 0) {
+ if (!exists($jobs->{$pid})) {
+ next;
+ }
+
+ my $fh = $jobs->{$pid};
+ $select->remove($fh);
+
+ while (read_results($fh)) {
+ # until eof
+ }
+
+ close($fh);
+ delete($jobs->{$pid});
+ }
+ } while ($pid > 0);
+}
+
+## forks a child to process each file passed in the command line and collects
+## the results
+sub process_files {
+ my $index = 0;
+ my $njobs = $ENV{'PARALLELISM'} || get_online_processors();
+ my $select = IO::Select->new();
+
+ while (my $file = shift(@ARGV)) {
+ # fork a child process and read its stdout
+ my $pid = open(my $fh, '-|');
+
+ if (!defined($pid)) {
+ die "$0: ERROR: failed to fork: $!";
+ } elsif ($pid) {
+ # save the child process pid and the file handle
+ $select->add($fh);
+ $jobs->{$pid} = $fh;
+ } else {
+ # in the child process
+ STDOUT->autoflush(1);
+ find_initcalls($index, "$objtree/$file");
+ exit;
+ }
+
+ $index++;
+
+ # limit the number of children to $njobs
+ if (scalar(keys(%{$jobs})) >= $njobs) {
+ wait_for_results($select);
+ }
+ }
+
+ # wait for the remaining children to complete
+ while (scalar(keys(%{$jobs})) > 0) {
+ wait_for_results($select);
+ }
+}
+
+sub generate_initcall_lds() {
+ process_files();
+
+ my $sections = {}; # level -> [ secname, ...]
+
+ # sort results to retain link order and split to sections per
+ # initcall level
+ foreach my $index (sort { $a <=> $b } keys(%{$results})) {
+ foreach my $result (@{$results->{$index}}) {
+ my $level = $result->{'level'};
+
+ if (!exists($sections->{$level})) {
+ $sections->{$level} = [];
+ }
+
+ push(@{$sections->{$level}}, $result->{'secname'});
+ }
+ }
+
+ die "$0: ERROR: no initcalls?" if (!keys(%{$sections}));
+
+ # print out a linker script that defines the order of initcalls for
+ # each level
+ print "SECTIONS {\n";
+
+ foreach my $level (sort(keys(%{$sections}))) {
+ my $section;
+
+ if ($level eq 'con') {
+ $section = '.con_initcall.init';
+ } else {
+ $section = ".initcall${level}.init";
+ }
+
+ print "\t${section} : {\n";
+
+ foreach my $secname (@{$sections->{$level}}) {
+ print "\t\t*(${section}..${secname}) ;\n";
+ }
+
+ print "\t}\n";
+ }
+
+ print "}\n";
+}
+
+generate_initcall_lds();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 78e55fe7210b..c5919d5a0b4f 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -43,6 +43,17 @@ info()
fi
}

+# Generate a linker script to ensure correct ordering of initcalls.
+gen_initcalls()
+{
+ info GEN .tmp_initcalls.lds
+
+ ${PYTHON} ${srctree}/scripts/jobserver-exec \
+ ${PERL} ${srctree}/scripts/generate_initcall_order.pl \
+ ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS} \
+ > .tmp_initcalls.lds
+}
+
# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
# .tmp_symversions.lds
gen_symversions()
@@ -72,6 +83,9 @@ modpost_link()
--end-group"

if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ gen_initcalls
+ lds="-T .tmp_initcalls.lds"
+
if [ -n "${CONFIG_MODVERSIONS}" ]; then
gen_symversions
lds="${lds} -T .tmp_symversions.lds"
@@ -262,6 +276,7 @@ cleanup()
{
rm -f .btf.*
rm -f .tmp_System.map
+ rm -f .tmp_initcalls.lds
rm -f .tmp_symversions.lds
rm -f .tmp_vmlinux*
rm -f System.map
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:17

by Sami Tolvanen

Subject: [PATCH v7 12/17] efi/libstub: disable LTO

With CONFIG_LTO_CLANG, we produce LLVM bitcode instead of ELF object
files. Since LTO is not really needed here and the Makefile assumes we
produce an object file, disable LTO for libstub.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
drivers/firmware/efi/libstub/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 8a94388e38b3..c23466e05e60 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -38,6 +38,8 @@ KBUILD_CFLAGS := $(cflags-y) -Os -DDISABLE_BRANCH_PROFILING \

# remove SCS flags from all objects in this directory
KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
+# disable LTO
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))

GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:38

by Sami Tolvanen

Subject: [PATCH v7 13/17] drivers/misc/lkdtm: disable LTO for rodata.o

Disable LTO for rodata.o to allow objcopy to be used to
manipulate sections.

Signed-off-by: Sami Tolvanen <[email protected]>
Acked-by: Kees Cook <[email protected]>
---
drivers/misc/lkdtm/Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/misc/lkdtm/Makefile b/drivers/misc/lkdtm/Makefile
index c70b3822013f..dd4c936d4d73 100644
--- a/drivers/misc/lkdtm/Makefile
+++ b/drivers/misc/lkdtm/Makefile
@@ -13,6 +13,7 @@ lkdtm-$(CONFIG_LKDTM) += cfi.o

KASAN_SANITIZE_stackleak.o := n
KCOV_INSTRUMENT_rodata.o := n
+CFLAGS_REMOVE_rodata.o += $(CC_FLAGS_LTO)

OBJCOPYFLAGS :=
OBJCOPYFLAGS_rodata_objcopy.o := \
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:48

by Sami Tolvanen

Subject: [PATCH v7 05/17] kbuild: lto: merge module sections

LLD always splits sections with LTO, which increases module sizes. This
change adds linker script rules to merge the split sections in the final
module.
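
To make the effect of the merge rules concrete, here is a small shell
sketch (illustration only) that maps an input section name to the output
section it would land in under the glob patterns added by this patch.
merged_section is a hypothetical helper and the sample section names are
made up:

```shell
#!/bin/sh
# Sketch: which output section would an input section be merged into?
# The patterns mirror the globs added to scripts/module.lds.S.
merged_section() {
    case "$1" in
        .bss|.bss.[0-9a-zA-Z_]*|.bss..L*)          echo .bss ;;
        .data|.data.[0-9a-zA-Z_]*|.data..L*)       echo .data ;;
        .rodata|.rodata.[0-9a-zA-Z_]*|.rodata..L*) echo .rodata ;;
        .text|.text.[0-9a-zA-Z_]*)                 echo .text ;;
        *)                                         echo "$1" ;;  # left alone
    esac
}

for sec in .text.foo_init .data..L123 .data..ro_after_init; do
    printf '%s -> %s\n' "$sec" "$(merged_section "$sec")"
done
```

Note that a name such as .data..ro_after_init is not merged, since the
character after ".data." is a dot, which the bracket expression excludes.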

Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
scripts/module.lds.S | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index 69b9b71a6a47..18d5b8423635 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -23,6 +23,30 @@ SECTIONS {
.init_array 0 : ALIGN(8) { *(SORT(.init_array.*)) *(.init_array) }

__jump_table 0 : ALIGN(8) { KEEP(*(__jump_table)) }
+
+ __patchable_function_entries : { *(__patchable_function_entries) }
+
+ /*
+ * With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and
+ * -ffunction-sections, which increases the size of the final module.
+ * Merge the split sections in the final binary.
+ */
+ .bss : {
+ *(.bss .bss.[0-9a-zA-Z_]*)
+ *(.bss..L*)
+ }
+
+ .data : {
+ *(.data .data.[0-9a-zA-Z_]*)
+ *(.data..L*)
+ }
+
+ .rodata : {
+ *(.rodata .rodata.[0-9a-zA-Z_]*)
+ *(.rodata..L*)
+ }
+
+ .text : { *(.text .text.[0-9a-zA-Z_]*) }
}

/* bring in arch-specific sections */
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:55

by Sami Tolvanen

Subject: [PATCH v7 02/17] kbuild: add support for Clang LTO

This change adds build system support for Clang's Link Time
Optimization (LTO). With -flto, instead of ELF object files, Clang
produces LLVM bitcode, which is compiled into native code at link
time, allowing the final binary to be optimized globally. For more
details, see:

https://llvm.org/docs/LinkTimeOptimization.html

The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
which defaults to LTO being disabled. To use LTO, the architecture
must select ARCH_SUPPORTS_LTO_CLANG and support:

- compiling with Clang,
- compiling inline assembly with Clang's integrated assembler,
- and linking with LLD.

While using full LTO results in the best runtime performance, the
compilation is not scalable in time or memory. CONFIG_THINLTO
enables ThinLTO, which allows parallel optimization and faster
incremental builds. ThinLTO is used by default if the architecture
also selects ARCH_SUPPORTS_THINLTO:

https://clang.llvm.org/docs/ThinLTO.html
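
As a quick illustration of how the compiler flags differ between the two
modes, the following shell sketch mirrors the Makefile hunk in this patch.
lto_cflags is a hypothetical helper, not something the build system
provides:

```shell
#!/bin/sh
# Sketch: compiler flags for ThinLTO vs. full LTO, following the Makefile
# change in this patch. $1 is "y" when CONFIG_THINLTO is enabled.
lto_cflags() {
    if [ "$1" = "y" ]; then
        # ThinLTO also passes --thinlto-cache-dir to the linker (not shown).
        flags="-flto=thin -fsplit-lto-unit"
    else
        flags="-flto"
    fi
    echo "$flags -fvisibility=default"
}

lto_cflags y    # prints: -flto=thin -fsplit-lto-unit -fvisibility=default
lto_cflags n    # prints: -flto -fvisibility=default
```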

To enable LTO, LLVM tools must be used to handle bitcode files. The
easiest way is to pass the LLVM=1 option to make:

$ make LLVM=1 defconfig
$ scripts/config -e LTO_CLANG
$ make LLVM=1

Alternatively, at least the following LLVM tools must be used:

CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
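
If it is unclear whether the selected tools are the LLVM ones, a check
along the lines of the one used in the LTO_CLANG Kconfig entry can be run
by hand. This is only a sketch; is_llvm_tool is a hypothetical helper, and
the llvm-nm and llvm-ar fallbacks are assumptions about what is installed:

```shell
#!/bin/sh
# Sketch: succeed if the first line of "<tool> --help" mentions LLVM,
# which is the same test the Kconfig entry applies to $(NM) and $(AR).
is_llvm_tool() {
    "$@" --help 2>/dev/null | head -n 1 | grep -qi llvm
}

for tool in "${NM:-llvm-nm}" "${AR:-llvm-ar}"; do
    if is_llvm_tool "$tool"; then
        echo "$tool: LLVM"
    else
        echo "$tool: not LLVM (or not found)"
    fi
done
```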

To prepare for LTO support with other compilers, common parts are
gated behind the CONFIG_LTO option, and LTO can be disabled for
specific files by filtering out CC_FLAGS_LTO.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
Makefile | 19 +++++++-
arch/Kconfig | 75 +++++++++++++++++++++++++++++++
include/asm-generic/vmlinux.lds.h | 11 +++--
scripts/Makefile.build | 9 +++-
scripts/Makefile.modfinal | 9 +++-
scripts/Makefile.modpost | 21 ++++++++-
scripts/link-vmlinux.sh | 32 +++++++++----
7 files changed, 158 insertions(+), 18 deletions(-)

diff --git a/Makefile b/Makefile
index 8c8feb4245a6..240560e88d69 100644
--- a/Makefile
+++ b/Makefile
@@ -893,6 +893,21 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS)
export CC_FLAGS_SCS
endif

+ifdef CONFIG_LTO_CLANG
+ifdef CONFIG_THINLTO
+CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
+KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
+else
+CC_FLAGS_LTO += -flto
+endif
+CC_FLAGS_LTO += -fvisibility=default
+endif
+
+ifdef CONFIG_LTO
+KBUILD_CFLAGS += $(CC_FLAGS_LTO)
+export CC_FLAGS_LTO
+endif
+
ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
KBUILD_CFLAGS += -falign-functions=32
endif
@@ -1473,7 +1488,7 @@ MRPROPER_FILES += include/config include/generated \
*.spec

# Directories & files removed with 'make distclean'
-DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
+DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache

# clean - Delete most, but leave enough to build external modules
#
@@ -1719,7 +1734,7 @@ PHONY += compile_commands.json

clean-dirs := $(KBUILD_EXTMOD)
clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
- $(KBUILD_EXTMOD)/compile_commands.json
+ $(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache

PHONY += help
help:
diff --git a/arch/Kconfig b/arch/Kconfig
index 56b6ccc0e32d..a41fcb3ca7c6 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -598,6 +598,81 @@ config SHADOW_CALL_STACK
reading and writing arbitrary memory may be able to locate them
and hijack control flow by modifying the stacks.

+config LTO
+ bool
+
+config ARCH_SUPPORTS_LTO_CLANG
+ bool
+ help
+ An architecture should select this option if it supports:
+ - compiling with Clang,
+ - compiling inline assembly with Clang's integrated assembler,
+ - and linking with LLD.
+
+config ARCH_SUPPORTS_THINLTO
+ bool
+ help
+ An architecture should select this option if it supports Clang's
+ ThinLTO.
+
+config THINLTO
+ bool "Clang ThinLTO"
+ depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO
+ default y
+ help
+ This option enables Clang's ThinLTO, which allows for parallel
+ optimization and faster incremental compiles. More information
+ can be found from Clang's documentation:
+
+ https://clang.llvm.org/docs/ThinLTO.html
+
+ If you say N here, the compiler will use full LTO, which may
+ produce faster code, but building the kernel will be significantly
+ slower as the linker won't efficiently utilize multiple threads.
+
+ If unsure, say Y.
+
+choice
+ prompt "Link Time Optimization (LTO)"
+ default LTO_NONE
+ help
+ This option enables Link Time Optimization (LTO), which allows the
+ compiler to optimize binaries globally.
+
+ If unsure, select LTO_NONE. Note that LTO is very resource-intensive
+ so it's disabled by default.
+
+config LTO_NONE
+ bool "None"
+
+config LTO_CLANG
+ bool "Clang's Link Time Optimization (EXPERIMENTAL)"
+ # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
+ depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
+ depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
+ depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
+ depends on ARCH_SUPPORTS_LTO_CLANG
+ depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
+ depends on !KASAN
+ depends on !GCOV_KERNEL
+ depends on !MODVERSIONS
+ select LTO
+ help
+ This option enables Clang's Link Time Optimization (LTO), which
+ allows the compiler to optimize the kernel globally. If you enable
+ this option, the compiler generates LLVM bitcode instead of ELF
+ object files, and the actual compilation from bitcode happens at
+ the LTO link step, which may take several minutes depending on the
+ kernel configuration. More information can be found from LLVM's
+ documentation:
+
+ https://llvm.org/docs/LinkTimeOptimization.html
+
+ To select this option, you also need to use LLVM tools to handle
+ the bitcode by passing LLVM=1 to make.
+
+endchoice
+
config HAVE_ARCH_WITHIN_STACK_FRAMES
bool
help
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..8988a2e445d8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -90,15 +90,18 @@
* .data. We don't want to pull in .data..other sections, which Linux
* has defined. Same for text and bss.
*
+ * With LTO_CLANG, the linker also splits sections by default, so we need
+ * these macros to combine the sections during the final link.
+ *
* RODATA_MAIN is not used because existing code already defines .rodata.x
* sections to be brought in with rodata.
*/
-#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
-#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
+#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral*
#define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
-#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
-#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
+#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
+#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral*
#define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
#else
#define TEXT_MAIN .text
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 2175ddb1ee0c..ed74b2f986f7 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -111,7 +111,7 @@ endif
# ---------------------------------------------------------------------------

quiet_cmd_cc_s_c = CC $(quiet_modtag) $@
- cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) -fverbose-asm -S -o $@ $<
+ cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $<

$(obj)/%.s: $(src)/%.c FORCE
$(call if_changed_dep,cc_s_c)
@@ -425,8 +425,15 @@ $(obj)/lib.a: $(lib-y) FORCE
# Do not replace $(filter %.o,^) with $(real-prereqs). When a single object
# module is turned into a multi object module, $^ will contain header file
# dependencies recorded in the .*.cmd file.
+ifdef CONFIG_LTO_CLANG
+quiet_cmd_link_multi-m = AR [M] $@
+cmd_link_multi-m = \
+ rm -f $@; \
+ $(AR) cDPrsT $@ $(filter %.o,$^)
+else
quiet_cmd_link_multi-m = LD [M] $@
cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^)
+endif

$(multi-used-m): FORCE
$(call if_changed,link_multi-m)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index ae01baf96f4e..2cb9a1d88434 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -6,6 +6,7 @@
PHONY := __modfinal
__modfinal:

+include $(objtree)/include/config/auto.conf
include $(srctree)/scripts/Kbuild.include

# for c_flags
@@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M] $@

ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)

+ifdef CONFIG_LTO_CLANG
+# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
+# avoid a second slow LTO link
+prelink-ext := .lto
+endif
+
quiet_cmd_ld_ko_o = LD [M] $@
cmd_ld_ko_o = \
$(LD) -r $(KBUILD_LDFLAGS) \
@@ -36,7 +43,7 @@ quiet_cmd_ld_ko_o = LD [M] $@
-T scripts/module.lds -o $@ $(filter %.o, $^); \
$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)

-$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
+$(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
+$(call if_changed,ld_ko_o)

targets += $(modules) $(modules:.ko=.mod.o)
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index f54b6ac37ac2..9ff8bfdb574d 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -43,6 +43,9 @@ __modpost:
include include/config/auto.conf
include scripts/Kbuild.include

+# for ld_flags
+include scripts/Makefile.lib
+
MODPOST = scripts/mod/modpost \
$(if $(CONFIG_MODVERSIONS),-m) \
$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a) \
@@ -102,12 +105,26 @@ $(input-symdump):
@echo >&2 'WARNING: Symbol version dump "$@" is missing.'
@echo >&2 ' Modules may not have dependencies or modversions.'

+ifdef CONFIG_LTO_CLANG
+# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run
+# LTO to compile them into native code before running modpost
+prelink-ext := .lto
+
+quiet_cmd_cc_lto_link_modules = LTO [M] $@
+cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^
+
+%.lto.o: %.o
+ $(call if_changed,cc_lto_link_modules)
+endif
+
+modules := $(sort $(shell cat $(MODORDER)))
+
# Read out modules.order to pass in modpost.
# Otherwise, allmodconfig would fail with "Argument list too long".
quiet_cmd_modpost = MODPOST $@
- cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T -
+ cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T -

-$(output-symdump): $(MODORDER) $(input-symdump) FORCE
+$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE
$(call if_changed,modpost)

targets += $(output-symdump)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 6eded325c837..596507573a48 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -56,6 +56,14 @@ modpost_link()
${KBUILD_VMLINUX_LIBS} \
--end-group"

+ if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ # This might take a while, so indicate that we're doing
+ # an LTO link
+ info LTO ${1}
+ else
+ info LD ${1}
+ fi
+
${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects}
}

@@ -103,13 +111,22 @@ vmlinux_link()
fi

if [ "${SRCARCH}" != "um" ]; then
- objects="--whole-archive \
- ${KBUILD_VMLINUX_OBJS} \
- --no-whole-archive \
- --start-group \
- ${KBUILD_VMLINUX_LIBS} \
- --end-group \
- ${@}"
+ if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ # Use vmlinux.o instead of performing the slow LTO
+ # link again.
+ objects="--whole-archive \
+ vmlinux.o \
+ --no-whole-archive \
+ ${@}"
+ else
+ objects="--whole-archive \
+ ${KBUILD_VMLINUX_OBJS} \
+ --no-whole-archive \
+ --start-group \
+ ${KBUILD_VMLINUX_LIBS} \
+ --end-group \
+ ${@}"
+ fi

${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \
${strip_debug#-Wl,} \
@@ -274,7 +291,6 @@ fi;
${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1

#link vmlinux.o
-info LD vmlinux.o
modpost_link vmlinux.o
objtool_link vmlinux.o

--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:59

by Sami Tolvanen

Subject: [PATCH v7 04/17] kbuild: lto: limit inlining

This change limits function inlining across translation unit boundaries
in order to reduce the binary size with LTO. The -import-instr-limit
flag defines a size limit, as the number of LLVM IR instructions, for
importing functions from other TUs, defaulting to 100.

Based on testing with arm64 defconfig, we found that a limit of 5 is a
reasonable compromise between performance and binary size, reducing the
size of a stripped vmlinux by 11%.

Suggested-by: George Burgess IV <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index f27c0da5d05a..bee378f9fd50 100644
--- a/Makefile
+++ b/Makefile
@@ -901,6 +901,9 @@ else
CC_FLAGS_LTO += -flto
endif
CC_FLAGS_LTO += -fvisibility=default
+
+# Limit inlining across translation units to reduce binary size
+KBUILD_LDFLAGS += -mllvm -import-instr-limit=5
endif

ifdef CONFIG_LTO
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:10:59

by Sami Tolvanen

Subject: [PATCH v7 08/17] init: lto: fix PREL32 relocations

With LTO, the compiler can rename static functions to avoid global
naming collisions. As initcall functions are typically static,
renaming can break references to them in inline assembly. This
change adds a global stub with a stable name for each initcall to
fix the issue when PREL32 relocations are used.
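
As a rough user-space illustration of the stub idea (not the kernel macro itself), the sketch below wraps a static init function in a global function with a fixed name and registers that stub in a table. The names `DEFINE_INITCALL_STUB`, `my_driver_init`, `initstub_my_driver_init` and `run_initcalls` are hypothetical, and a plain function-pointer table stands in for the PREL32 entry emitted from inline assembly.

```c
#include <stddef.h>

static int driver_ready;

/* A static init function; per the commit message, LTO may rename it. */
static int my_driver_init(void)
{
	driver_ready = 1;
	return 0;
}

/*
 * Hypothetical stand-in for __define_initcall_stub(): emit a global
 * function with a fixed name that forwards to the static one. The
 * prototype comes first, as in the patch.
 */
#define DEFINE_INITCALL_STUB(stub, fn)	\
	int stub(void);			\
	int stub(void)			\
	{				\
		return fn();		\
	}

DEFINE_INITCALL_STUB(initstub_my_driver_init, my_driver_init)

/* The table refers to the stub by its stable global name. */
typedef int (*initcall_t)(void);
static const initcall_t initcalls[] = { initstub_my_driver_init };

/* Call each registered initcall in order; stop at the first failure. */
int run_initcalls(void)
{
	for (size_t i = 0; i < sizeof(initcalls) / sizeof(initcalls[0]); i++) {
		int ret = initcalls[i]();

		if (ret)
			return ret;
	}
	return 0;
}
```

The compiler resolves the call from the stub to the static function itself, so only the stub's name has to stay stable for anything that refers to it by spelling.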

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/init.h | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index d466bea7ecba..27b9478dcdef 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -209,26 +209,49 @@ extern bool initcall_debug;
*/
#define __initcall_section(__sec, __iid) \
#__sec ".init.." #__iid
+
+/*
+ * With LTO, the compiler can rename static functions to avoid
+ * global naming collisions. We use a global stub function for
+ * initcalls to create a stable symbol name whose address can be
+ * taken in inline assembly when PREL32 relocations are used.
+ */
+#define __initcall_stub(fn, __iid, id) \
+ __initcall_name(initstub, __iid, id)
+
+#define __define_initcall_stub(__stub, fn) \
+ int __init __stub(void); \
+ int __init __stub(void) \
+ { \
+ return fn(); \
+ } \
+ __ADDRESSABLE(__stub)
#else
#define __initcall_section(__sec, __iid) \
#__sec ".init"
+
+#define __initcall_stub(fn, __iid, id) fn
+
+#define __define_initcall_stub(__stub, fn) \
+ __ADDRESSABLE(fn)
#endif

#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define ____define_initcall(fn, __name, __sec) \
- __ADDRESSABLE(fn) \
+#define ____define_initcall(fn, __stub, __name, __sec) \
+ __define_initcall_stub(__stub, fn) \
asm(".section \"" __sec "\", \"a\" \n" \
__stringify(__name) ": \n" \
- ".long " #fn " - . \n" \
+ ".long " __stringify(__stub) " - . \n" \
".previous \n");
#else
-#define ____define_initcall(fn, __name, __sec) \
+#define ____define_initcall(fn, __unused, __name, __sec) \
static initcall_t __name __used \
__attribute__((__section__(__sec))) = fn;
#endif

#define __unique_initcall(fn, id, __sec, __iid) \
____define_initcall(fn, \
+ __initcall_stub(fn, __iid, id), \
__initcall_name(initcall, __iid, id), \
__initcall_section(__sec, __iid))

--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:04

by Sami Tolvanen

Subject: [PATCH v7 06/17] kbuild: lto: remove duplicate dependencies from .mod files

With LTO, llvm-nm prints out symbols for each archive member
separately, which results in a lot of duplicate dependencies in the
.mod file when CONFIG_TRIM_UNUSED_KSYMS is enabled. When a module
consists of several compilation units, the output can exceed the
default xargs command size limit and split the dependency list into
multiple lines, which results in used symbols getting trimmed.

This change removes duplicate dependencies, which reduces the
probability of this happening and makes .mod files smaller and
easier to read.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
scripts/Makefile.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index eae2f5386a03..f80ada58271d 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -281,7 +281,7 @@ endef

# List module undefined symbols (or empty line if not enabled)
ifdef CONFIG_TRIM_UNUSED_KSYMS
-cmd_undef_syms = $(NM) $< | sed -n 's/^ *U //p' | xargs echo
+cmd_undef_syms = $(NM) $< | sed -n 's/^ *U //p' | sort -u | xargs echo
else
cmd_undef_syms = echo
endif
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:04

by Sami Tolvanen

Subject: [PATCH v7 17/17] arm64: allow LTO_CLANG and THINLTO to be selected

Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
arch/arm64/Kconfig | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c7f07978f5b6..56bd83a764f4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,8 @@ config ARM64
select ARCH_USE_SYM_ANNOTATIONS
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
+ select ARCH_SUPPORTS_LTO_CLANG
+ select ARCH_SUPPORTS_THINLTO
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && (GCC_VERSION >= 50000 || CC_IS_CLANG)
select ARCH_SUPPORTS_NUMA_BALANCING
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:17

by Sami Tolvanen

Subject: [PATCH v7 10/17] modpost: lto: strip .lto from module names

With LTO, everything is compiled into LLVM bitcode, so we have to link
each module into native code before modpost. Kbuild uses the .lto.o
suffix for these files, which also ends up in module information. This
change strips the unnecessary .lto suffix from the module name.
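
The suffix handling can be sketched in isolation as follows. `strends()` has the same logic as the helper the patch moves into modpost.h. `strip_object_suffix()` and `mod_file_name()` are hypothetical helpers that only mirror the logic added to read_symbols() and get_src_version(); they are not functions from modpost.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same logic as the strends() helper in the patch. */
static inline bool strends(const char *str, const char *postfix)
{
	if (strlen(str) < strlen(postfix))
		return false;

	return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
}

/*
 * Hypothetical helper: turn "dir/foo.o" or "dir/foo.lto.o" into "dir/foo"
 * in place. Assumes the name ends in ".o".
 */
void strip_object_suffix(char *name)
{
	name[strlen(name) - 2] = '\0';		/* strip trailing .o */
	if (strends(name, ".lto"))
		name[strlen(name) - 4] = '\0';	/* strip trailing .lto */
}

/*
 * Hypothetical helper: build the .mod file name by keeping everything
 * up to and including the dot before the suffix, then appending "mod".
 */
void mod_file_name(const char *modname, char *out, size_t outlen)
{
	int postfix_len = 1;			/* drop "o" */

	if (strends(modname, ".lto.o"))
		postfix_len = 5;		/* drop "lto.o" */

	snprintf(out, outlen, "%.*smod",
		 (int)strlen(modname) - postfix_len, modname);
}
```

With these helpers, both "drivers/foo.o" and "drivers/foo.lto.o" map to the module name "drivers/foo" and to the file "drivers/foo.mod".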

Suggested-by: Bill Wendling <[email protected]>
Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
scripts/mod/modpost.c | 16 +++++++---------
scripts/mod/modpost.h | 9 +++++++++
scripts/mod/sumversion.c | 6 +++++-
3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index f882ce0d9327..ebb15cc3f262 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -17,7 +17,6 @@
#include <ctype.h>
#include <string.h>
#include <limits.h>
-#include <stdbool.h>
#include <errno.h>
#include "modpost.h"
#include "../../include/linux/license.h"
@@ -80,14 +79,6 @@ modpost_log(enum loglevel loglevel, const char *fmt, ...)
exit(1);
}

-static inline bool strends(const char *str, const char *postfix)
-{
- if (strlen(str) < strlen(postfix))
- return false;
-
- return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
-}
-
void *do_nofail(void *ptr, const char *expr)
{
if (!ptr)
@@ -1984,6 +1975,10 @@ static char *remove_dot(char *s)
size_t m = strspn(s + n + 1, "0123456789");
if (m && (s[n + m] == '.' || s[n + m] == 0))
s[n] = 0;
+
+ /* strip trailing .lto */
+ if (strends(s, ".lto"))
+ s[strlen(s) - 4] = '\0';
}
return s;
}
@@ -2007,6 +2002,9 @@ static void read_symbols(const char *modname)
/* strip trailing .o */
tmp = NOFAIL(strdup(modname));
tmp[strlen(tmp) - 2] = '\0';
+ /* strip trailing .lto */
+ if (strends(tmp, ".lto"))
+ tmp[strlen(tmp) - 4] = '\0';
mod = new_module(tmp);
free(tmp);
}
diff --git a/scripts/mod/modpost.h b/scripts/mod/modpost.h
index 3aa052722233..fab30d201f9e 100644
--- a/scripts/mod/modpost.h
+++ b/scripts/mod/modpost.h
@@ -2,6 +2,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
@@ -180,6 +181,14 @@ static inline unsigned int get_secindex(const struct elf_info *info,
return info->symtab_shndx_start[sym - info->symtab_start];
}

+static inline bool strends(const char *str, const char *postfix)
+{
+ if (strlen(str) < strlen(postfix))
+ return false;
+
+ return strcmp(str + strlen(str) - strlen(postfix), postfix) == 0;
+}
+
/* file2alias.c */
extern unsigned int cross_build;
void handle_moddevtable(struct module *mod, struct elf_info *info,
diff --git a/scripts/mod/sumversion.c b/scripts/mod/sumversion.c
index d587f40f1117..760e6baa7eda 100644
--- a/scripts/mod/sumversion.c
+++ b/scripts/mod/sumversion.c
@@ -391,10 +391,14 @@ void get_src_version(const char *modname, char sum[], unsigned sumlen)
struct md4_ctx md;
char *fname;
char filelist[PATH_MAX + 1];
+ int postfix_len = 1;
+
+ if (strends(modname, ".lto.o"))
+ postfix_len = 5;

/* objects for a module are listed in the first line of *.mod file. */
snprintf(filelist, sizeof(filelist), "%.*smod",
- (int)strlen(modname) - 1, modname);
+ (int)strlen(modname) - postfix_len, modname);

buf = read_text_file(filelist);

--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:25

by Sami Tolvanen

Subject: [PATCH v7 11/17] scripts/mod: disable LTO for empty.c

With CONFIG_LTO_CLANG, clang generates LLVM IR instead of ELF object
files. As empty.o is used for probing target properties, disable LTO
for it to produce an object file instead.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
scripts/mod/Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index 78071681d924..c9e38ad937fd 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
OBJECT_FILES_NON_STANDARD := y
+CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO)

hostprogs-always-y += modpost mk_elfconfig
always-y += empty.o
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:32

by Sami Tolvanen

Subject: [PATCH v7 14/17] arm64: vdso: disable LTO

Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no
point in using link-time optimization for the small amount of C code.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
arch/arm64/kernel/vdso/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index d65f52264aba..50fe49fb4d95 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -30,7 +30,8 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \
ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
ccflags-y += -DDISABLE_BRANCH_PROFILING

-CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS)
+CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) $(GCC_PLUGINS_CFLAGS) \
+ $(CC_FLAGS_LTO)
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:41

by Sami Tolvanen

Subject: [PATCH v7 03/17] kbuild: lto: fix module versioning

With CONFIG_MODVERSIONS, version information is linked into each
compilation unit that exports symbols. With LTO, we cannot use this
method as all C code is compiled into LLVM bitcode instead. This
change collects symbol versions into .symversions files and merges
them in link-vmlinux.sh where they are all linked into vmlinux.o at
the same time.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
.gitignore | 1 +
Makefile | 3 ++-
arch/Kconfig | 1 -
scripts/Makefile.build | 33 +++++++++++++++++++++++++++++++--
scripts/Makefile.modpost | 6 +++++-
scripts/link-vmlinux.sh | 23 ++++++++++++++++++++++-
6 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/.gitignore b/.gitignore
index d01cda8e1177..44e34991875e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -41,6 +41,7 @@
*.so.dbg
*.su
*.symtypes
+*.symversions
*.tab.[ch]
*.tar
*.xz
diff --git a/Makefile b/Makefile
index 240560e88d69..f27c0da5d05a 100644
--- a/Makefile
+++ b/Makefile
@@ -1831,7 +1831,8 @@ clean: $(clean-dirs)
-o -name '.tmp_*.o.*' \
-o -name '*.c.[012]*.*' \
-o -name '*.ll' \
- -o -name '*.gcno' \) -type f -print | xargs rm -f
+ -o -name '*.gcno' \
+ -o -name '*.*.symversions' \) -type f -print | xargs rm -f

# Generate tags for editors
# ---------------------------------------------------------------------------
diff --git a/arch/Kconfig b/arch/Kconfig
index a41fcb3ca7c6..736ae228e506 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -655,7 +655,6 @@ config LTO_CLANG
depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
depends on !KASAN
depends on !GCOV_KERNEL
- depends on !MODVERSIONS
select LTO
help
This option enables Clang's Link Time Optimization (LTO), which
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ed74b2f986f7..eae2f5386a03 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -166,6 +166,15 @@ ifdef CONFIG_MODVERSIONS
# the actual value of the checksum generated by genksyms
# o remove .tmp_<file>.o to <file>.o

+ifdef CONFIG_LTO_CLANG
+# Generate .o.symversions files for each .o with exported symbols, and link these
+# to the kernel and/or modules at the end.
+cmd_modversions_c = \
+ if $(NM) $@ 2>/dev/null | grep -q __ksymtab; then \
+ $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
+ > $@.symversions; \
+ fi;
+else
cmd_modversions_c = \
if $(OBJDUMP) -h $@ | grep -q __ksymtab; then \
$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
@@ -177,6 +186,7 @@ cmd_modversions_c = \
rm -f $(@D)/.tmp_$(@F:.o=.ver); \
fi
endif
+endif

ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
# compiler will not generate __mcount_loc use recordmcount or recordmcount.pl
@@ -390,6 +400,18 @@ $(obj)/%.asn1.c $(obj)/%.asn1.h: $(src)/%.asn1 $(objtree)/scripts/asn1_compiler
$(subdir-builtin): $(obj)/%/built-in.a: $(obj)/% ;
$(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;

+# combine symversions for later processing
+quiet_cmd_update_lto_symversions = SYMVER $@
+ifeq ($(CONFIG_LTO_CLANG) $(CONFIG_MODVERSIONS),y y)
+ cmd_update_lto_symversions = \
+ rm -f $@.symversions \
+ $(foreach n, $(filter-out FORCE,$^), \
+ $(if $(wildcard $(n).symversions), \
+ ; cat $(n).symversions >> $@.symversions))
+else
+ cmd_update_lto_symversions = echo >/dev/null
+endif
+
#
# Rule to compile a set of .o files into one .a file (without symbol table)
#
@@ -397,8 +419,11 @@ $(subdir-modorder): $(obj)/%/modules.order: $(obj)/% ;
quiet_cmd_ar_builtin = AR $@
cmd_ar_builtin = rm -f $@; $(AR) cDPrST $@ $(real-prereqs)

+quiet_cmd_ar_and_symver = AR $@
+ cmd_ar_and_symver = $(cmd_update_lto_symversions); $(cmd_ar_builtin)
+
$(obj)/built-in.a: $(real-obj-y) FORCE
- $(call if_changed,ar_builtin)
+ $(call if_changed,ar_and_symver)

#
# Rule to create modules.order file
@@ -418,8 +443,11 @@ $(obj)/modules.order: $(obj-m) FORCE
#
# Rule to compile a set of .o files into one .a file (with symbol table)
#
+quiet_cmd_ar_lib = AR $@
+ cmd_ar_lib = $(cmd_update_lto_symversions); $(cmd_ar)
+
$(obj)/lib.a: $(lib-y) FORCE
- $(call if_changed,ar)
+ $(call if_changed,ar_lib)

# NOTE:
# Do not replace $(filter %.o,^) with $(real-prereqs). When a single object
@@ -428,6 +456,7 @@ $(obj)/lib.a: $(lib-y) FORCE
ifdef CONFIG_LTO_CLANG
quiet_cmd_link_multi-m = AR [M] $@
cmd_link_multi-m = \
+ $(cmd_update_lto_symversions); \
rm -f $@; \
$(AR) cDPrsT $@ $(filter %.o,$^)
else
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 9ff8bfdb574d..066beffca09a 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -111,7 +111,11 @@ ifdef CONFIG_LTO_CLANG
prelink-ext := .lto

quiet_cmd_cc_lto_link_modules = LTO [M] $@
-cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^
+cmd_cc_lto_link_modules = \
+ $(LD) $(ld_flags) -r -o $@ \
+ $(shell [ -s $(@:.lto.o=.o.symversions) ] && \
+ echo -T $(@:.lto.o=.o.symversions)) \
+ --whole-archive $^

%.lto.o: %.o
$(call if_changed,cc_lto_link_modules)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 596507573a48..78e55fe7210b 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -43,11 +43,26 @@ info()
fi
}

+# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
+# .tmp_symversions.lds
+gen_symversions()
+{
+ info GEN .tmp_symversions.lds
+ rm -f .tmp_symversions.lds
+
+ for o in ${KBUILD_VMLINUX_OBJS} ${KBUILD_VMLINUX_LIBS}; do
+ if [ -f ${o}.symversions ]; then
+ cat ${o}.symversions >> .tmp_symversions.lds
+ fi
+ done
+}
+
# Link of vmlinux.o used for section mismatch analysis
# ${1} output file
modpost_link()
{
local objects
+ local lds=""

objects="--whole-archive \
${KBUILD_VMLINUX_OBJS} \
@@ -57,6 +72,11 @@ modpost_link()
--end-group"

if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ if [ -n "${CONFIG_MODVERSIONS}" ]; then
+ gen_symversions
+ lds="${lds} -T .tmp_symversions.lds"
+ fi
+
# This might take a while, so indicate that we're doing
# an LTO link
info LTO ${1}
@@ -64,7 +84,7 @@ modpost_link()
info LD ${1}
fi

- ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects}
+ ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${lds} ${objects}
}

objtool_link()
@@ -242,6 +262,7 @@ cleanup()
{
rm -f .btf.*
rm -f .tmp_System.map
+ rm -f .tmp_symversions.lds
rm -f .tmp_vmlinux*
rm -f System.map
rm -f vmlinux
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:43

by Sami Tolvanen

Subject: [PATCH v7 15/17] KVM: arm64: disable LTO for the nVHE directory

We use objcopy to manipulate ELF binaries for the nVHE code,
which fails with LTO as the compiler produces LLVM bitcode
instead. Disable LTO for this code to allow objcopy to be used.

Signed-off-by: Sami Tolvanen <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index ddde15fe85f2..4ceed7682287 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -51,9 +51,9 @@ $(obj)/kvm_nvhe.o: $(obj)/kvm_nvhe.tmp.o FORCE
quiet_cmd_hypcopy = HYPCOPY $@
cmd_hypcopy = $(OBJCOPY) --prefix-symbols=__kvm_nvhe_ $< $@

-# Remove ftrace and Shadow Call Stack CFLAGS.
+# Remove ftrace, LTO, and Shadow Call Stack CFLAGS.
# This is equivalent to the 'notrace' and '__noscs' annotations.
-KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))
+KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_FTRACE) $(CC_FLAGS_LTO) $(CC_FLAGS_SCS), $(KBUILD_CFLAGS))

# KVM nVHE code is run at a different exception code with a different map, so
# compiler instrumentation that inserts callbacks or checks into the code may
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:11:59

by Sami Tolvanen

Subject: [PATCH v7 01/17] tracing: move function tracer options to Kconfig

Move function tracer options to Kconfig to make it easier to add
new methods for generating __mcount_loc, and to make the options
available also when building kernel modules.

Note that FTRACE_MCOUNT_USE_* options are updated on rebuild and
therefore work even if the .config was generated in a different
environment.

Signed-off-by: Sami Tolvanen <[email protected]>
---
Makefile | 20 ++++++++------------
kernel/trace/Kconfig | 16 ++++++++++++++++
scripts/Makefile.build | 6 ++----
3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/Makefile b/Makefile
index e2c3f65c4721..8c8feb4245a6 100644
--- a/Makefile
+++ b/Makefile
@@ -851,12 +851,8 @@ KBUILD_CFLAGS += $(DEBUG_CFLAGS)
export DEBUG_CFLAGS

ifdef CONFIG_FUNCTION_TRACER
-ifdef CONFIG_FTRACE_MCOUNT_RECORD
- # gcc 5 supports generating the mcount tables directly
- ifeq ($(call cc-option-yn,-mrecord-mcount),y)
- CC_FLAGS_FTRACE += -mrecord-mcount
- export CC_USING_RECORD_MCOUNT := 1
- endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_CC
+ CC_FLAGS_FTRACE += -mrecord-mcount
ifdef CONFIG_HAVE_NOP_MCOUNT
ifeq ($(call cc-option-yn, -mnop-mcount),y)
CC_FLAGS_FTRACE += -mnop-mcount
@@ -864,6 +860,12 @@ ifdef CONFIG_FTRACE_MCOUNT_RECORD
endif
endif
endif
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
+ ifdef CONFIG_HAVE_C_RECORDMCOUNT
+ BUILD_C_RECORDMCOUNT := y
+ export BUILD_C_RECORDMCOUNT
+ endif
+endif
ifdef CONFIG_HAVE_FENTRY
ifeq ($(call cc-option-yn, -mfentry),y)
CC_FLAGS_FTRACE += -mfentry
@@ -873,12 +875,6 @@ endif
export CC_FLAGS_FTRACE
KBUILD_CFLAGS += $(CC_FLAGS_FTRACE) $(CC_FLAGS_USING)
KBUILD_AFLAGS += $(CC_FLAGS_USING)
-ifdef CONFIG_DYNAMIC_FTRACE
- ifdef CONFIG_HAVE_C_RECORDMCOUNT
- BUILD_C_RECORDMCOUNT := y
- export BUILD_C_RECORDMCOUNT
- endif
-endif
endif

# We trigger additional mismatches with less inlining
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a4020c0b4508..927ad004888a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -595,6 +595,22 @@ config FTRACE_MCOUNT_RECORD
depends on DYNAMIC_FTRACE
depends on HAVE_FTRACE_MCOUNT_RECORD

+config FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+ bool
+ depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_CC
+ def_bool y
+ depends on $(cc-option,-mrecord-mcount)
+ depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+ depends on FTRACE_MCOUNT_RECORD
+
+config FTRACE_MCOUNT_USE_RECORDMCOUNT
+ def_bool y
+ depends on !FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY
+ depends on !FTRACE_MCOUNT_USE_CC
+ depends on FTRACE_MCOUNT_RECORD
+
config TRACING_MAP
bool
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index ae647379b579..2175ddb1ee0c 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -178,8 +178,7 @@ cmd_modversions_c = \
fi
endif

-ifdef CONFIG_FTRACE_MCOUNT_RECORD
-ifndef CC_USING_RECORD_MCOUNT
+ifdef CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT
# compiler will not generate __mcount_loc use recordmcount or recordmcount.pl
ifdef BUILD_C_RECORDMCOUNT
ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -206,8 +205,7 @@ recordmcount_source := $(srctree)/scripts/recordmcount.pl
endif # BUILD_C_RECORDMCOUNT
cmd_record_mcount = $(if $(findstring $(strip $(CC_FLAGS_FTRACE)),$(_c_flags)), \
$(sub_cmd_record_mcount))
-endif # CC_USING_RECORD_MCOUNT
-endif # CONFIG_FTRACE_MCOUNT_RECORD
+endif # CONFIG_FTRACE_MCOUNT_USE_RECORDMCOUNT

ifdef CONFIG_STACK_VALIDATION
ifneq ($(SKIP_STACK_VALIDATION),1)
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:12:20

by Sami Tolvanen

Subject: [PATCH v7 16/17] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS

DYNAMIC_FTRACE_WITH_REGS uses -fpatchable-function-entry, which makes
running recordmcount unnecessary as there are no mcount calls in object
files, and __mcount_loc doesn't need to be generated.

While there's normally no harm in running recordmcount even when it's
not strictly needed, this won't work with LTO as we have LLVM bitcode
instead of ELF objects.

This change selects FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY, which
disables recordmcount when patchable function entries are used instead.
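
The way the three FTRACE_MCOUNT_USE_* options from patch 01/17 exclude each other can be written out as a small decision function. This is only a sketch of the Kconfig dependencies as they appear in that diff. The enum, the function name `pick_mcount_method` and its parameters are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>

enum mcount_method {
	MCOUNT_NONE,				/* FTRACE_MCOUNT_RECORD is off */
	MCOUNT_PATCHABLE_FUNCTION_ENTRY,	/* selected by the architecture */
	MCOUNT_USE_CC,				/* compiler accepts -mrecord-mcount */
	MCOUNT_USE_RECORDMCOUNT,		/* recordmcount runs on ELF objects */
};

/*
 * Mirror the "depends on" lines: USE_CC requires compiler support and
 * no patchable function entry; USE_RECORDMCOUNT is the fallback when
 * neither of the other two applies.
 */
enum mcount_method pick_mcount_method(bool mcount_record,
				      bool patchable_function_entry,
				      bool cc_has_record_mcount)
{
	if (!mcount_record)
		return MCOUNT_NONE;
	if (patchable_function_entry)
		return MCOUNT_PATCHABLE_FUNCTION_ENTRY;
	if (cc_has_record_mcount)
		return MCOUNT_USE_CC;
	return MCOUNT_USE_RECORDMCOUNT;
}
```

In this model, selecting the patchable function entry option always wins, so recordmcount is never chosen for arm64 with DYNAMIC_FTRACE_WITH_REGS, which is the effect the commit message describes.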

Signed-off-by: Sami Tolvanen <[email protected]>
---
arch/arm64/Kconfig | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1515f6f153a0..c7f07978f5b6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -158,6 +158,8 @@ config ARM64
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS \
if $(cc-option,-fpatchable-function-entry=2)
+ select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
+ if DYNAMIC_FTRACE_WITH_REGS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
--
2.29.2.299.gdc1121823c-goog

2020-11-18 22:13:41

by Sami Tolvanen

Subject: [PATCH v7 09/17] PCI: Fix PREL32 relocations for LTO

With Clang's Link Time Optimization (LTO), the compiler can rename
static functions to avoid global naming collisions. As PCI fixup
functions are typically static, renaming can break references
to them in inline assembly. This change adds a global stub to
DECLARE_PCI_FIXUP_SECTION to fix the issue when PREL32 relocations
are used.
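
The patch also adds an extra macro layer between DECLARE_PCI_FIXUP_SECTION and the macro that stringifies the stub name. The commit message does not say why, but a likely reason is that the C preprocessor does not expand an argument that is used with `#`, so the generated name has to be expanded one level earlier. The sketch below demonstrates that behaviour in user space. All macro and function names in it are hypothetical, `UNIQUE_NAME` is only modelled on `__UNIQUE_ID`, and `__COUNTER__` is a GCC/Clang extension.

```c
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
/* Hypothetical stand-in for the kernel's __UNIQUE_ID(). */
#define UNIQUE_NAME(prefix) PASTE(PASTE(stub_, prefix), __COUNTER__)

/*
 * Innermost layer: stub arrives as a plain identifier, so #stub
 * yields the final spelling of the generated name.
 */
#define DEFINE_FIXUP_IMPL(hook, stub)				\
	int stub(int v);					\
	int stub(int v)						\
	{							\
		return hook(v);					\
	}							\
	const char *const hook##_stub_name = #stub;		\
	int (*const hook##_stub_fn)(int) = stub;

/*
 * Middle layer: stub is neither stringified nor pasted here, so it
 * is fully macro-expanded before being passed on.
 */
#define DEFINE_FIXUP_EXPAND(hook, stub) DEFINE_FIXUP_IMPL(hook, stub)

/* Outer layer: generate the unique name once and hand it down. */
#define DEFINE_FIXUP(hook) DEFINE_FIXUP_EXPAND(hook, UNIQUE_NAME(hook))

/* A static hook, standing in for a PCI fixup function. */
static int quirk_double(int v)
{
	return 2 * v;
}

DEFINE_FIXUP(quirk_double)
```

If DEFINE_FIXUP called DEFINE_FIXUP_IMPL directly, the string would contain the unexpanded text of the UNIQUE_NAME invocation while the function itself would get the expanded name, and the two would no longer match.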

Signed-off-by: Sami Tolvanen <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
---
include/linux/pci.h | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 22207a79762c..5b8505a5ca5f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1912,19 +1912,28 @@ enum pci_fixup_pass {
};

#ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
-#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
- class_shift, hook) \
- __ADDRESSABLE(hook) \
+#define ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook, stub) \
+ void stub(struct pci_dev *dev); \
+ void stub(struct pci_dev *dev) \
+ { \
+ hook(dev); \
+ } \
asm(".section " #sec ", \"a\" \n" \
".balign 16 \n" \
".short " #vendor ", " #device " \n" \
".long " #class ", " #class_shift " \n" \
- ".long " #hook " - . \n" \
+ ".long " #stub " - . \n" \
".previous \n");
+
+#define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook, stub) \
+ ___DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
+ class_shift, hook, stub)
#define DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
class_shift, hook) \
__DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \
- class_shift, hook)
+ class_shift, hook, __UNIQUE_ID(hook))
#else
/* Anonymous variables would be nice... */
#define DECLARE_PCI_FIXUP_SECTION(section, name, vendor, device, class, \
--
2.29.2.299.gdc1121823c-goog

2020-11-18 23:45:59

by Nick Desaulniers

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <[email protected]> wrote:
>
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
>
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
>
> You can also pull this series from
>
> https://github.com/samitolvanen/linux.git lto-v7

Thanks for continuing to drive this series, Sami. For the series,

Tested-by: Nick Desaulniers <[email protected]>

I did virtualized boot tests with the series applied to aarch64
defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
with CONFIG_THINLTO. If you make changes to the series in follow-ups,
please drop my Tested-by tag from the modified patches and I'll help
re-test. I have some minor feedback on the Kconfig change, but I'll post
it in reply to that patch.

>
> ---
> Changes in v7:
>
> - Rebased to master again.
>
> - Added back arm64 patches as the prerequisites are now staged,
> and dropped x86_64 support until the remaining objtool issues
> are resolved.
>
> - Dropped ifdefs from module.lds.S.
>
> Changes in v6:
>
> - Added the missing --mcount flag to patch 5.
>
> - Dropped the arm64 patches from this series and will repost them
> later.
>
> Changes in v5:
>
> - Rebased on top of tip/master.
>
> - Changed the command line for objtool to use --vmlinux --duplicate
> to disable warnings about retpoline thunks and to fix .orc_unwind
> generation for vmlinux.o.
>
> - Added --noinstr flag to objtool, so we can use --vmlinux without
> also enabling noinstr validation.
>
> - Disabled objtool's unreachable instruction warnings with LTO to
> disable false positives for the int3 padding in vmlinux.o.
>
> - Added ANNOTATE_RETPOLINE_SAFE annotations to the indirect jumps
> in x86 assembly code to fix objtool warnings with retpoline.
>
> - Fixed modpost warnings about missing version information with
> CONFIG_MODVERSIONS.
>
> - Included Makefile.lib into Makefile.modpost for ld_flags. Thanks
> to Sedat for pointing this out.
>
> - Updated the help text for ThinLTO to better explain the trade-offs.
>
> - Updated commit messages with better explanations.
>
> Changes in v4:
>
> - Fixed a typo in Makefile.lib to correctly pass --no-fp to objtool.
>
> - Moved ftrace configs related to generating __mcount_loc to Kconfig,
> so they are available also in Makefile.modfinal.
>
> - Dropped two prerequisite patches that were merged to Linus' tree.
>
> Changes in v3:
>
> - Added a separate patch to remove the unused DISABLE_LTO treewide,
> as filtering out CC_FLAGS_LTO instead is preferred.
>
> - Updated the Kconfig help to explain why LTO is behind a choice
> and disabled by default.
>
> - Dropped CC_FLAGS_LTO_CLANG, compiler-specific LTO flags are now
> appended directly to CC_FLAGS_LTO.
>
> - Updated $(AR) flags as KBUILD_ARFLAGS was removed earlier.
>
> - Fixed ThinLTO cache handling for external module builds.
>
> - Rebased on top of Masahiro's patch for preprocessing modules.lds,
> and moved the contents of module-lto.lds to modules.lds.S.
>
> - Moved objtool_args to Makefile.lib to avoid duplication of the
> command line parameters in Makefile.modfinal.
>
> - Clarified in the commit message for the initcall ordering patch
> that the initcall order remains the same as without LTO.
>
> - Changed link-vmlinux.sh to use jobserver-exec to control the
> number of jobs started by generate_initcall_ordering.pl.
>
> - Dropped the x86/relocs patch to whitelist L4_PAGE_OFFSET as it's
> no longer needed with ToT kernel.
>
> - Disabled LTO for arch/x86/power/cpu.c to work around a Clang bug
> with stack protector attributes.
>
> Changes in v2:
>
> - Fixed -Wmissing-prototypes warnings with W=1.
>
> - Dropped cc-option from -fsplit-lto-unit and added .thinlto-cache
> scrubbing to make distclean.
>
> - Added a comment about Clang >=11 being required.
>
> - Added a patch to disable LTO for the arm64 KVM nVHE code.
>
> - Disabled objtool's noinstr validation with LTO unless enabled.
>
> - Included Peter's proposed objtool mcount patch in the series
> and replaced recordmcount with the objtool pass to avoid
> whitelisting relocations that are not calls.
>
> - Updated several commit messages with better explanations.
>
>
> Sami Tolvanen (17):
> tracing: move function tracer options to Kconfig
> kbuild: add support for Clang LTO
> kbuild: lto: fix module versioning
> kbuild: lto: limit inlining
> kbuild: lto: merge module sections
> kbuild: lto: remove duplicate dependencies from .mod files
> init: lto: ensure initcall ordering
> init: lto: fix PREL32 relocations
> PCI: Fix PREL32 relocations for LTO
> modpost: lto: strip .lto from module names
> scripts/mod: disable LTO for empty.c
> efi/libstub: disable LTO
> drivers/misc/lkdtm: disable LTO for rodata.o
> arm64: vdso: disable LTO
> KVM: arm64: disable LTO for the nVHE directory
> arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
> arm64: allow LTO_CLANG and THINLTO to be selected
>
> .gitignore | 1 +
> Makefile | 45 +++--
> arch/Kconfig | 74 +++++++
> arch/arm64/Kconfig | 4 +
> arch/arm64/kernel/vdso/Makefile | 3 +-
> arch/arm64/kvm/hyp/nvhe/Makefile | 4 +-
> drivers/firmware/efi/libstub/Makefile | 2 +
> drivers/misc/lkdtm/Makefile | 1 +
> include/asm-generic/vmlinux.lds.h | 11 +-
> include/linux/init.h | 79 +++++++-
> include/linux/pci.h | 19 +-
> kernel/trace/Kconfig | 16 ++
> scripts/Makefile.build | 50 ++++-
> scripts/Makefile.lib | 6 +-
> scripts/Makefile.modfinal | 9 +-
> scripts/Makefile.modpost | 25 ++-
> scripts/generate_initcall_order.pl | 270 ++++++++++++++++++++++++++
> scripts/link-vmlinux.sh | 70 ++++++-
> scripts/mod/Makefile | 1 +
> scripts/mod/modpost.c | 16 +-
> scripts/mod/modpost.h | 9 +
> scripts/mod/sumversion.c | 6 +-
> scripts/module.lds.S | 24 +++
> 23 files changed, 677 insertions(+), 68 deletions(-)
> create mode 100755 scripts/generate_initcall_order.pl
>
>
> base-commit: 0fa8ee0d9ab95c9350b8b84574824d9a384a9f7d
> --
> 2.29.2.299.gdc1121823c-goog
>


--
Thanks,
~Nick Desaulniers

2020-11-18 23:51:28

by Nick Desaulniers

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <[email protected]> wrote:
>
> This change adds build system support for Clang's Link Time
> Optimization (LTO). With -flto, instead of ELF object files, Clang
> produces LLVM bitcode, which is compiled into native code at link
> time, allowing the final binary to be optimized globally. For more
> details, see:
>
> https://llvm.org/docs/LinkTimeOptimization.html
>
> The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
> which defaults to LTO being disabled. To use LTO, the architecture
> must select ARCH_SUPPORTS_LTO_CLANG and support:
>
> - compiling with Clang,
> - compiling inline assembly with Clang's integrated assembler,
> - and linking with LLD.
>
> While using full LTO results in the best runtime performance, the
> compilation is not scalable in time or memory. CONFIG_THINLTO
> enables ThinLTO, which allows parallel optimization and faster
> incremental builds. ThinLTO is used by default if the architecture
> also selects ARCH_SUPPORTS_THINLTO:
>
> https://clang.llvm.org/docs/ThinLTO.html
>
> To enable LTO, LLVM tools must be used to handle bitcode files. The
> easiest way is to pass the LLVM=1 option to make:
>
> $ make LLVM=1 defconfig
> $ scripts/config -e LTO_CLANG
> $ make LLVM=1
>
> Alternatively, at least the following LLVM tools must be used:
>
> CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
>
> To prepare for LTO support with other compilers, common parts are
> gated behind the CONFIG_LTO option, and LTO can be disabled for
> specific files by filtering out CC_FLAGS_LTO.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> ---
> Makefile | 19 +++++++-
> arch/Kconfig | 75 +++++++++++++++++++++++++++++++
> include/asm-generic/vmlinux.lds.h | 11 +++--
> scripts/Makefile.build | 9 +++-
> scripts/Makefile.modfinal | 9 +++-
> scripts/Makefile.modpost | 21 ++++++++-
> scripts/link-vmlinux.sh | 32 +++++++++----
> 7 files changed, 158 insertions(+), 18 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 8c8feb4245a6..240560e88d69 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -893,6 +893,21 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS)
> export CC_FLAGS_SCS
> endif
>
> +ifdef CONFIG_LTO_CLANG
> +ifdef CONFIG_THINLTO
> +CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
> +KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
> +else
> +CC_FLAGS_LTO += -flto
> +endif
> +CC_FLAGS_LTO += -fvisibility=default
> +endif
> +
> +ifdef CONFIG_LTO
> +KBUILD_CFLAGS += $(CC_FLAGS_LTO)
> +export CC_FLAGS_LTO
> +endif
> +
> ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
> KBUILD_CFLAGS += -falign-functions=32
> endif
> @@ -1473,7 +1488,7 @@ MRPROPER_FILES += include/config include/generated \
> *.spec
>
> # Directories & files removed with 'make distclean'
> -DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
> +DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache
>
> # clean - Delete most, but leave enough to build external modules
> #
> @@ -1719,7 +1734,7 @@ PHONY += compile_commands.json
>
> clean-dirs := $(KBUILD_EXTMOD)
> clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
> - $(KBUILD_EXTMOD)/compile_commands.json
> + $(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache
>
> PHONY += help
> help:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 56b6ccc0e32d..a41fcb3ca7c6 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -598,6 +598,81 @@ config SHADOW_CALL_STACK
> reading and writing arbitrary memory may be able to locate them
> and hijack control flow by modifying the stacks.
>
> +config LTO
> + bool
> +
> +config ARCH_SUPPORTS_LTO_CLANG
> + bool
> + help
> + An architecture should select this option if it supports:
> + - compiling with Clang,
> + - compiling inline assembly with Clang's integrated assembler,
> + - and linking with LLD.
> +
> +config ARCH_SUPPORTS_THINLTO
> + bool
> + help
> + An architecture should select this option if it supports Clang's
> + ThinLTO.
> +
> +config THINLTO
> + bool "Clang ThinLTO"
> + depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO
> + default y
> + help
> + This option enables Clang's ThinLTO, which allows for parallel
> + optimization and faster incremental compiles. More information
> + can be found from Clang's documentation:
> +
> + https://clang.llvm.org/docs/ThinLTO.html
> +
> + If you say N here, the compiler will use full LTO, which may
> + produce faster code, but building the kernel will be significantly
> + slower as the linker won't efficiently utilize multiple threads.
> +
> + If unsure, say Y.

I think the order of these new configs makes it so that ThinLTO
appears above LTO in menuconfig; I don't like that, and wish it came
immediately after. Does `THINLTO` have to be defined _after_ the
choice for LTO_NONE/LTO_CLANG, perhaps?

Secondly, I don't like how ThinLTO is a config and not a choice. If I
don't set ThinLTO, what am I getting? That's a rhetorical question; I
know it's full LTO, and I guess the help text does talk about the
tradeoffs and what you would get. I guess what's curious to me is
"why does it display ThinLTO? Why not FullLTO?" I can't help but
wonder if a kconfig `choice` rather than a `config` would be better
here, that way it's more obvious the user is making a choice between
ThinLTO vs Full LTO, rather than the current patches which look like
"ThinLTO on/off."

These are cosmetic concerns, feel free to ignore. Just a thought.
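
For illustration only, the second point could be sketched roughly as a
second choice placed after the LTO_NONE/LTO_CLANG choice. This is
untested; LTO_CLANG_FULL, LTO_CLANG_THIN, and the prompt strings are
hypothetical names that do not appear in this series, and the Makefile
would then have to test the new symbol instead of CONFIG_THINLTO:

```kconfig
# Hypothetical sketch, not part of the patch. Symbol names and prompts
# are made up for illustration. Intended to follow the existing
# "Link Time Optimization (LTO)" choice in arch/Kconfig.
choice
	prompt "Clang LTO mode"
	depends on LTO_CLANG
	default LTO_CLANG_THIN if ARCH_SUPPORTS_THINLTO
	default LTO_CLANG_FULL

config LTO_CLANG_FULL
	bool "Full LTO"
	help
	  Optimizes the kernel as a single unit. May produce faster code,
	  but the link step is significantly slower.

config LTO_CLANG_THIN
	bool "ThinLTO"
	depends on ARCH_SUPPORTS_THINLTO
	help
	  Allows parallel optimization and faster incremental builds.

endchoice
```

With a layout like this, menuconfig would show both modes by name
directly below the main LTO entry, rather than a single on/off switch.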

> +
> +choice
> + prompt "Link Time Optimization (LTO)"
> + default LTO_NONE
> + help
> + This option enables Link Time Optimization (LTO), which allows the
> + compiler to optimize binaries globally.
> +
> + If unsure, select LTO_NONE. Note that LTO is very resource-intensive
> + so it's disabled by default.
> +
> +config LTO_NONE
> + bool "None"
> +
> +config LTO_CLANG
> + bool "Clang's Link Time Optimization (EXPERIMENTAL)"
> + # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
> + depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
> + depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
> + depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
> + depends on ARCH_SUPPORTS_LTO_CLANG
> + depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
> + depends on !KASAN
> + depends on !GCOV_KERNEL
> + depends on !MODVERSIONS
> + select LTO
> + help
> + This option enables Clang's Link Time Optimization (LTO), which
> + allows the compiler to optimize the kernel globally. If you enable
> + this option, the compiler generates LLVM bitcode instead of ELF
> + object files, and the actual compilation from bitcode happens at
> + the LTO link step, which may take several minutes depending on the
> + kernel configuration. More information can be found from LLVM's
> + documentation:
> +
> + https://llvm.org/docs/LinkTimeOptimization.html
> +
> + To select this option, you also need to use LLVM tools to handle
> + the bitcode by passing LLVM=1 to make.
> +
> +endchoice
> +
> config HAVE_ARCH_WITHIN_STACK_FRAMES
> bool
> help
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index b2b3d81b1535..8988a2e445d8 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -90,15 +90,18 @@
> * .data. We don't want to pull in .data..other sections, which Linux
> * has defined. Same for text and bss.
> *
> + * With LTO_CLANG, the linker also splits sections by default, so we need
> + * these macros to combine the sections during the final link.
> + *
> * RODATA_MAIN is not used because existing code already defines .rodata.x
> * sections to be brought in with rodata.
> */
> -#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> +#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
> #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
> -#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
> +#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral*
> #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
> -#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
> -#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
> +#define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]* .rodata..L*
> +#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]* .bss..compoundliteral*
> #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
> #else
> #define TEXT_MAIN .text
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 2175ddb1ee0c..ed74b2f986f7 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -111,7 +111,7 @@ endif
> # ---------------------------------------------------------------------------
>
> quiet_cmd_cc_s_c = CC $(quiet_modtag) $@
> - cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS), $(c_flags)) -fverbose-asm -S -o $@ $<
> + cmd_cc_s_c = $(CC) $(filter-out $(DEBUG_CFLAGS) $(CC_FLAGS_LTO), $(c_flags)) -fverbose-asm -S -o $@ $<
>
> $(obj)/%.s: $(src)/%.c FORCE
> $(call if_changed_dep,cc_s_c)
> @@ -425,8 +425,15 @@ $(obj)/lib.a: $(lib-y) FORCE
> # Do not replace $(filter %.o,^) with $(real-prereqs). When a single object
> # module is turned into a multi object module, $^ will contain header file
> # dependencies recorded in the .*.cmd file.
> +ifdef CONFIG_LTO_CLANG
> +quiet_cmd_link_multi-m = AR [M] $@
> +cmd_link_multi-m = \
> + rm -f $@; \
> + $(AR) cDPrsT $@ $(filter %.o,$^)
> +else
> quiet_cmd_link_multi-m = LD [M] $@
> cmd_link_multi-m = $(LD) $(ld_flags) -r -o $@ $(filter %.o,$^)
> +endif
>
> $(multi-used-m): FORCE
> $(call if_changed,link_multi-m)
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index ae01baf96f4e..2cb9a1d88434 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -6,6 +6,7 @@
> PHONY := __modfinal
> __modfinal:
>
> +include $(objtree)/include/config/auto.conf
> include $(srctree)/scripts/Kbuild.include
>
> # for c_flags
> @@ -29,6 +30,12 @@ quiet_cmd_cc_o_c = CC [M] $@
>
> ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
>
> +ifdef CONFIG_LTO_CLANG
> +# With CONFIG_LTO_CLANG, reuse the object file we compiled for modpost to
> +# avoid a second slow LTO link
> +prelink-ext := .lto
> +endif
> +
> quiet_cmd_ld_ko_o = LD [M] $@
> cmd_ld_ko_o = \
> $(LD) -r $(KBUILD_LDFLAGS) \
> @@ -36,7 +43,7 @@ quiet_cmd_ld_ko_o = LD [M] $@
> -T scripts/module.lds -o $@ $(filter %.o, $^); \
> $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
>
> -$(modules): %.ko: %.o %.mod.o scripts/module.lds FORCE
> +$(modules): %.ko: %$(prelink-ext).o %.mod.o scripts/module.lds FORCE
> +$(call if_changed,ld_ko_o)
>
> targets += $(modules) $(modules:.ko=.mod.o)
> diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
> index f54b6ac37ac2..9ff8bfdb574d 100644
> --- a/scripts/Makefile.modpost
> +++ b/scripts/Makefile.modpost
> @@ -43,6 +43,9 @@ __modpost:
> include include/config/auto.conf
> include scripts/Kbuild.include
>
> +# for ld_flags
> +include scripts/Makefile.lib
> +
> MODPOST = scripts/mod/modpost \
> $(if $(CONFIG_MODVERSIONS),-m) \
> $(if $(CONFIG_MODULE_SRCVERSION_ALL),-a) \
> @@ -102,12 +105,26 @@ $(input-symdump):
> @echo >&2 'WARNING: Symbol version dump "$@" is missing.'
> @echo >&2 ' Modules may not have dependencies or modversions.'
>
> +ifdef CONFIG_LTO_CLANG
> +# With CONFIG_LTO_CLANG, .o files might be LLVM bitcode, so we need to run
> +# LTO to compile them into native code before running modpost
> +prelink-ext := .lto
> +
> +quiet_cmd_cc_lto_link_modules = LTO [M] $@
> +cmd_cc_lto_link_modules = $(LD) $(ld_flags) -r -o $@ --whole-archive $^
> +
> +%.lto.o: %.o
> + $(call if_changed,cc_lto_link_modules)
> +endif
> +
> +modules := $(sort $(shell cat $(MODORDER)))
> +
> # Read out modules.order to pass in modpost.
> # Otherwise, allmodconfig would fail with "Argument list too long".
> quiet_cmd_modpost = MODPOST $@
> - cmd_modpost = sed 's/ko$$/o/' $< | $(MODPOST) -T -
> + cmd_modpost = sed 's/\.ko$$/$(prelink-ext)\.o/' $< | $(MODPOST) -T -
>
> -$(output-symdump): $(MODORDER) $(input-symdump) FORCE
> +$(output-symdump): $(MODORDER) $(input-symdump) $(modules:.ko=$(prelink-ext).o) FORCE
> $(call if_changed,modpost)
>
> targets += $(output-symdump)
> diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> index 6eded325c837..596507573a48 100755
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@ -56,6 +56,14 @@ modpost_link()
> ${KBUILD_VMLINUX_LIBS} \
> --end-group"
>
> + if [ -n "${CONFIG_LTO_CLANG}" ]; then
> + # This might take a while, so indicate that we're doing
> + # an LTO link
> + info LTO ${1}
> + else
> + info LD ${1}
> + fi
> +
> ${LD} ${KBUILD_LDFLAGS} -r -o ${1} ${objects}
> }
>
> @@ -103,13 +111,22 @@ vmlinux_link()
> fi
>
> if [ "${SRCARCH}" != "um" ]; then
> - objects="--whole-archive \
> - ${KBUILD_VMLINUX_OBJS} \
> - --no-whole-archive \
> - --start-group \
> - ${KBUILD_VMLINUX_LIBS} \
> - --end-group \
> - ${@}"
> + if [ -n "${CONFIG_LTO_CLANG}" ]; then
> + # Use vmlinux.o instead of performing the slow LTO
> + # link again.
> + objects="--whole-archive \
> + vmlinux.o \
> + --no-whole-archive \
> + ${@}"
> + else
> + objects="--whole-archive \
> + ${KBUILD_VMLINUX_OBJS} \
> + --no-whole-archive \
> + --start-group \
> + ${KBUILD_VMLINUX_LIBS} \
> + --end-group \
> + ${@}"
> + fi
>
> ${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} \
> ${strip_debug#-Wl,} \
> @@ -274,7 +291,6 @@ fi;
> ${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init need-builtin=1
>
> #link vmlinux.o
> -info LD vmlinux.o
> modpost_link vmlinux.o
> objtool_link vmlinux.o
>
> --
> 2.29.2.299.gdc1121823c-goog
>


--
Thanks,
~Nick Desaulniers

2020-11-20 04:06:34

by Josh Poimboeuf

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
>
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].

Sami,

Here are some patches to fix the objtool issues (other than crypto which
I'll work on next).

https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git objtool-vmlinux

--
Josh

2020-11-20 10:32:07

by Ard Biesheuvel

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
>
> On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <[email protected]> wrote:
> >
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> >
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
> >
> > You can also pull this series from
> >
> > https://github.com/samitolvanen/linux.git lto-v7
>
> Thanks for continuing to drive this series Sami. For the series,
>
> Tested-by: Nick Desaulniers <[email protected]>
>
> I did virtualized boot tests with the series applied to aarch64
> defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> with CONFIG_THINLTO. If you make changes to the series in follow ups,
> please drop my tested by tag from the modified patches and I'll help
> re-test. Some minor feedback on the Kconfig change, but I'll post it
> off of that patch.
>

When you say 'virtualized', do you mean QEMU on x86? Or actual
virtualization on an AArch64 KVM host?

The distinction is important here, given the potential impact of LTO
on things that QEMU simply does not model when it runs in TCG mode on
a foreign host architecture.


2020-11-20 16:25:33

by Sami Tolvanen

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Wed, Nov 18, 2020 at 3:49 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <[email protected]> wrote:
> >
> > This change adds build system support for Clang's Link Time
> > Optimization (LTO). With -flto, instead of ELF object files, Clang
> > produces LLVM bitcode, which is compiled into native code at link
> > time, allowing the final binary to be optimized globally. For more
> > details, see:
> >
> > https://llvm.org/docs/LinkTimeOptimization.html
> >
> > The Kconfig option CONFIG_LTO_CLANG is implemented as a choice,
> > which defaults to LTO being disabled. To use LTO, the architecture
> > must select ARCH_SUPPORTS_LTO_CLANG and support:
> >
> > - compiling with Clang,
> > - compiling inline assembly with Clang's integrated assembler,
> > - and linking with LLD.
> >
> > While using full LTO results in the best runtime performance, the
> > compilation is not scalable in time or memory. CONFIG_THINLTO
> > enables ThinLTO, which allows parallel optimization and faster
> > incremental builds. ThinLTO is used by default if the architecture
> > also selects ARCH_SUPPORTS_THINLTO:
> >
> > https://clang.llvm.org/docs/ThinLTO.html
> >
> > To enable LTO, LLVM tools must be used to handle bitcode files. The
> > easiest way is to pass the LLVM=1 option to make:
> >
> > $ make LLVM=1 defconfig
> > $ scripts/config -e LTO_CLANG
> > $ make LLVM=1
> >
> > Alternatively, at least the following LLVM tools must be used:
> >
> > CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm
> >
> > To prepare for LTO support with other compilers, common parts are
> > gated behind the CONFIG_LTO option, and LTO can be disabled for
> > specific files by filtering out CC_FLAGS_LTO.
> >
> > Signed-off-by: Sami Tolvanen <[email protected]>
> > Reviewed-by: Kees Cook <[email protected]>
> > ---
> > Makefile | 19 +++++++-
> > arch/Kconfig | 75 +++++++++++++++++++++++++++++++
> > include/asm-generic/vmlinux.lds.h | 11 +++--
> > scripts/Makefile.build | 9 +++-
> > scripts/Makefile.modfinal | 9 +++-
> > scripts/Makefile.modpost | 21 ++++++++-
> > scripts/link-vmlinux.sh | 32 +++++++++----
> > 7 files changed, 158 insertions(+), 18 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index 8c8feb4245a6..240560e88d69 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -893,6 +893,21 @@ KBUILD_CFLAGS += $(CC_FLAGS_SCS)
> > export CC_FLAGS_SCS
> > endif
> >
> > +ifdef CONFIG_LTO_CLANG
> > +ifdef CONFIG_THINLTO
> > +CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
> > +KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
> > +else
> > +CC_FLAGS_LTO += -flto
> > +endif
> > +CC_FLAGS_LTO += -fvisibility=default
> > +endif
> > +
> > +ifdef CONFIG_LTO
> > +KBUILD_CFLAGS += $(CC_FLAGS_LTO)
> > +export CC_FLAGS_LTO
> > +endif
> > +
> > ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_32B
> > KBUILD_CFLAGS += -falign-functions=32
> > endif
> > @@ -1473,7 +1488,7 @@ MRPROPER_FILES += include/config include/generated \
> > *.spec
> >
> > # Directories & files removed with 'make distclean'
> > -DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS
> > +DISTCLEAN_FILES += tags TAGS cscope* GPATH GTAGS GRTAGS GSYMS .thinlto-cache
> >
> > # clean - Delete most, but leave enough to build external modules
> > #
> > @@ -1719,7 +1734,7 @@ PHONY += compile_commands.json
> >
> > clean-dirs := $(KBUILD_EXTMOD)
> > clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers $(KBUILD_EXTMOD)/modules.nsdeps \
> > - $(KBUILD_EXTMOD)/compile_commands.json
> > + $(KBUILD_EXTMOD)/compile_commands.json $(KBUILD_EXTMOD)/.thinlto-cache
> >
> > PHONY += help
> > help:
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 56b6ccc0e32d..a41fcb3ca7c6 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -598,6 +598,81 @@ config SHADOW_CALL_STACK
> > reading and writing arbitrary memory may be able to locate them
> > and hijack control flow by modifying the stacks.
> >
> > +config LTO
> > + bool
> > +
> > +config ARCH_SUPPORTS_LTO_CLANG
> > + bool
> > + help
> > + An architecture should select this option if it supports:
> > + - compiling with Clang,
> > + - compiling inline assembly with Clang's integrated assembler,
> > + - and linking with LLD.
> > +
> > +config ARCH_SUPPORTS_THINLTO
> > + bool
> > + help
> > + An architecture should select this option if it supports Clang's
> > + ThinLTO.
> > +
> > +config THINLTO
> > + bool "Clang ThinLTO"
> > + depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO
> > + default y
> > + help
> > + This option enables Clang's ThinLTO, which allows for parallel
> > + optimization and faster incremental compiles. More information
> > + can be found from Clang's documentation:
> > +
> > + https://clang.llvm.org/docs/ThinLTO.html
> > +
> > + If you say N here, the compiler will use full LTO, which may
> > + produce faster code, but building the kernel will be significantly
> > + slower as the linker won't efficiently utilize multiple threads.
> > +
> > + If unsure, say Y.
>
> I think the order of these new configs makes it so that ThinLTO
> appears above LTO in menuconfig; I don't like that, and wish it came
> immediately after. Does `THINLTO` have to be defined _after_ the
> choice for LTO_NONE/LTO_CLANG, perhaps?
>
> Secondly, I don't like how ThinLTO is a config and not a choice. If I
> don't set ThinLTO, what am I getting? That's a rhetorical question; I
> know it's full LTO, and I guess the help text does talk about the
> tradeoffs and what you would get. I guess what's curious to me is
> "why does it display ThinLTO? Why not FullLTO?" I can't help but
> wonder if a kconfig `choice` rather than a `config` would be better
> here, that way it's more obvious the user is making a choice between
> ThinLTO vs Full LTO, rather than the current patches which look like
> "ThinLTO on/off."

Changing the ThinLTO config to a choice and moving it after the main
LTO config sounds like a good idea to me. I'll see if I can change
this in v8. Thanks!

Sami

2020-11-20 19:51:16

by Kees Cook

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> Changing the ThinLTO config to a choice and moving it after the main
> LTO config sounds like a good idea to me. I'll see if I can change
> this in v8. Thanks!

Originally, I thought this might be a bit ugly once GCC LTO is added,
but this could be just a choice like we've done for the stack
initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
CLANG_THIN, and in the future GCC, etc.
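
A flat choice along those lines might look roughly like the sketch
below. It is untested and abbreviated: LTO_CLANG_FULL, LTO_CLANG_THIN,
and HAS_LTO_CLANG are hypothetical names, only some of the dependencies
from the LTO_CLANG entry in this patch are repeated, and LTO_CLANG is
assumed to become a hidden symbol selected by both Clang entries:

```kconfig
# Hypothetical sketch, not part of the patch. HAS_LTO_CLANG,
# LTO_CLANG_FULL and LTO_CLANG_THIN are illustrative names.
config LTO_CLANG
	bool
	select LTO

# Toolchain checks live in a helper symbol, because "select" would
# bypass any "depends on" placed on LTO_CLANG itself.
config HAS_LTO_CLANG
	def_bool y
	depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
	depends on ARCH_SUPPORTS_LTO_CLANG

choice
	prompt "Link Time Optimization (LTO)"
	default LTO_NONE

config LTO_NONE
	bool "None"

config LTO_CLANG_FULL
	bool "Clang Full LTO (EXPERIMENTAL)"
	depends on HAS_LTO_CLANG
	select LTO_CLANG

config LTO_CLANG_THIN
	bool "Clang ThinLTO (EXPERIMENTAL)"
	depends on HAS_LTO_CLANG && ARCH_SUPPORTS_THINLTO
	select LTO_CLANG

endchoice
```

A future GCC entry would then be one more config in the same choice,
and code that only cares whether Clang LTO is on could keep testing
CONFIG_LTO_CLANG.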

--
Kees Cook

2020-11-20 20:22:53

by Nick Desaulniers

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
> >
> > Thanks for continuing to drive this series Sami. For the series,
> >
> > Tested-by: Nick Desaulniers <[email protected]>
> >
> > I did virtualized boot tests with the series applied to aarch64
> > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > with CONFIG_THINLTO. If you make changes to the series in follow ups,
> > please drop my tested by tag from the modified patches and I'll help
> > re-test. Some minor feedback on the Kconfig change, but I'll post it
> > off of that patch.
> >
>
> When you say 'virtualized', do you mean QEMU on x86? Or actual
> virtualization on an AArch64 KVM host?

aarch64 guest on x86_64 host. If you have additional configurations
that are important to you, additional testing help would be
appreciated.

>
> The distinction is important here, given the potential impact of LTO
> on things that QEMU simply does not model when it runs in TCG mode on
> a foreign host architecture.

--
Thanks,
~Nick Desaulniers

2020-11-20 20:27:34

by Sami Tolvanen

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Thu, Nov 19, 2020 at 8:04 PM Josh Poimboeuf <[email protected]> wrote:
>
> On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> >
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
>
> Sami,
>
> Here are some patches to fix the objtool issues (other than crypto which
> I'll work on next).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git objtool-vmlinux

Thanks, Josh! I can confirm that these fix all the non-crypto objtool
warnings with LTO as well.

Sami

2020-11-20 20:31:28

by Nathan Chancellor

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 11:47:21AM -0800, Kees Cook wrote:
> On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> > Changing the ThinLTO config to a choice and moving it after the main
> > LTO config sounds like a good idea to me. I'll see if I can change
> > this in v8. Thanks!
>
> Originally, I thought this might be a bit ugly once GCC LTO is added,
> but this could be just a choice like we've done for the stack
> initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
> CLANG_THIN, and in the future GCC, etc.

Having two separate choices might be a little bit cleaner though? One
for the compiler (LTO_CLANG versus LTO_GCC) and one for the type
(THINLTO versus FULLLTO). The type one could just have a "depends on
CC_IS_CLANG" to ensure it only showed up when needed.
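
As a rough illustration of the two-choice layout described above (a
sketch only, not part of the series: LTO_GCC and FULLLTO are
placeholder names, the prompts are invented, and the extra LTO_CLANG
dependency on the second choice is an assumption):

```kconfig
# Hypothetical sketch: one choice for the toolchain, one for the LTO type.
choice
	prompt "Link Time Optimization (LTO)"
	default LTO_NONE

config LTO_NONE
	bool "None"

config LTO_CLANG
	bool "Clang LTO"
	depends on CC_IS_CLANG

# Placeholder for a possible future GCC option; no such symbol exists yet.
config LTO_GCC
	bool "GCC LTO"
	depends on CC_IS_GCC

endchoice

# Only shown when building with Clang and Clang LTO is selected.
choice
	prompt "Clang LTO type"
	depends on CC_IS_CLANG && LTO_CLANG
	default THINLTO

config THINLTO
	bool "ThinLTO"

config FULLLTO
	bool "Full LTO"

endchoice
```

With this shape the second menu entry never appears for GCC builds, at
the cost of having two prompts instead of one.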

Cheers,
Nathan

2020-11-20 20:45:00

by Kees Cook

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 01:29:35PM -0700, Nathan Chancellor wrote:
> On Fri, Nov 20, 2020 at 11:47:21AM -0800, Kees Cook wrote:
> > On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> > > Changing the ThinLTO config to a choice and moving it after the main
> > > LTO config sounds like a good idea to me. I'll see if I can change
> > > this in v8. Thanks!
> >
> > Originally, I thought this might be a bit ugly once GCC LTO is added,
> > but this could be just a choice like we've done for the stack
> > initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
> > CLANG_THIN, and in the future GCC, etc.
>
> Having two separate choices might be a little bit cleaner though? One
> for the compiler (LTO_CLANG versus LTO_GCC) and one for the type
> (THINLTO versus FULLLTO). The type one could just have a "depends on
> CC_IS_CLANG" to ensure it only showed up when needed.

Right, that's how the stack init choice works. Kconfigs that aren't
supported by the compiler won't be shown. I.e. after Sami's future
patch, the only choice for GCC will be CONFIG_LTO_NONE. But building
under Clang, it would offer CONFIG_LTO_NONE, CONFIG_LTO_CLANG_FULL,
CONFIG_LTO_CLANG_THIN, or something.

(and I assume CONFIG_LTO would be def_bool y, depends on !LTO_NONE)
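
A minimal sketch of that assumption (illustrative only; it presumes
LTO_NONE stays a member of the LTO choice and that nothing else
selects LTO):

```kconfig
# Sketch: LTO is derived from the choice instead of being selected by it.
config LTO
	def_bool y
	depends on !LTO_NONE
```

Any non-"None" entry in the choice would then turn CONFIG_LTO on
without each entry needing its own "select LTO".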

--
Kees Cook

2020-11-20 21:02:15

by Sami Tolvanen

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 12:43 PM Kees Cook <[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 01:29:35PM -0700, Nathan Chancellor wrote:
> > On Fri, Nov 20, 2020 at 11:47:21AM -0800, Kees Cook wrote:
> > > On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> > > > Changing the ThinLTO config to a choice and moving it after the main
> > > > LTO config sounds like a good idea to me. I'll see if I can change
> > > > this in v8. Thanks!
> > >
> > > Originally, I thought this might be a bit ugly once GCC LTO is added,
> > > but this could be just a choice like we've done for the stack
> > > initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
> > > CLANG_THIN, and in the future GCC, etc.
> >
> > Having two separate choices might be a little bit cleaner though? One
> > for the compiler (LTO_CLANG versus LTO_GCC) and one for the type
> > (THINLTO versus FULLLTO). The type one could just have a "depends on
> > CC_IS_CLANG" to ensure it only showed up when needed.
>
> Right, that's how the stack init choice works. Kconfigs that aren't
> supported by the compiler won't be shown. I.e. after Sami's future
> patch, the only choice for GCC will be CONFIG_LTO_NONE. But building
> under Clang, it would offer CONFIG_LTO_NONE, CONFIG_LTO_CLANG_FULL,
> CONFIG_LTO_CLANG_THIN, or something.
>
> (and I assume CONFIG_LTO would be def_bool y, depends on !LTO_NONE)

I'm fine with adding ThinLTO as another option to the LTO choice, but
it would duplicate the dependencies and a lot of the help text. I
suppose we could add another config for the dependencies and have both
LTO options depend on that instead.

Sami

2020-11-20 23:33:07

by Ard Biesheuvel

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
> > >
> > > Thanks for continuing to drive this series Sami. For the series,
> > >
> > > Tested-by: Nick Desaulniers <[email protected]>
> > >
> > > I did virtualized boot tests with the series applied to aarch64
> > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > with CONFIG_THINLTO. If you make changes to the series in follow ups,
> > > please drop my tested by tag from the modified patches and I'll help
> > > re-test. Some minor feedback on the Kconfig change, but I'll post it
> > > off of that patch.
> > >
> >
> > When you say 'virtualized' do you mean QEMU on x86? Or actual
> > virtualization on an AArch64 KVM host?
>
> aarch64 guest on x86_64 host. If you have additional configurations
> that are important to you, additional testing help would be
> appreciated.
>

Could you run this on an actual phone? Or does Android already ship
with this stuff?


> >
> > The distinction is important here, given the potential impact of LTO
> > on things that QEMU simply does not model when it runs in TCG mode on
> > a foreign host architecture.
>
> --
> Thanks,
> ~Nick Desaulniers

2020-11-20 23:56:30

by Nick Desaulniers

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <[email protected]> wrote:
>
> On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <[email protected]> wrote:
> >
> > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <[email protected]> wrote:
> > >
> > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
> > > >
> > > > Thanks for continuing to drive this series Sami. For the series,
> > > >
> > > > Tested-by: Nick Desaulniers <[email protected]>
> > > >
> > > > I did virtualized boot tests with the series applied to aarch64
> > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > > with CONFIG_THINLTO. If you make changes to the series in follow ups,
> > > > please drop my tested by tag from the modified patches and I'll help
> > > > re-test. Some minor feedback on the Kconfig change, but I'll post it
> > > > off of that patch.
> > > >
> > >
> > > When you say 'virtualized' do you mean QEMU on x86? Or actual
> > > virtualization on an AArch64 KVM host?
> >
> > aarch64 guest on x86_64 host. If you have additional configurations
> > that are important to you, additional testing help would be
> > appreciated.
> >
>
> Could you run this on an actual phone? Or does Android already ship
> with this stuff?

If by "this" you mean "the LTO series", it has been shipping on
Android phones for years now; I think it's even required in the latest
release.

If you mean "the LTO series + mainline" on a phone, well, there's the
android-mainline branch of https://android.googlesource.com/kernel/common/,
from which this series was recently removed in order to facilitate
rebasing Android's patches on ToT-mainline until the series lands
upstream. Bit of a chicken-and-egg problem there.

If you mean "the LTO series + mainline + KVM" on a phone, I don't know
the precise state of aarch64 KVM and Android (Will or Marc would
know). We did experiment recently with RockPIs for aarch64 KVM, IIRC;
I think Android is tricky as it still requires A64+A32/T32 chipsets,
and Alistair would know more. It might be interesting to boot a
virtualized (or paravirtualized?) guest built with LTO on a host built
with LTO, but I don't know if we have tried that yet (I think we did
try LTO guests of Android kernels, but IIRC they were on the stock
RockPI host BSP image).

> > > The distinction is important here, given the potential impact of LTO
> > > on things that QEMU simply does not model when it runs in TCG mode on
> > > a foreign host architecture.

--
Thanks,
~Nick Desaulniers

2020-11-21 00:01:26

by Kees Cook

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 12:58:41PM -0800, Sami Tolvanen wrote:
> On Fri, Nov 20, 2020 at 12:43 PM Kees Cook <[email protected]> wrote:
> >
> > On Fri, Nov 20, 2020 at 01:29:35PM -0700, Nathan Chancellor wrote:
> > > On Fri, Nov 20, 2020 at 11:47:21AM -0800, Kees Cook wrote:
> > > > On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> > > > > Changing the ThinLTO config to a choice and moving it after the main
> > > > > LTO config sounds like a good idea to me. I'll see if I can change
> > > > > this in v8. Thanks!
> > > >
> > > > Originally, I thought this might be a bit ugly once GCC LTO is added,
> > > > but this could be just a choice like we've done for the stack
> > > > initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
> > > > CLANG_THIN, and in the future GCC, etc.
> > >
> > > Having two separate choices might be a little bit cleaner though? One
> > > for the compiler (LTO_CLANG versus LTO_GCC) and one for the type
> > > (THINLTO versus FULLLTO). The type one could just have a "depends on
> > > CC_IS_CLANG" to ensure it only showed up when needed.
> >
> > Right, that's how the stack init choice works. Kconfigs that aren't
> > supported by the compiler won't be shown. I.e. after Sami's future
> > patch, the only choice for GCC will be CONFIG_LTO_NONE. But building
> > under Clang, it would offer CONFIG_LTO_NONE, CONFIG_LTO_CLANG_FULL,
> > CONFIG_LTO_CLANG_THIN, or something.
> >
> > (and I assume CONFIG_LTO would be def_bool y, depends on !LTO_NONE)
>
> I'm fine with adding ThinLTO as another option to the LTO choice, but
> it would duplicate the dependencies and a lot of the help text. I
> suppose we could add another config for the dependencies and have both
> LTO options depend on that instead.

How about something like this? This separates the arch support, compiler
support, and user choice into three separate Kconfig areas, which I
think should work.


diff --git a/Makefile b/Makefile
index e397c4caec1b..af902718e882 100644
--- a/Makefile
+++ b/Makefile
@@ -897,7 +897,7 @@ export CC_FLAGS_SCS
endif

ifdef CONFIG_LTO_CLANG
-ifdef CONFIG_THINLTO
+ifdef CONFIG_LTO_CLANG_THIN
CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
else
diff --git a/arch/Kconfig b/arch/Kconfig
index cdd29b5fdb56..5c22e10e4c12 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -600,6 +600,14 @@ config SHADOW_CALL_STACK

config LTO
bool
+ help
+ Selected if the kernel will be built using the compiler's LTO feature.
+
+config LTO_CLANG
+ bool
+ select LTO
+ help
+ Selected if the kernel will be built using Clang's LTO feature.

config ARCH_SUPPORTS_LTO_CLANG
bool
@@ -609,28 +617,25 @@ config ARCH_SUPPORTS_LTO_CLANG
- compiling inline assembly with Clang's integrated assembler,
- and linking with LLD.

-config ARCH_SUPPORTS_THINLTO
+config ARCH_SUPPORTS_LTO_CLANG_THIN
bool
help
- An architecture should select this option if it supports Clang's
- ThinLTO.
+ An architecture should select this option if it supports Clang's
+ ThinLTO mode.

-config THINLTO
- bool "Clang ThinLTO"
- depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO
- default y
+config HAS_LTO_CLANG
+ def_bool y
+ # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
+ depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
+ depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
+ depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
+ depends on ARCH_SUPPORTS_LTO_CLANG
+ depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
+ depends on !KASAN
+ depends on !GCOV_KERNEL
help
- This option enables Clang's ThinLTO, which allows for parallel
- optimization and faster incremental compiles. More information
- can be found from Clang's documentation:
-
- https://clang.llvm.org/docs/ThinLTO.html
-
- If you say N here, the compiler will use full LTO, which may
- produce faster code, but building the kernel will be significantly
- slower as the linker won't efficiently utilize multiple threads.
-
- If unsure, say Y.
+ The compiler and Kconfig options support building with Clang's
+ LTO.

choice
prompt "Link Time Optimization (LTO)"
@@ -644,20 +649,14 @@ choice

config LTO_NONE
bool "None"
+ help
+ Build the kernel normally, without Link Time Optimization (LTO).

-config LTO_CLANG
- bool "Clang's Link Time Optimization (EXPERIMENTAL)"
- # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
- depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
- depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
- depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
- depends on ARCH_SUPPORTS_LTO_CLANG
- depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
- depends on !KASAN
- depends on !GCOV_KERNEL
- select LTO
+config LTO_CLANG_FULL
+ bool "Clang Full LTO (EXPERIMENTAL)"
+ select LTO_CLANG
help
- This option enables Clang's Link Time Optimization (LTO), which
+ This option enables Clang's full Link Time Optimization (LTO), which
allows the compiler to optimize the kernel globally. If you enable
this option, the compiler generates LLVM bitcode instead of ELF
object files, and the actual compilation from bitcode happens at
@@ -667,9 +666,22 @@ config LTO_CLANG

https://llvm.org/docs/LinkTimeOptimization.html

- To select this option, you also need to use LLVM tools to handle
- the bitcode by passing LLVM=1 to make.
+ During link time, this option can use a large amount of RAM, and
+ may take much longer than the ThinLTO option.

+config LTO_CLANG_THIN
+ bool "Clang ThinLTO (EXPERIMENTAL)"
+ depends on ARCH_SUPPORTS_LTO_CLANG_THIN
+ select LTO_CLANG
+ help
+ This option enables Clang's ThinLTO, which allows for parallel
+ optimization and faster incremental compiles compared to the
+ CONFIG_LTO_CLANG_FULL option. More information can be found
+ from Clang's documentation:
+
+ https://clang.llvm.org/docs/ThinLTO.html
+
+ If unsure, say Y.
endchoice

config CFI_CLANG
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8bf763307544..f39df315316e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -74,7 +74,7 @@ config ARM64
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
select ARCH_SUPPORTS_LTO_CLANG
- select ARCH_SUPPORTS_THINLTO
+ select ARCH_SUPPORTS_LTO_CLANG_THIN
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && (GCC_VERSION >= 50000 || CC_IS_CLANG)
select ARCH_SUPPORTS_NUMA_BALANCING
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cb4c77a9b5ab..f99a4d3b55ae 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -93,7 +93,7 @@ config X86
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
select ARCH_SUPPORTS_LTO_CLANG if X86_64
- select ARCH_SUPPORTS_THINLTO if X86_64
+ select ARCH_SUPPORTS_LTO_CLANG_THIN if X86_64
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 3106636375c0..96505113b907 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -161,7 +161,7 @@ static unsigned long kallsyms_sym_address(int idx)
return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
}

-#if defined(CONFIG_CFI_CLANG) && defined(CONFIG_THINLTO)
+#if defined(CONFIG_CFI_CLANG) && defined(CONFIG_LTO_CLANG_THIN)
/*
* LLVM appends a hash to static function names when ThinLTO and CFI are
* both enabled, which causes confusion and potentially breaks user space

--
Kees Cook

2020-11-21 01:50:43

by Sami Tolvanen

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 3:59 PM Kees Cook <[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 12:58:41PM -0800, Sami Tolvanen wrote:
> > On Fri, Nov 20, 2020 at 12:43 PM Kees Cook <[email protected]> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 01:29:35PM -0700, Nathan Chancellor wrote:
> > > > On Fri, Nov 20, 2020 at 11:47:21AM -0800, Kees Cook wrote:
> > > > > On Fri, Nov 20, 2020 at 08:23:11AM -0800, Sami Tolvanen wrote:
> > > > > > Changing the ThinLTO config to a choice and moving it after the main
> > > > > > LTO config sounds like a good idea to me. I'll see if I can change
> > > > > > this in v8. Thanks!
> > > > >
> > > > > Originally, I thought this might be a bit ugly once GCC LTO is added,
> > > > > but this could be just a choice like we've done for the stack
> > > > > initialization. Something like an "LTO" choice of NONE, CLANG_FULL,
> > > > > CLANG_THIN, and in the future GCC, etc.
> > > >
> > > > Having two separate choices might be a little bit cleaner though? One
> > > > for the compiler (LTO_CLANG versus LTO_GCC) and one for the type
> > > > (THINLTO versus FULLLTO). The type one could just have a "depends on
> > > > CC_IS_CLANG" to ensure it only showed up when needed.
> > >
> > > Right, that's how the stack init choice works. Kconfigs that aren't
> > > supported by the compiler won't be shown. I.e. after Sami's future
> > > patch, the only choice for GCC will be CONFIG_LTO_NONE. But building
> > > under Clang, it would offer CONFIG_LTO_NONE, CONFIG_LTO_CLANG_FULL,
> > > CONFIG_LTO_CLANG_THIN, or something.
> > >
> > > (and I assume CONFIG_LTO would be def_bool y, depends on !LTO_NONE)
> >
> > I'm fine with adding ThinLTO as another option to the LTO choice, but
> > it would duplicate the dependencies and a lot of the help text. I
> > suppose we could add another config for the dependencies and have both
> > LTO options depend on that instead.
>
> How about something like this? This separates the arch support, compiler
> support, and user choice into three separate Kconfig areas, which I
> think should work.

Sure, this looks good to me, I'll use this in v8. The only minor
concern I have is that ThinLTO cannot be set as the default LTO mode,
but I assume anyone who selects LTO is also capable of deciding which
mode is better for them.

> diff --git a/Makefile b/Makefile
> index e397c4caec1b..af902718e882 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -897,7 +897,7 @@ export CC_FLAGS_SCS
> endif
>
> ifdef CONFIG_LTO_CLANG
> -ifdef CONFIG_THINLTO
> +ifdef CONFIG_LTO_CLANG_THIN
> CC_FLAGS_LTO += -flto=thin -fsplit-lto-unit
> KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod-prefix).thinlto-cache
> else
> diff --git a/arch/Kconfig b/arch/Kconfig
> index cdd29b5fdb56..5c22e10e4c12 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -600,6 +600,14 @@ config SHADOW_CALL_STACK
>
> config LTO
> bool
> + help
> + Selected if the kernel will be built using the compiler's LTO feature.
> +
> +config LTO_CLANG
> + bool
> + select LTO
> + help
> + Selected if the kernel will be built using Clang's LTO feature.
>
> config ARCH_SUPPORTS_LTO_CLANG
> bool
> @@ -609,28 +617,25 @@ config ARCH_SUPPORTS_LTO_CLANG
> - compiling inline assembly with Clang's integrated assembler,
> - and linking with LLD.
>
> -config ARCH_SUPPORTS_THINLTO
> +config ARCH_SUPPORTS_LTO_CLANG_THIN
> bool
> help
> - An architecture should select this option if it supports Clang's
> - ThinLTO.
> + An architecture should select this option if it supports Clang's
> + ThinLTO mode.
>
> -config THINLTO
> - bool "Clang ThinLTO"
> - depends on LTO_CLANG && ARCH_SUPPORTS_THINLTO
> - default y
> +config HAS_LTO_CLANG
> + def_bool y
> + # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
> + depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
> + depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
> + depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
> + depends on ARCH_SUPPORTS_LTO_CLANG
> + depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
> + depends on !KASAN
> + depends on !GCOV_KERNEL
> help
> - This option enables Clang's ThinLTO, which allows for parallel
> - optimization and faster incremental compiles. More information
> - can be found from Clang's documentation:
> -
> - https://clang.llvm.org/docs/ThinLTO.html
> -
> - If you say N here, the compiler will use full LTO, which may
> - produce faster code, but building the kernel will be significantly
> - slower as the linker won't efficiently utilize multiple threads.
> -
> - If unsure, say Y.
> + The compiler and Kconfig options support building with Clang's
> + LTO.
>
> choice
> prompt "Link Time Optimization (LTO)"
> @@ -644,20 +649,14 @@ choice
>
> config LTO_NONE
> bool "None"
> + help
> + Build the kernel normally, without Link Time Optimization (LTO).
>
> -config LTO_CLANG
> - bool "Clang's Link Time Optimization (EXPERIMENTAL)"
> - # Clang >= 11: https://github.com/ClangBuiltLinux/linux/issues/510
> - depends on CC_IS_CLANG && CLANG_VERSION >= 110000 && LD_IS_LLD
> - depends on $(success,$(NM) --help | head -n 1 | grep -qi llvm)
> - depends on $(success,$(AR) --help | head -n 1 | grep -qi llvm)
> - depends on ARCH_SUPPORTS_LTO_CLANG
> - depends on !FTRACE_MCOUNT_USE_RECORDMCOUNT
> - depends on !KASAN
> - depends on !GCOV_KERNEL
> - select LTO
> +config LTO_CLANG_FULL
> + bool "Clang Full LTO (EXPERIMENTAL)"
> + select LTO_CLANG
> help
> - This option enables Clang's Link Time Optimization (LTO), which
> + This option enables Clang's full Link Time Optimization (LTO), which
> allows the compiler to optimize the kernel globally. If you enable
> this option, the compiler generates LLVM bitcode instead of ELF
> object files, and the actual compilation from bitcode happens at
> @@ -667,9 +666,22 @@ config LTO_CLANG
>
> https://llvm.org/docs/LinkTimeOptimization.html
>
> - To select this option, you also need to use LLVM tools to handle
> - the bitcode by passing LLVM=1 to make.
> + During link time, this option can use a large amount of RAM, and
> + may take much longer than the ThinLTO option.
>
> +config LTO_CLANG_THIN
> + bool "Clang ThinLTO (EXPERIMENTAL)"
> + depends on ARCH_SUPPORTS_LTO_CLANG_THIN
> + select LTO_CLANG
> + help
> + This option enables Clang's ThinLTO, which allows for parallel
> + optimization and faster incremental compiles compared to the
> + CONFIG_LTO_CLANG_FULL option. More information can be found
> + from Clang's documentation:
> +
> + https://clang.llvm.org/docs/ThinLTO.html
> +
> + If unsure, say Y.
> endchoice

The two LTO_CLANG_* options need to depend on HAS_LTO_CLANG, of course.
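
In other words, something along these lines (a sketch of the intended
fix-up against the patch quoted above, with the help texts left out;
the final v8 wording may differ):

```kconfig
# Both choice entries are gated on the detected toolchain/arch support.
config LTO_CLANG_FULL
	bool "Clang Full LTO (EXPERIMENTAL)"
	depends on HAS_LTO_CLANG
	select LTO_CLANG

config LTO_CLANG_THIN
	bool "Clang ThinLTO (EXPERIMENTAL)"
	depends on HAS_LTO_CLANG && ARCH_SUPPORTS_LTO_CLANG_THIN
	select LTO_CLANG
```

Without the HAS_LTO_CLANG dependency, both entries would be offered
even when the compiler, linker, or conflicting options (KASAN,
GCOV_KERNEL, recordmcount) rule LTO out.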

Sami

2020-11-21 03:17:51

by Nathan Chancellor

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Fri, Nov 20, 2020 at 11:29:51AM +0100, Ard Biesheuvel wrote:
> On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
> >
> > On Wed, Nov 18, 2020 at 2:07 PM Sami Tolvanen <[email protected]> wrote:
> > >
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > > [2] https://lore.kernel.org/lkml/20201114004911.aip52eimk6c2uxd4@treble/
> > >
> > > You can also pull this series from
> > >
> > > https://github.com/samitolvanen/linux.git lto-v7
> >
> > Thanks for continuing to drive this series Sami. For the series,
> >
> > Tested-by: Nick Desaulniers <[email protected]>
> >
> > I did virtualized boot tests with the series applied to aarch64
> > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > with CONFIG_THINLTO. If you make changes to the series in follow ups,
> > please drop my tested by tag from the modified patches and I'll help
> > re-test. Some minor feedback on the Kconfig change, but I'll post it
> > off of that patch.
> >
>
> When you say 'virtualized' do you mean QEMU on x86? Or actual
> virtualization on an AArch64 KVM host?
>
> The distinction is important here, given the potential impact of LTO
> on things that QEMU simply does not model when it runs in TCG mode on
> a foreign host architecture.

I have booted this series on my Raspberry Pi 4 (ARCH=arm64 defconfig).

$ uname -r
5.10.0-rc4-00108-g830200082c74

$ zgrep LTO /proc/config.gz
CONFIG_LTO=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_THINLTO=y
CONFIG_THINLTO=y
# CONFIG_LTO_NONE is not set
CONFIG_LTO_CLANG=y
# CONFIG_HID_WALTOP is not set

and I have taken that same kernel and booted it under QEMU with
'-enable-kvm' without any visible issues.

I have tested four combinations:

clang 12 @ f9f0a4046e11c2b4c130640f343e3b2b5db08c1:
* CONFIG_THINLTO=y
* CONFIG_THINLTO=n

clang 11.0.0
* CONFIG_THINLTO=y
* CONFIG_THINLTO=n

Tested-by: Nathan Chancellor <[email protected]>

Cheers,
Nathan

2020-11-21 07:39:43

by Ard Biesheuvel

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Sat, 21 Nov 2020 at 00:53, Nick Desaulniers <[email protected]> wrote:
>
> On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <[email protected]> wrote:
> >
> > On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers <[email protected]> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <[email protected]> wrote:
> > > >
> > > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
> > > > >
> > > > > Thanks for continuing to drive this series Sami. For the series,
> > > > >
> > > > > Tested-by: Nick Desaulniers <[email protected]>
> > > > >
> > > > > I did virtualized boot tests with the series applied to aarch64
> > > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
> > > > > with CONFIG_THINLTO. If you make changes to the series in follow ups,
> > > > > please drop my tested by tag from the modified patches and I'll help
> > > > > re-test. Some minor feedback on the Kconfig change, but I'll post it
> > > > > off of that patch.
> > > > >
> > > >
> > > > When you say 'virtualized' do you mean QEMU on x86? Or actual
> > > > virtualization on an AArch64 KVM host?
> > >
> > > aarch64 guest on x86_64 host. If you have additional configurations
> > > that are important to you, additional testing help would be
> > > appreciated.
> > >
> >
> > Could you run this on an actual phone? Or does Android already ship
> > with this stuff?
>
> By `this`, if you mean "the LTO series", it has been shipping on
> Android phones for years now, I think it's even required in the latest
> release.
>
> If you mean "the LTO series + mainline" on a phone, well there's the
> android-mainline of https://android.googlesource.com/kernel/common/,
> in which this series was recently removed in order to facilitate
> rebasing Android's patches on ToT-mainline until getting the series
> landed upstream. Bit of a chicken and the egg problem there.
>
> If you mean "the LTO series + mainline + KVM" on a phone; I don't know
> the precise state of aarch64 KVM and Android (Will or Marc would
> know). We did experiment recently with RockPI's for aarch64 KVM, IIRC;
> I think Android is tricky as it still requires A64+A32/T32 chipsets,
> Alistair would know more. Might be interesting to boot a virtualized
> (or paravirtualized?) guest built with LTO in a host built with LTO
> for sure, but I don't know if we have tried that yet (I think we did
> try LTO guests of android kernels, but I think they were on the stock
> RockPI host BSP image IIRC).
>

I don't think testing under KVM gives us more confidence or coverage
than testing on bare metal. I was just pointing out that 'virtualized'
is misleading, and if you test things under QEMU/x86 + TCG, it is
better to be clear about this, and refer to it as 'under emulation'.

2020-11-21 11:47:28

by Marc Zyngier

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On 2020-11-20 23:53, Nick Desaulniers wrote:
> On Fri, Nov 20, 2020 at 3:30 PM Ard Biesheuvel <[email protected]> wrote:
>>
>> On Fri, 20 Nov 2020 at 21:19, Nick Desaulniers
>> <[email protected]> wrote:
>> >
>> > On Fri, Nov 20, 2020 at 2:30 AM Ard Biesheuvel <[email protected]> wrote:
>> > >
>> > > On Thu, 19 Nov 2020 at 00:42, Nick Desaulniers <[email protected]> wrote:
>> > > >
>> > > > Thanks for continuing to drive this series Sami. For the series,
>> > > >
>> > > > Tested-by: Nick Desaulniers <[email protected]>
>> > > >
>> > > > I did virtualized boot tests with the series applied to aarch64
>> > > > defconfig without CONFIG_LTO, with CONFIG_LTO_CLANG, and a third time
>> > > > with CONFIG_THINLTO. If you make changes to the series in follow ups,
>> > > > please drop my tested by tag from the modified patches and I'll help
>> > > > re-test. Some minor feedback on the Kconfig change, but I'll post it
>> > > > off of that patch.
>> > > >
>> > >
>> > > When you say 'virtualized' do you mean QEMU on x86? Or actual
>> > > virtualization on an AArch64 KVM host?
>> >
>> > aarch64 guest on x86_64 host. If you have additional configurations
>> > that are important to you, additional testing help would be
>> > appreciated.
>> >
>>
>> Could you run this on an actual phone? Or does Android already ship
>> with this stuff?
>
> By `this`, if you mean "the LTO series", it has been shipping on
> Android phones for years now, I think it's even required in the latest
> release.
>
> If you mean "the LTO series + mainline" on a phone, well there's the
> android-mainline of https://android.googlesource.com/kernel/common/,
> in which this series was recently removed in order to facilitate
> rebasing Android's patches on ToT-mainline until getting the series
> landed upstream. Bit of a chicken and the egg problem there.
>
> If you mean "the LTO series + mainline + KVM" on a phone; I don't know
> the precise state of aarch64 KVM and Android (Will or Marc would
> know).

If you are lucky enough to have an Android system booting at EL2,
KVM should just work [1], though I haven't tried with this series.

> We did experiment recently with RockPI's for aarch64 KVM, IIRC;
> I think Android is tricky as it still requires A64+A32/T32 chipsets,

Which is about 100% of the Android systems at the moment (I don't think
any of the asymmetric SoCs are in the wild yet). It doesn't really affect
KVM anyway.

M.

[1] with the broken firmware gotchas that I believed to be eradicated
8 years ago, but are still prevalent in the Android world: laughable
PSCI implementation, invalid CNTFRQ_EL0...
--
Who you jivin' with that Cosmik Debris?

2020-11-21 20:14:19

by Kees Cook

Subject: Re: [PATCH v7 02/17] kbuild: add support for Clang LTO

On Fri, Nov 20, 2020 at 05:46:44PM -0800, Sami Tolvanen wrote:
> Sure, this looks good to me, I'll use this in v8. The only minor
> concern I have is that ThinLTO cannot be set as the default LTO mode,
> but I assume anyone who selects LTO is also capable of deciding which
> mode is better for them.

It could be re-arranged similar to what you had before, but like:

config LTO
    bool "..."
    depends on HAS_LTO
    help
      ...

choice
    prompt "LTO mode" if LTO
    default LTO_GCC if HAS_LTO_GCC
    default LTO_CLANG_THIN if HAS_LTO_CLANG
    default LTO_CLANG_FULL
    help
      ...

    config LTO_CLANG_THIN
        ...

    config LTO_CLANG_FULL
endchoice

Then the LTO is top-level yes/no, but depends on detected capabilities,
and the mode is visible if LTO is chosen, etc.

I'm not really sure which is better...
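
For readers less familiar with Kconfig: when several `default` lines are
given, the first one whose `if` condition holds takes effect. The ordering
in the sketch above can be modelled roughly as follows. This is a toy
Python model, not Kconfig itself; pick_default and the symbol dictionary
are hypothetical names used only for illustration, and the model ignores
whether the choice members themselves are visible.

```python
def pick_default(defaults, symbols):
    """Return the first default whose condition holds (listing order matters)."""
    for name, condition in defaults:
        # A condition of None stands for an unconditional "default X" line.
        if condition is None or symbols.get(condition, False):
            return name
    return None


# Defaults in the order they appear in the choice sketched above.
LTO_MODE_DEFAULTS = [
    ("LTO_GCC", "HAS_LTO_GCC"),
    ("LTO_CLANG_THIN", "HAS_LTO_CLANG"),
    ("LTO_CLANG_FULL", None),
]

# With only Clang LTO detected, ThinLTO becomes the default mode.
print(pick_default(LTO_MODE_DEFAULTS, {"HAS_LTO_CLANG": True}))
```

Under these assumptions ThinLTO becomes the default whenever Clang LTO is
detected and GCC LTO is not, which addresses the concern quoted above
about ThinLTO not being selectable as the default mode.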

> > +config LTO_CLANG_THIN
> > + bool "Clang ThinLTO (EXPERIMENTAL)"
> > + depends on ARCH_SUPPORTS_LTO_CLANG_THIN
> > + select LTO_CLANG
> > + help
> > + This option enables Clang's ThinLTO, which allows for parallel
> > + optimization and faster incremental compiles compared to the
> > + CONFIG_LTO_CLANG_FULL option. More information can be found
> > + from Clang's documentation:
> > +
> > + https://clang.llvm.org/docs/ThinLTO.html
> > +
> > + If unsure, say Y.
> > endchoice
>
> The two LTO_CLANG_* options need to depend on HAS_LTO_CLANG, of course.

Whoops, yes. Thanks for catching that. :)

--
Kees Cook

2020-11-23 10:24:52

by David Brazdil

Subject: Re: [PATCH v7 15/17] KVM: arm64: disable LTO for the nVHE directory

Hey Sami,

On Wed, Nov 18, 2020 at 02:07:29PM -0800, Sami Tolvanen wrote:
> We use objcopy to manipulate ELF binaries for the nVHE code,
> which fails with LTO as the compiler produces LLVM bitcode
> instead. Disable LTO for this code to allow objcopy to be used.

We now partially link the nVHE code (generating machine code) before objcopy,
so I think you should be able to drop this patch now. Tried building your
branch without it, ran a couple of unit tests and all seems fine.

David

2020-11-23 18:36:26

by Sami Tolvanen

Subject: Re: [PATCH v7 15/17] KVM: arm64: disable LTO for the nVHE directory

On Mon, Nov 23, 2020 at 2:21 AM David Brazdil <[email protected]> wrote:
>
> Hey Sami,
>
> On Wed, Nov 18, 2020 at 02:07:29PM -0800, Sami Tolvanen wrote:
> > We use objcopy to manipulate ELF binaries for the nVHE code,
> > which fails with LTO as the compiler produces LLVM bitcode
> > instead. Disable LTO for this code to allow objcopy to be used.
>
> We now partially link the nVHE code (generating machine code) before objcopy,
> so I think you should be able to drop this patch now. Tried building your
> branch without it, ran a couple of unit tests and all seems fine.

Great, thanks for testing this, David! I'll drop this patch from v8.

Sami

2020-11-30 11:56:46

by Will Deacon

Subject: Re: [PATCH v7 14/17] arm64: vdso: disable LTO

On Wed, Nov 18, 2020 at 02:07:28PM -0800, Sami Tolvanen wrote:
> Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no
> point in using link-time optimization for the small about of C code.

"about" => "amount" ?

>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> ---
> arch/arm64/kernel/vdso/Makefile | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

With the typo fixed:

Acked-by: Will Deacon <[email protected]>

Will

2020-11-30 12:02:41

by Will Deacon

Subject: Re: [PATCH v7 16/17] arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS

On Wed, Nov 18, 2020 at 02:07:30PM -0800, Sami Tolvanen wrote:
> DYNAMIC_FTRACE_WITH_REGS uses -fpatchable-function-entry, which makes
> running recordmcount unnecessary as there are no mcount calls in object
> files, and __mcount_loc doesn't need to be generated.
>
> While there's normally no harm in running recordmcount even when it's
> not strictly needed, this won't work with LTO as we have LLVM bitcode
> instead of ELF objects.
>
> This change selects FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY, which
> disables recordmcount when patchable function entries are used instead.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> ---
> arch/arm64/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1515f6f153a0..c7f07978f5b6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -158,6 +158,8 @@ config ARM64
> select HAVE_DYNAMIC_FTRACE
> select HAVE_DYNAMIC_FTRACE_WITH_REGS \
> if $(cc-option,-fpatchable-function-entry=2)
> + select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY \
> + if DYNAMIC_FTRACE_WITH_REGS

I don't really understand why this is in the arch header file, rather
than have the core code check for "fpatchable-function-entry=2" and expose
a CC_HAS_PATCHABLE_FUNCTION_ENTRY, but in the interest of making some
progress on this series:

Acked-by: Will Deacon <[email protected]>

Will

2020-11-30 12:04:12

by Will Deacon

Subject: Re: [PATCH v7 17/17] arm64: allow LTO_CLANG and THINLTO to be selected

On Wed, Nov 18, 2020 at 02:07:31PM -0800, Sami Tolvanen wrote:
> Allow CONFIG_LTO_CLANG and CONFIG_THINLTO to be enabled.
>
> Signed-off-by: Sami Tolvanen <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> ---
> arch/arm64/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c7f07978f5b6..56bd83a764f4 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -73,6 +73,8 @@ config ARM64
> select ARCH_USE_SYM_ANNOTATIONS
> select ARCH_SUPPORTS_MEMORY_FAILURE
> select ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK
> + select ARCH_SUPPORTS_LTO_CLANG
> + select ARCH_SUPPORTS_THINLTO

Acked-by: Will Deacon <[email protected]>

Will

2020-11-30 12:04:53

by Will Deacon

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

Hi Sami,

On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> This patch series adds support for building the kernel with Clang's
> Link Time Optimization (LTO). In addition to performance, the primary
> motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> be used in the kernel. Google has shipped millions of Pixel devices
> running three major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM bitcode,
> which Clang produces with LTO instead of ELF object files, postponing
> ELF processing until a later stage, and ensuring initcall ordering.
>
> Note that v7 brings back arm64 support as Will has now staged the
> prerequisite memory ordering patches [1], and drops x86_64 while we work
> on fixing the remaining objtool warnings [2].

Sounds like you're going to post a v8, but what's the plan for merging
that? The arm64 parts look pretty good to me now.

Will

2020-11-30 23:49:58

by Sami Tolvanen

Subject: Re: [PATCH v7 14/17] arm64: vdso: disable LTO

On Mon, Nov 30, 2020 at 3:52 AM Will Deacon <[email protected]> wrote:
>
> On Wed, Nov 18, 2020 at 02:07:28PM -0800, Sami Tolvanen wrote:
> > Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no
> > point in using link-time optimization for the small about of C code.
>
> "about" => "amount" ?

Oops, I'll fix that in v8. Thanks!

Sami

2020-12-01 19:54:57

by Nick Desaulniers

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Tue, Dec 1, 2020 at 9:31 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > Hi Sami,
> >
> > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> >
> > Sounds like you're going to post a v8, but what's the plan for merging
> > that? The arm64 parts look pretty good to me now.
>
> I haven't seen Masahiro comment on this in a while, so given the review
> history and its use (for years now) in Android, I will carry v8 (assuming
> all is fine with it) in -next unless there are objections.

I had some minor stylistic feedback on the Kconfig changes; I'm happy
for you to land the bulk of the changes and then I follow up with
patches to the Kconfig after.
--
Thanks,
~Nick Desaulniers

2020-12-01 21:42:26

by Sami Tolvanen

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Tue, Dec 1, 2020 at 11:51 AM 'Nick Desaulniers' via Clang Built
Linux <[email protected]> wrote:
>
> On Tue, Dec 1, 2020 at 9:31 AM Kees Cook <[email protected]> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
>
> I had some minor stylistic feedback on the Kconfig changes; I'm happy
> for you to land the bulk of the changes and then I follow up with
> patches to the Kconfig after.

These are included in v8, which I just sent out.

Sami

2020-12-01 22:39:48

by Kees Cook

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> Hi Sami,
>
> On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > be used in the kernel. Google has shipped millions of Pixel devices
> > running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM bitcode,
> > which Clang produces with LTO instead of ELF object files, postponing
> > ELF processing until a later stage, and ensuring initcall ordering.
> >
> > Note that v7 brings back arm64 support as Will has now staged the
> > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > on fixing the remaining objtool warnings [2].
>
> Sounds like you're going to post a v8, but what's the plan for merging
> that? The arm64 parts look pretty good to me now.

I haven't seen Masahiro comment on this in a while, so given the review
history and its use (for years now) in Android, I will carry v8 (assuming
all is fine with it) in -next unless there are objections.

--
Kees Cook

2020-12-02 02:46:35

by Masahiro Yamada

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <[email protected]> wrote:
>
> On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > Hi Sami,
> >
> > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > This patch series adds support for building the kernel with Clang's
> > > Link Time Optimization (LTO). In addition to performance, the primary
> > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > be used in the kernel. Google has shipped millions of Pixel devices
> > > running three major kernel versions with LTO+CFI since 2018.
> > >
> > > Most of the patches are build system changes for handling LLVM bitcode,
> > > which Clang produces with LTO instead of ELF object files, postponing
> > > ELF processing until a later stage, and ensuring initcall ordering.
> > >
> > > Note that v7 brings back arm64 support as Will has now staged the
> > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > on fixing the remaining objtool warnings [2].
> >
> > Sounds like you're going to post a v8, but what's the plan for merging
> > that? The arm64 parts look pretty good to me now.
>
> I haven't seen Masahiro comment on this in a while, so given the review
> history and its use (for years now) in Android, I will carry v8 (assuming
> all is fine with it) in -next unless there are objections.


What I dislike about this implementation is
it cannot drop any unreachable function/data.
(and it is completely different from GCC LTO)

This is not real LTO.




> --
> Kees Cook



--
Best Regards
Masahiro Yamada

2020-12-02 05:50:22

by Sami Tolvanen

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Tue, Dec 1, 2020 at 6:43 PM Masahiro Yamada <[email protected]> wrote:
>
> On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <[email protected]> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
>
>
> What I dislike about this implementation is
> it cannot drop any unreachable function/data.
> (and it is completely different from GCC LTO)
>
> This is not real LTO.

I'm not sure I understand your concern. LTO cannot drop functions or
data from vmlinux.o that may be referenced externally. However, with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, the linker certainly can drop
unused functions and data when linking vmlinux, and there's no reason
this option can't be used together with LTO. In fact, Pixel 3 does
enable this option, but in our experience, there isn't much unused
code or data to remove, so later devices no longer use it.
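
The distinction drawn above can be pictured as a reachability walk over
symbol references. What follows is a toy Python model under stated
assumptions, not what the linker or LTO actually implements: the symbol
names, the reference graph and the live_symbols helper are all
hypothetical. It only illustrates why an intermediate object such as
vmlinux.o has to keep every symbol that may be referenced externally,
while a final link with section garbage collection can start from far
fewer roots.

```python
def live_symbols(refs, roots):
    """Return the set of symbols reachable from the given roots."""
    live, stack = set(), list(roots)
    while stack:
        sym = stack.pop()
        if sym in live:
            continue
        live.add(sym)
        # Follow every reference made by this symbol.
        stack.extend(refs.get(sym, ()))
    return live


# Hypothetical reference graph: symbol -> symbols it references.
REFS = {
    "start_kernel": ["init_a"],
    "init_a": ["helper"],
    "helper": [],
    "unused_export": ["unused_helper"],
    "unused_helper": [],
}
ALL = set(REFS)

# Final link with garbage collection: only the entry point is a root.
final_link = live_symbols(REFS, ["start_kernel"])

# Intermediate object: every global symbol may still be referenced later,
# so each one has to be treated as a root.
intermediate = live_symbols(REFS, ["start_kernel", "unused_export"])

print(sorted(ALL - final_link))    # symbols the final link could drop
print(sorted(ALL - intermediate))  # symbols the intermediate step could drop
```

In this model the intermediate step can drop nothing, whereas the final
link can discard the unused export and its helper, which mirrors the
split between LTO on vmlinux.o and --gc-sections on vmlinux described
above.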

There's technically no reason why we couldn't postpone LTO until we
link vmlinux instead, and thus allow the linker to possibly remove
more unused code without the help of --gc-sections, but at least with
the current build process, that would involve performing the slow LTO
link step multiple times, which isn't worth it when we can get the
performance benefits (and CFI) already when linking vmlinux.o with
LTO.

Sami

2020-12-02 18:59:20

by Kees Cook

Subject: Re: [PATCH v7 00/17] Add support for Clang LTO

On Wed, Dec 02, 2020 at 11:42:21AM +0900, Masahiro Yamada wrote:
> On Wed, Dec 2, 2020 at 2:31 AM Kees Cook <[email protected]> wrote:
> >
> > On Mon, Nov 30, 2020 at 12:01:31PM +0000, Will Deacon wrote:
> > > Hi Sami,
> > >
> > > On Wed, Nov 18, 2020 at 02:07:14PM -0800, Sami Tolvanen wrote:
> > > > This patch series adds support for building the kernel with Clang's
> > > > Link Time Optimization (LTO). In addition to performance, the primary
> > > > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI) to
> > > > be used in the kernel. Google has shipped millions of Pixel devices
> > > > running three major kernel versions with LTO+CFI since 2018.
> > > >
> > > > Most of the patches are build system changes for handling LLVM bitcode,
> > > > which Clang produces with LTO instead of ELF object files, postponing
> > > > ELF processing until a later stage, and ensuring initcall ordering.
> > > >
> > > > Note that v7 brings back arm64 support as Will has now staged the
> > > > prerequisite memory ordering patches [1], and drops x86_64 while we work
> > > > on fixing the remaining objtool warnings [2].
> > >
> > > Sounds like you're going to post a v8, but what's the plan for merging
> > > that? The arm64 parts look pretty good to me now.
> >
> > I haven't seen Masahiro comment on this in a while, so given the review
> > history and its use (for years now) in Android, I will carry v8 (assuming
> > all is fine with it) in -next unless there are objections.
>
>
> What I dislike about this implementation is
> it cannot drop any unreachable function/data.
> (and it is completely different from GCC LTO)

This seems to be an orthogonal concern: the kernel doesn't have GCC LTO
support either (though much of Sami's work is required for GCC LTO too).

> This is not real LTO.

I don't know what you're defining as "real LTO", but this is, very much,
Link Time Optimization: the compiler has access to the entire code at
once, and it is therefore in a position to perform many manipulations to
the code. As Sami mentioned, perhaps you're thinking specifically of
dead code elimination? That's a specific optimization.

> [thread[1] merging]
> This help document is misleading.
> People who read the document would misunderstand how great this feature would be.
>
> This should be added in the commit log and Kconfig help:
>
> In contrast to the example in the documentation, Clang LTO
> for the kernel cannot remove any unreachable function or data.
> In fact, this results in even bigger vmlinux and modules.

Which LTO passes are happening, how optimizations are being performed,
etc, are endlessly tunable, but we can't work on that tuning without
the infrastructure to perform an LTO build in the first place. We need
to land the support, and go from there. As written, it works very well
for arm64 (which is what v8 targets specifically) and the results have
been running on millions of Android phones for years now. If further
tuning needs to happen for other architectures, config combinations, etc,
those can and will be developed. (For example, x86 is around the corner,
once some false positive warnings from objtool get hammered out, etc.)

I still want this in -next so we can build on it and improve it -- it
has been stuck in limbo for too long.

-Kees

[1] https://lore.kernel.org/kernel-hardening/CAK7LNASMh1KysAB4+gU7_iuTW+5GT2_yMDevwpLwx0iqjxwmWw@mail.gmail.com/

--
Kees Cook