2022-11-14 11:48:13

by Jiri Slaby

Subject: [PATCH 00/46] gcc-LTO support for the kernel

Hi,

this is the first call for comments (and kbuild complaints) for this
support of gcc (full) LTO in the kernel. Most of the patches come from
Andi. Martin and I rebased them onto new kernels and fixed the known
issues so that they are usable. I also updated most of the commit logs
and reordered the patches into groups of patches with similar intent.

The very first patch comes from Alexander and is already pending in an
x86 queue (I believe). I am attaching it only for completeness.
Without it, the kernel does not boot (LTO reorders a lot).

In our measurements, the performance differences are negligible.

The kernel is bigger with gcc LTO due to more inlining. A possible next
step is to look at non-static functions: as we export everything, the
compiler cannot actually drop anything (esp. functions that were inlined
and are no longer needed).

Cc: Alexander Potapenko <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Alexey Makhalov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: H.J. Lu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jan Hubicka <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Joe Lawrence <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Martin Liska <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Richard Biener <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: VMware PV-Drivers Reviewers <[email protected]>
Cc: Yonghong Song <[email protected]>

Alexander Lobakin (1):
x86/boot: robustify calling startup_{32,64}() from the decompressor code

Andi Kleen (36):
Compiler Attributes, lto: introduce __noreorder
tracepoint, lto: Mark static call functions as __visible
static_call, lto: Mark static keys as __visible
static_call, lto: Mark static_call_return0() as __visible
static_call, lto: Mark func_a() as __visible_on_lto
x86/alternative, lto: Mark int3_*() as global and __visible
x86/paravirt, lto: Mark native_steal_clock() as __visible_on_lto
x86/preempt, lto: Mark preempt_schedule_*thunk() as __visible
x86/xen, lto: Mark xen_vcpu_stolen() as __visible
x86, lto: Mark gdt_page and native_sched_clock() as __visible
amd, lto: Mark amd pmu and pstate functions as __visible_on_lto
entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible
export, lto: Mark __kstrtab* in EXPORT_SYMBOL() as global and __visible
softirq, lto: Mark irq_enter/exit_rcu() as __visible
btf, lto: Make all BTF IDs global on LTO
init.h, lto: mark initcalls as __noreorder
bpf, lto: mark interpreter jump table as __noreorder
sched, lto: mark sched classes as __noreorder
linkage, lto: use C version for SYSCALL_ALIAS() / cond_syscall()
scripts, lto: re-add gcc-ld
scripts, lto: use CONFIG_LTO for many LTO specific actions
Kbuild, lto: Add Link Time Optimization support
x86/purgatory, lto: Disable gcc LTO for purgatory
x86/realmode, lto: Disable gcc LTO for real mode code
x86/vdso, lto: Disable gcc LTO for the vdso
scripts, lto: disable gcc LTO for some mod sources
Kbuild, lto: disable gcc LTO for bounds+asm-offsets
lib/string, lto: disable gcc LTO for string.o
Compiler attributes, lto: disable __flatten with LTO
Kbuild, lto: don't include weak source file symbols in System.map
x86, lto: Disable relative init pointers with gcc LTO
x86/livepatch, lto: Disable live patching with gcc LTO
x86/lib, lto: Mark 32bit mem{cpy,move,set} as __used
scripts, lto: check C symbols for modversions
scripts/bloat-o-meter, lto: handle gcc LTO
x86, lto: Finally enable gcc LTO for x86

Jiri Slaby (5):
kbuild: pass jobserver to cmd_ld_vmlinux.o
compiler.h: introduce __visible_on_lto
compiler.h: introduce __global_on_lto
btf, lto: pass scope as strings
x86/apic, lto: Mark apic_driver*() as __noreorder

Martin Liska (4):
kbuild: lto: preserve MAKEFLAGS for module linking
x86/sev, lto: Mark cpuid_table_copy as __visible_on_lto
mm/kasan, lto: Mark kasan mem{cpy,move,set} as __used
kasan, lto: remove extra BUILD_BUG() in memory_is_poisoned

Documentation/kbuild/index.rst | 2 +
Documentation/kbuild/lto-build.rst | 76 +++++++++++++++++++++++++++++
Kbuild | 3 ++
Makefile | 6 ++-
arch/Kconfig | 52 ++++++++++++++++++++
arch/x86/Kconfig | 5 +-
arch/x86/boot/compressed/head_32.S | 2 +-
arch/x86/boot/compressed/head_64.S | 2 +-
arch/x86/boot/compressed/misc.c | 16 +++---
arch/x86/entry/vdso/Makefile | 2 +
arch/x86/events/amd/core.c | 2 +-
arch/x86/include/asm/apic.h | 4 +-
arch/x86/include/asm/preempt.h | 4 +-
arch/x86/kernel/alternative.c | 5 +-
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/paravirt.c | 2 +-
arch/x86/kernel/sev-shared.c | 2 +-
arch/x86/kernel/tsc.c | 2 +-
arch/x86/lib/memcpy_32.c | 6 +--
arch/x86/purgatory/Makefile | 2 +
arch/x86/realmode/Makefile | 1 +
drivers/cpufreq/amd-pstate.c | 15 +++---
drivers/xen/time.c | 2 +-
include/asm-generic/vmlinux.lds.h | 2 +-
include/linux/btf_ids.h | 24 ++++-----
include/linux/compiler.h | 8 +++
include/linux/compiler_attributes.h | 15 ++++++
include/linux/export.h | 6 ++-
include/linux/init.h | 2 +-
include/linux/linkage.h | 16 +++---
include/linux/static_call.h | 12 ++---
include/linux/tracepoint.h | 4 +-
kernel/bpf/core.c | 2 +-
kernel/entry/common.c | 2 +-
kernel/kallsyms.c | 2 +-
kernel/livepatch/Kconfig | 1 +
kernel/sched/sched.h | 1 +
kernel/softirq.c | 4 +-
kernel/static_call.c | 2 +-
kernel/static_call_inline.c | 6 +--
kernel/time/posix-stubs.c | 19 +++++++-
lib/Makefile | 2 +
mm/kasan/generic.c | 2 +-
mm/kasan/shadow.c | 6 +--
scripts/Makefile.build | 17 ++++---
scripts/Makefile.lib | 2 +-
scripts/Makefile.lto | 43 ++++++++++++++++
scripts/Makefile.modfinal | 2 +-
scripts/Makefile.vmlinux | 3 +-
scripts/Makefile.vmlinux_o | 6 +--
scripts/bloat-o-meter | 2 +-
scripts/gcc-ld | 40 +++++++++++++++
scripts/link-vmlinux.sh | 9 ++--
scripts/mksysmap | 2 +
scripts/mod/Makefile | 3 ++
scripts/module.lds.S | 2 +-
56 files changed, 384 insertions(+), 100 deletions(-)
create mode 100644 Documentation/kbuild/lto-build.rst
create mode 100644 scripts/Makefile.lto
create mode 100755 scripts/gcc-ld

--
2.38.1



2022-11-14 11:48:21

by Jiri Slaby

Subject: [PATCH 09/46] static_call, lto: Mark static_call_return0() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark static_call_return0() as __visible.
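As a rough user-space illustration of the failure mode, the sketch below has a C function whose only reference lives inside a top-level asm statement, which the compiler does not parse. It assumes an ELF target with GNU as syntax, 8-byte pointers and no leading underscore on C symbols (e.g. x86-64 or AArch64 Linux). The names `answer_fn`, `answer_ptr` and `VISIBLE` are hypothetical; `VISIBLE` only stands in for __visible by using GCC's externally_visible attribute where the compiler supports it.

```c
/* Hypothetical stand-in for the kernel's __visible: GCC's
 * externally_visible attribute keeps the symbol global even when
 * whole-program (LTO) analysis would otherwise localize it. */
#if defined(__has_attribute)
#  if __has_attribute(externally_visible)
#    define VISIBLE __attribute__((externally_visible))
#  endif
#endif
#ifndef VISIBLE
#  define VISIBLE
#endif

/* The compiler sees no C caller of this function; the only user is
 * the asm statement below. */
VISIBLE long answer_fn(void)
{
	return 42;
}

/* Top-level asm that emits a data word holding answer_fn's address.
 * Assumes ELF, GNU as syntax and 8-byte pointers (.quad). */
__asm__(".pushsection .data\n"
	".balign 8\n"
	".globl answer_ptr\n"
	"answer_ptr:\n"
	".quad answer_fn\n"
	".popsection\n");

/* C-side declaration of the pointer defined in asm above. */
extern long (*answer_ptr)(void);
```

Calling answer_ptr() should return 42. If answer_fn were static, the compiler would be free to drop or rename it because nothing it can see uses it, and the asm reference would then fail to resolve; that is the situation the patch guards against under LTO.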

Cc: Peter Zijlstra <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/static_call.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/static_call.c b/kernel/static_call.c
index e9c3e69f3837..9197fe86d8bd 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/static_call.h>

-long __static_call_return0(void)
+__visible long __static_call_return0(void)
{
return 0;
}
--
2.38.1


2022-11-14 11:48:25

by Jiri Slaby

Subject: [PATCH 08/46] static_call, lto: Mark static keys as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark static call symbols as __visible, namely the static call keys here.

Cc: Peter Zijlstra <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/static_call.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index df53bed9d71f..e629ab0c4ca3 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -182,7 +182,7 @@ extern long __static_call_return0(void);

#define DEFINE_STATIC_CALL(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = _func, \
.type = 1, \
}; \
@@ -190,7 +190,7 @@ extern long __static_call_return0(void);

#define DEFINE_STATIC_CALL_NULL(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = NULL, \
.type = 1, \
}; \
@@ -198,7 +198,7 @@ extern long __static_call_return0(void);

#define DEFINE_STATIC_CALL_RET0(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = __static_call_return0, \
.type = 1, \
}; \
@@ -227,14 +227,14 @@ static inline int static_call_init(void) { return 0; }

#define DEFINE_STATIC_CALL(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = _func, \
}; \
ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func)

#define DEFINE_STATIC_CALL_NULL(name, _func) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = NULL, \
}; \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
@@ -288,7 +288,7 @@ static inline long __static_call_return0(void)

#define __DEFINE_STATIC_CALL(name, _func, _func_init) \
DECLARE_STATIC_CALL(name, _func); \
- struct static_call_key STATIC_CALL_KEY(name) = { \
+ __visible struct static_call_key STATIC_CALL_KEY(name) = { \
.func = _func_init, \
}

--
2.38.1


2022-11-14 11:49:09

by Jiri Slaby

Subject: [PATCH 11/46] x86/alternative, lto: Mark int3_*() as global and __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark int3_magic() and int3_selftest_ip() as global and __visible.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/alternative.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5cadcea035e0..05e5eb9cbd51 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -823,11 +823,12 @@ extern struct paravirt_patch_site __start_parainstructions[],
* convention such that we can 'call' it from assembly.
*/

-extern void int3_magic(unsigned int *ptr); /* defined in asm */
+extern __visible void int3_magic(unsigned int *ptr); /* defined in asm */

asm (
" .pushsection .init.text, \"ax\", @progbits\n"
" .type int3_magic, @function\n"
+" .globl int3_magic\n"
"int3_magic:\n"
ANNOTATE_NOENDBR
" movl $1, (%" _ASM_ARG1 ")\n"
@@ -836,7 +837,7 @@ asm (
" .popsection\n"
);

-extern void int3_selftest_ip(void); /* defined in asm below */
+extern __visible void int3_selftest_ip(void); /* defined in asm below */

static int __init
int3_exception_notify(struct notifier_block *self, unsigned long val, void *data)
--
2.38.1


2022-11-14 11:49:10

by Jiri Slaby

Subject: [PATCH 12/46] x86/paravirt, lto: Mark native_steal_clock() as __visible_on_lto

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark native_steal_clock() as __visible_on_lto.

[js] use __visible_on_lto
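One plausible shape for such a switch is sketched below, under the assumption that __visible_on_lto keeps a function static in non-LTO builds (as the replaced `static` suggests) and makes it a visible global only with gcc LTO. The macro `visible_on_lto`, the configuration stand-in `SKETCH_GCC_LTO` and `native_steal_clock_sketch()` are hypothetical and are not the kernel's actual definitions.

```c
/* Hypothetical sketch; the real __visible_on_lto may differ.
 * SKETCH_GCC_LTO stands in for the kernel's gcc LTO config option. */
#ifdef SKETCH_GCC_LTO
#  define visible_on_lto __attribute__((externally_visible))
#else
#  define visible_on_lto static
#endif

typedef unsigned long long u64;

/* Without LTO the function stays file-local, as before the patch;
 * with LTO it becomes a visible global that asm can reference. */
visible_on_lto u64 native_steal_clock_sketch(int cpu)
{
	(void)cpu;
	return 0;
}
```

native_steal_clock_sketch(0) returns 0 in either configuration; only the linkage of the symbol differs.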

Cc: Juergen Gross <[email protected]>
Cc: "Srivatsa S. Bhat
Cc: Alexey Makhalov <[email protected]>
Cc: VMware PV-Drivers Reviewers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/paravirt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 7ca2d46c08cc..27a537cd4b0e 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -120,7 +120,7 @@ unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
struct static_key paravirt_steal_enabled;
struct static_key paravirt_steal_rq_enabled;

-static u64 native_steal_clock(int cpu)
+__visible_on_lto u64 native_steal_clock(int cpu)
{
return 0;
}
--
2.38.1


2022-11-14 11:57:42

by Jiri Slaby

Subject: [PATCH 28/46] scripts, lto: re-add gcc-ld

From: Andi Kleen <[email protected]>

The primary goal of the script is to mangle linker command line arguments
into something which gcc understands. Such as converting "-z now" into
"-Wl,-z,now".

The script was removed by commit 86879fd277e8 (scripts: remove obsolete
gcc-ld script) as there was no user in the kernel. It had been added a
long time ago to support exactly these LTO patches, so we need to add it
back now.

Compared to the removed version, it has been improved a bit:
* some missing linker and gcc command line arguments were added, and
* when V=1 is specified, it prints the final gcc command line.
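The core idea of the wrapper can also be sketched in C. This is a hypothetical, simplified translation covering only a handful of the options the shell script handles: options that take a separate value are fused into a single -Wl,OPT,VALUE argument, a few options gcc understands itself are passed through, non-option arguments (input files) are kept, and any other option is prefixed with -Wl,. `mangle_ld_args()` and the option tables are illustrative names only.

```c
#include <stdio.h>
#include <string.h>

/* ld options that consume the next argument (small illustrative subset). */
static const char *const opts_with_value[] = {
	"-z", "-m", "-T", "--script", "--defsym", "-Map", "-soname", NULL
};

/* Options that the gcc driver understands and receives unchanged. */
static const char *const passthrough[] = {
	"-r", "-o", "-nostdlib", "-m32", "-m64", NULL
};

static int in_list(const char *arg, const char *const *list)
{
	for (; *list; list++)
		if (strcmp(arg, *list) == 0)
			return 1;
	return 0;
}

/* Write the gcc-style, space-separated form of argv[0..argc) into buf. */
static void mangle_ld_args(int argc, const char *const *argv,
			   char *buf, size_t len)
{
	buf[0] = '\0';
	for (int i = 0; i < argc; i++) {
		char piece[256];
		const char *a = argv[i];

		if (in_list(a, opts_with_value) && i + 1 < argc) {
			/* "-z now" -> "-Wl,-z,now": fuse option and value. */
			snprintf(piece, sizeof(piece), "-Wl,%s,%s", a, argv[i + 1]);
			i++;
		} else if (in_list(a, passthrough) || a[0] != '-') {
			/* gcc understands it, or it is an input file. */
			snprintf(piece, sizeof(piece), "%s", a);
		} else {
			/* Any other linker option is forwarded via -Wl,. */
			snprintf(piece, sizeof(piece), "-Wl,%s", a);
		}
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, piece, len - strlen(buf) - 1);
	}
}
```

For example, the argument vector {"-z", "now", "-o", "vmlinux.o", "--start-group", "a.o", "--end-group", "-r"} becomes "-Wl,-z,now -o vmlinux.o -Wl,--start-group a.o -Wl,--end-group -r".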

[js] rebase + commit message massage

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/gcc-ld | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
create mode 100755 scripts/gcc-ld

diff --git a/scripts/gcc-ld b/scripts/gcc-ld
new file mode 100755
index 000000000000..13e85ece8d04
--- /dev/null
+++ b/scripts/gcc-ld
@@ -0,0 +1,40 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# run gcc with ld options
+# used as a wrapper to execute link time optimizations
+# yes virginia, this is not pretty
+
+ARGS="-nostdlib"
+
+for j in "$@" ; do
+ if [ "$j" = -v ] ; then
+ exec `$CC -print-prog-name=ld` -v
+ fi
+done
+
+while [ "$1" != "" ] ; do
+ case "$1" in
+ -save-temps*|-m32|-m64) N="$1" ;;
+ -r) N="$1" ;;
+ -flinker-output*) N="$1" ;;
+ -[Wg]*) N="$1" ;;
+ -[olv]|-[Ofd]*|-nostdlib) N="$1" ;;
+ --end-group|--start-group|--whole-archive|--no-whole-archive|\
+--no-undefined|--hash-style*|--build-id*|--eh-frame-hdr|-Bsymbolic)
+ N="-Wl,$1" ;;
+ -[RTFGhIezcbyYu]*|\
+--script|--defsym|-init|-Map|--oformat|-rpath|\
+-rpath-link|--sort-section|--section-start|-Tbss|-Tdata|-Ttext|-soname|\
+--version-script|--dynamic-list|--version-exports-symbol|--wrap|-m|-z)
+ A="$1" ; shift ; N="-Wl,$A,$1" ;;
+ -[m]*) N="$1" ;;
+ -*) N="-Wl,$1" ;;
+ *) N="$1" ;;
+ esac
+ ARGS="$ARGS $N"
+ shift
+done
+
+[ -n "$V" ] && echo >&2 $CC $ARGS
+
+exec $CC $ARGS
--
2.38.1


2022-11-14 11:58:07

by Jiri Slaby

Subject: [PATCH 36/46] lib/string, lto: disable gcc LTO for string.o

From: Andi Kleen <[email protected]>

gcc can generate calls to string functions implicitly and assumes that
they exist in a non-LTOed copy. Mark string.o as LTO disabled to avoid
missing symbols at link time.
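A minimal example of such an implicit call: GCC's documentation notes that even a freestanding environment must provide memcpy, memmove, memset and memcmp, because the compiler may emit calls to them. The structure copy below contains no call in the source, yet GCC may compile it into a memcpy() call, depending on the target, the size and the options. `struct big_blob` and `copy_blob()` are hypothetical names.

```c
#include <string.h> /* declares memcpy(), which the assignment below may call */

/* Large enough that compilers commonly lower a plain assignment to a
 * library call instead of inline moves (threshold is target-specific). */
struct big_blob {
	unsigned char bytes[4096];
};

/* No memcpy() in the source, but the generated code may call it, so a
 * memcpy definition has to be available at link time. */
void copy_blob(struct big_blob *dst, const struct big_blob *src)
{
	*dst = *src;
}
```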

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
lib/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index 59bd7c2f793a..bf72b58de5c8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -27,6 +27,8 @@ KASAN_SANITIZE_string.o := n
CFLAGS_string.o += -fno-stack-protector
endif

+CFLAGS_string.o += $(DISABLE_LTO_GCC)
+
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o timerqueue.o xarray.o \
maple_tree.o idr.o extable.o irq_regs.o argv_split.o \
--
2.38.1


2022-11-14 11:58:32

by Jiri Slaby

Subject: [PATCH 26/46] x86/apic, lto: Mark apic_driver*() as __noreorder

From: Jiri Slaby <[email protected]>

The apic code assumes that the apic drivers are in a particular order in
memory. gcc LTO can violate this. So add __noreorder to apic_driver()
and apic_drivers() to avoid a boot BUG().

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/include/asm/apic.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3415321c8240..9c5c69482ab0 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -363,12 +363,12 @@ extern struct apic *apic;
* to enforce the order with in them.
*/
#define apic_driver(sym) \
- static const struct apic *__apicdrivers_##sym __used \
+ static const struct apic *__apicdrivers_##sym __used __noreorder \
__aligned(sizeof(struct apic *)) \
__section(".apicdrivers") = { &sym }

#define apic_drivers(sym1, sym2) \
- static struct apic *__apicdrivers_##sym1##sym2[2] __used \
+ static struct apic *__apicdrivers_##sym1##sym2[2] __used __noreorder \
__aligned(sizeof(struct apic *)) \
__section(".apicdrivers") = { &sym1, &sym2 }

--
2.38.1


2022-11-14 11:59:36

by Jiri Slaby

Subject: [PATCH 45/46] kasan, lto: remove extra BUILD_BUG() in memory_is_poisoned

From: Martin Liska <[email protected]>

memory_is_poisoned() can be called with any size, and LTO can later
propagate a constant size into it that is not handled in the switch,
which then triggers BUILD_BUG(). So just break out of the switch and
call memory_is_poisoned_n(), which handles arbitrary sizes, to avoid
build errors with gcc LTO.
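The control flow after this change can be illustrated with a toy model. It keeps one flag byte per address in a 64-byte region instead of KASAN's real shadow encoding, and only the switch shape mirrors memory_is_poisoned(). `toy_shadow`, `toy_poisoned_n()` and `toy_is_poisoned()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy shadow: one flag per byte of a 64-byte region, nonzero means
 * poisoned. Real KASAN uses a compressed shadow encoding. */
static unsigned char toy_shadow[64];

/* Generic path: works for any size (caller keeps addr + size <= 64). */
static bool toy_poisoned_n(unsigned long addr, size_t size)
{
	for (size_t i = 0; i < size; i++)
		if (toy_shadow[addr + i])
			return true;
	return false;
}

/* Constant sizes with a dedicated case return early; every other size,
 * including an odd constant that LTO propagates in, breaks out of the
 * switch and reaches the generic check instead of a BUILD_BUG(). */
static inline bool toy_is_poisoned(unsigned long addr, size_t size)
{
	if (__builtin_constant_p(size)) {
		switch (size) {
		case 1:
			return toy_shadow[addr] != 0;
		case 2:
		case 4:
		case 8:
		case 16:
			return toy_poisoned_n(addr, size);
		default:
			break;
		}
	}
	return toy_poisoned_n(addr, size);
}
```

After poisoning byte 10, toy_is_poisoned(8, 3) reports true: size 3 has no case of its own, so it falls through to the generic check.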

Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
mm/kasan/generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index d8b5590f9484..d261f83c6687 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -152,7 +152,7 @@ static __always_inline bool memory_is_poisoned(unsigned long addr, size_t size)
case 16:
return memory_is_poisoned_16(addr);
default:
- BUILD_BUG();
+ break;
}
}

--
2.38.1


2022-11-14 12:03:09

by Jiri Slaby

Subject: [PATCH 31/46] x86/purgatory, lto: Disable gcc LTO for purgatory

From: Andi Kleen <[email protected]>

There are various issues with gcc LTO in the purgatory code, so disable
LTO here for now.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/purgatory/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index 17f09dc26381..c00dc09d6fe4 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -60,6 +60,8 @@ ifdef CONFIG_CFI_CLANG
PURGATORY_CFLAGS_REMOVE += $(CC_FLAGS_CFI)
endif

+PURGATORY_CFLAGS_REMOVE += $(CC_FLAGS_LTO)
+
CFLAGS_REMOVE_purgatory.o += $(PURGATORY_CFLAGS_REMOVE)
CFLAGS_purgatory.o += $(PURGATORY_CFLAGS)

--
2.38.1


2022-11-14 12:03:28

by Jiri Slaby

Subject: [PATCH 42/46] mm/kasan, lto: Mark kasan mem{cpy,move,set} as __used

From: Martin Liska <[email protected]>

gcc doesn't always recognize that memcpy/memset/memmove called through
__builtins are referenced, because the reference appears too late, in
the RTL expansion phase. This can make LTO drop them, leading to
undefined symbols. Mark them as __used to avoid that.

Cc: Andrey Ryabinin <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Konovalov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
mm/kasan/shadow.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 0e3648b603a6..94c98feea9c8 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -39,7 +39,7 @@ bool __kasan_check_write(const volatile void *p, unsigned int size)
EXPORT_SYMBOL(__kasan_check_write);

#undef memset
-void *memset(void *addr, int c, size_t len)
+__used void *memset(void *addr, int c, size_t len)
{
if (!kasan_check_range((unsigned long)addr, len, true, _RET_IP_))
return NULL;
@@ -49,7 +49,7 @@ void *memset(void *addr, int c, size_t len)

#ifdef __HAVE_ARCH_MEMMOVE
#undef memmove
-void *memmove(void *dest, const void *src, size_t len)
+__used void *memmove(void *dest, const void *src, size_t len)
{
if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
!kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
@@ -60,7 +60,7 @@ void *memmove(void *dest, const void *src, size_t len)
#endif

#undef memcpy
-void *memcpy(void *dest, const void *src, size_t len)
+__used void *memcpy(void *dest, const void *src, size_t len)
{
if (!kasan_check_range((unsigned long)src, len, false, _RET_IP_) ||
!kasan_check_range((unsigned long)dest, len, true, _RET_IP_))
--
2.38.1


2022-11-14 12:03:59

by Jiri Slaby

Subject: [PATCH 23/46] init.h, lto: mark initcalls as __noreorder

From: Andi Kleen <[email protected]>

The kernel does not tolerate reordering of initcalls between files, as
several initcalls depend on each other. LTO is allowed to reorder them as
it wishes, so -fno-toplevel-reorder previously had to be used to prevent
boot failures. Now we can use __noreorder per symbol, so mark the
initcall entries as such.
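The ordering concern can be reproduced in user space with a section-based table similar to the initcall one. This sketch assumes GCC or Clang and a GNU ld or lld style linker, which define __start_SECNAME and __stop_SECNAME for sections whose names are valid C identifiers. It further assumes that __noreorder corresponds to GCC's no_reorder attribute, which is not shown in this part of the series. `toy_initcall()`, the `toy_initcalls` section and the init_* functions are hypothetical.

```c
/* Use GCC's no_reorder attribute when the compiler knows it. */
#if defined(__has_attribute)
#  if __has_attribute(no_reorder)
#    define NOREORDER __attribute__((no_reorder))
#  endif
#endif
#ifndef NOREORDER
#  define NOREORDER
#endif

typedef int (*toy_initcall_t)(void);

/* Hypothetical analogue of ____define_initcall(): place a pointer in a
 * dedicated section. The table order is the order in which the compiler
 * emits these variables, which no_reorder pins to source order. */
#define toy_initcall(fn)						\
	static toy_initcall_t toy_initcall_ptr_##fn			\
	__attribute__((used, section("toy_initcalls"),			\
		       aligned(sizeof(void *)))) NOREORDER = fn

static int calls;
static int order[3]; /* records which init function ran at each step */

static int init_a(void) { order[calls++] = 1; return 0; }
static int init_b(void) { order[calls++] = 2; return 0; }
static int init_c(void) { order[calls++] = 3; return 0; }

toy_initcall(init_a);
toy_initcall(init_b);
toy_initcall(init_c);

/* Provided by the linker for the toy_initcalls section. */
extern toy_initcall_t __start_toy_initcalls[], __stop_toy_initcalls[];

/* Walk the table in section order, like do_initcalls() in spirit. */
static int run_toy_initcalls(void)
{
	int ret = 0;

	for (toy_initcall_t *p = __start_toy_initcalls;
	     p < __stop_toy_initcalls; p++)
		ret |= (*p)();
	return ret;
}
```

With no_reorder, GCC should emit the three pointers in source order, so init_a, init_b and init_c run in that sequence; without it, the compiler is free to emit them in another order.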

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/init.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/init.h b/include/linux/init.h
index 077d7f93b402..ca827e2fb0da 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -246,7 +246,7 @@ extern bool initcall_debug;
static_assert(__same_type(initcall_t, &fn));
#else
#define ____define_initcall(fn, __unused, __name, __sec) \
- static initcall_t __name __used \
+ static initcall_t __name __used __noreorder \
__attribute__((__section__(__sec))) = fn;
#endif

--
2.38.1


2022-11-14 12:08:42

by Jiri Slaby

Subject: [PATCH 10/46] static_call, lto: Mark func_a() as __visible_on_lto

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark func_a(), which was static, as __visible_on_lto and rename it
to sc_func_a().

[js] use __visible_on_lto

Cc: Peter Zijlstra <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/static_call_inline.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
index dc5665b62814..6933b4437597 100644
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -501,7 +501,7 @@ early_initcall(static_call_init);

#ifdef CONFIG_STATIC_CALL_SELFTEST

-static int func_a(int x)
+__visible_on_lto int sc_func_a(int x)
{
return x+1;
}
@@ -511,7 +511,7 @@ static int func_b(int x)
return x+2;
}

-DEFINE_STATIC_CALL(sc_selftest, func_a);
+DEFINE_STATIC_CALL(sc_selftest, sc_func_a);

static struct static_call_data {
int (*func)(int);
@@ -520,7 +520,7 @@ static struct static_call_data {
} static_call_data [] __initdata = {
{ NULL, 2, 3 },
{ func_b, 2, 4 },
- { func_a, 2, 3 }
+ { sc_func_a, 2, 3 }
};

static int __init test_static_call_init(void)
--
2.38.1


2022-11-14 12:08:52

by Jiri Slaby

Subject: [PATCH 19/46] export, lto: Mark __kstrtab* in EXPORT_SYMBOL() as global and __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler that
references them. Without this patch, this can lead to linker errors.

So mark __kstrtab_*[] and __kstrtabns_*[] symbols as global and
__visible.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/export.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/export.h b/include/linux/export.h
index 3f31ced0d977..3cb5f85327da 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -85,11 +85,13 @@ struct kernel_symbol {
*/
#define ___EXPORT_SYMBOL(sym, sec, ns) \
extern typeof(sym) sym; \
- extern const char __kstrtab_##sym[]; \
- extern const char __kstrtabns_##sym[]; \
+ extern const char __visible __kstrtab_##sym[]; \
+ extern const char __visible __kstrtabns_##sym[]; \
asm(" .section \"__ksymtab_strings\",\"aMS\",%progbits,1 \n" \
+ " .globl __kstrtab_" #sym " \n" \
"__kstrtab_" #sym ": \n" \
" .asciz \"" #sym "\" \n" \
+ " .globl __kstrtabns_" #sym " \n" \
"__kstrtabns_" #sym ": \n" \
" .asciz \"" ns "\" \n" \
" .previous \n"); \
--
2.38.1


2022-11-14 12:09:21

by Jiri Slaby

Subject: [PATCH 21/46] btf, lto: pass scope as strings

From: Jiri Slaby <[email protected]>

gcc LTO can move top-level assembler statements into other assembler
files, but the BTF ID code assumes that they stay in the same file. We
need to make all BTF IDs global to work around this.

This is a preparation for that, as we will pass __global_on_lto as the
scope. That is a macro that expands to either "globl" or "local",
depending on whether LTO is enabled.

That would not work without this patch, as scope is currently
stringified.
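The stringification problem is easy to see in isolation. In the sketch below, `global_on_lto` is a hypothetical stand-in for __global_on_lto, `SKETCH_GCC_LTO` is an assumed configuration macro, and the two SCOPE_DIRECTIVE_* macros mimic the old and new scope handling in __BTF_ID_LIST(). Because # stringifies its operand without macro-expanding it, only the new form picks up the expansion.

```c
#include <string.h>

/* Hypothetical stand-in for __global_on_lto. */
#ifdef SKETCH_GCC_LTO
#  define global_on_lto "globl"
#else
#  define global_on_lto "local"
#endif

/* Old style: the scope token is stringified with #. */
#define SCOPE_DIRECTIVE_OLD(name, scope) "." #scope " " #name
/* New style: scope is already a string literal and is used as is. */
#define SCOPE_DIRECTIVE_NEW(name, scope) "." scope " " #name

/* # suppresses expansion, giving ".global_on_lto my_ids". */
static const char old_form[] = SCOPE_DIRECTIVE_OLD(my_ids, global_on_lto);
/* The argument is expanded first, giving ".local my_ids"
 * (or ".globl my_ids" when SKETCH_GCC_LTO is defined). */
static const char new_form[] = SCOPE_DIRECTIVE_NEW(my_ids, global_on_lto);

/* Nonzero if the directive starts with "." followed by the given scope. */
static int has_scope(const char *directive, const char *scope)
{
	return directive[0] == '.' &&
	       strncmp(directive + 1, scope, strlen(scope)) == 0;
}
```

Without SKETCH_GCC_LTO, old_form is the invalid directive ".global_on_lto my_ids", while new_form is ".local my_ids".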

Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/btf_ids.h | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index 2aea877d644f..3011757a48ef 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -83,16 +83,16 @@ word \
#define __BTF_ID_LIST(name, scope) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
-"." #scope " " #name "; \n" \
+"." scope " " #name "; \n" \
#name ":; \n" \
".popsection; \n");

#define BTF_ID_LIST(name) \
-__BTF_ID_LIST(name, local) \
+__BTF_ID_LIST(name, "local") \
extern u32 name[];

#define BTF_ID_LIST_GLOBAL(name, n) \
-__BTF_ID_LIST(name, globl)
+__BTF_ID_LIST(name, "globl")

/* The BTF_ID_LIST_SINGLE macro defines a BTF_ID_LIST with
* a single entry.
@@ -142,18 +142,18 @@ asm( \
#define __BTF_SET_START(name, scope) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
-"." #scope " __BTF_ID__set__" #name "; \n" \
+"." scope " __BTF_ID__set__" #name "; \n" \
"__BTF_ID__set__" #name ":; \n" \
".zero 4 \n" \
".popsection; \n");

#define BTF_SET_START(name) \
-__BTF_ID_LIST(name, local) \
-__BTF_SET_START(name, local)
+__BTF_ID_LIST(name, "local") \
+__BTF_SET_START(name, "local")

#define BTF_SET_START_GLOBAL(name) \
-__BTF_ID_LIST(name, globl) \
-__BTF_SET_START(name, globl)
+__BTF_ID_LIST(name, "globl") \
+__BTF_SET_START(name, "globl")

#define BTF_SET_END(name) \
asm( \
@@ -186,14 +186,14 @@ extern struct btf_id_set name;
#define __BTF_SET8_START(name, scope) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
-"." #scope " __BTF_ID__set8__" #name "; \n" \
+"." scope " __BTF_ID__set8__" #name "; \n" \
"__BTF_ID__set8__" #name ":; \n" \
".zero 8 \n" \
".popsection; \n");

#define BTF_SET8_START(name) \
-__BTF_ID_LIST(name, local) \
-__BTF_SET8_START(name, local)
+__BTF_ID_LIST(name, "local") \
+__BTF_SET8_START(name, "local")

#define BTF_SET8_END(name) \
asm( \
--
2.38.1


2022-11-14 12:09:51

by Jiri Slaby

Subject: [PATCH 41/46] x86/lib, lto: Mark 32bit mem{cpy,move,set} as __used

From: Andi Kleen <[email protected]>

gcc does not always recognize that memcpy/memset/memmove are referenced
when they are called through __builtins, because the reference is only
created late, during RTL expansion. This can make LTO drop them, leading
to undefined symbols. Mark them as __used to avoid that.

This is only needed on 32bit; on 64bit they are implemented in assembler anyway.
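A minimal user-space sketch of the annotation, using the kernel's gcc spellings of the attributes (`__used` is `__attribute__((__used__))`, `__visible` is `__attribute__((__externally_visible__))`). The demo_* names are hypothetical stand-ins, chosen so the example does not clash with the C library's own memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Kernel spellings of the attributes (gcc). */
#define __used    __attribute__((__used__))
#define __visible __attribute__((__externally_visible__))

/* Hypothetical stand-in for the arch-specific __memcpy() helper. */
static void *demo_raw_memcpy(void *to, const void *from, size_t n)
{
	unsigned char *d = to;
	const unsigned char *s = from;

	while (n--)
		*d++ = *s++;
	return to;
}

/*
 * Calls to the real memcpy() may only be materialized late, during RTL
 * expansion of __builtin_memcpy(), after LTO has decided which symbols
 * are referenced. __used forces the definition to be emitted anyway,
 * and __visible keeps it a global, externally visible symbol.
 */
__used __visible void *demo_memcpy(void *to, const void *from, size_t n)
{
	return demo_raw_memcpy(to, from, n);
}
```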

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/lib/memcpy_32.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/lib/memcpy_32.c b/arch/x86/lib/memcpy_32.c
index ef3af7ff2c8a..53fa1cac79d1 100644
--- a/arch/x86/lib/memcpy_32.c
+++ b/arch/x86/lib/memcpy_32.c
@@ -6,19 +6,19 @@
#undef memset
#undef memmove

-__visible void *memcpy(void *to, const void *from, size_t n)
+__used __visible void *memcpy(void *to, const void *from, size_t n)
{
return __memcpy(to, from, n);
}
EXPORT_SYMBOL(memcpy);

-__visible void *memset(void *s, int c, size_t count)
+__used __visible void *memset(void *s, int c, size_t count)
{
return __memset(s, c, count);
}
EXPORT_SYMBOL(memset);

-__visible void *memmove(void *dest, const void *src, size_t n)
+__used __visible void *memmove(void *dest, const void *src, size_t n)
{
int d0,d1,d2,d3,d4,d5;
char *ret = dest;
--
2.38.1


2022-11-14 12:10:30

by Jiri Slaby

Subject: [PATCH 39/46] x86, lto: Disable relative init pointers with gcc LTO

From: Andi Kleen <[email protected]>

Relative init pointers are implemented using custom top-level assembler
that references the init function. With LTO, the top-level assembler
statement can end up in a different assembler file than the init
function, which then causes linker errors if the init function is static.

This could be fixed by making all the init functions global, but that
would be a very intrusive change all over the tree.

Instead, disable relative init pointers for gcc LTO.
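For context: with HAVE_ARCH_PREL32_RELOCATIONS, each initcall entry is a 32-bit offset from the entry to the init function instead of an absolute pointer, and the kernel turns it back into a pointer with offset_to_ptr(). In the kernel the offset is emitted by the top-level asm (`.long fn - .`), which is the part that fails when LTO places that asm in a different file from a static `fn`. The sketch below computes the same encoding at run time purely to illustrate it; all demo_* names are hypothetical, and it assumes the code and the table lie within +/-2 GiB of each other, as they do in a normal executable.

```c
#include <stdint.h>

typedef int (*demo_initcall_t)(void);

/* Store fn as a signed 32-bit offset relative to the slot itself. */
static int demo_set_rel(int32_t *slot, demo_initcall_t fn)
{
	intptr_t delta = (intptr_t)fn - (intptr_t)slot;

	if (delta < INT32_MIN || delta > INT32_MAX)
		return -1;	/* target not within +/-2 GiB of the slot */
	*slot = (int32_t)delta;
	return 0;
}

/* Turn the offset back into a pointer, in the spirit of offset_to_ptr(). */
static demo_initcall_t demo_offset_to_ptr(const int32_t *slot)
{
	return (demo_initcall_t)((intptr_t)slot + *slot);
}

static int demo_init_a(void) { return 1; }
static int demo_init_b(void) { return 2; }

/* On 64-bit, each entry is half the size of a full pointer. */
static int32_t demo_initcall_entries[2];
```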

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/Kconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..6455d843d559 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -176,7 +176,9 @@ config X86
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
- select HAVE_ARCH_PREL32_RELOCATIONS
+ # LTO can move assembler to different files, so all
+ # the init functions would need to be global for this to work
+ select HAVE_ARCH_PREL32_RELOCATIONS if !LTO_GCC
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_STACKLEAK
--
2.38.1


2022-11-14 12:12:02

by Jiri Slaby

Subject: [PATCH 16/46] x86, lto: Mark gdt_page and native_sched_clock() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible with gcc LTO, because
they can end up in a different object file than the assembler. Without
this patch, this can lead to linker errors.

So mark gdt_page and native_sched_clock() as __visible.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/tsc.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..5417a8fd7a45 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -201,7 +201,7 @@ static const struct cpu_dev default_cpu = {

static const struct cpu_dev *this_cpu = &default_cpu;

-DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
#ifdef CONFIG_X86_64
/*
* We need valid kernel segments for data and code in long mode too
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..df1589482662 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -215,7 +215,7 @@ static void __init cyc2ns_init_secondary_cpus(void)
/*
* Scheduler clock - returns current time in nanosec units.
*/
-u64 native_sched_clock(void)
+__visible u64 native_sched_clock(void)
{
if (static_branch_likely(&__use_tsc)) {
u64 tsc_now = rdtsc();
--
2.38.1


2022-11-14 12:12:22

by Jiri Slaby

Subject: [PATCH 35/46] Kbuild, lto: disable gcc LTO for bounds+asm-offsets

From: Andi Kleen <[email protected]>

Disable LTO when generating bounds.s and asm-offsets.s, which are
scanned for C constants. With an LTO build, these files would contain
the gcc IR in assembler form, which breaks the scanning scripts.
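For context, here is a simplified C re-creation of one step of that scan (the real work is done by a sed script in the filechk_offsets helper). DEFINE() in include/linux/kbuild.h emits a marker such as `.ascii "->NR_PAGEFLAGS $22 __NR_PAGEFLAGS"` into the generated assembly, and each marker is turned into a `#define`. With -flto these markers are not present in the .s file, so nothing would be found. The function name is hypothetical, and the `->#comment` form of marker is not handled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert one marker line of compiler asm output into
 * "#define SYM VALUE" with the original expression as a trailing C
 * comment. Returns 0 on success, -1 if the line is not a marker.
 */
static int demo_offsets_line_to_define(const char *line, char *out,
				       size_t outlen)
{
	static const char marker[] = ".ascii \"->";
	const char *p = strstr(line, marker);
	const char *v;
	char sym[64], val[64], expr[128];

	if (!p)
		return -1;
	p += strlen(marker);

	/* SYM, immediate value, then the expression up to the closing quote. */
	if (sscanf(p, "%63s %63s %127[^\"]", sym, val, expr) != 3)
		return -1;

	/* Drop the immediate prefix ('$' on x86, '#' on some other arches). */
	v = val;
	while (*v == '$' || *v == '#')
		v++;

	snprintf(out, outlen, "#define %s %s /* %s */", sym, v, expr);
	return 0;
}
```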

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
Kbuild | 3 +++
1 file changed, 3 insertions(+)

diff --git a/Kbuild b/Kbuild
index 464b34a08f51..40744d76d416 100644
--- a/Kbuild
+++ b/Kbuild
@@ -11,6 +11,8 @@ bounds-file := include/generated/bounds.h

targets := kernel/bounds.s

+kernel/bounds.s: KBUILD_CFLAGS += $(DISABLE_LTO_GCC)
+
$(bounds-file): kernel/bounds.s FORCE
$(call filechk,offsets,__LINUX_BOUNDS_H__)

@@ -30,6 +32,7 @@ offsets-file := include/generated/asm-offsets.h
targets += arch/$(SRCARCH)/kernel/asm-offsets.s

arch/$(SRCARCH)/kernel/asm-offsets.s: $(timeconst-file) $(bounds-file)
+arch/$(SRCARCH)/kernel/asm-offsets.s: KBUILD_CFLAGS += $(DISABLE_LTO_GCC)

$(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s FORCE
$(call filechk,offsets,__ASM_OFFSETS_H__)
--
2.38.1


2022-11-14 12:13:40

by Jiri Slaby

Subject: [PATCH 46/46] x86, lto: Finally enable gcc LTO for x86

From: Andi Kleen <[email protected]>

Now that everything is in place, allow gcc LTO for the x86 build.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6455d843d559..2c96facf4a42 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -112,6 +112,7 @@ config X86
select ARCH_USES_CFI_TRAPS if X86_64 && CFI_CLANG
select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_LTO_CLANG_THIN
+ select ARCH_SUPPORTS_LTO_GCC
select ARCH_USE_BUILTIN_BSWAP
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS
--
2.38.1


2022-11-14 12:15:13

by Jiri Slaby

Subject: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

From: Andi Kleen <[email protected]>

gcc 12 does not yet support live patching together with LTO, so enabling
both results in compiler "sorry" messages.

Other than the compiler support, there should be no barriers to
live patching LTOed kernels, although it might be more difficult to
create patches for larger functions.

Cc: Josh Poimboeuf <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Miroslav Benes <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Joe Lawrence <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/livepatch/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/livepatch/Kconfig b/kernel/livepatch/Kconfig
index 53d51ed619a3..22699adc39a6 100644
--- a/kernel/livepatch/Kconfig
+++ b/kernel/livepatch/Kconfig
@@ -12,6 +12,7 @@ config LIVEPATCH
depends on KALLSYMS_ALL
depends on HAVE_LIVEPATCH
depends on !TRIM_UNUSED_KSYMS
+ depends on !LTO_GCC # not supported in gcc
help
Say Y here if you want to support kernel live patching.
This option has no runtime impact until a kernel "patch"
--
2.38.1


2022-11-14 13:19:07

by Jiri Slaby

Subject: [PATCH 25/46] sched, lto: mark sched classes as __noreorder

From: Andi Kleen <[email protected]>

The scheduler code assumes that the scheduler classes are laid out in a
particular order in memory, and gcc LTO can violate this. Mark them
__noreorder to avoid a BUG() at boot.
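Why the order matters: the classes are separate objects that the linker script places back to back, and the scheduler walks them like an array and ranks them by address (at the time of this series, for example via sched_class_above() and for_class_range() in kernel/sched/sched.h). The hypothetical model below makes that dependency explicit: if the objects were laid out in a different order, demo_pick_class() would pick a different class. That __noreorder maps to gcc's no_reorder attribute is an assumption, not something shown in this excerpt.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct sched_class. */
struct demo_sched_class {
	const char *name;
	int (*has_runnable)(void);
};

static int demo_none(void) { return 0; }
static int demo_some(void) { return 1; }

/*
 * Models the layout the linker script builds from the per-class
 * sections: highest priority at the lowest address.
 */
static const struct demo_sched_class demo_classes[] = {
	{ "stop", demo_none },
	{ "dl",   demo_none },
	{ "rt",   demo_none },
	{ "fair", demo_some },
	{ "idle", demo_some },
};

#define DEMO_NR_CLASSES (sizeof(demo_classes) / sizeof(demo_classes[0]))
#define demo_highest    (&demo_classes[0])
#define demo_lowest     (&demo_classes[DEMO_NR_CLASSES]) /* one past the end */

/* Priority is decided purely by address. */
static int demo_class_above(const struct demo_sched_class *a,
			    const struct demo_sched_class *b)
{
	return a < b;
}

/* Walk from highest to lowest priority; return the first class with work. */
static const char *demo_pick_class(void)
{
	const struct demo_sched_class *class;

	for (class = demo_highest; class < demo_lowest; class++)
		if (class->has_runnable())
			return class->name;
	return NULL;
}
```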

Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Valentin Schneider <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/sched/sched.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a4a20046e586..fe2703528972 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2230,6 +2230,7 @@ static inline void set_next_task(struct rq *rq, struct task_struct *next)
*/
#define DEFINE_SCHED_CLASS(name) \
const struct sched_class name##_sched_class \
+ __noreorder \
__aligned(__alignof__(struct sched_class)) \
__section("__" #name "_sched_class")

--
2.38.1


2022-11-14 13:19:32

by Jiri Slaby

Subject: [PATCH 15/46] x86/xen, lto: Mark xen_vcpu_stolen() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible with gcc LTO, because
they can end up in a different object file than the assembler. Without
this patch, this can lead to linker errors.

So mark xen_vcpu_stolen() as __visible.

Cc: Juergen Gross <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
drivers/xen/time.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/time.c b/drivers/xen/time.c
index 152dd33bb223..006a04592c8f 100644
--- a/drivers/xen/time.c
+++ b/drivers/xen/time.c
@@ -145,7 +145,7 @@ void xen_get_runstate_snapshot(struct vcpu_runstate_info *res)
}

/* return true when a vcpu could run but has no real cpu to run on */
-bool xen_vcpu_stolen(int vcpu)
+__visible bool xen_vcpu_stolen(int vcpu)
{
return per_cpu(xen_runstate, vcpu).state == RUNSTATE_runnable;
}
--
2.38.1


2022-11-14 13:20:37

by Jiri Slaby

Subject: [PATCH 32/46] x86/realmode, lto: Disable gcc LTO for real mode code

From: Andi Kleen <[email protected]>

The early real mode bootup code makes various assumptions that break
with LTO. For example it assumes that top level assembler statements
don't get reordered. Disable LTO for the real mode code.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/realmode/Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/realmode/Makefile b/arch/x86/realmode/Makefile
index a0b491ae2de8..47b8b500cf15 100644
--- a/arch/x86/realmode/Makefile
+++ b/arch/x86/realmode/Makefile
@@ -10,6 +10,7 @@
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KBUILD_CFLAGS += $(DISABLE_LTO_GCC)

subdir- := rm

--
2.38.1


2022-11-14 13:21:54

by Jiri Slaby

Subject: [PATCH 14/46] x86/sev, lto: Mark cpuid_table_copy as __visible_on_lto

From: Martin Liska <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible with gcc LTO, because
they can end up in a different object file than the assembler. Without
this patch, this can lead to linker errors.

So mark cpuid_table_copy as __visible_on_lto.
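The definition of __visible_on_lto is not part of this excerpt. Judging from the diff, where it replaces `static`, it presumably keeps the object file-local without LTO and makes it an externally visible global with gcc LTO. A sketch under that assumption, with a hypothetical stand-in for the table type:

```c
/* As the kernel defines __visible for gcc. */
#define __visible __attribute__((__externally_visible__))

/*
 * Hypothetical definition: with gcc LTO the object becomes an externally
 * visible global, otherwise it stays file-local as before.
 */
#ifdef CONFIG_LTO_GCC
#define __visible_on_lto __visible
#else
#define __visible_on_lto static
#endif

/* Hypothetical stand-in for struct snp_cpuid_table. */
struct demo_cpuid_table {
	unsigned int count;
};

__visible_on_lto struct demo_cpuid_table demo_cpuid_table_copy = { .count = 3 };
```

Only LTO builds pay for the global symbol; regular builds keep the original `static` linkage.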

[js] use __visible_on_lto

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/kernel/sev-shared.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 3a5b0c9c4fcc..554da8aabfc7 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -64,7 +64,7 @@ struct snp_cpuid_table {
static u16 ghcb_version __ro_after_init;

/* Copy of the SNP firmware's CPUID page. */
-static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
+__visible_on_lto struct snp_cpuid_table cpuid_table_copy __ro_after_init;

/*
* These will be initialized based on CPUID table so that non-present
--
2.38.1


2022-11-14 13:22:59

by Jiri Slaby

Subject: [PATCH 44/46] scripts/bloat-o-meter, lto: handle gcc LTO

From: Andi Kleen <[email protected]>

gcc LTO can add .lto_priv suffixes to symbols. Ignore them in
bloat-o-meter so that non-LTO and LTO kernels can be compared.
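A C re-creation of the symbol-name normalization after this change, for illustration only (the real script is Python). It shows why the order works: gcc names LTO-privatized statics like `foo.lto_priv.0`, so the numeric suffix is removed first by re_NUMBER (assumed here to match `\.[0-9]+`), and the remaining `.lto_priv` is removed afterwards. The demo_* names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Pass 1: drop every ".<digits>" run, like re_NUMBER.sub('', name). */
static void demo_strip_numbers(const char *in, char *out, size_t outlen)
{
	size_t o = 0;

	while (*in && o + 1 < outlen) {
		if (in[0] == '.' && isdigit((unsigned char)in[1])) {
			in++;				/* skip the dot */
			while (isdigit((unsigned char)*in))
				in++;			/* and the digits */
			continue;
		}
		out[o++] = *in++;
	}
	out[o] = '\0';
}

/* Pass 2: drop every ".lto_priv", like str.replace(".lto_priv", ""). */
static void demo_strip_lto_priv(char *s)
{
	static const char tag[] = ".lto_priv";
	const size_t taglen = sizeof(tag) - 1;
	char *p = s;

	while ((p = strstr(p, tag)) != NULL)
		memmove(p, p + taglen, strlen(p + taglen) + 1);
}

static void demo_normalize(const char *name, char *out, size_t outlen)
{
	demo_strip_numbers(name, out, outlen);
	demo_strip_lto_priv(out);
}
```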

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/bloat-o-meter | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/bloat-o-meter b/scripts/bloat-o-meter
index f9553f60a14a..ab994b3bf6e2 100755
--- a/scripts/bloat-o-meter
+++ b/scripts/bloat-o-meter
@@ -45,7 +45,7 @@ def getsizes(file, format):
if name == "linux_banner": continue
if name == "vermagic": continue
# statics and some other optimizations adds random .NUMBER
- name = re_NUMBER.sub('', name)
+ name = re_NUMBER.sub('', name).replace(".lto_priv", "")
sym[name] = sym.get(name, 0) + int(size, 16)
return sym

--
2.38.1


2022-11-14 13:23:03

by Jiri Slaby

Subject: [PATCH 22/46] btf, lto: Make all BTF IDs global on LTO

From: Andi Kleen <[email protected]>

gcc LTO can move top-level assembler statements into other assembler
files, but the BTF ID macros assume that they stay in the same file. So
when building with gcc LTO, make all BTF IDs global to work around this.

This is done by the new __global_on_lto macro.
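The diff relies on a C preprocessor rule: a macro argument used with `#` is stringified without being macro-expanded, while an argument pasted into string-literal concatenation is expanded first. That is why the quoting change in include/linux/btf_ids.h switched the scope from a bare token (`local`) to a string (`"local"`): only then can a macro such as __global_on_lto be passed in. The definition of __global_on_lto is not shown in this excerpt; the one below is a hypothetical sketch that assumes it expands to `"globl"` under CONFIG_LTO_GCC and `"local"` otherwise.

```c
#include <string.h>

/* Hypothetical definition of the scope helper. */
#ifdef CONFIG_LTO_GCC
#define __global_on_lto "globl"
#else
#define __global_on_lto "local"
#endif

/* Old style: scope is stringified with #, so a macro argument is NOT expanded. */
#define DEMO_SCOPE_LINE_OLD(scope, name) "." #scope " " #name "; \n"

/* New style: scope is already a string literal and is expanded, then concatenated. */
#define DEMO_SCOPE_LINE_NEW(scope, name) "." scope " " #name "; \n"

static const char demo_old_line[] = DEMO_SCOPE_LINE_OLD(__global_on_lto, demo_ids);
static const char demo_new_line[] = DEMO_SCOPE_LINE_NEW(__global_on_lto, demo_ids);
```

With the old form the assembler would see the bogus directive `.__global_on_lto`; with the new form it sees `.local` (or `.globl` under LTO).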

[js] do that for 8B BTF set too (commit ab21d6063c01)
[js] do global only in LTO case

Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/btf_ids.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index 3011757a48ef..a2bef302e42c 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -37,7 +37,7 @@ struct btf_id_set8 {
#define ____BTF_ID(symbol, word) \
asm( \
".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \
-".local " #symbol " ; \n" \
+"." __global_on_lto " " #symbol " ; \n" \
".type " #symbol ", STT_OBJECT; \n" \
".size " #symbol ", 4; \n" \
#symbol ": \n" \
@@ -88,7 +88,7 @@ asm( \
".popsection; \n");

#define BTF_ID_LIST(name) \
-__BTF_ID_LIST(name, "local") \
+__BTF_ID_LIST(name, __global_on_lto) \
extern u32 name[];

#define BTF_ID_LIST_GLOBAL(name, n) \
@@ -148,8 +148,8 @@ asm( \
".popsection; \n");

#define BTF_SET_START(name) \
-__BTF_ID_LIST(name, "local") \
-__BTF_SET_START(name, "local")
+__BTF_ID_LIST(name, __global_on_lto) \
+__BTF_SET_START(name, __global_on_lto)

#define BTF_SET_START_GLOBAL(name) \
__BTF_ID_LIST(name, "globl") \
@@ -192,8 +192,8 @@ asm( \
".popsection; \n");

#define BTF_SET8_START(name) \
-__BTF_ID_LIST(name, "local") \
-__BTF_SET8_START(name, "local")
+__BTF_ID_LIST(name, __global_on_lto) \
+__BTF_SET8_START(name, __global_on_lto)

#define BTF_SET8_END(name) \
asm( \
--
2.38.1


2022-11-14 13:23:50

by Jiri Slaby

Subject: [PATCH 34/46] scripts, lto: disable gcc LTO for some mod sources

From: Andi Kleen <[email protected]>

The mod tools scan an assembler file (devicetable-offsets.s) to generate
the symbols in devicetable-offsets.h, and a binary (empty.o) to determine
the ELF setup. Neither works with LTO, so disable LTO for empty.o and
devicetable-offsets.s.

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/mod/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index c9e38ad937fd..aa3465d6bc4a 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
OBJECT_FILES_NON_STANDARD := y
CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO)
+CFLAGS_REMOVE_empty.o += $(CC_FLAGS_LTO_GCC)

hostprogs-always-y += modpost mk_elfconfig
always-y += empty.o
@@ -9,6 +10,8 @@ modpost-objs := modpost.o file2alias.o sumversion.o

devicetable-offsets-file := devicetable-offsets.h

+$(obj)/devicetable-offsets.s: KBUILD_CFLAGS += $(DISABLE_LTO_GCC)
+
$(obj)/$(devicetable-offsets-file): $(obj)/devicetable-offsets.s FORCE
$(call filechk,offsets,__DEVICETABLE_OFFSETS_H__)

--
2.38.1


2022-11-14 13:24:40

by Jiri Slaby

Subject: [PATCH 38/46] Kbuild, lto: don't include weak source file symbols in System.map

From: Andi Kleen <[email protected]>

The gcc LTO build can generate some extra weak symbols named after
source files on the second kallsyms link, for example:
0000000002fdf20a W head64.c.552cf5a6

This causes the "Inconsistent kallsyms data" error due to mismatches in
the stage1 vs stage2 kallsyms link. Filter those out when generating
the System.map.
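The new grep pattern `' W .*\.c\.'` drops nm lines whose symbol type is `W` and whose name contains `.c.`. A C sketch of the same predicate on a single `nm -n` line, for illustration (the function name is hypothetical; mksysmap itself uses grep):

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if an "nm -n" line is a weak ('W') symbol whose name
 * contains ".c.", i.e. a line the new mksysmap pattern filters out.
 */
static int demo_is_lto_file_symbol(const char *nm_line)
{
	char type;
	char name[256];

	/* Skip the address, then read the type letter and the name. */
	if (sscanf(nm_line, "%*s %c %255s", &type, name) != 2)
		return 0;
	return type == 'W' && strstr(name, ".c.") != NULL;
}
```

Ordinary weak functions (no `.c.` in the name) and non-weak symbols are kept.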

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/mksysmap | 2 ++
1 file changed, 2 insertions(+)

diff --git a/scripts/mksysmap b/scripts/mksysmap
index 16a08b8ef2f8..0f19a44ab136 100755
--- a/scripts/mksysmap
+++ b/scripts/mksysmap
@@ -34,6 +34,7 @@
# U - undefined global symbols
# N - debugging symbols
# w - local weak symbols
+# W - weak symbols if they contain .c.

# readprofile starts reading symbols when _stext is found, and
# continue until it finds a symbol which is not either of 'T', 't',
@@ -57,4 +58,5 @@ $NM -n $1 | grep -v \
-e ' __kstrtab_' \
-e ' __kstrtabns_' \
-e ' L0$' \
+ -e ' W .*\.c\.' \
> $2
--
2.38.1


2022-11-14 13:25:13

by Jiri Slaby

Subject: [PATCH 30/46] Kbuild, lto: Add Link Time Optimization support

From: Andi Kleen <[email protected]>

This patch adds gcc LTO support. It leverages some of the existing
support for clang LTO.

With LTO, gcc does whole program optimization for the whole kernel
and each module. This increases compile time, but can generate faster
and smaller code and allows the compiler to do global checking. For
example, the compiler can now complain about type mismatches of symbols
between different files.

LTO allows gcc to inline functions between different files and do
various other optimizations across the whole binary.
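As an illustration of such cross-file optimizations (not code from this series): without LTO, the compiler cannot see, while compiling a helper's file, that every caller passes the same constant. With the whole program visible, it can propagate the constant and, when cloning is allowed (for example via -fipa-cp-clone), create a specialized copy; gcc typically names such clones with a `.constprop.N` suffix. The demo_* names are hypothetical, and the "clone" below is written by hand to show the result.

```c
/* Imagine this generic helper living in one source file... */
static unsigned long demo_scale(unsigned long x, unsigned int shift)
{
	return x << shift;
}

/* ...and this caller in another file, always passing the same constant. */
static unsigned long demo_pages_to_bytes(unsigned long pages)
{
	return demo_scale(pages, 12);
}

/*
 * Hand-written equivalent of a specialized clone: the constant argument
 * is folded in, leaving a body small enough to inline at every call site.
 */
static unsigned long demo_scale_shift12(unsigned long x)
{
	return x << 12;
}
```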

The LTO patches have been used for many years by various users, mostly
to make their kernel smaller. The original versions date back to 2012.

This version drops a lot of outdated cruft and no longer needs a
special toolchain (just a new enough one).

This adds the basic Kbuild plumbing for LTO:
* Add a new LDFINAL variable that controls the final link for vmlinux or
modules. With LTO, it calls the gcc-ld wrapper instead of ld to run the
LTO step.

* Add Makefile support to enable LTO

For more information see Documentation/kbuild/lto-build.rst

Thanks to H.J. Lu, Joe Mario, Honza Hubicka, Richard Biener, Don Zickus,
Changlong Xie, Gleb Schukin, Martin Liska, various github contributors,
who helped with this project (and probably some more who I forgot,
sorry).

[js] pass -flto only once (the one with jobserver)
[ml] "-m: command not found" and whitespace fix
[bs] fixed Documentation issues:
* blank line padding before single requirement list
* use bullet list for FAQ
* use bullet lists for external link references list
* add LTO documentation to toc index

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Cc: Richard Biener <[email protected]>
Cc: Jan Hubicka <[email protected]>
Cc: H.J. Lu <[email protected]>
Cc: Don Zickus <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Bagas Sanjaya <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
Documentation/kbuild/index.rst | 2 +
Documentation/kbuild/lto-build.rst | 76 ++++++++++++++++++++++++++++++
Makefile | 4 +-
arch/Kconfig | 52 ++++++++++++++++++++
scripts/Makefile.build | 9 ++--
scripts/Makefile.lto | 43 +++++++++++++++++
scripts/Makefile.modfinal | 2 +-
scripts/Makefile.vmlinux | 3 +-
scripts/Makefile.vmlinux_o | 4 +-
scripts/link-vmlinux.sh | 7 +--
10 files changed, 190 insertions(+), 12 deletions(-)
create mode 100644 Documentation/kbuild/lto-build.rst
create mode 100644 scripts/Makefile.lto

diff --git a/Documentation/kbuild/index.rst b/Documentation/kbuild/index.rst
index cee2f99f734b..1937eee7c437 100644
--- a/Documentation/kbuild/index.rst
+++ b/Documentation/kbuild/index.rst
@@ -22,6 +22,8 @@ Kernel Build System
gcc-plugins
llvm

+ lto-build
+
.. only:: subproject and html

Indices
diff --git a/Documentation/kbuild/lto-build.rst b/Documentation/kbuild/lto-build.rst
new file mode 100644
index 000000000000..3fb17342e72f
--- /dev/null
+++ b/Documentation/kbuild/lto-build.rst
@@ -0,0 +1,76 @@
+=====================================================
+gcc link time optimization (LTO) for the Linux kernel
+=====================================================
+
+Link Time Optimization allows the compiler to optimize the complete program
+instead of just each file.
+
+The compiler can inline functions between files and do various other global
+optimizations, like specializing functions for common parameters,
+determining when global variables are clobbered, making functions pure/const,
+propagating constants globally, removing unneeded data and others.
+
+It will also drop unused functions which can make the kernel
+image smaller in some circumstances, in particular for small kernel
+configurations.
+
+For small monolithic kernels it can throw away unused code very effectively
+(especially when modules are disabled) and usually shrinks
+the code size.
+
+Build time and memory consumption at build time will increase, depending
+on the size of the largest binary. Modular kernels are less affected.
+With LTO, incremental builds are less incremental, as the whole binary
+always needs to be re-optimized (but not re-parsed).
+
+Oopses can be somewhat more difficult to read, due to the more aggressive
+inlining: it helps to use scripts/faddr2line.
+
+It is currently incompatible with live patching.
+
+Normal "reasonable" builds work with less than 4GB of RAM, but very large
+configurations like allyesconfig typically need more memory. The actual
+memory needed depends on the available memory (gcc sizes its garbage
+collector pools based on that or on the ulimit -m limits) and
+the compiler version.
+
+Requirements:
+-------------
+
+- Enough memory: 4GB for a standard build, more for allyesconfig
+ The peak memory usage happens single threaded (when lto-wpa merges types),
+ so dialing back -j options will not help much.
+
+A 32bit hosted compiler is unlikely to work due to the memory requirements.
+You can however build a kernel targeted at 32bit on a 64bit host.
+
+FAQs:
+-----
+
+* I get a section type attribute conflict
+
+ Usually because of someone doing const __initdata (should be
+ const __initconst) or const __read_mostly (should be just const). Check
+ both symbols reported by gcc.
+
+References:
+-----------
+
+* Presentation on Kernel LTO
+ (note, performance numbers/details totally outdated.)
+
+ http://halobates.de/kernel-lto.pdf
+
+* Generic gcc LTO:
+
+ * http://www.ucw.cz/~hubicka/slides/labs2013.pdf
+ * http://www.hipeac.net/system/files/barcelona.pdf
+
+* Somewhat outdated too (from GCC site):
+
+ * http://gcc.gnu.org/projects/lto/lto.pdf
+ * http://gcc.gnu.org/projects/lto/whopr.pdf
+
+Happy Link-Time-Optimizing!
+
+Andi Kleen
diff --git a/Makefile b/Makefile
index 0b723c903819..d0dfb5ca2b21 100644
--- a/Makefile
+++ b/Makefile
@@ -482,6 +482,7 @@ KBUILD_HOSTLDLIBS := $(HOST_LFS_LIBS) $(HOSTLDLIBS)

# Make variables (CC, etc...)
CPP = $(CC) -E
+LDFINAL = $(LD)
ifneq ($(LLVM),)
CC = $(LLVM_PREFIX)clang$(LLVM_SUFFIX)
LD = $(LLVM_PREFIX)ld.lld$(LLVM_SUFFIX)
@@ -604,7 +605,7 @@ export RUSTC RUSTDOC RUSTFMT RUSTC_OR_CLIPPY_QUIET RUSTC_OR_CLIPPY BINDGEN CARGO
export HOSTRUSTC KBUILD_HOSTRUSTFLAGS
export CPP AR NM STRIP OBJCOPY OBJDUMP READELF PAHOLE RESOLVE_BTFIDS LEX YACC AWK INSTALLKERNEL
export PERL PYTHON3 CHECK CHECKFLAGS MAKE UTS_MACHINE HOSTCXX
-export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD
+export KGZIP KBZIP2 KLZOP LZMA LZ4 XZ ZSTD LDFINAL
export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
export KBUILD_USERCFLAGS KBUILD_USERLDFLAGS

@@ -1085,6 +1086,7 @@ include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
+include-$(CONFIG_LTO_GCC) += scripts/Makefile.lto
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins

include $(addprefix $(srctree)/, $(include-y))
diff --git a/arch/Kconfig b/arch/Kconfig
index 8f138e580d1a..ad52c8fddfb4 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -689,6 +689,21 @@ config HAS_LTO_CLANG
The compiler and Kconfig options support building with Clang's
LTO.

+config ARCH_SUPPORTS_LTO_GCC
+ bool
+
+# Some ar versions leak file descriptors when using the LTO
+# plugin and cause strange errors when ulimit -n is too low.
+# Pick an arbitrary threshold, which should be enough for most
+# kernel configs. This was a regression that is only
+# in some transient binutils version, so either older or
+# new enough is ok.
+# This might not be the exact range with this bug.
+config BAD_AR
+ depends on LD_VERSION = 23000
+ depends on $(shell,ulimit -n) < 4000
+ def_bool y
+
choice
prompt "Link Time Optimization (LTO)"
default LTO_NONE
@@ -736,8 +751,45 @@ config LTO_CLANG_THIN
https://clang.llvm.org/docs/ThinLTO.html

If unsure, say Y.
+
+config LTO_GCC
+ bool "gcc LTO"
+ depends on ARCH_SUPPORTS_LTO_GCC && CC_IS_GCC
+ depends on GCC_VERSION >= 100300
+ depends on LD_VERSION >= 22700
+ depends on !BAD_AR
+ select LTO
+ help
+ Enable whole program (link time) optimizations (LTO) for the whole
+ kernel and each module. This usually increases compile time,
+ especially for incremental builds, but tends to generate better code
+ as well as some global checks.
+
+ It allows the compiler to inline functions between different files
+ and do other global optimizations, like propagating constants between
+ functions, determining side effects of functions, avoiding unnecessary
+ register saving around functions, or optimizing unused function
+ arguments. It also allows the compiler to drop unused functions.
+
+ With this option the compiler will also do some global checking over
+ different source files.
+
+ This requires a gcc 10.3 or later compiler and binutils >= 2.27.
+
+ On larger non modular configurations this may need more than 4GB of
+ RAM for the link phase, as well as a 64bit host compiler.
+
+ For more information see Documentation/kbuild/lto-build.rst
endchoice

+config LTO_CP_CLONE
+ bool "Allow aggressive cloning for function specialization"
+ depends on LTO_GCC
+ help
+ Allow the compiler to clone and specialize functions for specific
+ arguments when it determines that they are commonly called with these
+ arguments. Experimental. Will increase text size.
+
config ARCH_SUPPORTS_CFI_CLANG
bool
help
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 0a28e3884efe..9b522c9efcb6 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -154,7 +154,7 @@ is-single-obj-m = $(and $(part-of-module),$(filter $@, $(obj-m)),y)
# When a module consists of a single object, there is no reason to keep LLVM IR.
# Make $(LD) covert LLVM IR to ELF here.
ifdef CONFIG_LTO
-cmd_ld_single_m = $(if $(is-single-obj-m), ; $(LD) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
+cmd_ld_single_m = $(if $(is-single-obj-m), ; $(LDFINAL) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
endif

quiet_cmd_cc_o_c = CC $(quiet_modtag) $@
@@ -265,7 +265,8 @@ $(obj)/%.usyms: $(obj)/%.o FORCE
$(call if_changed,undefined_syms)

quiet_cmd_cc_lst_c = MKLST $@
- cmd_cc_lst_c = $(CC) $(c_flags) -g -c -o $*.o $< && \
+ cmd_cc_lst_c = $(if $(CONFIG_LTO),$(warning Listing in LTO mode does not match final binary)) \
+ $(CC) $(c_flags) -g -c -o $*.o $< && \
$(CONFIG_SHELL) $(srctree)/scripts/makelst $*.o \
System.map $(OBJDUMP) > $@

@@ -446,8 +447,8 @@ $(obj)/modules.order: $(obj-m) FORCE
$(obj)/lib.a: $(lib-y) FORCE
$(call if_changed,ar)

-quiet_cmd_ld_multi_m = LD [M] $@
- cmd_ld_multi_m = $(LD) $(ld_flags) -r -o $@ @$(patsubst %.o,%.mod,$@) $(cmd_objtool)
+quiet_cmd_ld_multi_m = LDFINAL [M] $@
+ cmd_ld_multi_m = $(LDFINAL) $(ld_flags) -r -o $@ @$(patsubst %.o,%.mod,$@) $(cmd_objtool)

define rule_ld_multi_m
$(call cmd_and_savecmd,ld_multi_m)
diff --git a/scripts/Makefile.lto b/scripts/Makefile.lto
new file mode 100644
index 000000000000..33ac0da2bb47
--- /dev/null
+++ b/scripts/Makefile.lto
@@ -0,0 +1,43 @@
+#
+# Support for gcc link time optimization
+#
+
+DISABLE_LTO_GCC :=
+export DISABLE_LTO_GCC
+
+ifdef CONFIG_LTO_GCC
+ CC_FLAGS_LTO_GCC := -flto
+ DISABLE_LTO_GCC := -fno-lto
+
+ KBUILD_CFLAGS += ${CC_FLAGS_LTO_GCC}
+
+ CC_FLAGS_LTO := -flto
+ export CC_FLAGS_LTO
+
+ lto-flags-y := -flinker-output=nolto-rel -flto=jobserver
+ lto-flags-y += -fwhole-program
+
+ lto-flags-$(CONFIG_LTO_CP_CLONE) += -fipa-cp-clone
+
+ # allow extra flags from command line
+ lto-flags-y += ${LTO_EXTRA_CFLAGS}
+
+ # For LTO we need to use gcc to do the linking, not ld
+ # directly. Use a wrapper to convert the ld command line
+ # to gcc
+ LDFINAL := ${CONFIG_SHELL} ${srctree}/scripts/gcc-ld \
+ ${lto-flags-y}
+
+ # LTO gcc creates a lot of files in TMPDIR, and with /tmp as tmpfs
+ # it's easy to drive the machine OOM. Use the object directory
+ # instead for temporaries.
+ # This has the drawback that there might be some junk more visible
+ # after interrupted compilations, but you would have that junk
+ # there anyways in /tmp.
+ TMPDIR ?= $(objtree)
+ export TMPDIR
+
+ # use plugin aware tools
+ AR = $(CROSS_COMPILE)gcc-ar
+ NM = $(CROSS_COMPILE)gcc-nm
+endif # CONFIG_LTO_GCC
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 25bedd83644b..c52536c91c8c 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -32,7 +32,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)

quiet_cmd_ld_ko_o = LD [M] $@
cmd_ld_ko_o += \
- $(LD) -r $(KBUILD_LDFLAGS) \
+ $(LDFINAL) -r $(KBUILD_LDFLAGS) \
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-T scripts/module.lds -o $@ $(filter %.o, $^); \
$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
diff --git a/scripts/Makefile.vmlinux b/scripts/Makefile.vmlinux
index 49946cb96844..8871e55f881b 100644
--- a/scripts/Makefile.vmlinux
+++ b/scripts/Makefile.vmlinux
@@ -26,7 +26,8 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)

# Final link of vmlinux with optional arch pass after final link
cmd_link_vmlinux = \
- $< "$(LD)" "$(KBUILD_LDFLAGS)" "$(LDFLAGS_vmlinux)"; \
+ $< "$(LD)" "$(LDFINAL)" "$(KBUILD_LDFLAGS)" \
+ "$(LDFLAGS_vmlinux)"; \
$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)

targets += vmlinux
diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 1c86895cfcf8..1f646b16aa70 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -44,9 +44,9 @@ objtool-args = $(vmlinux-objtool-args-y) --link
# Link of vmlinux.o used for section mismatch analysis
# ---------------------------------------------------------------------------

-quiet_cmd_ld_vmlinux.o = LD $@
+quiet_cmd_ld_vmlinux.o = LDFINAL $@
cmd_ld_vmlinux.o = \
- $(LD) ${KBUILD_LDFLAGS} -r -o $@ \
+ $(LDFINAL) ${KBUILD_LDFLAGS} -r -o $@ \
$(addprefix -T , $(initcalls-lds)) \
--whole-archive vmlinux.a --no-whole-archive \
--start-group $(KBUILD_VMLINUX_LIBS) --end-group \
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 652f33be9549..c89258bcf818 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -29,8 +29,9 @@
set -e

LD="$1"
-KBUILD_LDFLAGS="$2"
-LDFLAGS_vmlinux="$3"
+LDFINAL="$2"
+KBUILD_LDFLAGS="$3"
+LDFLAGS_vmlinux="$4"

is_enabled() {
grep -q "^$1=y" include/config/auto.conf
@@ -82,7 +83,7 @@ vmlinux_link()
ldlibs="-lutil -lrt -lpthread"
else
wl=
- ld="${LD}"
+ ld="${LDFINAL}"
ldflags="${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}"
ldlibs=
fi
--
2.38.1
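
The scripts/gcc-ld wrapper that LDFINAL invokes is not part of this excerpt. The comment in Makefile.lto says it converts an ld command line into a gcc one. As a rough illustration of that kind of translation, here is a minimal C sketch. The rule set is an assumption for illustration and is not the wrapper's actual logic. Plain input files and -r, -o and -T pass through unchanged, because the gcc driver understands them. A few options that take a separate value (-m, -z) are folded into a single -Wl,opt,value. Everything else is forwarded to ld with -Wl,. The helper names is_passthrough(), takes_value() and ld_to_gcc() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARG_LEN 128

/* ld options the gcc driver understands as-is (assumed list). */
static int is_passthrough(const char *opt)
{
	return !strcmp(opt, "-r") || !strcmp(opt, "-o") || !strcmp(opt, "-T");
}

/* ld options whose value is the following argument (assumed list). */
static int takes_value(const char *opt)
{
	return !strcmp(opt, "-m") || !strcmp(opt, "-z");
}

/*
 * Rewrite an ld argument vector into gcc driver arguments. Returns the
 * number of arguments written to out, or -1 if out is too small.
 */
static int ld_to_gcc(int argc, const char *const argv[],
		     char out[][MAX_ARG_LEN], int max_out)
{
	int n = 0;

	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];
		int len;

		if (n == max_out)
			return -1;

		if (arg[0] != '-' || is_passthrough(arg)) {
			/* Input files, -o/-T values and gcc-known options. */
			len = snprintf(out[n], MAX_ARG_LEN, "%s", arg);
		} else if (takes_value(arg) && i + 1 < argc) {
			/* "-m elf_x86_64" becomes "-Wl,-m,elf_x86_64". */
			len = snprintf(out[n], MAX_ARG_LEN, "-Wl,%s,%s",
				       arg, argv[i + 1]);
			i++;
		} else {
			/* Everything else is handed to ld via -Wl. */
			len = snprintf(out[n], MAX_ARG_LEN, "-Wl,%s", arg);
		}
		if (len < 0 || len >= MAX_ARG_LEN)
			return -1;
		n++;
	}
	return n;
}
```

With these rules, "-m elf_x86_64 -r -o vmlinux.o --whole-archive vmlinux.a --no-whole-archive" becomes "-Wl,-m,elf_x86_64 -r -o vmlinux.o -Wl,--whole-archive vmlinux.a -Wl,--no-whole-archive". The value after -o or -T passes through because it does not start with a dash. A real wrapper would need a complete table of ld options that take a value.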


2022-11-14 13:25:19

by Jiri Slaby

Subject: [PATCH 17/46] amd, lto: Mark amd pmu and pstate functions as __visible_on_lto

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembly that
references them. Without this patch, this can lead to linker errors.

So mark amd_pmu_test_overflow_topbit() and the amd-pstate functions
used as static call targets as __visible_on_lto.

Also, the pstate functions have to be renamed so that their (now
global) names are unique.

[ml] fix amd_pmu_test_overflow_topbit() too
[js] use __visible_on_lto

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/events/amd/core.c | 2 +-
drivers/cpufreq/amd-pstate.c | 15 ++++++++-------
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 8b70237c33f7..9dfdfd85b493 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -643,7 +643,7 @@ static inline void amd_pmu_ack_global_status(u64 status)
wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
}

-static bool amd_pmu_test_overflow_topbit(int idx)
+__visible_on_lto bool amd_pmu_test_overflow_topbit(int idx)
{
u64 counter;

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ace7d50cf2ac..d0b67a60191d 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -66,7 +66,7 @@ MODULE_PARM_DESC(shared_mem,

static struct cpufreq_driver amd_pstate_driver;

-static inline int pstate_enable(bool enable)
+__visible_on_lto int do_amd_pstate_enable(bool enable)
{
return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable);
}
@@ -84,14 +84,14 @@ static int cppc_enable(bool enable)
return ret;
}

-DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable);
+DEFINE_STATIC_CALL(amd_pstate_enable, do_amd_pstate_enable);

static inline int amd_pstate_enable(bool enable)
{
return static_call(amd_pstate_enable)(enable);
}

-static int pstate_init_perf(struct amd_cpudata *cpudata)
+__visible_on_lto int do_amd_pstate_init_perf(struct amd_cpudata *cpudata)
{
u64 cap1;
u32 highest_perf;
@@ -142,15 +142,16 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
return 0;
}

-DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf);
+DEFINE_STATIC_CALL(amd_pstate_init_perf, do_amd_pstate_init_perf);

static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata)
{
return static_call(amd_pstate_init_perf)(cpudata);
}

-static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf,
- u32 des_perf, u32 max_perf, bool fast_switch)
+__visible_on_lto void do_amd_pstate_update_perf(struct amd_cpudata *cpudata,
+ u32 min_perf, u32 des_perf, u32 max_perf,
+ bool fast_switch)
{
if (fast_switch)
wrmsrl(MSR_AMD_CPPC_REQ, READ_ONCE(cpudata->cppc_req_cached));
@@ -172,7 +173,7 @@ static void cppc_update_perf(struct amd_cpudata *cpudata,
cppc_set_perf(cpudata->cpu, &perf_ctrls);
}

-DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf);
+DEFINE_STATIC_CALL(amd_pstate_update_perf, do_amd_pstate_update_perf);

static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata,
u32 min_perf, u32 des_perf,
--
2.38.1
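
Here is a minimal userspace sketch of why assembly references force these symbols to be global. The function below is referenced only by name, from a top-level asm() block. This loosely mimics how, on x86, a static call trampoline names its target symbol. In a normal build the assembler resolves that name within the same object file, so static works. With gcc LTO, the asm() block and the function may land in different partitions, and only a global, externally visible symbol can be resolved across them. The __visible and __used stand-ins, hypothetical_pstate_op() and the hypothetical_op_ptr data word are assumptions for illustration. The inline assembly assumes an x86-64 ELF target. The sketch builds without CONFIG_LTO_GCC, so it exercises only the static path.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel attribute macros (assumption). */
#ifndef __used
# define __used __attribute__((__used__))
#endif
#if defined(__GNUC__) && !defined(__clang__)
# define __visible __attribute__((__externally_visible__))
#else
# define __visible
#endif

/* Same shape as the compiler.h definition from this series. */
#ifdef CONFIG_LTO_GCC
# define __visible_on_lto __visible
#else
# define __visible_on_lto static
#endif

/* Only referenced by name from the asm below; __used keeps it emitted. */
__visible_on_lto __used int hypothetical_pstate_op(int x)
{
	return x + 1;
}

/*
 * Top-level asm storing the function's address in a data word. The
 * function is named only inside the asm string, which gcc does not
 * parse, so gcc cannot tell that this reference exists.
 */
__asm__(".pushsection .data\n"
	".balign 8\n"
	".globl hypothetical_op_ptr\n"
	"hypothetical_op_ptr:\n"
	"\t.quad hypothetical_pstate_op\n"
	".popsection\n");

extern int (*hypothetical_op_ptr)(int);
```

Calling hypothetical_op_ptr(41) goes through the address that the assembler filled in and returns 42.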


2022-11-14 13:26:26

by Jiri Slaby

Subject: [PATCH 33/46] x86/vdso, lto: Disable gcc LTO for the vdso

From: Andi Kleen <[email protected]>

Disable gcc LTO for the vdso. It's not really useful here and causes
various strange problems.

Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/entry/vdso/Makefile | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 3e88b9df8c8f..e8099ee163a0 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -3,6 +3,8 @@
# Building vDSO images for x86.
#

+KBUILD_CFLAGS += $(DISABLE_LTO_GCC)
+
# Absolute relocation type $(ARCH_REL_TYPE_ABS) needs to be defined before
# the inclusion of generic Makefile.
ARCH_REL_TYPE_ABS := R_X86_64_JUMP_SLOT|R_X86_64_GLOB_DAT|R_X86_64_RELATIVE|
--
2.38.1


2022-11-14 13:26:54

by Jiri Slaby

Subject: [PATCH 24/46] bpf, lto: mark interpreter jump table as __noreorder

From: Andi Kleen <[email protected]>

gcc LTO has a problem that can cause static variables containing &&
label addresses to be put into a different LTO partition than the
function that owns the labels, which then fails the build. This can
happen with the jump table in the BPF interpreter.

Mark the interpreter function and the jump table as __noreorder; this
guarantees that they both end up in the first partition.

Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: KP Singh <[email protected]>
Cc: Stanislav Fomichev <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/bpf/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 25a54e04560e..d40ce00622f6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1640,7 +1640,7 @@ u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
*
* Return: whatever value is in %BPF_R0 at program exit
*/
-static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
+static u64 __noreorder ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
{
#define BPF_INSN_2_LBL(x, y) [BPF_##x | BPF_##y] = &&x##_##y
#define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
--
2.38.1
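
The construct at issue is GNU C's "labels as values": a function-local static table that holds &&label addresses and is dispatched with goto *. The addresses only make sense together with the function that contains the labels. The tiny interpreter below is a hypothetical sketch of that pattern and is not the BPF interpreter. The __noreorder stand-in maps to GCC's no_reorder attribute on gcc and to nothing elsewhere. That mapping is an assumption made so the sketch also builds with clang.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && !defined(__clang__)
# define __noreorder __attribute__((no_reorder))
#else
# define __noreorder
#endif

enum op { OP_ADD, OP_SUB, OP_HALT };

struct insn {
	enum op code;
	int64_t imm;
};

static int64_t __noreorder tiny_interp_run(const struct insn *insn)
{
	/* Static table of label addresses, like the BPF jumptable. */
	static const void *const jumptable[] = {
		[OP_ADD]  = &&do_add,
		[OP_SUB]  = &&do_sub,
		[OP_HALT] = &&do_halt,
	};
	int64_t acc = 0;

#define DISPATCH() do {					\
		assert(insn->code <= OP_HALT);		\
		goto *jumptable[insn->code];		\
	} while (0)

	DISPATCH();
do_add:
	acc += insn->imm;
	insn++;
	DISPATCH();
do_sub:
	acc -= insn->imm;
	insn++;
	DISPATCH();
do_halt:
	return acc;
#undef DISPATCH
}
```

For example, the program { OP_ADD 40, OP_ADD 5, OP_SUB 3, OP_HALT } returns 42. If LTO placed jumptable in a different partition from tiny_interp_run(), the label addresses in its initializer could not be resolved. That is the failure the patch avoids.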


2022-11-14 13:29:07

by Jiri Slaby

Subject: [PATCH 05/46] compiler.h: introduce __global_on_lto

From: Jiri Slaby <[email protected]>

__global_on_lto is defined as "globl" when gcc LTO is turned on (see
later patches), and "local" otherwise. It is needed for top-level
symbols which are referenced in assembly, because with gcc LTO the
assembly and the symbol can end up in different object files, which
leads to linker errors.

So the symbols have to be global when gcc LTO is in charge. Conversely,
they can remain local on non-gcc-LTO builds.

Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/compiler.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2305a3cbe99c..16e4c1de14c4 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -135,8 +135,10 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,

#ifdef CONFIG_LTO_GCC
# define __visible_on_lto __visible
+# define __global_on_lto "globl"
#else
# define __visible_on_lto static
+# define __global_on_lto "local"
#endif

#ifndef unreachable
--
2.38.1
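
Here is a minimal sketch of how the string macro is meant to be spliced into assembler directives. String-literal concatenation turns it into either ".globl sym" or ".local sym". The SYM_BINDING_DIRECTIVE() helper and the symbol name are hypothetical. In real code, the resulting string would be part of a top-level asm() statement rather than a data array.

```c
#include <assert.h>
#include <string.h>

/* Same definition as the patch; CONFIG_LTO_GCC would come from Kconfig. */
#ifdef CONFIG_LTO_GCC
# define __global_on_lto "globl"
#else
# define __global_on_lto "local"
#endif

/* Hypothetical helper: the binding directive for an asm-defined symbol. */
#define SYM_BINDING_DIRECTIVE(sym) "." __global_on_lto " " #sym "\n"

/* ".local my_asm_symbol\n" here, ".globl my_asm_symbol\n" with gcc LTO. */
static const char example_directive[] = SYM_BINDING_DIRECTIVE(my_asm_symbol);
```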


2022-11-14 13:29:11

by Jiri Slaby

Subject: [PATCH 37/46] Compiler attributes, lto: disable __flatten with LTO

From: Andi Kleen <[email protected]>

Using __flatten causes a simple gcc 12 LTO build to no longer fit into
16GB of RAM. With gcc 12, the link still had not finished after 10
minutes, having consumed 40GB of RAM by that point. Disable flatten
with LTO.

There is an upstream bug about this:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107014

Until this is resolved, simply disable __flatten with LTO.

In the future, instead of this patch, we should likely drop __flatten
(whose only user is pcpu_build_alloc_info()) and mark the functions
that should be inlined there as always_inline instead.

Cc: Miguel Ojeda <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/compiler_attributes.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index be6c71fd5ebb..09cf8eebcb0d 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -229,7 +229,12 @@
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
* clang: https://clang.llvm.org/docs/AttributeReference.html#flatten
*/
+#ifndef CONFIG_LTO_GCC
# define __flatten __attribute__((flatten))
+#else
+/* Causes very large memory use with gcc in LTO mode */
+# define __flatten
+#endif

/*
* Note the missing underscores.
--
2.38.1
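
The follow-up suggested in the commit message can be sketched as follows. Instead of marking the caller __flatten, which asks gcc to inline every call inside it (including calls introduced by that inlining), mark only the helpers that must be inlined as always_inline. The function names are hypothetical, and this is not the actual pcpu_build_alloc_info() code.

```c
#include <assert.h>

/* glibc's <sys/cdefs.h> may already provide an equivalent definition. */
#ifndef __always_inline
# define __always_inline inline __attribute__((__always_inline__))
#endif

static __always_inline int hypothetical_round_up(int x, int align)
{
	return (x + align - 1) / align * align;
}

static __always_inline int hypothetical_units_needed(int size, int unit)
{
	return hypothetical_round_up(size, unit) / unit;
}

/* Was: __flatten int hypothetical_alloc_info(...). */
int hypothetical_alloc_info(int size, int unit)
{
	/* Only the two helpers above are forced inline; other calls are gcc's choice. */
	return hypothetical_units_needed(size, unit);
}
```

This approach limits forced inlining to an explicit set of helpers. Under LTO, __flatten can instead pull in an unbounded amount of cross-file code.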


2022-11-14 13:32:19

by Jiri Slaby

Subject: [PATCH 27/46] linkage, lto: use C version for SYSCALL_ALIAS() / cond_syscall()

From: Andi Kleen <[email protected]>

With LTO, aliases get largely resolved in the compiler, not in the
linker.

Implement cond_syscall() and SYSCALL_ALIAS() in C to let the compiler
understand the aliases so that it can resolve them properly.

The architecture-specific versions are likely no longer needed, but
they are kept for now.

There is one subtlety here:
The assembler version didn't care whether there was a prototype or not.
This variant assumes there is no prototype because it uses a dummy
(void) signature. This works for sys_ni.c, but breaks for
kernel/time/posix-stubs.c. To avoid problems there, a second variant of
the macro (_PROTO) is added. It uses the previously declared type
(via typeof()).

I actually tried to avoid this by adding prototypes for SYS_NI() and
using only the _PROTO variant, but it resulted in very large patches and
lots of problems with all the different cases. Eventually, I gave up and
just used the prototype variant in posix-stubs.c.

[js] gcc >= 8 emits a -Wattribute-alias warning. Work around that with
__diag_*(). This is ugly, but due to a gcc bug, I see no better
option. Suggestions welcome.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/linkage.h | 16 ++++++++--------
kernel/time/posix-stubs.c | 19 +++++++++++++++++--
2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/linkage.h b/include/linux/linkage.h
index 1feab6136b5b..688b9bb80e96 100644
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -23,17 +23,17 @@
#endif

#ifndef cond_syscall
-#define cond_syscall(x) asm( \
- ".weak " __stringify(x) "\n\t" \
- ".set " __stringify(x) "," \
- __stringify(sys_ni_syscall))
+#define cond_syscall(x) \
+ extern long x(void) __attribute__((alias("sys_ni_syscall"), weak));
#endif

#ifndef SYSCALL_ALIAS
-#define SYSCALL_ALIAS(alias, name) asm( \
- ".globl " __stringify(alias) "\n\t" \
- ".set " __stringify(alias) "," \
- __stringify(name))
+#define SYSCALL_ALIAS(a, name) \
+ long a(void) __attribute__((alias(__stringify(name))))
+#define SYSCALL_ALIAS_PROTO(a, name) \
+ typeof(a) a __attribute__((alias(__stringify(name))))
+#else
+#define SYSCALL_ALIAS_PROTO(a, name) SYSCALL_ALIAS(a, name)
#endif

#define __page_aligned_data __section(".data..page_aligned") __aligned(PAGE_SIZE)
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 90ea5f373e50..23e1a63adc2b 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -31,13 +31,21 @@ asmlinkage long sys_ni_posix_timers(void)
}

#ifndef SYS_NI
-#define SYS_NI(name) SYSCALL_ALIAS(sys_##name, sys_ni_posix_timers)
+#define SYS_NI(name) SYSCALL_ALIAS_PROTO(sys_##name, sys_ni_posix_timers)
#endif

#ifndef COMPAT_SYS_NI
-#define COMPAT_SYS_NI(name) SYSCALL_ALIAS(compat_sys_##name, sys_ni_posix_timers)
+#define COMPAT_SYS_NI(name) \
+ SYSCALL_ALIAS_PROTO(compat_sys_##name, sys_ni_posix_timers)
#endif

+/*
+ * This cannot go into SYS_NI() or SYSCALL_ALIAS_PROTO() due to a gcc bug fixed
+ * in gcc 13 (cf. PR 97498). I wonder how __SYSCALL_DEFINEx() is able to work.
+ */
+__diag_push();
+__diag_ignore(GCC, 8, "-Wattribute-alias", "Alias to nonimplemented syscall");
+
SYS_NI(timer_create);
SYS_NI(timer_gettime);
SYS_NI(timer_getoverrun);
@@ -51,6 +59,8 @@ SYS_NI(clock_adjtime32);
SYS_NI(alarm);
#endif

+__diag_pop();
+
/*
* We preserve minimal support for CLOCK_REALTIME and CLOCK_MONOTONIC
* as it is easy to remain compatible with little code. CLOCK_BOOTTIME
@@ -157,6 +167,9 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
which_clock);
}

+__diag_push();
+__diag_ignore(GCC, 8, "-Wattribute-alias", "Alias to nonimplemented syscall");
+
#ifdef CONFIG_COMPAT
COMPAT_SYS_NI(timer_create);
#endif
@@ -170,6 +183,8 @@ COMPAT_SYS_NI(setitimer);
SYS_NI(timer_settime32);
SYS_NI(timer_gettime32);

+__diag_pop();
+
SYSCALL_DEFINE2(clock_settime32, const clockid_t, which_clock,
struct old_timespec32 __user *, tp)
{
--
2.38.1
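
Here is a minimal sketch of the C-level aliasing that the new macros rely on. It assumes a GCC-compatible compiler and an ELF target, because the alias attribute is not available on every object format. sys_ni_syscall() is a userspace stand-in that returns -ENOSYS, and the two syscall names are hypothetical. The macros match the patch. Because the aliases are now visible to the compiler, it can resolve them itself under LTO instead of leaving them to the linker.

```c
#include <assert.h>
#include <errno.h>

#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)

long sys_ni_syscall(void);

/* Userspace stand-in for the kernel's "not implemented" handler. */
long sys_ni_syscall(void)
{
	return -ENOSYS;
}

/* The C versions of the macros from this patch. */
#define cond_syscall(x) \
	extern long x(void) __attribute__((alias("sys_ni_syscall"), weak));

#define SYSCALL_ALIAS(a, name) \
	long a(void) __attribute__((alias(__stringify(name))))

/* Weak alias; the macro already supplies the trailing semicolon. */
cond_syscall(sys_hypothetical_optional)

/* Strong alias with the same (void) signature. */
SYSCALL_ALIAS(sys_hypothetical_alias, sys_ni_syscall);
```

A strong definition of sys_hypothetical_optional() in another object file would take precedence over the weak alias. This is how cond_syscall() lets optional syscalls fall back to sys_ni_syscall(). The SYSCALL_ALIAS_PROTO() variant differs only in taking the alias's type from an earlier declaration via typeof().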


2022-11-14 13:32:26

by Jiri Slaby

Subject: [PATCH 29/46] scripts, lto: use CONFIG_LTO for many LTO specific actions

From: Andi Kleen <[email protected]>

The clang LTO and the gcc LTO share some changes in Makefiles and build
scripts. Change the common ones to use CONFIG_LTO instead of
CONFIG_LTO_CLANG so that they can be used by gcc too.

[js] fix scripts/link-vmlinux.sh too

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
Makefile | 2 +-
include/asm-generic/vmlinux.lds.h | 2 +-
kernel/kallsyms.c | 2 +-
scripts/Makefile.build | 2 +-
scripts/Makefile.lib | 2 +-
scripts/link-vmlinux.sh | 2 +-
scripts/module.lds.S | 2 +-
7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 58cd4f5e1c3a..0b723c903819 100644
--- a/Makefile
+++ b/Makefile
@@ -992,7 +992,7 @@ endif
endif
endif

-ifdef CONFIG_LTO
+ifdef CONFIG_LTO_CLANG
KBUILD_CFLAGS += -fno-lto $(CC_FLAGS_LTO)
KBUILD_AFLAGS += -fno-lto
export CC_FLAGS_LTO
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 3dc5824141cd..5e2179dd41d5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -96,7 +96,7 @@
* RODATA_MAIN is not used because existing code already defines .rodata.x
* sections to be brought in with rodata.
*/
-#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG)
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO)
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
#define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..L* .data..compoundliteral* .data.$__unnamed_* .data.$L*
#define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 60c20f301a6b..1d4557ae090f 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -167,7 +167,7 @@ static bool cleanup_symbol_name(char *s)
{
char *res;

- if (!IS_ENABLED(CONFIG_LTO_CLANG))
+ if (!IS_ENABLED(CONFIG_LTO))
return false;

/*
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 564a20ce2667..0a28e3884efe 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -153,7 +153,7 @@ is-single-obj-m = $(and $(part-of-module),$(filter $@, $(obj-m)),y)

# When a module consists of a single object, there is no reason to keep LLVM IR.
# Make $(LD) covert LLVM IR to ELF here.
-ifdef CONFIG_LTO_CLANG
+ifdef CONFIG_LTO
cmd_ld_single_m = $(if $(is-single-obj-m), ; $(LD) $(ld_flags) -r -o $(tmp-target) $@; mv $(tmp-target) $@)
endif

diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3aa384cec76b..ac918fd84d96 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -269,7 +269,7 @@ objtool-args = $(objtool-args-y) \
$(if $(delay-objtool), --link) \
$(if $(part-of-module), --module)

-delay-objtool := $(or $(CONFIG_LTO_CLANG),$(CONFIG_X86_KERNEL_IBT))
+delay-objtool := $(or $(CONFIG_LTO),$(CONFIG_X86_KERNEL_IBT))

cmd_objtool = $(if $(objtool-enabled), ; $(objtool) $(objtool-args) $@)
cmd_gen_objtooldep = $(if $(objtool-enabled), { echo ; echo '$@: $$(wildcard $(objtool))' ; } >> $(dot-target).cmd)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 918470d768e9..652f33be9549 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -60,7 +60,7 @@ vmlinux_link()
# skip output file argument
shift

- if is_enabled CONFIG_LTO_CLANG || is_enabled CONFIG_X86_KERNEL_IBT; then
+ if is_enabled CONFIG_LTO || is_enabled CONFIG_X86_KERNEL_IBT; then
# Use vmlinux.o instead of performing the slow LTO link again.
objs=vmlinux.o
libs=
diff --git a/scripts/module.lds.S b/scripts/module.lds.S
index da4bddd26171..b36b0527b0a8 100644
--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -27,7 +27,7 @@ SECTIONS {
__kcfi_traps : { KEEP(*(.kcfi_traps)) }
#endif

-#ifdef CONFIG_LTO_CLANG
+#ifdef CONFIG_LTO
/*
* With CONFIG_LTO_CLANG, LLD always enables -fdata-sections and
* -ffunction-sections, which increases the size of the final module.
--
2.38.1


2022-11-14 13:33:02

by Jiri Slaby

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On 14. 11. 22, 12:56, Ard Biesheuvel wrote:
> On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
>>
>> Hi,
>>
>> this is the first call for comments (and kbuild complaints) for this
>> support of gcc (full) LTO in the kernel. Most of the patches come from
>> Andi. Me and Martin rebased them to new kernels and fixed the to-use
>> known issues. Also I updated most of the commit logs and reordered the
>> patches to groups of patches with similar intent.
>>
>> The very first patch comes from Alexander and is pending on some x86
>> queue already (I believe). I am attaching it only for completeness.
>> Without that, the kernel does not boot (LTO reorders a lot).
>>
>
> You didn't cc me on that patch so I will reply here: I don't think
> this is the right solution.
> On x86, there is a lot of stuff injected into .head.text that simply
> does not belong there, and getting rid of the __head annotation and
> dropping __HEAD from the Xen pvh head.S file would be a much better
> solution.

I think Alexander was working on that too, but I'm not sure. Anyway, we
still have the other fix: putting startup_64() into a special section
and placing that section at the beginning of vmlinux via the linker
script (until .head.text is completely gone for good, as on arm, as you
wrote somewhere).

In any case, that patch was added only for reference, in case anyone
wants to give the series a try. Next time, I can attach the other
workaround ;).

I don't expect anyone to take the series as is. There will be a lot of
comments, I suppose, hence many re-spins...

thanks,
--
js
suse labs


2022-11-14 13:45:32

by Jiri Slaby

Subject: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembly that
references them. Without this patch, this can lead to linker errors.

So mark raw_irqentry_exit_cond_resched() as __visible.

Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/entry/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 846add8394c4..13c1a7a0e8ce 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -378,7 +378,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
return ret;
}

-void raw_irqentry_exit_cond_resched(void)
+__visible void raw_irqentry_exit_cond_resched(void)
{
if (!preempt_count()) {
/* Sanity check RCU and thread stack */
--
2.38.1


2022-11-14 13:51:30

by Jiri Slaby

Subject: [PATCH 43/46] scripts, lto: check C symbols for modversions

From: Andi Kleen <[email protected]>

The gcc LTO nm doesn't output assembler symbols, which makes the
symversions check fail because __ksymtab is defined in assembler.
Instead, check for __kstrtab, a C symbol that is generated as well.

Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/Makefile.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 9b522c9efcb6..dafa8aeed9c2 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -173,7 +173,7 @@ ifdef CONFIG_MODVERSIONS
# be compiled and linked to the kernel and/or modules.

gen_symversions = \
- if $(NM) $@ 2>/dev/null | grep -q __ksymtab; then \
+ if $(NM) $@ 2>/dev/null | grep -q __kstrtab; then \
$(call cmd_gensymtypes_$(1),$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
>> $(dot-target).cmd; \
fi
--
2.38.1


2022-11-14 13:51:36

by Jiri Slaby

Subject: [PATCH 20/46] softirq, lto: Mark irq_enter/exit_rcu() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.g. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembly that
references them. Without this patch, this can lead to linker errors.

So mark irq_enter_rcu() and irq_exit_rcu() as __visible.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
kernel/softirq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913c067d..9d62e09c9581 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -604,7 +604,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
/**
* irq_enter_rcu - Enter an interrupt context with RCU watching
*/
-void irq_enter_rcu(void)
+__visible void irq_enter_rcu(void)
{
__irq_enter_raw();

@@ -657,7 +657,7 @@ static inline void __irq_exit_rcu(void)
*
* Also processes softirqs if needed and possible.
*/
-void irq_exit_rcu(void)
+__visible void irq_exit_rcu(void)
{
__irq_exit_rcu();
/* must be last! */
--
2.38.1


2022-11-14 13:51:58

by Jiri Slaby

Subject: [PATCH 01/46] x86/boot: robustify calling startup_{32,64}() from the decompressor code

From: Alexander Lobakin <[email protected]>

After commit ce697ccee1a8 ("kbuild: remove head-y syntax"), I
started checking whether x86 is ready for removing this old cruft.
Removing its objects from the list makes the kernel unbootable.
This applies only to bzImage; vmlinux still works correctly.
The reason is that the strict object order was determined by the
linker arguments, not by the linker script. Without it, startup_64
may no longer be placed right at the beginning of the kernel.
Here's the beginning of vmlinux.map before the removal:

ffffffff81000000 vmlinux.o:(.head.text)
ffffffff81000000 startup_64
ffffffff81000070 secondary_startup_64
ffffffff81000075 secondary_startup_64_no_verify
ffffffff81000160 verify_cpu

and after:

ffffffff81000000 vmlinux.o:(.head.text)
ffffffff81000000 pvh_start_xen
ffffffff81000080 startup_64
ffffffff810000f0 secondary_startup_64
ffffffff810000f5 secondary_startup_64_no_verify

This is not a problem by itself, but the self-extractor code hardcodes
the address of that function as the beginning of the kernel instead of
looking at the ELF header, which always contains the address of
startup_{32,64}().

So, instead of performing an "act of blind faith", just take the
address from the ELF header and compute the relative offset of the
entry point. The decompressor function already returns a pointer to
the beginning of the kernel to the assembly code, which then jumps to
it, so add that offset to the return value.
This doesn't change anything for now, but it allows x86 to drop the
"head object list" and ensures that valid Kbuild or other improvements
won't break anything here in general.

Tested-by: Jiri Slaby <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jiri Slaby <[email protected]>
Signed-off-by: Jiri Slaby (SUSE) <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 2 +-
arch/x86/boot/compressed/head_64.S | 2 +-
arch/x86/boot/compressed/misc.c | 16 ++++++++++------
3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3b354eb9516d..56f9847e208b 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -187,7 +187,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
leal boot_heap@GOTOFF(%ebx), %eax
pushl %eax /* heap area */
pushl %esi /* real mode pointer */
- call extract_kernel /* returns kernel location in %eax */
+ call extract_kernel /* returns kernel entry point in %eax */
addl $24, %esp

/*
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d33f060900d2..aeba5aa3d26c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -593,7 +593,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
movl input_len(%rip), %ecx /* input_len */
movq %rbp, %r8 /* output target address */
movl output_len(%rip), %r9d /* decompressed length, end of relocs */
- call extract_kernel /* returns kernel location in %rax */
+ call extract_kernel /* returns kernel entry point in %rax */
popq %rsi

/*
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..2548d7fb243e 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -277,7 +277,7 @@ static inline void handle_relocations(void *output, unsigned long output_len,
{ }
#endif

-static void parse_elf(void *output)
+static size_t parse_elf(void *output)
{
#ifdef CONFIG_X86_64
Elf64_Ehdr ehdr;
@@ -287,16 +287,15 @@ static void parse_elf(void *output)
Elf32_Phdr *phdrs, *phdr;
#endif
void *dest;
+ size_t off;
int i;

memcpy(&ehdr, output, sizeof(ehdr));
if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
- ehdr.e_ident[EI_MAG3] != ELFMAG3) {
+ ehdr.e_ident[EI_MAG3] != ELFMAG3)
error("Kernel is not a valid ELF file");
- return;
- }

debug_putstr("Parsing ELF... ");

@@ -305,6 +304,7 @@ static void parse_elf(void *output)
error("Failed to allocate space for phdrs");

memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+ off = ehdr.e_entry - phdrs->p_paddr;

for (i = 0; i < ehdr.e_phnum; i++) {
phdr = &phdrs[i];
@@ -328,6 +328,8 @@ static void parse_elf(void *output)
}

free(phdrs);
+
+ return off;
}

/*
@@ -356,6 +358,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
unsigned long needed_size;
+ size_t off;

/* Retain x86 boot parameters pointer passed from startup_32/64. */
boot_params = rmode;
@@ -456,14 +459,15 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
debug_putstr("\nDecompressing Linux... ");
__decompress(input_data, input_len, NULL, NULL, output, output_len,
NULL, error);
- parse_elf(output);
+ off = parse_elf(output);
+ debug_putaddr(off);
handle_relocations(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");

/* Disable exception handling before booting the kernel */
cleanup_exception_handling();

- return output;
+ return output + off;
}

void fortify_panic(const char *name)
--
2.38.1
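The new return value of parse_elf() is the distance from the first program header's physical load address to the ELF entry point, `e_entry - p_paddr`. extract_kernel() now jumps to `output + off` instead of `output`, presumably because with LTO the startup code is no longer guaranteed to sit at the very start of the first segment. The following minimal user-space sketch, which is not the boot code, computes the same quantity from an in-memory ELF64 image using <elf.h>. It returns (size_t)-1 on bad input where the real code calls error(). build_demo_image() and the addresses in the test are hypothetical, and only ELF64 is handled.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return e_entry minus p_paddr of the first program header, the same
 * quantity the patched parse_elf() returns, or (size_t)-1 if the buffer
 * is not a usable ELF64 image. Sketch only: no ELF32 support.
 */
size_t entry_offset(const void *image, size_t len)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;

	if (len < sizeof(ehdr))
		return (size_t)-1;
	memcpy(&ehdr, image, sizeof(ehdr));
	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 || ehdr.e_phnum == 0)
		return (size_t)-1;
	if (ehdr.e_phoff > len || len - ehdr.e_phoff < sizeof(phdr))
		return (size_t)-1;

	/* Only the first program header matters, as with phdrs->p_paddr. */
	memcpy(&phdr, (const unsigned char *)image + ehdr.e_phoff, sizeof(phdr));
	return (size_t)(ehdr.e_entry - phdr.p_paddr);
}

/* Hypothetical helper: lay out one ELF64 header and one program header. */
void build_demo_image(unsigned char *buf, size_t len,
		      uint64_t entry, uint64_t paddr)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;

	assert(len >= sizeof(ehdr) + sizeof(phdr));
	memset(&ehdr, 0, sizeof(ehdr));
	memset(&phdr, 0, sizeof(phdr));
	memcpy(ehdr.e_ident, ELFMAG, SELFMAG);
	ehdr.e_phoff = sizeof(ehdr);
	ehdr.e_phnum = 1;
	ehdr.e_entry = entry;
	phdr.p_paddr = paddr;

	memcpy(buf, &ehdr, sizeof(ehdr));
	memcpy(buf + sizeof(ehdr), &phdr, sizeof(phdr));
}
```

With an entry point at 0x1000040 and the first segment loaded at 0x1000000, entry_offset() yields 0x40. Before the patch, the boot code would have jumped to the segment start, not to the entry point.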


2022-11-14 13:51:59

by Jiri Slaby

Subject: [PATCH 04/46] compiler.h: introduce __visible_on_lto

From: Jiri Slaby <[email protected]>

__visible_on_lto is defined as "__visible" when gcc LTO is turned on
(see later patches), and "static" otherwise. It is needed for top-level
symbols which are referenced in assembly. It is because the assembly and
the symbol can each end up in a different file with gcc LTO. And that
leads to linker errors.

So the symbols have to be visible when gcc LTO is in charge. On the
contrary, they have to be static on non-gcc-LTO builds. Otherwise a
warning about missing declaration occurs.

Reported-by: kernel test robot <[email protected]>
Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
include/linux/compiler.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 973a1bfd7ef5..2305a3cbe99c 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -133,6 +133,12 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
#define __annotate_jump_table
#endif /* CONFIG_OBJTOOL */

+#ifdef CONFIG_LTO_GCC
+# define __visible_on_lto __visible
+#else
+# define __visible_on_lto static
+#endif
+
#ifndef unreachable
# define unreachable() do { \
annotate_unreachable(); \
--
2.38.1


2022-11-14 13:54:11

by Jiri Slaby

Subject: [PATCH 03/46] kbuild: lto: preserve MAKEFLAGS for module linking

From: Martin Liska <[email protected]>

Prefix cc_o_c and ld_multi_m commands in makefile in order to preserve
access to jobserver. This is needed for gcc LTO at least (enabled in
later patches in this series). Note that both commands can invoke the
linker (ld_single_m in the former case).

Fixes this warning:
lto-wrapper: warning: jobserver is not available: ‘--jobserver-auth=’ is not present in ‘MAKEFLAGS’

Cc: Sedat Dilek <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Fixes: 5d45950dfbb1 (kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o)
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/Makefile.build | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 41f3602fc8de..564a20ce2667 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -247,7 +247,7 @@ endef

# Built-in and composite module parts
$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
- $(call if_changed_rule,cc_o_c)
+ +$(call if_changed_rule,cc_o_c)
$(call cmd,force_checksrc)

# To make this rule robust against "Argument list too long" error,
@@ -457,7 +457,7 @@ endef
$(multi-obj-m): objtool-enabled := $(delay-objtool)
$(multi-obj-m): part-of-module := y
$(multi-obj-m): %.o: %.mod FORCE
- $(call if_changed_rule,ld_multi_m)
+ +$(call if_changed_rule,ld_multi_m)
$(call multi_depend, $(multi-obj-m), .o, -objs -y -m)

# Add intermediate targets:
--
2.38.1


2022-11-14 13:57:48

by Jiri Slaby

Subject: [PATCH 02/46] kbuild: pass jobserver to cmd_ld_vmlinux.o

From: Jiri Slaby <[email protected]>

Until the link-vmlinux.sh split (cf. the commit below), the linker was
run with jobserver set in MAKEFLAGS. After the split, the command in
Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
is lost.

Restore it as linkers working in parallel (namely gcc LTO) make a use of
it. Actually, they complain, if jobserver is not set:
lto-wrapper: warning: jobserver is not available: '--jobserver-auth=' is not present in 'MAKEFLAGS'

Fixes: 5d45950dfbb1 (kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o)
Cc: Sedat Dilek <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
scripts/Makefile.vmlinux_o | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 0edfdb40364b..1c86895cfcf8 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -58,7 +58,7 @@ define rule_ld_vmlinux.o
endef

vmlinux.o: $(initcalls-lds) vmlinux.a $(KBUILD_VMLINUX_LIBS) FORCE
- $(call if_changed_rule,ld_vmlinux.o)
+ +$(call if_changed_rule,ld_vmlinux.o)

targets += vmlinux.o

--
2.38.1
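The lto-wrapper warning quoted above fires when the tool finds no `--jobserver-auth=` word in MAKEFLAGS. According to the commit message, that word is only present for recipe lines make treats as recursive, such as those prefixed with "+". The following minimal sketch shows such a check. has_jobserver_auth() is a hypothetical helper, not lto-wrapper's actual code, and older GNU make releases used `--jobserver-fds=` instead, which the sketch does not look for.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a MAKEFLAGS-style string contains a word that starts
 * with "--jobserver-auth=". Words are split on blanks only; make's
 * backslash-escaped spaces are not handled, which is fine for a sketch.
 */
bool has_jobserver_auth(const char *makeflags)
{
	static const char key[] = "--jobserver-auth=";
	const char *p = makeflags;

	if (!p)
		return false;

	while (*p) {
		/* Skip the blanks separating words. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (strncmp(p, key, sizeof(key) - 1) == 0)
			return true;
		/* Skip the rest of the current word. */
		while (*p && *p != ' ' && *p != '\t')
			p++;
	}
	return false;
}
```

A tool would call it as `has_jobserver_auth(getenv("MAKEFLAGS"))`. Going by the commit message, this returns false inside the unprefixed vmlinux.o recipe and true once the "+" is restored.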


2022-11-14 13:59:30

by Jiri Slaby

Subject: [PATCH 13/46] x86/preempt, lto: Mark preempt_schedule_*thunk() as __visible

From: Andi Kleen <[email protected]>

Symbols referenced from assembler (either directly or e.f. from
DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
they could end up in a different object file than the assembler. This
can lead to linker errors without this patch.

So mark preempt_schedule_thunk() and preempt_schedule_notrace_thunk() as
__visible.

Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Martin Liska <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
---
arch/x86/include/asm/preempt.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 5f6daea1ee24..c76ec881b23c 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -106,13 +106,13 @@ static __always_inline bool should_resched(int preempt_offset)
#ifdef CONFIG_PREEMPTION

extern asmlinkage void preempt_schedule(void);
-extern asmlinkage void preempt_schedule_thunk(void);
+extern __visible asmlinkage void preempt_schedule_thunk(void);

#define preempt_schedule_dynamic_enabled preempt_schedule_thunk
#define preempt_schedule_dynamic_disabled NULL

extern asmlinkage void preempt_schedule_notrace(void);
-extern asmlinkage void preempt_schedule_notrace_thunk(void);
+extern __visible asmlinkage void preempt_schedule_notrace_thunk(void);

#define preempt_schedule_notrace_dynamic_enabled preempt_schedule_notrace_thunk
#define preempt_schedule_notrace_dynamic_disabled NULL
--
2.38.1


2022-11-14 14:25:37

by Ard Biesheuvel

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
>
> Hi,
>
> this is the first call for comments (and kbuild complaints) for this
> support of gcc (full) LTO in the kernel. Most of the patches come from
> Andi. Me and Martin rebased them to new kernels and fixed the to-use
> known issues. Also I updated most of the commit logs and reordered the
> patches to groups of patches with similar intent.
>
> The very first patch comes from Alexander and is pending on some x86
> queue already (I believe). I am attaching it only for completeness.
> Without that, the kernel does not boot (LTO reorders a lot).
>

You didn't cc me on that patch so I will reply here: I don't think
this is the right solution.
On x86, there is a lot of stuff injected into .head.text that simply
does not belong there, and getting rid of the __head annotation and
dropping __HEAD from the Xen pvh head.S file would be a much better
solution.

2022-11-14 17:23:53

by Miguel Ojeda

Subject: Re: [PATCH 37/46] Compiler attributes, lto: disable __flatten with LTO

On Mon, Nov 14, 2022 at 12:45 PM Jiri Slaby (SUSE) <[email protected]> wrote:
>
> +#ifndef CONFIG_LTO_GCC
> # define __flatten __attribute__((flatten))
> +#else
> +/* Causes very large memory use with gcc in LTO mode */
> +# define __flatten
> +#endif

Currently, this header avoids attributes that depend on configuration
options on purpose (see the comment at the top), so it would be best
to move it elsewhere, e.g. `compiler_types.h`.

Though I feel bad about having to move this attribute out since it is
just that config option compared to other more involved bits in
`compiler_types.h`... :(

Cheers,
Miguel
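Here is one possible arrangement of Miguel's suggestion, simulated in a single file. The attribute stays unconditional where compiler_attributes.h would define it, and the config-dependent override sits where compiler_types.h would. The thread does not settle on this layout. demo_flatten is a hypothetical stand-in for __flatten, and CONFIG_LTO_GCC is left undefined here, so the attribute stays in effect. Per the GCC manual, flatten asks GCC to inline every call inside the marked function, including calls introduced by that inlining, where possible. Under whole-program LTO that call tree can grow very large, which plausibly explains the memory use mentioned in the patch comment.

```c
#include <assert.h>

/* Stand-in for compiler_attributes.h: unconditional, no CONFIG_ checks. */
#define demo_flatten __attribute__((flatten))

/*
 * Stand-in for compiler_types.h: the config-dependent override lives here.
 * CONFIG_LTO_GCC is not defined in this sketch, so flatten stays active.
 */
#ifdef CONFIG_LTO_GCC
# undef demo_flatten
# define demo_flatten	/* dropped: very large memory use under gcc LTO */
#endif

static int square(int x)
{
	return x * x;
}

/* With flatten in effect, GCC tries to inline square() into this loop. */
demo_flatten int demo_sum_squares(int n)
{
	int sum = 0;

	for (int i = 1; i <= n; i++)
		sum += square(i);
	return sum;
}
```

The attribute only affects inlining decisions, so the function computes the same result with or without the override.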

2022-11-14 17:36:58

by Peter Zijlstra

Subject: Re: [PATCH 08/46] static_call, lto: Mark static keys as __visible

On Mon, Nov 14, 2022 at 12:43:06PM +0100, Jiri Slaby (SUSE) wrote:
> From: Andi Kleen <[email protected]>
>
> Symbols referenced from assembler (either directly or e.f. from
> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> they could end up in a different object file than the assembler. This
> can lead to linker errors without this patch.
>
> So mark static call functions as __visible, namely static keys here.

Why doesn't llvm-lto need this?

Also, why am I getting a random selection of the patchset?

2022-11-14 17:39:11

by Peter Zijlstra

Subject: Re: [PATCH 14/46] x86/sev, lto: Mark cpuid_table_copy as __visible_on_lto

On Mon, Nov 14, 2022 at 12:43:12PM +0100, Jiri Slaby (SUSE) wrote:
> From: Martin Liska <[email protected]>
>
> Symbols referenced from assembler (either directly or e.f. from
> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> they could end up in a different object file than the assembler. This
> can lead to linker errors without this patch.
>
> So mark cpuid_table_copy as __visible_on_lto.
>
> [js] use __visible_on_lto
>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> arch/x86/kernel/sev-shared.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 3a5b0c9c4fcc..554da8aabfc7 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -64,7 +64,7 @@ struct snp_cpuid_table {
> static u16 ghcb_version __ro_after_init;
>
> /* Copy of the SNP firmware's CPUID page. */
> -static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
> +__visible_on_lto struct snp_cpuid_table cpuid_table_copy __ro_after_init;

Same again, address is taken (and passed into inline asm). Must not be
eliminated.

2022-11-14 17:56:24

by Peter Zijlstra

Subject: Re: [PATCH 12/46] x86/paravirt, lto: Mark native_steal_clock() as __visible_on_lto

On Mon, Nov 14, 2022 at 12:43:10PM +0100, Jiri Slaby (SUSE) wrote:
> From: Andi Kleen <[email protected]>
>
> Symbols referenced from assembler (either directly or e.f. from
> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> they could end up in a different object file than the assembler. This
> can lead to linker errors without this patch.

> @@ -120,7 +120,7 @@ unsigned int paravirt_patch(u8 type, void *insn_buff, unsigned long addr,
> struct static_key paravirt_steal_enabled;
> struct static_key paravirt_steal_rq_enabled;
>
> -static u64 native_steal_clock(int cpu)
> +__visible_on_lto u64 native_steal_clock(int cpu)

More hate; same reason, DEFINE_STATIC_CALL() takes the function address
and stuffs it in a variable, WTF is GCC-LTO eliminating it?

2022-11-14 18:15:16

by Peter Zijlstra

Subject: Re: [PATCH 10/46] static_call, lto: Mark func_a() as __visible_on_lto

On Mon, Nov 14, 2022 at 12:43:08PM +0100, Jiri Slaby (SUSE) wrote:

> -static int func_a(int x)
> +__visible_on_lto int sc_func_a(int x)

> } static_call_data [] __initdata = {
> { NULL, 2, 3 },
> { func_b, 2, 4 },
> - { func_a, 2, 3 }
> + { sc_func_a, 2, 3 }
> };

I must say I really hate this. Also, with address taken, it still
eliminiates it?

This whole GCC-LTO sounds sub-par.

2022-11-14 18:15:56

by Masahiro Yamada

Subject: Re: [PATCH 02/46] kbuild: pass jobserver to cmd_ld_vmlinux.o

On Mon, Nov 14, 2022 at 8:44 PM Jiri Slaby (SUSE) <[email protected]> wrote:
>
> From: Jiri Slaby <[email protected]>
>
> Until the link-vmlinux.sh split (cf. the commit below), the linker was
> run with jobserver set in MAKEFLAGS. After the split, the command in
> Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
> is lost.
>
> Restore it as linkers working in parallel (namely gcc LTO) make a use of
> it. Actually, they complain, if jobserver is not set:
> lto-wrapper: warning: jobserver is not available: '--jobserver-auth=' is not present in 'MAKEFLAGS'
>
> Fixes: 5d45950dfbb1 (kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o)

This Fixes is wrong since GCC LTO is not in upstream code.

> Cc: Sedat Dilek <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
> Cc: Michal Marek <[email protected]>
> Cc: Nick Desaulniers <[email protected]>
> Cc: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> scripts/Makefile.vmlinux_o | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
> index 0edfdb40364b..1c86895cfcf8 100644
> --- a/scripts/Makefile.vmlinux_o
> +++ b/scripts/Makefile.vmlinux_o
> @@ -58,7 +58,7 @@ define rule_ld_vmlinux.o
> endef
>
> vmlinux.o: $(initcalls-lds) vmlinux.a $(KBUILD_VMLINUX_LIBS) FORCE
> - $(call if_changed_rule,ld_vmlinux.o)
> + +$(call if_changed_rule,ld_vmlinux.o)
>
> targets += vmlinux.o
>
> --
> 2.38.1
>


--
Best Regards
Masahiro Yamada

2022-11-14 18:39:23

by Masahiro Yamada

Subject: Re: [PATCH 03/46] kbuild: lto: preserve MAKEFLAGS for module linking

On Mon, Nov 14, 2022 at 8:44 PM Jiri Slaby (SUSE) <[email protected]> wrote:
>
> From: Martin Liska <[email protected]>
>
> Prefix cc_o_c and ld_multi_m commands in makefile in order to preserve
> access to jobserver. This is needed for gcc LTO at least (enabled in
> later patches in this series). Note that both commands can invoke the
> linker (ld_single_m in the former case).
>
> Fixes this warning:
> lto-wrapper: warning: jobserver is not available: ‘--jobserver-auth=’ is not present in ‘MAKEFLAGS’
>
> Cc: Sedat Dilek <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
> Cc: Michal Marek <[email protected]>
> Cc: Nick Desaulniers <[email protected]>
> Fixes: 5d45950dfbb1 (kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o)

Same as 02.

Also, 5d45950dfbb1 did not touch scripts/Makefile.build at all.
Please stop adding random, wrong Fixes.

Make already compiles many files in parallel.
It does not make sense to request a jobserver for
a single C file compilation.

Is there any way to turn off this annoyance?

> Signed-off-by: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> scripts/Makefile.build | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/Makefile.build b/scripts/Makefile.build
> index 41f3602fc8de..564a20ce2667 100644
> --- a/scripts/Makefile.build
> +++ b/scripts/Makefile.build
> @@ -247,7 +247,7 @@ endef
>
> # Built-in and composite module parts
> $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
> - $(call if_changed_rule,cc_o_c)
> + +$(call if_changed_rule,cc_o_c)
> $(call cmd,force_checksrc)
>
> # To make this rule robust against "Argument list too long" error,
> @@ -457,7 +457,7 @@ endef
> $(multi-obj-m): objtool-enabled := $(delay-objtool)
> $(multi-obj-m): part-of-module := y
> $(multi-obj-m): %.o: %.mod FORCE
> - $(call if_changed_rule,ld_multi_m)
> + +$(call if_changed_rule,ld_multi_m)
> $(call multi_depend, $(multi-obj-m), .o, -objs -y -m)
>
> # Add intermediate targets:
> --
> 2.38.1
>


--
Best Regards
Masahiro Yamada

2022-11-14 19:04:50

by Josh Poimboeuf

Subject: Re: [PATCH 08/46] static_call, lto: Mark static keys as __visible

On Mon, Nov 14, 2022 at 12:43:06PM +0100, Jiri Slaby (SUSE) wrote:
> From: Andi Kleen <[email protected]>
>
> Symbols referenced from assembler (either directly or e.f. from
> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> they could end up in a different object file than the assembler. This
> can lead to linker errors without this patch.
>
> So mark static call functions as __visible, namely static keys here.
>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Jason Baron <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ard Biesheuvel <[email protected]>
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> include/linux/static_call.h | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/static_call.h b/include/linux/static_call.h
> index df53bed9d71f..e629ab0c4ca3 100644
> --- a/include/linux/static_call.h
> +++ b/include/linux/static_call.h
> @@ -182,7 +182,7 @@ extern long __static_call_return0(void);
>
> #define DEFINE_STATIC_CALL(name, _func) \
> DECLARE_STATIC_CALL(name, _func); \
> - struct static_call_key STATIC_CALL_KEY(name) = { \
> + __visible struct static_call_key STATIC_CALL_KEY(name) = { \

Why not __visible_on_lto?

--
Josh

2022-11-14 19:16:21

by Josh Poimboeuf

Subject: Re: [PATCH 30/46] Kbuild, lto: Add Link Time Optimization support

On Mon, Nov 14, 2022 at 12:43:28PM +0100, Jiri Slaby (SUSE) wrote:
> +++ b/Documentation/kbuild/lto-build.rst
> @@ -0,0 +1,76 @@
> +=====================================================
> +gcc link time optimization (LTO) for the Linux kernel
> +=====================================================
> +
> +Link Time Optimization allows the compiler to optimize the complete program
> +instead of just each file.
> +
> +The compiler can inline functions between files and do various other global
> +optimizations, like specializing functions for common parameters,
> +determing when global variables are clobbered, making functions pure/const,
> +propagating constants globally, removing unneeded data and others.
> +
> +It will also drop unused functions which can make the kernel
> +image smaller in some circumstances, in particular for small kernel
> +configurations.
> +
> +For small monolithic kernels it can throw away unused code very effectively
> +(especially when modules are disabled) and usually shrinks
> +the code size.
> +
> +Build time and memory consumption at build time will increase, depending
> +on the size of the largest binary. Modular kernels are less affected.
> +With LTO incremental builds are less incremental, as always the whole
> +binary needs to be re-optimized (but not re-parsed)
> +
> +Oopses can be somewhat more difficult to read, due to the more aggressive
> +inlining: it helps to use scripts/faddr2line.
> +
> +It is currently incompatible with live patching.

... because ?

--
Josh

2022-11-14 19:17:24

by Josh Poimboeuf

Subject: Re: [PATCH 08/46] static_call, lto: Mark static keys as __visible

On Mon, Nov 14, 2022 at 04:51:07PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 14, 2022 at 12:43:06PM +0100, Jiri Slaby (SUSE) wrote:
> > From: Andi Kleen <[email protected]>
> >
> > Symbols referenced from assembler (either directly or e.f. from
> > DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> > they could end up in a different object file than the assembler. This
> > can lead to linker errors without this patch.
> >
> > So mark static call functions as __visible, namely static keys here.
>
> Why doesn't llvm-lto need this?
>
> Also, why am I getting a random selection of the patchset?

Same, please Cc me on the whole set next time.

--
Josh

2022-11-14 19:30:28

by Josh Poimboeuf

Subject: Re: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

On Mon, Nov 14, 2022 at 12:43:38PM +0100, Jiri Slaby (SUSE) wrote:
> From: Andi Kleen <[email protected]>
>
> It is not supported by gcc 12 so far, so it causes compiler "sorry"
> messages.

What specifically is not supported by GCC 12? What are the "sorry"
messages?

> Other than the compiler support, there shouldn't be any barriers for
> live patching LTOed kernels, although it might be more difficult to
> create patches for larger functions.

This seems to conflict with the documentation.

> Cc: Josh Poimboeuf <[email protected]>
> Cc: Jiri Kosina <[email protected]>
> Cc: Miroslav Benes <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Cc: Joe Lawrence <[email protected]>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> kernel/livepatch/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/livepatch/Kconfig b/kernel/livepatch/Kconfig
> index 53d51ed619a3..22699adc39a6 100644
> --- a/kernel/livepatch/Kconfig
> +++ b/kernel/livepatch/Kconfig
> @@ -12,6 +12,7 @@ config LIVEPATCH
> depends on KALLSYMS_ALL
> depends on HAVE_LIVEPATCH
> depends on !TRIM_UNUSED_KSYMS
> + depends on !LTO_GCC # not supported in gcc

The comment doesn't help.

--
Josh

2022-11-14 20:34:53

by Ard Biesheuvel

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
>
> Hi,
>
> this is the first call for comments (and kbuild complaints) for this
> support of gcc (full) LTO in the kernel. Most of the patches come from
> Andi. Me and Martin rebased them to new kernels and fixed the to-use
> known issues. Also I updated most of the commit logs and reordered the
> patches to groups of patches with similar intent.
>
> The very first patch comes from Alexander and is pending on some x86
> queue already (I believe). I am attaching it only for completeness.
> Without that, the kernel does not boot (LTO reorders a lot).
>
> In our measurements, the performance differences are negligible.
>
> The kernel is bigger with gcc LTO due to more inlining.

OK, so if I understand this correctly:
- the performance is the same
- the resulting image is bigger
- we need a whole lot of ugly hacks to placate the linker.

Pardon my cynicism, but this cover letter does not mention any
advantages of LTO, so what is the point of all of this?

(On Clang, LTO was needed for CFI, but this is not even the case anymore)

2022-11-14 21:01:25

by Andi Kleen

Subject: Re: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

On Mon, Nov 14, 2022 at 11:07:42AM -0800, Josh Poimboeuf wrote:
> On Mon, Nov 14, 2022 at 12:43:38PM +0100, Jiri Slaby (SUSE) wrote:
> > From: Andi Kleen <[email protected]>
> >
> > It is not supported by gcc 12 so far, so it causes compiler "sorry"
> > messages.
>
> What specifically is not supported by GCC 12?

-fwhole-program and the live patching options are mutually exclusive.
Okay I suppose it could be handled by disabling -fwhole-program, although
that might limit some optimizations.

> What are the "sorry" messages?

It's an error message from the compiler telling you that something is
not implemented.


-Andi

2022-11-14 21:01:44

by Andi Kleen

Subject: Re: [PATCH 08/46] static_call, lto: Mark static keys as __visible

On Mon, Nov 14, 2022 at 04:51:07PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 14, 2022 at 12:43:06PM +0100, Jiri Slaby (SUSE) wrote:
> > From: Andi Kleen <[email protected]>
> >
> > Symbols referenced from assembler (either directly or e.f. from
> > DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> > they could end up in a different object file than the assembler. This
> > can lead to linker errors without this patch.
> >
> > So mark static call functions as __visible, namely static keys here.
>
> Why doesn't llvm-lto need this?

It has an integrated assembler that can feed this information to the LTO
symbol table, while gas cannot do that.

There was some discussion about extending the gcc top-level asm syntax
to express external symbols, but so far no such syntax exists.

>
> Also, why am I getting a random selection of the patchset?

Me too.

-Andi


2022-11-14 21:10:15

by Andi Kleen

Subject: Re: [PATCH 10/46] static_call, lto: Mark func_a() as __visible_on_lto

On Mon, Nov 14, 2022 at 04:54:16PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 14, 2022 at 12:43:08PM +0100, Jiri Slaby (SUSE) wrote:
>
> > -static int func_a(int x)
> > +__visible_on_lto int sc_func_a(int x)
>
> > } static_call_data [] __initdata = {
> > { NULL, 2, 3 },
> > { func_b, 2, 4 },
> > - { func_a, 2, 3 }
> > + { sc_func_a, 2, 3 }
> > };
>
> I must say I really hate this. Also, with address taken, it still
> eliminiates it?

It doesn't eliminate it, but makes it static, which causes the label to
change, so the assembler reference breaks.

-Andi
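The situation Andi describes can be reproduced in miniature. The following sketch defines a function with a __visible_on_lto-style macro and takes its address from C, as static_call_data[] does. It also names the function from a top-level asm statement, which GCC passes to the assembler without parsing it. All demo_*/sc_demo_* names are hypothetical. Mapping __visible to GCC's externally_visible attribute reflects my understanding of the GCC build, not something stated in this thread. The asm assumes an ELF target with 64-bit pointers (for example x86-64 or AArch64). Built normally, without LTO, the asm reference and the C address agree. The breakage would only appear under partitioned gcc LTO, where the static definition can be emitted under a different local name.

```c
#include <assert.h>

/* Assumed stand-in for the kernel's __visible on GCC. */
#define demo_visible __attribute__((externally_visible))

/* Same shape as __visible_on_lto from PATCH 04. */
#ifdef CONFIG_LTO_GCC
# define demo_visible_on_lto demo_visible
#else
# define demo_visible_on_lto static
#endif

demo_visible_on_lto int sc_demo_func(int x)
{
	return x + 1;
}

/* Address taken from C, so the function is not eliminated. */
int (*const demo_table[])(int) = { sc_demo_func };

/*
 * Top-level asm naming the function by its symbol. The compiler does not
 * see this reference, so it cannot keep the name stable for it.
 */
__asm__(".pushsection .data\n"
	"\t.balign 8\n"
	"\t.globl demo_asm_ref\n"
	"demo_asm_ref:\n"
	"\t.quad sc_demo_func\n"
	"\t.popsection\n");

extern int (*const demo_asm_ref)(int);
```

This matches Andi's point: the function is not eliminated, since its address is stored in demo_table. Only the label the asm refers to is at risk of changing.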

2022-11-14 22:11:21

by Josh Poimboeuf

Subject: Re: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

On Mon, Nov 14, 2022 at 12:28:09PM -0800, Andi Kleen wrote:
> On Mon, Nov 14, 2022 at 11:07:42AM -0800, Josh Poimboeuf wrote:
> > On Mon, Nov 14, 2022 at 12:43:38PM +0100, Jiri Slaby (SUSE) wrote:
> > > From: Andi Kleen <[email protected]>
> > >
> > > It is not supported by gcc 12 so far, so it causes compiler "sorry"
> > > messages.
> >
> > What specifically is not supported by GCC 12?
>
> -fwhole-program and the live patching options are mutually exclusive.

What live patching options are you referring to?

--
Josh

2022-11-15 07:12:27

by Jiri Slaby

Subject: Re: [PATCH 02/46] kbuild: pass jobserver to cmd_ld_vmlinux.o

On 14. 11. 22, 18:57, Masahiro Yamada wrote:
> On Mon, Nov 14, 2022 at 8:44 PM Jiri Slaby (SUSE) <[email protected]> wrote:
>>
>> From: Jiri Slaby <[email protected]>
>>
>> Until the link-vmlinux.sh split (cf. the commit below), the linker was
>> run with jobserver set in MAKEFLAGS. After the split, the command in
>> Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
>> is lost.
>>
>> Restore it as linkers working in parallel (namely gcc LTO) make a use of
>> it. Actually, they complain, if jobserver is not set:
>> lto-wrapper: warning: jobserver is not available: '--jobserver-auth=' is not present in 'MAKEFLAGS'
>>
>> Fixes: 5d45950dfbb1 (kbuild: move vmlinux.o link to scripts/Makefile.vmlinux_o)
>
>
> This Fixes is wrong since GCC LTO is not in upstream code.

Yup, this is a left-over. Now dropped from both.

thanks,
--
js


2022-11-15 13:44:01

by Martin Liška

Subject: Re: [PATCH 30/46] Kbuild, lto: Add Link Time Optimization support

On 11/14/22 19:55, Josh Poimboeuf wrote:
> On Mon, Nov 14, 2022 at 12:43:28PM +0100, Jiri Slaby (SUSE) wrote:
>> +++ b/Documentation/kbuild/lto-build.rst
>> @@ -0,0 +1,76 @@
>> +=====================================================
>> +gcc link time optimization (LTO) for the Linux kernel
>> +=====================================================
>> +
>> +Link Time Optimization allows the compiler to optimize the complete program
>> +instead of just each file.
>> +
>> +The compiler can inline functions between files and do various other global
>> +optimizations, like specializing functions for common parameters,
>> +determing when global variables are clobbered, making functions pure/const,
>> +propagating constants globally, removing unneeded data and others.
>> +
>> +It will also drop unused functions which can make the kernel
>> +image smaller in some circumstances, in particular for small kernel
>> +configurations.
>> +
>> +For small monolithic kernels it can throw away unused code very effectively
>> +(especially when modules are disabled) and usually shrinks
>> +the code size.
>> +
>> +Build time and memory consumption at build time will increase, depending
>> +on the size of the largest binary. Modular kernels are less affected.
>> +With LTO incremental builds are less incremental, as always the whole
>> +binary needs to be re-optimized (but not re-parsed)
>> +
>> +Oopses can be somewhat more difficult to read, due to the more aggressive
>> +inlining: it helps to use scripts/faddr2line.
>> +
>> +It is currently incompatible with live patching.
>
> ... because ?

There's no fundamental reason why live patching can't coexist with -flto.

For the GCC 13.1 release, we removed the sorry message for the
-flive-patching=inline-clone option:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1a308905c1baf64d0ea4d09d7d92b55e79a2a339

But it seems Linux does not utilize the option (based on git grep):
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flive-patching

That said, I would remove this limitation: LTO can make creating live patches
more complicated, but there is no fundamental barrier.

Thanks,
Martin

2022-11-15 14:08:01

by Martin Liška

Subject: Re: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

On 11/14/22 23:00, Josh Poimboeuf wrote:
> On Mon, Nov 14, 2022 at 12:28:09PM -0800, Andi Kleen wrote:
>> On Mon, Nov 14, 2022 at 11:07:42AM -0800, Josh Poimboeuf wrote:
>>> On Mon, Nov 14, 2022 at 12:43:38PM +0100, Jiri Slaby (SUSE) wrote:
>>>> From: Andi Kleen <[email protected]>
>>>>
>>>> It is not supported by gcc 12 so far, so it causes compiler "sorry"
>>>> messages.
>>>
>>> What specifically is not supported by GCC 12?
>>
>> -fwhole-program and the live patching options are mutually exclusive.
>
> What live patching options are you referring to?
>

As mentioned in my other reply, we are talking about this option:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flive-patching

gcc -flto -flive-patching=inline-clone a.c
cc1: sorry, unimplemented: live patching is not supported with LTO

Cheers,
Martin

2022-11-16 23:40:39

by Thomas Gleixner

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

On Mon, Nov 14 2022 at 12:43, Jiri Slaby wrote:
> Symbols referenced from assembler (either directly or e.f. from

from assembler? I'm not aware that the assembler references anything.

Also what does e.f. mean? Did you want to write e.g.?

> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> they could end up in a different object file than the assembler. This

than the assembler? Are we shipping the assembler in an object file?

> can lead to linker errors without this patch.

git grep -i 'this patch' Documentation/process/

> So mark raw_irqentry_exit_cond_resched() as __visible.

And all that tells me what? I know what you want to say, but it's not
there.

Symbols in different compilation units which are referenced from
assembly code either directly or indirectly, e.g. from
DEFINE_STATIC_KEY(), must be marked visible for GCC based LTO builds.

Add the missing __visible annotation to raw_irqentry_exit_cond_resched().

See?

There is no 'global' because it's obvious that a symbol in a different
compilation unit must be global to be resolvable. It's also obvious that
code in different compilation units ends up in different object files.

So stating that it's a 'must' to have such symbols marked visible is
good enough for an argument because that tells the reader that this is a
mandatory requirement for a GCC based LTO build.

No?

Thanks,

tglx


2022-11-17 08:43:44

by Peter Zijlstra

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
> >
> > Hi,
> >
> > this is the first call for comments (and kbuild complaints) for this
> > support of gcc (full) LTO in the kernel. Most of the patches come from
> > Andi. Me and Martin rebased them to new kernels and fixed the to-use
> > known issues. Also I updated most of the commit logs and reordered the
> > patches to groups of patches with similar intent.
> >
> > The very first patch comes from Alexander and is pending on some x86
> > queue already (I believe). I am attaching it only for completeness.
> > Without that, the kernel does not boot (LTO reorders a lot).
> >
> > In our measurements, the performance differences are negligible.
> >
> > The kernel is bigger with gcc LTO due to more inlining.
>
> OK, so if I understand this correctly:
> - the performance is the same
> - the resulting image is bigger
> - we need a whole lot of ugly hacks to placate the linker.
>
> Pardon my cynicism, but this cover letter does not mention any
> advantages of LTO, so what is the point of all of this?

Seconded; I really hate all the ugly required for the GCC-LTO
'solution'. There not actually being any benefit just makes it a very
simple decision to drop all these patches on the floor.



2022-11-17 08:45:33

by Peter Zijlstra

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

On Thu, Nov 17, 2022 at 12:30:34AM +0100, Thomas Gleixner wrote:
> On Mon, Nov 14 2022 at 12:43, Jiri Slaby wrote:
> > Symbols referenced from assembler (either directly or e.f. from
>
> from assembler? I'm not aware that the assembler references anything.
>
> Also what does e.f. mean? Did you want to write e.g.?
>
> > DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> > they could end up in a different object file than the assembler. This
>
> than the assembler? Are we shipping the assembler in an object file?
>
> > can lead to linker errors without this patch.
>
> git grep -i 'this patch' Documentation/process/
>
> > So mark raw_irqentry_exit_cond_resched() as __visible.
>
> And all that tells me what? I know what you want to say, but it's not
> there.
>
> Symbols in different compilation units which are referenced from
> assembly code either directly or indirectly, e.g. from
> DEFINE_STATIC_KEY(), must be marked visible for GCC based LTO builds.
>
> Add the missing __visible annotation to raw_irqentry_exit_cond_resched().
>
> See?
>
> There is no 'global' because it's obvious that a symbol in a different
> compilation unit must be global to be resolvable. It's also obvious that
> code in different compilation units ends up in different object files.
>
> So stating that it's a 'must' to have such symbols marked visible is
> good enough for an argument because that tells the reader that this is a
> mandatory requirement for a GCC based LTO build.
>
> No?

I still don't understand any of it -- this symbol is not static (and
thus lives in the global namespace and its name must not be mangled
lest it breaks ABI), this symbol has its address taken, so it must not
be eliminated.

WTF does this crazy LTO thing require __visible on it?

The original Changelog babbles something about multiple object files,
which doesn't make sense either, there is only a single object file with
LTO -- that's sort of the whole point. The translation unit output
becomes some intermediate gunk -- to be used as input for the LTO pass,
but it is not an ELF object file.

The linker takes all these intermediate files, does the global
optimization thing and then generates a real ELF object file.

Anyway; I think we can drop all this crazy on the floor again, since per
the 0/n (which I didn't get) there isn't any actual benefit from using
GCC-LTO, so why should we bother with all this ugly.

I would suggest GCC implement this integrated assembler and follow the
clang lead here -- or people who want LTO use clang. GCC is clearly
inferior here.

2022-11-17 09:19:13

by Peter Zijlstra

Subject: Re: [PATCH 08/46] static_call, lto: Mark static keys as __visible

On Mon, Nov 14, 2022 at 12:34:33PM -0800, Andi Kleen wrote:
> On Mon, Nov 14, 2022 at 04:51:07PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 14, 2022 at 12:43:06PM +0100, Jiri Slaby (SUSE) wrote:
> > > From: Andi Kleen <[email protected]>
> > >
> > > Symbols referenced from assembler (either directly or e.f. from
> > > DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
> > > they could end up in a different object file than the assembler. This
> > > can lead to linker errors without this patch.
> > >
> > > So mark static call functions as __visible, namely static keys here.
> >
> > Why doesn't llvm-lto need this?
>
> It has an integrated assembler that can feed this information to the LTO
> symbol table, while gas cannot do that.
>
> There was some discussion to extend the gcc top level asm syntax to
> express external symbols, but so far it doesn't exist.

Urgh, that's ugly too. Why does GCC insist on ugly solutions; clang has
shown it can be done sanely, follow.

2022-11-17 09:42:38

by Richard Biener

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, 17 Nov 2022, Peter Zijlstra wrote:

> On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > this is the first call for comments (and kbuild complaints) for this
> > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > Andi. Me and Martin rebased them to new kernels and fixed the to-use
> > > known issues. Also I updated most of the commit logs and reordered the
> > > patches to groups of patches with similar intent.
> > >
> > > The very first patch comes from Alexander and is pending on some x86
> > > queue already (I believe). I am attaching it only for completeness.
> > > Without that, the kernel does not boot (LTO reorders a lot).
> > >
> > > In our measurements, the performance differences are negligible.
> > >
> > > The kernel is bigger with gcc LTO due to more inlining.
> >
> > OK, so if I understand this correctly:
> > - the performance is the same
> > - the resulting image is bigger
> > - we need a whole lot of ugly hacks to placate the linker.
> >
> > Pardon my cynicism, but this cover letter does not mention any
> > advantages of LTO, so what is the point of all of this?
>
> Seconded; I really hate all the ugly required for the GCC-LTO
> 'solution'. There not actually being any benefit just makes it a very
> simple decision to drop all these patches on the floor.

I'd say that instead a prerequisite for the series would be to actually
enforce hidden visibility for everything not part of the kernel module
API so the compiler can throw away unused functions. Currently it has
to keep everything because with a shared object there might be external
references to everything exported from individual TUs.

There was a size benefit mentioned for module-less monolithic kernels
as likely used in embedded setups, not sure if that's enough motivation
to properly annotate symbols with visibility - and as far as I understand
all these 'required' are actually such fixes.

Richard.

2022-11-17 11:55:50

by Ard Biesheuvel

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, 17 Nov 2022 at 12:43, Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> > On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> >
> > > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > this is the first call for comments (and kbuild complaints) for this
> > > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > > Andi. Me and Martin rebased them to new kernels and fixed the to-use
> > > > > known issues. Also I updated most of the commit logs and reordered the
> > > > > patches to groups of patches with similar intent.
> > > > >
> > > > > The very first patch comes from Alexander and is pending on some x86
> > > > > queue already (I believe). I am attaching it only for completeness.
> > > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > > >
> > > > > In our measurements, the performance differences are negligible.
> > > > >
> > > > > The kernel is bigger with gcc LTO due to more inlining.
> > > >
> > > > OK, so if I understand this correctly:
> > > > - the performance is the same
> > > > - the resulting image is bigger
> > > > - we need a whole lot of ugly hacks to placate the linker.
> > > >
> > > > Pardon my cynicism, but this cover letter does not mention any
> > > > advantages of LTO, so what is the point of all of this?
> > >
> > > Seconded; I really hate all the ugly required for the GCC-LTO
> > > 'solution'. There not actually being any benefit just makes it a very
> > > simple decision to drop all these patches on the floor.
> >
> > I'd say that instead a prerequisite for the series would be to actually
> > enforce hidden visibility for everything not part of the kernel module
> > API so the compiler can throw away unused functions. Currently it has
> > to keep everything because with a shared object there might be external
> > references to everything exported from individual TUs.
>
> I'm not sure what you're on about; only symbols annotated with
> EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> have their address taken. You can freely eliminate any unused symbol.
>
> > There was a size benefit mentioned for module-less monolithic kernels
> > as likely used in embedded setups, not sure if that's enough motivation
> > to properly annotate symbols with visibility - and as far as I understand
> > all these 'required' are actually such fixes.
>
> I'm not seeing how littering __visible is useful or desired, doubly so
> for that static hack, that's just a crude work around for GCC LTO being
> inferior for not being able to read inline asm.

We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
symbols that may appear to the compiler as though they are never
referenced.

Would it be possible to repurpose those so that the LTO code knows
which symbols it must not remove?
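The idea behind `__ADDRESSABLE()` can be sketched in a few lines: a static variable marked `used` holds the symbol's address, so the compiler sees a reference and must emit the symbol. `ADDRESSABLE_SKETCH()` and `referenced_from_asm()` are hypothetical, simplified names; the kernel's `___ADDRESSABLE()` additionally uses `__UNIQUE_ID()` and places the pointer in `.discard.addressable`, both of which are omitted here.

```c
/* Token pasting helpers (indirect, so macro arguments get expanded). */
#define ADDR_PASTE_(a, b) a##b
#define ADDR_PASTE(a, b) ADDR_PASTE_(a, b)

/*
 * Hypothetical, simplified version of the kernel's ___ADDRESSABLE():
 * a static pointer marked "used" takes the address of sym, so the
 * compiler must keep sym even though no C code calls it.
 */
#define ADDRESSABLE_SKETCH(sym)                         \
	static __attribute__((__used__)) void *         \
	ADDR_PASTE(addressable_, sym) = (void *)&sym;

/* Hypothetical helper that only (imagined) inline assembly refers to. */
static int referenced_from_asm(void)
{
	return 7;
}
ADDRESSABLE_SKETCH(referenced_from_asm)
```

Note that this only guarantees the symbol is emitted; it says nothing about whether GCC keeps it global or places it in the same LTO partition as the `asm()` that names it.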

2022-11-17 11:56:02

by Thomas Gleixner

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, Nov 17 2022 at 08:50, Richard Biener wrote:
> On Thu, 17 Nov 2022, Peter Zijlstra wrote:
>> Seconded; I really hate all the ugly required for the GCC-LTO
>> 'solution'. There not actually being any benefit just makes it a very
>> simple decision to drop all these patches on the floor.
>
> I'd say that instead a prerequisite for the series would be to actually
> enforce hidden visibility for everything not part of the kernel module
> API so the compiler can throw away unused functions. Currently it has
> to keep everything because with a shared object there might be external
> references to everything exported from individual TUs.
>
> There was a size benefit mentioned for module-less monolithic kernels
> as likely used in embedded setups, not sure if that's enough motivation
> to properly annotate symbols with visibility - and as far as I understand
> all these 'required' are actually such fixes.

To accommodate a broken tool which cannot figure out which functions are
referenced in the final lump and which are not, right?

Can we pretty please fix the tool instead of proliferating the
brokenness?

Thanks,

tglx

2022-11-17 12:27:17

by Peter Zijlstra

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> On Thu, 17 Nov 2022, Peter Zijlstra wrote:
>
> > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > this is the first call for comments (and kbuild complaints) for this
> > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > Andi. Me and Martin rebased them to new kernels and fixed the to-use
> > > > known issues. Also I updated most of the commit logs and reordered the
> > > > patches to groups of patches with similar intent.
> > > >
> > > > The very first patch comes from Alexander and is pending on some x86
> > > > queue already (I believe). I am attaching it only for completeness.
> > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > >
> > > > In our measurements, the performance differences are negligible.
> > > >
> > > > The kernel is bigger with gcc LTO due to more inlining.
> > >
> > > OK, so if I understand this correctly:
> > > - the performance is the same
> > > - the resulting image is bigger
> > > - we need a whole lot of ugly hacks to placate the linker.
> > >
> > > Pardon my cynicism, but this cover letter does not mention any
> > > advantages of LTO, so what is the point of all of this?
> >
> > Seconded; I really hate all the ugly required for the GCC-LTO
> > 'solution'. There not actually being any benefit just makes it a very
> > simple decision to drop all these patches on the floor.
>
> I'd say that instead a prerequisite for the series would be to actually
> enforce hidden visibility for everything not part of the kernel module
> API so the compiler can throw away unused functions. Currently it has
> to keep everything because with a shared object there might be external
> references to everything exported from individual TUs.

I'm not sure what you're on about; only symbols annotated with
EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
have their address taken. You can freely eliminate any unused symbol.

> There was a size benefit mentioned for module-less monolithic kernels
> as likely used in embedded setups, not sure if that's enough motivation
> to properly annotate symbols with visibility - and as far as I understand
> all these 'required' are actually such fixes.

I'm not seeing how littering __visible is useful or desired, doubly so
for that static hack, that's just a crude work around for GCC LTO being
inferior for not being able to read inline asm.

2022-11-17 14:02:24

by Richard Biener

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, 17 Nov 2022, Ard Biesheuvel wrote:

> On Thu, 17 Nov 2022 at 12:43, Peter Zijlstra <[email protected]> wrote:
> >
> > On Thu, Nov 17, 2022 at 08:50:59AM +0000, Richard Biener wrote:
> > > On Thu, 17 Nov 2022, Peter Zijlstra wrote:
> > >
> > > > On Mon, Nov 14, 2022 at 08:40:50PM +0100, Ard Biesheuvel wrote:
> > > > > On Mon, 14 Nov 2022 at 12:44, Jiri Slaby (SUSE) <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > this is the first call for comments (and kbuild complaints) for this
> > > > > > support of gcc (full) LTO in the kernel. Most of the patches come from
> > > > > > Andi. Me and Martin rebased them to new kernels and fixed the to-use
> > > > > > known issues. Also I updated most of the commit logs and reordered the
> > > > > > patches to groups of patches with similar intent.
> > > > > >
> > > > > > The very first patch comes from Alexander and is pending on some x86
> > > > > > queue already (I believe). I am attaching it only for completeness.
> > > > > > Without that, the kernel does not boot (LTO reorders a lot).
> > > > > >
> > > > > > In our measurements, the performance differences are negligible.
> > > > > >
> > > > > > The kernel is bigger with gcc LTO due to more inlining.
> > > > >
> > > > > OK, so if I understand this correctly:
> > > > > - the performance is the same
> > > > > - the resulting image is bigger
> > > > > - we need a whole lot of ugly hacks to placate the linker.
> > > > >
> > > > > Pardon my cynicism, but this cover letter does not mention any
> > > > > advantages of LTO, so what is the point of all of this?
> > > >
> > > > Seconded; I really hate all the ugly required for the GCC-LTO
> > > > 'solution'. There not actually being any benefit just makes it a very
> > > > simple decision to drop all these patches on the floor.
> > >
> > > I'd say that instead a prerequisite for the series would be to actually
> > > enforce hidden visibility for everything not part of the kernel module
> > > API so the compiler can throw away unused functions. Currently it has
> > > to keep everything because with a shared object there might be external
> > > references to everything exported from individual TUs.
> >
> > I'm not sure what you're on about; only symbols annotated with
> > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > have their address taken. You can freely eliminate any unused symbol.

But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
symbols public and the rest hidden - instead all symbols global in the C TUs
will become public and the module dynamic loader details are hidden from
GCC's view of the kernel image as an ELF relocatable object.

> > > There was a size benefit mentioned for module-less monolithic kernels
> > > as likely used in embedded setups, not sure if that's enough motivation
> > > to properly annotate symbols with visibility - and as far as I understand
> > > all these 'required' are actually such fixes.
> >
> > I'm not seeing how littering __visible is useful or desired, doubly so
> > for that static hack, that's just a crude work around for GCC LTO being
> > inferior for not being able to read inline asm.
>
> We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> symbols that may appear to the compiler as though they are never
> referenced.
>
> Would it be possible to repurpose those so that the LTO code knows
> which symbols it must not remove?

I find

/*
* Force the compiler to emit 'sym' as a symbol, so that we can reference
* it from inline assembler. Necessary in case 'sym' could be inlined
* otherwise, or eliminated entirely due to lack of references that are
* visible to the compiler.
*/
#define ___ADDRESSABLE(sym, __attrs) \
static void * __used __attrs \
__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
#define __ADDRESSABLE(sym) \
___ADDRESSABLE(sym, __section(".discard.addressable"))

that should be enough to force LTO keeping 'sym' - unless there's
a linker script that discards .discard.addressable which I fear LTO
will notice, losing the effect. A more direct way would be to attach
__used to 'sym' directly. __ADDRESSABLE doesn't seem to be used
directly but instead I see cases like

#define __define_initcall_stub(__stub, fn) \
int __init __stub(void); \
int __init __stub(void) \
{ \
return fn(); \
} \
__ADDRESSABLE(__stub)

where one could have added __used to the __stub prototypes instead?
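Richard's alternative, attaching `__used` to the stub itself instead of taking its address through `__ADDRESSABLE()`, might look like the following sketch. `define_initcall_stub_sketch()`, `my_init()` and `my_initcall_stub()` are hypothetical names, and the kernel's `__init` section annotation is left out so the example builds in user space.

```c
/* Kernel-style shorthand; reuse an existing definition if a header has one. */
#ifndef __used
#  define __used __attribute__((__used__))
#endif

/* Hypothetical init function that the stub forwards to. */
static int my_init(void)
{
	return 0;
}

/*
 * Hypothetical variant of __define_initcall_stub(): the stub carries
 * __used directly, so GCC must emit it even though nothing in C calls it.
 */
#define define_initcall_stub_sketch(stub, fn)   \
	__used int stub(void);                  \
	__used int stub(void)                   \
	{                                       \
		return fn();                    \
	}

define_initcall_stub_sketch(my_initcall_stub, my_init)
```

As with `__ADDRESSABLE()`, `__used` prevents elimination only; per Richard, keeping an `asm()` and the symbol it names in the same link-time unit is a separate problem.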

The folks who worked on LTO enablement of the kernel should know the
real issue better - I understand asm()s are a pain because GCC
refuses to parse the assembler string heuristically for used
symbols (but it can never be more than heuristics). The issue with
asm()s is not so much elimination (__used solves that) but that
GCC can end up moving the asm() and the referred-to symbols to
different link-time units causing unresolved symbols for non-global
symbols. -fno-toplevel-reorder should fix that at some cost.

Richard.

--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

2022-11-17 15:06:22

by Peter Zijlstra

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, Nov 17, 2022 at 01:55:07PM +0000, Richard Biener wrote:

> > > I'm not sure what you're on about; only symbols annotated with
> > > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > > have their address taken. You can freely eliminate any unused symbol.
>
> But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
> symbols public and the rest hidden - instead all symbols global in the C TUs
> will become public and the module dynamic loader details are hidden from
> GCC's view of the kernel image as an ELF relocatable object.

It is reflected by keeping their address in __ksymtab_$foo sections, as
such their address 'escapes'.
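A rough user-space sketch of the mechanism Peter describes: an `EXPORT_SYMBOL`-like macro stores each exported symbol's address in a dedicated section, so the address escapes and the symbol cannot be dropped. `EXPORT_SYMBOL_SKETCH`, the section name `ksymtab_sketch`, `struct ksym_sketch` and `find_exported_sketch()` are hypothetical. The example assumes an ELF toolchain, where GNU ld and lld provide `__start_<name>`/`__stop_<name>` bounds for sections whose names are valid C identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified counterpart of a __ksymtab entry. */
struct ksym_sketch {
	void *addr;
	const char *name;
};

/*
 * Hypothetical EXPORT_SYMBOL() stand-in: the entry is marked "used" and
 * placed in its own section, so sym's address escapes into the table.
 */
#define EXPORT_SYMBOL_SKETCH(sym)                                       \
	static struct ksym_sketch ksym_entry_##sym                      \
	__attribute__((__used__, __section__("ksymtab_sketch"))) = {    \
		(void *)&sym, #sym                                      \
	};

/* Linker-provided bounds of the section (GNU ld / lld, ELF only). */
extern struct ksym_sketch __start_ksymtab_sketch[];
extern struct ksym_sketch __stop_ksymtab_sketch[];

int exported_answer(void)
{
	return 42;
}
EXPORT_SYMBOL_SKETCH(exported_answer)

/* Look an exported symbol up by name, roughly as a module loader would. */
void *find_exported_sketch(const char *name)
{
	for (struct ksym_sketch *k = __start_ksymtab_sketch;
	     k < __stop_ksymtab_sketch; k++)
		if (strcmp(k->name, name) == 0)
			return k->addr;
	return NULL;
}
```

Only symbols that pass through the macro end up in the table; Richard's follow-up point is that this alone does not tell GCC that every other global symbol is unreferenced.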

> > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > symbols that may appear to the compiler as though they are never
> > referenced.
> >
> > Would it be possible to repurpose those so that the LTO code knows
> > which symbols it must not remove?
>
> I find
>
> /*
> * Force the compiler to emit 'sym' as a symbol, so that we can reference
> * it from inline assembler. Necessary in case 'sym' could be inlined
> * otherwise, or eliminated entirely due to lack of references that are
> * visible to the compiler.
> */
> #define ___ADDRESSABLE(sym, __attrs) \
> static void * __used __attrs \
> __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> #define __ADDRESSABLE(sym) \
> ___ADDRESSABLE(sym, __section(".discard.addressable"))
>
> that should be enough to force LTO keeping 'sym' - unless there's
> a linker script that discards .discard.addressable which I fear LTO
> will notice, losing the effect.

The initial LTO link pass will not discard .discard sections in order to
generate a regular ELF object file. This object file is then fed to
objtool and the kallsyms tool and eventually linked with the linker
script in a multi-stage link pass.

Also see scripts/link-vmlinux.sh for all the horrible details.

> The folks who worked on LTO enablement of the kernel should know the
> real issue better - I understand asm()s are a pain because GCC
> refuses to parse the assembler string heuristically for used
> symbols (but it can never be more than heuristics).

I don't understand why it can't be more than heuristics; eventually the
asm() contents end up in a real assembler and it has to make sense.

Might as well parse it directly -- isn't that what clang-ias does?

> The issue with asm()s is not so much elimination (__used solves that)
> but that GCC can end up moving the asm() and the referred-to symbols to
> different link-time units causing unresolved symbols for non-global
> symbols. -fno-toplevel-reorder should fix that at some cost.

I thought the whole point of LTO was that there was only a single link
time unit, translate all the TUs into intermediate gunk and then collect
the whole lot in one go.

2022-11-17 15:24:04

by Richard Biener

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, 17 Nov 2022, Peter Zijlstra wrote:

> On Thu, Nov 17, 2022 at 01:55:07PM +0000, Richard Biener wrote:
>
> > > > I'm not sure what you're on about; only symbols annotated with
> > > > EXPORT_SYMBOL*() are accessible from modules (aka DSOs) and those will
> > > > have their address taken. You can freely eliminate any unused symbol.
> >
> > But IIRC that's not reflected on the ELF level by making EXPORT_SYMBOL*()
> > symbols public and the rest hidden - instead all symbols global in the C TUs
> > will become public and the module dynamic loader details are hidden from
> > GCC's view of the kernel image as an ELF relocatable object.
>
> It is reflected by keeping their address in __ksymtab_$foo sections, as
> such their address 'escapes'.

That's not enough to make symbols not appearing in __ksymtab_$foo
sections eliminatable.

> > > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > > symbols that may appear to the compiler as though they are never
> > > referenced.
> > >
> > > Would it be possible to repurpose those so that the LTO code knows
> > > which symbols it must not remove?
> >
> > I find
> >
> > /*
> > * Force the compiler to emit 'sym' as a symbol, so that we can reference
> > * it from inline assembler. Necessary in case 'sym' could be inlined
> > * otherwise, or eliminated entirely due to lack of references that are
> > * visible to the compiler.
> > */
> > #define ___ADDRESSABLE(sym, __attrs) \
> > static void * __used __attrs \
> > __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> > #define __ADDRESSABLE(sym) \
> > ___ADDRESSABLE(sym, __section(".discard.addressable"))
> >
> > that should be enough to force LTO keeping 'sym' - unless there's
> > a linker script that discards .discard.addressable which I fear LTO
> > will notice, losing the effect.
>
> The initial LTO link pass will not discard .discard sections in order to
> generate a regular ELF object file. This object file is then fed to
> objtool and the kallsyms tool and eventually linked with the linker
> script in a multi-stage link pass.
>
> Also see scripts/link-vmlinux.sh for all the horrible details.
>
> > The folks who worked on LTO enablement of the kernel should know the
> > real issue better - I understand asm()s are a pain because GCC
> > refuses to parse the assembler string heuristically for used
> > symbols (but it can never be more than heuristics).
>
> I don't understand why it can't be more than heuristics; eventually the
> asm() contents end up in a real assembler and it has to make sense.
>
> Might as well parse it directly -- isn't that what clang-ias does?

GCC doesn't have an integrated assembler and the actual assembler text
that's emitted is not known at the stage we need to know the symbol.
Which means for GCC it would be heuristics.

> > The issue with asm()s is not so much elimination (__used solves that)
> > but that GCC can end up moving the asm() and the referred-to symbols to
> > different link-time units causing unresolved symbols for non-global
> > symbols. -fno-toplevel-reorder should fix that at some cost.
>
> I thought the whole point of LTO was that there was only a single link
> time unit, translate all the TUs into intermediate gunk and then collect
> the whole lot in one go.

That's what it does, but it fans out to parallelize the final compile,
dividing the whole lot again which is where this problem can appear
if GCC doesn't see that asm() X uses symbol Y.

Richard.

2022-11-17 15:24:40

by Ard Biesheuvel

Subject: Re: [PATCH 00/46] gcc-LTO support for the kernel

On Thu, 17 Nov 2022 at 14:55, Richard Biener <[email protected]> wrote:
>
> On Thu, 17 Nov 2022, Ard Biesheuvel wrote:
>
...
> > We have an __ADDRESSABLE() macro and asmlinkage modifier to annotate
> > symbols that may appear to the compiler as though they are never
> > referenced.
> >
> > Would it be possible to repurpose those so that the LTO code knows
> > which symbols it must not remove?
>
> I find
>
> /*
> * Force the compiler to emit 'sym' as a symbol, so that we can reference
> * it from inline assembler. Necessary in case 'sym' could be inlined
> * otherwise, or eliminated entirely due to lack of references that are
> * visible to the compiler.
> */
> #define ___ADDRESSABLE(sym, __attrs) \
> static void * __used __attrs \
> __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
> #define __ADDRESSABLE(sym) \
> ___ADDRESSABLE(sym, __section(".discard.addressable"))
>
> that should be enough to force LTO keeping 'sym' - unless there's
> a linker script that discards .discard.addressable which I fear LTO
> will notice, losing the effect. A more direct way would be to attach
> __used to 'sym' directly. __ADDRESSABLE doesn't seem to be used
> directly but instead I see cases like
>
> #define __define_initcall_stub(__stub, fn) \
> int __init __stub(void); \
> int __init __stub(void) \
> { \
> return fn(); \
> } \
> __ADDRESSABLE(__stub)
>
> where one could have added __used to the __stub prototypes instead?
>

Probably, yes.

But my point was not really about the implementation of those things,
more about whether we could redefine them to something else that would
help the compiler infer that this symbol needs to be retained.

asmlinkage in particular seems relevant, which is currently only used
for C++ inclusion or for setting regparm(0) on i386.
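Ard's suggestion could, for instance, take the form of an `asmlinkage` that also carries retention and visibility attributes. This is purely a hypothetical sketch: `asmlinkage_sketch`, `CPP_ASMLINKAGE_SKETCH`, `VISIBLE_SKETCH` and `sketch_syscall()` are invented names, and per Ard's description the real `asmlinkage` today only provides `extern "C"` linkage for C++ and `regparm(0)` on i386.

```c
/* C++ linkage part, which asmlinkage already provides today. */
#ifdef __cplusplus
#  define CPP_ASMLINKAGE_SKETCH extern "C"
#else
#  define CPP_ASMLINKAGE_SKETCH
#endif

/* externally_visible where supported (GCC), empty elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(__externally_visible__)
#    define VISIBLE_SKETCH __attribute__((__externally_visible__))
#  endif
#endif
#ifndef VISIBLE_SKETCH
#  define VISIBLE_SKETCH
#endif

/*
 * Hypothetical asmlinkage variant: besides the existing linkage
 * semantics, it tells GCC the function is reached from code it cannot
 * see, so it must be emitted ("used") and stay externally visible.
 * (The real i386 definition additionally adds regparm(0).)
 */
#define asmlinkage_sketch \
	CPP_ASMLINKAGE_SKETCH __attribute__((__used__)) VISIBLE_SKETCH

/* Hypothetical entry point that assembly code would call. */
asmlinkage_sketch long sketch_syscall(long a, long b)
{
	return a + b;
}
```

Since every function called from assembly already needs `asmlinkage` on some architectures, folding the LTO requirements into it would avoid a separate `__visible` on each such symbol; whether that is acceptable is exactly the open question in this thread.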

2022-11-17 20:21:09

by Song Liu

Subject: Re: [PATCH 40/46] x86/livepatch, lto: Disable live patching with gcc LTO

On Mon, Nov 14, 2022 at 3:48 AM Jiri Slaby (SUSE) <[email protected]> wrote:
>
> From: Andi Kleen <[email protected]>
>
> It is not supported by gcc 12 so far, so it causes compiler "sorry"
> messages.
>
> Other than the compiler support, there shouldn't be any barriers for
> live patching LTOed kernels, although it might be more difficult to
> create patches for larger functions.

A loosely related question: does livepatch work with CLANG LTO?
AFAICT, kpatch-build doesn't support it. But the kernel side should
work just fine?

Thanks,
Song

[...]

2022-11-17 22:23:52

by Andi Kleen

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible


> I still don't understand any of it -- this symbol is not static (and
> thus lives in the global namespace and its name must not be mangled
> lest it breaks ABI), this symbol has its address taken, so it must not
> be eliminated.

It's not eliminated, but it is still mangled because gcc turns it into
static due to -fwhole-program. Maybe this could be avoided in gcc, but at
least that's what it does currently.

I believe disabling -fwhole-program would likely avoid it, but it would
also prevent some code transformations because gcc would need to assume
that every function can be called by someone it doesn't see.

> WTF does this crazy LTO thing require __visible on it?
>
> The original Changelog babbles something about multiple object files,
> which doesn't make sense either, there is only a single object file with
> LTO -- that's sort of the whole point. The translation unit output
> becomes some intermediate gunk -- to be used as input for the LTO pass,
> but it is not an ELF object file.
>
> The linker takes all these intermediate files, does the global
> optimization thing and then generates a real ELF object file.

That would be a single-threaded, very very slow global compilation.
Instead gcc WHOPR uses partitioning to generate smaller units that can be
compiled in parallel based on their call dependencies, and these use
different object files from the individual assembler invocations.

>
> Anyway; I think we can drop all this crazy on the floor again, since per
> the 0/n (which I didn't get) there isn't any actual benefit from using
> GCC-LTO, so why should we bother with all this ugly.

At least in the past it generated smaller kernels for small configurations.

One benefit that wasn't mentioned is doing type and other checks (e.g.
constant propagation through inlining) across files.

In general LTO gives the compiler a lot more freedom to optimize code,
so even if it's not quite there yet I think it's beneficial to let users
play around with it and see if they can get benefits.

-Andi


2022-11-18 01:48:17

by Thomas Gleixner

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

On Thu, Nov 17 2022 at 14:07, Andi Kleen wrote:
>> Anyway; I think we can drop all this crazy on the floor again, since per
>> the 0/n (which I didn't get) there isn't any actual benefit from using
>> GCC-LTO, so why should we bother with all this ugly.
>
> At least in the past it generated smaller kernels for small configurations.
>
> One benefit that wasn't mentioned is doing type and other checks (e.g.
> constant propagation through inlining) across files.
>
> In general LTO gives the compiler a lot more freedom to optimize code,
> so even if it's not quite there yet I think it's beneficial to let users
> play around with it and see if they can get benefits.

Sure, they can play around with it but that does not require to merge
all this nonsensical ballast for a half thought out compiler.

If they want to do that they can apply the pile of patches as provided
and play around.

If anything useful comes out of that with sensible changelogs and a
sensible argumentation why supporting a half thought out compiler is
required then we can revisit that.

Up to that point this is all considered to be __invisible.

Thanks,

tglx

2022-11-19 01:49:26

by Andi Kleen

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible


> Sure, they can play around with it, but that does not require merging
> all this nonsensical ballast for a half thought out compiler.


You are referring to __visible?

TBH I don't understand the problem. In general __visible is useful
documentation, so you know something is used from assembler or other
strange contexts. Doing such things explicitly marked instead of
implicitly hidden and they just happen to work by accident
seems cleaner to me.

I can also see the __visible markings being useful for other purposes,
e.g. static analysis tools or dynamic instrumentation like the various
sanitizers. Everything that is referenced outside the normal code that
the compiler sees may need some special handling.

That said, I don't see the point of __visible_in_lto either; it should
all just be __visible.


Similar argument applies to __noreorder, it's also useful documentation.


There are a few real workarounds in the patchkit that are a bit ugly,
but __visible isn't it.


>
> If they want to do that they can apply the pile of patches as provided
> and play around.


It's very difficult to maintain out of tree, while in tree it's much
simpler.

I think Linux should support its primary compiler well and not give up
due to relatively small obstacles.


-Andi

2022-11-19 09:01:59

by Thomas Gleixner

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

On Fri, Nov 18 2022 at 16:50, Andi Kleen wrote:
>> Sure, they can play around with it, but that does not require merging
>> all this nonsensical ballast for a half thought out compiler.
>
> You are referring to __visible?
>
> TBH I don't understand the problem. In general __visible is useful
> documentation, so you know something is used from assembler or other
> strange contexts. Doing such things explicitly marked instead of
> implicitly hidden and they just happen to work by accident
> seems cleaner to me.

Seems cleaner is really not a technical argument. Visible is completely
useless. Either a symbol is global and therefore reachable from any
point in the final "executable" or it's not. Whether that reference is
in assembly or from a pointer, static key or whatever does not matter at
all. There is no such thing as a 'strange context'.

Nothing works here by accident. A global symbol is a global symbol.
Whether it's defined or referenced from C, from ASM or from any other
programming language does not matter at all.

> I can also see the __visible markings being useful for other purposes,
> e.g. static analysis tools or dynamic instrumentation like the various
> sanitizers. Everything that is referenced outside the normal code that
> the compiler sees may need some special handling.

All you have is 'may need' and 'I can see'. Where is the actual use case?

>> If they want to do that they can apply the pile of patches as provided
>> and play around.
>
> It's very difficult to maintain out of tree, while in tree it's much
> simpler.

Sure. Lots of things are simpler to maintain in tree, but that's not an
argument for merging anything.

> I think Linux should support its primary compiler well and not give up
> due to relatively small obstacles.

It's not an obstacle. It's a fundamentally broken model. clang has proven
that it can be done properly, so there is no reason to proliferate the
inferior.

While you might consider gcc to be the primary compiler, that might have
been true a decade ago. A lot of people prefer clang as their primary
compiler simply because it's saner and the maintainers behind it are
working with us and not trying to inflict their half-baked crap on us to
spare themselves the work to do it right.

Thanks,

tglx

2022-11-22 10:37:35

by Jiri Slaby

Subject: Re: [PATCH 18/46] entry, lto: Mark raw_irqentry_exit_cond_resched() as __visible

Hi,

On 17. 11. 22, 0:30, Thomas Gleixner wrote:
> On Mon, Nov 14 2022 at 12:43, Jiri Slaby wrote:
>> Symbols referenced from assembler (either directly or e.f. from
>
> from assembler? I'm not aware that the assembler references anything.

"""
Noun assembler

assembler (countable and uncountable, plural assemblers)

1. (programming, countable) A program that reads source code written in
assembly language and produces executable machine code, possibly
together with information needed by linkers, debuggers and other tools.

2. (computer languages, informal, chiefly uncountable) Assembly language.

I wrote that program in assembler.
""" [1]

I refer in the above to 2. You refer to 1.

In some languages, incl. mine, we don't distinguish between the two.
It's always assembler. That might confuse you, even though it's
correct, as you can see above. I can certainly switch to distinguishing
the two (assembler vs. assembly).

[1] https://en.wiktionary.org/wiki/assembler

> Also what does e.f. mean? Did you want to write e.g.?

Yes, my and my spellchecker's bad.

>> DEFINE_STATIC_KEY()) need to be global and visible in gcc LTO because
>> they could end up in a different object file than the assembler. This
>
> than the assembler? Are we shipping the assembler in an object file?

Nope, see above.

>> can lead to linker errors without this patch.
>
> git grep -i 'this patch' Documentation/process/

Sorry, I don't understand, care to elaborate? None of the lines from the
output seems to match the case here.

>> So mark raw_irqentry_exit_cond_resched() as __visible.
>
> And all that tells me what? I know what you want to say, but it's not
> there.
>
> Symbols in different compilation units which are referenced from
> assembly code either directly or indirectly, e.g. from
> DEFINE_STATIC_KEY(), must be marked visible for GCC based LTO builds.
>
> Add the missing __visible annotation to raw_irqentry_exit_cond_resched().
>
> See?
>
> There is no 'global' because it's obvious that a symbol in a different
> compilation unit must be global to be resolvable. It's also obvious that
> code in different compilation units ends up in different object files.

It's not about different compilation units. It's about different partitions.

> So stating that it's a 'must' to have such symbols marked visible is
> good enough for an argument because that tells the reader that this is a
> mandatory requirement for a GCC based LTO build.

My bad that I failed to explain it properly in the commit log. But we are
working on throwing all this __visible thing away. Agreed that it's
ridiculous/absurd.

thanks,
--
js
suse labs

2022-11-26 17:37:26

by Andrey Konovalov

Subject: Re: [PATCH 45/46] kasan, lto: remove extra BUILD_BUG() in memory_is_poisoned

On Mon, Nov 14, 2022 at 12:45 PM Jiri Slaby (SUSE) <[email protected]> wrote:
>
> From: Martin Liska <[email protected]>
>
> The function memory_is_poisoned() can handle any size which can be
> propagated by LTO later on. So we can end up with a constant that is not
> handled in the switch. Thus just break and call memory_is_poisoned_n()
> which handles arbitrary size to avoid build errors with gcc LTO.
>
> Cc: Andrey Ryabinin <[email protected]>
> Cc: Alexander Potapenko <[email protected]>
> Cc: Andrey Konovalov <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Vincenzo Frascino <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Martin Liska <[email protected]>
> Signed-off-by: Jiri Slaby <[email protected]>
> ---
> mm/kasan/generic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index d8b5590f9484..d261f83c6687 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -152,7 +152,7 @@ static __always_inline bool memory_is_poisoned(unsigned long addr, size_t size)
> case 16:
> return memory_is_poisoned_16(addr);
> default:
> - BUILD_BUG();
> + break;
> }
> }
>
> --
> 2.38.1
>

Reviewed-by: Andrey Konovalov <[email protected]>
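To illustrate why the BUILD_BUG() could fire and what the one-line change does, here is a self-contained toy model of the dispatch in memory_is_poisoned(). The shadow array and all toy_* helpers are hypothetical stand-ins, not KASAN's real shadow-memory code. The sketch assumes, as in the kernel source around the quoted hunk, that the switch sits under a __builtin_constant_p(size) check. Only the listed constant sizes get a fast path. Any other constant, such as one LTO propagates across files through inlining, now falls out of the switch to the generic check instead of breaking the build.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy shadow map: one "poisoned" flag per byte of a 64-byte region
 * (hypothetical; real KASAN packs 8 bytes of memory per shadow byte).
 */
static bool toy_shadow[64];

/* Generic path: handles an arbitrary size by checking every byte. */
static bool toy_is_poisoned_n(unsigned long addr, size_t size)
{
	for (size_t i = 0; i < size; i++)
		if (toy_shadow[addr + i])
			return true;
	return false;
}

/* Fast path for single-byte accesses. */
static bool toy_is_poisoned_1(unsigned long addr)
{
	return toy_shadow[addr];
}

static inline bool toy_is_poisoned(unsigned long addr, size_t size)
{
	if (__builtin_constant_p(size)) {
		switch (size) {
		case 1:
			return toy_is_poisoned_1(addr);
		case 2:
		case 4:
		case 8:
		case 16:
			/* Stand-in for the fixed-size helpers. */
			return toy_is_poisoned_n(addr, size);
		default:
			/*
			 * Used to be BUILD_BUG(): an unexpected constant size
			 * (e.g. one propagated by LTO) failed the build. Now
			 * it falls through to the generic check below.
			 */
			break;
		}
	}
	return toy_is_poisoned_n(addr, size);
}
```

A call such as toy_is_poisoned(addr, 12) has a constant size the switch does not list, so it takes the same generic path that non-constant sizes already used.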