Hi all,
After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
functions into .ltext with large section flag (#73037)") [1], which
landed in the 18.x cycle, there is a runtime warning when loading a
kernel via kexec due to the presence of two text sections (.text and
.ltext).
$ kexec -l /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img --reuse-cmdline
$ dmesg -l warn+
...
[ 1.264240] ------------[ cut here ]------------
[ 1.264647] WARNING: CPU: 0 PID: 96 at kernel/kexec_file.c:945 kexec_load_purgatory+0x2c8/0x3c0
[ 1.265322] Modules linked in:
[ 1.265565] CPU: 0 PID: 96 Comm: kexec Not tainted 6.9.0-rc4-00031-g96fca68c4fbf #1 eae91b3fe699ecba2dd0a886471788e49eb36ac0
[ 1.266403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1.267268] RIP: 0010:kexec_load_purgatory+0x2c8/0x3c0
[ 1.267661] Code: 54 24 0c 48 89 c8 48 29 d0 0f 82 5d ff ff ff 49 03 54 24 1c 48 39 d1 0f 83 4f ff ff ff 49 8b 17 48 39 4a 18 0f 84 30 ff ff ff <0f> 0b e9 3b ff ff ff 66 85 c9 74 18 48 8b 5a 28 48 01 d3 45 31 e4
[ 1.269052] RSP: 0018:ffffbe28007cfb50 EFLAGS: 00010206
[ 1.269447] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000000000
[ 1.269982] RDX: ffff988c8174d000 RSI: 0000000000000010 RDI: ffffbe2801d940c0
[ 1.270527] RBP: 0000000000000002 R08: 0000003d8b4c0000 R09: cc0000000025ff00
[ 1.271063] R10: 0000003d8b4c0000 R11: cc0000000025ff00 R12: ffffbe28000d5084
[ 1.271603] R13: 000000013ffff000 R14: ffff988c8174d000 R15: ffffbe28007cfbe0
[ 1.272140] FS: 00007fec73535740(0000) GS:ffff988cbbc00000(0000) knlGS:0000000000000000
[ 1.272744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.273178] CR2: 00007fec736b1390 CR3: 0000000101a24000 CR4: 0000000000350ef0
[ 1.273732] Call Trace:
[ 1.273929] <TASK>
[ 1.274100] ? __warn+0xc9/0x1c0
[ 1.274356] ? kexec_load_purgatory+0x2c8/0x3c0
[ 1.274704] ? report_bug+0x139/0x1e0
[ 1.274998] ? handle_bug+0x42/0x70
[ 1.275269] ? exc_invalid_op+0x1a/0x50
[ 1.275574] ? asm_exc_invalid_op+0x1a/0x20
[ 1.275900] ? kexec_load_purgatory+0x2c8/0x3c0
[ 1.276251] bzImage64_load+0x1c1/0x6a0
[ 1.276556] kexec_image_load_default+0x49/0x60
[ 1.276907] __se_sys_kexec_file_load+0x606/0x790
[ 1.277280] ? arch_exit_to_user_mode_prepare+0x6e/0x70
[ 1.277675] do_syscall_64+0x90/0x170
[ 1.277955] ? srso_return_thunk+0x5/0x5f
[ 1.278265] ? __count_memcg_events+0x50/0xc0
[ 1.278597] ? srso_return_thunk+0x5/0x5f
[ 1.278901] ? handle_mm_fault+0xb18/0x11c0
[ 1.279218] ? vfs_read+0x2c8/0x2f0
[ 1.279498] ? srso_return_thunk+0x5/0x5f
[ 1.279802] ? do_user_addr_fault+0x4d2/0x690
[ 1.280138] ? srso_return_thunk+0x5/0x5f
[ 1.280449] ? srso_return_thunk+0x5/0x5f
[ 1.280755] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1.281136] RIP: 0033:0x7fec7363e88d
[ 1.281411] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 14 0d 00 f7 d8 64 89 01 48
[ 1.282789] RSP: 002b:00007ffd136f4808 EFLAGS: 00000246 ORIG_RAX: 0000000000000140
[ 1.283354] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fec7363e88d
[ 1.283893] RDX: 00000000000000c5 RSI: 0000000000000005 RDI: 0000000000000003
[ 1.284427] RBP: 0000000000000003 R08: 0000000000000000 R09: 00005628517eef10
[ 1.284966] R10: 00005628580a75f0 R11: 0000000000000246 R12: 0000000000000003
[ 1.285500] R13: 00005628517f89a8 R14: 00007ffd136f4b98 R15: 0000000000000004
[ 1.286036] </TASK>
[ 1.286210] ---[ end trace 0000000000000000 ]---
Unlike LTO and PGO, which were disabled for the purgatory in commit
97b6b9cbba40 ("x86/purgatory: remove PGO flags") and commit 75b2f7e4c9e0
("x86/purgatory: Remove LTO flags"), this optimization has no flag to
opt out of it. One way to resolve this would be to use '.ltext' and
'.lrodata' as the text and read-only data sections in the out-of-line
assembly in arch/x86/purgatory, but nothing stops future changes from
splitting the text section further.
Properly avoid the warning by using a linker script to coalesce all
separate text sections into one, which was alluded to by both the change
that introduced the warning and 75b2f7e4c9e0... I think this really
should have been done then but I wasn't looking too far ahead :) To
avoid backsliding now that all sections are properly described by the
linker script, turn on orphan section warnings as well.
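To make the coalescing and the orphan check a bit more concrete, here is
a rough Python sketch that models only the input section patterns written
out explicitly in the purgatory.lds.S from patch 1. The names LAYOUT,
place() and orphans() are made up for this illustration, the sections
covered by the ELF_DETAILS and DISCARDS macros are left out, and real
linker matching is more involved than a simple glob, so treat this as an
approximation rather than a description of what ld or ld.lld actually do.

```python
from fnmatch import fnmatchcase

# Output section -> input section patterns, mirroring only the patterns
# spelled out in the proposed purgatory.lds.S. The sections covered by
# the ELF_DETAILS and DISCARDS macros are deliberately left out here.
LAYOUT = {
    ".kexec-purgatory": [".kexec-purgatory"],
    ".text": [".text", ".text.*", ".ltext", ".ltext.*"],
    ".rodata": [".rodata", ".rodata.*", ".lrodata", ".lrodata.*"],
    ".data": [".data", ".data.*"],
    ".bss": [".bss", ".bss.*", "COMMON"],
    "/DISCARD/": [".note.GNU-stack", ".note.gnu.property", ".llvm_addrsig"],
}


def place(section):
    """Return the output section for an input section, or None if orphaned."""
    for output, patterns in LAYOUT.items():
        if any(fnmatchcase(section, pattern) for pattern in patterns):
            return output
    return None


def orphans(sections):
    """Return the input sections that no pattern in LAYOUT claims."""
    return [name for name in sections if place(name) is None]


# ".made_up_section" is a hypothetical name standing in for whatever a
# future compiler change might start emitting.
inputs = [".text", ".ltext", ".text.hot", ".lrodata", ".made_up_section"]
for name in inputs:
    print(name, "->", place(name) or "ORPHAN")
```

In this model, '.ltext' and '.text.hot' both land in '.text', while a
section that nothing claims shows up as an orphan, which is the case the
orphan section warnings are meant to surface at build time.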
[1]: https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5
---
Nathan Chancellor (2):
x86/purgatory: Add a linker script
x86/purgatory: Enable orphan section warnings
arch/x86/purgatory/.gitignore | 1 +
arch/x86/purgatory/Makefile | 19 +++---------
arch/x86/purgatory/purgatory.lds.S | 63 ++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 14 deletions(-)
---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240416-x86-fix-kexec-with-llvm-18-c986b21845c5
Best regards,
--
Nathan Chancellor <[email protected]>
Commit 8652d44f466a ("kexec: support purgatories with .text.hot
sections") added a warning when the purgatory has more than one .text
section, which is unsupported. A couple of changes have been made to the
x86 purgatory's Makefile to prevent the compiler from splitting the
text section as a result:
97b6b9cbba40 ("x86/purgatory: remove PGO flags")
75b2f7e4c9e0 ("x86/purgatory: Remove LTO flags")
Unfortunately, there may be compiler optimizations that add other text
sections that cannot be disabled. For example, starting with LLVM 18,
large text is emitted in '.ltext', which happens for the purgatory due
to commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
-mcmodel=kernel to -mcmodel=large to fix kexec relocation errors"), but
there are out of line assembly files that use '.text'.
$ llvm-readelf -S arch/x86/purgatory/purgatory.ro | rg ' .[a-z]?text'
[ 1] .text PROGBITS 0000000000000000 000040 0000d0 00 AX 0 0 16
[ 2] .ltext PROGBITS 0000000000000000 000110 0015a6 00 AXl 0 0 16
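As a rough illustration of what the listing above shows, the following
Python sketch picks out the sections whose flags include 'X' (executable)
from 'llvm-readelf -S' style output. The helper name
executable_sections() and the regular expression are assumptions made for
this example, and the "more than one executable section" test only
approximates the condition behind the kexec warning; it is not the
kernel's actual logic.

```python
import re

# Matches one section row of `llvm-readelf -S` output, for example:
#   [ 2] .ltext PROGBITS 0000000000000000 000110 0015a6 00 AXl 0 0 16
# Columns: [Nr] Name Type Address Off Size ES Flg Lk Inf Al
ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+\S+\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+"
    r"\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+(?P<flags>[A-Za-z]+)\s"
)


def executable_sections(readelf_output):
    """Return the names of sections whose flag column contains 'X'."""
    names = []
    for line in readelf_output.splitlines():
        match = ROW.match(line)
        if match and "X" in match.group("flags"):
            names.append(match.group("name"))
    return names


# The two rows from the listing above.
sample = """\
[ 1] .text PROGBITS 0000000000000000 000040 0000d0 00 AX 0 0 16
[ 2] .ltext PROGBITS 0000000000000000 000110 0015a6 00 AXl 0 0 16
"""

text_sections = executable_sections(sample)
print(text_sections)
if len(text_sections) > 1:
    print("more than one executable section: kexec would likely warn")
```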
To avoid the runtime warning when the purgatory has been built with LLVM
18, add a linker script that explicitly describes the sections of the
purgatory.ro and use it to merge '.ltext' and '.lrodata' back into
'.text' and '.rodata' to match the behavior of GCC and LLVM prior to the
optimization, as the distinction between small and large text is not
important in this case. This results in no warnings with
'--orphan-handling=warn' with either GNU or LLVM toolchains and the
resulting kernels can properly kexec other kernels.
This linker script is based on arch/s390/purgatory/purgatory.lds.S and
Ricardo Ribalda's prior attempt to add one for arch/x86 [1].
As a consequence of this change, the aforementioned flag changes can be
reverted because the '.text.*' sections generated by those options will
be combined properly by the linker script, which avoids the only reason
they were added in the first place. kexec continues to work with LTO
enabled.
[1]: https://lore.kernel.org/[email protected]/
Reported-by: ns <[email protected]>
Closes: https://github.com/ClangBuiltLinux/linux/issues/2016
Signed-off-by: Nathan Chancellor <[email protected]>
---
arch/x86/purgatory/.gitignore | 1 +
arch/x86/purgatory/Makefile | 19 +++---------
arch/x86/purgatory/purgatory.lds.S | 63 ++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 14 deletions(-)
diff --git a/arch/x86/purgatory/.gitignore b/arch/x86/purgatory/.gitignore
index d2be1500671d..71bd99d98906 100644
--- a/arch/x86/purgatory/.gitignore
+++ b/arch/x86/purgatory/.gitignore
@@ -1 +1,2 @@
purgatory.chk
+purgatory.lds
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index bc31863c5ee6..dfc030a4cca9 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
OBJECT_FILES_NON_STANDARD := y
-purgatory-y := purgatory.o stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o
+purgatory-y := purgatory.o purgatory.lds stack.o setup-x86_$(BITS).o sha256.o entry64.o string.o
targets += $(purgatory-y)
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -14,20 +14,11 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
-# When profile-guided optimization is enabled, llvm emits two different
-# overlapping text sections, which is not supported by kexec. Remove profile
-# optimization flags.
-KBUILD_CFLAGS := $(filter-out -fprofile-sample-use=% -fprofile-use=%,$(KBUILD_CFLAGS))
-
-# When LTO is enabled, llvm emits many text sections, which is not supported
-# by kexec. Remove -flto=* flags.
-KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO),$(KBUILD_CFLAGS))
-
# When linking purgatory.ro with -r unresolved symbols are not checked,
# also link a purgatory.chk binary without -r to check for unresolved symbols.
-PURGATORY_LDFLAGS := -e purgatory_start -z nodefaultlib
-LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS)
-LDFLAGS_purgatory.chk := $(PURGATORY_LDFLAGS)
+PURGATORY_LDFLAGS := -z nodefaultlib
+LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS) -T
+LDFLAGS_purgatory.chk := -e purgatory_start $(PURGATORY_LDFLAGS)
targets += purgatory.ro purgatory.chk
# Sanitizer, etc. runtimes are unavailable and cannot be linked here.
@@ -80,7 +71,7 @@ CFLAGS_string.o += $(PURGATORY_CFLAGS)
asflags-remove-y += $(foreach x, -g -gdwarf-4 -gdwarf-5, $(x) -Wa,$(x))
-$(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
+$(obj)/purgatory.ro: $(obj)/purgatory.lds $(PURGATORY_OBJS) FORCE
$(call if_changed,ld)
$(obj)/purgatory.chk: $(obj)/purgatory.ro FORCE
diff --git a/arch/x86/purgatory/purgatory.lds.S b/arch/x86/purgatory/purgatory.lds.S
new file mode 100644
index 000000000000..4fb155942642
--- /dev/null
+++ b/arch/x86/purgatory/purgatory.lds.S
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm-generic/vmlinux.lds.h>
+#include <asm/cache.h>
+
+OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
+
+#undef i386
+
+#ifdef CONFIG_X86_64
+OUTPUT_ARCH(i386:x86-64)
+#else
+OUTPUT_ARCH(i386)
+#endif
+
+ENTRY(purgatory_start)
+
+SECTIONS
+{
+ . = 0;
+
+ .kexec-purgatory : {
+ *(.kexec-purgatory)
+ }
+
+ .text : {
+ _text = .;
+ *(.text .text.*)
+ *(.ltext .ltext.*)
+ _etext = .;
+ }
+
+ .rodata : {
+ _rodata = .;
+ *(.rodata .rodata.*)
+ *(.lrodata .lrodata.*)
+ _erodata = .;
+ }
+
+ .data : {
+ _data = .;
+ *(.data .data.*)
+ _edata = .;
+ }
+
+ . = ALIGN(L1_CACHE_BYTES);
+ .bss : {
+ _bss = .;
+ *(.bss .bss.*)
+ *(COMMON)
+ . = ALIGN(8); /* For convenience during zeroing */
+ _ebss = .;
+ }
+ _end = .;
+
+ ELF_DETAILS
+
+ DISCARDS
+ /DISCARD/ : {
+ *(.note.GNU-stack .note.gnu.property)
+ *(.llvm_addrsig)
+ }
+}
--
2.44.0
Now that the purgatory has a linker script that explicitly describes all
of its sections, turn on orphan section warnings for it so that new
sections do not appear without notice.
Signed-off-by: Nathan Chancellor <[email protected]>
---
arch/x86/purgatory/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
index dfc030a4cca9..a6c8239abcba 100644
--- a/arch/x86/purgatory/Makefile
+++ b/arch/x86/purgatory/Makefile
@@ -17,7 +17,7 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
# When linking purgatory.ro with -r unresolved symbols are not checked,
# also link a purgatory.chk binary without -r to check for unresolved symbols.
PURGATORY_LDFLAGS := -z nodefaultlib
-LDFLAGS_purgatory.ro := -r $(PURGATORY_LDFLAGS) -T
+LDFLAGS_purgatory.ro := --orphan-handling=$(CONFIG_LD_ORPHAN_WARN_LEVEL) -r $(PURGATORY_LDFLAGS) -T
LDFLAGS_purgatory.chk := -e purgatory_start $(PURGATORY_LDFLAGS)
targets += purgatory.ro purgatory.chk
--
2.44.0
On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <[email protected]> wrote:
>Hi all,
>
>After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
>functions into .ltext with large section flag (#73037)") [1], which
>landed in the 18.x cycle, there is a runtime warning when loading a
>kernel via kexec due to the presence of two text sections (.text and
>.ltext).
How much of this silliness should we expect now for other parts of the kernel?
Can we turn this off?
Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?
Thx.
--
Sent from a small device: formatting sucks and brevity is inevitable.
On Thu, Apr 18, 2024 at 01:14:35PM +0200, Borislav Petkov wrote:
> On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <[email protected]> wrote:
> >Hi all,
> >
> >After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
> >functions into .ltext with large section flag (#73037)") [1], which
> >landed in the 18.x cycle, there is a runtime warning when loading a
> >kernel via kexec due to the presence of two text sections (.text and
> >.ltext).
>
> How much of this silliness should we expect now for other parts of the kernel?
Not sure. If I could predict the future, I wouldn't be doing kernel
development :) The only reason the purgatory got bit by that LLVM change
is because it uses '-mcmodel=large', which is not very common within the
kernel (I only see it in arch/um and arch/powerpc other than here).
> Can we turn this off?
No, not as far as I am aware. I suspect it is because this is not an
issue for the majority of programs, so there was little reason to make
it toggleable, but I am not the author of the LLVM change so I cannot
say. However, if a linker script had been the solution when the issue of
multiple text sections was first brought up in 97b6b9cbba40, I would
just be adding '.ltext' and '.lrodata' to the '.text' and '.rodata'
sections in that linker script. It would be nice to do this now so that
any future changes are either taken care of automatically by '.text.*',
like '.text.hot' or '.text.<func>' would have been, or are caught by the
orphan warnings and addressed in a separate change.
> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?
Not sure, I can only go off of what is in the commit message of the LLVM
change that introduced this optimization and the surrounding PR
discussion, which just seems to indicate a desire to keep small/medium
and large text separate *shrug*
https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5
https://github.com/llvm/llvm-project/pull/73037
Cheers,
Nathan
On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <[email protected]> wrote:
> How much of this silliness should we expect now for other parts of the kernel?
Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
does for the whole kernel. So that LLVM change may have implications
for those 2 other architectures. Not sure we've had any bug reports
or breakage in CI yet, like we have for x86+kexec.
> Can we turn this off?
Maybe we need to revisit
commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
-mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
at least the -mcmodel=kernel addition (since that patch added a few
additional compiler flags that still LGTM).
> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?
Google is now at the point where a few binaries running in data
centers are measured in the gigabytes, and attempting to link them may
result in relocation overflows. From that commit message, it sounds
like they link together object files built with the default code model
and some objects from the larger code model. Putting large code model
data+code in distinct sections is helpful for then being able to place
those further away in an object. For other architectures, the linker
may insert a veneer/trampoline. Not sure why that's not used here.
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mlarge-data-threshold
makes it sound like GCC may place data larger than a certain threshold
in a new section. Dunno about code (.text) though.
Arthur, you probably happen to know more about code models at this
point than anyone particularly cares to. The raison d'etre for
e16c2983fba0 was avoiding R_X86_64_32/R_X86_64_32S relocations. Do
you know if there's another code model that can force R_X86_64_64? Or
is the large code model the way to go here, with updates to linker
scripts for this new section?
+ Fangrui, Ard, who might know of alternative solutions to
-mcmodel=large for e16c2983fba0.
Otherwise, I think the dedicated linker script is the way to go. We
really want tight control over what is or is not in the purgatory
image.
--
Thanks,
~Nick Desaulniers
On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <[email protected]> wrote:
>
> On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <[email protected]> wrote:
> > How much of this silliness should we expect now for other parts of the kernel?
>
> Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> does for the whole kernel. So that LLVM change may have implications
> for those 2 other architectures. Not sure we've had any bug reports
> or breakage in CI yet, like we have for x86+kexec.
>
> > Can we turn this off?
>
> Maybe we need to revisit
> commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
>
> at least the -mcmodel=kernel addition (since that patch added a few
> additional compiler flags that still LGTM).
>
..
> + Fangrui, Ard, who might know of alternative solutions to
> -mcmodel=large for e16c2983fba0.
>
I think it would be better to use -mcmodel=small -fpic. As Nick
explains, the large code model is really more suitable for executables
that span a large memory range. The issue with the purgatory seems to
be that it can be placed anywhere in memory, not that it is very big.
-mcmodel=small -fpic is what user space typically uses, so it is much
less likely to create problems.
Note that I have been looking into whether we can build the entire
kernel with -fpic (for various reasons). There are some issues to
resolve there, mostly related to per-CPU variables and the per-CPU
stack protector, but beyond that, things work happily and the number
of boot time relocations drops dramatically, due to the use of
RIP-relative references. So for the purgatory, I wouldn't expect too
many surprises.
> Otherwise, I think the dedicated linker script is the way to go. We
> really want tight control over what is or is not in the purgatory
> image.
Linker scripts are a bit tedious when it comes to maintenance,
especially with weird executables such as this one and needing to
support different linkers. So I'd prefer to avoid this.
On Thu, 18 Apr 2024 at 17:59, Ard Biesheuvel <[email protected]> wrote:
>
> On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <[email protected]> wrote:
> >
> > On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <[email protected]> wrote:
> > > How much of this silliness should we expect now for other parts of the kernel?
> >
> > Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> > does for the whole kernel. So that LLVM change may have implications
> > for those 2 other architectures. Not sure we've had any bug reports
> > or breakage in CI yet, like we have for x86+kexec.
> >
> > > Can we turn this off?
> >
> > Maybe we need to revisit
> > commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> > -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
> >
> > at least the -mcmodel=kernel addition (since that patch added a few
> > additional compiler flags that still LGTM).
> >
> ...
>
> > + Fangrui, Ard, who might know of alternative solutions to
> > -mcmodel=large for e16c2983fba0.
> >
>
> I think it would be better to use -mcmodel=small -fpic. As Nick
> explains, the large code model is really more suitable for executables
> that span a large memory range. The issue with the purgatory seems to
> be that it can be placed anywhere in memory, not that it is very big.
>
> -mcmodel=small -fpic is what user space typically uses, so it is much
> less likely to create problems.
>
> Note that I have been looking into whether we can build the entire
> kernel with -fpic (for various reasons). There are some issues to
> resolve there, mostly related to per-CPU variables and the per-CPU
> stack protector, but beyond that, things work happily and the number
> of boot time relocations drops dramatically, due to the use of
> RIP-relative references. So for the purgatory, I wouldn't expect too
> many surprises.
>
Replacing -mcmodel=large in PURGATORY_CFLAGS with
-mcmodel=small -fpic -fvisibility=hidden
seems to do the trick for me.
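
One way to sanity-check a flag change like this would be to look for the
relocation types that, per Nick's note earlier in the thread,
e16c2983fba0 was meant to avoid (R_X86_64_32 and R_X86_64_32S). The
Python sketch below scans 'readelf -r' style text for them. The function
names and the sample input are made up for illustration, and the output
of a real build may look different, so this is only a starting point.

```python
import re
from collections import Counter

# Relocation types that e16c2983fba0 set out to avoid, per the thread.
ABSOLUTE_32 = {"R_X86_64_32", "R_X86_64_32S"}

RELOC_TYPE = re.compile(r"\bR_X86_64_[A-Z0-9_]+\b")


def count_relocation_types(readelf_output):
    """Count each R_X86_64_* type named in `readelf -r` style output."""
    return Counter(RELOC_TYPE.findall(readelf_output))


def absolute_32bit_relocations(readelf_output):
    """Return only the counts for the 32-bit absolute relocation types."""
    counts = count_relocation_types(readelf_output)
    return {name: counts[name] for name in sorted(ABSOLUTE_32) if counts[name]}


# Hypothetical input, made up for illustration; not taken from a real build.
sample = """\
0000000000000010  0000000500000002 R_X86_64_PC32  0000000000000000 foo - 4
0000000000000020  000000060000000b R_X86_64_32S   0000000000000000 bar + 0
0000000000000030  0000000700000004 R_X86_64_PLT32 0000000000000000 baz - 4
"""

print(absolute_32bit_relocations(sample))
```

An empty result for the purgatory objects would suggest that no 32-bit
absolute relocations crept back in with the new flags.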