2024-04-24 16:23:05

by Ard Biesheuvel

Subject: [RFC PATCH 0/9] kexec x86 purgatory cleanup

From: Ard Biesheuvel <[email protected]>

The kexec purgatory is built like a kernel module, i.e., a partially
linked ELF object where each section is allocated and placed
individually, and all relocations need to be fixed up, even place
relative ones.

This makes sense for kernel modules, which share the address space with
the core kernel, and contain unresolved references that need to be wired
up to symbols in other modules or the kernel itself.

The purgatory, however, is a fully linked binary without any external
references, or any overlap with the kernel's virtual address space. So
it makes much more sense to create a fully linked ELF executable that
can just be loaded and run anywhere in memory.

The purgatory build on x86 has already switched over to position
independent codegen, which only leaves a handful of absolute references,
which can either be dropped (patch #3) or converted into a RIP-relative
one (patch #4). That leaves a purgatory executable that can run at any
offset in memory without applying any relocations whatsoever.

Some tweaks are needed to deal with the difference between partially
(ET_REL) and fully (ET_DYN/ET_EXEC) linked ELF objects, but with those
in place, a substantial amount of complicated ELF allocation, placement
and patching/relocation code can simply be dropped.
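
As an illustration of the ET_REL versus ET_DYN/ET_EXEC distinction above, here is a
minimal userspace sketch that classifies an ELF header by its e_type field. It
relies on the glibc <elf.h> definitions; the enum and the classify_purgatory()
helper are hypothetical names made up for this example and are not part of the
series.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Hypothetical classification, for illustration only. */
enum purgatory_link_kind {
	PURGATORY_NOT_ELF,
	PURGATORY_PARTIALLY_LINKED,	/* ET_REL: sections and relocations */
	PURGATORY_FULLY_LINKED,		/* ET_DYN or ET_EXEC: loadable segments */
	PURGATORY_OTHER,
};

static enum purgatory_link_kind classify_purgatory(const Elf64_Ehdr *ehdr)
{
	assert(ehdr != NULL);

	/* Reject anything that does not start with the ELF magic. */
	if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
		return PURGATORY_NOT_ELF;

	switch (ehdr->e_type) {
	case ET_REL:
		return PURGATORY_PARTIALLY_LINKED;
	case ET_DYN:
	case ET_EXEC:
		return PURGATORY_FULLY_LINKED;
	default:
		return PURGATORY_OTHER;
	}
}
```

A loader that only accepts the fully linked kind can skip per-section
allocation entirely and work from the program headers instead.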

The last patch in the series removes this code from the generic kexec
implementation, but this can only be done once other architectures apply
the same changes proposed here for x86 (powerpc, s390 and riscv all
implement the purgatory using the shared logic).

Link: https://lore.kernel.org/all/CAKwvOd=3Jrzju++=Ve61=ZdeshxUM=K3-bGMNREnGOQgNw=aag@mail.gmail.com/
Link: https://lore.kernel.org/all/[email protected]/

Cc: Arnd Bergmann <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: [email protected]
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Bill Wendling <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: Masahiro Yamada <[email protected]>

Ard Biesheuvel (9):
x86/purgatory: Drop function entry padding from purgatory
x86/purgatory: Simplify stack handling
x86/purgatory: Drop pointless GDT switch
x86/purgatory: Avoid absolute reference to GDT
x86/purgatory: Simplify GDT and drop data segment
kexec: Add support for fully linked purgatory executables
x86/purgatory: Use fully linked PIE ELF executable
x86/purgatory: Simplify references to regs array
kexec: Drop support for partially linked purgatory executables

arch/x86/include/asm/kexec.h | 8 -
arch/x86/kernel/kexec-bzimage64.c | 8 -
arch/x86/kernel/machine_kexec_64.c | 127 ----------
arch/x86/purgatory/Makefile | 17 +-
arch/x86/purgatory/entry64.S | 96 ++++----
arch/x86/purgatory/setup-x86_64.S | 31 +--
arch/x86/purgatory/stack.S | 18 --
include/asm-generic/purgatory.lds | 34 +++
kernel/kexec_file.c | 255 +++-----------------
9 files changed, 125 insertions(+), 469 deletions(-)
delete mode 100644 arch/x86/purgatory/stack.S
create mode 100644 include/asm-generic/purgatory.lds

--
2.44.0.769.g3c40516874-goog



2024-04-24 16:27:09

by Ard Biesheuvel

Subject: [RFC PATCH 5/9] x86/purgatory: Simplify GDT and drop data segment

From: Ard Biesheuvel <[email protected]>

Data segment selectors are ignored in long mode so there is no point in
programming them. So clear them instead. This only leaves the code
segment entry in the GDT, which can be moved up a slot now that the
second slot is no longer used as the GDT descriptor.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/purgatory/entry64.S | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/purgatory/entry64.S b/arch/x86/purgatory/entry64.S
index 888661d9db9c..3d09781d4f9a 100644
--- a/arch/x86/purgatory/entry64.S
+++ b/arch/x86/purgatory/entry64.S
@@ -23,14 +23,14 @@ SYM_CODE_START(entry64)
 	addq $10, %rsp

 	/* load the data segments */
-	movl $0x18, %eax /* data segment */
+	xorl %eax, %eax /* data segment */
 	movl %eax, %ds
 	movl %eax, %es
 	movl %eax, %ss
 	movl %eax, %fs
 	movl %eax, %gs

-	pushq $0x10 /* CS */
+	pushq $0x8 /* CS */
 	leaq new_cs_exit(%rip), %rax
 	pushq %rax
 	lretq
@@ -84,16 +84,9 @@ SYM_DATA_END(entry64_regs)
 SYM_DATA_START_LOCAL(gdt)
 	/*
 	 * 0x00 unusable segment
-	 * 0x08 unused
-	 * so use them as gdt ptr
 	 */
-	.word 0
 	.quad 0
-	.word 0, 0, 0

-	/* 0x10 4GB flat code segment */
+	/* 0x8 4GB flat code segment */
 	.word 0xFFFF, 0x0000, 0x9A00, 0x00AF
-
-	/* 0x18 4GB flat data segment */
-	.word 0xFFFF, 0x0000, 0x9200, 0x00CF
 SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
--
2.44.0.769.g3c40516874-goog
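
For readers who want to check the selector and descriptor values in the patch
above, the following is a small userspace sketch (not kernel code) that decodes
them. It assumes the standard x86 layout: a selector holds the GDT index in bits
15:3, and a descriptor holds the present bit at bit 47, the S bit at bit 44, the
executable bit at bit 43 and the L (long mode) bit at bit 53. All helper names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble an 8-byte descriptor from four little-endian .word values. */
static uint64_t gdt_entry_from_words(uint16_t w0, uint16_t w1,
				     uint16_t w2, uint16_t w3)
{
	return (uint64_t)w0 | (uint64_t)w1 << 16 |
	       (uint64_t)w2 << 32 | (uint64_t)w3 << 48;
}

/* GDT slot referenced by a selector; the TI bit must be clear for the GDT. */
static unsigned int selector_index(uint16_t sel)
{
	assert((sel & 0x4) == 0);
	return sel >> 3;
}

static bool desc_present(uint64_t d)
{
	return (d >> 47) & 1;
}

/* Code segment: S bit (non-system descriptor) and executable bit both set. */
static bool desc_is_code(uint64_t d)
{
	return ((d >> 44) & 1) && ((d >> 43) & 1);
}

/* L bit: the segment executes in 64-bit mode. */
static bool desc_long_mode(uint64_t d)
{
	return (d >> 53) & 1;
}
```

With these helpers, selector 0x8 maps to slot 1 and 0x10 to slot 2, and the
words 0xFFFF, 0x0000, 0x9A00, 0x00AF decode as a present 64-bit code segment,
which matches the code segment entry moving up one slot in the patch.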


2024-04-24 20:25:16

by Eric W. Biederman

Subject: Re: [RFC PATCH 0/9] kexec x86 purgatory cleanup

Ard Biesheuvel <[email protected]> writes:

> From: Ard Biesheuvel <[email protected]>
>
> The kexec purgatory is built like a kernel module, i.e., a partially
> linked ELF object where each section is allocated and placed
> individually, and all relocations need to be fixed up, even place
> relative ones.
>
> This makes sense for kernel modules, which share the address space with
> the core kernel, and contain unresolved references that need to be wired
> up to symbols in other modules or the kernel itself.
>
> The purgatory, however, is a fully linked binary without any external
> references, or any overlap with the kernel's virtual address space. So
> it makes much more sense to create a fully linked ELF executable that
> can just be loaded and run anywhere in memory.

It does have external references that are resolved when it is loaded.

Further it is at least my impression that non-PIC code is more
efficient. PIC typically requires silly things like Global Offset
Tables that non-PIC code does not. At first glance this looks like a
code pessimization.

Now a lot of functionality has been stripped out of purgatory, so maybe
in its stripped-down state this makes sense, but I want to challenge the
notion that this is the obvious thing to do.

> The purgatory build on x86 has already switched over to position
> independent codegen, which only leaves a handful of absolute references,
> which can either be dropped (patch #3) or converted into a RIP-relative
> one (patch #4). That leaves a purgatory executable that can run at any
> offset in memory without applying any relocations whatsoever.

I missed that conversation. Do you happen to have a pointer? I would
think the 32bit code is where the PIC would be most costly as the 32bit
x86 instruction set predates PIC being a common compilation target.

> Some tweaks are needed to deal with the difference between partially
> (ET_REL) and fully (ET_DYN/ET_EXEC) linked ELF objects, but with those
> in place, a substantial amount of complicated ELF allocation, placement
> and patching/relocation code can simply be dropped.

Really? As I recall it only needed to handle a single allocation type,
and there were good reasons (at least when I wrote it) to patch symbols.

Again maybe the fact that people have removed 90% of the functionality
makes this make sense, but that is not obvious at first glance.

> The last patch in the series removes this code from the generic kexec
> implementation, but this can only be done once other architectures apply
> the same changes proposed here for x86 (powerpc, s390 and riscv all
> implement the purgatory using the shared logic)
>
> Link: https://lore.kernel.org/all/CAKwvOd=3Jrzju++=Ve61=ZdeshxUM=K3-bGMNREnGOQgNw=aag@mail.gmail.com/
> Link: https://lore.kernel.org/all/[email protected]/
>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: [email protected]
> Cc: Nathan Chancellor <[email protected]>
> Cc: Nick Desaulniers <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Bill Wendling <[email protected]>
> Cc: Justin Stitt <[email protected]>
> Cc: Masahiro Yamada <[email protected]>
>
> Ard Biesheuvel (9):
> x86/purgatory: Drop function entry padding from purgatory
> x86/purgatory: Simplify stack handling
> x86/purgatory: Drop pointless GDT switch
> x86/purgatory: Avoid absolute reference to GDT
> x86/purgatory: Simplify GDT and drop data segment
> kexec: Add support for fully linked purgatory executables
> x86/purgatory: Use fully linked PIE ELF executable
> x86/purgatory: Simplify references to regs array
> kexec: Drop support for partially linked purgatory executables
>
> arch/x86/include/asm/kexec.h | 8 -
> arch/x86/kernel/kexec-bzimage64.c | 8 -
> arch/x86/kernel/machine_kexec_64.c | 127 ----------
> arch/x86/purgatory/Makefile | 17 +-
> arch/x86/purgatory/entry64.S | 96 ++++----
> arch/x86/purgatory/setup-x86_64.S | 31 +--
> arch/x86/purgatory/stack.S | 18 --
> include/asm-generic/purgatory.lds | 34 +++
> kernel/kexec_file.c | 255 +++-----------------
> 9 files changed, 125 insertions(+), 469 deletions(-)
> delete mode 100644 arch/x86/purgatory/stack.S
> create mode 100644 include/asm-generic/purgatory.lds

Eric

2024-04-24 20:52:32

by Ard Biesheuvel

Subject: Re: [RFC PATCH 0/9] kexec x86 purgatory cleanup

On Wed, 24 Apr 2024 at 22:04, Eric W. Biederman <[email protected]> wrote:
>
> Ard Biesheuvel <[email protected]> writes:
>
> > From: Ard Biesheuvel <[email protected]>
> >
> > The kexec purgatory is built like a kernel module, i.e., a partially
> > linked ELF object where each section is allocated and placed
> > individually, and all relocations need to be fixed up, even place
> > relative ones.
> >
> > This makes sense for kernel modules, which share the address space with
> > the core kernel, and contain unresolved references that need to be wired
> > up to symbols in other modules or the kernel itself.
> >
> > The purgatory, however, is a fully linked binary without any external
> > references, or any overlap with the kernel's virtual address space. So
> > it makes much more sense to create a fully linked ELF executable that
> > can just be loaded and run anywhere in memory.
>
> It does have external references that are resolved when it is loaded.
>

It doesn't today, and it hasn't for a while, at least since commit

e4160b2e4b02377c67f8ecd05786811598f39acd
x86/purgatory: Fail the build if purgatory.ro has missing symbols

which forces a build failure on unresolved external references, by
doing a full link of the purgatory.
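
To make "no unresolved external references" concrete: in ELF terms it means that
no symbol other than the reserved null entry has st_shndx == SHN_UNDEF. The
commit above enforces this through a full link at build time; the sketch below
is only an equivalent check expressed over an in-memory symbol table, using the
glibc <elf.h> types. The function name is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Count symbols that still need a definition from somewhere else. */
static size_t count_undefined_symbols(const Elf64_Sym *symtab, size_t nsyms)
{
	size_t undefined = 0;

	assert(symtab != NULL || nsyms == 0);

	/* Index 0 is the reserved null symbol, which is always SHN_UNDEF. */
	for (size_t i = 1; i < nsyms; i++) {
		if (symtab[i].st_shndx == SHN_UNDEF)
			undefined++;
	}
	return undefined;
}
```

A result of zero corresponds to the situation described here, where nothing is
left to resolve at load time.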

> Further it is at least my impression that non-PIC code is more
> efficient. PIC typically requires silly things like Global Offset
> Tables that non-PIC code does not. At first glance this looks like a
> code pessimization.
>
>

Given that the 64-bit purgatory can be loaded in memory that is not
32-bit addressable, the PIC code is essentially a given, since the
large code model is much worse (it uses 64-bit immediates for all
function and variable symbols, and therefore always uses indirect
calls).
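
The addressing constraint behind this can be sketched in a few lines. A
sign-extended 32-bit absolute reference can only encode addresses that fit in a
signed 32-bit value, whereas a RIP-relative reference only needs the distance
between the instruction and its target to fit. The helper names and the load
address below are assumptions picked for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can addr be encoded as a sign-extended 32-bit absolute value? */
static bool fits_abs32s(uint64_t addr)
{
	int64_t v = (int64_t)addr;

	return v >= INT32_MIN && v <= INT32_MAX;
}

/* Can target be reached by a 32-bit displacement from the next instruction? */
static bool fits_rel32(uint64_t next_insn, uint64_t target)
{
	int64_t disp = (int64_t)(target - next_insn);

	return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Hypothetical load address above 4 GiB, chosen only for illustration. */
static void example_high_load_address(void)
{
	const uint64_t base = 0x140000000ULL;

	assert(!fits_abs32s(base + 0x1000));		/* absolute: out of range */
	assert(fits_rel32(base + 0x10, base + 0x1000));	/* relative: in range */
}
```

The relative form stays valid wherever the image is placed, because both ends
of the reference move together.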

Please refer to

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/build&id=cba786af84a0f9716204e09f518ce3b7ada8555e

for more details. (Getting pulled into that discussion is how I ended
up looking into the purgatory in more detail)

> Now a lot of functionality has been stripped out of purgatory, so maybe
> in its stripped-down state this makes sense, but I want to challenge the
> notion that this is the obvious thing to do.
>

The diffstat speaks for itself - on x86, much of the allocation and
relocation logic can simply be dropped when building the purgatory in
this manner.

> > The purgatory build on x86 has already switched over to position
> > independent codegen, which only leaves a handful of absolute references,
> > which can either be dropped (patch #3) or converted into a RIP-relative
> > one (patch #4). That leaves a purgatory executable that can run at any
> > offset in memory without applying any relocations whatsoever.
>
> I missed that conversation. Do you happen to have a pointer? I would
> think the 32bit code is where the PIC would be most costly as the 32bit
> x86 instruction set predates PIC being a common compilation target.
>

See link above. Note that none of this is about 32-bit code - the
purgatory as it exists today never drops out of long mode (and no
32-bit version appears to exist).

> > Some tweaks are needed to deal with the difference between partially
> > (ET_REL) and fully (ET_DYN/ET_EXEC) linked ELF objects, but with those
> > in place, a substantial amount of complicated ELF allocation, placement
> > and patching/relocation code can simply be dropped.
>
> Really? As I recall it only needed to handle a single allocation type,
> and there were good reasons (at least when I wrote it) to patch symbols.
>
> Again maybe the fact that people have removed 90% of the functionality
> makes this make sense, but that is not obvious at first glance.
>

Again, the patches and the diffstat speak for themselves - the linker
applies all the relocations at build time, and emits all the sections
into a single ELF segment that can be copied into memory and executed
directly (modulo poking values into the global variables for the
sha256 digest and the segment list).

The last patch in the series shows which code we could drop from the
generic kexec_file_load() implementation once other architectures
adopt this scheme.
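
As a rough userspace illustration of the "copy one segment and poke a few
globals" flow described above, here is a minimal sketch. It is not the code from
the series and makes several assumptions: the image has a single PT_LOAD
segment, the program header entries have the standard Elf64_Phdr size, and the
offset of the global to patch is already known. load_single_segment() and
patch_u64() are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the first PT_LOAD segment of a fully linked image into a fresh,
 * zero-filled buffer. Returns NULL on failure; on success, *memsz holds
 * the in-memory size of the segment and the caller frees the buffer.
 */
static unsigned char *load_single_segment(const unsigned char *image,
					  size_t image_len, size_t *memsz)
{
	Elf64_Ehdr ehdr;

	if (image_len < sizeof(ehdr))
		return NULL;
	memcpy(&ehdr, image, sizeof(ehdr));
	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return NULL;
	if (ehdr.e_type != ET_DYN && ehdr.e_type != ET_EXEC)
		return NULL;
	if (ehdr.e_phentsize != sizeof(Elf64_Phdr))
		return NULL;

	for (unsigned int i = 0; i < ehdr.e_phnum; i++) {
		Elf64_Phdr ph;
		uint64_t off = ehdr.e_phoff + (uint64_t)i * sizeof(ph);
		unsigned char *mem;

		if (off > image_len || image_len - off < sizeof(ph))
			return NULL;
		memcpy(&ph, image + off, sizeof(ph));
		if (ph.p_type != PT_LOAD)
			continue;
		if (ph.p_filesz > ph.p_memsz || ph.p_offset > image_len ||
		    image_len - ph.p_offset < ph.p_filesz)
			return NULL;

		/* calloc() zeroes the tail beyond p_filesz, i.e. the .bss. */
		mem = calloc(1, ph.p_memsz ? ph.p_memsz : 1);
		if (!mem)
			return NULL;
		memcpy(mem, image + ph.p_offset, ph.p_filesz);
		*memsz = ph.p_memsz;
		return mem;
	}
	return NULL;
}

/* Poke a 64-bit value into a global at a known offset in the loaded copy. */
static void patch_u64(unsigned char *mem, size_t memsz, size_t offset,
		      uint64_t value)
{
	assert(offset <= memsz && memsz - offset >= sizeof(value));
	memcpy(mem + offset, &value, sizeof(value));
}
```

No relocation pass appears anywhere in this flow; the only writes after the copy
are the explicit pokes into known globals.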