Update the x86 boot path to avoid the bare metal decompressor when
booting via the EFI stub. The bare metal decompressor inherits the
loader's 1:1 mapping of DRAM when it is entered in 64-bit mode, and
assumes that all of it is mapped read/write/execute, which will no
longer be the case on systems built to comply with recently tightened
logo requirements (*).
Changes since v6 [9]:
- add a new patch to fix our current reliance on 64-bit GPRs retaining
their full-width contents across the switch into 32-bit protected
mode (with a Fixes: tag; may need to go to -stable);
- preserve the top half of RSP explicitly, and preserve all callee-saved
registers on the stack across the mode switch; this fixes a reported
issue with kexec on Ice Lake (kexec loads the kernel above 4G)
Changes since v5 [8]:
- reintroduce patch removing redundant RSI pushes and pops from
arch/x86/kernel/head_64.S
- avoid bare constant 0x200 for the offset of startup_64() in the
decompressor
- rejig SEV/SNP logic in patch #20 once again, to ensure that CPUID
calls and VM exits only occur when the active configuration permits
it
- improve/clarify some code comments and commit logs
- rebase onto v6.5-rc1
Changes since v4 [7]:
- avoid CPUID calls after protocol negotiation but before configuring
exception handling;
- drop patch removing redundant RSI pushes and pops from
arch/x86/kernel/head_64.S
- rebase onto -tip x86/cc - the conflicts are mostly trivial and
restricted to the last 4 patches in the series, so applying this onto
a separate topic branch should be straight-forward as well.
Changes since v3 [6]:
- trivial rebase onto Kirill's unaccepted memory series v13
- test SNP feature mask while running in the EFI boot services, and fail
gracefully on a mismatch
- perform only the SEV init after ExitBootServices()
Changes since v2 [4]:
- update prose style to comply with -tip guidelines
- rebased onto Kirill's unaccepted memory series [3]
- add Kirill's ack to 4/5-level paging changes
- perform SEV init and SNP feature check after ExitBootServices(), to
avoid corrupting the firmware's own SEV state
- split out preparatory refactor of handover entry code and BSS clearing
(patches #1 to #4)
Changes since v1 [2]:
- streamline existing 4/5 level switching code and call it directly from
the EFI stub - this is covered by the first 9 patches, which can be
applied in isolation, if desired;
- deal with SEV/SNP init explicitly;
- clear BSS when booting via the 'handover protocol'
- switch to kernel CS before calling SEV init code in kernel proper.
---- v1 cover letter follows ----
This series is conceptually a combination of Evgeniy's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.
My series [1] implemented zboot for x86 by getting rid of the entire
x86 decompressor and replacing it with existing EFI code that does the
same thing in a generic way. The downside of this is that only EFI boot
is supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.
Evgeniy's series [0] adapted the entire decompressor code flow to allow
it to execute in the EFI context as well as the bare metal context; this
involves changes to the 1:1 mapping code, the page fault handlers, etc.,
none of which are really needed when doing EFI boot in the first place.
So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline
allocation code, page table code or page fault handling code. This
allows us to get rid of quite a bit of unsavory EFI stub code, and
replace it with two clear invocations of the EFI firmware APIs to clear
NX restrictions from allocations that have been populated with
executable code.
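For the record, such an invocation amounts to the sketch below, which
assumes the EFI memory attributes protocol definitions already present
in include/linux/efi.h; the helper name clear_nx() is made up for this
example and the series' actual helper may differ:

static efi_status_t clear_nx(unsigned long start, unsigned long size)
{
	efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
	efi_memory_attribute_protocol_t *memattr;
	efi_status_t status;

	/* Not all firmware exposes the protocol - treat absence as a no-op */
	status = efi_bs_call(locate_protocol, &guid, NULL, (void **)&memattr);
	if (status != EFI_SUCCESS)
		return EFI_SUCCESS;

	/* Drop the no-execute attribute from the freshly populated range */
	return memattr->clear_memory_attributes(memattr, start, size,
						EFI_MEMORY_XP);
}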
The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
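That reused ELF handling boils down to the loop sketched below
(illustrative only; the real parse_elf() in
arch/x86/boot/compressed/misc.c also validates the ELF header and
handles the in-place layout of the decompression buffer):

static void copy_load_segments(void *output, const void *image)
{
	const Elf64_Ehdr *ehdr = image;
	const Elf64_Phdr *phdrs = image + ehdr->e_phoff;
	int i;

	for (i = 0; i < ehdr->e_phnum; i++) {
		const Elf64_Phdr *phdr = &phdrs[i];

		if (phdr->p_type != PT_LOAD)
			continue;

		/* place each segment at its link-time offset from the base */
		memcpy(output + (phdr->p_paddr - LOAD_PHYSICAL_ADDR),
		       image + phdr->p_offset, phdr->p_filesz);
	}
}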
Note that some of Evgeniy's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.
(*) IMHO the following developments are likely to occur:
- the Windows boot chain (along with 3rd party drivers) is cleaned up so
that it never relies on memory being writable and executable at the
same time when running under the EFI boot services;
- the EFI reference implementation gets updated to map all memory NX by
default, and to require read-only permissions for executable mappings;
- BIOS vendors incorporate these changes into their codebases, and
deploy them more widely than just the 'secure' SKUs;
- OEMs only care about the Windows sticker [5], so they only boot-test
Windows, which works fine in this more restricted context;
- Linux boot no longer works reliably on new hardware built for Windows
unless we clean up our boot chain as well.
Cc: Evgeniy Baskov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Cc: Peter Jones <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Joerg Roedel <[email protected]>
[0] https://lore.kernel.org/all/[email protected]/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://techcommunity.microsoft.com/t5/hardware-dev-center/new-uefi-ca-memory-mitigation-requirements-for-signing/ba-p/3608714
[6] https://lore.kernel.org/all/[email protected]/
[7] https://lore.kernel.org/all/[email protected]/
[8] https://lore.kernel.org/all/[email protected]/
[9] https://lore.kernel.org/all/[email protected]/
Ard Biesheuvel (22):
x86/decompressor: Don't rely on upper 32 bits of GPRs being preserved
x86/head_64: Store boot_params pointer in callee save register
x86/efistub: Branch straight to kernel entry point from C code
x86/efistub: Simplify and clean up handover entry code
x86/decompressor: Avoid magic offsets for EFI handover entrypoint
x86/efistub: Clear BSS in EFI handover protocol entrypoint
x86/decompressor: Use proper sequence to take the address of the GOT
x86/decompressor: Store boot_params pointer in callee save register
x86/decompressor: Call trampoline as a normal function
x86/decompressor: Use standard calling convention for trampoline
x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
x86/decompressor: Call trampoline directly from C code
x86/decompressor: Only call the trampoline when changing paging levels
x86/decompressor: Merge trampoline cleanup with switching code
x86/efistub: Perform 4/5 level paging switch from the stub
x86/efistub: Prefer EFI memory attributes protocol over DXE services
decompress: Use 8 byte alignment
x86/decompressor: Move global symbol references to C code
x86/decompressor: Factor out kernel decompression and relocation
efi/libstub: Add limit argument to efi_random_alloc()
x86/efistub: Perform SNP feature test while running in the firmware
x86/efistub: Avoid legacy decompressor when doing EFI boot
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 107 +++-----
arch/x86/boot/compressed/head_32.S | 34 +--
arch/x86/boot/compressed/head_64.S | 242 +++++------------
arch/x86/boot/compressed/misc.c | 44 ++-
arch/x86/boot/compressed/pgtable.h | 8 +-
arch/x86/boot/compressed/pgtable_64.c | 74 +++--
arch/x86/boot/compressed/sev.c | 91 +++++--
arch/x86/include/asm/boot.h | 8 +
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/head_64.S | 23 +-
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 3 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 95 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 285 +++++++++++---------
drivers/firmware/efi/libstub/x86-stub.h | 17 ++
drivers/firmware/efi/libstub/zboot.c | 2 +-
include/linux/decompress/mm.h | 2 +-
23 files changed, 563 insertions(+), 507 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
--
2.39.2
The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode. Currently, this stack is
placed in the same page that carries the trampoline code, which means
this page must be mapped writable and executable, and the stack is
therefore executable as well.
Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a stack entirely, and in a later patch, this will
be taken advantage of by removing writable permissions from (and adding
executable permissions to) this code page explicitly when booting via
the EFI stub.
Not touching the stack pointer also makes it more straight-forward to
call the trampoline code as an ordinary 64-bit function from C code.
Note that we need to preserve the value of RSP across the switch into
compatibility mode: the stack pointer may get truncated to 32 bits.
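For clarity, that preservation amounts to the split-and-reconstruct
below, sketched in C (illustrative only, not part of the patch; the asm
parks the high half in RBX, which the 32-bit code leaves untouched):

static unsigned long long save_high_half(unsigned long long rsp)
{
	return rsp >> 32;		/* movq %rsp, %rbx; shrq $32, %rbx */
}

static unsigned long long restore_rsp(unsigned int truncated_rsp,
				      unsigned long long saved_high)
{
	/* shlq $32, %rbx; orq %rbx, %rsp */
	return (saved_high << 32) | truncated_rsp;
}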
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 64 ++++++++++----------
arch/x86/boot/compressed/pgtable.h | 4 +-
arch/x86/boot/compressed/pgtable_64.c | 12 +++-
3 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 491d985be75fd5b0..1b0c61d1b389fd37 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -449,9 +449,6 @@ SYM_CODE_START(startup_64)
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
call *%rax
- /* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq rva(boot_stack_end)(%rbx), %rsp
-
/*
* cleanup_trampoline() would restore trampoline memory.
*
@@ -537,32 +534,37 @@ SYM_FUNC_END(.Lrelocated)
* EDI contains the base address of the trampoline memory.
* Non-zero ESI means trampoline needs to enable 5-level paging.
*/
+ .section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
- /* Grab return address */
- movq (%rsp), %rax
-
- /* Set up 32-bit addressable stack */
- leaq TRAMPOLINE_32BIT_STACK_END(%rdi), %rsp
-
- /* Preserve return address and other live 64-bit registers */
- pushq %rax
+ /* Preserve live 64-bit registers */
pushq %r15
pushq %rbp
pushq %rbx
+ /* Preserve top half of RSP in a legacy mode GPR to avoid truncation */
+ movq %rsp, %rbx
+ shrq $32, %rbx
+
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
leaq 0f(%rip), %rax
pushq %rax
lretq
+ /*
+ * The 32-bit code below will do a far jump back to long mode and end
+ * up here after reconfiguring the number of paging levels.
+ */
+.Lret: shlq $32, %rbx // Reconstruct stack pointer
+ orq %rbx, %rsp
+
+ popq %rbx
+ popq %rbp
+ popq %r15
+ retq
+
.code32
0:
- /* Set up data and stack segments */
- movl $__KERNEL_DS, %eax
- movl %eax, %ds
- movl %eax, %ss
-
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
@@ -617,29 +619,25 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Calculate address of paging_enabled() once we are executing in the trampoline */
- leal .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
-
- /* Prepare the stack for far return to Long Mode */
- pushl $__KERNEL_CS
- pushl %eax
-
/* Enable paging again. */
movl %cr0, %eax
btsl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- lret
+ /*
+ * Return to the 64-bit calling code using LJMP rather than LRET, to
+ * avoid the need for a 32-bit addressable stack. The destination
+ * address will be adjusted after the template code is copied into a
+ * 32-bit addressable buffer.
+ */
+.Ljmp: ljmpl $__KERNEL_CS, $(.Lret - trampoline_32bit_src)
SYM_CODE_END(trampoline_32bit_src)
- .code64
-SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
- /* Return from the trampoline */
- popq %rbx
- popq %rbp
- popq %r15
- retq
-SYM_FUNC_END(.Lpaging_enabled)
+/*
+ * This symbol is placed right after trampoline_32bit_src() so its address can
+ * be used to infer the size of the trampoline code.
+ */
+SYM_DATA(trampoline_ljmp_imm_offset, .word .Ljmp + 1 - trampoline_32bit_src)
/*
* The trampoline code has a size limit.
@@ -648,7 +646,7 @@ SYM_FUNC_END(.Lpaging_enabled)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
- .code32
+ .text
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 4e8cef135226bcbb..c6b0903aded05a07 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -8,13 +8,13 @@
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
-#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
-
#ifndef __ASSEMBLER__
extern unsigned long *trampoline_32bit;
extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
+extern const u16 trampoline_ljmp_imm_offset;
+
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111bf8c0..d66639c961b8eeda 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -109,6 +109,7 @@ static unsigned long find_trampoline_placement(void)
struct paging_config paging_prepare(void *rmode)
{
struct paging_config paging_config = {};
+ void *tramp_code;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
boot_params = rmode;
@@ -143,9 +144,18 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ tramp_code = memcpy(trampoline_32bit +
+ TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+ /*
+ * Avoid the need for a stack in the 32-bit trampoline code, by using
+ * LJMP rather than LRET to return back to long mode. LJMP takes an
+ * immediate absolute address, which needs to be adjusted based on the
+ * placement of the trampoline.
+ */
+ *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+
/*
* The code below prepares page table in trampoline memory.
*
--
2.39.2
It is no longer necessary to be cautious when referring to global
variables in the position independent decompressor code, now that it is
built using PIE codegen and makes an assertion in the linker script that
no GOT entries exist (which would require adjustment for the actual
runtime load address of the decompressor binary).
This means global variables can be referenced directly from C code,
instead of having to pass their runtime addresses into C routines from
asm code, which needs to happen at each call site. Do so for the code
that will be called directly from the EFI stub after a subsequent patch,
and avoid the need to duplicate this logic a third time.
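As an illustration of the resulting access pattern (hypothetical
function, using the externs that misc.c now declares): because the
decompressor is built with PIE codegen and its linker script asserts
that no GOT entries exist, the references below compile to
position-independent accesses that need no runtime fixup, so nothing
has to be passed in from the asm entry code.

extern unsigned char input_data[];
extern unsigned int input_len;

unsigned int first_byte_plus_len(void)
{
	return input_data[0] + input_len;
}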
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 8 --------
arch/x86/boot/compressed/head_64.S | 8 +-------
arch/x86/boot/compressed/misc.c | 16 +++++++++-------
3 files changed, 10 insertions(+), 22 deletions(-)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3530465b5b85ccf3..beee858058df4403 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -168,13 +168,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
*/
/* push arguments for extract_kernel: */
- pushl output_len@GOTOFF(%ebx) /* decompressed length, end of relocs */
pushl %ebp /* output address */
- pushl input_len@GOTOFF(%ebx) /* input_len */
- leal input_data@GOTOFF(%ebx), %eax
- pushl %eax /* input_data */
- leal boot_heap@GOTOFF(%ebx), %eax
- pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel entry point in %eax */
addl $24, %esp
@@ -202,8 +196,6 @@ SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
*/
.bss
.balign 4
-boot_heap:
- .fill BOOT_HEAP_SIZE, 1, 0
boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index eb33edf1e75d4b02..a9237e48f2f7cfd5 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -493,11 +493,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
* Do the extraction, and jump to the new kernel..
*/
movq %r15, %rdi /* pass struct boot_params pointer */
- leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
- leaq input_data(%rip), %rdx /* input_data */
- movl input_len(%rip), %ecx /* input_len */
- movq %rbp, %r8 /* output target address */
- movl output_len(%rip), %r9d /* decompressed length, end of relocs */
+ movq %rbp, %rsi /* output target address */
call extract_kernel /* returns kernel entry point in %rax */
/*
@@ -661,8 +657,6 @@ SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
*/
.bss
.balign 4
-SYM_DATA_LOCAL(boot_heap, .fill BOOT_HEAP_SIZE, 1, 0)
-
SYM_DATA_START_LOCAL(boot_stack)
.fill BOOT_STACK_SIZE, 1, 0
.balign 16
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abcf624b3b55..2d91d56b59e1af93 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,6 +330,11 @@ static size_t parse_elf(void *output)
return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
}
+static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
+
+extern unsigned char input_data[];
+extern unsigned int input_len, output_len;
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -347,14 +352,11 @@ static size_t parse_elf(void *output)
* |-------uncompressed kernel image---------|
*
*/
-asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
- unsigned char *input_data,
- unsigned long input_len,
- unsigned char *output,
- unsigned long output_len)
+asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
{
const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ memptr heap = (memptr)boot_heap;
unsigned long needed_size;
size_t entry_offset;
@@ -412,7 +414,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
* entries. This ensures the full mapped area is usable RAM
* and doesn't include any reserved areas.
*/
- needed_size = max(output_len, kernel_total_size);
+ needed_size = max_t(unsigned long, output_len, kernel_total_size);
#ifdef CONFIG_X86_64
needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
#endif
@@ -443,7 +445,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
#ifdef CONFIG_X86_64
if (heap > 0x3fffffffffffUL)
error("Destination address too large");
- if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
+ if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
error("Destination virtual address is beyond the kernel mapping area");
#else
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
--
2.39.2
On Fri, Jul 28, 2023 at 11:09:05AM +0200, Ard Biesheuvel wrote:
> The 32-bit trampoline no longer uses the stack for anything except
> performing a far return back to long mode. Currently, this stack is
> placed in the same page that carries the trampoline code, which means
> this page must be mapped writable and executable, and the stack is
> therefore executable as well.
>
> Replace the far return with a far jump, so that the return address can
> be pre-calculated and patched into the code before it is called. This
> removes the need for a stack entirely, and in a later patch, this will
> be taken advantage of by removing writable permissions from (and adding
> executable permissions to) this code page explicitly when booting via
> the EFI stub.
>
> Not touching the stack pointer also makes it more straight-forward to
> call the trampoline code as an ordinary 64-bit function from C code.
>
> Note that we need to preserve the value of RSP across the switch into
^^
Passive voice pls.
> compatibility mode: the stack pointer may get truncated to 32 bits.
>
> Acked-by: Kirill A. Shutemov <[email protected]>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/head_64.S | 64 ++++++++++----------
> arch/x86/boot/compressed/pgtable.h | 4 +-
> arch/x86/boot/compressed/pgtable_64.c | 12 +++-
> 3 files changed, 44 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 491d985be75fd5b0..1b0c61d1b389fd37 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -449,9 +449,6 @@ SYM_CODE_START(startup_64)
> leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
> call *%rax
>
> - /* Restore the stack, the 32-bit trampoline uses its own stack */
> - leaq rva(boot_stack_end)(%rbx), %rsp
> -
> /*
> * cleanup_trampoline() would restore trampoline memory.
> *
> @@ -537,32 +534,37 @@ SYM_FUNC_END(.Lrelocated)
> * EDI contains the base address of the trampoline memory.
> * Non-zero ESI means trampoline needs to enable 5-level paging.
> */
> + .section ".rodata", "a", @progbits
> SYM_CODE_START(trampoline_32bit_src)
> - /* Grab return address */
> - movq (%rsp), %rax
> -
> - /* Set up 32-bit addressable stack */
> - leaq TRAMPOLINE_32BIT_STACK_END(%rdi), %rsp
> -
> - /* Preserve return address and other live 64-bit registers */
> - pushq %rax
> + /* Preserve live 64-bit registers */
> pushq %r15
> pushq %rbp
> pushq %rbx
>
> + /* Preserve top half of RSP in a legacy mode GPR to avoid truncation */
> + movq %rsp, %rbx
> + shrq $32, %rbx
> +
> /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> pushq $__KERNEL32_CS
> leaq 0f(%rip), %rax
> pushq %rax
> lretq
>
> + /*
> + * The 32-bit code below will do a far jump back to long mode and end
> + * up here after reconfiguring the number of paging levels.
> + */
> +.Lret: shlq $32, %rbx // Reconstruct stack pointer
No side comments pls.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette