Update the x86 boot path to avoid the bare metal decompressor when
booting via the EFI stub. The bare metal decompressor inherits the
loader's 1:1 mapping of DRAM when entering in 64-bit mode, and assumes
that all of it is mapped read/write/execute, which will no longer be the
case on systems built to comply with recently tightened logo
requirements (*).
Changes since v2 [4]:
- update prose style to comply with -tip guidelines
- rebased onto Kirill's unaccepted memory series [3]
- add Kirill's ack to 4/5-level paging changes
- perform SEV init and SNP feature check after ExitBootServices(), to
avoid corrupting the firmware's own SEV state
- split out preparatory refactor of handover entry code and BSS clearing
(patches #1 to #4)
Changes since v1 [2]:
- streamline existing 4/5 level switching code and call it directly from
the EFI stub - this is covered by the first 9 patches, which can be
applied in isolation, if desired;
- deal with SEV/SNP init explicitly;
- clear BSS when booting via the 'handover protocol'
- switch to kernel CS before calling SEV init code in kernel proper.
---- v1 cover letter follows ----
This series is conceptually a combination of Evgeny's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.
My series [1] implemented zboot for x86, by getting rid of the entire
x86 decompressor, and replacing it with existing EFI code that does the
same but in a generic way. The downside of this is that only EFI boot is
supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.
Evgeny's series [0] adapted the entire decompressor code flow to allow
it to execute in the EFI context as well as the bare metal context, and
this involves changes to the 1:1 mapping code and the page fault
handlers etc, none of which are really needed when doing EFI boot in the
first place.
So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline
allocation code, page table code or page fault handling code. This
allows us to get rid of quite a bit of unsavory EFI stub code, and
replace it with two clear invocations of the EFI firmware APIs to clear
NX restrictions from allocations that have been populated with
executable code.
The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
Note that some of Evgeny's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.
(*) IMHO the following developments are likely to occur:
- the Windows boot chain (along with 3rd party drivers) is cleaned up so
that it never relies on memory being writable and executable at the
same time when running under the EFI boot services;
- the EFI reference implementation gets updated to map all memory NX by
default, and to require read-only permissions for executable mappings;
- BIOS vendors incorporate these changes into their codebases, and
deploy it more widely than just the 'secure' SKUs;
- OEMs only care about the Windows sticker [5], so they only boot test
Windows, which works fine in this more restricted context;
- Linux boot no longer works reliably on new hardware built for Windows
unless we clean up our boot chain as well.
Cc: Evgeniy Baskov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Cc: Peter Jones <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
[0] https://lore.kernel.org/all/[email protected]/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://techcommunity.microsoft.com/t5/hardware-dev-center/new-uefi-ca-memory-mitigation-requirements-for-signing/ba-p/3608714
Ard Biesheuvel (21):
x86/efistub: Branch straight to kernel entry point from C code
x86/efistub: Simplify and clean up handover entry code
x86/decompressor: Avoid magic offsets for EFI handover entrypoint
x86/efistub: Clear BSS in EFI handover protocol entrypoint
x86/decompressor: Use proper sequence to take the address of the GOT
x86/decompressor: Store boot_params pointer in callee save register
x86/decompressor: Call trampoline as a normal function
x86/decompressor: Use standard calling convention for trampoline
x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
x86/decompressor: Call trampoline directly from C code
x86/decompressor: Only call the trampoline when changing paging levels
x86/decompressor: Merge trampoline cleanup with switching code
x86/efistub: Perform 4/5 level paging switch from the stub
x86/efistub: Prefer EFI memory attributes protocol over DXE services
decompress: Use 8 byte alignment
x86/decompressor: Move global symbol references to C code
x86/decompressor: Factor out kernel decompression and relocation
x86/head_64: Store boot_params pointer in callee-preserved register
efi/libstub: Add limit argument to efi_random_alloc()
x86/efistub: Check SEV/SNP support while running in the firmware
x86/efistub: Avoid legacy decompressor when doing EFI boot
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 107 +++-----
arch/x86/boot/compressed/head_32.S | 34 +--
arch/x86/boot/compressed/head_64.S | 219 ++++------------
arch/x86/boot/compressed/misc.c | 44 +++-
arch/x86/boot/compressed/pgtable.h | 6 +-
arch/x86/boot/compressed/pgtable_64.c | 72 +++---
arch/x86/boot/compressed/sev.c | 19 +-
arch/x86/include/asm/boot.h | 8 +
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/head_64.S | 15 +-
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 3 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 94 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 269 +++++++++++---------
drivers/firmware/efi/libstub/x86-stub.h | 12 +
drivers/firmware/efi/libstub/zboot.c | 2 +-
include/linux/decompress/mm.h | 2 +-
23 files changed, 468 insertions(+), 473 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
--
2.39.2
The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and does a few things
that are problematic in the context of EFI boot now that the logo
requirements are getting tighter.
In particular, the decompressor moves its own executable image around in
memory, and relies on demand paging to populate the identity mappings,
and these things are difficult to support in a context where memory is
not permitted to be mapped writable and executable at the same time or,
at the very least, is mapped non-executable by default, and needs
special treatment for this restriction to be lifted.
Since EFI already maps all of memory 1:1, it is unnecessary to create
new page tables or handle page faults when decompressing the kernel.
That means there is also no need to replace the special exception
handlers for SEV. Generally, there is little need to do anything that
the decompressor does beyond
- initialize SEV encryption, if needed,
- perform the 4/5 level paging switch, if needed,
- decompress the kernel
- relocate the kernel
So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 55 -------
arch/x86/boot/compressed/head_32.S | 13 --
arch/x86/boot/compressed/head_64.S | 27 ----
arch/x86/include/asm/efi.h | 7 +-
drivers/firmware/efi/libstub/x86-stub.c | 163 ++++++++------------
6 files changed, 77 insertions(+), 193 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index b13a580210867ffb..535608fe72e11265 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -74,6 +74,11 @@ LDFLAGS_vmlinux += -z noexecstack
ifeq ($(CONFIG_LD_IS_BFD),y)
LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
endif
+ifeq ($(CONFIG_EFI_STUB),y)
+# ensure that the static EFI stub library will be pulled in, even if it is
+# never referenced explicitly from the startup code
+LDFLAGS_vmlinux += -u efi_pe_entry
+endif
LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 8a02a151806df14c..f4e22ef774ab6b4a 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -269,10 +269,6 @@ SYM_FUNC_START_LOCAL(efi32_entry)
jmp startup_32
SYM_FUNC_END(efi32_entry)
-#define ST32_boottime 60 // offsetof(efi_system_table_32_t, boottime)
-#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
-#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)
-
/*
* efi_status_t efi32_pe_entry(efi_handle_t image_handle,
* efi_system_table_32_t *sys_table)
@@ -280,8 +276,6 @@ SYM_FUNC_END(efi32_entry)
SYM_FUNC_START(efi32_pe_entry)
pushl %ebp
movl %esp, %ebp
- pushl %eax // dummy push to allocate loaded_image
-
pushl %ebx // save callee-save registers
pushl %edi
@@ -290,48 +284,8 @@ SYM_FUNC_START(efi32_pe_entry)
movl $0x80000003, %eax // EFI_UNSUPPORTED
jnz 2f
- call 1f
-1: pop %ebx
-
- /* Get the loaded image protocol pointer from the image handle */
- leal -4(%ebp), %eax
- pushl %eax // &loaded_image
- leal (loaded_image_proto - 1b)(%ebx), %eax
- pushl %eax // pass the GUID address
- pushl 8(%ebp) // pass the image handle
-
- /*
- * Note the alignment of the stack frame.
- * sys_table
- * handle <-- 16-byte aligned on entry by ABI
- * return address
- * frame pointer
- * loaded_image <-- local variable
- * saved %ebx <-- 16-byte aligned here
- * saved %edi
- * &loaded_image
- * &loaded_image_proto
- * handle <-- 16-byte aligned for call to handle_protocol
- */
-
- movl 12(%ebp), %eax // sys_table
- movl ST32_boottime(%eax), %eax // sys_table->boottime
- call *BS32_handle_protocol(%eax) // sys_table->boottime->handle_protocol
- addl $12, %esp // restore argument space
- testl %eax, %eax
- jnz 2f
-
movl 8(%ebp), %ecx // image_handle
movl 12(%ebp), %edx // sys_table
- movl -4(%ebp), %esi // loaded_image
- movl LI32_image_base(%esi), %esi // loaded_image->image_base
- leal (startup_32 - 1b)(%ebx), %ebp // runtime address of startup_32
- /*
- * We need to set the image_offset variable here since startup_32() will
- * use it before we get to the 64-bit efi_pe_entry() in C code.
- */
- subl %esi, %ebp // calculate image_offset
- movl %ebp, (image_offset - 1b)(%ebx) // save image_offset
xorl %esi, %esi
jmp efi32_entry // pass %ecx, %edx, %esi
// no other registers remain live
@@ -350,15 +304,6 @@ SYM_FUNC_START_NOALIGN(efi64_stub_entry)
SYM_FUNC_END(efi64_stub_entry)
#endif
- .section ".rodata"
- /* EFI loaded image protocol GUID */
- .balign 4
-SYM_DATA_START_LOCAL(loaded_image_proto)
- .long 0x5b1b31a1
- .word 0x9562, 0x11d2
- .byte 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b
-SYM_DATA_END(loaded_image_proto)
-
.data
.balign 8
SYM_DATA_START_LOCAL(efi32_boot_gdt)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index beee858058df4403..cd9587fcd5084f22 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -84,19 +84,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
leal startup_32@GOTOFF(%edx), %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32() will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry() will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl image_offset@GOTOFF(%edx), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3074d278c7e665d8..aee518e71cdb7e75 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -146,19 +146,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl rva(image_offset)(%ebp), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
@@ -335,20 +322,6 @@ SYM_CODE_START(startup_64)
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- movl image_offset(%rip), %eax
- subq %rax, %rbp
-#endif
-
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8b4be7cecdb8eb73..b0994ae3bc23f84d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,6 +90,8 @@ static inline void efi_fpu_end(void)
}
#ifdef CONFIG_X86_32
+#define EFI_X86_KERNEL_ALLOC_LIMIT (SZ_512M - 1)
+
#define arch_efi_call_virt_setup() \
({ \
efi_fpu_begin(); \
@@ -103,8 +105,7 @@ static inline void efi_fpu_end(void)
})
#else /* !CONFIG_X86_32 */
-
-#define EFI_LOADER_SIGNATURE "EL64"
+#define EFI_X86_KERNEL_ALLOC_LIMIT EFI_ALLOC_LIMIT
extern asmlinkage u64 __efi_call(void *fp, ...);
@@ -218,6 +219,8 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,
#ifdef CONFIG_EFI_MIXED
+#define EFI_ALLOC_LIMIT (efi_is_64bit() ? ULONG_MAX : U32_MAX)
+
#define ARCH_HAS_EFISTUB_WRAPPERS
static inline bool efi_is_64bit(void)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 02633199a8502b71..9c280d9e6e9ef58f 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,17 +15,14 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/kaslr.h>
#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
-/* Maximum physical address for 64-bit kernel with 4-level paging */
-#define MAXMEM_X86_64_4LEVEL (1ull << 46)
-
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
-u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
static efi_memory_attribute_protocol_t *memattr;
@@ -276,33 +273,9 @@ void efi_adjust_memory_range_protection(unsigned long start,
}
}
-void startup_32(struct boot_params *boot_params);
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
-#ifdef CONFIG_64BIT
- if (image_base != (unsigned long)startup_32)
- efi_adjust_memory_range_protection(image_base, image_size);
-#else
- /*
- * Clear protection flags on a whole range of possible
- * addresses used for KASLR. We don't need to do that
- * on x86_64, since KASLR/extraction is performed after
- * dedicated identity page tables are built and we only
- * need to remove possible protection on relocated image
- * itself disregarding further relocations.
- */
- efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
-#endif
-}
-
static const efi_char16_t apple[] = L"Apple";
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -311,9 +284,6 @@ static void setup_quirks(struct boot_params *boot_params,
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
}
/*
@@ -466,7 +436,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
}
image_base = efi_table_attr(image, image_base);
- image_offset = (void *)startup_32 - image_base;
status = efi_allocate_pages(sizeof(struct boot_params),
(unsigned long *)&boot_params, ULONG_MAX);
@@ -757,6 +726,61 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static void efi_get_seed(void *seed, int size)
+{
+ efi_get_random_bytes(size, seed);
+
+ /*
+ * This only updates seed[0] when running on 32-bit, but in that case,
+ * we don't use seed[1] anyway, as there is no virtual KASLR on 32-bit.
+ */
+ *(unsigned long *)seed ^= kaslr_get_random_long("EFI");
+}
+
+static void error(char *str)
+{
+ efi_warn("Decompression failed: %s\n", str);
+}
+
+static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
+{
+ unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ unsigned long addr, alloc_size, entry;
+ efi_status_t status;
+ u32 seed[2] = {};
+
+ /* determine the required size of the allocation */
+ alloc_size = ALIGN(max((unsigned long)output_len, kernel_total_size),
+ MIN_KERNEL_ALIGN);
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
+ u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size;
+
+ efi_get_seed(seed, sizeof(seed));
+
+ virt_addr += (range * seed[1]) >> 32;
+ virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+ }
+
+ status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
+ seed[0], EFI_LOADER_CODE,
+ EFI_X86_KERNEL_ALLOC_LIMIT);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ entry = decompress_kernel((void *)addr, virt_addr, error);
+ if (entry == ULONG_MAX) {
+ efi_free(alloc_size, addr);
+ return EFI_LOAD_ERROR;
+ }
+
+ *kernel_entry = addr + entry;
+
+ efi_adjust_memory_range_protection(addr, kernel_total_size);
+
+ return EFI_SUCCESS;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -775,10 +799,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
struct boot_params *boot_params)
{
efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
- unsigned long bzimage_addr = (unsigned long)startup_32;
- unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
const struct linux_efi_initrd *initrd = NULL;
+ unsigned long kernel_entry;
efi_status_t status;
efi_system_table = sys_table_arg;
@@ -806,60 +829,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
- /*
- * If the kernel isn't already loaded at a suitable address,
- * relocate it.
- *
- * It must be loaded above LOAD_PHYSICAL_ADDR.
- *
- * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
- * is defined as the macro MAXMEM, but unfortunately that is not a
- * compile-time constant if 5-level paging is configured, so we instead
- * define our own macro for use here.
- *
- * For 32-bit, the maximum address is complicated to figure out, for
- * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
- * KASLR uses.
- *
- * Also relocate it if image_offset is zero, i.e. the kernel wasn't
- * loaded by LoadImage, but rather by a bootloader that called the
- * handover entry. The reason we must always relocate in this case is
- * to handle the case of systemd-boot booting a unified kernel image,
- * which is a PE executable that contains the bzImage and an initrd as
- * COFF sections. The initrd section is placed after the bzImage
- * without ensuring that there are at least init_size bytes available
- * for the bzImage, and thus the compressed kernel's startup code may
- * overwrite the initrd unless it is moved out of the way.
- */
-
- buffer_start = ALIGN(bzimage_addr - image_offset,
- hdr->kernel_alignment);
- buffer_end = buffer_start + hdr->init_size;
-
- if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
- (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
- (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0)) {
- extern char _bss[];
-
- status = efi_relocate_kernel(&bzimage_addr,
- (unsigned long)_bss - bzimage_addr,
- hdr->init_size,
- hdr->pref_address,
- hdr->kernel_alignment,
- LOAD_PHYSICAL_ADDR);
- if (status != EFI_SUCCESS) {
- efi_err("efi_relocate_kernel() failed!\n");
- goto fail;
- }
- /*
- * Now that we've copied the kernel elsewhere, we no longer
- * have a set up block before startup_32(), so reset image_offset
- * to zero in case it was set earlier.
- */
- image_offset = 0;
- }
-
#ifdef CONFIG_CMDLINE_BOOL
status = efi_parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
@@ -877,6 +846,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
+ status = efi_decompress_kernel(&kernel_entry);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to decompress kernel\n");
+ goto fail;
+ }
+
/*
* At this point, an initrd may already have been loaded by the
* bootloader and passed via bootparams. We permit an initrd loaded
@@ -916,7 +891,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
setup_efi_pci(boot_params);
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+ setup_quirks(boot_params);
status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {
@@ -940,14 +915,10 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
- if (IS_ENABLED(CONFIG_X86_64)) {
+ if (IS_ENABLED(CONFIG_X86_64))
efi_5level_switch();
- /* add offset of startup_64() */
- bzimage_addr += 0x200;
- }
-
- enter_kernel(bzimage_addr, boot_params);
+ enter_kernel(kernel_entry, boot_params);
fail:
efi_err("efi_stub_entry() failed!\n");
--
2.39.2
The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode. Currently, this stack is
placed in the same page that carries the trampoline code, which means
this page must be mapped writable and executable, and the stack is
therefore executable as well.
Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a stack entirely, and in a later patch, this will
be taken advantage of by removing writable permissions from (and adding
executable permissions to) this code page explicitly when booting via
the EFI stub.
Not touching the stack pointer also makes it more straight-forward to
call the trampoline code as an ordinary 64-bit function from C code.
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 45 ++++++++------------
arch/x86/boot/compressed/pgtable.h | 6 +--
arch/x86/boot/compressed/pgtable_64.c | 12 +++++-
3 files changed, 32 insertions(+), 31 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a387cd80964e1a1e..741b4e8fefc915ea 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -449,9 +449,6 @@ SYM_CODE_START(startup_64)
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
call *%rax
- /* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq rva(boot_stack_end)(%rbx), %rsp
-
/*
* cleanup_trampoline() would restore trampoline memory.
*
@@ -537,24 +534,22 @@ SYM_FUNC_END(.Lrelocated)
* EDI contains the base address of the trampoline memory.
* Non-zero ESI means trampoline needs to enable 5-level paging.
*/
+ .section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
- popq %r8
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
leaq 0f(%rip), %rax
pushq %rax
lretq
+ /*
+ * The 32-bit code below will do a far jump back to long mode and end
+ * up here after reconfiguring the number of paging levels.
+ */
+.Lret: retq
+
.code32
-0: /* Set up data and stack segments */
- movl $__KERNEL_DS, %eax
- movl %eax, %ds
- movl %eax, %ss
-
- /* Set up new stack */
- leal TRAMPOLINE_32BIT_STACK_END(%edi), %esp
-
- /* Disable paging */
+0: /* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
@@ -608,26 +603,22 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Calculate address of paging_enabled() once we are executing in the trampoline */
- leal .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%edi), %eax
-
- /* Prepare the stack for far return to Long Mode */
- pushl $__KERNEL_CS
- pushl %eax
-
/* Enable paging again. */
movl %cr0, %eax
btsl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- lret
+ /*
+ * Return to the 64-bit calling code using LJMP rather than LRET, to
+ * avoid the need for a 32-bit addressable stack. The destination
+ * address will be adjusted after the template code is copied into a
+ * 32-bit addressable buffer.
+ */
+.Ljmp: ljmpl $__KERNEL_CS, $(.Lret - trampoline_32bit_src)
SYM_CODE_END(trampoline_32bit_src)
- .code64
-SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
- /* Return from the trampoline */
- jmp *%r8
-SYM_FUNC_END(.Lpaging_enabled)
+/* keep this right after trampoline_32bit_src() so we can infer its size */
+SYM_DATA(trampoline_ljmp_imm_offset, .word .Ljmp + 1 - trampoline_32bit_src)
/*
* The trampoline code has a size limit.
@@ -636,7 +627,7 @@ SYM_FUNC_END(.Lpaging_enabled)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
- .code32
+ .text
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 4e8cef135226bcbb..131488f50af55d0a 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,9 +6,7 @@
#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
-
-#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
#ifndef __ASSEMBLER__
@@ -16,5 +14,7 @@ extern unsigned long *trampoline_32bit;
extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
+extern const u16 trampoline_ljmp_imm_offset;
+
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111bf8c0..09fc18180929fab3 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -109,6 +109,7 @@ static unsigned long find_trampoline_placement(void)
struct paging_config paging_prepare(void *rmode)
{
struct paging_config paging_config = {};
+ void *tramp_code;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
boot_params = rmode;
@@ -143,9 +144,18 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ tramp_code = memcpy(trampoline_32bit +
+ TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+ /*
+ * Avoid the need for a stack in the 32-bit trampoline code, by using
+ * LJMP rather than LRET to return back to long mode. LJMP takes an
+ * immediate absolute address, so we have to adjust that based on the
+ * placement of the trampoline.
+ */
+ *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+
/*
* The code below prepares page table in trampoline memory.
*
--
2.39.2