2023-04-16 12:08:35

by Ard Biesheuvel

Subject: [RFC PATCH 0/3] efi: Implement generic zboot support

This series is a proof of concept that implements support for the EFI
zboot decompressor on x86. It replaces the ordinary decompressor and
performs the decompression, KASLR randomization and the switch between
4- and 5-level paging while running in the execution context of EFI.

This simplifies things substantially, and makes it straightforward to
abide by stricter future requirements related to the use of writable and
executable memory under EFI, which will come into effect on x86 systems
that are certified as being 'more secure', and ship with an even shinier
Windows sticker.

This is an alternative to the work proposed by Evgeniy [0], which makes
rather radical changes to the existing decompressor. That decompressor
has already accumulated too many features, e.g., those related to
confidential computing.

EFI zboot images can be booted in two ways:
- by EFI firmware, which loads and starts the image as an ordinary EFI
  application, just like the existing EFI stub (with which it shares
  most of its code);
- by a non-EFI loader that parses the image header for the compression
  metadata, decompresses the image into memory and boots it.
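
For the second boot path, a non-EFI loader only needs a few fixed-offset
fields at the start of the image. Below is a minimal sketch in C. It
assumes the header layout emitted by the generic zboot header (MZ magic,
a "zimg" image type, payload offset and size, two reserved words, and a
NUL-terminated compression type string starting at 0x18). The struct and
function names are hypothetical, and the offsets should be checked
against the kernel tree in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the parsed fields; not a kernel structure. */
struct zboot_info {
	uint32_t payload_offset;	/* offset of compressed payload in the file */
	uint32_t payload_size;		/* size of compressed payload in bytes */
	char	 comp_type[32];		/* e.g. "gzip", NUL-terminated */
};

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Assumed layout (to be verified against the tree in use):
 *   0x00  MZ magic        0x04  "zimg"
 *   0x08  payload offset  0x0c  payload size
 *   0x18  compression type string, padded up to 0x38
 *
 * Returns 0 on success, -1 if the buffer does not look like a zboot image.
 */
static int zboot_parse_header(const uint8_t *img, size_t len,
			      struct zboot_info *out)
{
	if (len < 0x38)
		return -1;
	if (img[0] != 'M' || img[1] != 'Z')
		return -1;
	if (memcmp(img + 4, "zimg", 4) != 0)
		return -1;

	out->payload_offset = read_le32(img + 8);
	out->payload_size = read_le32(img + 12);

	/* Reject payloads that extend past the end of the file. */
	if (out->payload_offset > len ||
	    out->payload_size > len - out->payload_offset)
		return -1;

	memcpy(out->comp_type, img + 0x18, sizeof(out->comp_type));
	out->comp_type[sizeof(out->comp_type) - 1] = '\0';
	return 0;
}
```

A loader would then pick a decompressor based on comp_type and hand it
the payload_size bytes found at payload_offset. How the decompressed
kernel is entered is architecture specific and not covered by this
sketch.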

Realistically, the second option is unlikely to ever be used on x86,
given that x86 already has bzImage, but the first option is a good
choice for distros that target EFI boot only (and some distros have
already switched to this format for arm64). The fact that EFI zboot is
implemented in the same way on arm64, RISC-V, LoongArch and [shortly]
ARM helps with maintenance, not only of the kernel itself, but also of
the tooling around it relating to kexec, code signing, deployment, etc.

The series can be pulled from [1], which contains some prerequisite patches
that are only tangentially related.

[0] https://lore.kernel.org/all/[email protected]/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-x86-zboot

Cc: Evgeniy Baskov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Cc: Peter Jones <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>

Ard Biesheuvel (3):
efi/libstub: x86: Split off pieces shared with zboot
efi/zboot: x86: Implement EFI zboot support
efi/zboot: x86: Clear NX restrictions on populated code regions

arch/x86/Makefile | 18 +-
arch/x86/include/asm/efi.h | 10 +
arch/x86/kernel/head_64.S | 15 +
arch/x86/zboot/Makefile | 29 +
drivers/firmware/efi/Kconfig | 2 +-
drivers/firmware/efi/libstub/Makefile | 15 +-
drivers/firmware/efi/libstub/Makefile.zboot | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 3 +
drivers/firmware/efi/libstub/x86-stub.c | 592 +------------------
drivers/firmware/efi/libstub/x86-zboot.c | 322 ++++++++++
drivers/firmware/efi/libstub/x86.c | 612 ++++++++++++++++++++
drivers/firmware/efi/libstub/zboot.c | 3 +-
drivers/firmware/efi/libstub/zboot.lds | 5 +
13 files changed, 1031 insertions(+), 597 deletions(-)
create mode 100644 arch/x86/zboot/Makefile
create mode 100644 drivers/firmware/efi/libstub/x86-zboot.c
create mode 100644 drivers/firmware/efi/libstub/x86.c

--
2.39.2


2023-04-16 12:08:40

by Ard Biesheuvel

Subject: [RFC PATCH 1/3] efi/libstub: x86: Split off pieces shared with zboot

In preparation for implementing generic EFI zboot support on x86 as
well, split the x86 pieces into those that can be shared and those that
are tied to the way the EFI stub is currently embedded in the bare metal
x86 decompressor.

No functional change intended.
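
The call flow that results from the split can be sketched as a userspace
mock. Only the two shared function names and their signatures are taken
from the asm/efi.h hunk of this patch. The types, status values, function
bodies and the stub_entry() name are hypothetical stand-ins, so this
shows the shape of the split rather than the actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for kernel/EFI types; placeholders for illustration only. */
typedef unsigned long efi_status_t;
typedef void *efi_handle_t;
#define EFI_SUCCESS		0UL
#define EFI_OUT_OF_RESOURCES	9UL	/* placeholder, not the real encoding */

struct boot_params { unsigned char secure_boot; };

/* Shared piece: hand out a zeroed boot_params, or NULL on failure. */
struct boot_params *efi_alloc_boot_params(void)
{
	return calloc(1, sizeof(struct boot_params));
}

/*
 * Shared piece: mocked body. In the patch this takes over the steps the
 * hunk removes from efi_main() (secure boot query, graphics and PCI
 * setup, exiting boot services and building the e820 table).
 */
efi_status_t efi_x86_stub_common(struct boot_params *boot_params,
				 efi_handle_t handle)
{
	(void)handle;
	return boot_params ? EFI_SUCCESS : EFI_OUT_OF_RESOURCES;
}

/* Loader-specific piece: hypothetical entry point of one stub flavour. */
efi_status_t stub_entry(efi_handle_t handle)
{
	struct boot_params *bp = efi_alloc_boot_params();
	efi_status_t status;

	if (!bp)
		return EFI_OUT_OF_RESOURCES;

	/* Flavour-specific work (setup header, command line) goes here. */

	status = efi_x86_stub_common(bp, handle);
	free(bp);	/* mock only: the real stub hands boot_params to the kernel */
	return status;
}
```

Both the existing bzImage-embedded stub and a zboot-based entry point
could follow this pattern, differing only in the flavour-specific part.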

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/include/asm/efi.h | 5 +
drivers/firmware/efi/libstub/Makefile | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 591 +------------------
drivers/firmware/efi/libstub/x86.c | 612 ++++++++++++++++++++
4 files changed, 625 insertions(+), 585 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 419280d263d2e3f2..dd49cb9b6e3a1f1f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -214,6 +214,11 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,

/* arch specific definitions used by the stub code */

+struct boot_params *efi_alloc_boot_params(void);
+
+efi_status_t efi_x86_stub_common(struct boot_params *boot_params,
+ efi_handle_t handle);
+
#ifdef CONFIG_EFI_MIXED

#define ARCH_HAS_EFISTUB_WRAPPERS
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 3abb2b357482a416..4dfbfac254614f18 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -87,7 +87,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \

lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o smbios.o
-lib-$(CONFIG_X86) += x86-stub.o
+lib-$(CONFIG_X86) += x86.o x86-stub.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index a0bfd31358ba97b1..d2b75025295822c7 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -7,11 +7,9 @@
* ----------------------------------------------------------------------- */

#include <linux/efi.h>
-#include <linux/pci.h>
#include <linux/stddef.h>

#include <asm/efi.h>
-#include <asm/e820/types.h>
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
@@ -26,192 +24,6 @@ const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;

-static efi_status_t
-preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
-{
- struct pci_setup_rom *rom = NULL;
- efi_status_t status;
- unsigned long size;
- uint64_t romsize;
- void *romimage;
-
- /*
- * Some firmware images contain EFI function pointers at the place where
- * the romimage and romsize fields are supposed to be. Typically the EFI
- * code is mapped at high addresses, translating to an unrealistically
- * large romsize. The UEFI spec limits the size of option ROMs to 16
- * MiB so we reject any ROMs over 16 MiB in size to catch this.
- */
- romimage = efi_table_attr(pci, romimage);
- romsize = efi_table_attr(pci, romsize);
- if (!romimage || !romsize || romsize > SZ_16M)
- return EFI_INVALID_PARAMETER;
-
- size = romsize + sizeof(*rom);
-
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
- (void **)&rom);
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate memory for 'rom'\n");
- return status;
- }
-
- memset(rom, 0, sizeof(*rom));
-
- rom->data.type = SETUP_PCI;
- rom->data.len = size - sizeof(struct setup_data);
- rom->data.next = 0;
- rom->pcilen = pci->romsize;
- *__rom = rom;
-
- status = efi_call_proto(pci, pci.read, EfiPciIoWidthUint16,
- PCI_VENDOR_ID, 1, &rom->vendor);
-
- if (status != EFI_SUCCESS) {
- efi_err("Failed to read rom->vendor\n");
- goto free_struct;
- }
-
- status = efi_call_proto(pci, pci.read, EfiPciIoWidthUint16,
- PCI_DEVICE_ID, 1, &rom->devid);
-
- if (status != EFI_SUCCESS) {
- efi_err("Failed to read rom->devid\n");
- goto free_struct;
- }
-
- status = efi_call_proto(pci, get_location, &rom->segment, &rom->bus,
- &rom->device, &rom->function);
-
- if (status != EFI_SUCCESS)
- goto free_struct;
-
- memcpy(rom->romdata, romimage, romsize);
- return status;
-
-free_struct:
- efi_bs_call(free_pool, rom);
- return status;
-}
-
-/*
- * There's no way to return an informative status from this function,
- * because any analysis (and printing of error messages) needs to be
- * done directly at the EFI function call-site.
- *
- * For example, EFI_INVALID_PARAMETER could indicate a bug or maybe we
- * just didn't find any PCI devices, but there's no way to tell outside
- * the context of the call.
- */
-static void setup_efi_pci(struct boot_params *params)
-{
- efi_status_t status;
- void **pci_handle = NULL;
- efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
- unsigned long size = 0;
- struct setup_data *data;
- efi_handle_t h;
- int i;
-
- status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
- &pci_proto, NULL, &size, pci_handle);
-
- if (status == EFI_BUFFER_TOO_SMALL) {
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
- (void **)&pci_handle);
-
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate memory for 'pci_handle'\n");
- return;
- }
-
- status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
- &pci_proto, NULL, &size, pci_handle);
- }
-
- if (status != EFI_SUCCESS)
- goto free_handle;
-
- data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
- while (data && data->next)
- data = (struct setup_data *)(unsigned long)data->next;
-
- for_each_efi_handle(h, pci_handle, size, i) {
- efi_pci_io_protocol_t *pci = NULL;
- struct pci_setup_rom *rom;
-
- status = efi_bs_call(handle_protocol, h, &pci_proto,
- (void **)&pci);
- if (status != EFI_SUCCESS || !pci)
- continue;
-
- status = preserve_pci_rom_image(pci, &rom);
- if (status != EFI_SUCCESS)
- continue;
-
- if (data)
- data->next = (unsigned long)rom;
- else
- params->hdr.setup_data = (unsigned long)rom;
-
- data = (struct setup_data *)rom;
- }
-
-free_handle:
- efi_bs_call(free_pool, pci_handle);
-}
-
-static void retrieve_apple_device_properties(struct boot_params *boot_params)
-{
- efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID;
- struct setup_data *data, *new;
- efi_status_t status;
- u32 size = 0;
- apple_properties_protocol_t *p;
-
- status = efi_bs_call(locate_protocol, &guid, NULL, (void **)&p);
- if (status != EFI_SUCCESS)
- return;
-
- if (efi_table_attr(p, version) != 0x10000) {
- efi_err("Unsupported properties proto version\n");
- return;
- }
-
- efi_call_proto(p, get_all, NULL, &size);
- if (!size)
- return;
-
- do {
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA,
- size + sizeof(struct setup_data),
- (void **)&new);
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate memory for 'properties'\n");
- return;
- }
-
- status = efi_call_proto(p, get_all, new->data, &size);
-
- if (status == EFI_BUFFER_TOO_SMALL)
- efi_bs_call(free_pool, new);
- } while (status == EFI_BUFFER_TOO_SMALL);
-
- new->type = SETUP_APPLE_PROPERTIES;
- new->len = size;
- new->next = 0;
-
- data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
- if (!data) {
- boot_params->hdr.setup_data = (unsigned long)new;
- } else {
- while (data->next)
- data = (struct setup_data *)(unsigned long)data->next;
- data->next = (unsigned long)new;
- }
-}
-
static void
adjust_memory_range_protection(unsigned long start, unsigned long size)
{
@@ -310,134 +122,6 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
#endif
}

-static const efi_char16_t apple[] = L"Apple";
-
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
-{
- efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
- efi_table_attr(efi_system_table, fw_vendor);
-
- if (!memcmp(fw_vendor, apple, sizeof(apple))) {
- if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
- retrieve_apple_device_properties(boot_params);
- }
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
-}
-
-/*
- * See if we have Universal Graphics Adapter (UGA) protocol
- */
-static efi_status_t
-setup_uga(struct screen_info *si, efi_guid_t *uga_proto, unsigned long size)
-{
- efi_status_t status;
- u32 width, height;
- void **uga_handle = NULL;
- efi_uga_draw_protocol_t *uga = NULL, *first_uga;
- efi_handle_t handle;
- int i;
-
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
- (void **)&uga_handle);
- if (status != EFI_SUCCESS)
- return status;
-
- status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
- uga_proto, NULL, &size, uga_handle);
- if (status != EFI_SUCCESS)
- goto free_handle;
-
- height = 0;
- width = 0;
-
- first_uga = NULL;
- for_each_efi_handle(handle, uga_handle, size, i) {
- efi_guid_t pciio_proto = EFI_PCI_IO_PROTOCOL_GUID;
- u32 w, h, depth, refresh;
- void *pciio;
-
- status = efi_bs_call(handle_protocol, handle, uga_proto,
- (void **)&uga);
- if (status != EFI_SUCCESS)
- continue;
-
- pciio = NULL;
- efi_bs_call(handle_protocol, handle, &pciio_proto, &pciio);
-
- status = efi_call_proto(uga, get_mode, &w, &h, &depth, &refresh);
- if (status == EFI_SUCCESS && (!first_uga || pciio)) {
- width = w;
- height = h;
-
- /*
- * Once we've found a UGA supporting PCIIO,
- * don't bother looking any further.
- */
- if (pciio)
- break;
-
- first_uga = uga;
- }
- }
-
- if (!width && !height)
- goto free_handle;
-
- /* EFI framebuffer */
- si->orig_video_isVGA = VIDEO_TYPE_EFI;
-
- si->lfb_depth = 32;
- si->lfb_width = width;
- si->lfb_height = height;
-
- si->red_size = 8;
- si->red_pos = 16;
- si->green_size = 8;
- si->green_pos = 8;
- si->blue_size = 8;
- si->blue_pos = 0;
- si->rsvd_size = 8;
- si->rsvd_pos = 24;
-
-free_handle:
- efi_bs_call(free_pool, uga_handle);
-
- return status;
-}
-
-static void setup_graphics(struct boot_params *boot_params)
-{
- efi_guid_t graphics_proto = EFI_GRAPHICS_OUTPUT_PROTOCOL_GUID;
- struct screen_info *si;
- efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
- efi_status_t status;
- unsigned long size;
- void **gop_handle = NULL;
- void **uga_handle = NULL;
-
- si = &boot_params->screen_info;
- memset(si, 0, sizeof(*si));
-
- size = 0;
- status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
- &graphics_proto, NULL, &size, gop_handle);
- if (status == EFI_BUFFER_TOO_SMALL)
- status = efi_setup_gop(si, &graphics_proto, size);
-
- if (status != EFI_SUCCESS) {
- size = 0;
- status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
- &uga_proto, NULL, &size, uga_handle);
- if (status == EFI_BUFFER_TOO_SMALL)
- setup_uga(si, &uga_proto, size);
- }
-}
-
-
static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
{
efi_bs_call(exit, handle, status, 0, NULL);
@@ -480,14 +164,9 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
image_base = efi_table_attr(image, image_base);
image_offset = (void *)startup_32 - image_base;

- status = efi_allocate_pages(sizeof(struct boot_params),
- (unsigned long *)&boot_params, ULONG_MAX);
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate lowmem for boot params\n");
- efi_exit(handle, status);
- }
-
- memset(boot_params, 0x0, sizeof(struct boot_params));
+ boot_params = efi_alloc_boot_params();
+ if (!boot_params)
+ efi_exit(handle, EFI_OUT_OF_RESOURCES);

hdr = &boot_params->hdr;

@@ -495,14 +174,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
memcpy(&hdr->jump, image_base + 512,
sizeof(struct setup_header) - offsetof(struct setup_header, jump));

- /*
- * Fill out some of the header fields ourselves because the
- * EFI firmware loader doesn't load the first sector.
- */
- hdr->root_flags = 1;
- hdr->vid_mode = 0xffff;
- hdr->boot_flag = 0xAA55;
-
hdr->type_of_loader = 0x21;

/* Convert unicode cmdline to ascii */
@@ -532,234 +203,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_exit(handle, status);
}

-static void add_e820ext(struct boot_params *params,
- struct setup_data *e820ext, u32 nr_entries)
-{
- struct setup_data *data;
-
- e820ext->type = SETUP_E820_EXT;
- e820ext->len = nr_entries * sizeof(struct boot_e820_entry);
- e820ext->next = 0;
-
- data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
-
- while (data && data->next)
- data = (struct setup_data *)(unsigned long)data->next;
-
- if (data)
- data->next = (unsigned long)e820ext;
- else
- params->hdr.setup_data = (unsigned long)e820ext;
-}
-
-static efi_status_t
-setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_size)
-{
- struct boot_e820_entry *entry = params->e820_table;
- struct efi_info *efi = &params->efi_info;
- struct boot_e820_entry *prev = NULL;
- u32 nr_entries;
- u32 nr_desc;
- int i;
-
- nr_entries = 0;
- nr_desc = efi->efi_memmap_size / efi->efi_memdesc_size;
-
- for (i = 0; i < nr_desc; i++) {
- efi_memory_desc_t *d;
- unsigned int e820_type = 0;
- unsigned long m = efi->efi_memmap;
-
-#ifdef CONFIG_X86_64
- m |= (u64)efi->efi_memmap_hi << 32;
-#endif
-
- d = efi_early_memdesc_ptr(m, efi->efi_memdesc_size, i);
- switch (d->type) {
- case EFI_RESERVED_TYPE:
- case EFI_RUNTIME_SERVICES_CODE:
- case EFI_RUNTIME_SERVICES_DATA:
- case EFI_MEMORY_MAPPED_IO:
- case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
- case EFI_PAL_CODE:
- e820_type = E820_TYPE_RESERVED;
- break;
-
- case EFI_UNUSABLE_MEMORY:
- e820_type = E820_TYPE_UNUSABLE;
- break;
-
- case EFI_ACPI_RECLAIM_MEMORY:
- e820_type = E820_TYPE_ACPI;
- break;
-
- case EFI_LOADER_CODE:
- case EFI_LOADER_DATA:
- case EFI_BOOT_SERVICES_CODE:
- case EFI_BOOT_SERVICES_DATA:
- case EFI_CONVENTIONAL_MEMORY:
- if (efi_soft_reserve_enabled() &&
- (d->attribute & EFI_MEMORY_SP))
- e820_type = E820_TYPE_SOFT_RESERVED;
- else
- e820_type = E820_TYPE_RAM;
- break;
-
- case EFI_ACPI_MEMORY_NVS:
- e820_type = E820_TYPE_NVS;
- break;
-
- case EFI_PERSISTENT_MEMORY:
- e820_type = E820_TYPE_PMEM;
- break;
-
- default:
- continue;
- }
-
- /* Merge adjacent mappings */
- if (prev && prev->type == e820_type &&
- (prev->addr + prev->size) == d->phys_addr) {
- prev->size += d->num_pages << 12;
- continue;
- }
-
- if (nr_entries == ARRAY_SIZE(params->e820_table)) {
- u32 need = (nr_desc - i) * sizeof(struct e820_entry) +
- sizeof(struct setup_data);
-
- if (!e820ext || e820ext_size < need)
- return EFI_BUFFER_TOO_SMALL;
-
- /* boot_params map full, switch to e820 extended */
- entry = (struct boot_e820_entry *)e820ext->data;
- }
-
- entry->addr = d->phys_addr;
- entry->size = d->num_pages << PAGE_SHIFT;
- entry->type = e820_type;
- prev = entry++;
- nr_entries++;
- }
-
- if (nr_entries > ARRAY_SIZE(params->e820_table)) {
- u32 nr_e820ext = nr_entries - ARRAY_SIZE(params->e820_table);
-
- add_e820ext(params, e820ext, nr_e820ext);
- nr_entries -= nr_e820ext;
- }
-
- params->e820_entries = (u8)nr_entries;
-
- return EFI_SUCCESS;
-}
-
-static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
- u32 *e820ext_size)
-{
- efi_status_t status;
- unsigned long size;
-
- size = sizeof(struct setup_data) +
- sizeof(struct e820_entry) * nr_desc;
-
- if (*e820ext) {
- efi_bs_call(free_pool, *e820ext);
- *e820ext = NULL;
- *e820ext_size = 0;
- }
-
- status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
- (void **)e820ext);
- if (status == EFI_SUCCESS)
- *e820ext_size = size;
-
- return status;
-}
-
-static efi_status_t allocate_e820(struct boot_params *params,
- struct setup_data **e820ext,
- u32 *e820ext_size)
-{
- unsigned long map_size, desc_size, map_key;
- efi_status_t status;
- __u32 nr_desc, desc_version;
-
- /* Only need the size of the mem map and size of each mem descriptor */
- map_size = 0;
- status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
- &desc_size, &desc_version);
- if (status != EFI_BUFFER_TOO_SMALL)
- return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
-
- nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
-
- if (nr_desc > ARRAY_SIZE(params->e820_table)) {
- u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
-
- status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
- if (status != EFI_SUCCESS)
- return status;
- }
-
- return EFI_SUCCESS;
-}
-
-struct exit_boot_struct {
- struct boot_params *boot_params;
- struct efi_info *efi;
-};
-
-static efi_status_t exit_boot_func(struct efi_boot_memmap *map,
- void *priv)
-{
- const char *signature;
- struct exit_boot_struct *p = priv;
-
- signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
- : EFI32_LOADER_SIGNATURE;
- memcpy(&p->efi->efi_loader_signature, signature, sizeof(__u32));
-
- efi_set_u64_split((unsigned long)efi_system_table,
- &p->efi->efi_systab, &p->efi->efi_systab_hi);
- p->efi->efi_memdesc_size = map->desc_size;
- p->efi->efi_memdesc_version = map->desc_ver;
- efi_set_u64_split((unsigned long)map->map,
- &p->efi->efi_memmap, &p->efi->efi_memmap_hi);
- p->efi->efi_memmap_size = map->map_size;
-
- return EFI_SUCCESS;
-}
-
-static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
-{
- struct setup_data *e820ext = NULL;
- __u32 e820ext_size = 0;
- efi_status_t status;
- struct exit_boot_struct priv;
-
- priv.boot_params = boot_params;
- priv.efi = &boot_params->efi_info;
-
- status = allocate_e820(boot_params, &e820ext, &e820ext_size);
- if (status != EFI_SUCCESS)
- return status;
-
- /* Might as well exit boot services now */
- status = efi_exit_boot_services(handle, &priv, exit_boot_func);
- if (status != EFI_SUCCESS)
- return status;
-
- /* Historic? */
- boot_params->alt_mem_k = 32 * 1024;
-
- status = setup_e820(boot_params, e820ext, e820ext_size);
- if (status != EFI_SUCCESS)
- return status;
-
- return EFI_SUCCESS;
-}
-
/*
* On success, we return the address of startup_32, which has potentially been
* relocated by efi_relocate_kernel.
@@ -878,32 +321,12 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
&boot_params->ext_ramdisk_size);
}

+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
+ setup_memory_protection(bzimage_addr, buffer_end - buffer_start);

- /*
- * If the boot loader gave us a value for secure_boot then we use that,
- * otherwise we ask the BIOS.
- */
- if (boot_params->secure_boot == efi_secureboot_mode_unset)
- boot_params->secure_boot = efi_get_secureboot();
-
- /* Ask the firmware to clear memory on unclean shutdown */
- efi_enable_reset_attack_mitigation();
-
- efi_random_get_seed();
-
- efi_retrieve_tpm2_eventlog();
-
- setup_graphics(boot_params);
-
- setup_efi_pci(boot_params);
-
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
-
- status = exit_boot(boot_params, handle);
- if (status != EFI_SUCCESS) {
- efi_err("exit_boot() failed!\n");
+ status = efi_x86_stub_common(boot_params, handle);
+ if (status != EFI_SUCCESS)
goto fail;
- }

return bzimage_addr;
fail:
diff --git a/drivers/firmware/efi/libstub/x86.c b/drivers/firmware/efi/libstub/x86.c
new file mode 100644
index 0000000000000000..fcaf69eace751f17
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86.c
@@ -0,0 +1,612 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/* -----------------------------------------------------------------------
+ *
+ * Copyright 2011 Intel Corporation; author Matt Fleming
+ *
+ * ----------------------------------------------------------------------- */
+
+#include <linux/efi.h>
+#include <linux/pci.h>
+#include <linux/stddef.h>
+
+#include <asm/efi.h>
+#include <asm/e820/types.h>
+#include <asm/setup.h>
+#include <asm/desc.h>
+#include <asm/boot.h>
+
+#include "efistub.h"
+
+static void add_e820ext(struct boot_params *params,
+ struct setup_data *e820ext, u32 nr_entries)
+{
+ struct setup_data *data;
+
+ e820ext->type = SETUP_E820_EXT;
+ e820ext->len = nr_entries * sizeof(struct boot_e820_entry);
+ e820ext->next = 0;
+
+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
+
+ while (data && data->next)
+ data = (struct setup_data *)(unsigned long)data->next;
+
+ if (data)
+ data->next = (unsigned long)e820ext;
+ else
+ params->hdr.setup_data = (unsigned long)e820ext;
+}
+
+static efi_status_t
+setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_size)
+{
+ struct boot_e820_entry *entry = params->e820_table;
+ struct efi_info *efi = &params->efi_info;
+ struct boot_e820_entry *prev = NULL;
+ u32 nr_entries;
+ u32 nr_desc;
+ int i;
+
+ nr_entries = 0;
+ nr_desc = efi->efi_memmap_size / efi->efi_memdesc_size;
+
+ for (i = 0; i < nr_desc; i++) {
+ efi_memory_desc_t *d;
+ unsigned int e820_type = 0;
+ unsigned long m = efi->efi_memmap;
+
+#ifdef CONFIG_X86_64
+ m |= (u64)efi->efi_memmap_hi << 32;
+#endif
+
+ d = efi_early_memdesc_ptr(m, efi->efi_memdesc_size, i);
+ switch (d->type) {
+ case EFI_RESERVED_TYPE:
+ case EFI_RUNTIME_SERVICES_CODE:
+ case EFI_RUNTIME_SERVICES_DATA:
+ case EFI_MEMORY_MAPPED_IO:
+ case EFI_MEMORY_MAPPED_IO_PORT_SPACE:
+ case EFI_PAL_CODE:
+ e820_type = E820_TYPE_RESERVED;
+ break;
+
+ case EFI_UNUSABLE_MEMORY:
+ e820_type = E820_TYPE_UNUSABLE;
+ break;
+
+ case EFI_ACPI_RECLAIM_MEMORY:
+ e820_type = E820_TYPE_ACPI;
+ break;
+
+ case EFI_LOADER_CODE:
+ case EFI_LOADER_DATA:
+ case EFI_BOOT_SERVICES_CODE:
+ case EFI_BOOT_SERVICES_DATA:
+ case EFI_CONVENTIONAL_MEMORY:
+ if (efi_soft_reserve_enabled() &&
+ (d->attribute & EFI_MEMORY_SP))
+ e820_type = E820_TYPE_SOFT_RESERVED;
+ else
+ e820_type = E820_TYPE_RAM;
+ break;
+
+ case EFI_ACPI_MEMORY_NVS:
+ e820_type = E820_TYPE_NVS;
+ break;
+
+ case EFI_PERSISTENT_MEMORY:
+ e820_type = E820_TYPE_PMEM;
+ break;
+
+ default:
+ continue;
+ }
+
+ /* Merge adjacent mappings */
+ if (prev && prev->type == e820_type &&
+ (prev->addr + prev->size) == d->phys_addr) {
+ prev->size += d->num_pages << 12;
+ continue;
+ }
+
+ if (nr_entries == ARRAY_SIZE(params->e820_table)) {
+ u32 need = (nr_desc - i) * sizeof(struct e820_entry) +
+ sizeof(struct setup_data);
+
+ if (!e820ext || e820ext_size < need)
+ return EFI_BUFFER_TOO_SMALL;
+
+ /* boot_params map full, switch to e820 extended */
+ entry = (struct boot_e820_entry *)e820ext->data;
+ }
+
+ entry->addr = d->phys_addr;
+ entry->size = d->num_pages << PAGE_SHIFT;
+ entry->type = e820_type;
+ prev = entry++;
+ nr_entries++;
+ }
+
+ if (nr_entries > ARRAY_SIZE(params->e820_table)) {
+ u32 nr_e820ext = nr_entries - ARRAY_SIZE(params->e820_table);
+
+ add_e820ext(params, e820ext, nr_e820ext);
+ nr_entries -= nr_e820ext;
+ }
+
+ params->e820_entries = (u8)nr_entries;
+
+ return EFI_SUCCESS;
+}
+
+static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext,
+ u32 *e820ext_size)
+{
+ efi_status_t status;
+ unsigned long size;
+
+ size = sizeof(struct setup_data) +
+ sizeof(struct e820_entry) * nr_desc;
+
+ if (*e820ext) {
+ efi_bs_call(free_pool, *e820ext);
+ *e820ext = NULL;
+ *e820ext_size = 0;
+ }
+
+ status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
+ (void **)e820ext);
+ if (status == EFI_SUCCESS)
+ *e820ext_size = size;
+
+ return status;
+}
+
+static efi_status_t allocate_e820(struct boot_params *params,
+ struct setup_data **e820ext,
+ u32 *e820ext_size)
+{
+ unsigned long map_size, desc_size, map_key;
+ efi_status_t status;
+ __u32 nr_desc, desc_version;
+
+ /* Only need the size of the mem map and size of each mem descriptor */
+ map_size = 0;
+ status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key,
+ &desc_size, &desc_version);
+ if (status != EFI_BUFFER_TOO_SMALL)
+ return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED;
+
+ nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS;
+
+ if (nr_desc > ARRAY_SIZE(params->e820_table)) {
+ u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table);
+
+ status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size);
+ if (status != EFI_SUCCESS)
+ return status;
+ }
+
+ return EFI_SUCCESS;
+}
+
+struct exit_boot_struct {
+ struct boot_params *boot_params;
+ struct efi_info *efi;
+};
+
+static efi_status_t exit_boot_func(struct efi_boot_memmap *map,
+ void *priv)
+{
+ const char *signature;
+ struct exit_boot_struct *p = priv;
+
+ signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
+ : EFI32_LOADER_SIGNATURE;
+ memcpy(&p->efi->efi_loader_signature, signature, sizeof(__u32));
+
+ efi_set_u64_split((unsigned long)efi_system_table,
+ &p->efi->efi_systab, &p->efi->efi_systab_hi);
+ p->efi->efi_memdesc_size = map->desc_size;
+ p->efi->efi_memdesc_version = map->desc_ver;
+ efi_set_u64_split((unsigned long)map->map,
+ &p->efi->efi_memmap, &p->efi->efi_memmap_hi);
+ p->efi->efi_memmap_size = map->map_size;
+
+ return EFI_SUCCESS;
+}
+
+static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
+{
+ struct setup_data *e820ext = NULL;
+ __u32 e820ext_size = 0;
+ efi_status_t status;
+ struct exit_boot_struct priv;
+
+ priv.boot_params = boot_params;
+ priv.efi = &boot_params->efi_info;
+
+ status = allocate_e820(boot_params, &e820ext, &e820ext_size);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ /* Might as well exit boot services now */
+ status = efi_exit_boot_services(handle, &priv, exit_boot_func);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ /* Historic? */
+ boot_params->alt_mem_k = 32 * 1024;
+
+ status = setup_e820(boot_params, e820ext, e820ext_size);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ return EFI_SUCCESS;
+}
+
+static efi_status_t
+preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
+{
+ struct pci_setup_rom *rom = NULL;
+ efi_status_t status;
+ unsigned long size;
+ uint64_t romsize;
+ void *romimage;
+
+ /*
+ * Some firmware images contain EFI function pointers at the place where
+ * the romimage and romsize fields are supposed to be. Typically the EFI
+ * code is mapped at high addresses, translating to an unrealistically
+ * large romsize. The UEFI spec limits the size of option ROMs to 16
+ * MiB so we reject any ROMs over 16 MiB in size to catch this.
+ */
+ romimage = efi_table_attr(pci, romimage);
+ romsize = efi_table_attr(pci, romsize);
+ if (!romimage || !romsize || romsize > SZ_16M)
+ return EFI_INVALID_PARAMETER;
+
+ size = romsize + sizeof(*rom);
+
+ status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
+ (void **)&rom);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to allocate memory for 'rom'\n");
+ return status;
+ }
+
+ memset(rom, 0, sizeof(*rom));
+
+ rom->data.type = SETUP_PCI;
+ rom->data.len = size - sizeof(struct setup_data);
+ rom->data.next = 0;
+ rom->pcilen = pci->romsize;
+ *__rom = rom;
+
+ status = efi_call_proto(pci, pci.read, EfiPciIoWidthUint16,
+ PCI_VENDOR_ID, 1, &rom->vendor);
+
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to read rom->vendor\n");
+ goto free_struct;
+ }
+
+ status = efi_call_proto(pci, pci.read, EfiPciIoWidthUint16,
+ PCI_DEVICE_ID, 1, &rom->devid);
+
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to read rom->devid\n");
+ goto free_struct;
+ }
+
+ status = efi_call_proto(pci, get_location, &rom->segment, &rom->bus,
+ &rom->device, &rom->function);
+
+ if (status != EFI_SUCCESS)
+ goto free_struct;
+
+ memcpy(rom->romdata, romimage, romsize);
+ return status;
+
+free_struct:
+ efi_bs_call(free_pool, rom);
+ return status;
+}
+
+/*
+ * There's no way to return an informative status from this function,
+ * because any analysis (and printing of error messages) needs to be
+ * done directly at the EFI function call-site.
+ *
+ * For example, EFI_INVALID_PARAMETER could indicate a bug or maybe we
+ * just didn't find any PCI devices, but there's no way to tell outside
+ * the context of the call.
+ */
+static void setup_efi_pci(struct boot_params *params)
+{
+ efi_status_t status;
+ void **pci_handle = NULL;
+ efi_guid_t pci_proto = EFI_PCI_IO_PROTOCOL_GUID;
+ unsigned long size = 0;
+ struct setup_data *data;
+ efi_handle_t h;
+ int i;
+
+ status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
+ &pci_proto, NULL, &size, pci_handle);
+
+ if (status == EFI_BUFFER_TOO_SMALL) {
+ status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
+ (void **)&pci_handle);
+
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to allocate memory for 'pci_handle'\n");
+ return;
+ }
+
+ status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
+ &pci_proto, NULL, &size, pci_handle);
+ }
+
+ if (status != EFI_SUCCESS)
+ goto free_handle;
+
+ data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
+
+ while (data && data->next)
+ data = (struct setup_data *)(unsigned long)data->next;
+
+ for_each_efi_handle(h, pci_handle, size, i) {
+ efi_pci_io_protocol_t *pci = NULL;
+ struct pci_setup_rom *rom;
+
+ status = efi_bs_call(handle_protocol, h, &pci_proto,
+ (void **)&pci);
+ if (status != EFI_SUCCESS || !pci)
+ continue;
+
+ status = preserve_pci_rom_image(pci, &rom);
+ if (status != EFI_SUCCESS)
+ continue;
+
+ if (data)
+ data->next = (unsigned long)rom;
+ else
+ params->hdr.setup_data = (unsigned long)rom;
+
+ data = (struct setup_data *)rom;
+ }
+
+free_handle:
+ efi_bs_call(free_pool, pci_handle);
+}
+
+/*
+ * See if we have Universal Graphics Adapter (UGA) protocol
+ */
+static efi_status_t
+setup_uga(struct screen_info *si, efi_guid_t *uga_proto, unsigned long size)
+{
+ efi_status_t status;
+ u32 width, height;
+ void **uga_handle = NULL;
+ efi_uga_draw_protocol_t *uga = NULL, *first_uga;
+ efi_handle_t handle;
+ int i;
+
+ status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
+ (void **)&uga_handle);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
+ uga_proto, NULL, &size, uga_handle);
+ if (status != EFI_SUCCESS)
+ goto free_handle;
+
+ height = 0;
+ width = 0;
+
+ first_uga = NULL;
+ for_each_efi_handle(handle, uga_handle, size, i) {
+ efi_guid_t pciio_proto = EFI_PCI_IO_PROTOCOL_GUID;
+ u32 w, h, depth, refresh;
+ void *pciio;
+
+ status = efi_bs_call(handle_protocol, handle, uga_proto,
+ (void **)&uga);
+ if (status != EFI_SUCCESS)
+ continue;
+
+ pciio = NULL;
+ efi_bs_call(handle_protocol, handle, &pciio_proto, &pciio);
+
+ status = efi_call_proto(uga, get_mode, &w, &h, &depth, &refresh);
+ if (status == EFI_SUCCESS && (!first_uga || pciio)) {
+ width = w;
+ height = h;
+
+ /*
+ * Once we've found a UGA supporting PCIIO,
+ * don't bother looking any further.
+ */
+ if (pciio)
+ break;
+
+ first_uga = uga;
+ }
+ }
+
+ if (!width && !height)
+ goto free_handle;
+
+ /* EFI framebuffer */
+ si->orig_video_isVGA = VIDEO_TYPE_EFI;
+
+ si->lfb_depth = 32;
+ si->lfb_width = width;
+ si->lfb_height = height;
+
+ si->red_size = 8;
+ si->red_pos = 16;
+ si->green_size = 8;
+ si->green_pos = 8;
+ si->blue_size = 8;
+ si->blue_pos = 0;
+ si->rsvd_size = 8;
+ si->rsvd_pos = 24;
+
+free_handle:
+ efi_bs_call(free_pool, uga_handle);
+
+ return status;
+}
+
+static void setup_graphics(struct boot_params *boot_params)
+{
+ efi_guid_t graphics_proto = EFI_GRAPHICS_OUTPUT_PROTOCOL_GUID;
+ struct screen_info *si;
+ efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
+ efi_status_t status;
+ unsigned long size;
+ void **gop_handle = NULL;
+ void **uga_handle = NULL;
+
+ si = &boot_params->screen_info;
+ memset(si, 0, sizeof(*si));
+
+ size = 0;
+ status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
+ &graphics_proto, NULL, &size, gop_handle);
+ if (status == EFI_BUFFER_TOO_SMALL)
+ status = efi_setup_gop(si, &graphics_proto, size);
+
+ if (status != EFI_SUCCESS) {
+ size = 0;
+ status = efi_bs_call(locate_handle, EFI_LOCATE_BY_PROTOCOL,
+ &uga_proto, NULL, &size, uga_handle);
+ if (status == EFI_BUFFER_TOO_SMALL)
+ setup_uga(si, &uga_proto, size);
+ }
+}
+
+static void retrieve_apple_device_properties(struct boot_params *boot_params)
+{
+ efi_guid_t guid = APPLE_PROPERTIES_PROTOCOL_GUID;
+ struct setup_data *data, *new;
+ efi_status_t status;
+ u32 size = 0;
+ apple_properties_protocol_t *p;
+
+ status = efi_bs_call(locate_protocol, &guid, NULL, (void **)&p);
+ if (status != EFI_SUCCESS)
+ return;
+
+ if (efi_table_attr(p, version) != 0x10000) {
+ efi_err("Unsupported properties proto version\n");
+ return;
+ }
+
+ efi_call_proto(p, get_all, NULL, &size);
+ if (!size)
+ return;
+
+ do {
+ status = efi_bs_call(allocate_pool, EFI_LOADER_DATA,
+ size + sizeof(struct setup_data),
+ (void **)&new);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to allocate memory for 'properties'\n");
+ return;
+ }
+
+ status = efi_call_proto(p, get_all, new->data, &size);
+
+ if (status == EFI_BUFFER_TOO_SMALL)
+ efi_bs_call(free_pool, new);
+ } while (status == EFI_BUFFER_TOO_SMALL);
+
+ new->type = SETUP_APPLE_PROPERTIES;
+ new->len = size;
+ new->next = 0;
+
+ data = (struct setup_data *)(unsigned long)boot_params->hdr.setup_data;
+ if (!data) {
+ boot_params->hdr.setup_data = (unsigned long)new;
+ } else {
+ while (data->next)
+ data = (struct setup_data *)(unsigned long)data->next;
+ data->next = (unsigned long)new;
+ }
+}
+
+static const efi_char16_t apple[] = L"Apple";
+
+static void setup_quirks(struct boot_params *boot_params)
+{
+ efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
+ efi_table_attr(efi_system_table, fw_vendor);
+
+ if (!memcmp(fw_vendor, apple, sizeof(apple))) {
+ if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
+ retrieve_apple_device_properties(boot_params);
+ }
+}
+
+efi_status_t efi_x86_stub_common(struct boot_params *boot_params,
+ efi_handle_t handle)
+{
+ efi_status_t status;
+
+ /*
+ * If the boot loader gave us a value for secure_boot then we use that,
+ * otherwise we ask the BIOS.
+ */
+ if (boot_params->secure_boot == efi_secureboot_mode_unset)
+ boot_params->secure_boot = efi_get_secureboot();
+
+ /* Ask the firmware to clear memory on unclean shutdown */
+ efi_enable_reset_attack_mitigation();
+
+ efi_random_get_seed();
+
+ efi_retrieve_tpm2_eventlog();
+
+ setup_graphics(boot_params);
+
+ setup_efi_pci(boot_params);
+
+ setup_quirks(boot_params);
+
+ status = exit_boot(boot_params, handle);
+ if (status != EFI_SUCCESS)
+ efi_err("exit_boot() failed!\n");
+
+ return status;
+}
+
+struct boot_params *efi_alloc_boot_params(void)
+{
+ struct boot_params *boot_params;
+ efi_status_t status;
+
+ status = efi_allocate_pages(sizeof(struct boot_params),
+ (unsigned long *)&boot_params, ULONG_MAX);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to allocate lowmem for boot params\n");
+ return NULL;
+ }
+
+ memset(boot_params, 0x0, sizeof(struct boot_params));
+
+ /*
+ * Fill out some of the header fields ourselves because the
+ * EFI firmware loader doesn't load the first sector.
+ */
+ boot_params->hdr.root_flags = 1;
+ boot_params->hdr.vid_mode = 0xffff;
+ boot_params->hdr.boot_flag = 0xAA55;
+
+ return boot_params;
+}
--
2.39.2
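
For readers following setup_efi_pci() and retrieve_apple_device_properties()
in the patch above: both hand data to the kernel by walking the boot_params
setup_data chain to its tail and linking a new node there. Below is a minimal
standalone sketch of that pattern in plain C. It assumes a simplified node
layout; struct setup_node, setup_chain_append() and setup_chain_length() are
hypothetical names used for illustration only, not kernel or EFI stub API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct setup_data (assumption). */
struct setup_node {
	uint64_t next;	/* address of the next node, 0 terminates the chain */
	uint32_t type;
	uint32_t len;
};

/*
 * Append 'node' to the chain whose head address is stored in '*head'.
 * An empty chain (head == 0) gets 'node' as its first element; otherwise
 * we walk to the last node and link 'node' behind it.
 */
void setup_chain_append(uint64_t *head, struct setup_node *node)
{
	struct setup_node *cur = (struct setup_node *)(uintptr_t)*head;

	assert(head != NULL && node != NULL);
	node->next = 0;

	if (!cur) {
		*head = (uint64_t)(uintptr_t)node;
		return;
	}

	while (cur->next)
		cur = (struct setup_node *)(uintptr_t)cur->next;
	cur->next = (uint64_t)(uintptr_t)node;
}

/* Count the nodes reachable from 'head'. */
size_t setup_chain_length(uint64_t head)
{
	struct setup_node *cur = (struct setup_node *)(uintptr_t)head;
	size_t n = 0;

	while (cur) {
		n++;
		cur = (struct setup_node *)(uintptr_t)cur->next;
	}
	return n;
}
```

The addresses are stored as 64-bit integers rather than pointers because the
chain is consumed later by code that may run under a different mapping; the
sketch keeps that convention but otherwise makes no claim about the real
structure layout.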

2023-04-16 12:08:46

by Ard Biesheuvel

Subject: [RFC PATCH 2/3] efi/zboot: x86: Implement EFI zboot support

Wire up the Kbuild rules and implement the missing pieces that permit
the 64-bit x86 kernel to be built as an EFI zboot image, i.e., the
generic self-decompressing format that is already supported for arm64,
RISC-V, LoongArch and [shortly] ARM.

Both physical and virtual KASLR are supported, as well as 5 level
paging, which are the primary reasons we rely on the bare metal
decompressor today.

EFI mixed mode (i.e., running the 64-bit kernel on a 64-bit CPU that
booted using 32-bit firmware) is not supported - 32-bit EFI may not
enable paging at all, or run with PAE disabled, in which case the long
mode switch requires setting up new page tables etc. Implementing mixed
mode in a way that only supports 32-bit firmware that enters with paging
and PAE enabled should be rather straight-forward, and could be
considered as a future enhancement.

Another thing that is not supported is the EFI handover protocol, which
has no basis in the EFI spec, and is only implemented by downstream GRUB
builds packaged by the distros.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/Makefile | 18 +-
arch/x86/include/asm/efi.h | 5 +
arch/x86/kernel/head_64.S | 11 +
arch/x86/zboot/Makefile | 29 ++
drivers/firmware/efi/Kconfig | 2 +-
drivers/firmware/efi/libstub/Makefile | 13 +-
drivers/firmware/efi/libstub/Makefile.zboot | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 3 +
drivers/firmware/efi/libstub/x86-stub.c | 1 -
drivers/firmware/efi/libstub/x86-zboot.c | 295 ++++++++++++++++++++
drivers/firmware/efi/libstub/zboot.c | 3 +-
drivers/firmware/efi/libstub/zboot.lds | 5 +
12 files changed, 375 insertions(+), 12 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b39975977c037c03..a9ef9f6679c8a3ef 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -268,25 +268,33 @@ boot := arch/x86/boot

BOOT_TARGETS = bzdisk fdimage fdimage144 fdimage288 hdimage isoimage

-PHONY += bzImage $(BOOT_TARGETS)
-
-# Default kernel to build
-all: bzImage
+PHONY += bzImage vmlinuz.efi $(BOOT_TARGETS)

# KBUILD_IMAGE specify target image being built
+ifeq ($(CONFIG_EFI_ZBOOT),)
KBUILD_IMAGE := $(boot)/bzImage
+else
+KBUILD_IMAGE := arch/x86/zboot/vmlinuz.efi
+endif
+
+# Default kernel to build
+all: $(notdir $(KBUILD_IMAGE))

bzImage: vmlinux
ifeq ($(CONFIG_X86_DECODER_SELFTEST),y)
$(Q)$(MAKE) $(build)=arch/x86/tools posttest
endif
- $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
+ $(Q)$(MAKE) $(build)=$(boot) $(boot)/$(@)
$(Q)mkdir -p $(objtree)/arch/$(UTS_MACHINE)/boot
$(Q)ln -fsn ../../x86/boot/bzImage $(objtree)/arch/$(UTS_MACHINE)/boot/$@

$(BOOT_TARGETS): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $@

+vmlinuz.efi: zboot := arch/x86/zboot
+vmlinuz.efi: vmlinux
+ $(Q)$(MAKE) $(build)=$(zboot) $(zboot)/$@
+
PHONY += install
install:
$(call cmd,install)
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index dd49cb9b6e3a1f1f..35d49f45260d3c72 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -471,4 +471,9 @@ static inline int efi_runtime_map_copy(void *buf, size_t bufsz)

#endif

+static inline unsigned long efi_get_kimg_min_align(void)
+{
+ return SZ_2M;
+}
+
#endif /* _ASM_X86_EFI_H */
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 222efd4a09bc8861..4ae067852fb28663 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -64,6 +64,17 @@ SYM_CODE_START_NOALIGN(startup_64)
/* Set up the stack for verify_cpu(), similar to initial_stack below */
leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp

+#ifdef CONFIG_EFI_ZBOOT
+ /*
+ * The generic EFI zboot code expects a __le32 at offset 0x10 of the
+ * decompressed image describing the size in memory of the kernel
+ * image. This is typically part of the image header, but we don't have
+ * such a header on x86 so just put the bare number here, encoded in a
+ * NOP instruction.
+ */
+ .org startup_64 + 0x10 - 3, BYTES_NOP1
+ nopl (_end - startup_64)(%rax)
+#endif
leaq _text(%rip), %rdi

/*
diff --git a/arch/x86/zboot/Makefile b/arch/x86/zboot/Makefile
new file mode 100644
index 0000000000000000..dce47a01ff482550
--- /dev/null
+++ b/arch/x86/zboot/Makefile
@@ -0,0 +1,29 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2023 Google LLC. <[email protected]>
+#
+
+$(obj)/Image: OBJCOPYFLAGS := -O binary -S \
+ -R .note -R .note.gnu.build-id -R .comment
+$(obj)/Image: vmlinux FORCE
+ $(call if_changed,objcopy)
+
+CMD_RELOCS := arch/x86/tools/relocs
+
+quiet_cmd_relocs = RELOCS $@
+ cmd_relocs = $(CMD_RELOCS) $< > $@
+
+$(obj)/vmlinux.relocs: vmlinux FORCE
+ $(call if_changed,relocs)
+
+efi-zboot-relocs-$(CONFIG_X86_NEED_RELOCS) := $(obj)/vmlinux.relocs
+EFI_ZBOOT_PAYLOAD_TRAILER := $(efi-zboot-relocs-y)
+
+EFI_ZBOOT_PAYLOAD := Image
+EFI_ZBOOT_BFD_TARGET := elf64-x86-64
+EFI_ZBOOT_MACH_TYPE := AMD64
+EFI_ZBOOT_FORWARD_CFI := $(CONFIG_X86_KERNEL_IBT)
+
+targets := Image vmlinux.relocs
+
+include $(srctree)/drivers/firmware/efi/libstub/Makefile.zboot
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 043ca31c114ebf2a..b959bf41a49a97e1 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -74,7 +74,7 @@ config EFI_GENERIC_STUB

config EFI_ZBOOT
bool "Enable the generic EFI decompressor"
- depends on EFI_GENERIC_STUB && !ARM
+ depends on (EFI_GENERIC_STUB && !ARM) || X86_64
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZ4
select HAVE_KERNEL_LZMA
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 4dfbfac254614f18..2d733208e1b1efbe 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -9,6 +9,9 @@
# non-x86 reuses KBUILD_CFLAGS, x86 does not
cflags-y := $(KBUILD_CFLAGS)

+cflags-x86-$(CONFIG_X86_KERNEL_IBT) := \
+ $(call cc-option,-fcf-protection=branch -fno-jump-tables)
+
cflags-$(CONFIG_X86_32) := -march=i386
cflags-$(CONFIG_X86_64) := -mcmodel=small
cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ \
@@ -18,7 +21,7 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ \
$(call cc-disable-warning, address-of-packed-member) \
$(call cc-disable-warning, gnu) \
-fno-asynchronous-unwind-tables \
- $(CLANG_FLAGS)
+ $(CLANG_FLAGS) $(cflags-x86-y)

# arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
# disable the stackleak plugin
@@ -82,8 +85,8 @@ lib-$(CONFIG_EFI_PARAMS_FROM_FDT) += fdt.o \
$(obj)/lib-%.o: $(srctree)/lib/%.c FORCE
$(call if_changed_rule,cc_o_c)

-lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
- screen_info.o efi-stub-entry.o
+lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o efi-stub-entry.o screen_info.o
+lib-y += string.o intrinsics.o systable.o

lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o smbios.o
@@ -91,8 +94,12 @@ lib-$(CONFIG_X86) += x86.o x86-stub.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o

+cflags-zboot-$(CONFIG_X86) := -Defi_zboot_entry=__efistub_efi_zboot_entry
+CFLAGS_zboot.o := $(cflags-zboot-y)
+
CFLAGS_arm32-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET)

+zboot-obj-$(CONFIG_X86_64) := x86-zboot.o
zboot-obj-$(CONFIG_RISCV) := lib-clz_ctz.o lib-ashldi3.o
lib-$(CONFIG_EFI_ZBOOT) += zboot.o $(zboot-obj-y)

diff --git a/drivers/firmware/efi/libstub/Makefile.zboot b/drivers/firmware/efi/libstub/Makefile.zboot
index d34d4f0ed33349d5..dbf2588ccaa625bd 100644
--- a/drivers/firmware/efi/libstub/Makefile.zboot
+++ b/drivers/firmware/efi/libstub/Makefile.zboot
@@ -29,7 +29,7 @@ zboot-size-len-y := 4
zboot-method-$(CONFIG_KERNEL_GZIP) := gzip
zboot-size-len-$(CONFIG_KERNEL_GZIP) := 0

-$(obj)/vmlinuz: $(obj)/vmlinux.bin FORCE
+$(obj)/vmlinuz: $(obj)/vmlinux.bin $(EFI_ZBOOT_PAYLOAD_TRAILER) FORCE
$(call if_changed,$(zboot-method-y))

OBJCOPYFLAGS_vmlinuz.o := -I binary -O $(EFI_ZBOOT_BFD_TARGET) \
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 1e0203d74691ffcc..276d94ed31884308 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -16,6 +16,7 @@

#include "efistub.h"

+bool efi_no5lvl;
bool efi_nochunk;
bool efi_nokaslr = !IS_ENABLED(CONFIG_RANDOMIZE_BASE);
bool efi_novamap;
@@ -73,6 +74,8 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
} else if (!strcmp(param, "noinitrd")) {
efi_noinitrd = true;
+ } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+ efi_no5lvl = true;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index d2b75025295822c7..d60c3cb8e6cbd0a4 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -19,7 +19,6 @@
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)

-const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
diff --git a/drivers/firmware/efi/libstub/x86-zboot.c b/drivers/firmware/efi/libstub/x86-zboot.c
new file mode 100644
index 0000000000000000..16e8b315892dedda
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-zboot.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 Google LLC. <[email protected]>
+ */
+
+#include <linux/efi.h>
+#include <linux/pci.h>
+#include <linux/stddef.h>
+
+#include <asm/efi.h>
+#include <asm/e820/types.h>
+#include <asm/setup.h>
+#include <asm/desc.h>
+#include <asm/boot.h>
+
+#include "efistub.h"
+
+extern char _gzdata_end[];
+extern bool efi_no5lvl;
+
+static const struct desc_struct gdt[] = {
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
+};
+
+static void (*la57_toggle)(void *cr3, void *gdt);
+
+#ifdef CONFIG_EFI_MIXED
+const bool efi_is64 = true;
+
+u64 __efi64_thunk(u32 func, ...)
+{
+ return EFI_UNSUPPORTED;
+}
+#endif
+
+efi_status_t efi_handle_cmdline(efi_loaded_image_t *image, char **cmdline_ptr)
+{
+ int options_size = 0;
+ efi_status_t status;
+
+ /* Convert unicode cmdline to ascii */
+ *cmdline_ptr = efi_convert_cmdline(image, &options_size);
+ if (!*cmdline_ptr)
+ return EFI_OUT_OF_RESOURCES;
+
+#ifdef CONFIG_CMDLINE_BOOL
+ status = efi_parse_options(CONFIG_CMDLINE);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to parse options\n");
+ return status;
+ }
+#endif
+ if (!IS_ENABLED(CONFIG_CMDLINE_OVERRIDE)) {
+ status = efi_parse_options(*cmdline_ptr);
+ if (status != EFI_SUCCESS)
+ efi_err("Failed to parse options\n");
+ }
+ return status;
+}
+
+void efi_cache_sync_image(unsigned long image_base, unsigned long alloc_size)
+{
+ const u32 payload_size = *(u32 *)(_gzdata_end - 4);
+ const u32 image_size = *(u32 *)(image_base + 0x10);
+ const s32 *reloc = (s32 *)(image_base + payload_size);
+ u64 va_offset = __START_KERNEL - image_base;
+ u64 range, delta;
+ u32 seed;
+
+ if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE) ||
+ image_size == payload_size ||
+ efi_get_random_bytes(sizeof(seed), (u8 *)&seed) != EFI_SUCCESS)
+ return;
+
+ range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - image_size;
+ delta = LOAD_PHYSICAL_ADDR + ((seed * range) >> 32UL);
+ delta &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+
+ /*
+ * Process relocations: 32 bit relocations first then 64 bit after.
+ * Three sets of binary relocations are added to the end of the kernel
+ * before compression. Each relocation table entry is the kernel
+ * address of the location which needs to be updated stored as a
+ * 32-bit value which is sign extended to 64 bits.
+ *
+ * Format is:
+ *
+ * kernel bits...
+ * 0 - zero terminator for 64 bit relocations
+ * 64 bit relocation repeated
+ * 0 - zero terminator for inverse 32 bit relocations
+ * 32 bit inverse relocation repeated
+ * 0 - zero terminator for 32 bit relocations
+ * 32 bit relocation repeated
+ *
+ * So we work backwards from the end of the decompressed image.
+ */
+ while (*--reloc)
+ *(u32 *)((s64)*reloc - va_offset) += delta;
+
+ while (*--reloc)
+ *(u32 *)((s64)*reloc - va_offset) -= delta;
+
+ while (*--reloc)
+ *(u64 *)((s64)*reloc - va_offset) += delta;
+
+ efi_free(alloc_size - image_size, image_base + image_size);
+}
+
+static void __naked tmpl_toggle(void *cr3, void *gdt)
+{
+ /*
+ * This is template code that will be copied into a 32-bit addressable
+ * buffer, allowing us to drop to 32-bit mode with paging disabled,
+ * which is required to be able to toggle the CR4.LA57 bit.
+ *
+ * The first MOVB instruction is only there to capture the size of the
+ * sequence, and implicitly, the offset to the LJMP's immediate, which
+ * will be populated with the correct absolute address after copying.
+ */
+ asm("0: movb $(3f - .), %%al \n\t"
+ " lgdt (%%rsi) \n\t"
+ " movw %[ds], %%ax \n\t"
+ " movw %%ax, %%ds \n\t"
+ " movw %%ax, %%ss \n\t"
+ " leaq 2f(%%rip), %%rax \n\t"
+ " pushq %[cs32] \n\t"
+ " pushq %%rax \n\t"
+ " lretq \n\t"
+ "1: retq \n\t"
+ " .code32 \n\t"
+ "2: movl %%cr0, %%eax \n\t"
+ " btrl %[pg], %%eax \n\t"
+ " movl %%eax, %%cr0 \n\t"
+ " movl %%cr4, %%ecx \n\t"
+ " btcl %[la57], %%ecx \n\t"
+ " movl %%ecx, %%cr4 \n\t"
+ " movl %%edi, %%cr3 \n\t"
+ " btsl %[pg], %%eax \n\t"
+ " movl %%eax, %%cr0 \n\t"
+ " ljmpl %[cs], $(1b - 0b) \n\t"
+ "3: .code64"
+ :
+ : [cs32] "i"(__KERNEL32_CS),
+ [cs] "i"(__KERNEL_CS),
+ [ds] "i"(__KERNEL_DS),
+ [pg] "i"(X86_CR0_PG_BIT),
+ [la57] "i"(X86_CR4_LA57_BIT));
+}
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as we cannot
+ * program a 64-bit value into CR3 when running in 32-bit mode.
+ */
+static efi_status_t efi_setup_5level_paging(void)
+{
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ const u8 tmpl_size = ((u8 *)tmpl_toggle)[1];
+ efi_status_t status;
+ u8 *la57_code;
+
+ /* check for 5 level paging support */
+ if (native_cpuid_eax(0) < 7 ||
+ !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return EFI_SUCCESS;
+
+ /*
+ * If LA57 is supported but disabled, and we have no interest in
+ * enabling it, we can bail here. In all other cases, we need to
+ * prepare the toggle support routine, even for the case where LA57 is
+ * currently on and we want to keep it on, as the firmware might return
+ * from ExitBootServices() with LA57 disabled.
+ */
+ if (!want_la57 && !have_la57)
+ return EFI_SUCCESS;
+
+ /* allocate some 32-bit addressable memory for code and a page table */
+ status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+ U32_MAX);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ la57_toggle = memcpy(la57_code, tmpl_toggle, tmpl_size);
+ memset(la57_code + tmpl_size, 0x0, 2 * PAGE_SIZE - tmpl_size);
+
+ /*
+ * To avoid having to allocate a 32-bit addressable stack, we use a
+ * ljmp to switch back to long mode. However, this takes an absolute
+ * address, so we have to poke it in at runtime. The dummy MOVB
+ * instruction at the beginning can be used to locate the immediate.
+ */
+ *(u32 *)&la57_code[tmpl_size - 6] += (u64)la57_code;
+
+ return EFI_SUCCESS;
+}
+
+static void efi_5level_switch(void)
+{
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+ u64 *cr3 = (u64 *)__native_read_cr3();
+ struct desc_ptr desc;
+ u64 *new_cr3;
+
+ if (!la57_toggle || (want_la57 && have_la57))
+ return;
+
+ if (!have_la57) {
+ /*
+ * We are going to enable 5 level paging, so we need to
+ * allocate a root level page from the 32-bit addressable
+ * physical region, and plug the existing hierarchy into it.
+ */
+ new_cr3 = pgt;
+ new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+ } else {
+ // take the new root table pointer from the current entry #0
+ new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+ // copy the new root level table if it is not 32-bit addressable
+ if ((u64)new_cr3 > U32_MAX) {
+ for (int i = 0; i < PTRS_PER_PGD; i++)
+ pgt[i] = new_cr3[i];
+ new_cr3 = pgt;
+ }
+ }
+
+ desc.size = sizeof(gdt) - 1;
+ desc.address = (u64)gdt;
+
+ la57_toggle(new_cr3, &desc);
+}
+
+efi_status_t efi_stub_common(efi_handle_t handle,
+ efi_loaded_image_t *image,
+ unsigned long image_addr,
+ char *cmdline_ptr)
+{
+ void __noreturn (*startup_64)(void *, struct boot_params *);
+ const struct linux_efi_initrd *initrd = NULL;
+ struct boot_params *boot_params;
+ struct setup_header *hdr;
+ efi_status_t status;
+
+ status = efi_setup_5level_paging();
+ if (status != EFI_SUCCESS) {
+ efi_err("efi_setup_5level_paging() failed!\n");
+ return status;
+ }
+
+ boot_params = efi_alloc_boot_params();
+ if (!boot_params)
+ return EFI_OUT_OF_RESOURCES;
+
+ hdr = &boot_params->hdr;
+ hdr->type_of_loader = 0x21;
+
+ efi_set_u64_split((unsigned long)cmdline_ptr,
+ &hdr->cmd_line_ptr, &boot_params->ext_cmd_line_ptr);
+
+ status = efi_load_initrd(image, hdr->initrd_addr_max, ULONG_MAX,
+ &initrd);
+ if (status != EFI_SUCCESS)
+ goto fail;
+ if (initrd && initrd->size > 0) {
+ efi_set_u64_split(initrd->base, &hdr->ramdisk_image,
+ &boot_params->ext_ramdisk_image);
+ efi_set_u64_split(initrd->size, &hdr->ramdisk_size,
+ &boot_params->ext_ramdisk_size);
+ }
+
+ status = efi_x86_stub_common(boot_params, handle);
+ if (status != EFI_SUCCESS)
+ goto fail;
+
+ efi_5level_switch();
+
+ startup_64 = (void *)image_addr;
+ startup_64(NULL, boot_params);
+fail:
+ efi_free(sizeof(struct boot_params), (unsigned long)boot_params);
+ return status;
+}
+
+struct screen_info *__alloc_screen_info(void)
+{
+ return NULL;
+}
diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c
index e5d7fa1f1d8fd160..7cc78bb1253af675 100644
--- a/drivers/firmware/efi/libstub/zboot.c
+++ b/drivers/firmware/efi/libstub/zboot.c
@@ -65,6 +65,7 @@ asmlinkage efi_status_t __efiapi
efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
{
unsigned long compressed_size = _gzdata_end - _gzdata_start;
+ efi_guid_t loaded_image = LOADED_IMAGE_PROTOCOL_GUID;
unsigned long image_base, alloc_size;
efi_loaded_image_t *image;
efi_status_t status;
@@ -77,7 +78,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
free_mem_end_ptr = free_mem_ptr + sizeof(zboot_heap);

status = efi_bs_call(handle_protocol, handle,
- &LOADED_IMAGE_PROTOCOL_GUID, (void **)&image);
+ &loaded_image, (void **)&image);
if (status != EFI_SUCCESS) {
error("Failed to locate parent's loaded image protocol");
return status;
diff --git a/drivers/firmware/efi/libstub/zboot.lds b/drivers/firmware/efi/libstub/zboot.lds
index 93d33f68333b2b68..13a4d3e6b3117910 100644
--- a/drivers/firmware/efi/libstub/zboot.lds
+++ b/drivers/firmware/efi/libstub/zboot.lds
@@ -14,8 +14,10 @@ SECTIONS

.rodata : ALIGN(8) {
__efistub__gzdata_start = .;
+ _gzdata_start = .;
*(.gzdata)
__efistub__gzdata_end = .;
+ _gzdata_end = .;
*(.rodata* .init.rodata* .srodata*)
_etext = ALIGN(4096);
. = _etext;
@@ -35,11 +37,14 @@ SECTIONS

/DISCARD/ : {
*(.modinfo .init.modinfo)
+ *(.discard*)
}
}

PROVIDE(__efistub__gzdata_size =
ABSOLUTE(__efistub__gzdata_end - __efistub__gzdata_start));

+PROVIDE(_gzdata_size = __efistub__gzdata_size);
+
PROVIDE(__data_rawsize = ABSOLUTE(_edata - _etext));
PROVIDE(__data_size = ABSOLUTE(_end - _etext));
--
2.39.2

2023-04-16 12:08:59

by Ard Biesheuvel

Subject: [RFC PATCH 3/3] efi/zboot: x86: Clear NX restrictions on populated code regions

Future EFI firmware will require the PE/COFF NX_COMPAT header flag to be
set in order to retain access to all system facilities while features
such as UEFI secure boot or TCG measured boot are enabled.

The consequence of setting this flag is that the EFI firmware image
loader may configure the page allocator to set the NX attribute on all
allocations requested by the image. This means we should clear this
attribute on all regions we allocate and expect to be able to execute
from.

In the x86 EFI zboot case, the only code we execute under EFI's 1:1
mapping that was not loaded by the image loader itself is the trampoline
that effectuates the switch between 4 and 5 level paging, and the part
of the loaded kernel image that runs before switching to its own page
tables. So let's use the EFI memory attributes protocol to clear the NX
attribute on these regions.

Whether or not setting the read-only attribute first is required is
unclear at this point. Given that the kernel startup code uses two
different executable sections before switching to its own page tables
(normal text and inittext, with a writable data section in between),
this would require some minor reorganization of the kernel memory map.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 4 +++
drivers/firmware/efi/libstub/x86-zboot.c | 27 ++++++++++++++++++++
2 files changed, 31 insertions(+)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4ae067852fb28663..38897ac51f13bb55 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -74,6 +74,10 @@ SYM_CODE_START_NOALIGN(startup_64)
*/
.org startup_64 + 0x10 - 3, BYTES_NOP1
nopl (_end - startup_64)(%rax)
+
+ /* put the size of the initial executable mapping at offset 0x20 */
+ .org startup_64 + 0x20 - 3, BYTES_NOP1
+ nopl (_einittext - startup_64)(%rax)
#endif
leaq _text(%rip), %rdi

diff --git a/drivers/firmware/efi/libstub/x86-zboot.c b/drivers/firmware/efi/libstub/x86-zboot.c
index 16e8b315892dedda..70668104804fb050 100644
--- a/drivers/firmware/efi/libstub/x86-zboot.c
+++ b/drivers/firmware/efi/libstub/x86-zboot.c
@@ -60,10 +60,33 @@ efi_status_t efi_handle_cmdline(efi_loaded_image_t *image, char **cmdline_ptr)
return status;
}

+static void efi_remap_exec(unsigned long base, unsigned long size)
+{
+ static efi_memory_attribute_protocol_t *memattr = (void *)ULONG_MAX;
+ efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
+ efi_status_t status;
+
+ if (memattr == (void *)ULONG_MAX) {
+ memattr = NULL;
+ status = efi_bs_call(locate_protocol, &guid, NULL,
+ (void **)&memattr);
+ if (status != EFI_SUCCESS)
+ return;
+ } else if (!memattr) {
+ return;
+ }
+
+ status = memattr->clear_memory_attributes(memattr, base, size,
+ EFI_MEMORY_XP);
+ if (status != EFI_SUCCESS)
+ efi_warn("Failed to clear NX attribute on code region\n");
+}
+
void efi_cache_sync_image(unsigned long image_base, unsigned long alloc_size)
{
const u32 payload_size = *(u32 *)(_gzdata_end - 4);
const u32 image_size = *(u32 *)(image_base + 0x10);
+ const u32 code_size = *(u32 *)(image_base + 0x20);
const s32 *reloc = (s32 *)(image_base + payload_size);
u64 va_offset = __START_KERNEL - image_base;
u64 range, delta;
@@ -107,6 +130,8 @@ void efi_cache_sync_image(unsigned long image_base, unsigned long alloc_size)
*(u64 *)((s64)*reloc - va_offset) += delta;

efi_free(alloc_size - image_size, image_base + image_size);
+
+ efi_remap_exec(image_base, PAGE_ALIGN(code_size));
}

static void __naked tmpl_toggle(void *cr3, void *gdt)
@@ -197,6 +222,8 @@ static efi_status_t efi_setup_5level_paging(void)
*/
*(u32 *)&la57_code[tmpl_size - 6] += (u64)la57_code;

+ efi_remap_exec((unsigned long)la57_code, PAGE_SIZE);
+
return EFI_SUCCESS;
}

--
2.39.2

2023-04-18 14:14:35

by Evgeniy Baskov

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On 2023-04-16 15:07, Ard Biesheuvel wrote:
> This series is a proof-of-concept that implements support for the EFI
> zboot decompressor for x86. It replaces the ordinary decompressor, and
> instead, performs the decompression, KASLR randomization and the 4/5
> level paging switch while running in the execution context of EFI.
>
> This simplifies things substantially, and makes it straight-forward to
> abide by stricter future requirements related to the use of writable and
> executable memory under EFI, which will come into effect on x86 systems
> that are certified as being 'more secure', and ship with an even shinier
> Windows sticker.
>
> This is an alternative approach to the work being proposed by Evgeny [0]
> that makes rather radical changes to the existing decompressor, which
> has accumulated too many features already, e.g., related to confidential
> compute etc.
>
> EFI zboot images can be booted in two ways:
> - by EFI firmware, which loads and starts it as an ordinary EFI
> application, just like the existing EFI stub (with which it shares
> most of its code);
> - by a non-EFI loader that parses the image header for the compression
> metadata, and decompresses the image into memory and boots it.
>
> Realistically, the second option is unlikely to ever be used on x86,
> given that it already has its existing bzImage, but the first option is
> a good choice for distros that target EFI boot only (and some distros
> switched to this format already for arm64). The fact that EFI zboot is
> implemented in the same way on arm64, RISC-V, LoongArch and [shortly]
> ARM helps with maintenance, not only of the kernel itself, but also the
> tooling around it relating to kexec, code signing, deployment, etc.
>
> Series can be pulled from [1], which contains some prerequisite patches
> that are only tangentially related.

I considered a similar approach when I was writing my series. It looks
great if you set backward compatibility aside, especially given the
increased code sharing and the use of a code path without all the legacy
stuff. But I think the zboot approach has two downsides:

* Most distros won't use it, due to backward compatibility, so they
won't benefit from the improvements.
* Those distros that would potentially use it have to be either
DIY-ish like Gentoo, or provide both kernels during installation.
So it complicates either the installation process or the installer logic.

It might work for UEFI-only distros, but I think those won't be the
majority on x86 for a while yet, because a lot of people will very
likely complain if a distro provides a zboot-only kernel (the same
story as with the handover protocol). Backward compatibility is evil.

So I think that, at least for now, it would still be better to change
the existing extraction code and stay compatible, despite it being
harder and less clean...

(zboot also lacks support for some kernel options, such as the ones
used for tweaking the memory map, as well as mixed mode support and
probably the handling of CONFIG_MEMORY_HOTREMOVE, but this is an RFC,
so this comment is largely irrelevant for now.)

Thanks,
Evgeniy Baskov

>
> [0] https://lore.kernel.org/all/[email protected]/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-x86-zboot
>
> Cc: Evgeniy Baskov <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Alexey Khoroshilov <[email protected]>
> Cc: Peter Jones <[email protected]>
> Cc: Gerd Hoffmann <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Mario Limonciello <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Linus Torvalds <[email protected]>
>
> Ard Biesheuvel (3):
> efi/libstub: x86: Split off pieces shared with zboot
> efi/zboot: x86: Implement EFI zboot support
> efi/zboot: x86: Clear NX restrictions on populated code regions
>
> arch/x86/Makefile | 18 +-
> arch/x86/include/asm/efi.h | 10 +
> arch/x86/kernel/head_64.S | 15 +
> arch/x86/zboot/Makefile | 29 +
> drivers/firmware/efi/Kconfig | 2 +-
> drivers/firmware/efi/libstub/Makefile | 15 +-
> drivers/firmware/efi/libstub/Makefile.zboot | 2 +-
> drivers/firmware/efi/libstub/efi-stub-helper.c | 3 +
> drivers/firmware/efi/libstub/x86-stub.c | 592 +------------------
> drivers/firmware/efi/libstub/x86-zboot.c | 322 ++++++++++
> drivers/firmware/efi/libstub/x86.c | 612 ++++++++++++++++++++
> drivers/firmware/efi/libstub/zboot.c | 3 +-
> drivers/firmware/efi/libstub/zboot.lds | 5 +
> 13 files changed, 1031 insertions(+), 597 deletions(-)
> create mode 100644 arch/x86/zboot/Makefile
> create mode 100644 drivers/firmware/efi/libstub/x86-zboot.c
> create mode 100644 drivers/firmware/efi/libstub/x86.c

2023-04-19 03:06:32

by Dave Young

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

Hi,

[Resending this reply since I mistakenly sent an HTML version; apologies
to those who received it twice.]
On 04/16/23 at 02:07pm, Ard Biesheuvel wrote:
> This series is a proof-of-concept that implements support for the EFI
> zboot decompressor for x86. It replaces the ordinary decompressor, and
> instead, performs the decompression, KASLR randomization and the 4/5
> level paging switch while running in the execution context of EFI.
>
> This simplifies things substantially, and makes it straight-forward to
> abide by stricter future requirements related to the use of writable and
> executable memory under EFI, which will come into effect on x86 systems
> that are certified as being 'more secure', and ship with an even shinier
> Windows sticker.
>
> This is an alternative approach to the work being proposed by Evgeny [0]
> that makes rather radical changes to the existing decompressor, which
> has accumulated too many features already, e.g., related to confidential
> compute etc.
>
> EFI zboot images can be booted in two ways:
> - by EFI firmware, which loads and starts it as an ordinary EFI
> application, just like the existing EFI stub (with which it shares
> most of its code);
> - by a non-EFI loader that parses the image header for the compression
> metadata, and decompresses the image into memory and boots it.
>
> Realistically, the second option is unlikely to ever be used on x86,
> given that it already has its existing bzImage, but the first option is
> a good choice for distros that target EFI boot only (and some distros
> switched to this format already for arm64). The fact that EFI zboot is
> implemented in the same way on arm64, RISC-V, LoongArch and [shortly]
> ARM helps with maintenance, not only of the kernel itself, but also the
> tooling around it relating to kexec, code signing, deployment, etc.

Hi Ard, since kexec support is not ready yet, and if there is no plan to
add it quickly, how about changing the Kconfig so that zboot can only be
enabled when kexec is not enabled?

>
> Series can be pulled from [1], which contains some prerequisite patches
> that are only tangentially related.
>
> [0] https://lore.kernel.org/all/[email protected]/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-x86-zboot
>
> Cc: Evgeniy Baskov <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Alexey Khoroshilov <[email protected]>
> Cc: Peter Jones <[email protected]>
> Cc: Gerd Hoffmann <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Mario Limonciello <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Linus Torvalds <[email protected]>
>
> Ard Biesheuvel (3):
> efi/libstub: x86: Split off pieces shared with zboot
> efi/zboot: x86: Implement EFI zboot support
> efi/zboot: x86: Clear NX restrictions on populated code regions
>
> arch/x86/Makefile | 18 +-
> arch/x86/include/asm/efi.h | 10 +
> arch/x86/kernel/head_64.S | 15 +
> arch/x86/zboot/Makefile | 29 +
> drivers/firmware/efi/Kconfig | 2 +-
> drivers/firmware/efi/libstub/Makefile | 15 +-
> drivers/firmware/efi/libstub/Makefile.zboot | 2 +-
> drivers/firmware/efi/libstub/efi-stub-helper.c | 3 +
> drivers/firmware/efi/libstub/x86-stub.c | 592 +------------------
> drivers/firmware/efi/libstub/x86-zboot.c | 322 ++++++++++
> drivers/firmware/efi/libstub/x86.c | 612 ++++++++++++++++++++
> drivers/firmware/efi/libstub/zboot.c | 3 +-
> drivers/firmware/efi/libstub/zboot.lds | 5 +
> 13 files changed, 1031 insertions(+), 597 deletions(-)
> create mode 100644 arch/x86/zboot/Makefile
> create mode 100644 drivers/firmware/efi/libstub/x86-zboot.c
> create mode 100644 drivers/firmware/efi/libstub/x86.c
>
> --
> 2.39.2
>

2023-04-19 05:57:29

by Gerd Hoffmann

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Sun, Apr 16, 2023 at 02:07:26PM +0200, Ard Biesheuvel wrote:
> This series is a proof-of-concept that implements support for the EFI
> zboot decompressor for x86. It replaces the ordinary decompressor, and
> instead, performs the decompression, KASLR randomization and the 4/5
> level paging switch while running in the execution context of EFI.
>
> This simplifies things substantially, and makes it straight-forward to
> abide by stricter future requirements related to the use of writable and
> executable memory under EFI, which will come into effect on x86 systems
> that are certified as being 'more secure', and ship with an even shinier
> Windows sticker.
>
> This is an alternative approach to the work being proposed by Evgeny [0]
> that makes rather radical changes to the existing decompressor, which
> has accumulated too many features already, e.g., related to confidential
> compute etc.
>
> EFI zboot images can be booted in two ways:
> - by EFI firmware, which loads and starts it as an ordinary EFI
> application, just like the existing EFI stub (with which it shares
> most of its code);
> - by a non-EFI loader that parses the image header for the compression
> metadata, and decompresses the image into memory and boots it.

I like the idea to have all EFI archs handle compressed kernels the same
way.

But given that going EFI-only on x86 isn't a realistic option for
distros today this isn't really an alternative for Evgeny's patch
series, we have to fix the existing bzImage decompressor too.

> Realistically, the second option is unlikely to ever be used on x86,

What would be needed to do so? Teach kexec-tools and grub2 parse and
load zboot kernels I guess?

take care,
Gerd

2023-04-19 15:03:10

by Ard Biesheuvel

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Wed, 19 Apr 2023 at 07:54, Gerd Hoffmann <[email protected]> wrote:
>
> On Sun, Apr 16, 2023 at 02:07:26PM +0200, Ard Biesheuvel wrote:
> > This series is a proof-of-concept that implements support for the EFI
> > zboot decompressor for x86. It replaces the ordinary decompressor, and
> > instead, performs the decompression, KASLR randomization and the 4/5
> > level paging switch while running in the execution context of EFI.
> >
> > This simplifies things substantially, and makes it straight-forward to
> > abide by stricter future requirements related to the use of writable and
> > executable memory under EFI, which will come into effect on x86 systems
> > that are certified as being 'more secure', and ship with an even shinier
> > Windows sticker.
> >
> > This is an alternative approach to the work being proposed by Evgeny [0]
> > that makes rather radical changes to the existing decompressor, which
> > has accumulated too many features already, e.g., related to confidential
> > compute etc.
> >
> > EFI zboot images can be booted in two ways:
> > - by EFI firmware, which loads and starts it as an ordinary EFI
> > application, just like the existing EFI stub (with which it shares
> > most of its code);
> > - by a non-EFI loader that parses the image header for the compression
> > metadata, and decompresses the image into memory and boots it.
>
> I like the idea to have all EFI archs handle compressed kernels the same
> way.
>
> But given that going EFI-only on x86 isn't a realistic option for
> distros today this isn't really an alternative for Evgeny's patch
> series, we have to fix the existing bzImage decompressor too.
>

I tend to agree, although some clarification would be helpful
regarding what is being fixed and why? I *think* I know, but I have
not been involved as deeply as some of the distro folks in getting
these requirements explicit.

> Realistically, the second option is unlikely to ever be used on x86,
>
> What would be needed to do so? Teach kexec-tools and grub2 parse and
> load zboot kernels I guess?
>

I already implemented this for mach-virt here, so we can load Fedora
kernels without firmware:

https://gitlab.com/qemu-project/qemu/-/commit/ff11422804cd03494cc98691eecd3909ea09ab6f

On arm64, this is probably more straight-forward, as the bare metal
image is already intended to be booted directly like that. However,
the x86 uncompressed image requires surprisingly little from all the
boot_params/setup_header cruft to actually boot, so perhaps there it
is easy too.

There is an unresolved issue related to kexec_load_file(), where only
the compressed image is signed, but the uncompressed image is what
ultimately gets booted, which either needs the decompression to occur
in the kernel, or a secondary signature that the kernel can verify
after the decompression happens in user space.

Dave and I have generated several ideas here, but there hasn't been
any progress towards a solution that seems palatable for upstream.

2023-04-20 06:11:33

by Gerd Hoffmann

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

Hi,

> > Realistically, the second option is unlikely to ever be used on x86,
> >
> > What would be needed to do so? Teach kexec-tools and grub2 parse and
> > load zboot kernels I guess?
>
> I already implemented this for mach-virt here, so we can load Fedora
> kernels without firmware:
>
> https://gitlab.com/qemu-project/qemu/-/commit/ff11422804cd03494cc98691eecd3909ea09ab6f
>
> On arm64, this is probably more straight-forward, as the bare metal
> image is already intended to be booted directly like that. However,
> the x86 uncompressed image requires surprisingly little from all the
> boot_params/setup_header cruft to actually boot, so perhaps there it
> is easy too.

For existing boot loaders like grub I'd expect it being easy
to code up, all the setup header code exists already, grub also
has support for uncompressing stuff, so it should really be only
zboot header parsing and some plumbing to get things going (i.e.
have grub boot efi zboot kernels in bios mode).
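
The zboot header parsing step mentioned above could be sketched roughly
as follows. This is only a minimal sketch, not code from GRUB,
kexec-tools or QEMU: the names zboot_info and zboot_parse are
hypothetical, and the field offsets ("zimg" at 0x04, payload offset at
0x08, payload size at 0x0c, compression type string at 0x18, all
little-endian) are assumptions that should be checked against the zboot
header definition in the kernel tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header offsets; verify against the kernel's zboot header. */
enum {
	ZBOOT_OFF_ZIMG         = 0x04,
	ZBOOT_OFF_PAYLOAD_OFF  = 0x08,
	ZBOOT_OFF_PAYLOAD_SIZE = 0x0c,
	ZBOOT_OFF_COMP_TYPE    = 0x18,
	ZBOOT_COMP_TYPE_LEN    = 32,
	ZBOOT_HEADER_LEN       = 0x40,
};

/* Hypothetical result structure for the loader's use. */
struct zboot_info {
	uint32_t payload_offset;
	uint32_t payload_size;
	char     compression[ZBOOT_COMP_TYPE_LEN];
};

/* Read a 32-bit little-endian value regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if the buffer is not a plausible zboot image. */
int zboot_parse(const uint8_t *img, size_t len, struct zboot_info *out)
{
	assert(img && out);

	if (len < ZBOOT_HEADER_LEN)
		return -1;
	if (img[0] != 'M' || img[1] != 'Z')
		return -1;
	if (memcmp(img + ZBOOT_OFF_ZIMG, "zimg", 4) != 0)
		return -1;

	out->payload_offset = read_le32(img + ZBOOT_OFF_PAYLOAD_OFF);
	out->payload_size = read_le32(img + ZBOOT_OFF_PAYLOAD_SIZE);

	/* Reject payloads that do not fit inside the file. */
	if (out->payload_offset > len ||
	    out->payload_size > len - out->payload_offset)
		return -1;

	memcpy(out->compression, img + ZBOOT_OFF_COMP_TYPE,
	       ZBOOT_COMP_TYPE_LEN);
	/* Force termination in case the field is not NUL-terminated. */
	out->compression[ZBOOT_COMP_TYPE_LEN - 1] = '\0';
	return 0;
}
```

A loader would then pick a decompressor based on the compression string,
decompress payload_size bytes starting at payload_offset, and boot the
result using its existing setup header code.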

Disclaimer: didn't actually check grub source code.

I suspect the bigger problem wrt. grub is that getting patches merged
upstream is extremely slow and every distro carries a huge stack of
patches ...

take care,
Gerd

2023-04-20 07:56:28

by Ard Biesheuvel

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Thu, 20 Apr 2023 at 08:07, Gerd Hoffmann <[email protected]> wrote:
>
> Hi,
>
> > > Realistically, the second option is unlikely to ever be used on x86,
> > >
> > > What would be needed to do so? Teach kexec-tools and grub2 parse and
> > > load zboot kernels I guess?
> >
> > I already implemented this for mach-virt here, so we can load Fedora
> > kernels without firmware:
> >
> > https://gitlab.com/qemu-project/qemu/-/commit/ff11422804cd03494cc98691eecd3909ea09ab6f
> >
> > On arm64, this is probably more straight-forward, as the bare metal
> > image is already intended to be booted directly like that. However,
> > the x86 uncompressed image requires surprisingly little from all the
> > boot_params/setup_header cruft to actually boot, so perhaps there it
> > is easy too.
>
> For existing boot loaders like grub I'd expect it being easy
> to code up, all the setup header code exists already, grub also
> has support for uncompressing stuff, so it should really be only
> zboot header parsing and some plumbing to get things going (i.e.
> have grub boot efi zboot kernels in bios mode).
>
> Disclaimer: didn't actually check grub source code.
>

I have :-(

> I suspect the bigger problem wrt. grub is that getting patches merged
> upstream is extremely slow and every distro carries a huge stack of
> patches ...
>

Yeah, Daniel has been asking me about LoadFile2 initrd loading support
for x86, so I think getting things merged is not going to be a problem
(although it will still take some time) - I can just implement it and
send it out at the same time.

But hacking/building/running GRUB is a rather painful experience, so I
have been kicking this can down the road.

2023-04-20 12:35:00

by Mario Limonciello

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support


On 4/20/23 02:54, Ard Biesheuvel wrote:
> On Thu, 20 Apr 2023 at 08:07, Gerd Hoffmann <[email protected]> wrote:
>> Hi,
>>
>>>> Realistically, the second option is unlikely to ever be used on x86,
>>>>
>>>> What would be needed to do so? Teach kexec-tools and grub2 parse and
>>>> load zboot kernels I guess?
>>> I already implemented this for mach-virt here, so we can load Fedora
>>> kernels without firmware:
>>>
>>> https://gitlab.com/qemu-project/qemu/-/commit/ff11422804cd03494cc98691eecd3909ea09ab6f
>>>
>>> On arm64, this is probably more straight-forward, as the bare metal
>>> image is already intended to be booted directly like that. However,
>>> the x86 uncompressed image requires surprisingly little from all the
>>> boot_params/setup_header cruft to actually boot, so perhaps there it
>>> is easy too.
>> For existing boot loaders like grub I'd expect it being easy
>> to code up, all the setup header code exists already, grub also
>> has support for uncompressing stuff, so it should really be only
>> zboot header parsing and some plumbing to get things going (i.e.
>> have grub boot efi zboot kernels in bios mode).
>>
>> Disclaimer: didn't actually check grub source code.
>>
> I have :-(
>
>> I suspect the bigger problem wrt. grub is that getting patches merged
>> upstream is extremely slow and every distro carries a huge stack of
>> patches ...
>>
> Yeah, Daniel has been asking me about LoadFile2 initrd loading support
> for x86, so I think getting things merged is not going to be a problem
> (although it will still take some time) - I can just implement it and
> send it out at the same time.
If zboot ends up being the way to go there would be quite a bit
of pressure to land the GRUB stuff in distros because of the NX
requirements being pushed into the EFI ecosystem.

Without it, they wouldn't be able to boot on systems that enforce NX,
and the number of such systems is anticipated to balloon.

>
> But hacking/building/running GRUB is a rather painful experience, so I
> have been kicking this can down the road.

2023-04-21 13:31:48

by Andy Lutomirski

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support



On Sun, Apr 16, 2023, at 5:07 AM, Ard Biesheuvel wrote:
> This series is a proof-of-concept that implements support for the EFI
> zboot decompressor for x86. It replaces the ordinary decompressor, and
> instead, performs the decompression, KASLR randomization and the 4/5
> level paging switch while running in the execution context of EFI.

I like the concept. A couple high-level questions, since I haven’t dug into the code:

Could zboot and bzImage be built into the same kernel image? That would get this into distros, and eventually someone could modify the legacy path to switch to long mode and invoke zboot (because zboot surely doesn’t need actual UEFI — just a sensible environment like what UEFI provides.)

Does zboot set up BSS correctly? I once went down a rabbit hole trying to get the old decompressor to jump into the kernel with BSS already usable and zeroed, and the result was an incredible mess — IIRC the decompressor does some in-place shenanigans that looked incompatible with handling BSS without a rewrite. And so we clear BSS in C after jumping to the kernel, which is gross.

—Andy

>
> This simplifies things substantially, and makes it straight-forward to
> abide by stricter future requirements related to the use of writable and
> executable memory under EFI, which will come into effect on x86 systems
> that are certified as being 'more secure', and ship with an even shinier
> Windows sticker.
>
> This is an alternative approach to the work being proposed by Evgeny [0]
> that makes rather radical changes to the existing decompressor, which
> has accumulated too many features already, e.g., related to confidential
> compute etc.
>
> EFI zboot images can be booted in two ways:
> - by EFI firmware, which loads and starts it as an ordinary EFI
> application, just like the existing EFI stub (with which it shares
> most of its code);
> - by a non-EFI loader that parses the image header for the compression
> metadata, and decompresses the image into memory and boots it.
>
> Realistically, the second option is unlikely to ever be used on x86,
> given that it already has its existing bzImage, but the first option is
> a good choice for distros that target EFI boot only (and some distros
> switched to this format already for arm64). The fact that EFI zboot is
> implemented in the same way on arm64, RISC-V, LoongArch and [shortly]
> ARM helps with maintenance, not only of the kernel itself, but also the
> tooling around it relating to kexec, code signing, deployment, etc.
>
> Series can be pulled from [1], which contains some prerequisite patches
> that are only tangentially related.
>
> [0] https://lore.kernel.org/all/[email protected]/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-x86-zboot
>
> Cc: Evgeniy Baskov <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Alexey Khoroshilov <[email protected]>
> Cc: Peter Jones <[email protected]>
> Cc: Gerd Hoffmann <[email protected]>
> Cc: Dave Young <[email protected]>
> Cc: Mario Limonciello <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: Kirill A. Shutemov <[email protected]>
> Cc: Linus Torvalds <[email protected]>
>
> Ard Biesheuvel (3):
> efi/libstub: x86: Split off pieces shared with zboot
> efi/zboot: x86: Implement EFI zboot support
> efi/zboot: x86: Clear NX restrictions on populated code regions
>
> arch/x86/Makefile | 18 +-
> arch/x86/include/asm/efi.h | 10 +
> arch/x86/kernel/head_64.S | 15 +
> arch/x86/zboot/Makefile | 29 +
> drivers/firmware/efi/Kconfig | 2 +-
> drivers/firmware/efi/libstub/Makefile | 15 +-
> drivers/firmware/efi/libstub/Makefile.zboot | 2 +-
> drivers/firmware/efi/libstub/efi-stub-helper.c | 3 +
> drivers/firmware/efi/libstub/x86-stub.c | 592 +------------------
> drivers/firmware/efi/libstub/x86-zboot.c | 322 ++++++++++
> drivers/firmware/efi/libstub/x86.c | 612 ++++++++++++++++++++
> drivers/firmware/efi/libstub/zboot.c | 3 +-
> drivers/firmware/efi/libstub/zboot.lds | 5 +
> 13 files changed, 1031 insertions(+), 597 deletions(-)
> create mode 100644 arch/x86/zboot/Makefile
> create mode 100644 drivers/firmware/efi/libstub/x86-zboot.c
> create mode 100644 drivers/firmware/efi/libstub/x86.c
>
> --
> 2.39.2

2023-04-21 13:43:14

by Ard Biesheuvel

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Fri, 21 Apr 2023 at 15:30, Andy Lutomirski <[email protected]> wrote:
>
>
>
> On Sun, Apr 16, 2023, at 5:07 AM, Ard Biesheuvel wrote:
> > This series is a proof-of-concept that implements support for the EFI
> > zboot decompressor for x86. It replaces the ordinary decompressor, and
> > instead, performs the decompression, KASLR randomization and the 4/5
> > level paging switch while running in the execution context of EFI.
>
> I like the concept. A couple high-level questions, since I haven’t dug into the code:
>
> Could zboot and bzImage be built into the same kernel image? That would get this into distros, and eventually someone could modify the legacy path to switch to long mode and invoke zboot (because zboot surely doesn’t need actual UEFI — just a sensible environment like what UEFI provides.)
>

That's an interesting question, and to some extent, that is actually
what Evgeny's patch does: execute more of what the decompressor does
from inside the EFI runtime context.

The main win with zboot imho is that we get rid of all the funky
heuristics that look for usable memory for trampolines and
decompression buffers in funky ways, and instead, just use the EFI
APIs for allocating pages and remapping them executable as needed
(which is the important piece here). I'd have to think about whether
there is any middle ground between this approach and Evgeny's - I'll
have to get back to you on that.

> Does zboot set up BSS correctly? I once went down a rabbit hole trying to get the old decompressor to jump into the kernel with BSS already usable and zeroed, and the result was an incredible mess — IIRC the decompressor does some in-place shenanigans that looked incompatible with handling BSS without a rewrite. And so we clear BSS in C after jumping to the kernel, which is gross.
>

Zboot pads the image to include BSS, so that the zboot metadata covers
the actual memory footprint of the image rather than just the image
size, and it will get zeroed out as a result of the decompression too,
which is a nice bonus. I did this mainly to try and make it idiot
proof for other (non-EFI) consumers of the zboot header and compressed
payload, but it means that the zboot EFI loader doesn't have to bother
either.
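
The padding idea can be illustrated with a small sketch. This is not the
actual zboot build logic: pad_image_for_bss and load_padded_image are
hypothetical names, and a plain copy stands in for the real
compress/decompress round trip. The point is only that once the zero
padding is part of the payload, writing the payload to its destination
also zeroes the BSS range, whatever was in that memory before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build the buffer that would be handed to the compressor: the file
 * contents followed by bss_size zero bytes, so that the recorded payload
 * size equals the in-memory footprint. The caller frees the result.
 */
uint8_t *pad_image_for_bss(const uint8_t *file, size_t file_size,
			   size_t bss_size, size_t *footprint)
{
	uint8_t *buf;

	assert(file && footprint);
	if (file_size == 0 || bss_size > SIZE_MAX - file_size)
		return NULL;

	buf = calloc(1, file_size + bss_size);	/* zero-filled */
	if (!buf)
		return NULL;

	memcpy(buf, file, file_size);
	*footprint = file_size + bss_size;
	return buf;
}

/*
 * Stand-in for "decompress into the destination". A real loader would
 * run gzip/zstd/etc. here; the output covers the whole footprint.
 */
void load_padded_image(uint8_t *dest, const uint8_t *padded,
		       size_t footprint)
{
	memcpy(dest, padded, footprint);
}
```

With this arrangement a non-EFI consumer only needs the payload size from
the header to know how much memory to reserve, and it does not need a
separate step to clear BSS.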

2023-05-03 18:06:54

by Andy Lutomirski

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Fri, Apr 21, 2023, at 6:41 AM, Ard Biesheuvel wrote:
> On Fri, 21 Apr 2023 at 15:30, Andy Lutomirski <[email protected]> wrote:
>>
>>
>>
>> On Sun, Apr 16, 2023, at 5:07 AM, Ard Biesheuvel wrote:
>> > This series is a proof-of-concept that implements support for the EFI
>> > zboot decompressor for x86. It replaces the ordinary decompressor, and
>> > instead, performs the decompression, KASLR randomization and the 4/5
>> > level paging switch while running in the execution context of EFI.
>>
>> I like the concept. A couple high-level questions, since I haven’t dug into the code:
>>
>> Could zboot and bzImage be built into the same kernel image? That would get this into distros, and eventually someone could modify the legacy path to switch to long mode and invoke zboot (because zboot surely doesn’t need actual UEFI — just a sensible environment like what UEFI provides.)
>>
>
> That's an interesting question, and to some extent, that is actually
> what Evgeny's patch does: execute more of what the decompressor does
> from inside the EFI runtime context.
>
> The main win with zboot imho is that we get rid of all the funky
> heuristics that look for usable memory for trampolines and
> decompression buffers in funky ways, and instead, just use the EFI
> APIs for allocating pages and remapping them executable as needed
> (which is the important piece here). I'd have to think about whether
> there is any middle ground between this approach and Evgeny's - I'll
> have to get back to you on that.
>

Hmm. I dug the tiniest bit into the history. The x86/boot/compressed stuff has an allocator! It's this:

free_mem_ptr = heap; /* Heap */
free_mem_end_ptr = heap + BOOT_HEAP_SIZE;

plus a trivial and horrible malloc() implementation in include/linux/decompress/mm.h. There's one caller in x86/boot/compressed.

And, once upon a time, the idea of allocating enough memory to store the kernel from the decompressor would have been a problem. I'm willing to claim that we should not even try to support x86 systems that have that little memory (at least not once they've gotten long mode or at least flat 32-bit protected mode working). We should not try to allocate below 1MB (my laptop will cry), but there's no need for that.

So maybe the middle ground is to build a modern, simple malloc(), and back it by EFI when EFI is there and by just finding some free memory when EFI is not there?

This would be risky -- someone might have a horrible machine that has trouble with a simple allocator.
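
A rough sketch of what such an allocator might look like is shown below.
It is not the code from include/linux/decompress/mm.h; all names
(bump_heap, backing_alloc_fn, static_heap_backend) are hypothetical. The
backend hook is where an EFI build could call the firmware page
allocator, while a non-EFI build could hand out a region it found in the
memory map. Here the only backend provided is a statically allocated
pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical backend hook: returns a block of at least `size` bytes,
 * or NULL if that much memory is not available.
 */
typedef void *(*backing_alloc_fn)(size_t size);

struct bump_heap {
	uintptr_t cur;	/* next free address */
	uintptr_t end;	/* one past the last usable byte */
};

/* Fallback backend: a statically allocated, 8-byte aligned pool. */
static uint64_t static_pool[512];

void *static_heap_backend(size_t size)
{
	return size <= sizeof(static_pool) ? static_pool : NULL;
}

/* Returns 0 on success, -1 if the backend could not provide memory. */
int bump_heap_init(struct bump_heap *h, backing_alloc_fn backend,
		   size_t size)
{
	void *base;

	assert(h && backend);
	base = backend(size);
	if (!base)
		return -1;

	h->cur = (uintptr_t)base;
	h->end = h->cur + size;
	return 0;
}

/* Allocates `size` bytes with 8-byte alignment, or returns NULL. */
void *bump_alloc(struct bump_heap *h, size_t size)
{
	uintptr_t p;

	assert(h);
	p = (h->cur + 7) & ~(uintptr_t)7;
	if (p < h->cur || p > h->end || size > h->end - p)
		return NULL;

	h->cur = p + size;
	return (void *)p;
}
```

There is deliberately no free() here: a decompressor that runs once and
then jumps to the kernel can release the whole block in one go, which
keeps the allocator simple enough to reason about on odd machines.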

2023-05-03 18:21:41

by Ard Biesheuvel

Subject: Re: [RFC PATCH 0/3] efi: Implement generic zboot support

On Wed, 3 May 2023 at 19:55, Andy Lutomirski <[email protected]> wrote:
>
> On Fri, Apr 21, 2023, at 6:41 AM, Ard Biesheuvel wrote:
> > On Fri, 21 Apr 2023 at 15:30, Andy Lutomirski <[email protected]> wrote:
> >>
> >>
> >>
> >> On Sun, Apr 16, 2023, at 5:07 AM, Ard Biesheuvel wrote:
> >> > This series is a proof-of-concept that implements support for the EFI
> >> > zboot decompressor for x86. It replaces the ordinary decompressor, and
> >> > instead, performs the decompression, KASLR randomization and the 4/5
> >> > level paging switch while running in the execution context of EFI.
> >>
> >> I like the concept. A couple high-level questions, since I haven’t dug into the code:
> >>
> >> Could zboot and bzImage be built into the same kernel image? That would get this into distros, and eventually someone could modify the legacy path to switch to long mode and invoke zboot (because zboot surely doesn’t need actual UEFI — just a sensible environment like what UEFI provides.)
> >>
> >
> > That's an interesting question, and to some extent, that is actually
> > what Evgeny's patch does: execute more of what the decompressor does
> > from inside the EFI runtime context.
> >
> > The main win with zboot imho is that we get rid of all the funky
> > heuristics that look for usable memory for trampolines and
> > decompression buffers in funky ways, and instead, just use the EFI
> > APIs for allocating pages and remapping them executable as needed
> > (which is the important piece here). I'd have to think about whether
> > there is any middle ground between this approach and Evgeny's - I'll
> > have to get back to you on that.
> >
>
> Hmm. I dug the tiniest bit into the history. The x86/boot/compressed stuff has an allocator! It's this:
>
> free_mem_ptr = heap; /* Heap */
> free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
>
> plus a trivial and horrible malloc() implementation in include/linux/decompress/mm.h. There's one caller in x86/boot/compressed.
>
> And, once upon a time, the idea of allocating enough memory to store the kernel from the decompressor would have been a problem. I'm willing to claim that we should not even try to support x86 systems that have that little memory (at least not once they've gotten long mode or at least flat 32-bit protected mode working). We should not try to allocate below 1MB (my laptop will cry), but there's no need for that.
>
> So maybe the middle ground is to build a modern, simple malloc(), and back it by EFI when EFI is there and by just finding some free memory when EFI is not there?
>
> This would be risky -- someone might have a horrible machine that has trouble with a simple allocator.

The malloc() is the least of our concerns, tbh. It is only used by the
decompression library itself, and it is backed by a statically
allocated block of BSS.

Having just gone through this again, the main issues are:

1) The 4/5 level switching trampoline, which runs in the page tables
of the loader/EFI stub, and assumes that it is fine to grab a random
chunk of low memory, stash its contents somewhere, use it for some
code, a stack and a root level page table so we can do the x86 long
mode paging off/paging on salsa, and then copy back the contents and
carry on as if nothing happened. We currently have some code in the
stub that strips all NX restrictions from a generously overdimensioned
block of low memory so copying and running code like this actually
works.

2) We need an accurate description in the PE/COFF header of what needs
to be executable and what needs to be writable, so we can split the
regions. This only matters for code that runs under EFI's mappings, so
not a lot.

3) The payload relocates itself to the end of the decompression buffer
so it doesn't overwrite itself before completing. This is fragile and
also unnecessary when there is a page allocator and plenty of memory,
but afaict, this all executes under the decompressor's own page tables
so the RO/NX attributes that EFI sets are not a concern here. It
would, of course, be nice if we could avoid relying on RWX mappings
here.

4) I think it was you who pointed out that the demand paging 1:1 map
should really only get triggered for data accesses and not code
accesses, so it would be nice if we could create such mappings with NX
attributes.
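
To make point 2) more concrete, here is a minimal sketch of a sanity
check over an image's section table. The three flag values are the
section characteristics bits from the PE/COFF specification; everything
else (struct section_info, find_wx_violation, the idea of running this
as a build-time check) is hypothetical and not part of any posted patch.
It flags a section that is both writable and executable, or one that
does not start on a page boundary, since either would prevent a loader
from splitting the regions cleanly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Section characteristics flags from the PE/COFF specification. */
#define IMAGE_SCN_MEM_EXECUTE	0x20000000u
#define IMAGE_SCN_MEM_READ	0x40000000u
#define IMAGE_SCN_MEM_WRITE	0x80000000u

/* Trimmed-down, hypothetical view of a PE/COFF section header. */
struct section_info {
	const char *name;
	uint32_t    virtual_address;
	uint32_t    virtual_size;
	uint32_t    characteristics;
};

/*
 * Returns the index of the first section that is writable and
 * executable at the same time, or that is not aligned to page_size
 * (which must be a power of two). Returns -1 if all sections are fine.
 */
int find_wx_violation(const struct section_info *sec, size_t count,
		      uint32_t page_size)
{
	const uint32_t wx = IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_EXECUTE;
	size_t i;

	assert(sec && page_size && (page_size & (page_size - 1)) == 0);

	for (i = 0; i < count; i++) {
		if ((sec[i].characteristics & wx) == wx)
			return (int)i;
		if (sec[i].virtual_address & (page_size - 1))
			return (int)i;
	}
	return -1;
}
```

A check like this only covers code that runs under the firmware's
mappings; anything executing under the decompressor's own page tables,
as in point 3), is outside its reach.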

I've had another go at running the decompressed kernel directly,
without going through the decompressor logic at all, but I missed the
fact that SEV does a substantial amount of work in the decompressor
too, so I'm no longer convinced that this is a viable approach. But
I'm looking into this.

I just finished some patches [0] that only address 1), based on the
work I posted earlier. I'll send those out once -rc1 comes around.


[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=efi-x86-cleanup-la57