2023-09-12 12:15:21

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 00/15] x86/boot: Rework PE header generation

From: Ard Biesheuvel <[email protected]>

Now that the EFI stub boot flow no longer relies on memory that is
executable and writable at the same time, we can reorganize the PE/COFF
view of the kernel image and expose the decompressor binary's code and
r/o data as a .text section and data/bss as a .data section, using 4k
alignment and limited permissions.

Doing so is necessary for compatibility with hardening measures that are
being rolled out on x86 PCs built to run Windows (i.e., the majority of
them). The EFI boot environment that the Linux EFI stub executes in is
especially sensitive to safety issues, given that a vulnerability in the
loader of one OS can be abused to attack another.

In true x86 fashion, this is a lot more complicated than on other
architectures, which have implemented this code/data split with 4k
alignment from the beginning. The complicating factor here is that the
boot image consists of two different parts, which are stitched together
and fixed up using a special build tool.

After this series is applied, the only remaining task performed by the
build tool is generating the CRC-32. Even though this checksum is
usually wrong (given that distro kernels are signed for secure boot in a
way that corrupts the CRC), this feature is retained as we cannot be
sure that nobody is relying on this.

This supersedes the work proposed by Evgeniy last year, which did a
major rewrite of the build tool in order to clean it up, before updating
it to generate the new 4k aligned image layout. As this series proves,
the build tool is mostly unnecessary, and we have too many of those
already.

Changes since v1:
- drop patch that removed the CRC and the build tool
- do not use fixed setup_size but derive it in the setup.ld linker
script
- reorganize the PE header so the .compat section only covers its
payload and the padding that follows it
- add hpa's ack to patch #4

Cc: Evgeniy Baskov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Jones <[email protected]>
Cc: Matthew Garrett <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>

Ard Biesheuvel (15):
x86/efi: Drop EFI stub .bss from .data section
x86/efi: Disregard setup header of loaded image
x86/efi: Drop alignment flags from PE section headers
x86/boot: Remove the 'bugger off' message
x86/boot: Omit compression buffer from PE/COFF image memory footprint
x86/boot: Drop redundant code setting the root device
x86/boot: Grab kernel_info offset from zoffset header directly
x86/boot: Drop references to startup_64
x86/boot: Set EFI handover offset directly in header asm
x86/boot: Define setup size in linker script
x86/boot: Derive file size from _edata symbol
x86/boot: Construct PE/COFF .text section from assembler
x86/boot: Drop PE/COFF .reloc section
x86/boot: Split off PE/COFF .data section
x86/boot: Increase section and file alignment to 4k/512

arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/vmlinux.lds.S | 6 +-
arch/x86/boot/header.S | 213 ++++++---------
arch/x86/boot/setup.ld | 14 +-
arch/x86/boot/tools/build.c | 273 +-------------------
drivers/firmware/efi/libstub/Makefile | 7 -
drivers/firmware/efi/libstub/x86-stub.c | 46 +---
7 files changed, 114 insertions(+), 447 deletions(-)

--
2.42.0.283.g2d96d420d3-goog


2023-09-12 14:23:35

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 13/15] x86/boot: Drop PE/COFF .reloc section

From: Ard Biesheuvel <[email protected]>

Ancient buggy EFI loaders may have required a .reloc section to be
present at some point in time, but this has not been true for a long
time so the .reloc section can just be dropped.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 20 ------------
arch/x86/boot/setup.ld | 4 +--
arch/x86/boot/tools/build.c | 34 +++-----------------
3 files changed, 7 insertions(+), 51 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 2b07bc596c39..9e9641e220a7 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -155,26 +155,6 @@ section_table:
IMAGE_SCN_MEM_READ | \
IMAGE_SCN_MEM_EXECUTE # Characteristics

- #
- # The EFI application loader requires a relocation section
- # because EFI applications must be relocatable. The .reloc
- # offset & size fields are filled in by build.c.
- #
- .ascii ".reloc"
- .byte 0
- .byte 0
- .long 0
- .long 0
- .long 0 # SizeOfRawData
- .long 0 # PointerToRawData
- .long 0 # PointerToRelocations
- .long 0 # PointerToLineNumbers
- .word 0 # NumberOfRelocations
- .word 0 # NumberOfLineNumbers
- .long IMAGE_SCN_CNT_INITIALIZED_DATA | \
- IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_DISCARDABLE # Characteristics
-
#ifdef CONFIG_EFI_MIXED
#
# The offset & size fields are filled in by build.c.
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index ae2b5046a0db..9b551eacffa8 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -40,8 +40,8 @@ SECTIONS
setup_sig = .;
LONG(0x5a5aaa55)

- /* reserve some extra space for the reloc and compat sections */
- setup_size = ABSOLUTE(ALIGN(. + 64, 512));
+ /* reserve some extra space for the compat section */
+ setup_size = ABSOLUTE(ALIGN(. + 32, 512));
setup_sects = ABSOLUTE(setup_size / 512);
}

diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index 9712f27e32c1..faccff9743a3 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -47,7 +47,6 @@ typedef unsigned int u32;
/* This must be large enough to hold the entire setup */
u8 buf[SETUP_SECT_MAX*512];

-#define PECOFF_RELOC_RESERVE 0x20
#define PECOFF_COMPAT_RESERVE 0x20

static unsigned long efi32_pe_entry;
@@ -180,24 +179,13 @@ static void update_pecoff_section_header(char *section_name, u32 offset, u32 siz
update_pecoff_section_header_fields(section_name, offset, size, size, offset);
}

-static void update_pecoff_setup_and_reloc(unsigned int size)
+static void update_pecoff_setup(unsigned int size)
{
u32 setup_offset = 0x200;
- u32 reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
-#ifdef CONFIG_EFI_MIXED
- u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
-#endif
- u32 setup_size = reloc_offset - setup_offset;
+ u32 compat_offset = size - PECOFF_COMPAT_RESERVE;
+ u32 setup_size = compat_offset - setup_offset;

update_pecoff_section_header(".setup", setup_offset, setup_size);
- update_pecoff_section_header(".reloc", reloc_offset, PECOFF_RELOC_RESERVE);
-
- /*
- * Modify .reloc section contents with a single entry. The
- * relocation is applied to offset 10 of the relocation section.
- */
- put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
- put_unaligned_le32(10, &buf[reloc_offset + 4]);

#ifdef CONFIG_EFI_MIXED
update_pecoff_section_header(".compat", compat_offset, PECOFF_COMPAT_RESERVE);
@@ -214,21 +202,10 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
#endif
}

-static int reserve_pecoff_reloc_section(int c)
-{
- /* Reserve 0x20 bytes for .reloc section */
- memset(buf+c, 0, PECOFF_RELOC_RESERVE);
- return PECOFF_RELOC_RESERVE;
-}
-
#else

-static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
+static inline void update_pecoff_setup(unsigned int size) {}

-static inline int reserve_pecoff_reloc_section(int c)
-{
- return 0;
-}
#endif /* CONFIG_EFI_STUB */

static int reserve_pecoff_compat_section(int c)
@@ -307,7 +284,6 @@ int main(int argc, char ** argv)
fclose(file);

c += reserve_pecoff_compat_section(c);
- c += reserve_pecoff_reloc_section(c);

/* Pad unused space with zeros */
setup_sectors = (c + 511) / 512;
@@ -316,7 +292,7 @@ int main(int argc, char ** argv)
i = setup_sectors*512;
memset(buf+c, 0, i-c);

- update_pecoff_setup_and_reloc(i);
+ update_pecoff_setup(i);

/* Open and stat the kernel file */
fd = open(argv[2], O_RDONLY);
--
2.42.0.283.g2d96d420d3-goog

2023-09-12 19:29:49

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 11/15] x86/boot: Derive file size from _edata symbol

From: Ard Biesheuvel <[email protected]>

Tweak the linker script so that the value of _edata represents the
decompressor binary's file size rounded up to the appropriate alignment.
This removes the need to calculate it in the build tool, and will make
it easier to refer to the file size from the header directly in
subsequent changes to the PE header layout.

While adding _edata to the sed regex that parses the compressed
vmlinux's symbol list, tweak the regex a bit for conciseness.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_STUB=y.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/vmlinux.lds.S | 3 ++
arch/x86/boot/header.S | 2 +-
arch/x86/boot/tools/build.c | 30 +++++---------------
4 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 0e98bc503699..cc04917b1ac6 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -89,7 +89,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|efi32_stub_entry\|efi64_stub_entry\|efi_pe_entry\|efi32_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|efi.._stub_entry\|efi\(32\)\?_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_edata\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 4ff6ab1b67d9..5326f3b44194 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -47,6 +47,9 @@ SECTIONS
_data = . ;
*(.data)
*(.data.*)
+
+ /* add 4 bytes of extra space for a CRC-32 checksum */
+ . = ALIGN(. + 4, 0x20);
_edata = . ;
}
. = ALIGN(L1_CACHE_BYTES);
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 06bd72a324c1..34e9b35b827c 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -233,7 +233,7 @@ sentinel: .byte 0xff, 0xff /* Used to detect broken loaders */
hdr:
.byte setup_sects - 1
root_flags: .word ROOT_RDONLY
-syssize: .long 0 /* Filled in by build.c */
+syssize: .long ZO__edata / 16
ram_size: .word 0 /* Obsolete */
vid_mode: .word SVGA_MODE
root_dev: .word 0 /* Default to major/minor 0/0 */
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index 745d64b6d930..e792c6c5a634 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -52,6 +52,7 @@ u8 buf[SETUP_SECT_MAX*512];

static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
+static unsigned long _edata;
static unsigned long _end;

/*----------------------------------------------------------------------*/
@@ -308,6 +309,7 @@ static void parse_zoffset(char *fname)
while (p && *p) {
PARSE_ZOFS(p, efi_pe_entry);
PARSE_ZOFS(p, efi32_pe_entry);
+ PARSE_ZOFS(p, _edata);
PARSE_ZOFS(p, _end);

p = strchr(p, '\n');
@@ -320,7 +322,6 @@ int main(int argc, char ** argv)
{
unsigned int i, sz, setup_sectors;
int c;
- u32 sys_size;
struct stat sb;
FILE *file, *dest;
int fd;
@@ -368,24 +369,14 @@ int main(int argc, char ** argv)
die("Unable to open `%s': %m", argv[2]);
if (fstat(fd, &sb))
die("Unable to stat `%s': %m", argv[2]);
- sz = sb.st_size;
+ if (_edata != sb.st_size)
+ die("Unexpected file size `%s': %u != %u", argv[2], _edata,
+ sb.st_size);
+ sz = _edata - 4;
kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
if (kernel == MAP_FAILED)
die("Unable to mmap '%s': %m", argv[2]);
- /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
- sys_size = (sz + 15 + 4) / 16;
-#ifdef CONFIG_EFI_STUB
- /*
- * COFF requires minimum 32-byte alignment of sections, and
- * adding a signature is problematic without that alignment.
- */
- sys_size = (sys_size + 1) & ~1;
-#endif
-
- /* Patch the setup code with the appropriate size parameters */
- put_unaligned_le32(sys_size, &buf[0x1f4]);
-
- update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));
+ update_pecoff_text(setup_sectors * 512, i + _edata);


crc = partial_crc32(buf, i, crc);
@@ -397,13 +388,6 @@ int main(int argc, char ** argv)
if (fwrite(kernel, 1, sz, dest) != sz)
die("Writing kernel failed");

- /* Add padding leaving 4 bytes for the checksum */
- while (sz++ < (sys_size*16) - 4) {
- crc = partial_crc32_one('\0', crc);
- if (fwrite("\0", 1, 1, dest) != 1)
- die("Writing padding failed");
- }
-
/* Write the CRC */
put_unaligned_le32(crc, buf);
if (fwrite(buf, 1, 4, dest) != 4)
--
2.42.0.283.g2d96d420d3-goog

2023-09-12 20:07:36

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 07/15] x86/boot: Grab kernel_info offset from zoffset header directly

From: Ard Biesheuvel <[email protected]>

Instead of parsing zoffset.h and poking the kernel_info offset value
into the header from the build tool, just grab the value directly in the
asm file that describes this header.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 2 +-
arch/x86/boot/tools/build.c | 4 ----
2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6059f87b159d..5575d0f06bab 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -526,7 +526,7 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr

init_size: .long INIT_SIZE # kernel initialization size
handover_offset: .long 0 # Filled in by build.c
-kernel_info_offset: .long 0 # Filled in by build.c
+kernel_info_offset: .long ZO_kernel_info

# End of setup header #####################################################

diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index efa4e9c7d713..660627ea6cbb 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -59,7 +59,6 @@ static unsigned long efi32_stub_entry;
static unsigned long efi64_stub_entry;
static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
-static unsigned long kernel_info;
static unsigned long startup_64;
static unsigned long _end;

@@ -339,7 +338,6 @@ static void parse_zoffset(char *fname)
PARSE_ZOFS(p, efi64_stub_entry);
PARSE_ZOFS(p, efi_pe_entry);
PARSE_ZOFS(p, efi32_pe_entry);
- PARSE_ZOFS(p, kernel_info);
PARSE_ZOFS(p, startup_64);
PARSE_ZOFS(p, _end);

@@ -422,8 +420,6 @@ int main(int argc, char ** argv)
update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));

efi_stub_entry_update();
- /* Update kernel_info offset. */
- put_unaligned_le32(kernel_info, &buf[0x268]);

crc = partial_crc32(buf, i, crc);
if (fwrite(buf, 1, i, dest) != i)
--
2.42.0.283.g2d96d420d3-goog

2023-09-12 20:27:07

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 14/15] x86/boot: Split off PE/COFF .data section

From: Ard Biesheuvel <[email protected]>

Describe the code and data of the decompressor binary using separate
.text and .data PE/COFF sections, so that we will be able to map them
using restricted permissions once we increase the section and file
alignment sufficiently. This avoids the need for memory mappings that
are writable and executable at the same time, which is something that
is best avoided for security reasons.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/header.S | 19 +++++++++++++++----
2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index cc04917b1ac6..3cece19b7473 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -89,7 +89,7 @@ $(obj)/vmlinux.bin: $(obj)/compressed/vmlinux FORCE

SETUP_OBJS = $(addprefix $(obj)/,$(setup-y))

-sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|efi.._stub_entry\|efi\(32\)\?_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_edata\|z_.*\)$$/\#define ZO_\2 0x\1/p'
+sed-zoffset := -e 's/^\([0-9a-fA-F]*\) [a-zA-Z] \(startup_32\|efi.._stub_entry\|efi\(32\)\?_pe_entry\|input_data\|kernel_info\|_end\|_ehead\|_text\|_e\?data\|z_.*\)$$/\#define ZO_\2 0x\1/p'

quiet_cmd_zoffset = ZOFFSET $@
cmd_zoffset = $(NM) $< | sed -n $(sed-zoffset) > $@
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9e9641e220a7..a1f986105f00 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -75,9 +75,9 @@ optional_header:
.byte 0x02 # MajorLinkerVersion
.byte 0x14 # MinorLinkerVersion

- .long setup_size + ZO__end - 0x200 # SizeOfCode
+ .long ZO__data # SizeOfCode

- .long 0 # SizeOfInitializedData
+ .long ZO__end - ZO__data # SizeOfInitializedData
.long 0 # SizeOfUninitializedData

.long setup_size + ZO_efi_pe_entry # AddressOfEntryPoint
@@ -178,9 +178,9 @@ section_table:
.byte 0
.byte 0
.byte 0
- .long ZO__end
+ .long ZO__data
.long setup_size
- .long ZO__edata # Size of initialized data
+ .long ZO__data # Size of initialized data
# on disk
.long setup_size
.long 0 # PointerToRelocations
@@ -191,6 +191,17 @@ section_table:
IMAGE_SCN_MEM_READ | \
IMAGE_SCN_MEM_EXECUTE # Characteristics

+ .ascii ".data\0\0\0"
+ .long ZO__end - ZO__data # VirtualSize
+ .long setup_size + ZO__data # VirtualAddress
+ .long ZO__edata - ZO__data # SizeOfRawData
+ .long setup_size + ZO__data # PointerToRawData
+
+ .long 0, 0, 0
+ .long IMAGE_SCN_CNT_INITIALIZED_DATA | \
+ IMAGE_SCN_MEM_READ | \
+ IMAGE_SCN_MEM_WRITE # Characteristics
+
.set section_count, (. - section_table) / 40
#endif /* CONFIG_EFI_STUB */

--
2.42.0.283.g2d96d420d3-goog

2023-09-12 21:34:15

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 04/15] x86/boot: Remove the 'bugger off' message

From: Ard Biesheuvel <[email protected]>

Ancient (pre-2003) x86 kernels could boot from a floppy disk straight from
the BIOS, using a small real mode boot stub at the start of the image
where the BIOS would expect the boot record (or boot block) to appear.

Due to its limitations (kernel size < 1 MiB, no support for IDE, USB or
El Torito floppy emulation), this support was dropped, and a Linux aware
bootloader is now always required to boot the kernel from a legacy BIOS.

To smoothen this transition, the boot stub was not removed entirely, but
replaced with one that just prints an error message telling the user to
install a bootloader.

As it is unlikely that anyone doing direct floppy boot with such an
ancient kernel is going to upgrade to v6.5+ and expect that this boot
method still works, printing this message is kind of pointless, and so
it should be possible to remove the logic that emits it.

Let's free up this space so it can be used to expand the PE header in a
subsequent patch.

Acked-by: H. Peter Anvin (Intel) <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 49 --------------------
arch/x86/boot/setup.ld | 7 +--
2 files changed, 4 insertions(+), 52 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 8c8148d751c6..b24fa50a9898 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -38,64 +38,15 @@ SYSSEG = 0x1000 /* historical load address >> 4 */

.code16
.section ".bstext", "ax"
-
- .global bootsect_start
-bootsect_start:
#ifdef CONFIG_EFI_STUB
# "MZ", MS-DOS header
.word MZ_MAGIC
-#endif
-
- # Normalize the start address
- ljmp $BOOTSEG, $start2
-
-start2:
- movw %cs, %ax
- movw %ax, %ds
- movw %ax, %es
- movw %ax, %ss
- xorw %sp, %sp
- sti
- cld
-
- movw $bugger_off_msg, %si
-
-msg_loop:
- lodsb
- andb %al, %al
- jz bs_die
- movb $0xe, %ah
- movw $7, %bx
- int $0x10
- jmp msg_loop
-
-bs_die:
- # Allow the user to press a key, then reboot
- xorw %ax, %ax
- int $0x16
- int $0x19
-
- # int 0x19 should never return. In case it does anyway,
- # invoke the BIOS reset code...
- ljmp $0xf000,$0xfff0
-
-#ifdef CONFIG_EFI_STUB
.org 0x38
#
# Offset to the PE header.
#
.long LINUX_PE_MAGIC
.long pe_header
-#endif /* CONFIG_EFI_STUB */
-
- .section ".bsdata", "a"
-bugger_off_msg:
- .ascii "Use a boot loader.\r\n"
- .ascii "\n"
- .ascii "Remove disk and press any key to reboot...\r\n"
- .byte 0
-
-#ifdef CONFIG_EFI_STUB
pe_header:
.long PE_MAGIC

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 49546c247ae2..b11c45b9e51e 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -10,10 +10,11 @@ ENTRY(_start)
SECTIONS
{
. = 0;
- .bstext : { *(.bstext) }
- .bsdata : { *(.bsdata) }
+ .bstext : {
+ *(.bstext)
+ . = 495;
+ } =0xffffffff

- . = 495;
.header : { *(.header) }
.entrytext : { *(.entrytext) }
.inittext : { *(.inittext) }
--
2.42.0.283.g2d96d420d3-goog

2023-09-12 21:34:56

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 05/15] x86/boot: Omit compression buffer from PE/COFF image memory footprint

From: Ard Biesheuvel <[email protected]>

Now that the EFI stub decompresses the kernel and hands over to the
decompressed image directly, there is no longer a need to provide a
decompression buffer as part of the .BSS allocation of the PE/COFF
image. It also means the PE/COFF image can be loaded anywhere in memory,
and setting the preferred image base is unnecessary. So drop the
handling of this from the header and from the build tool.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 6 +--
arch/x86/boot/tools/build.c | 50 +++-----------------
2 files changed, 8 insertions(+), 48 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b24fa50a9898..a87d9133384b 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -90,12 +90,10 @@ optional_header:
#endif

extra_header_fields:
- # PE specification requires ImageBase to be 64k aligned
- .set image_base, (LOAD_PHYSICAL_ADDR + 0xffff) & ~0xffff
#ifdef CONFIG_X86_32
- .long image_base # ImageBase
+ .long 0 # ImageBase
#else
- .quad image_base # ImageBase
+ .quad 0 # ImageBase
#endif
.long 0x20 # SectionAlignment
.long 0x20 # FileAlignment
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index bd247692b701..0354c223e354 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -65,7 +65,6 @@ static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
static unsigned long kernel_info;
static unsigned long startup_64;
-static unsigned long _ehead;
static unsigned long _end;

/*----------------------------------------------------------------------*/
@@ -229,27 +228,14 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
#endif
}

-static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
- unsigned int init_sz)
+static void update_pecoff_text(unsigned int text_start, unsigned int file_sz)
{
unsigned int pe_header;
unsigned int text_sz = file_sz - text_start;
- unsigned int bss_sz = init_sz - file_sz;
+ unsigned int bss_sz = _end - text_sz;

pe_header = get_unaligned_le32(&buf[0x3c]);

- /*
- * The PE/COFF loader may load the image at an address which is
- * misaligned with respect to the kernel_alignment field in the setup
- * header.
- *
- * In order to avoid relocating the kernel to correct the misalignment,
- * add slack to allow the buffer to be aligned within the declared size
- * of the image.
- */
- bss_sz += CONFIG_PHYSICAL_ALIGN;
- init_sz += CONFIG_PHYSICAL_ALIGN;
-
/*
* Size of code: Subtract the size of the first sector (512 bytes)
* which includes the header.
@@ -257,7 +243,7 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);

/* Size of image */
- put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+ put_unaligned_le32(file_sz + bss_sz, &buf[pe_header + 0x50]);

/*
* Address of entry point for PE/COFF executable
@@ -308,8 +294,7 @@ static void efi_stub_entry_update(void)

static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
static inline void update_pecoff_text(unsigned int text_start,
- unsigned int file_sz,
- unsigned int init_sz) {}
+ unsigned int file_sz) {}
static inline void efi_stub_defaults(void) {}
static inline void efi_stub_entry_update(void) {}

@@ -360,7 +345,6 @@ static void parse_zoffset(char *fname)
PARSE_ZOFS(p, efi32_pe_entry);
PARSE_ZOFS(p, kernel_info);
PARSE_ZOFS(p, startup_64);
- PARSE_ZOFS(p, _ehead);
PARSE_ZOFS(p, _end);

p = strchr(p, '\n');
@@ -371,7 +355,7 @@ static void parse_zoffset(char *fname)

int main(int argc, char ** argv)
{
- unsigned int i, sz, setup_sectors, init_sz;
+ unsigned int i, sz, setup_sectors;
int c;
u32 sys_size;
struct stat sb;
@@ -442,31 +426,9 @@ int main(int argc, char ** argv)
buf[0x1f1] = setup_sectors-1;
put_unaligned_le32(sys_size, &buf[0x1f4]);

- init_sz = get_unaligned_le32(&buf[0x260]);
-#ifdef CONFIG_EFI_STUB
- /*
- * The decompression buffer will start at ImageBase. When relocating
- * the compressed kernel to its end, we must ensure that the head
- * section does not get overwritten. The head section occupies
- * [i, i + _ehead), and the destination is [init_sz - _end, init_sz).
- *
- * At present these should never overlap, because 'i' is at most 32k
- * because of SETUP_SECT_MAX, '_ehead' is less than 1k, and the
- * calculation of INIT_SIZE in boot/header.S ensures that
- * 'init_sz - _end' is at least 64k.
- *
- * For future-proofing, increase init_sz if necessary.
- */
-
- if (init_sz - _end < i + _ehead) {
- init_sz = (i + _ehead + _end + 4095) & ~4095;
- put_unaligned_le32(init_sz, &buf[0x260]);
- }
-#endif
- update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
+ update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));

efi_stub_entry_update();
-
/* Update kernel_info offset. */
put_unaligned_le32(kernel_info, &buf[0x268]);

--
2.42.0.283.g2d96d420d3-goog

2023-09-13 00:05:36

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 10/15] x86/boot: Define setup size in linker script

From: Ard Biesheuvel <[email protected]>

The setup block contains the real mode startup code that is used when
booting from a legacy BIOS, along with the boot_params/setup_data that
is used by legacy x86 bootloaders to pass the command line and initial
ramdisk parameters, among other things.

The setup block also contains the PE/COFF header of the entire combined
image, which includes the compressed kernel image, the decompressor and
the EFI stub.

This PE header describes the layout of the executable image in memory,
and currently, the fact that the setup block precedes it makes it rather
fiddly to get the right values into the right place in the final image.

Let's make things a bit easier by defining the setup_size in the linker
script so it can be referenced from the asm code directly, rather than
having to rely on the build tool to calculate it. For the time being,
add 64 bytes of fixed padding for the .reloc and .compat sections - this
will be removed in a subsequent patch after the PE/COFF header has been
reorganized.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_MIXED=y.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 2 +-
arch/x86/boot/setup.ld | 4 ++++
arch/x86/boot/tools/build.c | 6 ------
3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 72744ba440f6..06bd72a324c1 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -231,7 +231,7 @@ sentinel: .byte 0xff, 0xff /* Used to detect broken loaders */

.globl hdr
hdr:
-setup_sects: .byte 0 /* Filled in by build.c */
+ .byte setup_sects - 1
root_flags: .word ROOT_RDONLY
syssize: .long 0 /* Filled in by build.c */
ram_size: .word 0 /* Obsolete */
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index b11c45b9e51e..ae2b5046a0db 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -39,6 +39,10 @@ SECTIONS
.signature : {
setup_sig = .;
LONG(0x5a5aaa55)
+
+ /* reserve some extra space for the reloc and compat sections */
+ setup_size = ABSOLUTE(ALIGN(. + 64, 512));
+ setup_sects = ABSOLUTE(setup_size / 512);
}


diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index 069497543164..745d64b6d930 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -48,12 +48,7 @@ typedef unsigned int u32;
u8 buf[SETUP_SECT_MAX*512];

#define PECOFF_RELOC_RESERVE 0x20
-
-#ifdef CONFIG_EFI_MIXED
#define PECOFF_COMPAT_RESERVE 0x20
-#else
-#define PECOFF_COMPAT_RESERVE 0x0
-#endif

static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
@@ -388,7 +383,6 @@ int main(int argc, char ** argv)
#endif

/* Patch the setup code with the appropriate size parameters */
- buf[0x1f1] = setup_sectors-1;
put_unaligned_le32(sys_size, &buf[0x1f4]);

update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));
--
2.42.0.283.g2d96d420d3-goog

2023-09-13 04:51:12

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 02/15] x86/efi: Disregard setup header of loaded image

From: Ard Biesheuvel <[email protected]>

The native EFI entrypoint does not take a struct boot_params from the
loader, but instead, it constructs one from scratch, using the setup
header data placed at the start of the image.

This setup header is placed in a way that permits legacy loaders to
manipulate the contents (i.e., to pass the kernel command line or the
address and size of an initial ramdisk), but EFI boot does not use it in
that way - it only copies the contents that were placed there at build
time, but EFI loaders will not (and should not) manipulate the setup
header to configure the boot. (Commit 63bf28ceb3ebbe76 "efi: x86: Wipe
setup_data on pure EFI boot" deals with some of the fallout of using
setup_data in a way that breaks EFI boot.)

Given that none of the non-zero values that are copied from the setup
header into the EFI stub's struct boot_params are relevant to the boot
now that the EFI stub no longer enters via the legacy decompressor, the
copy can be omitted altogether.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 46 +++-----------------
1 file changed, 6 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 2fee52ed335d..d76a9f7c35d0 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -449,9 +449,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg)
{
- struct boot_params *boot_params;
- struct setup_header *hdr;
- void *image_base;
+ static struct boot_params boot_params __page_aligned_bss;
+ struct setup_header *hdr = &boot_params.hdr;
efi_guid_t proto = LOADED_IMAGE_PROTOCOL_GUID;
int options_size = 0;
efi_status_t status;
@@ -469,30 +468,9 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_exit(handle, status);
}

- image_base = efi_table_attr(image, image_base);
-
- status = efi_allocate_pages(sizeof(struct boot_params),
- (unsigned long *)&boot_params, ULONG_MAX);
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate lowmem for boot params\n");
- efi_exit(handle, status);
- }
-
- memset(boot_params, 0x0, sizeof(struct boot_params));
-
- hdr = &boot_params->hdr;
-
- /* Copy the setup header from the second sector to boot_params */
- memcpy(&hdr->jump, image_base + 512,
- sizeof(struct setup_header) - offsetof(struct setup_header, jump));
-
- /*
- * Fill out some of the header fields ourselves because the
- * EFI firmware loader doesn't load the first sector.
- */
+ /* assign the setup_header fields that the kernel actually cares about */
hdr->root_flags = 1;
hdr->vid_mode = 0xffff;
- hdr->boot_flag = 0xAA55;

hdr->type_of_loader = 0x21;

@@ -501,25 +479,13 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
if (!cmdline_ptr)
goto fail;

- efi_set_u64_split((unsigned long)cmdline_ptr,
- &hdr->cmd_line_ptr, &boot_params->ext_cmd_line_ptr);
-
- hdr->ramdisk_image = 0;
- hdr->ramdisk_size = 0;
-
- /*
- * Disregard any setup data that was provided by the bootloader:
- * setup_data could be pointing anywhere, and we have no way of
- * authenticating or validating the payload.
- */
- hdr->setup_data = 0;
+ efi_set_u64_split((unsigned long)cmdline_ptr, &hdr->cmd_line_ptr,
+ &boot_params.ext_cmd_line_ptr);

- efi_stub_entry(handle, sys_table_arg, boot_params);
+ efi_stub_entry(handle, sys_table_arg, &boot_params);
/* not reached */

fail:
- efi_free(sizeof(struct boot_params), (unsigned long)boot_params);
-
efi_exit(handle, status);
}

--
2.42.0.283.g2d96d420d3-goog

2023-09-13 04:55:31

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 09/15] x86/boot: Set EFI handover offset directly in header asm

From: Ard Biesheuvel <[email protected]>

The offsets of the EFI handover entrypoints are available to the
assembler when constructing the header, so there is no need to set them
from the build tool afterwards.

This change has no impact on the resulting bzImage binary.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 18 ++++++++++++++-
arch/x86/boot/tools/build.c | 24 --------------------
2 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 5575d0f06bab..72744ba440f6 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -524,8 +524,24 @@ pref_address: .quad LOAD_PHYSICAL_ADDR # preferred load addr
# define INIT_SIZE VO_INIT_SIZE
#endif

+ .macro __handover_offset
+#ifndef CONFIG_EFI_HANDOVER_PROTOCOL
+ .long 0
+#elif !defined(CONFIG_X86_64)
+ .long ZO_efi32_stub_entry
+#else
+ /* Yes, this is really how we defined it :( */
+ .long ZO_efi64_stub_entry - 0x200
+#ifdef CONFIG_EFI_MIXED
+ .if ZO_efi32_stub_entry != ZO_efi64_stub_entry - 0x200
+ .error "32-bit and 64-bit EFI entry points do not match"
+ .endif
+#endif
+#endif
+ .endm
+
init_size: .long INIT_SIZE # kernel initialization size
-handover_offset: .long 0 # Filled in by build.c
+handover_offset: __handover_offset
kernel_info_offset: .long ZO_kernel_info

# End of setup header #####################################################
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index 14ef13fe7ab0..069497543164 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -55,8 +55,6 @@ u8 buf[SETUP_SECT_MAX*512];
#define PECOFF_COMPAT_RESERVE 0x0
#endif

-static unsigned long efi32_stub_entry;
-static unsigned long efi64_stub_entry;
static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
static unsigned long _end;
@@ -265,31 +263,12 @@ static void efi_stub_defaults(void)
#endif
}

-static void efi_stub_entry_update(void)
-{
- unsigned long addr = efi32_stub_entry;
-
-#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
-#ifdef CONFIG_X86_64
- /* Yes, this is really how we defined it :( */
- addr = efi64_stub_entry - 0x200;
-#endif
-
-#ifdef CONFIG_EFI_MIXED
- if (efi32_stub_entry != addr)
- die("32-bit and 64-bit EFI entry points do not match\n");
-#endif
-#endif
- put_unaligned_le32(addr, &buf[0x264]);
-}
-
#else

static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
static inline void update_pecoff_text(unsigned int text_start,
unsigned int file_sz) {}
static inline void efi_stub_defaults(void) {}
-static inline void efi_stub_entry_update(void) {}

static inline int reserve_pecoff_reloc_section(int c)
{
@@ -332,8 +311,6 @@ static void parse_zoffset(char *fname)
p = (char *)buf;

while (p && *p) {
- PARSE_ZOFS(p, efi32_stub_entry);
- PARSE_ZOFS(p, efi64_stub_entry);
PARSE_ZOFS(p, efi_pe_entry);
PARSE_ZOFS(p, efi32_pe_entry);
PARSE_ZOFS(p, _end);
@@ -416,7 +393,6 @@ int main(int argc, char ** argv)

update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));

- efi_stub_entry_update();

crc = partial_crc32(buf, i, crc);
if (fwrite(buf, 1, i, dest) != i)
--
2.42.0.283.g2d96d420d3-goog

2023-09-13 06:05:42

by Ard Biesheuvel

[permalink] [raw]
Subject: [PATCH v2 03/15] x86/efi: Drop alignment flags from PE section headers

From: Ard Biesheuvel <[email protected]>

The section header flags for alignment are documented in the PE/COFF
spec as being applicable to PE object files only, not to PE executables
such as the Linux bzImage, so let's drop them from the PE header.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/header.S | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b04ca8e2b213..8c8148d751c6 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -209,8 +209,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_CODE | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_EXECUTE | \
- IMAGE_SCN_ALIGN_16BYTES # Characteristics
+ IMAGE_SCN_MEM_EXECUTE # Characteristics

#
# The EFI application loader requires a relocation section
@@ -230,8 +229,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_INITIALIZED_DATA | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_DISCARDABLE | \
- IMAGE_SCN_ALIGN_1BYTES # Characteristics
+ IMAGE_SCN_MEM_DISCARDABLE # Characteristics

#ifdef CONFIG_EFI_MIXED
#
@@ -249,8 +247,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_INITIALIZED_DATA | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_DISCARDABLE | \
- IMAGE_SCN_ALIGN_1BYTES # Characteristics
+ IMAGE_SCN_MEM_DISCARDABLE # Characteristics
#endif

#
@@ -271,8 +268,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_CODE | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_EXECUTE | \
- IMAGE_SCN_ALIGN_16BYTES # Characteristics
+ IMAGE_SCN_MEM_EXECUTE # Characteristics

.set section_count, (. - section_table) / 40
#endif /* CONFIG_EFI_STUB */
--
2.42.0.283.g2d96d420d3-goog

Subject: [tip: x86/boot] x86/boot: Define setup size in linker script

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 988b52b207a9fe74c3699bda8c2256714926b94b
Gitweb: https://git.kernel.org/tip/988b52b207a9fe74c3699bda8c2256714926b94b
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Tue, 12 Sep 2023 09:01:01
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 15 Sep 2023 11:18:42 +02:00

x86/boot: Define setup size in linker script

The setup block contains the real mode startup code that is used when
booting from a legacy BIOS, along with the boot_params/setup_data that
is used by legacy x86 bootloaders to pass the command line and initial
ramdisk parameters, among other things.

The setup block also contains the PE/COFF header of the entire combined
image, which includes the compressed kernel image, the decompressor and
the EFI stub.

This PE header describes the layout of the executable image in memory,
and currently, the fact that the setup block precedes it makes it rather
fiddly to get the right values into the right place in the final image.

Let's make things a bit easier by defining the setup_size in the linker
script so it can be referenced from the asm code directly, rather than
having to rely on the build tool to calculate it. For the time being,
add 64 bytes of fixed padding for the .reloc and .compat sections - this
will be removed in a subsequent patch after the PE/COFF header has been
reorganized.

This change has no impact on the resulting bzImage binary when
configured with CONFIG_EFI_MIXED=y.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/header.S | 2 +-
arch/x86/boot/setup.ld | 4 ++++
arch/x86/boot/tools/build.c | 6 ------
3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 6059f87..36011b2 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -231,7 +231,7 @@ sentinel: .byte 0xff, 0xff /* Used to detect broken loaders */

.globl hdr
hdr:
-setup_sects: .byte 0 /* Filled in by build.c */
+ .byte setup_sects - 1
root_flags: .word ROOT_RDONLY
syssize: .long 0 /* Filled in by build.c */
ram_size: .word 0 /* Obsolete */
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index b11c45b..f125f17 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -39,6 +39,10 @@ SECTIONS
.signature : {
setup_sig = .;
LONG(0x5a5aaa55)
+
+ /* Reserve some extra space for the reloc and compat sections */
+ setup_size = ABSOLUTE(ALIGN(. + 64, 512));
+ setup_sects = ABSOLUTE(setup_size / 512);
}


diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index 10b0207..bcbae88 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -48,12 +48,7 @@ typedef unsigned int u32;
u8 buf[SETUP_SECT_MAX*512];

#define PECOFF_RELOC_RESERVE 0x20
-
-#ifdef CONFIG_EFI_MIXED
#define PECOFF_COMPAT_RESERVE 0x20
-#else
-#define PECOFF_COMPAT_RESERVE 0x0
-#endif

static unsigned long efi32_stub_entry;
static unsigned long efi64_stub_entry;
@@ -413,7 +408,6 @@ int main(int argc, char ** argv)
#endif

/* Patch the setup code with the appropriate size parameters */
- buf[0x1f1] = setup_sectors-1;
put_unaligned_le32(sys_size, &buf[0x1f4]);

update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));

Subject: [tip: x86/boot] x86/efi: Disregard setup header of loaded image

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 7e50262229faad0c7b8c54477cd1c883f31cc4a7
Gitweb: https://git.kernel.org/tip/7e50262229faad0c7b8c54477cd1c883f31cc4a7
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Tue, 12 Sep 2023 09:00:53
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 15 Sep 2023 11:18:40 +02:00

x86/efi: Disregard setup header of loaded image

The native EFI entrypoint does not take a struct boot_params from the
loader, but instead, it constructs one from scratch, using the setup
header data placed at the start of the image.

This setup header is placed in a way that permits legacy loaders to
manipulate the contents (i.e., to pass the kernel command line or the
address and size of an initial ramdisk), but EFI boot does not use it in
that way - it only copies the contents that were placed there at build
time, but EFI loaders will not (and should not) manipulate the setup
header to configure the boot. (Commit 63bf28ceb3ebbe76 "efi: x86: Wipe
setup_data on pure EFI boot" deals with some of the fallout of using
setup_data in a way that breaks EFI boot.)

Given that none of the non-zero values that are copied from the setup
header into the EFI stub's struct boot_params are relevant to the boot
now that the EFI stub no longer enters via the legacy decompressor, the
copy can be omitted altogether.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/x86-stub.c | 46 +++---------------------
1 file changed, 6 insertions(+), 40 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 2fee52e..3bfc596 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -449,9 +449,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg)
{
- struct boot_params *boot_params;
- struct setup_header *hdr;
- void *image_base;
+ static struct boot_params boot_params __page_aligned_bss;
+ struct setup_header *hdr = &boot_params.hdr;
efi_guid_t proto = LOADED_IMAGE_PROTOCOL_GUID;
int options_size = 0;
efi_status_t status;
@@ -469,30 +468,9 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
efi_exit(handle, status);
}

- image_base = efi_table_attr(image, image_base);
-
- status = efi_allocate_pages(sizeof(struct boot_params),
- (unsigned long *)&boot_params, ULONG_MAX);
- if (status != EFI_SUCCESS) {
- efi_err("Failed to allocate lowmem for boot params\n");
- efi_exit(handle, status);
- }
-
- memset(boot_params, 0x0, sizeof(struct boot_params));
-
- hdr = &boot_params->hdr;
-
- /* Copy the setup header from the second sector to boot_params */
- memcpy(&hdr->jump, image_base + 512,
- sizeof(struct setup_header) - offsetof(struct setup_header, jump));
-
- /*
- * Fill out some of the header fields ourselves because the
- * EFI firmware loader doesn't load the first sector.
- */
+ /* Assign the setup_header fields that the kernel actually cares about */
hdr->root_flags = 1;
hdr->vid_mode = 0xffff;
- hdr->boot_flag = 0xAA55;

hdr->type_of_loader = 0x21;

@@ -501,25 +479,13 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
if (!cmdline_ptr)
goto fail;

- efi_set_u64_split((unsigned long)cmdline_ptr,
- &hdr->cmd_line_ptr, &boot_params->ext_cmd_line_ptr);
-
- hdr->ramdisk_image = 0;
- hdr->ramdisk_size = 0;
-
- /*
- * Disregard any setup data that was provided by the bootloader:
- * setup_data could be pointing anywhere, and we have no way of
- * authenticating or validating the payload.
- */
- hdr->setup_data = 0;
+ efi_set_u64_split((unsigned long)cmdline_ptr, &hdr->cmd_line_ptr,
+ &boot_params.ext_cmd_line_ptr);

- efi_stub_entry(handle, sys_table_arg, boot_params);
+ efi_stub_entry(handle, sys_table_arg, &boot_params);
/* not reached */

fail:
- efi_free(sizeof(struct boot_params), (unsigned long)boot_params);
-
efi_exit(handle, status);
}

2023-09-15 13:44:13

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Fri, 15 Sept 2023 at 15:21, Ard Biesheuvel <[email protected]> wrote:
>
> On Fri, Sep 15, 2023 at 1:31 PM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > > Ard Biesheuvel (15):
> > > > x86/efi: Drop EFI stub .bss from .data section
> > > > x86/efi: Disregard setup header of loaded image
> > > > x86/efi: Drop alignment flags from PE section headers
> > > > x86/boot: Remove the 'bugger off' message
> > > > x86/boot: Omit compression buffer from PE/COFF image memory footprint
> > > > x86/boot: Drop redundant code setting the root device
> > > > x86/boot: Grab kernel_info offset from zoffset header directly
> > > > x86/boot: Drop references to startup_64
> > >
> > > I've applied these first 8 patches to tip:x86/boot with minor edits.
>
> Thanks.
>
> > > (Please preserve existing comment capitalization conventions ...)
> > >
>
> Ack
>
> > > > x86/boot: Set EFI handover offset directly in header asm
> > > > x86/boot: Define setup size in linker script
> > > > x86/boot: Derive file size from _edata symbol
> > > > x86/boot: Construct PE/COFF .text section from assembler
> > > > x86/boot: Drop PE/COFF .reloc section
> > > > x86/boot: Split off PE/COFF .data section
> > > > x86/boot: Increase section and file alignment to 4k/512
> > >
> > > The rest conflicted with recent upstream changes, and I suppose it's
> > > prudent to test these changes bit by bit anyway.
> >
>
> Agreed. So you mean this conflicts with other stuff queued up in -tip
> already, right?
>
> > So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
> > due to the 8th patch:
> >
> > 988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
> > commit 988b52b207a9fe74c3699bda8c2256714926b94b
> > Author: Ard Biesheuvel <[email protected]>
> > Date: Tue Sep 12 09:01:01 2023 +0000
> >
> > x86/boot: Define setup size in linker script
> >
> > I've removed it for now - but this side effect was not expected.
> >
>
> No, definitely not expected. I tested various combinations of i386 /
> x86_64 built with GCC / Clang doing EFI or BIOS boot.
>
> I'll rebase the remaining stuff onto -tip and see if I can reproduce this.

This is actually quite bizarre. x86_64_defconfig has
CONFIG_EFI_MIXED=y and i tested that this change produces the exact
same bzImage binary in that case.

Could you send me the .config and the QEMU command line perhaps?

2023-09-15 14:40:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation


* Ingo Molnar <[email protected]> wrote:

> > Ard Biesheuvel (15):
> > x86/efi: Drop EFI stub .bss from .data section
> > x86/efi: Disregard setup header of loaded image
> > x86/efi: Drop alignment flags from PE section headers
> > x86/boot: Remove the 'bugger off' message
> > x86/boot: Omit compression buffer from PE/COFF image memory footprint
> > x86/boot: Drop redundant code setting the root device
> > x86/boot: Grab kernel_info offset from zoffset header directly
> > x86/boot: Drop references to startup_64
>
> I've applied these first 8 patches to tip:x86/boot with minor edits.
> (Please preserve existing comment capitalization conventions ...)
>
> > x86/boot: Set EFI handover offset directly in header asm
> > x86/boot: Define setup size in linker script
> > x86/boot: Derive file size from _edata symbol
> > x86/boot: Construct PE/COFF .text section from assembler
> > x86/boot: Drop PE/COFF .reloc section
> > x86/boot: Split off PE/COFF .data section
> > x86/boot: Increase section and file alignment to 4k/512
>
> The rest conflicted with recent upstream changes, and I suppose it's
> prudent to test these changes bit by bit anyway.

So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
due to the 8th patch:

988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
commit 988b52b207a9fe74c3699bda8c2256714926b94b
Author: Ard Biesheuvel <[email protected]>
Date: Tue Sep 12 09:01:01 2023 +0000

x86/boot: Define setup size in linker script

I've removed it for now - but this side effect was not expected.

Thanks,

Ingo

Subject: [tip: x86/boot] x86/efi: Drop alignment flags from PE section headers

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: bfab35f552ab3dd6d017165bf9de1d1d20f198cc
Gitweb: https://git.kernel.org/tip/bfab35f552ab3dd6d017165bf9de1d1d20f198cc
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Tue, 12 Sep 2023 09:00:54
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 15 Sep 2023 11:18:41 +02:00

x86/efi: Drop alignment flags from PE section headers

The section header flags for alignment are documented in the PE/COFF
spec as being applicable to PE object files only, not to PE executables
such as the Linux bzImage, so let's drop them from the PE header.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/header.S | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b04ca8e..8c8148d 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -209,8 +209,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_CODE | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_EXECUTE | \
- IMAGE_SCN_ALIGN_16BYTES # Characteristics
+ IMAGE_SCN_MEM_EXECUTE # Characteristics

#
# The EFI application loader requires a relocation section
@@ -230,8 +229,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_INITIALIZED_DATA | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_DISCARDABLE | \
- IMAGE_SCN_ALIGN_1BYTES # Characteristics
+ IMAGE_SCN_MEM_DISCARDABLE # Characteristics

#ifdef CONFIG_EFI_MIXED
#
@@ -249,8 +247,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_INITIALIZED_DATA | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_DISCARDABLE | \
- IMAGE_SCN_ALIGN_1BYTES # Characteristics
+ IMAGE_SCN_MEM_DISCARDABLE # Characteristics
#endif

#
@@ -271,8 +268,7 @@ section_table:
.word 0 # NumberOfLineNumbers
.long IMAGE_SCN_CNT_CODE | \
IMAGE_SCN_MEM_READ | \
- IMAGE_SCN_MEM_EXECUTE | \
- IMAGE_SCN_ALIGN_16BYTES # Characteristics
+ IMAGE_SCN_MEM_EXECUTE # Characteristics

.set section_count, (. - section_table) / 40
#endif /* CONFIG_EFI_STUB */

2023-09-15 19:07:57

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Fri, Sep 15, 2023 at 1:31 PM Ingo Molnar <[email protected]> wrote:
>
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > Ard Biesheuvel (15):
> > > x86/efi: Drop EFI stub .bss from .data section
> > > x86/efi: Disregard setup header of loaded image
> > > x86/efi: Drop alignment flags from PE section headers
> > > x86/boot: Remove the 'bugger off' message
> > > x86/boot: Omit compression buffer from PE/COFF image memory footprint
> > > x86/boot: Drop redundant code setting the root device
> > > x86/boot: Grab kernel_info offset from zoffset header directly
> > > x86/boot: Drop references to startup_64
> >
> > I've applied these first 8 patches to tip:x86/boot with minor edits.

Thanks.

> > (Please preserve existing comment capitalization conventions ...)
> >

Ack

> > > x86/boot: Set EFI handover offset directly in header asm
> > > x86/boot: Define setup size in linker script
> > > x86/boot: Derive file size from _edata symbol
> > > x86/boot: Construct PE/COFF .text section from assembler
> > > x86/boot: Drop PE/COFF .reloc section
> > > x86/boot: Split off PE/COFF .data section
> > > x86/boot: Increase section and file alignment to 4k/512
> >
> > The rest conflicted with recent upstream changes, and I suppose it's
> > prudent to test these changes bit by bit anyway.
>

Agreed. So you mean this conflicts with other stuff queued up in -tip
already, right?

> So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
> due to the 8th patch:
>
> 988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
> commit 988b52b207a9fe74c3699bda8c2256714926b94b
> Author: Ard Biesheuvel <[email protected]>
> Date: Tue Sep 12 09:01:01 2023 +0000
>
> x86/boot: Define setup size in linker script
>
> I've removed it for now - but this side effect was not expected.
>

No, definitely not expected. I tested various combinations of i386 /
x86_64 built with GCC / Clang doing EFI or BIOS boot.

I'll rebase the remaining stuff onto -tip and see if I can reproduce this.

Subject: [tip: x86/boot] x86/boot: Omit compression buffer from PE/COFF image memory footprint

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 8eace5b3555606e684739bef5bcdfcfe68235257
Gitweb: https://git.kernel.org/tip/8eace5b3555606e684739bef5bcdfcfe68235257
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Tue, 12 Sep 2023 09:00:56
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 15 Sep 2023 11:18:41 +02:00

x86/boot: Omit compression buffer from PE/COFF image memory footprint

Now that the EFI stub decompresses the kernel and hands over to the
decompressed image directly, there is no longer a need to provide a
decompression buffer as part of the .BSS allocation of the PE/COFF
image. It also means the PE/COFF image can be loaded anywhere in memory,
and setting the preferred image base is unnecessary. So drop the
handling of this from the header and from the build tool.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/header.S | 6 +---
arch/x86/boot/tools/build.c | 50 ++++--------------------------------
2 files changed, 8 insertions(+), 48 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b24fa50..a87d913 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -90,12 +90,10 @@ optional_header:
#endif

extra_header_fields:
- # PE specification requires ImageBase to be 64k aligned
- .set image_base, (LOAD_PHYSICAL_ADDR + 0xffff) & ~0xffff
#ifdef CONFIG_X86_32
- .long image_base # ImageBase
+ .long 0 # ImageBase
#else
- .quad image_base # ImageBase
+ .quad 0 # ImageBase
#endif
.long 0x20 # SectionAlignment
.long 0x20 # FileAlignment
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index bd24769..0354c22 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -65,7 +65,6 @@ static unsigned long efi_pe_entry;
static unsigned long efi32_pe_entry;
static unsigned long kernel_info;
static unsigned long startup_64;
-static unsigned long _ehead;
static unsigned long _end;

/*----------------------------------------------------------------------*/
@@ -229,35 +228,22 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
#endif
}

-static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
- unsigned int init_sz)
+static void update_pecoff_text(unsigned int text_start, unsigned int file_sz)
{
unsigned int pe_header;
unsigned int text_sz = file_sz - text_start;
- unsigned int bss_sz = init_sz - file_sz;
+ unsigned int bss_sz = _end - text_sz;

pe_header = get_unaligned_le32(&buf[0x3c]);

/*
- * The PE/COFF loader may load the image at an address which is
- * misaligned with respect to the kernel_alignment field in the setup
- * header.
- *
- * In order to avoid relocating the kernel to correct the misalignment,
- * add slack to allow the buffer to be aligned within the declared size
- * of the image.
- */
- bss_sz += CONFIG_PHYSICAL_ALIGN;
- init_sz += CONFIG_PHYSICAL_ALIGN;
-
- /*
* Size of code: Subtract the size of the first sector (512 bytes)
* which includes the header.
*/
put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);

/* Size of image */
- put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+ put_unaligned_le32(file_sz + bss_sz, &buf[pe_header + 0x50]);

/*
* Address of entry point for PE/COFF executable
@@ -308,8 +294,7 @@ static void efi_stub_entry_update(void)

static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
static inline void update_pecoff_text(unsigned int text_start,
- unsigned int file_sz,
- unsigned int init_sz) {}
+ unsigned int file_sz) {}
static inline void efi_stub_defaults(void) {}
static inline void efi_stub_entry_update(void) {}

@@ -360,7 +345,6 @@ static void parse_zoffset(char *fname)
PARSE_ZOFS(p, efi32_pe_entry);
PARSE_ZOFS(p, kernel_info);
PARSE_ZOFS(p, startup_64);
- PARSE_ZOFS(p, _ehead);
PARSE_ZOFS(p, _end);

p = strchr(p, '\n');
@@ -371,7 +355,7 @@ static void parse_zoffset(char *fname)

int main(int argc, char ** argv)
{
- unsigned int i, sz, setup_sectors, init_sz;
+ unsigned int i, sz, setup_sectors;
int c;
u32 sys_size;
struct stat sb;
@@ -442,31 +426,9 @@ int main(int argc, char ** argv)
buf[0x1f1] = setup_sectors-1;
put_unaligned_le32(sys_size, &buf[0x1f4]);

- init_sz = get_unaligned_le32(&buf[0x260]);
-#ifdef CONFIG_EFI_STUB
- /*
- * The decompression buffer will start at ImageBase. When relocating
- * the compressed kernel to its end, we must ensure that the head
- * section does not get overwritten. The head section occupies
- * [i, i + _ehead), and the destination is [init_sz - _end, init_sz).
- *
- * At present these should never overlap, because 'i' is at most 32k
- * because of SETUP_SECT_MAX, '_ehead' is less than 1k, and the
- * calculation of INIT_SIZE in boot/header.S ensures that
- * 'init_sz - _end' is at least 64k.
- *
- * For future-proofing, increase init_sz if necessary.
- */
-
- if (init_sz - _end < i + _ehead) {
- init_sz = (i + _ehead + _end + 4095) & ~4095;
- put_unaligned_le32(init_sz, &buf[0x260]);
- }
-#endif
- update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
+ update_pecoff_text(setup_sectors * 512, i + (sys_size * 16));

efi_stub_entry_update();
-
/* Update kernel_info offset. */
put_unaligned_le32(kernel_info, &buf[0x268]);

Subject: [tip: x86/boot] x86/boot: Remove the 'bugger off' message

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 768171d7ebbce005210e1cf8456f043304805c15
Gitweb: https://git.kernel.org/tip/768171d7ebbce005210e1cf8456f043304805c15
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Tue, 12 Sep 2023 09:00:55
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 15 Sep 2023 11:18:41 +02:00

x86/boot: Remove the 'bugger off' message

Ancient (pre-2003) x86 kernels could boot from a floppy disk straight from
the BIOS, using a small real mode boot stub at the start of the image
where the BIOS would expect the boot record (or boot block) to appear.

Due to its limitations (kernel size < 1 MiB, no support for IDE, USB or
El Torito floppy emulation), this support was dropped, and a Linux aware
bootloader is now always required to boot the kernel from a legacy BIOS.

To smoothen this transition, the boot stub was not removed entirely, but
replaced with one that just prints an error message telling the user to
install a bootloader.

As it is unlikely that anyone doing direct floppy boot with such an
ancient kernel is going to upgrade to v6.5+ and expect that this boot
method still works, printing this message is kind of pointless, and so
it should be possible to remove the logic that emits it.

Let's free up this space so it can be used to expand the PE header in a
subsequent patch.

Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: H. Peter Anvin (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/header.S | 49 +-----------------------------------------
arch/x86/boot/setup.ld | 7 +++---
2 files changed, 4 insertions(+), 52 deletions(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 8c8148d..b24fa50 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -38,64 +38,15 @@ SYSSEG = 0x1000 /* historical load address >> 4 */

.code16
.section ".bstext", "ax"
-
- .global bootsect_start
-bootsect_start:
#ifdef CONFIG_EFI_STUB
# "MZ", MS-DOS header
.word MZ_MAGIC
-#endif
-
- # Normalize the start address
- ljmp $BOOTSEG, $start2
-
-start2:
- movw %cs, %ax
- movw %ax, %ds
- movw %ax, %es
- movw %ax, %ss
- xorw %sp, %sp
- sti
- cld
-
- movw $bugger_off_msg, %si
-
-msg_loop:
- lodsb
- andb %al, %al
- jz bs_die
- movb $0xe, %ah
- movw $7, %bx
- int $0x10
- jmp msg_loop
-
-bs_die:
- # Allow the user to press a key, then reboot
- xorw %ax, %ax
- int $0x16
- int $0x19
-
- # int 0x19 should never return. In case it does anyway,
- # invoke the BIOS reset code...
- ljmp $0xf000,$0xfff0
-
-#ifdef CONFIG_EFI_STUB
.org 0x38
#
# Offset to the PE header.
#
.long LINUX_PE_MAGIC
.long pe_header
-#endif /* CONFIG_EFI_STUB */
-
- .section ".bsdata", "a"
-bugger_off_msg:
- .ascii "Use a boot loader.\r\n"
- .ascii "\n"
- .ascii "Remove disk and press any key to reboot...\r\n"
- .byte 0
-
-#ifdef CONFIG_EFI_STUB
pe_header:
.long PE_MAGIC

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 49546c2..b11c45b 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -10,10 +10,11 @@ ENTRY(_start)
SECTIONS
{
. = 0;
- .bstext : { *(.bstext) }
- .bsdata : { *(.bsdata) }
+ .bstext : {
+ *(.bstext)
+ . = 495;
+ } =0xffffffff

- . = 495;
.header : { *(.header) }
.entrytext : { *(.entrytext) }
.inittext : { *(.inittext) }

2023-09-15 21:48:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation


* Ard Biesheuvel <[email protected]> wrote:

> From: Ard Biesheuvel <[email protected]>
>
> Now that the EFI stub boot flow no longer relies on memory that is
> executable and writable at the same time, we can reorganize the PE/COFF
> view of the kernel image and expose the decompressor binary's code and
> r/o data as a .text section and data/bss as a .data section, using 4k
> alignment and limited permissions.
>
> Doing so is necessary for compatibility with hardening measures that are
> being rolled out on x86 PCs built to run Windows (i.e., the majority of
> them). The EFI boot environment that the Linux EFI stub executes in is
> especially sensitive to safety issues, given that a vulnerability in the
> loader of one OS can be abused to attack another.
>
> In true x86 fashion, this is a lot more complicated than on other
> architectures, which have implemented this code/data split with 4k
> alignment from the beginning. The complicating factor here is that the
> boot image consists of two different parts, which are stitched together
> and fixed up using a special build tool.
>
> After this series is applied, the only remaining task performed by the
> build tool is generating the CRC-32. Even though this checksum is
> usually wrong (given that distro kernels are signed for secure boot in a
> way that corrupts the CRC), this feature is retained as we cannot be
> sure that nobody is relying on this.
>
> This supersedes the work proposed by Evgeniy last year, which did a
> major rewrite of the build tool in order to clean it up, before updating
> it to generate the new 4k aligned image layout. As this series proves,
> the build tool is mostly unnecessary, and we have too many of those
> already.
>
> Changes since v1:
> - drop patch that removed the CRC and the build tool
> - do not use fixed setup_size but derive it in the setup.ld linker
> script
> - reorganize the PE header so the .compat section only covers its
> payload and the padding that follows it
> - add hpa's ack to patch #4
>
> Cc: Evgeniy Baskov <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Jones <[email protected]>
> Cc: Matthew Garrett <[email protected]>
> Cc: Gerd Hoffmann <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
>
> Ard Biesheuvel (15):
> x86/efi: Drop EFI stub .bss from .data section
> x86/efi: Disregard setup header of loaded image
> x86/efi: Drop alignment flags from PE section headers
> x86/boot: Remove the 'bugger off' message
> x86/boot: Omit compression buffer from PE/COFF image memory footprint
> x86/boot: Drop redundant code setting the root device
> x86/boot: Grab kernel_info offset from zoffset header directly
> x86/boot: Drop references to startup_64

I've applied these first 8 patches to tip:x86/boot with minor edits.
(Please preserve existing comment capitalization conventions ...)

> x86/boot: Set EFI handover offset directly in header asm
> x86/boot: Define setup size in linker script
> x86/boot: Derive file size from _edata symbol
> x86/boot: Construct PE/COFF .text section from assembler
> x86/boot: Drop PE/COFF .reloc section
> x86/boot: Split off PE/COFF .data section
> x86/boot: Increase section and file alignment to 4k/512

The rest conflicted with recent upstream changes, and I suppose it's
prudent to test these changes bit by bit anyway.

Thanks,

Ingo

2023-09-16 13:50:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation


* Ard Biesheuvel <[email protected]> wrote:

> > > So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
> > > due to the 8th patch:
> > >
> > > 988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
> > > commit 988b52b207a9fe74c3699bda8c2256714926b94b
> > > Author: Ard Biesheuvel <[email protected]>
> > > Date: Tue Sep 12 09:01:01 2023 +0000
> > >
> > > x86/boot: Define setup size in linker script
> > >
> > > I've removed it for now - but this side effect was not expected.
> > >
> >
> > No, definitely not expected. I tested various combinations of i386 /
> > x86_64 built with GCC / Clang doing EFI or BIOS boot.
> >
> > I'll rebase the remaining stuff onto -tip and see if I can reproduce this.
>
> This is actually quite bizarre. x86_64_defconfig has
> CONFIG_EFI_MIXED=y and i tested that this change produces the exact
> same bzImage binary in that case.
>
> Could you send me the .config and the QEMU command line perhaps?

So the patch below is the delta between v2 and v3 - that is expected
to fix the bzImage boot crash, right?

Thanks,

Ingo

--- tip.orig/arch/x86/boot/setup.ld
+++ tip/arch/x86/boot/setup.ld
@@ -41,7 +41,7 @@ SECTIONS
LONG(0x5a5aaa55)

/* Reserve some extra space for the reloc and compat sections */
- setup_size = ABSOLUTE(ALIGN(. + 64, 512));
+ setup_size = ALIGN(ABSOLUTE(.) + 64, 512);
setup_sects = ABSOLUTE(setup_size / 512);
}


2023-09-16 19:15:29

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Sat, 16 Sept 2023 at 11:11, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > > > So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
> > > > due to the 8th patch:
> > > >
> > > > 988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
> > > > commit 988b52b207a9fe74c3699bda8c2256714926b94b
> > > > Author: Ard Biesheuvel <[email protected]>
> > > > Date: Tue Sep 12 09:01:01 2023 +0000
> > > >
> > > > x86/boot: Define setup size in linker script
> > > >
> > > > I've removed it for now - but this side effect was not expected.
> > > >
> > >
> > > No, definitely not expected. I tested various combinations of i386 /
> > > x86_64 built with GCC / Clang doing EFI or BIOS boot.
> > >
> > > I'll rebase the remaining stuff onto -tip and see if I can reproduce this.
> >
> > This is actually quite bizarre. x86_64_defconfig has
> > CONFIG_EFI_MIXED=y and i tested that this change produces the exact
> > same bzImage binary in that case.
> >
> > Could you send me the .config and the QEMU command line perhaps?
>
> So the patch below is the delta between v2 and v3 - that is expected
> to fix the bzImage boot crash, right?
>

Yes.

ld.bfd does something unexpected [to me] here, and the resulting value
turns out not to be a multiple of 512 at all.

With this tweak, my claim that this patch does not affect the binary
bzImage at all actually holds for ld.bfd as well (provided that
CONFIG_EFI_MIXED=y and CONFIG_LOCAL_VERSION_AUTO is disabled)

So if this still does not work in your test, could you please disable
CONFIG_LOCAL_VERSION_AUTO and compare the md5sums of the two builds?

Thanks,


> --- tip.orig/arch/x86/boot/setup.ld
> +++ tip/arch/x86/boot/setup.ld
> @@ -41,7 +41,7 @@ SECTIONS
> LONG(0x5a5aaa55)
>
> /* Reserve some extra space for the reloc and compat sections */
> - setup_size = ABSOLUTE(ALIGN(. + 64, 512));
> + setup_size = ALIGN(ABSOLUTE(.) + 64, 512);
> setup_sects = ABSOLUTE(setup_size / 512);
> }
>
>

2023-09-17 23:21:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation


* Ard Biesheuvel <[email protected]> wrote:

> On Sat, 16 Sept 2023 at 11:11, Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Ard Biesheuvel <[email protected]> wrote:
> >
> > > > > So, the first 8 patches broke the x86-64-defconfig-ish Qemu bzImage bootup,
> > > > > due to the 8th patch:
> > > > >
> > > > > 988b52b207a9fe74c3699bda8c2256714926b94b is the first bad commit
> > > > > commit 988b52b207a9fe74c3699bda8c2256714926b94b
> > > > > Author: Ard Biesheuvel <[email protected]>
> > > > > Date: Tue Sep 12 09:01:01 2023 +0000
> > > > >
> > > > > x86/boot: Define setup size in linker script
> > > > >
> > > > > I've removed it for now - but this side effect was not expected.
> > > > >
> > > >
> > > > No, definitely not expected. I tested various combinations of i386 /
> > > > x86_64 built with GCC / Clang doing EFI or BIOS boot.
> > > >
> > > > I'll rebase the remaining stuff onto -tip and see if I can reproduce this.
> > >
> > > This is actually quite bizarre. x86_64_defconfig has
> > > CONFIG_EFI_MIXED=y and i tested that this change produces the exact
> > > same bzImage binary in that case.
> > >
> > > Could you send me the .config and the QEMU command line perhaps?
> >
> > So the patch below is the delta between v2 and v3 - that is expected
> > to fix the bzImage boot crash, right?
> >
>
> Yes.
>
> ld.bfd does something unexpected [to me] here, and the resulting value
> turns out not to be a multiple of 512 at all.
>
> With this tweak, my claim that this patch does not affect the binary
> bzImage at all actually holds for ld.bfd as well (provided that
> CONFIG_EFI_MIXED=y and CONFIG_LOCAL_VERSION_AUTO is disabled)

Ok - it boots & works fine for me too now, with the full series applied.

Thanks,

Ingo

2023-10-03 02:03:28

by Jan Hendrik Farr

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On 12 09:00:51, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <[email protected]>
>
> Now that the EFI stub boot flow no longer relies on memory that is
> executable and writable at the same time, we can reorganize the PE/COFF
> view of the kernel image and expose the decompressor binary's code and
> r/o data as a .text section and data/bss as a .data section, using 4k
> alignment and limited permissions.
>
> Doing so is necessary for compatibility with hardening measures that are
> being rolled out on x86 PCs built to run Windows (i.e., the majority of
> them). The EFI boot environment that the Linux EFI stub executes in is
> especially sensitive to safety issues, given that a vulnerability in the
> loader of one OS can be abused to attack another.

This split is also useful for the work of kexecing the next kernel as an
EFI application. With the current EFI stub I have to set the memory both
writable and executable which results in W^X warnings with a default
config.

What made this more confusing was that the flags of the .text section in
current EFI stub bzImages are set to
IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ. So if you load that section
according to those flags the EFI stub will quickly run into issues.

I assume current firmware on x86 machines does not set any restricted
permissions on the memory. Can someone enlighten me on their behavior?


> In true x86 fashion, this is a lot more complicated than on other
> architectures, which have implemented this code/data split with 4k
> alignment from the beginning. The complicating factor here is that the
> boot image consists of two different parts, which are stitched together
> and fixed up using a special build tool.
>
> After this series is applied, the only remaining task performed by the
> build tool is generating the CRC-32. Even though this checksum is
> usually wrong (given that distro kernels are signed for secure boot in a
> way that corrupts the CRC), this feature is retained as we cannot be
> sure that nobody is relying on this.
>
> This supersedes the work proposed by Evgeniy last year, which did a
> major rewrite of the build tool in order to clean it up, before updating
> it to generate the new 4k aligned image layout. As this series proves,
> the build tool is mostly unnecessary, and we have too many of those
> already.
>
> Changes since v1:
> - drop patch that removed the CRC and the build tool
> - do not use fixed setup_size but derive it in the setup.ld linker
> script
> - reorganize the PE header so the .compat section only covers its
> payload and the padding that follows it
> - add hpa's ack to patch #4
>
> Cc: Evgeniy Baskov <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Peter Jones <[email protected]>
> Cc: Matthew Garrett <[email protected]>
> Cc: Gerd Hoffmann <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
>
> Ard Biesheuvel (15):
> x86/efi: Drop EFI stub .bss from .data section
> x86/efi: Disregard setup header of loaded image
> x86/efi: Drop alignment flags from PE section headers
> x86/boot: Remove the 'bugger off' message
> x86/boot: Omit compression buffer from PE/COFF image memory footprint
> x86/boot: Drop redundant code setting the root device
> x86/boot: Grab kernel_info offset from zoffset header directly
> x86/boot: Drop references to startup_64
> x86/boot: Set EFI handover offset directly in header asm
> x86/boot: Define setup size in linker script
> x86/boot: Derive file size from _edata symbol
> x86/boot: Construct PE/COFF .text section from assembler
> x86/boot: Drop PE/COFF .reloc section
> x86/boot: Split off PE/COFF .data section
> x86/boot: Increase section and file alignment to 4k/512
>
> arch/x86/boot/Makefile | 2 +-
> arch/x86/boot/compressed/vmlinux.lds.S | 6 +-
> arch/x86/boot/header.S | 213 ++++++---------
> arch/x86/boot/setup.ld | 14 +-
> arch/x86/boot/tools/build.c | 273 +-------------------
> drivers/firmware/efi/libstub/Makefile | 7 -
> drivers/firmware/efi/libstub/x86-stub.c | 46 +---
> 7 files changed, 114 insertions(+), 447 deletions(-)
>
> --
> 2.42.0.283.g2d96d420d3-goog
>

2023-10-23 11:23:46

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Tue, 3 Oct 2023 at 04:03, Jan Hendrik Farr <[email protected]> wrote:
>
> On 12 09:00:51, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <[email protected]>
> >
> > Now that the EFI stub boot flow no longer relies on memory that is
> > executable and writable at the same time, we can reorganize the PE/COFF
> > view of the kernel image and expose the decompressor binary's code and
> > r/o data as a .text section and data/bss as a .data section, using 4k
> > alignment and limited permissions.
> >
> > Doing so is necessary for compatibility with hardening measures that are
> > being rolled out on x86 PCs built to run Windows (i.e., the majority of
> > them). The EFI boot environment that the Linux EFI stub executes in is
> > especially sensitive to safety issues, given that a vulnerability in the
> > loader of one OS can be abused to attack another.
>
> This split is also useful for the work of kexecing the next kernel as an
> EFI application. With the current EFI stub I have to set the memory both
> writable and executable which results in W^X warnings with a default
> config.
>
> What made this more confusing was that the flags of the .text section in
> current EFI stub bzImages are set to
> IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ. So if you load that section
> according to those flags the EFI stub will quickly run into issues.
>
> I assume current firmware on x86 machines does not set any restricted
> permissions on the memory. Can someone enlighten me on their behavior?
>

No current x86 firmware does not use restricted permissions at all.
All memory is mapped with both writable and executable permissions,
except maybe the stack.

The x86 Linux kernel has been depending on this behavior too, up until
recently (fixes are in -rc now for the v6.6 release). Before this, it
would copy its own executable image around in memory.

So EFI based kexec will need to support this behavior if it targets
older x86 kernels, although I am skeptical that this is a useful
design goal.

I have been experimenting with running the EFI stub code in user space
all the way until ExitBootServices(). The same might work for UKI if
it is layered cleanly on top of the EFI APIs (rather than poking into
system registers or page tables under the hood).

How this would work with signed images etc is TBD but I quite like the
idea of running everything in user space and having a minimal
purgatory (or none at all) if we can simply populate the entire
address space while running unprivileged, and just branch to it in the
kexec() syscall. I imagine this being something like a userspace
helper that is signed/trusted itself, and gets invoked by the kernel
to run EFI images that are trusted and tagged as being executable
unprivileged.

2023-10-23 17:36:02

by Jan Hendrik Farr

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On 23 13:22:53, Ard Biesheuvel wrote:
> On Tue, 3 Oct 2023 at 04:03, Jan Hendrik Farr <[email protected]> wrote:
> >
> > On 12 09:00:51, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <[email protected]>
> > >
> > > Now that the EFI stub boot flow no longer relies on memory that is
> > > executable and writable at the same time, we can reorganize the PE/COFF
> > > view of the kernel image and expose the decompressor binary's code and
> > > r/o data as a .text section and data/bss as a .data section, using 4k
> > > alignment and limited permissions.
> > >
> > > Doing so is necessary for compatibility with hardening measures that are
> > > being rolled out on x86 PCs built to run Windows (i.e., the majority of
> > > them). The EFI boot environment that the Linux EFI stub executes in is
> > > especially sensitive to safety issues, given that a vulnerability in the
> > > loader of one OS can be abused to attack another.
> >
> > This split is also useful for the work of kexecing the next kernel as an
> > EFI application. With the current EFI stub I have to set the memory both
> > writable and executable which results in W^X warnings with a default
> > config.
> >
> > What made this more confusing was that the flags of the .text section in
> > current EFI stub bzImages are set to
> > IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ. So if you load that section
> > according to those flags the EFI stub will quickly run into issues.
> >
> > I assume current firmware on x86 machines does not set any restricted
> > permissions on the memory. Can someone enlighten me on their behavior?
> >
>
> No current x86 firmware does not use restricted permissions at all.
> All memory is mapped with both writable and executable permissions,
> except maybe the stack.
>
> The x86 Linux kernel has been depending on this behavior too, up until
> recently (fixes are in -rc now for the v6.6 release). Before this, it
> would copy its own executable image around in memory.
>
> So EFI based kexec will need to support this behavior if it targets
> older x86 kernels, although I am skeptical that this is a useful
> design goal.

I don't see this as an important goal either.

> I have been experimenting with running the EFI stub code in user space
> all the way until ExitBootServices(). The same might work for UKI if
> it is layered cleanly on top of the EFI APIs (rather than poking into
> system registers or page tables under the hood).
>
> How this would work with signed images etc is TBD but I quite like the
> idea of running everything in user space and having a minimal
> purgatory (or none at all) if we can simply populate the entire
> address space while running unprivileged, and just branch to it in the
> kexec() syscall. I imagine this being something like a userspace
> helper that is signed/trusted itself, and gets invoked by the kernel
> to run EFI images that are trusted and tagged as being executable
> unprivileged.

I've been experimenting with running EFI apps inside kernel space instead.
This is the more natural approach for signed images. Sure, a malicious EFI
app could do arbitrary stuff in kernel mode, but they're signed. On the other
hand running this entirely in user space would at least guarantee that the
system can not crash due to a misbehaving EFI app (at least until
ExitBootServices()).

The question of whether or not to make this the job of a userspace helper that
is signed must have come up when kexec_file_load syscall was added. It would
have also been an option at the time to extend trust to a signed version of
the userspace kexec tool.

Why was kexec_file_load created instead of restricting kexec_load to a signed
version of the kexec userspace tool?

2023-10-24 08:23:05

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Tue, 24 Oct 2023 at 01:37, Jan Hendrik Farr <[email protected]> wrote:
>
> On 23 13:22:53, Ard Biesheuvel wrote:
> > On Tue, 3 Oct 2023 at 04:03, Jan Hendrik Farr <[email protected]> wrote:
> > >
> > > On 12 09:00:51, Ard Biesheuvel wrote:
> > > > From: Ard Biesheuvel <[email protected]>
> > > >
> > > > Now that the EFI stub boot flow no longer relies on memory that is
> > > > executable and writable at the same time, we can reorganize the PE/COFF
> > > > view of the kernel image and expose the decompressor binary's code and
> > > > r/o data as a .text section and data/bss as a .data section, using 4k
> > > > alignment and limited permissions.
> > > >
> > > > Doing so is necessary for compatibility with hardening measures that are
> > > > being rolled out on x86 PCs built to run Windows (i.e., the majority of
> > > > them). The EFI boot environment that the Linux EFI stub executes in is
> > > > especially sensitive to safety issues, given that a vulnerability in the
> > > > loader of one OS can be abused to attack another.
> > >
> > > This split is also useful for the work of kexecing the next kernel as an
> > > EFI application. With the current EFI stub I have to set the memory both
> > > writable and executable which results in W^X warnings with a default
> > > config.
> > >
> > > What made this more confusing was that the flags of the .text section in
> > > current EFI stub bzImages are set to
> > > IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ. So if you load that section
> > > according to those flags the EFI stub will quickly run into issues.
> > >
> > > I assume current firmware on x86 machines does not set any restricted
> > > permissions on the memory. Can someone enlighten me on their behavior?
> > >
> >
> > No current x86 firmware does not use restricted permissions at all.
> > All memory is mapped with both writable and executable permissions,
> > except maybe the stack.
> >
> > The x86 Linux kernel has been depending on this behavior too, up until
> > recently (fixes are in -rc now for the v6.6 release). Before this, it
> > would copy its own executable image around in memory.
> >
> > So EFI based kexec will need to support this behavior if it targets
> > older x86 kernels, although I am skeptical that this is a useful
> > design goal.
>
> I don't see this as an important goal either.
>
> > I have been experimenting with running the EFI stub code in user space
> > all the way until ExitBootServices(). The same might work for UKI if
> > it is layered cleanly on top of the EFI APIs (rather than poking into
> > system registers or page tables under the hood).
> >
> > How this would work with signed images etc is TBD but I quite like the
> > idea of running everything in user space and having a minimal
> > purgatory (or none at all) if we can simply populate the entire
> > address space while running unprivileged, and just branch to it in the
> > kexec() syscall. I imagine this being something like a userspace
> > helper that is signed/trusted itself, and gets invoked by the kernel
> > to run EFI images that are trusted and tagged as being executable
> > unprivileged.
>
> I've been experimenting with running EFI apps inside kernel space instead.
> This is the more natural approach for signed images. Sure, a malicious EFI
> app could do arbitrary stuff in kernel mode, but they're signed. On the other
> hand running this entirely in user space would at least guarantee that the
> system can not crash due to a misbehaving EFI app (at least until
> ExitBootServices()).
>
> The question of whether or not to make this the job of a userspace helper that
> is signed must have come up when kexec_file_load syscall was added. It would
> have also been an option at the time to extend trust to a signed version of
> the userspace kexec tool.
>
> Why was kexec_file_load created instead of restricting kexec_load to a signed
> version of the kexec userspace tool?

I think one of the reasons is that it is hard to handle dynamic linked
libraries, not only the kexec-tools binary.

Thanks
Dave

2023-10-24 08:32:56

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH v2 00/15] x86/boot: Rework PE header generation

On Tue, 24 Oct 2023 at 16:21, Dave Young <[email protected]> wrote:
>
> On Tue, 24 Oct 2023 at 01:37, Jan Hendrik Farr <[email protected]> wrote:
> >
> > On 23 13:22:53, Ard Biesheuvel wrote:
> > > On Tue, 3 Oct 2023 at 04:03, Jan Hendrik Farr <[email protected]> wrote:
> > > >
> > > > On 12 09:00:51, Ard Biesheuvel wrote:
> > > > > From: Ard Biesheuvel <[email protected]>
> > > > >
> > > > > Now that the EFI stub boot flow no longer relies on memory that is
> > > > > executable and writable at the same time, we can reorganize the PE/COFF
> > > > > view of the kernel image and expose the decompressor binary's code and
> > > > > r/o data as a .text section and data/bss as a .data section, using 4k
> > > > > alignment and limited permissions.
> > > > >
> > > > > Doing so is necessary for compatibility with hardening measures that are
> > > > > being rolled out on x86 PCs built to run Windows (i.e., the majority of
> > > > > them). The EFI boot environment that the Linux EFI stub executes in is
> > > > > especially sensitive to safety issues, given that a vulnerability in the
> > > > > loader of one OS can be abused to attack another.
> > > >
> > > > This split is also useful for the work of kexecing the next kernel as an
> > > > EFI application. With the current EFI stub I have to set the memory both
> > > > writable and executable which results in W^X warnings with a default
> > > > config.
> > > >
> > > > What made this more confusing was that the flags of the .text section in
> > > > current EFI stub bzImages are set to
> > > > IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ. So if you load that section
> > > > according to those flags the EFI stub will quickly run into issues.
> > > >
> > > > I assume current firmware on x86 machines does not set any restricted
> > > > permissions on the memory. Can someone enlighten me on their behavior?
> > > >
> > >
> > > No current x86 firmware does not use restricted permissions at all.
> > > All memory is mapped with both writable and executable permissions,
> > > except maybe the stack.
> > >
> > > The x86 Linux kernel has been depending on this behavior too, up until
> > > recently (fixes are in -rc now for the v6.6 release). Before this, it
> > > would copy its own executable image around in memory.
> > >
> > > So EFI based kexec will need to support this behavior if it targets
> > > older x86 kernels, although I am skeptical that this is a useful
> > > design goal.
> >
> > I don't see this as an important goal either.
> >
> > > I have been experimenting with running the EFI stub code in user space
> > > all the way until ExitBootServices(). The same might work for UKI if
> > > it is layered cleanly on top of the EFI APIs (rather than poking into
> > > system registers or page tables under the hood).
> > >
> > > How this would work with signed images etc is TBD but I quite like the
> > > idea of running everything in user space and having a minimal
> > > purgatory (or none at all) if we can simply populate the entire
> > > address space while running unprivileged, and just branch to it in the
> > > kexec() syscall. I imagine this being something like a userspace
> > > helper that is signed/trusted itself, and gets invoked by the kernel
> > > to run EFI images that are trusted and tagged as being executable
> > > unprivileged.
> >
> > I've been experimenting with running EFI apps inside kernel space instead.
> > This is the more natural approach for signed images. Sure, a malicious EFI
> > app could do arbitrary stuff in kernel mode, but they're signed. On the other
> > hand running this entirely in user space would at least guarantee that the
> > system can not crash due to a misbehaving EFI app (at least until
> > ExitBootServices()).
> >
> > The question of whether or not to make this the job of a userspace helper that
> > is signed must have come up when kexec_file_load syscall was added. It would
> > have also been an option at the time to extend trust to a signed version of
> > the userspace kexec tool.
> >
> > Why was kexec_file_load created instead of restricting kexec_load to a signed
> > version of the kexec userspace tool?
>
> I think one of the reasons is that it is hard to handle dynamic linked
> libraries, not only the kexec-tools binary.

Hmm, another one is that ptrace needs to be disabled in some way,
anyway I think it is way too complicated, but I do not remember the
details, added Vivek in cc.
See this article: https://lwn.net/Articles/532778/

>
> Thanks
> Dave