This patchset aims
* to improve UEFI compatibility of compressed kernel code for x86_64,
* to set up proper memory access attributes for code and rodata sections,
* to implement a W^X protection policy throughout the whole execution
of the compressed kernel for the EFISTUB code path.
The kernel is made more compatible with the PE image specification [3],
allowing it to be successfully loaded by stricter PE loader
implementations like the one from [2]. There is at least one
known implementation that uses that loader in production [4].
There are also ongoing efforts to upstream these changes.
The patchset also adds support for EFI_MEMORY_ATTRIBUTE_PROTOCOL, included
in the UEFI specification since version 2.10, as a better alternative to
using DXE services for manipulating memory protection attributes,
since it is defined by the UEFI specification itself and not by the UEFI PI
specification. This protocol is not widely available yet, so the code
using DXE services is kept in place as a fallback in case a specific
implementation does not support the new protocol.
One EFI implementation that already supports
EFI_MEMORY_ATTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
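The "new protocol first, DXE services as a fallback" order can be pictured
with the following minimal sketch. It is not the libstub code: the status
codes, the set_attrs_fn type and the mock backends are hypothetical names
made up for illustration, and it assumes that a failed protocol call is
treated the same way as a missing protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes and backend type; not the real libstub API. */
enum status { STATUS_OK, STATUS_UNSUPPORTED };

typedef enum status (*set_attrs_fn)(unsigned long start, unsigned long size,
				    bool writable, bool executable);

/*
 * Try the preferred backend first (standing in for the memory attribute
 * protocol) and fall back to the second one (standing in for DXE
 * services) when the first is absent or does not succeed.
 */
static enum status adjust_protection(set_attrs_fn protocol, set_attrs_fn dxe,
				     unsigned long start, unsigned long size,
				     bool writable, bool executable)
{
	/* W^X: a region must never be writable and executable at once. */
	assert(!(writable && executable));

	if (protocol &&
	    protocol(start, size, writable, executable) == STATUS_OK)
		return STATUS_OK;

	if (dxe)
		return dxe(start, size, writable, executable);

	return STATUS_UNSUPPORTED;
}

/* Mock backends, used only to exercise the fallback order. */
static enum status backend_ok(unsigned long start, unsigned long size,
			      bool writable, bool executable)
{
	(void)start; (void)size; (void)writable; (void)executable;
	return STATUS_OK;
}

static enum status backend_unsupported(unsigned long start, unsigned long size,
				       bool writable, bool executable)
{
	(void)start; (void)size; (void)writable; (void)executable;
	return STATUS_UNSUPPORTED;
}
```

Passing backend_unsupported as the first backend and backend_ok as the
second models firmware that lacks the new protocol but offers DXE services.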
The kernel image generation tool (tools/build.c) is refactored as a part
of the changes that make the PE image more compatible.
The patchset implements memory protection for compressed kernel
code while executing both inside EFI boot services and outside of
them. For the EFISTUB code path, the W^X protection policy is maintained
throughout the whole execution of the compressed kernel. The latter
is achieved by extracting the kernel directly from the EFI environment
and jumping to its head immediately after exiting EFI boot services.
As a side effect of this change, one page table rebuild and one copy of
the kernel image are removed.
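As a reminder of what W^X means in practice, here is a small sketch of a
check that rejects any segment that is both writable and executable. It is
only an illustration of the policy, not the build-time check from this
series; the SEG_* names and both helpers are hypothetical, while the bit
values follow the ELF program header flag encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Permission bits as encoded in ELF program header flags. */
#define SEG_X 0x1u
#define SEG_W 0x2u
#define SEG_R 0x4u

/* A segment violates W^X when it is writable and executable at once. */
static bool segment_violates_wx(uint32_t flags)
{
	return (flags & SEG_W) && (flags & SEG_X);
}

/* Return the index of the first offending segment, or -1 if there is none. */
static int first_wx_violation(const uint32_t *seg_flags, size_t count)
{
	size_t i;

	assert(seg_flags != NULL || count == 0);

	for (i = 0; i < count; i++)
		if (segment_violates_wx(seg_flags[i]))
			return (int)i;

	return -1;
}
```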
Memory protection inside the EFI environment is controlled by the
CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
option also controls the use of EFI_MEMORY_ATTRIBUTE_PROTOCOL and the memory
protection attributes of PE sections, and not only DXE services as the
name might suggest.
Changes in v2:
* Fix spelling.
* Rebase code to current master.
* Split huge patches into smaller ones.
* Remove unneeded forward declarations.
* Make direct extraction unconditional.
* Also make it work for x86_32.
* Reduce lower limit of KASLR to 64M.
* Make callback interface more logically consistent.
* Actually declare callbacks structure before using it.
* Mention effect on x86_32 in commit message of
"x86/build: Remove RWX sections and align on 4KB".
* Clarify commit message of
"x86/boot: Increase boot page table size".
* Remove "startup32_" prefix on startup32_enable_nx_if_supported.
* Move linker generated sections outside of function scope.
* Drop some unintended changes.
* Drop generating 2 reloc entries.
(as I've misread the documentation and there's no need for this change.)
* Set has_nx from enable_nx_if_supported correctly.
* Move ELF header check to build time.
* Set WP at the same time as PG in trampoline code,
as it is more logically consistent.
* Put x86-specific EFISTUB definitions in x86-stub.h header.
* Catch presence of ELF segments violating W^X during build.
* Move PE definitions from build.c to a new header file.
* Fix generation of PE '.compat' section.
I decided to keep protection of compressed kernel blob and '.rodata'
separate from '.text' for now, since it does not really have a lot
of overhead.
Otherwise, all comments on v1 seem to be addressed.
Changes in v3:
* Set up IDT before issuing cpuid so that AMD SEV #VC handler is set.
* Replace memcpy with strncpy to prevent out-of-bounds reads in tools/build.c.
* Zero BSS before entering efi_main(), since it can contain garbage
when booting via EFI handover protocol.
* When booting via EFI don't require init_size of RAM, since in-place
unpacking is not used anyway with that interface. This saves ~40M of memory
for debian .config.
* Set up section memory protection in efi_main() to cover the EFI handover
protocol, where EFI sections are likely not properly protected.
Changes in v4:
* Add one missing identity mapping.
* Include following patches improving the use of DXE services:
- efi/x86: don't try to set page attributes on 0-sized regions.
- efi/x86: don't set unsupported memory attributes
Patch "x86/boot: Support 4KB pages for identity mapping" needs review
from x86/mm team.
I have also included Peter's patches [6-8] into the series for simplicity.
Many thanks to Ard Biesheuvel <[email protected]> and
Andrew Cooper <[email protected]> for reviewing the patches, and to
Peter Jones <[email protected]>, Mario Limonciello <[email protected]> and
Joey Lee <[email protected]> for additional testing!
[1] https://lkml.org/lkml/2022/8/1/1314
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[4] https://www.ispras.ru/en/technologies/asperitas/
[5] https://github.com/microsoft/mu_tiano_platforms
[6] https://lkml.org/lkml/2022/10/18/1178
[7] https://lkml.org/lkml/2022/12/13/840
[8] https://lkml.org/lkml/2022/12/13/841
Evgeniy Baskov (23):
x86/boot: Align vmlinuz sections on page size
x86/build: Remove RWX sections and align on 4KB
x86/boot: Set cr0 to known state in trampoline
x86/boot: Increase boot page table size
x86/boot: Support 4KB pages for identity mapping
x86/boot: Setup memory protection for bzImage code
x86/build: Check W^X of vmlinux during build
x86/boot: Map memory explicitly
x86/boot: Remove mapping from page fault handler
efi/libstub: Move helper function to related file
x86/boot: Make console interface more abstract
x86/boot: Make kernel_add_identity_map() a pointer
x86/boot: Split trampoline and pt init code
x86/boot: Add EFI kernel extraction interface
efi/x86: Support extracting kernel from libstub
x86/boot: Reduce lower limit of physical KASLR
x86/boot: Reduce size of the DOS stub
tools/include: Add simplified version of pe.h
x86/build: Cleanup tools/build.c
x86/build: Make generated PE more spec compliant
efi/x86: Explicitly set sections memory attributes
efi/libstub: Add memory attribute protocol definitions
efi/libstub: Use memory attribute protocol
Peter Jones (3):
efi/libstub: make memory protection warnings include newlines.
efi/x86: don't try to set page attributes on 0-sized regions.
efi/x86: don't set unsupported memory attributes
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 8 +-
arch/x86/boot/compressed/acpi.c | 25 +-
arch/x86/boot/compressed/efi.c | 19 +-
arch/x86/boot/compressed/head_32.S | 53 +-
arch/x86/boot/compressed/head_64.S | 89 ++-
arch/x86/boot/compressed/ident_map_64.c | 122 ++--
arch/x86/boot/compressed/kaslr.c | 8 +-
arch/x86/boot/compressed/misc.c | 278 ++++-----
arch/x86/boot/compressed/misc.h | 23 +-
arch/x86/boot/compressed/pgtable.h | 20 -
arch/x86/boot/compressed/pgtable_64.c | 75 ++-
arch/x86/boot/compressed/putstr.c | 130 ++++
arch/x86/boot/compressed/sev.c | 6 +-
arch/x86/boot/compressed/vmlinux.lds.S | 6 +
arch/x86/boot/header.S | 110 +---
arch/x86/boot/tools/build.c | 569 +++++++++++-------
arch/x86/include/asm/boot.h | 26 +-
arch/x86/include/asm/efi.h | 7 +
arch/x86/include/asm/init.h | 1 +
arch/x86/include/asm/shared/extract.h | 26 +
arch/x86/include/asm/shared/pgtable.h | 29 +
arch/x86/kernel/vmlinux.lds.S | 15 +-
arch/x86/mm/ident_map.c | 185 +++++-
drivers/firmware/efi/Kconfig | 2 +
drivers/firmware/efi/libstub/Makefile | 2 +-
drivers/firmware/efi/libstub/efistub.h | 26 +
drivers/firmware/efi/libstub/mem.c | 194 ++++++
.../firmware/efi/libstub/x86-extract-direct.c | 208 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 231 ++-----
drivers/firmware/efi/libstub/x86-stub.h | 14 +
include/linux/efi.h | 1 +
tools/include/linux/pe.h | 150 +++++
33 files changed, 1860 insertions(+), 800 deletions(-)
delete mode 100644 arch/x86/boot/compressed/pgtable.h
create mode 100644 arch/x86/boot/compressed/putstr.c
create mode 100644 arch/x86/include/asm/shared/extract.h
create mode 100644 arch/x86/include/asm/shared/pgtable.h
create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
create mode 100644 tools/include/linux/pe.h
--
2.37.4
Convert kernel_add_identity_map() into a function pointer so that
alternative implementations of this function can be provided. This is
required so that code using this function can be called from the EFI
environment.
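The idea can be shown in isolation with the sketch below: callers go
through a pointer, and each environment installs its own implementation.
All names here (add_identity_map, map_range, the counting variant) are
hypothetical stand-ins and not the identifiers used by the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long (*add_identity_map_fn)(unsigned long start,
					     unsigned long end,
					     unsigned int flags);

/* Selected once per environment before any caller uses it. */
static add_identity_map_fn add_identity_map;

/* Counts calls so that the active implementation can be observed. */
static unsigned long mapped_calls;

/* Stand-in for an environment without paging: nothing to map. */
static unsigned long add_identity_map_dummy(unsigned long start,
					    unsigned long end,
					    unsigned int flags)
{
	(void)end; (void)flags;
	return start;
}

/* Stand-in for an environment that really has to map the range. */
static unsigned long add_identity_map_counting(unsigned long start,
					       unsigned long end,
					       unsigned int flags)
{
	(void)end; (void)flags;
	mapped_calls++;
	return start;
}

/* Callers do not know which implementation is behind the pointer. */
static unsigned long map_range(unsigned long start, unsigned long size)
{
	assert(add_identity_map != NULL);
	return add_identity_map(start, start + size, 0);
}
```

The cost of this design is that the pointer must be assigned before its
first use, which is why the sketch asserts on it.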
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 7 ++++---
arch/x86/boot/compressed/misc.c | 24 ++++++++++++++++++++++++
arch/x86/boot/compressed/misc.h | 15 +++------------
3 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index ba5108c58a4e..1aee524d3c2b 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -92,9 +92,9 @@ bool has_nx; /* set in head_64.S */
/*
* Adds the specified range to the identity mappings.
*/
-unsigned long kernel_add_identity_map(unsigned long start,
- unsigned long end,
- unsigned int flags)
+unsigned long kernel_add_identity_map_(unsigned long start,
+ unsigned long end,
+ unsigned int flags)
{
int ret;
@@ -142,6 +142,7 @@ void initialize_identity_maps(void *rmode)
struct setup_data *sd;
boot_params = rmode;
+ kernel_add_identity_map = kernel_add_identity_map_;
/* Exclude the encryption mask from __PHYSICAL_MASK */
physical_mask &= ~sme_me_mask;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index aa4a22bc9cf9..c9c235d65d16 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -275,6 +275,22 @@ static void parse_elf(void *output, unsigned long output_len,
free(phdrs);
}
+/*
+ * This points to actual implementation of mapping function
+ * for current environment: either EFI API wrapper,
+ * own implementation or dummy implementation below.
+ */
+unsigned long (*kernel_add_identity_map)(unsigned long start,
+ unsigned long end,
+ unsigned int flags);
+
+static inline unsigned long kernel_add_identity_map_dummy(unsigned long start,
+ unsigned long end,
+ unsigned int flags)
+{
+ return start;
+}
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -312,6 +328,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
init_default_io_ops();
+ /*
+ * On 64-bit this pointer is set during page table initialization,
+ * but on 32-bit it remains uninitialized, since paging is disabled.
+ */
+ if (IS_ENABLED(CONFIG_X86_32))
+ kernel_add_identity_map = kernel_add_identity_map_dummy;
+
+
/*
* Detect TDX guest environment.
*
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 38d31bec062d..0076b2845b4b 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -180,18 +180,9 @@ static inline int count_immovable_mem_regions(void) { return 0; }
#ifdef CONFIG_X86_5LEVEL
extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
#endif
-#ifdef CONFIG_X86_64
-extern unsigned long kernel_add_identity_map(unsigned long start,
- unsigned long end,
- unsigned int flags);
-#else
-static inline unsigned long kernel_add_identity_map(unsigned long start,
- unsigned long end,
- unsigned int flags)
-{
- return start;
-}
-#endif
+extern unsigned long (*kernel_add_identity_map)(unsigned long start,
+ unsigned long end,
+ unsigned int flags);
/* Used by PAGE_KERN* macros: */
extern pteval_t __default_kernel_pte_mask;
--
2.37.4
Current identity mapping code only supports 2M and 1G pages.
4KB pages are desirable for better memory protection granularity
in compressed kernel code.
Change identity mapping code to support 4KB pages and
memory remapping with different attributes.
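The choice between a 2M mapping and 4KB mappings can be sketched as
follows. The function only counts how many mappings of each size a range
would need; it mirrors the alignment test used in the diff below but is
otherwise a simplified model, with hypothetical SKETCH_* constants and a
hypothetical count_mappings() helper, and it does not touch page tables.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 0x1000UL
#define SKETCH_PMD_SIZE  0x200000UL
#define SKETCH_PMD_MASK  (~(SKETCH_PMD_SIZE - 1))

struct map_counts {
	unsigned long large_pages;	/* 2M mappings */
	unsigned long small_pages;	/* 4KB mappings */
};

static struct map_counts count_mappings(unsigned long addr, unsigned long end,
					bool allow_4kpages)
{
	struct map_counts counts = { 0, 0 };
	unsigned long next, first, last;

	assert(addr <= end);

	for (; addr < end; addr = next) {
		/* Walk the range one 2M-aligned chunk at a time. */
		next = (addr & SKETCH_PMD_MASK) + SKETCH_PMD_SIZE;
		if (next > end)
			next = end;

		/*
		 * Use a 2M page when 4KB pages are not allowed, or when
		 * the chunk is aligned and covers a full 2M.
		 */
		if (!allow_4kpages ||
		    (!(addr & ~SKETCH_PMD_MASK) &&
		     next == addr + SKETCH_PMD_SIZE)) {
			counts.large_pages++;
			continue;
		}

		/* Partial chunk: cover it with 4KB pages. */
		first = addr & ~(SKETCH_PAGE_SIZE - 1);
		last = (next + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
		counts.small_pages += (last - first) / SKETCH_PAGE_SIZE;
	}

	return counts;
}
```

For example, a range that starts one page below a 2M boundary and ends one
page above the next boundary needs one 2M mapping and two 4KB mappings when
4KB pages are allowed, and three 2M mappings (covering more than requested)
when they are not.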
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/include/asm/init.h | 1 +
arch/x86/mm/ident_map.c | 185 +++++++++++++++++++++++++++++-------
2 files changed, 154 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 5f1d3c421f68..a8277ee82c51 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -8,6 +8,7 @@ struct x86_mapping_info {
unsigned long page_flag; /* page flag for PMD or PUD entry */
unsigned long offset; /* ident mapping offset */
bool direct_gbpages; /* PUD level 1GB page support */
+ bool allow_4kpages; /* Allow more granular mappings with 4K pages */
unsigned long kernpg_flag; /* kernel pagetable flag override */
};
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..662e794a325d 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -4,24 +4,127 @@
* included by both the compressed kernel and the regular kernel.
*/
-static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
- unsigned long addr, unsigned long end)
+static void ident_pte_init(struct x86_mapping_info *info, pte_t *pte_page,
+ unsigned long addr, unsigned long end,
+ unsigned long flags)
{
- addr &= PMD_MASK;
- for (; addr < end; addr += PMD_SIZE) {
+ addr &= PAGE_MASK;
+ for (; addr < end; addr += PAGE_SIZE) {
+ pte_t *pte = pte_page + pte_index(addr);
+
+ set_pte(pte, __pte((addr - info->offset) | flags));
+ }
+}
+
+pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
+ pmd_t *pmdp, unsigned long page_addr)
+{
+ unsigned long pmd_addr, page_flags;
+ pte_t *pte;
+
+ pte = (pte_t *)info->alloc_pgt_page(info->context);
+ if (!pte)
+ return NULL;
+
+ pmd_addr = page_addr & PMD_MASK;
+
+ /* Not a large page - clear PSE flag */
+ page_flags = pmd_flags(*pmdp) & ~_PSE;
+ ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, page_flags);
+
+ return pte;
+}
+
+static int ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
+ unsigned long addr, unsigned long end,
+ unsigned long flags)
+{
+ unsigned long next;
+ bool new_table = 0;
+
+ for (; addr < end; addr = next) {
pmd_t *pmd = pmd_page + pmd_index(addr);
+ pte_t *pte;
- if (pmd_present(*pmd))
+ next = (addr & PMD_MASK) + PMD_SIZE;
+ if (next > end)
+ next = end;
+
+ /*
+ * Use 2M pages if 4k pages are not allowed or
+ * we are not mapping extra, i.e. address and size are aligned.
+ */
+
+ if (!info->allow_4kpages ||
+ (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE)) {
+
+ pmd_t pmdval;
+
+ addr &= PMD_MASK;
+ pmdval = __pmd((addr - info->offset) | flags | _PSE);
+ set_pmd(pmd, pmdval);
continue;
+ }
+
+ /*
+ * If currently mapped page is large, we need to split it.
+ * The case where we can remap a 2M page to a 2M page
+ * with different flags is already covered above.
+ *
+ * If there's nothing mapped to desired address,
+ * we need to allocate new page table.
+ */
- set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
+ if (pmd_large(*pmd)) {
+ pte = ident_split_large_pmd(info, pmd, addr);
+ new_table = 1;
+ } else if (!pmd_present(*pmd)) {
+ pte = (pte_t *)info->alloc_pgt_page(info->context);
+ new_table = 1;
+ } else {
+ pte = pte_offset_kernel(pmd, 0);
+ new_table = 0;
+ }
+
+ if (!pte)
+ return -ENOMEM;
+
+ ident_pte_init(info, pte, addr, next, flags);
+
+ if (new_table)
+ set_pmd(pmd, __pmd(__pa(pte) | info->kernpg_flag));
}
+
+ return 0;
}
+
+pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
+ pud_t *pudp, unsigned long page_addr)
+{
+ unsigned long pud_addr, page_flags;
+ pmd_t *pmd;
+
+ pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+ if (!pmd)
+ return NULL;
+
+ pud_addr = page_addr & PUD_MASK;
+
+ /* Not a large page - clear PSE flag */
+ page_flags = pud_flags(*pudp) & ~_PSE;
+ ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, page_flags);
+
+ return pmd;
+}
+
+
static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
unsigned long addr, unsigned long end)
{
unsigned long next;
+ bool new_table = 0;
+ int result;
for (; addr < end; addr = next) {
pud_t *pud = pud_page + pud_index(addr);
@@ -31,28 +134,39 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
if (next > end)
next = end;
+ /* Use 1G pages only if forced, even if they are supported. */
if (info->direct_gbpages) {
pud_t pudval;
-
- if (pud_present(*pud))
- continue;
+ unsigned long flags;
addr &= PUD_MASK;
- pudval = __pud((addr - info->offset) | info->page_flag);
+ flags = info->page_flag | _PSE;
+ pudval = __pud((addr - info->offset) | flags);
+
set_pud(pud, pudval);
continue;
}
- if (pud_present(*pud)) {
+ if (pud_large(*pud)) {
+ pmd = ident_split_large_pud(info, pud, addr);
+ new_table = 1;
+ } else if (!pud_present(*pud)) {
+ pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+ new_table = 1;
+ } else {
pmd = pmd_offset(pud, 0);
- ident_pmd_init(info, pmd, addr, next);
- continue;
+ new_table = 0;
}
- pmd = (pmd_t *)info->alloc_pgt_page(info->context);
+
if (!pmd)
return -ENOMEM;
- ident_pmd_init(info, pmd, addr, next);
- set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
+
+ result = ident_pmd_init(info, pmd, addr, next, info->page_flag);
+ if (result)
+ return result;
+
+ if (new_table)
+ set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
}
return 0;
@@ -63,6 +177,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
{
unsigned long next;
int result;
+ bool new_table = 0;
for (; addr < end; addr = next) {
p4d_t *p4d = p4d_page + p4d_index(addr);
@@ -72,15 +187,14 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
if (next > end)
next = end;
- if (p4d_present(*p4d)) {
+ if (!p4d_present(*p4d)) {
+ pud = (pud_t *)info->alloc_pgt_page(info->context);
+ new_table = 1;
+ } else {
pud = pud_offset(p4d, 0);
- result = ident_pud_init(info, pud, addr, next);
- if (result)
- return result;
-
- continue;
+ new_table = 0;
}
- pud = (pud_t *)info->alloc_pgt_page(info->context);
+
if (!pud)
return -ENOMEM;
@@ -88,19 +202,22 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
if (result)
return result;
- set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
+ if (new_table)
+ set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
}
return 0;
}
-int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
- unsigned long pstart, unsigned long pend)
+int kernel_ident_mapping_init(struct x86_mapping_info *info,
+ pgd_t *pgd_page, unsigned long pstart,
+ unsigned long pend)
{
unsigned long addr = pstart + info->offset;
unsigned long end = pend + info->offset;
unsigned long next;
int result;
+ bool new_table;
/* Set the default pagetable flags if not supplied */
if (!info->kernpg_flag)
@@ -117,20 +234,24 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
if (next > end)
next = end;
- if (pgd_present(*pgd)) {
+ if (!pgd_present(*pgd)) {
+ p4d = (p4d_t *)info->alloc_pgt_page(info->context);
+ new_table = 1;
+ } else {
p4d = p4d_offset(pgd, 0);
- result = ident_p4d_init(info, p4d, addr, next);
- if (result)
- return result;
- continue;
+ new_table = 0;
}
- p4d = (p4d_t *)info->alloc_pgt_page(info->context);
if (!p4d)
return -ENOMEM;
+
result = ident_p4d_init(info, p4d, addr, next);
if (result)
return result;
+
+ if (!new_table)
+ continue;
+
if (pgtable_l5_enabled()) {
set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
} else {
--
2.37.4
The previous upper limit ignored pages implicitly mapped from the #PF
handler by code accessing ACPI tables (boot/compressed/{acpi.c,efi.c}),
so the theoretical upper limit is higher than the value that was set.
Using 4KB pages is desirable for better memory protection granularity.
Approximately twice as much memory is required for those.
Increase the initial page table size to 64 4KB page tables.
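The worst-case estimate given in the boot.h comment of this patch can be
reproduced with a few lines of arithmetic. The helper name is
hypothetical; the terms are taken from that comment, with S being the
number of sections in the vmlinux ELF image.

```c
#include <assert.h>

/*
 * Worst-case number of page table pages:
 *   1           for level4
 *   (3+3)*2     for param and cmd_line
 *   (2+2+S)*2   for kernel and randomized kernel
 *   3           for the first 2M (video RAM)
 */
static unsigned int boot_pgt_pages(unsigned int sections)
{
	unsigned int level4 = 1;
	unsigned int param_cmdline = (3 + 3) * 2;
	unsigned int kernels = (2 + 2 + sections) * 2;
	unsigned int first_2m = 3;

	assert(sections > 0);

	return level4 + param_cmdline + kernels + first_2m;
}
```

With S = 6 this gives 1 + 12 + 20 + 3 = 36 pages, which leaves 28 of the 64
pages for the UEFI memory map and ACPI table mappings whose exact needs are
not yet known.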
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/include/asm/boot.h | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..024d972c248e 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -41,22 +41,24 @@
# define BOOT_STACK_SIZE 0x4000
# define BOOT_INIT_PGT_SIZE (6*4096)
-# ifdef CONFIG_RANDOMIZE_BASE
/*
* Assuming all cross the 512GB boundary:
* 1 page for level4
- * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
- * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
- * Total is 19 pages.
+ * (3+3)*2 pages for param and cmd_line
+ * (2+2+S)*2 pages for kernel and randomized kernel, where S is total number
+ * of sections of kernel. Explanation: 2+2 are upper level page tables.
+ * We can have only S unaligned parts of section: 1 at the end of the kernel
+ * and (S-1) at the section borders. The start address of the kernel is
+ * aligned, so an extra page table. There are at most S=6 sections in
+ * vmlinux ELF image.
+ * 3 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
+ * Total is 36 pages.
+ *
+ * Some pages are also required for UEFI memory map and
+ * ACPI table mappings, so we need to add extra space.
+ * FIXME: Figure out exact amount of pages.
*/
-# ifdef CONFIG_X86_VERBOSE_BOOTUP
-# define BOOT_PGT_SIZE (19*4096)
-# else /* !CONFIG_X86_VERBOSE_BOOTUP */
-# define BOOT_PGT_SIZE (17*4096)
-# endif
-# else /* !CONFIG_RANDOMIZE_BASE */
-# define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
-# endif
+# define BOOT_PGT_SIZE (64*4096)
#else /* !CONFIG_X86_64 */
# define BOOT_STACK_SIZE 0x1000
--
2.37.4
From: Peter Jones <[email protected]>
In "efi/x86: Explicitly set sections memory attributes", the following
region is defined to help compute page permissions:
/* .setup [image_base, _head] */
efi_adjust_memory_range_protection(image_base,
(unsigned long)_head - image_base,
EFI_MEMORY_RO | EFI_MEMORY_XP);
In at least some cases, that will result in a size of 0, which will
produce an error and a message on the console, though no actual failure
will be caused in the boot process.
This patch checks that case in efi_adjust_memory_range_protection() and
returns the error without logging.
Signed-off-by: Peter Jones <[email protected]>
---
drivers/firmware/efi/libstub/mem.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index b31d1975caa2..50a0b649b75a 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -249,6 +249,9 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
efi_physical_addr_t rounded_start, rounded_end;
unsigned long attr_clear;
+ if (size == 0)
+ return EFI_INVALID_PARAMETER;
+
/*
* This function should not be used to modify attributes
* other than writable/executable.
--
2.37.4
To be able to extract the kernel from EFI, console output functions
need to be replaceable by alternative implementations.
Turn all of those functions into function pointers.
Move the serial console code to a separate file.
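A minimal sketch of such a replaceable console follows. It installs an
output routine that appends to a buffer instead of writing to video memory
or a serial port, and prints a hex value through the pointer using a digit
loop like the one in puthex(). The sketch_* names, init_console() and the
capture buffer are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Output routine currently in use; set by init_console(). */
static void (*sketch_putstr)(const char *s);

/* Alternative backend: collect the output in memory. */
static char capture[64];

static void putstr_capture(const char *s)
{
	size_t used = strlen(capture);

	snprintf(capture + used, sizeof(capture) - used, "%s", s);
}

static void init_console(void (*putstr_)(const char *))
{
	assert(putstr_ != NULL);
	sketch_putstr = putstr_;
}

/* Print every hex digit of value, most significant digit first. */
static void sketch_puthex(unsigned long value)
{
	char alpha[2] = "0";
	int bits;

	for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
		unsigned long digit = (value >> bits) & 0xf;

		if (digit < 0xA)
			alpha[0] = '0' + digit;
		else
			alpha[0] = 'a' + (digit - 0xA);

		sketch_putstr(alpha);
	}
}
```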
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 109 +------------------------
arch/x86/boot/compressed/misc.h | 9 ++-
arch/x86/boot/compressed/putstr.c | 130 ++++++++++++++++++++++++++++++
4 files changed, 139 insertions(+), 111 deletions(-)
create mode 100644 arch/x86/boot/compressed/putstr.c
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 4dcab38f5a38..4b1524446875 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -93,7 +93,7 @@ $(obj)/misc.o: $(obj)/../voffset.h
vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o \
$(obj)/misc.o $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
- $(obj)/piggy.o $(obj)/cpuflags.o
+ $(obj)/piggy.o $(obj)/cpuflags.o $(obj)/putstr.o
vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 0c7ec290044d..aa4a22bc9cf9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -53,13 +53,6 @@ struct port_io_ops pio_ops;
memptr free_mem_ptr;
memptr free_mem_end_ptr;
-static char *vidmem;
-static int vidport;
-
-/* These might be accessed before .bss is cleared, so use .data instead. */
-static int lines __section(".data");
-static int cols __section(".data");
-
#ifdef CONFIG_KERNEL_GZIP
#include "../../../../lib/decompress_inflate.c"
#endif
@@ -92,95 +85,6 @@ static int cols __section(".data");
* ../header.S.
*/
-static void scroll(void)
-{
- int i;
-
- memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
- for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
- vidmem[i] = ' ';
-}
-
-#define XMTRDY 0x20
-
-#define TXR 0 /* Transmit register (WRITE) */
-#define LSR 5 /* Line Status */
-static void serial_putchar(int ch)
-{
- unsigned timeout = 0xffff;
-
- while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
- cpu_relax();
-
- outb(ch, early_serial_base + TXR);
-}
-
-void __putstr(const char *s)
-{
- int x, y, pos;
- char c;
-
- if (early_serial_base) {
- const char *str = s;
- while (*str) {
- if (*str == '\n')
- serial_putchar('\r');
- serial_putchar(*str++);
- }
- }
-
- if (lines == 0 || cols == 0)
- return;
-
- x = boot_params->screen_info.orig_x;
- y = boot_params->screen_info.orig_y;
-
- while ((c = *s++) != '\0') {
- if (c == '\n') {
- x = 0;
- if (++y >= lines) {
- scroll();
- y--;
- }
- } else {
- vidmem[(x + cols * y) * 2] = c;
- if (++x >= cols) {
- x = 0;
- if (++y >= lines) {
- scroll();
- y--;
- }
- }
- }
- }
-
- boot_params->screen_info.orig_x = x;
- boot_params->screen_info.orig_y = y;
-
- pos = (x + cols * y) * 2; /* Update cursor position */
- outb(14, vidport);
- outb(0xff & (pos >> 9), vidport+1);
- outb(15, vidport);
- outb(0xff & (pos >> 1), vidport+1);
-}
-
-void __puthex(unsigned long value)
-{
- char alpha[2] = "0";
- int bits;
-
- for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
- unsigned long digit = (value >> bits) & 0xf;
-
- if (digit < 0xA)
- alpha[0] = '0' + digit;
- else
- alpha[0] = 'a' + (digit - 0xA);
-
- __putstr(alpha);
- }
-}
-
#ifdef CONFIG_X86_NEED_RELOCS
static void handle_relocations(void *output, unsigned long output_len,
unsigned long virt_addr)
@@ -406,17 +310,6 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
sanitize_boot_params(boot_params);
- if (boot_params->screen_info.orig_video_mode == 7) {
- vidmem = (char *) 0xb0000;
- vidport = 0x3b4;
- } else {
- vidmem = (char *) 0xb8000;
- vidport = 0x3d4;
- }
-
- lines = boot_params->screen_info.orig_video_lines;
- cols = boot_params->screen_info.orig_video_cols;
-
init_default_io_ops();
/*
@@ -427,7 +320,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
*/
early_tdx_detect();
- console_init();
+ init_bare_console();
/*
* Save RSDP address for later use. Have this after console_init()
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 033db9b536e6..38d31bec062d 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -57,8 +57,8 @@ extern memptr free_mem_end_ptr;
void *malloc(int size);
void free(void *where);
extern struct boot_params *boot_params;
-void __putstr(const char *s);
-void __puthex(unsigned long value);
+extern void (*__putstr)(const char *s);
+extern void (*__puthex)(unsigned long value);
#define error_putstr(__x) __putstr(__x)
#define error_puthex(__x) __puthex(__x)
@@ -128,6 +128,11 @@ static inline void console_init(void)
{ }
#endif
+/* putstr.c */
+void init_bare_console(void);
+void init_console_func(void (*putstr_)(const char *),
+ void (*puthex_)(unsigned long));
+
#ifdef CONFIG_AMD_MEM_ENCRYPT
void sev_enable(struct boot_params *bp);
void sev_es_shutdown_ghcb(void);
diff --git a/arch/x86/boot/compressed/putstr.c b/arch/x86/boot/compressed/putstr.c
new file mode 100644
index 000000000000..44a4c3dacec5
--- /dev/null
+++ b/arch/x86/boot/compressed/putstr.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "misc.h"
+
+/* These might be accessed before .bss is cleared, so use .data instead. */
+static char *vidmem __section(".data");
+static int vidport __section(".data");
+static int lines __section(".data");
+static int cols __section(".data");
+
+void (*__putstr)(const char *s);
+void (*__puthex)(unsigned long value);
+
+static void putstr(const char *s);
+static void puthex(unsigned long value);
+
+void init_console_func(void (*putstr_)(const char *),
+ void (*puthex_)(unsigned long))
+{
+ __putstr = putstr_;
+ __puthex = puthex_;
+}
+
+void init_bare_console(void)
+{
+ init_console_func(putstr, puthex);
+
+ if (boot_params->screen_info.orig_video_mode == 7) {
+ vidmem = (char *) 0xb0000;
+ vidport = 0x3b4;
+ } else {
+ vidmem = (char *) 0xb8000;
+ vidport = 0x3d4;
+ }
+
+ lines = boot_params->screen_info.orig_video_lines;
+ cols = boot_params->screen_info.orig_video_cols;
+
+ console_init();
+}
+
+static void scroll(void)
+{
+ int i;
+
+ memmove(vidmem, vidmem + cols * 2, (lines - 1) * cols * 2);
+ for (i = (lines - 1) * cols * 2; i < lines * cols * 2; i += 2)
+ vidmem[i] = ' ';
+}
+
+#define XMTRDY 0x20
+
+#define TXR 0 /* Transmit register (WRITE) */
+#define LSR 5 /* Line Status */
+
+static void serial_putchar(int ch)
+{
+ unsigned int timeout = 0xffff;
+
+ while ((inb(early_serial_base + LSR) & XMTRDY) == 0 && --timeout)
+ cpu_relax();
+
+ outb(ch, early_serial_base + TXR);
+}
+
+static void putstr(const char *s)
+{
+ int x, y, pos;
+ char c;
+
+ if (early_serial_base) {
+ const char *str = s;
+
+ while (*str) {
+ if (*str == '\n')
+ serial_putchar('\r');
+ serial_putchar(*str++);
+ }
+ }
+
+ if (lines == 0 || cols == 0)
+ return;
+
+ x = boot_params->screen_info.orig_x;
+ y = boot_params->screen_info.orig_y;
+
+ while ((c = *s++) != '\0') {
+ if (c == '\n') {
+ x = 0;
+ if (++y >= lines) {
+ scroll();
+ y--;
+ }
+ } else {
+ vidmem[(x + cols * y) * 2] = c;
+ if (++x >= cols) {
+ x = 0;
+ if (++y >= lines) {
+ scroll();
+ y--;
+ }
+ }
+ }
+ }
+
+ boot_params->screen_info.orig_x = x;
+ boot_params->screen_info.orig_y = y;
+
+ pos = (x + cols * y) * 2; /* Update cursor position */
+ outb(14, vidport);
+ outb(0xff & (pos >> 9), vidport+1);
+ outb(15, vidport);
+ outb(0xff & (pos >> 1), vidport+1);
+}
+
+static void puthex(unsigned long value)
+{
+ char alpha[2] = "0";
+ int bits;
+
+ for (bits = sizeof(value) * 8 - 4; bits >= 0; bits -= 4) {
+ unsigned long digit = (value >> bits) & 0xf;
+
+ if (digit < 0xA)
+ alpha[0] = '0' + digit;
+ else
+ alpha[0] = 'a' + (digit - 0xA);
+
+ putstr(alpha);
+ }
+}
--
2.37.4
This is required to fit more sections into the PE section table,
since its size is restricted by the zero page located at a specific offset
after the PE header.
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/header.S | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9338c68e7413..9fec80bc504b 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -59,17 +59,16 @@ start2:
cld
movw $bugger_off_msg, %si
+ movw $bugger_off_msg_size, %cx
msg_loop:
lodsb
- andb %al, %al
- jz bs_die
movb $0xe, %ah
movw $7, %bx
int $0x10
- jmp msg_loop
+ decw %cx
+ jnz msg_loop
-bs_die:
# Allow the user to press a key, then reboot
xorw %ax, %ax
int $0x16
@@ -90,10 +89,9 @@ bs_die:
.section ".bsdata", "a"
bugger_off_msg:
- .ascii "Use a boot loader.\r\n"
- .ascii "\n"
- .ascii "Remove disk and press any key to reboot...\r\n"
- .byte 0
+ .ascii "Use a boot loader. "
+ .ascii "Press a key to reboot"
+ .set bugger_off_msg_size, . - bugger_off_msg
#ifdef CONFIG_EFI_STUB
pe_header:
--
2.37.4
Ensure that the CR0.WP bit is set to prevent boot code from writing to
non-writable memory pages.
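To illustrate the difference between the old and the new sequence, here is a minimal C sketch. The bit positions are the architectural CR0 bits; the composition of CR0_KNOWN_STATE is an assumption meant to mirror the kernel's CR0_STATE and should be checked against the tree. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bit masks. */
#define CR0_PE (UINT32_C(1) << 0)	/* protection enable */
#define CR0_MP (UINT32_C(1) << 1)	/* monitor coprocessor */
#define CR0_ET (UINT32_C(1) << 4)	/* extension type */
#define CR0_NE (UINT32_C(1) << 5)	/* numeric error */
#define CR0_WP (UINT32_C(1) << 16)	/* write protect */
#define CR0_AM (UINT32_C(1) << 18)	/* alignment mask */
#define CR0_PG (UINT32_C(1) << 31)	/* paging */

/* Assumption: same composition as the kernel's CR0_STATE. */
#define CR0_KNOWN_STATE \
	(CR0_PE | CR0_MP | CR0_ET | CR0_NE | CR0_WP | CR0_AM | CR0_PG)

static_assert((CR0_KNOWN_STATE & CR0_WP) != 0, "known state must set WP");

/* Old sequence: keep the inherited CR0 value and only set PG. */
static uint32_t cr0_enable_paging_only(uint32_t cr0)
{
	return cr0 | CR0_PG;
}

/* New sequence: discard the inherited value and load a fixed one. */
static uint32_t cr0_load_known_state(uint32_t cr0)
{
	(void)cr0;
	return CR0_KNOWN_STATE;
}

/* Hypothetical helper: report whether supervisor writes honour R/O pages. */
static int cr0_write_protect_enabled(uint32_t cr0)
{
	return (cr0 & CR0_WP) != 0;
}
```

With an inherited CR0 that has WP clear, the old sequence leaves it clear, while the new one always ends up with WP set.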
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a75712991df3..9f2e8f50fc71 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -660,9 +660,8 @@ SYM_CODE_START(trampoline_32bit_src)
pushl $__KERNEL_CS
pushl %eax
- /* Enable paging again. */
- movl %cr0, %eax
- btsl $X86_CR0_PG_BIT, %eax
+ /* Enable paging and set CR0 to known state (this also sets WP flag) */
+ movl $CR0_STATE, %eax
movl %eax, %cr0
lret
--
2.37.4
Use a newer C standard. Since the kernel requires a C99 compiler now,
we can make use of the new features to make the code more readable.
Also use mmap() for reading files to make things simpler.
Replace most magic numbers with defines.
There should be no functional changes. This is done in preparation for
the next changes, which make the generated PE header more spec compliant.
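The size arithmetic that the new defines replace can be sketched as follows. This is a stand-alone illustration, not code from the patch: round_up_pow2(), padded_kernel_size() and syssize_paragraphs() are hypothetical helper names, and CRC_SIZE names the 4 bytes the tool reserves for the checksum.

```c
#include <assert.h>
#include <stddef.h>

#define PARAGRAPH_SIZE	16
#define SECTOR_SIZE	512
#define CRC_SIZE	4

/* Same rounding as the round_up() macro; n must be a power of two. */
static size_t round_up_pow2(size_t x, size_t n)
{
	assert(n != 0 && (n & (n - 1)) == 0);
	return (x + n - 1) & ~(n - 1);
}

/*
 * Payload plus trailing CRC, padded to the sector size when an EFI stub
 * is built and to a 16-byte paragraph otherwise.
 */
static size_t padded_kernel_size(size_t file_size, int efi_stub)
{
	size_t align = efi_stub ? SECTOR_SIZE : PARAGRAPH_SIZE;

	return round_up_pow2(file_size + CRC_SIZE, align);
}

/* Value for the syssize header field: size in 16-byte paragraphs. */
static size_t syssize_paragraphs(size_t padded_size)
{
	return padded_size / PARAGRAPH_SIZE;
}
```

For a 1000-byte payload this gives 1024 bytes with the EFI stub and 1008 bytes without it. Both are multiples of 16, so the division for syssize is exact.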
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/tools/build.c | 387 +++++++++++++++++++++++-------------
1 file changed, 245 insertions(+), 142 deletions(-)
diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index bd247692b701..fbc5315af032 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -25,20 +25,21 @@
* Substantially overhauled by H. Peter Anvin, April 2007
*/
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdint.h>
#include <stdio.h>
-#include <string.h>
#include <stdlib.h>
-#include <stdarg.h>
-#include <sys/types.h>
+#include <string.h>
+#include <sys/mman.h>
#include <sys/stat.h>
+#include <sys/types.h>
#include <unistd.h>
-#include <fcntl.h>
-#include <sys/mman.h>
+
#include <tools/le_byteshift.h>
+#include <linux/pe.h>
-typedef unsigned char u8;
-typedef unsigned short u16;
-typedef unsigned int u32;
+#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
#define DEFAULT_MAJOR_ROOT 0
#define DEFAULT_MINOR_ROOT 0
@@ -48,8 +49,13 @@ typedef unsigned int u32;
#define SETUP_SECT_MIN 5
#define SETUP_SECT_MAX 64
+#define PARAGRAPH_SIZE 16
+#define SECTOR_SIZE 512
+#define FILE_ALIGNMENT 512
+#define SECTION_ALIGNMENT 4096
+
/* This must be large enough to hold the entire setup */
-u8 buf[SETUP_SECT_MAX*512];
+uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
#define PECOFF_RELOC_RESERVE 0x20
@@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
#define PECOFF_COMPAT_RESERVE 0x0
#endif
+#define RELOC_SECTION_SIZE 10
+
+/* PE header has different format depending on the architecture */
+#ifdef CONFIG_X86_64
+typedef struct pe32plus_opt_hdr pe_opt_hdr;
+#else
+typedef struct pe32_opt_hdr pe_opt_hdr;
+#endif
+
+static inline struct pe_hdr *get_pe_header(uint8_t *buf)
+{
+ uint32_t pe_offset = get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
+ return (struct pe_hdr *)(buf + pe_offset);
+}
+
+static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
+{
+ return (pe_opt_hdr *)(get_pe_header(buf) + 1);
+}
+
+static inline struct section_header *get_sections(uint8_t *buf)
+{
+ pe_opt_hdr *hdr = get_pe_opt_header(buf);
+ uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
+ uint8_t *sections = (uint8_t *)(hdr + 1) + n_data_dirs*sizeof(struct data_dirent);
+ return (struct section_header *)sections;
+}
+
+static inline struct data_directory *get_data_dirs(uint8_t *buf)
+{
+ pe_opt_hdr *hdr = get_pe_opt_header(buf);
+ return (struct data_directory *)(hdr + 1);
+}
+
+#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
+#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
+#else
+/* With memory protection disabled all sections are RWX */
+#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
+ IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
+#define SCN_RX SCN_RW
+#define SCN_RO SCN_RW
+#endif
+
static unsigned long efi32_stub_entry;
static unsigned long efi64_stub_entry;
static unsigned long efi_pe_entry;
@@ -70,7 +122,7 @@ static unsigned long _end;
/*----------------------------------------------------------------------*/
-static const u32 crctab32[] = {
+static const uint32_t crctab32[] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
@@ -125,12 +177,12 @@ static const u32 crctab32[] = {
0x2d02ef8d
};
-static u32 partial_crc32_one(u8 c, u32 crc)
+static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
{
return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
}
-static u32 partial_crc32(const u8 *s, int len, u32 crc)
+static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t crc)
{
while (len--)
crc = partial_crc32_one(*s++, crc);
@@ -152,57 +204,106 @@ static void usage(void)
die("Usage: build setup system zoffset.h image");
}
+static void *map_file(const char *path, size_t *psize)
+{
+ struct stat statbuf;
+ size_t size;
+ void *addr;
+ int fd;
+
+ fd = open(path, O_RDONLY);
+ if (fd < 0)
+ die("Unable to open `%s': %m", path);
+ if (fstat(fd, &statbuf))
+ die("Unable to stat `%s': %m", path);
+
+ size = statbuf.st_size;
+ /*
+ * Map one byte more, to allow adding null-terminator
+ * for text files.
+ */
+ addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+ if (addr == MAP_FAILED)
+ die("Unable to mmap '%s': %m", path);
+
+ close(fd);
+
+ *psize = size;
+ return addr;
+}
+
+static void unmap_file(void *addr, size_t size)
+{
+ munmap(addr, size + 1);
+}
+
+static void *map_output_file(const char *path, size_t size)
+{
+ void *addr;
+ int fd;
+
+ fd = open(path, O_RDWR | O_CREAT, 0660);
+ if (fd < 0)
+ die("Unable to create `%s': %m", path);
+
+ if (ftruncate(fd, size))
+ die("Unable to resize `%s': %m", path);
+
+ addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+ if (addr == MAP_FAILED)
+ die("Unable to mmap '%s': %m", path);
+
+ return addr;
+}
+
#ifdef CONFIG_EFI_STUB
-static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset)
+static void update_pecoff_section_header_fields(char *section_name, uint32_t vma,
+ uint32_t size, uint32_t datasz,
+ uint32_t offset)
{
unsigned int pe_header;
unsigned short num_sections;
- u8 *section;
+ struct section_header *section;
- pe_header = get_unaligned_le32(&buf[0x3c]);
- num_sections = get_unaligned_le16(&buf[pe_header + 6]);
-
-#ifdef CONFIG_X86_32
- section = &buf[pe_header + 0xa8];
-#else
- section = &buf[pe_header + 0xb8];
-#endif
+ struct pe_hdr *hdr = get_pe_header(buf);
+ num_sections = get_unaligned_le16(&hdr->sections);
+ section = get_sections(buf);
while (num_sections > 0) {
- if (strncmp((char*)section, section_name, 8) == 0) {
+ if (strncmp(section->name, section_name, 8) == 0) {
/* section header size field */
- put_unaligned_le32(size, section + 0x8);
+ put_unaligned_le32(size, &section->virtual_size);
/* section header vma field */
- put_unaligned_le32(vma, section + 0xc);
+ put_unaligned_le32(vma, &section->virtual_address);
/* section header 'size of initialised data' field */
- put_unaligned_le32(datasz, section + 0x10);
+ put_unaligned_le32(datasz, &section->raw_data_size);
/* section header 'file offset' field */
- put_unaligned_le32(offset, section + 0x14);
+ put_unaligned_le32(offset, &section->data_addr);
break;
}
- section += 0x28;
+ section++;
num_sections--;
}
}
-static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
+static void update_pecoff_section_header(char *section_name, uint32_t offset, uint32_t size)
{
update_pecoff_section_header_fields(section_name, offset, size, size, offset);
}
static void update_pecoff_setup_and_reloc(unsigned int size)
{
- u32 setup_offset = 0x200;
- u32 reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
+ uint32_t setup_offset = SECTOR_SIZE;
+ uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
#ifdef CONFIG_EFI_MIXED
- u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
+ uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
#endif
- u32 setup_size = reloc_offset - setup_offset;
+ uint32_t setup_size = reloc_offset - setup_offset;
update_pecoff_section_header(".setup", setup_offset, setup_size);
update_pecoff_section_header(".reloc", reloc_offset, PECOFF_RELOC_RESERVE);
@@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
* Modify .reloc section contents with a single entry. The
* relocation is applied to offset 10 of the relocation section.
*/
- put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
- put_unaligned_le32(10, &buf[reloc_offset + 4]);
+ put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE, &buf[reloc_offset]);
+ put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset + 4]);
#ifdef CONFIG_EFI_MIXED
update_pecoff_section_header(".compat", compat_offset, PECOFF_COMPAT_RESERVE);
@@ -224,19 +325,17 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
*/
buf[compat_offset] = 0x1;
buf[compat_offset + 1] = 0x8;
- put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
+ put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset + 2]);
put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset + 4]);
#endif
}
-static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
+static unsigned int update_pecoff_sections(unsigned int text_start, unsigned int text_sz,
unsigned int init_sz)
{
- unsigned int pe_header;
- unsigned int text_sz = file_sz - text_start;
+ unsigned int file_sz = text_start + text_sz;
unsigned int bss_sz = init_sz - file_sz;
-
- pe_header = get_unaligned_le32(&buf[0x3c]);
+ pe_opt_hdr *hdr = get_pe_opt_header(buf);
/*
* The PE/COFF loader may load the image at an address which is
@@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
* Size of code: Subtract the size of the first sector (512 bytes)
* which includes the header.
*/
- put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);
+ put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz, &hdr->text_size);
/* Size of image */
- put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
+ put_unaligned_le32(init_sz, &hdr->image_size);
/*
* Address of entry point for PE/COFF executable
*/
- put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header + 0x28]);
+ put_unaligned_le32(text_start + efi_pe_entry, &hdr->entry_point);
update_pecoff_section_header_fields(".text", text_start, text_sz + bss_sz,
text_sz, text_start);
+
+ return file_sz;
}
static int reserve_pecoff_reloc_section(int c)
@@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
return PECOFF_RELOC_RESERVE;
}
-static void efi_stub_defaults(void)
+static void efi_stub_update_defaults(void)
{
/* Defaults for old kernel */
#ifdef CONFIG_X86_32
@@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
#ifdef CONFIG_EFI_MIXED
if (efi32_stub_entry != addr)
- die("32-bit and 64-bit EFI entry points do not match\n");
+ die("32-bit and 64-bit EFI entry points do not match");
#endif
#endif
put_unaligned_le32(addr, &buf[0x264]);
@@ -310,7 +411,7 @@ static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
static inline void update_pecoff_text(unsigned int text_start,
unsigned int file_sz,
unsigned int init_sz) {}
-static inline void efi_stub_defaults(void) {}
+static inline void efi_stub_update_defaults(void) {}
static inline void efi_stub_entry_update(void) {}
static inline int reserve_pecoff_reloc_section(int c)
@@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
static void parse_zoffset(char *fname)
{
- FILE *file;
- char *p;
- int c;
+ size_t size;
+ char *data, *p;
- file = fopen(fname, "r");
- if (!file)
- die("Unable to open `%s': %m", fname);
- c = fread(buf, 1, sizeof(buf) - 1, file);
- if (ferror(file))
- die("read-error on `zoffset.h'");
- fclose(file);
- buf[c] = 0;
+ data = map_file(fname, &size);
- p = (char *)buf;
+ /* We can do that, since we mapped one byte more */
+ data[size] = 0;
+
+ p = (char *)data;
while (p && *p) {
PARSE_ZOFS(p, efi32_stub_entry);
@@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
while (p && (*p == '\r' || *p == '\n'))
p++;
}
+
+ unmap_file(data, size);
}
-int main(int argc, char ** argv)
+static unsigned int read_setup(char *path)
{
- unsigned int i, sz, setup_sectors, init_sz;
- int c;
- u32 sys_size;
- struct stat sb;
- FILE *file, *dest;
- int fd;
- void *kernel;
- u32 crc = 0xffffffffUL;
-
- efi_stub_defaults();
-
- if (argc != 5)
- usage();
- parse_zoffset(argv[3]);
-
- dest = fopen(argv[4], "w");
- if (!dest)
- die("Unable to write `%s': %m", argv[4]);
+ FILE *file;
+ unsigned int setup_size, file_size;
/* Copy the setup code */
- file = fopen(argv[1], "r");
+ file = fopen(path, "r");
if (!file)
- die("Unable to open `%s': %m", argv[1]);
- c = fread(buf, 1, sizeof(buf), file);
+ die("Unable to open `%s': %m", path);
+
+ file_size = fread(buf, 1, sizeof(buf), file);
if (ferror(file))
die("read-error on `setup'");
- if (c < 1024)
+
+ if (file_size < 2 * SECTOR_SIZE)
die("The setup must be at least 1024 bytes");
- if (get_unaligned_le16(&buf[510]) != 0xAA55)
+
+ if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
die("Boot block hasn't got boot flag (0xAA55)");
+
fclose(file);
- c += reserve_pecoff_compat_section(c);
- c += reserve_pecoff_reloc_section(c);
+ /* Reserve space for PE sections */
+ file_size += reserve_pecoff_compat_section(file_size);
+ file_size += reserve_pecoff_reloc_section(file_size);
/* Pad unused space with zeros */
- setup_sectors = (c + 511) / 512;
- if (setup_sectors < SETUP_SECT_MIN)
- setup_sectors = SETUP_SECT_MIN;
- i = setup_sectors*512;
- memset(buf+c, 0, i-c);
- update_pecoff_setup_and_reloc(i);
+ setup_size = round_up(file_size, SECTOR_SIZE);
+
+ if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
+ setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
+
+ /*
+ * Global buffer is already initialised
+ * to 0, but just in case, zero out padding.
+ */
+
+ memset(buf + file_size, 0, setup_size - file_size);
+
+ return setup_size;
+}
+
+int main(int argc, char **argv)
+{
+ size_t kern_file_size;
+ unsigned int setup_size;
+ unsigned int setup_sectors;
+ unsigned int init_size;
+ unsigned int total_size;
+ unsigned int kern_size;
+ void *kernel;
+ uint32_t crc = 0xffffffffUL;
+ uint8_t *output;
+
+ if (argc != 5)
+ usage();
+
+ efi_stub_update_defaults();
+ parse_zoffset(argv[3]);
+
+ setup_size = read_setup(argv[1]);
+
+ setup_sectors = setup_size/SECTOR_SIZE;
/* Set the default root device */
put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
- /* Open and stat the kernel file */
- fd = open(argv[2], O_RDONLY);
- if (fd < 0)
- die("Unable to open `%s': %m", argv[2]);
- if (fstat(fd, &sb))
- die("Unable to stat `%s': %m", argv[2]);
- sz = sb.st_size;
- kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
- if (kernel == MAP_FAILED)
- die("Unable to mmap '%s': %m", argv[2]);
- /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
- sys_size = (sz + 15 + 4) / 16;
+ /* Map kernel file to memory */
+ kernel = map_file(argv[2], &kern_file_size);
+
#ifdef CONFIG_EFI_STUB
- /*
- * COFF requires minimum 32-byte alignment of sections, and
- * adding a signature is problematic without that alignment.
- */
- sys_size = (sys_size + 1) & ~1;
+ /* PE specification requires 512-byte minimum section file alignment */
+ kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
+ update_pecoff_setup_and_reloc(setup_size);
+#else
+ /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
+ kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
#endif
/* Patch the setup code with the appropriate size parameters */
- buf[0x1f1] = setup_sectors-1;
- put_unaligned_le32(sys_size, &buf[0x1f4]);
+ buf[0x1f1] = setup_sectors - 1;
+ put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
+
+ /* Update kernel_info offset. */
+ put_unaligned_le32(kernel_info, &buf[0x268]);
+
+ init_size = get_unaligned_le32(&buf[0x260]);
- init_sz = get_unaligned_le32(&buf[0x260]);
#ifdef CONFIG_EFI_STUB
/*
* The decompression buffer will start at ImageBase. When relocating
@@ -458,45 +571,35 @@ int main(int argc, char ** argv)
* For future-proofing, increase init_sz if necessary.
*/
- if (init_sz - _end < i + _ehead) {
- init_sz = (i + _ehead + _end + 4095) & ~4095;
- put_unaligned_le32(init_sz, &buf[0x260]);
+ if (init_size - _end < setup_size + _ehead) {
+ init_size = round_up(setup_size + _ehead + _end, SECTION_ALIGNMENT);
+ put_unaligned_le32(init_size, &buf[0x260]);
}
-#endif
- update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
- efi_stub_entry_update();
-
- /* Update kernel_info offset. */
- put_unaligned_le32(kernel_info, &buf[0x268]);
+ total_size = update_pecoff_sections(setup_size, kern_size, init_size);
- crc = partial_crc32(buf, i, crc);
- if (fwrite(buf, 1, i, dest) != i)
- die("Writing setup failed");
+ efi_stub_entry_update();
+#else
+ (void)init_size;
+ total_size = setup_size + kern_size;
+#endif
- /* Copy the kernel code */
- crc = partial_crc32(kernel, sz, crc);
- if (fwrite(kernel, 1, sz, dest) != sz)
- die("Writing kernel failed");
+ output = map_output_file(argv[4], total_size);
- /* Add padding leaving 4 bytes for the checksum */
- while (sz++ < (sys_size*16) - 4) {
- crc = partial_crc32_one('\0', crc);
- if (fwrite("\0", 1, 1, dest) != 1)
- die("Writing padding failed");
- }
+ memcpy(output, buf, setup_size);
+ memcpy(output + setup_size, kernel, kern_file_size);
+ memset(output + setup_size + kern_file_size, 0, kern_size - kern_file_size);
- /* Write the CRC */
- put_unaligned_le32(crc, buf);
- if (fwrite(buf, 1, 4, dest) != 4)
- die("Writing CRC failed");
+ /* Calculate and write kernel checksum. */
+ crc = partial_crc32(output, total_size - 4, crc);
+ put_unaligned_le32(crc, &output[total_size - 4]);
- /* Catch any delayed write failures */
- if (fclose(dest))
- die("Writing image failed");
+ /* Catch any delayed write failures. */
+ if (munmap(output, total_size) < 0)
+ die("Writing kernel failed");
- close(fd);
+ unmap_file(kernel, kern_file_size);
- /* Everything is OK */
+ /* Everything is OK. */
return 0;
}
--
2.37.4
Set the lower limit of physical KASLR to 64M.
Previously it was set to 512M when the kernel is loaded higher than that.
That prevented physical KASLR from being performed on x86_32, where the
upper limit is also set to 512M. The limit is fairly arbitrary; the
most important thing is to keep it above the ISA hole, i.e. higher than 16M.
It was not that important before, but now the kernel is not getting
relocated to the lower address when booting via EFI, exposing the
KASLR failures.
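The effect of the lower bound can be checked with a small stand-alone sketch. kaslr_min_addr() and align_up() are hypothetical helpers that mirror the min() and ALIGN() steps in choose_random_location(); the 2M alignment used in the usage note is only an assumed value for CONFIG_PHYSICAL_ALIGN.

```c
#include <assert.h>
#include <stdint.h>

#define MiB(x) ((uint64_t)(x) << 20)

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Low end of the randomization range: the smaller of the configured
 * floor and the initial load address, aligned to the physical alignment.
 */
static uint64_t kaslr_min_addr(uint64_t load_addr, uint64_t floor,
			       uint64_t phys_align)
{
	uint64_t min_addr = load_addr < floor ? load_addr : floor;

	return align_up(min_addr, phys_align);
}
```

For a kernel loaded at 1G with 2M alignment, a 512M floor gives a minimum of 512M, which equals the 512M upper limit on x86_32 and leaves no room to randomize. A 64M floor gives a minimum of 64M and a usable range of 448M.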
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/kaslr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index c0ee116c4fa2..74d1327adbba 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -852,10 +852,10 @@ void choose_random_location(unsigned long input,
/*
* Low end of the randomization range should be the
- * smaller of 512M or the initial kernel image
+ * smaller of 64M or the initial kernel image
* location:
*/
- min_addr = min(*output, 512UL << 20);
+ min_addr = min(*output, 64UL << 20);
/* Make sure minimum is aligned. */
min_addr = ALIGN(min_addr, CONFIG_PHYSICAL_ALIGN);
--
2.37.4
After every implicit mapping is removed, this code is no longer needed.
Remove memory mapping from page fault handler to ensure that there are
no hidden invalid memory accesses.
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/ident_map_64.c | 26 ++++++++++---------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index fec795a4ce23..ba5108c58a4e 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -386,27 +386,21 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
{
unsigned long address = native_read_cr2();
unsigned long end;
- bool ghcb_fault;
+ char *msg;
- ghcb_fault = sev_es_check_ghcb_fault(address);
+ if (sev_es_check_ghcb_fault(address))
+ msg = "Page-fault on GHCB page:";
+ else
+ msg = "Unexpected page-fault:";
address &= PMD_MASK;
end = address + PMD_SIZE;
/*
- * Check for unexpected error codes. Unexpected are:
- * - Faults on present pages
- * - User faults
- * - Reserved bits set
- */
- if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
- do_pf_error("Unexpected page-fault:", error_code, address, regs->ip);
- else if (ghcb_fault)
- do_pf_error("Page-fault on GHCB page:", error_code, address, regs->ip);
-
- /*
- * Error code is sane - now identity map the 2M region around
- * the faulting address.
+ * Since all memory allocations are made explicit
+ * now, every page fault at this stage is an
+ * error and the error handler is there only
+ * for debug purposes.
*/
- kernel_add_identity_map(address, end, MAP_WRITE);
+ do_pf_error(msg, error_code, address, regs->ip);
}
--
2.37.4
From: Peter Jones <[email protected]>
efi_warn() doesn't append newlines to messages, which makes
warnings that lack a trailing newline hard to read.
Signed-off-by: Peter Jones <[email protected]>
---
drivers/firmware/efi/libstub/mem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 07d54c88c62e..b31d1975caa2 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -297,7 +297,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
rounded_end - rounded_start,
attr_clear);
if (status != EFI_SUCCESS) {
- efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",
+ efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx\n",
(unsigned long)rounded_start,
(unsigned long)rounded_end,
status);
@@ -310,7 +310,7 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
rounded_end - rounded_start,
attributes);
if (status != EFI_SUCCESS) {
- efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",
+ efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx\n",
(unsigned long)rounded_start,
(unsigned long)rounded_end,
status);
--
2.37.4
Doing it that way allows setting up stricter memory attributes,
simplifies the boot code path and removes a potential relocation
of the kernel image.
Wire up the required interfaces and minimally initialize the zero page
fields needed for it to function correctly.
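The translation from mapping flags to EFI memory attributes done by the new map_range callback can be sketched in isolation. The EFI_MEMORY_XP and EFI_MEMORY_RO values are the ones defined by the UEFI specification; the SKETCH_MAP_* flag values and both helper names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP	UINT64_C(0x0000000000004000)
#define EFI_MEMORY_RO	UINT64_C(0x0000000000020000)

/* Hypothetical flag values; the real ones come from the series. */
#define SKETCH_MAP_WRITE	0x02u
#define SKETCH_MAP_EXEC		0x04u

static_assert((SKETCH_MAP_WRITE & SKETCH_MAP_EXEC) == 0,
	      "flag bits must not overlap");

/* A range that is not executable gets XP, one that is not writable gets RO. */
static uint64_t map_flags_to_efi_attrs(unsigned int flags)
{
	uint64_t attr = 0;

	if (!(flags & SKETCH_MAP_EXEC))
		attr |= EFI_MEMORY_XP;
	if (!(flags & SKETCH_MAP_WRITE))
		attr |= EFI_MEMORY_RO;

	return attr;
}

/* W^X is violated when a range is both writable and executable. */
static int violates_wx(unsigned int flags)
{
	return (flags & SKETCH_MAP_WRITE) && (flags & SKETCH_MAP_EXEC);
}
```

Under these assumptions a code range (exec only) ends up read-only, a data range (write only) ends up non-executable, and a range requested with neither flag gets both attributes.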
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 50 ++++-
arch/x86/boot/compressed/head_64.S | 58 ++++-
drivers/firmware/efi/Kconfig | 2 +
drivers/firmware/efi/libstub/Makefile | 2 +-
.../firmware/efi/libstub/x86-extract-direct.c | 208 ++++++++++++++++++
drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
drivers/firmware/efi/libstub/x86-stub.h | 14 ++
7 files changed, 338 insertions(+), 115 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index ead6007df1e5..0be75e5072ae 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
#ifdef CONFIG_EFI_STUB
SYM_FUNC_START(efi32_stub_entry)
+/*
+ * Calculate the delta between where we were compiled to run
+ * at and where we were actually loaded at. This can only be done
+ * with a short local call on x86. Nothing else will tell us what
+ * address we are running at. The reserved chunk of the real-mode
+ * data at 0x1e4 (defined as a scratch field) are used as the stack
+ * for this calculation. Only 4 bytes are needed.
+ */
+ call 1f
+1: popl %ebx
+ addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
+
+ /* Clear BSS */
+ xorl %eax, %eax
+ leal _bss@GOTOFF(%ebx), %edi
+ leal _ebss@GOTOFF(%ebx), %ecx
+ subl %edi, %ecx
+ shrl $2, %ecx
+ rep stosl
+
add $0x4, %esp
movl 8(%esp), %esi /* save boot_params pointer */
+ movl %edx, %edi /* save GOT address */
call efi_main
- /* efi_main returns the possibly relocated address of startup_32 */
- jmp *%eax
+ movl %eax, %ecx
+
+ /*
+ * efi_main returns the possibly
+ * relocated address of extracted kernel entry point.
+ */
+
+ cli
+
+ /* Load new GDT */
+ leal gdt@GOTOFF(%ebx), %eax
+ movl %eax, 2(%eax)
+ lgdt (%eax)
+
+ /* Load segment registers with our descriptors */
+ movl $__BOOT_DS, %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %fs
+ movl %eax, %gs
+ movl %eax, %ss
+
+ /* Zero EFLAGS */
+ pushl $0
+ popfl
+
+ jmp *%ecx
SYM_FUNC_END(efi32_stub_entry)
SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
#endif
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2dd8be0583d2..7cfef7bd0424 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -529,12 +529,64 @@ SYM_CODE_END(startup_64)
.org 0x390
#endif
SYM_FUNC_START(efi64_stub_entry)
+ /* Preserve first parameter */
+ movq %rdi, %r10
+
+ /* Clear BSS */
+ xorl %eax, %eax
+ leaq _bss(%rip), %rdi
+ leaq _ebss(%rip), %rcx
+ subq %rdi, %rcx
+ shrq $3, %rcx
+ rep stosq
+
and $~0xf, %rsp /* realign the stack */
movq %rdx, %rbx /* save boot_params pointer */
+ movq %r10, %rdi
call efi_main
- movq %rbx,%rsi
- leaq rva(startup_64)(%rax), %rax
- jmp *%rax
+
+ cld
+ cli
+
+ movq %rbx, %rdi /* boot_params */
+ movq %rax, %rsi /* decompressed kernel address */
+
+ /* Make sure we have GDT with 32-bit code segment */
+ leaq gdt64(%rip), %rax
+ addq %rax, 2(%rax)
+ lgdt (%rax)
+
+ /* Setup data segments. */
+ xorl %eax, %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %ss
+ movl %eax, %fs
+ movl %eax, %gs
+
+ pushq %rsi
+ pushq %rdi
+
+ call load_stage1_idt
+ call enable_nx_if_supported
+
+ call trampoline_pgtable_init
+ movq %rax, %rdx
+
+
+ /* Swap %rsi and %rdi */
+ popq %rsi
+ popq %rdi
+
+ /* Save the trampoline address in RCX */
+ movq trampoline_32bit(%rip), %rcx
+
+ /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
+ pushq $__KERNEL32_CS
+ leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
+ pushq %rax
+ lretq
+
SYM_FUNC_END(efi64_stub_entry)
SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
#endif
diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 043ca31c114e..f50c2a84a754 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -58,6 +58,8 @@ config EFI_DXE_MEM_ATTRIBUTES
Use DXE services to check and alter memory protection
attributes during boot via EFISTUB to ensure that memory
ranges used by the kernel are writable and executable.
+ This option also enables stricter memory attributes
+ on compressed kernel PE image.
config EFI_PARAMS_FROM_FDT
bool
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index be8b8c6e8b40..99b81c95344c 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -88,7 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o arm64-entry.o smbios.o
-lib-$(CONFIG_X86) += x86-stub.o
+lib-$(CONFIG_X86) += x86-stub.o x86-extract-direct.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c b/drivers/firmware/efi/libstub/x86-extract-direct.c
new file mode 100644
index 000000000000..4ecbc4a9b3ed
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/acpi.h>
+#include <linux/efi.h>
+#include <linux/elf.h>
+#include <linux/stddef.h>
+
+#include <asm/efi.h>
+#include <asm/e820/types.h>
+#include <asm/desc.h>
+#include <asm/boot.h>
+#include <asm/bootparam_utils.h>
+#include <asm/shared/extract.h>
+#include <asm/shared/pgtable.h>
+
+#include "efistub.h"
+#include "x86-stub.h"
+
+static efi_handle_t image_handle;
+
+static void do_puthex(unsigned long value)
+{
+ efi_printk("%08lx", value);
+}
+
+static void do_putstr(const char *msg)
+{
+ efi_printk("%s", msg);
+}
+
+static unsigned long do_map_range(unsigned long start,
+ unsigned long end,
+ unsigned int flags)
+{
+ efi_status_t status;
+
+ unsigned long size = end - start;
+
+ if (flags & MAP_ALLOC) {
+ unsigned long addr;
+
+ status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
+ &addr, start);
+ if (status != EFI_SUCCESS) {
+ efi_err("Unable to allocate memory for uncompressed kernel\n");
+ efi_exit(image_handle, EFI_OUT_OF_RESOURCES);
+ }
+
+ if (start != addr) {
+ efi_debug("Unable to allocate at given address"
+ " (desired=0x%lx, actual=0x%lx)\n",
+ (unsigned long)start, addr);
+ start = addr;
+ }
+ }
+
+ if ((flags & (MAP_PROTECT | MAP_ALLOC)) &&
+ IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ unsigned long attr = 0;
+
+ if (!(flags & MAP_EXEC))
+ attr |= EFI_MEMORY_XP;
+
+ if (!(flags & MAP_WRITE))
+ attr |= EFI_MEMORY_RO;
+
+ status = efi_adjust_memory_range_protection(start, size, attr);
+ if (status != EFI_SUCCESS)
+ efi_err("Unable to protect memory range\n");
+ }
+
+ return start;
+}
+
+/*
+ * Trampoline takes 3 pages and can be loaded in first megabyte of memory
+ * with its end placed between 0 and 640k where BIOS might start.
+ * (see arch/x86/boot/compressed/pgtable_64.c)
+ */
+
+#ifdef CONFIG_64BIT
+static efi_status_t prepare_trampoline(void)
+{
+ efi_status_t status;
+
+ status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
+ (unsigned long *)&trampoline_32bit,
+ TRAMPOLINE_32BIT_PLACEMENT_MAX);
+
+ if (status != EFI_SUCCESS)
+ return status;
+
+ unsigned long trampoline_start = (unsigned long)trampoline_32bit;
+
+ memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
+
+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ /* First page of trampoline is a top level page table */
+ efi_adjust_memory_range_protection(trampoline_start,
+ PAGE_SIZE,
+ EFI_MEMORY_XP);
+ }
+
+ /* Second page of trampoline is the code (with a padding) */
+
+ void *caddr = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
+
+ memcpy(caddr, trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+
+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ efi_adjust_memory_range_protection((unsigned long)caddr,
+ PAGE_SIZE,
+ EFI_MEMORY_RO);
+
+ /* And the last page of trampoline is the stack */
+
+ efi_adjust_memory_range_protection(trampoline_start + 2 * PAGE_SIZE,
+ PAGE_SIZE,
+ EFI_MEMORY_XP);
+ }
+
+ return EFI_SUCCESS;
+}
+#else
+static inline efi_status_t prepare_trampoline(void)
+{
+ return EFI_SUCCESS;
+}
+#endif
+
+static efi_status_t init_loader_data(efi_handle_t handle,
+ struct boot_params *params,
+ struct efi_boot_memmap **map)
+{
+ struct efi_info *efi = (void *)&params->efi_info;
+ efi_status_t status;
+
+ status = efi_get_memory_map(map, false);
+
+ if (status != EFI_SUCCESS) {
+ efi_err("Unable to get EFI memory map...\n");
+ return status;
+ }
+
+ const char *signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
+ : EFI32_LOADER_SIGNATURE;
+
+ memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
+
+ efi->efi_memdesc_size = (*map)->desc_size;
+ efi->efi_memdesc_version = (*map)->desc_ver;
+ efi->efi_memmap_size = (*map)->map_size;
+
+ efi_set_u64_split((unsigned long)(*map)->map,
+ &efi->efi_memmap, &efi->efi_memmap_hi);
+
+ efi_set_u64_split((unsigned long)efi_system_table,
+ &efi->efi_systab, &efi->efi_systab_hi);
+
+ image_handle = handle;
+
+ return EFI_SUCCESS;
+}
+
+static void free_loader_data(struct boot_params *params, struct efi_boot_memmap *map)
+{
+ struct efi_info *efi = (void *)&params->efi_info;
+
+ efi_bs_call(free_pool, map);
+
+ efi->efi_memdesc_size = 0;
+ efi->efi_memdesc_version = 0;
+ efi->efi_memmap_size = 0;
+ efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
+}
+
+extern unsigned char input_data[];
+extern unsigned int input_len, output_len;
+
+unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *params)
+{
+
+ void *res;
+ efi_status_t status;
+ struct efi_extract_callbacks cb = { 0 };
+
+ status = prepare_trampoline();
+
+ if (status != EFI_SUCCESS)
+ return 0;
+
+ /* Prepare environment for do_extract_kernel() call */
+ struct efi_boot_memmap *map = NULL;
+ status = init_loader_data(handle, params, &map);
+
+ if (status != EFI_SUCCESS)
+ return 0;
+
+ cb.puthex = do_puthex;
+ cb.putstr = do_putstr;
+ cb.map_range = do_map_range;
+
+ res = efi_extract_kernel(params, &cb, input_data, input_len, output_len);
+
+ free_loader_data(params, map);
+
+ return (unsigned long)res;
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 7fb1eff88a18..1d1ab1911fd3 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -17,6 +17,7 @@
#include <asm/boot.h>
#include "efistub.h"
+#include "x86-stub.h"
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
@@ -24,7 +25,7 @@
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
-static efi_loaded_image_t *image = NULL;
+static efi_loaded_image_t *image __section(".data");
static efi_status_t
preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
@@ -212,55 +213,9 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
-void startup_32(struct boot_params *boot_params);
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
- /*
- * Allow execution of possible trampoline used
- * for switching between 4- and 5-level page tables
- * and relocated kernel image.
- */
-
- efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
- TRAMPOLINE_PLACEMENT_SIZE, 0);
-
-#ifdef CONFIG_64BIT
- if (image_base != (unsigned long)startup_32)
- efi_adjust_memory_range_protection(image_base, image_size, 0);
-#else
- /*
- * Clear protection flags on a whole range of possible
- * addresses used for KASLR. We don't need to do that
- * on x86_64, since KASLR/extraction is performed after
- * dedicated identity page tables are built and we only
- * need to remove possible protection on relocated image
- * itself disregarding further relocations.
- */
- efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
- 0);
-#endif
-}
-
static const efi_char16_t apple[] = L"Apple";
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -269,9 +224,6 @@ static void setup_quirks(struct boot_params *boot_params,
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
}
/*
@@ -384,7 +336,7 @@ static void setup_graphics(struct boot_params *boot_params)
}
-static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
+void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
{
efi_bs_call(exit, handle, status, 0, NULL);
for(;;)
@@ -707,8 +659,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
}
/*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
+ * On success, we return the entry point of the extracted kernel.
* On failure, we exit to the firmware via efi_exit instead of returning.
*/
asmlinkage unsigned long efi_main(efi_handle_t handle,
@@ -733,60 +684,6 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
efi_dxe_table = NULL;
}
- /*
- * If the kernel isn't already loaded at a suitable address,
- * relocate it.
- *
- * It must be loaded above LOAD_PHYSICAL_ADDR.
- *
- * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
- * is defined as the macro MAXMEM, but unfortunately that is not a
- * compile-time constant if 5-level paging is configured, so we instead
- * define our own macro for use here.
- *
- * For 32-bit, the maximum address is complicated to figure out, for
- * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
- * KASLR uses.
- *
- * Also relocate it if image_offset is zero, i.e. the kernel wasn't
- * loaded by LoadImage, but rather by a bootloader that called the
- * handover entry. The reason we must always relocate in this case is
- * to handle the case of systemd-boot booting a unified kernel image,
- * which is a PE executable that contains the bzImage and an initrd as
- * COFF sections. The initrd section is placed after the bzImage
- * without ensuring that there are at least init_size bytes available
- * for the bzImage, and thus the compressed kernel's startup code may
- * overwrite the initrd unless it is moved out of the way.
- */
-
- buffer_start = ALIGN(bzimage_addr - image_offset,
- hdr->kernel_alignment);
- buffer_end = buffer_start + hdr->init_size;
-
- if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
- (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
- (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0)) {
- extern char _bss[];
-
- status = efi_relocate_kernel(&bzimage_addr,
- (unsigned long)_bss - bzimage_addr,
- hdr->init_size,
- hdr->pref_address,
- hdr->kernel_alignment,
- LOAD_PHYSICAL_ADDR);
- if (status != EFI_SUCCESS) {
- efi_err("efi_relocate_kernel() failed!\n");
- goto fail;
- }
- /*
- * Now that we've copied the kernel elsewhere, we no longer
- * have a set up block before startup_32(), so reset image_offset
- * to zero in case it was set earlier.
- */
- image_offset = 0;
- }
-
#ifdef CONFIG_CMDLINE_BOOL
status = efi_parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
@@ -843,7 +740,11 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
setup_efi_pci(boot_params);
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+ setup_quirks(boot_params);
+
+ bzimage_addr = extract_kernel_direct(handle, boot_params);
+ if (!bzimage_addr)
+ goto fail;
status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
new file mode 100644
index 000000000000..baecc7c6e602
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _DRIVERS_FIRMWARE_EFI_X86STUB_H
+#define _DRIVERS_FIRMWARE_EFI_X86STUB_H
+
+#include <linux/efi.h>
+
+#include <asm/bootparam.h>
+
+void __noreturn efi_exit(efi_handle_t handle, efi_status_t status);
+unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *boot_params);
+void startup_32(struct boot_params *boot_params);
+
+#endif
--
2.37.4
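The three-page trampoline layout that prepare_trampoline() sets up can be summarized as a small table. This is only an illustrative sketch: the page size and offsets are assumed values standing in for the TRAMPOLINE_32BIT_* constants, and the names tramp_page, trampoline_layout and trampoline_is_wx_safe are hypothetical, not part of the patch. The attribute bit values are the ones defined by the UEFI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; stands in for PAGE_SIZE in the patch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* not executable */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* not writable */

/* Hypothetical descriptor for one trampoline page. */
struct tramp_page {
	const char *role;
	unsigned long offset; /* offset from the trampoline start */
	uint64_t attr;        /* protection requested for the page */
};

/* Layout described by the comments in prepare_trampoline(). */
static const struct tramp_page trampoline_layout[] = {
	{ "top level page table", 0 * SKETCH_PAGE_SIZE, EFI_MEMORY_XP },
	{ "code (with padding)",  1 * SKETCH_PAGE_SIZE, EFI_MEMORY_RO },
	{ "stack",                2 * SKETCH_PAGE_SIZE, EFI_MEMORY_XP },
};

/*
 * A page satisfies W^X if it is non-writable (RO) or
 * non-executable (XP). Returns 1 if every page qualifies.
 */
static int trampoline_is_wx_safe(void)
{
	size_t n = sizeof(trampoline_layout) / sizeof(trampoline_layout[0]);
	size_t i;

	assert(n == 3);
	for (i = 0; i < n; i++) {
		if (!(trampoline_layout[i].attr &
		      (EFI_MEMORY_RO | EFI_MEMORY_XP)))
			return 0;
	}
	return 1;
}
```

No page in this table is left both writable and executable, which is the property the W^X warning in efi_adjust_memory_range_protection() checks for.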
Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
services for changing memory attributes in the EFISTUB.
Use DXE services only as a fallback in case aforementioned protocol
is not supported by UEFI implementation.
Move DXE services initialization code closer to the place they are used
to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.
Tested-by: Mario Limonciello <[email protected]>
Tested-by: Peter Jones <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>
---
drivers/firmware/efi/libstub/mem.c | 168 ++++++++++++++++++------
drivers/firmware/efi/libstub/x86-stub.c | 17 ---
2 files changed, 128 insertions(+), 57 deletions(-)
diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index 3e47e5931f04..07d54c88c62e 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -5,6 +5,9 @@
#include "efistub.h"
+const efi_dxe_services_table_t *efi_dxe_table;
+efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
+
/**
* efi_get_memory_map() - get memory map
* @map: pointer to memory map pointer to which to assign the
@@ -129,66 +132,47 @@ void efi_free(unsigned long size, unsigned long addr)
efi_bs_call(free_pages, addr, nr_pages);
}
-/**
- * efi_adjust_memory_range_protection() - change memory range protection attributes
- * @start: memory range start address
- * @size: memory range size
- *
- * Actual memory range for which memory attributes are modified is
- * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
- * that includes [start, start + size].
- *
- * @return: status code
- */
-efi_status_t efi_adjust_memory_range_protection(unsigned long start,
- unsigned long size,
- unsigned long attributes)
+static void retrieve_dxe_table(void)
+{
+ efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+ if (efi_dxe_table &&
+ efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+ efi_warn("Ignoring DXE services table: invalid signature\n");
+ efi_dxe_table = NULL;
+ }
+}
+
+static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t rounded_start,
+ efi_physical_addr_t rounded_end,
+ unsigned long attributes)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
- efi_physical_addr_t end, next;
- efi_physical_addr_t rounded_start, rounded_end;
+ efi_physical_addr_t end, next, start;
efi_physical_addr_t unprotect_start, unprotect_size;
- if (efi_dxe_table == NULL)
- return EFI_UNSUPPORTED;
+ if (!efi_dxe_table) {
+ retrieve_dxe_table();
- /*
- * This function should not be used to modify attributes
- * other than writable/executable.
- */
-
- if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
- return EFI_INVALID_PARAMETER;
-
- /*
- * Disallow simultaniously executable and writable memory
- * to inforce W^X policy if direct extraction code is enabled.
- */
-
- if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
- efi_warn("W^X violation at [%08lx,%08lx]\n",
- (unsigned long)rounded_start,
- (unsigned long)rounded_end);
+ if (!efi_dxe_table)
+ return EFI_UNSUPPORTED;
}
- rounded_start = rounddown(start, EFI_PAGE_SIZE);
- rounded_end = roundup(start + size, EFI_PAGE_SIZE);
-
/*
* Don't modify memory region attributes, they are
* already suitable, to lower the possibility to
* encounter firmware bugs.
*/
- for (end = start + size; start < end; start = next) {
+
+ for (start = rounded_start, end = rounded_end; start < end; start = next) {
status = efi_dxe_call(get_memory_space_descriptor,
start, &desc);
if (status != EFI_SUCCESS) {
efi_warn("Unable to get memory descriptor at %lx\n",
- start);
+ (unsigned long)start);
return status;
}
@@ -230,3 +214,107 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
return EFI_SUCCESS;
}
+
+static void retrieve_memory_attributes_proto(void)
+{
+ efi_status_t status;
+ efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
+
+ status = efi_bs_call(locate_protocol, &guid, NULL,
+ (void **)&efi_mem_attrib_proto);
+ if (status != EFI_SUCCESS)
+ efi_mem_attrib_proto = NULL;
+}
+
+/**
+ * efi_adjust_memory_range_protection() - change memory range protection attributes
+ * @start: memory range start address
+ * @size: memory range size
+ *
+ * Actual memory range for which memory attributes are modified is
+ * the smallest range with start address and size aligned to EFI_PAGE_SIZE
+ * that includes [start, start + size].
+ *
+ * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
+ * that is a part of UEFI Specification since version 2.10.
+ * If the protocol is unavailable it falls back to DXE services functions.
+ *
+ * @return: status code
+ */
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size,
+ unsigned long attributes)
+{
+ efi_status_t status;
+ efi_physical_addr_t rounded_start, rounded_end;
+ unsigned long attr_clear;
+
+ /*
+ * This function should not be used to modify attributes
+ * other than writable/executable.
+ */
+
+ if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
+ return EFI_INVALID_PARAMETER;
+
+ rounded_start = rounddown(start, EFI_PAGE_SIZE);
+ rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+
+ /*
+ * Warn if requested to make memory simultaneously
+ * executable and writable to enforce W^X policy.
+ */
+
+ if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
+ efi_warn("W^X violation at [%08lx,%08lx]\n",
+ (unsigned long)rounded_start,
+ (unsigned long)rounded_end);
+ }
+
+ if (!efi_mem_attrib_proto) {
+ retrieve_memory_attributes_proto();
+
+ /* Fall back to DXE services if unsupported */
+ if (!efi_mem_attrib_proto) {
+ return adjust_mem_attrib_dxe(rounded_start,
+ rounded_end,
+ attributes);
+ }
+ }
+
+ /*
+ * Unlike DXE services functions, EFI_MEMORY_ATTRIBUTE_PROTOCOL
+ * does not clear unset protection bit, so it needs to be cleared
+ * explicitly
+ */
+
+ attr_clear = ~attributes &
+ (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
+
+ status = efi_call_proto(efi_mem_attrib_proto,
+ clear_memory_attributes,
+ rounded_start,
+ rounded_end - rounded_start,
+ attr_clear);
+ if (status != EFI_SUCCESS) {
+ efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx\n",
+ (unsigned long)rounded_start,
+ (unsigned long)rounded_end,
+ status);
+ return status;
+ }
+
+ status = efi_call_proto(efi_mem_attrib_proto,
+ set_memory_attributes,
+ rounded_start,
+ rounded_end - rounded_start,
+ attributes);
+ if (status != EFI_SUCCESS) {
+ efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx\n",
+ (unsigned long)rounded_start,
+ (unsigned long)rounded_end,
+ status);
+ }
+
+ return status;
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 60697fcd8950..06a62b121521 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -23,7 +23,6 @@
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
const efi_system_table_t *efi_system_table;
-const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image __section(".data");
@@ -357,15 +356,6 @@ void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
static void setup_sections_memory_protection(unsigned long image_base)
{
#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
- efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
-
- if (!efi_dxe_table ||
- efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
- efi_warn("Unable to locate EFI DXE services table\n");
- efi_dxe_table = NULL;
- return;
- }
-
/* .setup [image_base, _head] */
efi_adjust_memory_range_protection(image_base,
(unsigned long)_head - image_base,
@@ -732,13 +722,6 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
- efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
- if (efi_dxe_table &&
- efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
- efi_warn("Ignoring DXE services table: invalid signature\n");
- efi_dxe_table = NULL;
- }
-
setup_sections_memory_protection(bzimage_addr - image_offset);
#ifdef CONFIG_CMDLINE_BOOL
--
2.37.4
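The range rounding and the clear-then-set sequence used with EFI_MEMORY_ATTRIBUTE_PROTOCOL in the patch above can be sketched outside of the stub as follows. This is a minimal illustration, not the stub code itself: struct attr_request and plan_attr_change() are hypothetical names, a 4 KiB EFI page size is assumed, and the attribute bit values are those from the UEFI specification. The sketch only computes what would be passed to the protocol and makes no firmware calls.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL /* assumed 4 KiB EFI pages */

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_RP 0x0000000000002000ULL
#define EFI_MEMORY_XP 0x0000000000004000ULL
#define EFI_MEMORY_RO 0x0000000000020000ULL

/* Hypothetical container for the arguments of the two protocol calls. */
struct attr_request {
	uint64_t base;   /* page-aligned start of the range */
	uint64_t length; /* page-aligned length of the range */
	uint64_t clear;  /* bits for the clear_memory_attributes call */
	uint64_t set;    /* bits for the set_memory_attributes call */
};

static struct attr_request plan_attr_change(uint64_t start, uint64_t size,
					    uint64_t attributes)
{
	struct attr_request req;
	uint64_t end;

	/* Only RO and XP may be requested, as in the patch. */
	assert((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0);

	/* Smallest page-aligned range covering [start, start + size]. */
	req.base = start & ~(EFI_PAGE_SIZE - 1);
	end = (start + size + EFI_PAGE_SIZE - 1) & ~(EFI_PAGE_SIZE - 1);
	req.length = end - req.base;

	/*
	 * The protocol does not drop bits that are merely absent from
	 * the "set" mask, so every protection bit that is not requested
	 * has to be cleared explicitly first.
	 */
	req.clear = ~attributes &
		    (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
	req.set = attributes;

	return req;
}
```

For example, requesting EFI_MEMORY_RO for 0x1000 bytes at 0x1234 yields the range [0x1000, 0x3000), with XP and RP in the clear mask and RO in the set mask.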
On Thu, Dec 15, 2022 at 03:37:51PM +0300, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution
> of compressed kernel for EFISTUB code path.
Hi Evgeniy,
Aside from some minor patch fuzz in patch 6 due to building this in
today's Fedora rawhide kernel rather than mainline, this patch set works
for me.
Thanks!
--
Peter
On 2022-12-15 22:21, Peter Jones wrote:
> On Thu, Dec 15, 2022 at 03:37:51PM +0300, Evgeniy Baskov wrote:
>> This patchset is aimed
>> * to improve UEFI compatibility of compressed kernel code for x86_64
>> * to setup proper memory access attributes for code and rodata
>> sections
>> * to implement W^X protection policy throughout the whole execution
>> of compressed kernel for EFISTUB code path.
>
> Hi Evgeniy,
>
> Aside from some minor patch fuzz in patch 6 due to building this in
> today's Fedora rawhide kernel rather than mainline, this patch set
> works
> for me.
>
> Thanks!
Nice to hear that, thank you for testing again!
On Thu, 15 Dec 2022 at 13:38, Evgeniy Baskov <[email protected]> wrote:
>
> Previous upper limit ignored pages implicitly mapped from #PF handler
> by code accessing ACPI tables (boot/compressed/{acpi.c,efi.c}),
> so theoretical upper limit is higher than it was set.
>
> Using 4KB pages is desirable for better memory protection granularity.
> Approximately twice as much memory is required for those.
>
> Increase initial page table size to 64 4KB page tables.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
> ---
> arch/x86/include/asm/boot.h | 26 ++++++++++++++------------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 9191280d9ea3..024d972c248e 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -41,22 +41,24 @@
> # define BOOT_STACK_SIZE 0x4000
>
> # define BOOT_INIT_PGT_SIZE (6*4096)
> -# ifdef CONFIG_RANDOMIZE_BASE
> /*
> * Assuming all cross the 512GB boundary:
> * 1 page for level4
> - * (2+2)*4 pages for kernel, param, cmd_line, and randomized kernel
> - * 2 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
> - * Total is 19 pages.
> + * (3+3)*2 pages for param and cmd_line
> + * (2+2+S)*2 pages for kernel and randomized kernel, where S is total number
> + * of sections of kernel. Explanation: 2+2 are upper level page tables.
> + * We can have only S unaligned parts of section: 1 at the end of the kernel
> + * and (S-1) at the section borders. The start address of the kernel is
> + * aligned, so an extra page table. There are at most S=6 sections in
> + * vmlinux ELF image.
> + * 3 pages for first 2M (video RAM: CONFIG_X86_VERBOSE_BOOTUP).
> + * Total is 36 pages.
> + *
> + * Some pages are also required for UEFI memory map and
> + * ACPI table mappings, so we need to add extra space.
> + * FIXME: Figure out exact amount of pages.
So you are rounding up 36 to 64 to account for these pages, right?
So we should either drop the FIXME and explain that this is fine, or
fix it - we cannot merge it like this.
Thanks,
Ard.
> */
> -# ifdef CONFIG_X86_VERBOSE_BOOTUP
> -# define BOOT_PGT_SIZE (19*4096)
> -# else /* !CONFIG_X86_VERBOSE_BOOTUP */
> -# define BOOT_PGT_SIZE (17*4096)
> -# endif
> -# else /* !CONFIG_RANDOMIZE_BASE */
> -# define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE
> -# endif
> +# define BOOT_PGT_SIZE (64*4096)
>
> #else /* !CONFIG_X86_64 */
> # define BOOT_STACK_SIZE 0x1000
> --
> 2.37.4
>
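For reference, the page count from the quoted comment can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate as written in the comment; boot_pgt_pages_estimate() and the enum names are hypothetical, and the extra pages needed for the UEFI memory map and ACPI table mappings are deliberately not modeled, since the comment itself leaves their number open.

```c
#include <assert.h>

enum {
	SKETCH_PGT_PAGE_BYTES = 4096, /* one page table is one 4 KiB page */
	SKETCH_BOOT_PGT_PAGES = 64,   /* value chosen in the patch */
};

/*
 * Estimate from the comment in boot.h, with S = number of
 * sections in the vmlinux ELF image.
 */
static unsigned int boot_pgt_pages_estimate(unsigned int sections)
{
	unsigned int level4 = 1;                      /* top level table */
	unsigned int param_cmdline = (3 + 3) * 2;     /* param and cmd_line */
	unsigned int kernel = (2 + 2 + sections) * 2; /* kernel and randomized kernel */
	unsigned int first_2m = 3;                    /* first 2M (video RAM) */

	assert(sections > 0);
	return level4 + param_cmdline + kernel + first_2m;
}
```

With S = 6 this gives 1 + 12 + 20 + 3 = 36 pages, so 64 pages (64 * 4096 = 262144 bytes) leave 28 pages for the mappings that the estimate does not cover.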
On Thu, 15 Dec 2022 at 13:38, Evgeniy Baskov <[email protected]> wrote:
>
> Current identity mapping code only supports 2M and 1G pages.
> 4KB pages are desirable for better memory protection granularity
> in compressed kernel code.
>
> Change identity mapping code to support 4KB pages and
> memory remapping with different attributes.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
This patch triggers an error reported by the build bots:
arch/x86/mm/ident_map.c:19:8: warning: no previous prototype for
'ident_split_large_pmd'
> ---
> arch/x86/include/asm/init.h | 1 +
> arch/x86/mm/ident_map.c | 185 +++++++++++++++++++++++++++++-------
> 2 files changed, 154 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 5f1d3c421f68..a8277ee82c51 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -8,6 +8,7 @@ struct x86_mapping_info {
> unsigned long page_flag; /* page flag for PMD or PUD entry */
> unsigned long offset; /* ident mapping offset */
> bool direct_gbpages; /* PUD level 1GB page support */
> + bool allow_4kpages; /* Allow more granular mappings with 4K pages */
> unsigned long kernpg_flag; /* kernel pagetable flag override */
> };
>
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 968d7005f4a7..662e794a325d 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -4,24 +4,127 @@
> * included by both the compressed kernel and the regular kernel.
> */
>
> -static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
> - unsigned long addr, unsigned long end)
> +static void ident_pte_init(struct x86_mapping_info *info, pte_t *pte_page,
> + unsigned long addr, unsigned long end,
> + unsigned long flags)
> {
> - addr &= PMD_MASK;
> - for (; addr < end; addr += PMD_SIZE) {
> + addr &= PAGE_MASK;
> + for (; addr < end; addr += PAGE_SIZE) {
> + pte_t *pte = pte_page + pte_index(addr);
> +
> + set_pte(pte, __pte((addr - info->offset) | flags));
> + }
> +}
> +
> +pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
> + pmd_t *pmdp, unsigned long page_addr)
> +{
> + unsigned long pmd_addr, page_flags;
> + pte_t *pte;
> +
> + pte = (pte_t *)info->alloc_pgt_page(info->context);
> + if (!pte)
> + return NULL;
> +
> + pmd_addr = page_addr & PMD_MASK;
> +
> + /* Not a large page - clear PSE flag */
> + page_flags = pmd_flags(*pmdp) & ~_PSE;
> + ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE, page_flags);
> +
> + return pte;
> +}
> +
> +static int ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
> + unsigned long addr, unsigned long end,
> + unsigned long flags)
> +{
> + unsigned long next;
> + bool new_table = 0;
> +
> + for (; addr < end; addr = next) {
> pmd_t *pmd = pmd_page + pmd_index(addr);
> + pte_t *pte;
>
> - if (pmd_present(*pmd))
> + next = (addr & PMD_MASK) + PMD_SIZE;
> + if (next > end)
> + next = end;
> +
> + /*
> + * Use 2M pages if 4k pages are not allowed or
> + * we are not mapping extra, i.e. address and size are aligned.
> + */
> +
> + if (!info->allow_4kpages ||
> + (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE)) {
> +
> + pmd_t pmdval;
> +
> + addr &= PMD_MASK;
> + pmdval = __pmd((addr - info->offset) | flags | _PSE);
> + set_pmd(pmd, pmdval);
> continue;
> + }
> +
> + /*
> + * If currently mapped page is large, we need to split it.
> + * The case when we can remap a 2M page to a 2M page
> + * with different flags is already covered above.
> + *
> + * If there's nothing mapped to desired address,
> + * we need to allocate new page table.
> + */
>
> - set_pmd(pmd, __pmd((addr - info->offset) | info->page_flag));
> + if (pmd_large(*pmd)) {
> + pte = ident_split_large_pmd(info, pmd, addr);
> + new_table = 1;
> + } else if (!pmd_present(*pmd)) {
> + pte = (pte_t *)info->alloc_pgt_page(info->context);
> + new_table = 1;
> + } else {
> + pte = pte_offset_kernel(pmd, 0);
> + new_table = 0;
> + }
> +
> + if (!pte)
> + return -ENOMEM;
> +
> + ident_pte_init(info, pte, addr, next, flags);
> +
> + if (new_table)
> + set_pmd(pmd, __pmd(__pa(pte) | info->kernpg_flag));
> }
> +
> + return 0;
> }
>
> +
> +pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
> + pud_t *pudp, unsigned long page_addr)
> +{
> + unsigned long pud_addr, page_flags;
> + pmd_t *pmd;
> +
> + pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> + if (!pmd)
> + return NULL;
> +
> + pud_addr = page_addr & PUD_MASK;
> +
> + /* Not a large page - clear PSE flag */
> + page_flags = pud_flags(*pudp) & ~_PSE;
> + ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE, page_flags);
> +
> + return pmd;
> +}
> +
> +
> static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> unsigned long addr, unsigned long end)
> {
> unsigned long next;
> + bool new_table = 0;
> + int result;
>
> for (; addr < end; addr = next) {
> pud_t *pud = pud_page + pud_index(addr);
> @@ -31,28 +134,39 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> if (next > end)
> next = end;
>
> + /* Use 1G pages only if forced, even if they are supported. */
> if (info->direct_gbpages) {
> pud_t pudval;
> -
> - if (pud_present(*pud))
> - continue;
> + unsigned long flags;
>
> addr &= PUD_MASK;
> - pudval = __pud((addr - info->offset) | info->page_flag);
> + flags = info->page_flag | _PSE;
> + pudval = __pud((addr - info->offset) | flags);
> +
> set_pud(pud, pudval);
> continue;
> }
>
> - if (pud_present(*pud)) {
> + if (pud_large(*pud)) {
> + pmd = ident_split_large_pud(info, pud, addr);
> + new_table = 1;
> + } else if (!pud_present(*pud)) {
> + pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> + new_table = 1;
> + } else {
> pmd = pmd_offset(pud, 0);
> - ident_pmd_init(info, pmd, addr, next);
> - continue;
> + new_table = 0;
> }
> - pmd = (pmd_t *)info->alloc_pgt_page(info->context);
> +
> if (!pmd)
> return -ENOMEM;
> - ident_pmd_init(info, pmd, addr, next);
> - set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
> +
> + result = ident_pmd_init(info, pmd, addr, next, info->page_flag);
> + if (result)
> + return result;
> +
> + if (new_table)
> + set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
> }
>
> return 0;
> @@ -63,6 +177,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
> {
> unsigned long next;
> int result;
> + bool new_table = 0;
>
> for (; addr < end; addr = next) {
> p4d_t *p4d = p4d_page + p4d_index(addr);
> @@ -72,15 +187,14 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
> if (next > end)
> next = end;
>
> - if (p4d_present(*p4d)) {
> + if (!p4d_present(*p4d)) {
> + pud = (pud_t *)info->alloc_pgt_page(info->context);
> + new_table = 1;
> + } else {
> pud = pud_offset(p4d, 0);
> - result = ident_pud_init(info, pud, addr, next);
> - if (result)
> - return result;
> -
> - continue;
> + new_table = 0;
> }
> - pud = (pud_t *)info->alloc_pgt_page(info->context);
> +
> if (!pud)
> return -ENOMEM;
>
> @@ -88,19 +202,22 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
> if (result)
> return result;
>
> - set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
> + if (new_table)
> + set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
> }
>
> return 0;
> }
>
> -int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> - unsigned long pstart, unsigned long pend)
> +int kernel_ident_mapping_init(struct x86_mapping_info *info,
> + pgd_t *pgd_page, unsigned long pstart,
> + unsigned long pend)
> {
> unsigned long addr = pstart + info->offset;
> unsigned long end = pend + info->offset;
> unsigned long next;
> int result;
> + bool new_table;
>
> /* Set the default pagetable flags if not supplied */
> if (!info->kernpg_flag)
> @@ -117,20 +234,24 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> if (next > end)
> next = end;
>
> - if (pgd_present(*pgd)) {
> + if (!pgd_present(*pgd)) {
> + p4d = (p4d_t *)info->alloc_pgt_page(info->context);
> + new_table = 1;
> + } else {
> p4d = p4d_offset(pgd, 0);
> - result = ident_p4d_init(info, p4d, addr, next);
> - if (result)
> - return result;
> - continue;
> + new_table = 0;
> }
>
> - p4d = (p4d_t *)info->alloc_pgt_page(info->context);
> if (!p4d)
> return -ENOMEM;
> +
> result = ident_p4d_init(info, p4d, addr, next);
> if (result)
> return result;
> +
> + if (!new_table)
> + continue;
> +
> if (pgtable_l5_enabled()) {
> set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
> } else {
> --
> 2.37.4
>
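The 2M-versus-4K decision made per PMD-sized chunk in the quoted ident_pmd_init() can be modeled in user space. The sketch below is an illustration under simplifying assumptions: count_mappings() and struct map_count are hypothetical names, it only counts the entries written for the requested range, and it does not model page table allocation or the splitting of already mapped large pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_4K 0x1000ULL
#define SKETCH_SZ_2M 0x200000ULL

/* Hypothetical result type: number of entries of each size. */
struct map_count {
	uint64_t pages_2m;
	uint64_t pages_4k;
};

static struct map_count count_mappings(uint64_t addr, uint64_t end,
				       bool allow_4k)
{
	struct map_count c = { 0, 0 };
	uint64_t next;

	assert(addr <= end);

	for (; addr < end; addr = next) {
		/* End of the current 2M chunk, clamped to the range end. */
		next = (addr & ~(SKETCH_SZ_2M - 1)) + SKETCH_SZ_2M;
		if (next > end)
			next = end;

		/*
		 * Use a 2M page if 4K pages are not allowed, or if the
		 * chunk is fully covered (aligned start, full size).
		 */
		if (!allow_4k ||
		    ((addr & (SKETCH_SZ_2M - 1)) == 0 &&
		     next == addr + SKETCH_SZ_2M)) {
			c.pages_2m++;
			continue;
		}

		/* Partial chunk: 4K pages from the page-aligned start. */
		c.pages_4k += (next - (addr & ~(SKETCH_SZ_4K - 1)) +
			       SKETCH_SZ_4K - 1) / SKETCH_SZ_4K;
	}

	return c;
}
```

Mapping [0x1ff000, 0x601000) with 4K pages allowed uses two 2M pages for the aligned middle and one 4K page at each end. With 4K pages disallowed, the same range is covered by four 2M pages, which maps more memory than requested.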
On 2023-03-08 12:42, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:38, Evgeniy Baskov <[email protected]> wrote:
>>
>> Current identity mapping code only supports 2M and 1G pages.
>> 4KB pages are desirable for better memory protection granularity
>> in compressed kernel code.
>>
>> Change identity mapping code to support 4KB pages and
>> memory remapping with different attributes.
>>
>> Tested-by: Mario Limonciello <[email protected]>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>
> This patch triggers an error reported by the build bots:
>
> arch/x86/mm/ident_map.c:19:8: warning: no previous prototype for
> 'ident_split_large_pmd'
Thanks! I'll fix them (and all of the others from the bot emails)
>
>
>> ---
>> arch/x86/include/asm/init.h | 1 +
>> arch/x86/mm/ident_map.c | 185
>> +++++++++++++++++++++++++++++-------
>> 2 files changed, 154 insertions(+), 32 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>> index 5f1d3c421f68..a8277ee82c51 100644
>> --- a/arch/x86/include/asm/init.h
>> +++ b/arch/x86/include/asm/init.h
>> @@ -8,6 +8,7 @@ struct x86_mapping_info {
>> unsigned long page_flag; /* page flag for PMD or PUD
>> entry */
>> unsigned long offset; /* ident mapping offset */
>> bool direct_gbpages; /* PUD level 1GB page support
>> */
>> + bool allow_4kpages; /* Allow more granular
>> mappings with 4K pages */
>> unsigned long kernpg_flag; /* kernel pagetable flag
>> override */
>> };
>>
>> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
>> index 968d7005f4a7..662e794a325d 100644
>> --- a/arch/x86/mm/ident_map.c
>> +++ b/arch/x86/mm/ident_map.c
>> @@ -4,24 +4,127 @@
>> * included by both the compressed kernel and the regular kernel.
>> */
>>
>> -static void ident_pmd_init(struct x86_mapping_info *info, pmd_t
>> *pmd_page,
>> - unsigned long addr, unsigned long end)
>> +static void ident_pte_init(struct x86_mapping_info *info, pte_t
>> *pte_page,
>> + unsigned long addr, unsigned long end,
>> + unsigned long flags)
>> {
>> - addr &= PMD_MASK;
>> - for (; addr < end; addr += PMD_SIZE) {
>> + addr &= PAGE_MASK;
>> + for (; addr < end; addr += PAGE_SIZE) {
>> + pte_t *pte = pte_page + pte_index(addr);
>> +
>> + set_pte(pte, __pte((addr - info->offset) | flags));
>> + }
>> +}
>> +
>> +pte_t *ident_split_large_pmd(struct x86_mapping_info *info,
>> + pmd_t *pmdp, unsigned long page_addr)
>> +{
>> + unsigned long pmd_addr, page_flags;
>> + pte_t *pte;
>> +
>> + pte = (pte_t *)info->alloc_pgt_page(info->context);
>> + if (!pte)
>> + return NULL;
>> +
>> + pmd_addr = page_addr & PMD_MASK;
>> +
>> + /* Not a large page - clear PSE flag */
>> + page_flags = pmd_flags(*pmdp) & ~_PSE;
>> + ident_pte_init(info, pte, pmd_addr, pmd_addr + PMD_SIZE,
>> page_flags);
>> +
>> + return pte;
>> +}
>> +
>> +static int ident_pmd_init(struct x86_mapping_info *info, pmd_t
>> *pmd_page,
>> + unsigned long addr, unsigned long end,
>> + unsigned long flags)
>> +{
>> + unsigned long next;
>> + bool new_table = 0;
>> +
>> + for (; addr < end; addr = next) {
>> pmd_t *pmd = pmd_page + pmd_index(addr);
>> + pte_t *pte;
>>
>> - if (pmd_present(*pmd))
>> + next = (addr & PMD_MASK) + PMD_SIZE;
>> + if (next > end)
>> + next = end;
>> +
>> + /*
>> + * Use 2M pages if 4k pages are not allowed or
>> + * we are not mapping extra, i.e. address and size are
>> aligned.
>> + */
>> +
>> + if (!info->allow_4kpages ||
>> + (!(addr & ~PMD_MASK) && next == addr + PMD_SIZE))
>> {
>> +
>> + pmd_t pmdval;
>> +
>> + addr &= PMD_MASK;
>> + pmdval = __pmd((addr - info->offset) | flags |
>> _PSE);
>> + set_pmd(pmd, pmdval);
>> continue;
>> + }
>> +
>> + /*
>> + * If currently mapped page is large, we need to split
>> it.
>> + * The case where we can remap a 2M page to a 2M page
>> + * with different flags is already covered above.
>> + *
>> + * If there's nothing mapped to desired address,
>> + * we need to allocate new page table.
>> + */
>>
>> - set_pmd(pmd, __pmd((addr - info->offset) |
>> info->page_flag));
>> + if (pmd_large(*pmd)) {
>> + pte = ident_split_large_pmd(info, pmd, addr);
>> + new_table = 1;
>> + } else if (!pmd_present(*pmd)) {
>> + pte = (pte_t
>> *)info->alloc_pgt_page(info->context);
>> + new_table = 1;
>> + } else {
>> + pte = pte_offset_kernel(pmd, 0);
>> + new_table = 0;
>> + }
>> +
>> + if (!pte)
>> + return -ENOMEM;
>> +
>> + ident_pte_init(info, pte, addr, next, flags);
>> +
>> + if (new_table)
>> + set_pmd(pmd, __pmd(__pa(pte) |
>> info->kernpg_flag));
>> }
>> +
>> + return 0;
>> }
>>
>> +
>> +pmd_t *ident_split_large_pud(struct x86_mapping_info *info,
>> + pud_t *pudp, unsigned long page_addr)
>> +{
>> + unsigned long pud_addr, page_flags;
>> + pmd_t *pmd;
>> +
>> + pmd = (pmd_t *)info->alloc_pgt_page(info->context);
>> + if (!pmd)
>> + return NULL;
>> +
>> + pud_addr = page_addr & PUD_MASK;
>> +
>> + /* Not a large page - clear PSE flag */
>> + page_flags = pud_flags(*pudp) & ~_PSE;
>> + ident_pmd_init(info, pmd, pud_addr, pud_addr + PUD_SIZE,
>> page_flags);
>> +
>> + return pmd;
>> +}
>> +
>> +
>> static int ident_pud_init(struct x86_mapping_info *info, pud_t
>> *pud_page,
>> unsigned long addr, unsigned long end)
>> {
>> unsigned long next;
>> + bool new_table = 0;
>> + int result;
>>
>> for (; addr < end; addr = next) {
>> pud_t *pud = pud_page + pud_index(addr);
>> @@ -31,28 +134,39 @@ static int ident_pud_init(struct x86_mapping_info
>> *info, pud_t *pud_page,
>> if (next > end)
>> next = end;
>>
>> + /* Use 1G pages only if forced, even if they are
>> supported. */
>> if (info->direct_gbpages) {
>> pud_t pudval;
>> -
>> - if (pud_present(*pud))
>> - continue;
>> + unsigned long flags;
>>
>> addr &= PUD_MASK;
>> - pudval = __pud((addr - info->offset) |
>> info->page_flag);
>> + flags = info->page_flag | _PSE;
>> + pudval = __pud((addr - info->offset) | flags);
>> +
>> set_pud(pud, pudval);
>> continue;
>> }
>>
>> - if (pud_present(*pud)) {
>> + if (pud_large(*pud)) {
>> + pmd = ident_split_large_pud(info, pud, addr);
>> + new_table = 1;
>> + } else if (!pud_present(*pud)) {
>> + pmd = (pmd_t
>> *)info->alloc_pgt_page(info->context);
>> + new_table = 1;
>> + } else {
>> pmd = pmd_offset(pud, 0);
>> - ident_pmd_init(info, pmd, addr, next);
>> - continue;
>> + new_table = 0;
>> }
>> - pmd = (pmd_t *)info->alloc_pgt_page(info->context);
>> +
>> if (!pmd)
>> return -ENOMEM;
>> - ident_pmd_init(info, pmd, addr, next);
>> - set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
>> +
>> + result = ident_pmd_init(info, pmd, addr, next,
>> info->page_flag);
>> + if (result)
>> + return result;
>> +
>> + if (new_table)
>> + set_pud(pud, __pud(__pa(pmd) |
>> info->kernpg_flag));
>> }
>>
>> return 0;
>> @@ -63,6 +177,7 @@ static int ident_p4d_init(struct x86_mapping_info
>> *info, p4d_t *p4d_page,
>> {
>> unsigned long next;
>> int result;
>> + bool new_table = 0;
>>
>> for (; addr < end; addr = next) {
>> p4d_t *p4d = p4d_page + p4d_index(addr);
>> @@ -72,15 +187,14 @@ static int ident_p4d_init(struct x86_mapping_info
>> *info, p4d_t *p4d_page,
>> if (next > end)
>> next = end;
>>
>> - if (p4d_present(*p4d)) {
>> + if (!p4d_present(*p4d)) {
>> + pud = (pud_t
>> *)info->alloc_pgt_page(info->context);
>> + new_table = 1;
>> + } else {
>> pud = pud_offset(p4d, 0);
>> - result = ident_pud_init(info, pud, addr,
>> next);
>> - if (result)
>> - return result;
>> -
>> - continue;
>> + new_table = 0;
>> }
>> - pud = (pud_t *)info->alloc_pgt_page(info->context);
>> +
>> if (!pud)
>> return -ENOMEM;
>>
>> @@ -88,19 +202,22 @@ static int ident_p4d_init(struct x86_mapping_info
>> *info, p4d_t *p4d_page,
>> if (result)
>> return result;
>>
>> - set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
>> + if (new_table)
>> + set_p4d(p4d, __p4d(__pa(pud) |
>> info->kernpg_flag));
>> }
>>
>> return 0;
>> }
>>
>> -int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t
>> *pgd_page,
>> - unsigned long pstart, unsigned long
>> pend)
>> +int kernel_ident_mapping_init(struct x86_mapping_info *info,
>> + pgd_t *pgd_page, unsigned long pstart,
>> + unsigned long pend)
>> {
>> unsigned long addr = pstart + info->offset;
>> unsigned long end = pend + info->offset;
>> unsigned long next;
>> int result;
>> + bool new_table;
>>
>> /* Set the default pagetable flags if not supplied */
>> if (!info->kernpg_flag)
>> @@ -117,20 +234,24 @@ int kernel_ident_mapping_init(struct
>> x86_mapping_info *info, pgd_t *pgd_page,
>> if (next > end)
>> next = end;
>>
>> - if (pgd_present(*pgd)) {
>> + if (!pgd_present(*pgd)) {
>> + p4d = (p4d_t
>> *)info->alloc_pgt_page(info->context);
>> + new_table = 1;
>> + } else {
>> p4d = p4d_offset(pgd, 0);
>> - result = ident_p4d_init(info, p4d, addr,
>> next);
>> - if (result)
>> - return result;
>> - continue;
>> + new_table = 0;
>> }
>>
>> - p4d = (p4d_t *)info->alloc_pgt_page(info->context);
>> if (!p4d)
>> return -ENOMEM;
>> +
>> result = ident_p4d_init(info, p4d, addr, next);
>> if (result)
>> return result;
>> +
>> + if (!new_table)
>> + continue;
>> +
>> if (pgtable_l5_enabled()) {
>> set_pgd(pgd, __pgd(__pa(p4d) |
>> info->kernpg_flag));
>> } else {
>> --
>> 2.37.4
>>
On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
>
> Use newer C standard. Since kernel requires C99 compiler now,
> we can make use of the new features to make the code more readable.
>
> Use mmap() for reading files also to make things simpler.
>
> Replace most magic numbers with defines.
>
> Should have no functional changes. This is done in preparation for the
> next changes that make generated PE header more spec compliant.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
> ---
> arch/x86/boot/tools/build.c | 387 +++++++++++++++++++++++-------------
> 1 file changed, 245 insertions(+), 142 deletions(-)
>
> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
> index bd247692b701..fbc5315af032 100644
> --- a/arch/x86/boot/tools/build.c
> +++ b/arch/x86/boot/tools/build.c
> @@ -25,20 +25,21 @@
> * Substantially overhauled by H. Peter Anvin, April 2007
> */
>
> +#include <fcntl.h>
> +#include <stdarg.h>
> +#include <stdint.h>
> #include <stdio.h>
> -#include <string.h>
> #include <stdlib.h>
> -#include <stdarg.h>
> -#include <sys/types.h>
> +#include <string.h>
> +#include <sys/mman.h>
> #include <sys/stat.h>
> +#include <sys/types.h>
> #include <unistd.h>
> -#include <fcntl.h>
> -#include <sys/mman.h>
> +
> #include <tools/le_byteshift.h>
> +#include <linux/pe.h>
>
> -typedef unsigned char u8;
> -typedef unsigned short u16;
> -typedef unsigned int u32;
> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
>
> #define DEFAULT_MAJOR_ROOT 0
> #define DEFAULT_MINOR_ROOT 0
> @@ -48,8 +49,13 @@ typedef unsigned int u32;
> #define SETUP_SECT_MIN 5
> #define SETUP_SECT_MAX 64
>
> +#define PARAGRAPH_SIZE 16
> +#define SECTOR_SIZE 512
> +#define FILE_ALIGNMENT 512
> +#define SECTION_ALIGNMENT 4096
> +
> /* This must be large enough to hold the entire setup */
> -u8 buf[SETUP_SECT_MAX*512];
> +uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
>
> #define PECOFF_RELOC_RESERVE 0x20
>
> @@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
> #define PECOFF_COMPAT_RESERVE 0x0
> #endif
>
> +#define RELOC_SECTION_SIZE 10
> +
> +/* PE header has different format depending on the architecture */
> +#ifdef CONFIG_X86_64
> +typedef struct pe32plus_opt_hdr pe_opt_hdr;
> +#else
> +typedef struct pe32_opt_hdr pe_opt_hdr;
> +#endif
> +
> +static inline struct pe_hdr *get_pe_header(uint8_t *buf)
> +{
> + uint32_t pe_offset = get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
> + return (struct pe_hdr *)(buf + pe_offset);
> +}
> +
> +static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
> +{
> + return (pe_opt_hdr *)(get_pe_header(buf) + 1);
> +}
> +
> +static inline struct section_header *get_sections(uint8_t *buf)
> +{
> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> + uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
> + uint8_t *sections = (uint8_t *)(hdr + 1) + n_data_dirs*sizeof(struct data_dirent);
> + return (struct section_header *)sections;
> +}
> +
> +static inline struct data_directory *get_data_dirs(uint8_t *buf)
> +{
> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> + return (struct data_directory *)(hdr + 1);
> +}
> +
> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
Can we drop this conditional?
> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
Please drop the alignment flags - they don't apply to executables, only
to object files.
> +#else
> +/* With memory protection disabled all sections are RWX */
> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
> + IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> +#define SCN_RX SCN_RW
> +#define SCN_RO SCN_RW
> +#endif
> +
> static unsigned long efi32_stub_entry;
> static unsigned long efi64_stub_entry;
> static unsigned long efi_pe_entry;
> @@ -70,7 +122,7 @@ static unsigned long _end;
>
> /*----------------------------------------------------------------------*/
>
> -static const u32 crctab32[] = {
> +static const uint32_t crctab32[] = {
Replacing all the type names makes this patch very messy. Can we back
that out please?
> 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
> 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
> 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
> @@ -125,12 +177,12 @@ static const u32 crctab32[] = {
> 0x2d02ef8d
> };
>
> -static u32 partial_crc32_one(u8 c, u32 crc)
> +static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
> {
> return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
> }
>
> -static u32 partial_crc32(const u8 *s, int len, u32 crc)
> +static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t crc)
> {
> while (len--)
> crc = partial_crc32_one(*s++, crc);
> @@ -152,57 +204,106 @@ static void usage(void)
> die("Usage: build setup system zoffset.h image");
> }
>
> +static void *map_file(const char *path, size_t *psize)
> +{
> + struct stat statbuf;
> + size_t size;
> + void *addr;
> + int fd;
> +
> + fd = open(path, O_RDONLY);
> + if (fd < 0)
> + die("Unable to open `%s': %m", path);
> + if (fstat(fd, &statbuf))
> + die("Unable to stat `%s': %m", path);
> +
> + size = statbuf.st_size;
> + /*
> + * Map one byte more, to allow adding null-terminator
> + * for text files.
> + */
> + addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> + if (addr == MAP_FAILED)
> + die("Unable to mmap '%s': %m", path);
> +
> + close(fd);
> +
> + *psize = size;
> + return addr;
> +}
> +
> +static void unmap_file(void *addr, size_t size)
> +{
> + munmap(addr, size + 1);
> +}
> +
> +static void *map_output_file(const char *path, size_t size)
> +{
> + void *addr;
> + int fd;
> +
> + fd = open(path, O_RDWR | O_CREAT, 0660);
> + if (fd < 0)
> + die("Unable to create `%s': %m", path);
> +
> + if (ftruncate(fd, size))
> + die("Unable to resize `%s': %m", path);
> +
> + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> + if (addr == MAP_FAILED)
> + die("Unable to mmap '%s': %m", path);
> +
> + return addr;
> +}
> +
> #ifdef CONFIG_EFI_STUB
>
> -static void update_pecoff_section_header_fields(char *section_name, u32 vma, u32 size, u32 datasz, u32 offset)
> +static void update_pecoff_section_header_fields(char *section_name, uint32_t vma,
> + uint32_t size, uint32_t datasz,
> + uint32_t offset)
> {
> unsigned int pe_header;
> unsigned short num_sections;
> - u8 *section;
> + struct section_header *section;
>
> - pe_header = get_unaligned_le32(&buf[0x3c]);
> - num_sections = get_unaligned_le16(&buf[pe_header + 6]);
> -
> -#ifdef CONFIG_X86_32
> - section = &buf[pe_header + 0xa8];
> -#else
> - section = &buf[pe_header + 0xb8];
> -#endif
> + struct pe_hdr *hdr = get_pe_header(buf);
> + num_sections = get_unaligned_le16(&hdr->sections);
> + section = get_sections(buf);
>
> while (num_sections > 0) {
> - if (strncmp((char*)section, section_name, 8) == 0) {
> + if (strncmp(section->name, section_name, 8) == 0) {
> /* section header size field */
> - put_unaligned_le32(size, section + 0x8);
> + put_unaligned_le32(size, &section->virtual_size);
>
> /* section header vma field */
> - put_unaligned_le32(vma, section + 0xc);
> + put_unaligned_le32(vma, &section->virtual_address);
>
> /* section header 'size of initialised data' field */
> - put_unaligned_le32(datasz, section + 0x10);
> + put_unaligned_le32(datasz, &section->raw_data_size);
>
> /* section header 'file offset' field */
> - put_unaligned_le32(offset, section + 0x14);
> + put_unaligned_le32(offset, &section->data_addr);
>
> break;
> }
> - section += 0x28;
> + section++;
> num_sections--;
> }
> }
>
> -static void update_pecoff_section_header(char *section_name, u32 offset, u32 size)
> +static void update_pecoff_section_header(char *section_name, uint32_t offset, uint32_t size)
> {
> update_pecoff_section_header_fields(section_name, offset, size, size, offset);
> }
>
> static void update_pecoff_setup_and_reloc(unsigned int size)
> {
> - u32 setup_offset = 0x200;
> - u32 reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
> + uint32_t setup_offset = SECTOR_SIZE;
> + uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE - PECOFF_COMPAT_RESERVE;
> #ifdef CONFIG_EFI_MIXED
> - u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> + uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> #endif
> - u32 setup_size = reloc_offset - setup_offset;
> + uint32_t setup_size = reloc_offset - setup_offset;
>
> update_pecoff_section_header(".setup", setup_offset, setup_size);
> update_pecoff_section_header(".reloc", reloc_offset, PECOFF_RELOC_RESERVE);
> @@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
> * Modify .reloc section contents with a single entry. The
> * relocation is applied to offset 10 of the relocation section.
> */
> - put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
> - put_unaligned_le32(10, &buf[reloc_offset + 4]);
> + put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE, &buf[reloc_offset]);
> + put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset + 4]);
>
> #ifdef CONFIG_EFI_MIXED
> update_pecoff_section_header(".compat", compat_offset, PECOFF_COMPAT_RESERVE);
> @@ -224,19 +325,17 @@ static void update_pecoff_setup_and_reloc(unsigned int size)
> */
> buf[compat_offset] = 0x1;
> buf[compat_offset + 1] = 0x8;
> - put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
> + put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset + 2]);
> put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset + 4]);
> #endif
> }
>
> -static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
> +static unsigned int update_pecoff_sections(unsigned int text_start, unsigned int text_sz,
> unsigned int init_sz)
> {
> - unsigned int pe_header;
> - unsigned int text_sz = file_sz - text_start;
> + unsigned int file_sz = text_start + text_sz;
> unsigned int bss_sz = init_sz - file_sz;
> -
> - pe_header = get_unaligned_le32(&buf[0x3c]);
> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>
> /*
> * The PE/COFF loader may load the image at an address which is
> @@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int text_start, unsigned int file_sz,
> * Size of code: Subtract the size of the first sector (512 bytes)
> * which includes the header.
> */
> - put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header + 0x1c]);
> + put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz, &hdr->text_size);
>
> /* Size of image */
> - put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
> + put_unaligned_le32(init_sz, &hdr->image_size);
>
> /*
> * Address of entry point for PE/COFF executable
> */
> - put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header + 0x28]);
> + put_unaligned_le32(text_start + efi_pe_entry, &hdr->entry_point);
>
> update_pecoff_section_header_fields(".text", text_start, text_sz + bss_sz,
> text_sz, text_start);
> +
> + return text_start + file_sz;
> }
>
> static int reserve_pecoff_reloc_section(int c)
> @@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
> return PECOFF_RELOC_RESERVE;
> }
>
> -static void efi_stub_defaults(void)
> +static void efi_stub_update_defaults(void)
> {
> /* Defaults for old kernel */
> #ifdef CONFIG_X86_32
> @@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
>
> #ifdef CONFIG_EFI_MIXED
> if (efi32_stub_entry != addr)
> - die("32-bit and 64-bit EFI entry points do not match\n");
> + die("32-bit and 64-bit EFI entry points do not match");
> #endif
> #endif
> put_unaligned_le32(addr, &buf[0x264]);
> @@ -310,7 +411,7 @@ static inline void update_pecoff_setup_and_reloc(unsigned int size) {}
> static inline void update_pecoff_text(unsigned int text_start,
> unsigned int file_sz,
> unsigned int init_sz) {}
> -static inline void efi_stub_defaults(void) {}
> +static inline void efi_stub_update_defaults(void) {}
> static inline void efi_stub_entry_update(void) {}
>
> static inline int reserve_pecoff_reloc_section(int c)
> @@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
>
> static void parse_zoffset(char *fname)
> {
> - FILE *file;
> - char *p;
> - int c;
> + size_t size;
> + char *data, *p;
>
> - file = fopen(fname, "r");
> - if (!file)
> - die("Unable to open `%s': %m", fname);
> - c = fread(buf, 1, sizeof(buf) - 1, file);
> - if (ferror(file))
> - die("read-error on `zoffset.h'");
> - fclose(file);
> - buf[c] = 0;
> + data = map_file(fname, &size);
>
> - p = (char *)buf;
> + /* We can do that, since we mapped one byte more */
> + data[size] = 0;
> +
> + p = (char *)data;
>
> while (p && *p) {
> PARSE_ZOFS(p, efi32_stub_entry);
> @@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
> while (p && (*p == '\r' || *p == '\n'))
> p++;
> }
> +
> + unmap_file(data, size);
> }
>
> -int main(int argc, char ** argv)
> +static unsigned int read_setup(char *path)
> {
> - unsigned int i, sz, setup_sectors, init_sz;
> - int c;
> - u32 sys_size;
> - struct stat sb;
> - FILE *file, *dest;
> - int fd;
> - void *kernel;
> - u32 crc = 0xffffffffUL;
> -
> - efi_stub_defaults();
> -
> - if (argc != 5)
> - usage();
> - parse_zoffset(argv[3]);
> -
> - dest = fopen(argv[4], "w");
> - if (!dest)
> - die("Unable to write `%s': %m", argv[4]);
> + FILE *file;
> + unsigned int setup_size, file_size;
>
> /* Copy the setup code */
> - file = fopen(argv[1], "r");
> + file = fopen(path, "r");
> if (!file)
> - die("Unable to open `%s': %m", argv[1]);
> - c = fread(buf, 1, sizeof(buf), file);
> + die("Unable to open `%s': %m", path);
> +
> + file_size = fread(buf, 1, sizeof(buf), file);
> if (ferror(file))
> die("read-error on `setup'");
> - if (c < 1024)
> +
> + if (file_size < 2 * SECTOR_SIZE)
> die("The setup must be at least 1024 bytes");
> - if (get_unaligned_le16(&buf[510]) != 0xAA55)
> +
> + if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
> die("Boot block hasn't got boot flag (0xAA55)");
> +
> fclose(file);
>
> - c += reserve_pecoff_compat_section(c);
> - c += reserve_pecoff_reloc_section(c);
> + /* Reserve space for PE sections */
> + file_size += reserve_pecoff_compat_section(file_size);
> + file_size += reserve_pecoff_reloc_section(file_size);
>
> /* Pad unused space with zeros */
> - setup_sectors = (c + 511) / 512;
> - if (setup_sectors < SETUP_SECT_MIN)
> - setup_sectors = SETUP_SECT_MIN;
> - i = setup_sectors*512;
> - memset(buf+c, 0, i-c);
>
> - update_pecoff_setup_and_reloc(i);
> + setup_size = round_up(file_size, SECTOR_SIZE);
> +
> + if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
> + setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
> +
> + /*
> + * Global buffer is already initialised
> + * to 0, but just in case, zero out padding.
> + */
> +
> + memset(buf + file_size, 0, setup_size - file_size);
> +
> + return setup_size;
> +}
> +
> +int main(int argc, char **argv)
> +{
> + size_t kern_file_size;
> + unsigned int setup_size;
> + unsigned int setup_sectors;
> + unsigned int init_size;
> + unsigned int total_size;
> + unsigned int kern_size;
> + void *kernel;
> + uint32_t crc = 0xffffffffUL;
> + uint8_t *output;
> +
> + if (argc != 5)
> + usage();
> +
> + efi_stub_update_defaults();
> + parse_zoffset(argv[3]);
> +
> + setup_size = read_setup(argv[1]);
> +
> + setup_sectors = setup_size/SECTOR_SIZE;
>
> /* Set the default root device */
> put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
>
> - /* Open and stat the kernel file */
> - fd = open(argv[2], O_RDONLY);
> - if (fd < 0)
> - die("Unable to open `%s': %m", argv[2]);
> - if (fstat(fd, &sb))
> - die("Unable to stat `%s': %m", argv[2]);
> - sz = sb.st_size;
> - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
> - if (kernel == MAP_FAILED)
> - die("Unable to mmap '%s': %m", argv[2]);
> - /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
> - sys_size = (sz + 15 + 4) / 16;
> + /* Map kernel file to memory */
> + kernel = map_file(argv[2], &kern_file_size);
> +
> #ifdef CONFIG_EFI_STUB
> - /*
> - * COFF requires minimum 32-byte alignment of sections, and
> - * adding a signature is problematic without that alignment.
> - */
> - sys_size = (sys_size + 1) & ~1;
> + /* PE specification requires 512-byte minimum section file alignment */
> + kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
> + update_pecoff_setup_and_reloc(setup_size);
> +#else
> + /* Number of 16-byte paragraphs, including space for a 4-byte CRC */
> + kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
> #endif
>
> /* Patch the setup code with the appropriate size parameters */
> - buf[0x1f1] = setup_sectors-1;
> - put_unaligned_le32(sys_size, &buf[0x1f4]);
> + buf[0x1f1] = setup_sectors - 1;
> + put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
> +
> + /* Update kernel_info offset. */
> + put_unaligned_le32(kernel_info, &buf[0x268]);
> +
> + init_size = get_unaligned_le32(&buf[0x260]);
>
> - init_sz = get_unaligned_le32(&buf[0x260]);
> #ifdef CONFIG_EFI_STUB
> /*
> * The decompression buffer will start at ImageBase. When relocating
> @@ -458,45 +571,35 @@ int main(int argc, char ** argv)
> * For future-proofing, increase init_sz if necessary.
> */
>
> - if (init_sz - _end < i + _ehead) {
> - init_sz = (i + _ehead + _end + 4095) & ~4095;
> - put_unaligned_le32(init_sz, &buf[0x260]);
> + if (init_size - _end < setup_size + _ehead) {
> + init_size = round_up(setup_size + _ehead + _end, SECTION_ALIGNMENT);
> + put_unaligned_le32(init_size, &buf[0x260]);
> }
> -#endif
> - update_pecoff_text(setup_sectors * 512, i + (sys_size * 16), init_sz);
>
> - efi_stub_entry_update();
> -
> - /* Update kernel_info offset. */
> - put_unaligned_le32(kernel_info, &buf[0x268]);
> + total_size = update_pecoff_sections(setup_size, kern_size, init_size);
>
> - crc = partial_crc32(buf, i, crc);
> - if (fwrite(buf, 1, i, dest) != i)
> - die("Writing setup failed");
> + efi_stub_entry_update();
> +#else
> + (void)init_size;
> + total_size = setup_size + kern_size;
> +#endif
>
> - /* Copy the kernel code */
> - crc = partial_crc32(kernel, sz, crc);
> - if (fwrite(kernel, 1, sz, dest) != sz)
> - die("Writing kernel failed");
> + output = map_output_file(argv[4], total_size);
>
> - /* Add padding leaving 4 bytes for the checksum */
> - while (sz++ < (sys_size*16) - 4) {
> - crc = partial_crc32_one('\0', crc);
> - if (fwrite("\0", 1, 1, dest) != 1)
> - die("Writing padding failed");
> - }
> + memcpy(output, buf, setup_size);
> + memcpy(output + setup_size, kernel, kern_file_size);
> + memset(output + setup_size + kern_file_size, 0, kern_size - kern_file_size);
>
> - /* Write the CRC */
> - put_unaligned_le32(crc, buf);
> - if (fwrite(buf, 1, 4, dest) != 4)
> - die("Writing CRC failed");
> + /* Calculate and write kernel checksum. */
> + crc = partial_crc32(output, total_size - 4, crc);
> + put_unaligned_le32(crc, &output[total_size - 4]);
>
> - /* Catch any delayed write failures */
> - if (fclose(dest))
> - die("Writing image failed");
> + /* Catch any delayed write failures. */
> + if (munmap(output, total_size) < 0)
> + die("Writing kernel failed");
>
> - close(fd);
> + unmap_file(kernel, kern_file_size);
>
> - /* Everything is OK */
> + /* Everything is OK. */
> return 0;
> }
> --
> 2.37.4
>
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> Doing it that way allows setting up stricter memory attributes,
> simplifies boot code path and removes potential relocation
> of kernel image.
>
> Wire up required interfaces and minimally initialize zero page
> fields needed for it to function correctly.
>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
> ---
> arch/x86/boot/compressed/head_32.S | 50 ++++-
> arch/x86/boot/compressed/head_64.S | 58 ++++-
> drivers/firmware/efi/Kconfig | 2 +
> drivers/firmware/efi/libstub/Makefile | 2 +-
> .../firmware/efi/libstub/x86-extract-direct.c | 208 ++++++++++++++++++
> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
> 7 files changed, 338 insertions(+), 115 deletions(-)
> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index ead6007df1e5..0be75e5072ae 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
>
> #ifdef CONFIG_EFI_STUB
> SYM_FUNC_START(efi32_stub_entry)
> +/*
> + * Calculate the delta between where we were compiled to run
> + * at and where we were actually loaded at. This can only be done
> + * with a short local call on x86. Nothing else will tell us what
> + * address we are running at. The reserved chunk of the real-mode
> + * data at 0x1e4 (defined as a scratch field) are used as the stack
> + * for this calculation. Only 4 bytes are needed.
> + */
Please drop this comment
> + call 1f
> +1: popl %ebx
> + addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
Please drop this and ...
> +
> + /* Clear BSS */
> + xorl %eax, %eax
> + leal _bss@GOTOFF(%ebx), %edi
> + leal _ebss@GOTOFF(%ebx), %ecx
just use (_bss - 1b) here (etc)
> + subl %edi, %ecx
> + shrl $2, %ecx
> + rep stosl
> +
> add $0x4, %esp
> movl 8(%esp), %esi /* save boot_params pointer */
> + movl %edx, %edi /* save GOT address */
What does this do?
> call efi_main
> - /* efi_main returns the possibly relocated address of startup_32 */
> - jmp *%eax
> + movl %eax, %ecx
> +
> + /*
> + * efi_main returns the possibly
> + * relocated address of extracted kernel entry point.
> + */
> +
> + cli
> +
> + /* Load new GDT */
> + leal gdt@GOTOFF(%ebx), %eax
> + movl %eax, 2(%eax)
> + lgdt (%eax)
> +
> + /* Load segment registers with our descriptors */
> + movl $__BOOT_DS, %eax
> + movl %eax, %ds
> + movl %eax, %es
> + movl %eax, %fs
> + movl %eax, %gs
> + movl %eax, %ss
> +
> + /* Zero EFLAGS */
> + pushl $0
> + popfl
> +
> + jmp *%ecx
> SYM_FUNC_END(efi32_stub_entry)
> SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
> #endif
...
On 2023-03-09 18:57, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
>>
>> Use newer C standard. Since kernel requires C99 compiler now,
>> we can make use of the new features to make the code more readable.
>>
>> Use mmap() for reading files also to make things simpler.
>>
>> Replace most magic numbers with defines.
>>
>> Should have no functional changes. This is done in preparation for the
>> next changes that make generated PE header more spec compliant.
>>
>> Tested-by: Mario Limonciello <[email protected]>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>> ---
>> arch/x86/boot/tools/build.c | 387
>> +++++++++++++++++++++++-------------
>> 1 file changed, 245 insertions(+), 142 deletions(-)
>>
>> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
>> index bd247692b701..fbc5315af032 100644
>> --- a/arch/x86/boot/tools/build.c
>> +++ b/arch/x86/boot/tools/build.c
>> @@ -25,20 +25,21 @@
>> * Substantially overhauled by H. Peter Anvin, April 2007
>> */
>>
>> +#include <fcntl.h>
>> +#include <stdarg.h>
>> +#include <stdint.h>
>> #include <stdio.h>
>> -#include <string.h>
>> #include <stdlib.h>
>> -#include <stdarg.h>
>> -#include <sys/types.h>
>> +#include <string.h>
>> +#include <sys/mman.h>
>> #include <sys/stat.h>
>> +#include <sys/types.h>
>> #include <unistd.h>
>> -#include <fcntl.h>
>> -#include <sys/mman.h>
>> +
>> #include <tools/le_byteshift.h>
>> +#include <linux/pe.h>
>>
>> -typedef unsigned char u8;
>> -typedef unsigned short u16;
>> -typedef unsigned int u32;
>> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
>>
>> #define DEFAULT_MAJOR_ROOT 0
>> #define DEFAULT_MINOR_ROOT 0
>> @@ -48,8 +49,13 @@ typedef unsigned int u32;
>> #define SETUP_SECT_MIN 5
>> #define SETUP_SECT_MAX 64
>>
>> +#define PARAGRAPH_SIZE 16
>> +#define SECTOR_SIZE 512
>> +#define FILE_ALIGNMENT 512
>> +#define SECTION_ALIGNMENT 4096
>> +
>> /* This must be large enough to hold the entire setup */
>> -u8 buf[SETUP_SECT_MAX*512];
>> +uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
>>
>> #define PECOFF_RELOC_RESERVE 0x20
>>
>> @@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
>> #define PECOFF_COMPAT_RESERVE 0x0
>> #endif
>>
>> +#define RELOC_SECTION_SIZE 10
>> +
>> +/* PE header has different format depending on the architecture */
>> +#ifdef CONFIG_X86_64
>> +typedef struct pe32plus_opt_hdr pe_opt_hdr;
>> +#else
>> +typedef struct pe32_opt_hdr pe_opt_hdr;
>> +#endif
>> +
>> +static inline struct pe_hdr *get_pe_header(uint8_t *buf)
>> +{
>> + uint32_t pe_offset =
>> get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
>> + return (struct pe_hdr *)(buf + pe_offset);
>> +}
>> +
>> +static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
>> +{
>> + return (pe_opt_hdr *)(get_pe_header(buf) + 1);
>> +}
>> +
>> +static inline struct section_header *get_sections(uint8_t *buf)
>> +{
>> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>> + uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
>> + uint8_t *sections = (uint8_t *)(hdr + 1) +
>> n_data_dirs*sizeof(struct data_dirent);
>> + return (struct section_header *)sections;
>> +}
>> +
>> +static inline struct data_directory *get_data_dirs(uint8_t *buf)
>> +{
>> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>> + return (struct data_directory *)(hdr + 1);
>> +}
>> +
>> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
>
> Can we drop this conditional?
Without CONFIG_EFI_DXE_MEM_ATTRIBUTES, memory attributes are not
applied anywhere, so this would break 'nokaslr' on UEFI
implementations that honor section attributes.
KASLR is already broken without that option on implementations
that disallow execution from free memory, though. But unlike
free memory, sections are more likely to get protected, I think.
>> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE |
>> IMAGE_SCN_ALIGN_4096BYTES)
>> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE |
>> IMAGE_SCN_ALIGN_4096BYTES)
>> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
>
> Please drop the alignment flags - they don't apply to executable only
> object files.
Got it, will remove them in v5.
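For illustration, a minimal standalone sketch of how the section
attribute macros might look once the alignment flags are dropped. The
flag values are the ones from the PE/COFF specification.
STRICT_ATTRIBUTES is a hypothetical stand-in for
CONFIG_EFI_DXE_MEM_ATTRIBUTES, and violates_wx() and check_wx_policy()
are helpers invented for this example; none of this is the actual v5
code.

```c
#include <assert.h>
#include <stdint.h>

/* Section characteristics, values from the PE/COFF specification */
#define IMAGE_SCN_MEM_EXECUTE	0x20000000u
#define IMAGE_SCN_MEM_READ	0x40000000u
#define IMAGE_SCN_MEM_WRITE	0x80000000u

/* Hypothetical stand-in for CONFIG_EFI_DXE_MEM_ATTRIBUTES */
#ifndef STRICT_ATTRIBUTES
#define STRICT_ATTRIBUTES 1
#endif

#if STRICT_ATTRIBUTES
#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)
#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE)
#define SCN_RO (IMAGE_SCN_MEM_READ)
#else
/* With memory protection disabled all sections are RWX */
#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
		IMAGE_SCN_MEM_EXECUTE)
#define SCN_RX SCN_RW
#define SCN_RO SCN_RW
#endif

/* Returns 1 if the flags mark a section both writable and executable */
static inline int violates_wx(uint32_t flags)
{
	return (flags & IMAGE_SCN_MEM_WRITE) &&
	       (flags & IMAGE_SCN_MEM_EXECUTE);
}

/* Abort if the strict configuration ever yields a W+X section class */
void check_wx_policy(void)
{
#if STRICT_ATTRIBUTES
	assert(!violates_wx(SCN_RW));
	assert(!violates_wx(SCN_RX));
	assert(!violates_wx(SCN_RO));
#endif
}
```

With STRICT_ATTRIBUTES set to 1 none of the three classes is W+X; with
it set to 0 all three are, which corresponds to the RWX fallback in the
#else branch quoted below.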
>
>> +#else
>> +/* With memory protection disabled all sections are RWX */
>> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
>> + IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
>> +#define SCN_RX SCN_RW
>> +#define SCN_RO SCN_RW
>> +#endif
>> +
>> static unsigned long efi32_stub_entry;
>> static unsigned long efi64_stub_entry;
>> static unsigned long efi_pe_entry;
>> @@ -70,7 +122,7 @@ static unsigned long _end;
>>
>>
>> /*----------------------------------------------------------------------*/
>>
>> -static const u32 crctab32[] = {
>> +static const uint32_t crctab32[] = {
>
> Replacing all the type names makes this patch very messy. Can we back
> that out please?
Ok, I will revert them.
>
>> 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
>> 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
>> 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
>> @@ -125,12 +177,12 @@ static const u32 crctab32[] = {
>> 0x2d02ef8d
>> };
>>
>> -static u32 partial_crc32_one(u8 c, u32 crc)
>> +static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
>> {
>> return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
>> }
>>
>> -static u32 partial_crc32(const u8 *s, int len, u32 crc)
>> +static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t
>> crc)
>> {
>> while (len--)
>> crc = partial_crc32_one(*s++, crc);
>> @@ -152,57 +204,106 @@ static void usage(void)
>> die("Usage: build setup system zoffset.h image");
>> }
>>
>> +static void *map_file(const char *path, size_t *psize)
>> +{
>> + struct stat statbuf;
>> + size_t size;
>> + void *addr;
>> + int fd;
>> +
>> + fd = open(path, O_RDONLY);
>> + if (fd < 0)
>> + die("Unable to open `%s': %m", path);
>> + if (fstat(fd, &statbuf))
>> + die("Unable to stat `%s': %m", path);
>> +
>> + size = statbuf.st_size;
>> + /*
>> + * Map one byte more, to allow adding null-terminator
>> + * for text files.
>> + */
>> + addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE,
>> MAP_PRIVATE, fd, 0);
>> + if (addr == MAP_FAILED)
>> + die("Unable to mmap '%s': %m", path);
>> +
>> + close(fd);
>> +
>> + *psize = size;
>> + return addr;
>> +}
>> +
>> +static void unmap_file(void *addr, size_t size)
>> +{
>> + munmap(addr, size + 1);
>> +}
>> +
>> +static void *map_output_file(const char *path, size_t size)
>> +{
>> + void *addr;
>> + int fd;
>> +
>> + fd = open(path, O_RDWR | O_CREAT, 0660);
>> + if (fd < 0)
>> + die("Unable to create `%s': %m", path);
>> +
>> + if (ftruncate(fd, size))
>> + die("Unable to resize `%s': %m", path);
>> +
>> + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
>> fd, 0);
>> + if (addr == MAP_FAILED)
>> + die("Unable to mmap '%s': %m", path);
>> +
>> + return addr;
>> +}
>> +
>> #ifdef CONFIG_EFI_STUB
>>
>> -static void update_pecoff_section_header_fields(char *section_name,
>> u32 vma, u32 size, u32 datasz, u32 offset)
>> +static void update_pecoff_section_header_fields(char *section_name,
>> uint32_t vma,
>> + uint32_t size,
>> uint32_t datasz,
>> + uint32_t offset)
>> {
>> unsigned int pe_header;
>> unsigned short num_sections;
>> - u8 *section;
>> + struct section_header *section;
>>
>> - pe_header = get_unaligned_le32(&buf[0x3c]);
>> - num_sections = get_unaligned_le16(&buf[pe_header + 6]);
>> -
>> -#ifdef CONFIG_X86_32
>> - section = &buf[pe_header + 0xa8];
>> -#else
>> - section = &buf[pe_header + 0xb8];
>> -#endif
>> + struct pe_hdr *hdr = get_pe_header(buf);
>> + num_sections = get_unaligned_le16(&hdr->sections);
>> + section = get_sections(buf);
>>
>> while (num_sections > 0) {
>> - if (strncmp((char*)section, section_name, 8) == 0) {
>> + if (strncmp(section->name, section_name, 8) == 0) {
>> /* section header size field */
>> - put_unaligned_le32(size, section + 0x8);
>> + put_unaligned_le32(size,
>> &section->virtual_size);
>>
>> /* section header vma field */
>> - put_unaligned_le32(vma, section + 0xc);
>> + put_unaligned_le32(vma,
>> &section->virtual_address);
>>
>> /* section header 'size of initialised data'
>> field */
>> - put_unaligned_le32(datasz, section + 0x10);
>> + put_unaligned_le32(datasz,
>> &section->raw_data_size);
>>
>> /* section header 'file offset' field */
>> - put_unaligned_le32(offset, section + 0x14);
>> + put_unaligned_le32(offset,
>> &section->data_addr);
>>
>> break;
>> }
>> - section += 0x28;
>> + section++;
>> num_sections--;
>> }
>> }
>>
>> -static void update_pecoff_section_header(char *section_name, u32
>> offset, u32 size)
>> +static void update_pecoff_section_header(char *section_name, uint32_t
>> offset, uint32_t size)
>> {
>> update_pecoff_section_header_fields(section_name, offset,
>> size, size, offset);
>> }
>>
>> static void update_pecoff_setup_and_reloc(unsigned int size)
>> {
>> - u32 setup_offset = 0x200;
>> - u32 reloc_offset = size - PECOFF_RELOC_RESERVE -
>> PECOFF_COMPAT_RESERVE;
>> + uint32_t setup_offset = SECTOR_SIZE;
>> + uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE -
>> PECOFF_COMPAT_RESERVE;
>> #ifdef CONFIG_EFI_MIXED
>> - u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
>> + uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
>> #endif
>> - u32 setup_size = reloc_offset - setup_offset;
>> + uint32_t setup_size = reloc_offset - setup_offset;
>>
>> update_pecoff_section_header(".setup", setup_offset,
>> setup_size);
>> update_pecoff_section_header(".reloc", reloc_offset,
>> PECOFF_RELOC_RESERVE);
>> @@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned
>> int size)
>> * Modify .reloc section contents with a single entry. The
>> * relocation is applied to offset 10 of the relocation
>> section.
>> */
>> - put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
>> - put_unaligned_le32(10, &buf[reloc_offset + 4]);
>> + put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE,
>> &buf[reloc_offset]);
>> + put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset +
>> 4]);
>>
>> #ifdef CONFIG_EFI_MIXED
>> update_pecoff_section_header(".compat", compat_offset,
>> PECOFF_COMPAT_RESERVE);
>> @@ -224,19 +325,17 @@ static void
>> update_pecoff_setup_and_reloc(unsigned int size)
>> */
>> buf[compat_offset] = 0x1;
>> buf[compat_offset + 1] = 0x8;
>> - put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
>> + put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset
>> + 2]);
>> put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset +
>> 4]);
>> #endif
>> }
>>
>> -static void update_pecoff_text(unsigned int text_start, unsigned int
>> file_sz,
>> +static unsigned int update_pecoff_sections(unsigned int text_start,
>> unsigned int text_sz,
>> unsigned int init_sz)
>> {
>> - unsigned int pe_header;
>> - unsigned int text_sz = file_sz - text_start;
>> + unsigned int file_sz = text_start + text_sz;
>> unsigned int bss_sz = init_sz - file_sz;
>> -
>> - pe_header = get_unaligned_le32(&buf[0x3c]);
>> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>>
>> /*
>> * The PE/COFF loader may load the image at an address which
>> is
>> @@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int
>> text_start, unsigned int file_sz,
>> * Size of code: Subtract the size of the first sector (512
>> bytes)
>> * which includes the header.
>> */
>> - put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header +
>> 0x1c]);
>> + put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz,
>> &hdr->text_size);
>>
>> /* Size of image */
>> - put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
>> + put_unaligned_le32(init_sz, &hdr->image_size);
>>
>> /*
>> * Address of entry point for PE/COFF executable
>> */
>> - put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header +
>> 0x28]);
>> + put_unaligned_le32(text_start + efi_pe_entry,
>> &hdr->entry_point);
>>
>> update_pecoff_section_header_fields(".text", text_start,
>> text_sz + bss_sz,
>> text_sz, text_start);
>> +
>> + return text_start + file_sz;
>> }
>>
>> static int reserve_pecoff_reloc_section(int c)
>> @@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
>> return PECOFF_RELOC_RESERVE;
>> }
>>
>> -static void efi_stub_defaults(void)
>> +static void efi_stub_update_defaults(void)
>> {
>> /* Defaults for old kernel */
>> #ifdef CONFIG_X86_32
>> @@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
>>
>> #ifdef CONFIG_EFI_MIXED
>> if (efi32_stub_entry != addr)
>> - die("32-bit and 64-bit EFI entry points do not
>> match\n");
>> + die("32-bit and 64-bit EFI entry points do not
>> match");
>> #endif
>> #endif
>> put_unaligned_le32(addr, &buf[0x264]);
>> @@ -310,7 +411,7 @@ static inline void
>> update_pecoff_setup_and_reloc(unsigned int size) {}
>> static inline void update_pecoff_text(unsigned int text_start,
>> unsigned int file_sz,
>> unsigned int init_sz) {}
>> -static inline void efi_stub_defaults(void) {}
>> +static inline void efi_stub_update_defaults(void) {}
>> static inline void efi_stub_entry_update(void) {}
>>
>> static inline int reserve_pecoff_reloc_section(int c)
>> @@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
>>
>> static void parse_zoffset(char *fname)
>> {
>> - FILE *file;
>> - char *p;
>> - int c;
>> + size_t size;
>> + char *data, *p;
>>
>> - file = fopen(fname, "r");
>> - if (!file)
>> - die("Unable to open `%s': %m", fname);
>> - c = fread(buf, 1, sizeof(buf) - 1, file);
>> - if (ferror(file))
>> - die("read-error on `zoffset.h'");
>> - fclose(file);
>> - buf[c] = 0;
>> + data = map_file(fname, &size);
>>
>> - p = (char *)buf;
>> + /* We can do that, since we mapped one byte more */
>> + data[size] = 0;
>> +
>> + p = (char *)data;
>>
>> while (p && *p) {
>> PARSE_ZOFS(p, efi32_stub_entry);
>> @@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
>> while (p && (*p == '\r' || *p == '\n'))
>> p++;
>> }
>> +
>> + unmap_file(data, size);
>> }
>>
>> -int main(int argc, char ** argv)
>> +static unsigned int read_setup(char *path)
>> {
>> - unsigned int i, sz, setup_sectors, init_sz;
>> - int c;
>> - u32 sys_size;
>> - struct stat sb;
>> - FILE *file, *dest;
>> - int fd;
>> - void *kernel;
>> - u32 crc = 0xffffffffUL;
>> -
>> - efi_stub_defaults();
>> -
>> - if (argc != 5)
>> - usage();
>> - parse_zoffset(argv[3]);
>> -
>> - dest = fopen(argv[4], "w");
>> - if (!dest)
>> - die("Unable to write `%s': %m", argv[4]);
>> + FILE *file;
>> + unsigned int setup_size, file_size;
>>
>> /* Copy the setup code */
>> - file = fopen(argv[1], "r");
>> + file = fopen(path, "r");
>> if (!file)
>> - die("Unable to open `%s': %m", argv[1]);
>> - c = fread(buf, 1, sizeof(buf), file);
>> + die("Unable to open `%s': %m", path);
>> +
>> + file_size = fread(buf, 1, sizeof(buf), file);
>> if (ferror(file))
>> die("read-error on `setup'");
>> - if (c < 1024)
>> +
>> + if (file_size < 2 * SECTOR_SIZE)
>> die("The setup must be at least 1024 bytes");
>> - if (get_unaligned_le16(&buf[510]) != 0xAA55)
>> +
>> + if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
>> die("Boot block hasn't got boot flag (0xAA55)");
>> +
>> fclose(file);
>>
>> - c += reserve_pecoff_compat_section(c);
>> - c += reserve_pecoff_reloc_section(c);
>> + /* Reserve space for PE sections */
>> + file_size += reserve_pecoff_compat_section(file_size);
>> + file_size += reserve_pecoff_reloc_section(file_size);
>>
>> /* Pad unused space with zeros */
>> - setup_sectors = (c + 511) / 512;
>> - if (setup_sectors < SETUP_SECT_MIN)
>> - setup_sectors = SETUP_SECT_MIN;
>> - i = setup_sectors*512;
>> - memset(buf+c, 0, i-c);
>>
>> - update_pecoff_setup_and_reloc(i);
>> + setup_size = round_up(file_size, SECTOR_SIZE);
>> +
>> + if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
>> + setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
>> +
>> + /*
>> + * Global buffer is already initialised
>> + * to 0, but just in case, zero out padding.
>> + */
>> +
>> + memset(buf + file_size, 0, setup_size - file_size);
>> +
>> + return setup_size;
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> + size_t kern_file_size;
>> + unsigned int setup_size;
>> + unsigned int setup_sectors;
>> + unsigned int init_size;
>> + unsigned int total_size;
>> + unsigned int kern_size;
>> + void *kernel;
>> + uint32_t crc = 0xffffffffUL;
>> + uint8_t *output;
>> +
>> + if (argc != 5)
>> + usage();
>> +
>> + efi_stub_update_defaults();
>> + parse_zoffset(argv[3]);
>> +
>> + setup_size = read_setup(argv[1]);
>> +
>> + setup_sectors = setup_size/SECTOR_SIZE;
>>
>> /* Set the default root device */
>> put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
>>
>> - /* Open and stat the kernel file */
>> - fd = open(argv[2], O_RDONLY);
>> - if (fd < 0)
>> - die("Unable to open `%s': %m", argv[2]);
>> - if (fstat(fd, &sb))
>> - die("Unable to stat `%s': %m", argv[2]);
>> - sz = sb.st_size;
>> - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
>> - if (kernel == MAP_FAILED)
>> - die("Unable to mmap '%s': %m", argv[2]);
>> - /* Number of 16-byte paragraphs, including space for a 4-byte
>> CRC */
>> - sys_size = (sz + 15 + 4) / 16;
>> + /* Map kernel file to memory */
>> + kernel = map_file(argv[2], &kern_file_size);
>> +
>> #ifdef CONFIG_EFI_STUB
>> - /*
>> - * COFF requires minimum 32-byte alignment of sections, and
>> - * adding a signature is problematic without that alignment.
>> - */
>> - sys_size = (sys_size + 1) & ~1;
>> + /* PE specification require 512-byte minimum section file
>> alignment */
>> + kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
>> + update_pecoff_setup_and_reloc(setup_size);
>> +#else
>> + /* Number of 16-byte paragraphs, including space for a 4-byte
>> CRC */
>> + kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
>> #endif
>>
>> /* Patch the setup code with the appropriate size parameters
>> */
>> - buf[0x1f1] = setup_sectors-1;
>> - put_unaligned_le32(sys_size, &buf[0x1f4]);
>> + buf[0x1f1] = setup_sectors - 1;
>> + put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
>> +
>> + /* Update kernel_info offset. */
>> + put_unaligned_le32(kernel_info, &buf[0x268]);
>> +
>> + init_size = get_unaligned_le32(&buf[0x260]);
>>
>> - init_sz = get_unaligned_le32(&buf[0x260]);
>> #ifdef CONFIG_EFI_STUB
>> /*
>> * The decompression buffer will start at ImageBase. When
>> relocating
>> @@ -458,45 +571,35 @@ int main(int argc, char ** argv)
>> * For future-proofing, increase init_sz if necessary.
>> */
>>
>> - if (init_sz - _end < i + _ehead) {
>> - init_sz = (i + _ehead + _end + 4095) & ~4095;
>> - put_unaligned_le32(init_sz, &buf[0x260]);
>> + if (init_size - _end < setup_size + _ehead) {
>> + init_size = round_up(setup_size + _ehead + _end,
>> SECTION_ALIGNMENT);
>> + put_unaligned_le32(init_size, &buf[0x260]);
>> }
>> -#endif
>> - update_pecoff_text(setup_sectors * 512, i + (sys_size * 16),
>> init_sz);
>>
>> - efi_stub_entry_update();
>> -
>> - /* Update kernel_info offset. */
>> - put_unaligned_le32(kernel_info, &buf[0x268]);
>> + total_size = update_pecoff_sections(setup_size, kern_size,
>> init_size);
>>
>> - crc = partial_crc32(buf, i, crc);
>> - if (fwrite(buf, 1, i, dest) != i)
>> - die("Writing setup failed");
>> + efi_stub_entry_update();
>> +#else
>> + (void)init_size;
>> + total_size = setup_size + kern_size;
>> +#endif
>>
>> - /* Copy the kernel code */
>> - crc = partial_crc32(kernel, sz, crc);
>> - if (fwrite(kernel, 1, sz, dest) != sz)
>> - die("Writing kernel failed");
>> + output = map_output_file(argv[4], total_size);
>>
>> - /* Add padding leaving 4 bytes for the checksum */
>> - while (sz++ < (sys_size*16) - 4) {
>> - crc = partial_crc32_one('\0', crc);
>> - if (fwrite("\0", 1, 1, dest) != 1)
>> - die("Writing padding failed");
>> - }
>> + memcpy(output, buf, setup_size);
>> + memcpy(output + setup_size, kernel, kern_file_size);
>> + memset(output + setup_size + kern_file_size, 0, kern_size -
>> kern_file_size);
>>
>> - /* Write the CRC */
>> - put_unaligned_le32(crc, buf);
>> - if (fwrite(buf, 1, 4, dest) != 4)
>> - die("Writing CRC failed");
>> + /* Calculate and write kernel checksum. */
>> + crc = partial_crc32(output, total_size - 4, crc);
>> + put_unaligned_le32(crc, &output[total_size - 4]);
>>
>> - /* Catch any delayed write failures */
>> - if (fclose(dest))
>> - die("Writing image failed");
>> + /* Catch any delayed write failures. */
>> + if (munmap(output, total_size) < 0)
>> + die("Writing kernel failed");
>>
>> - close(fd);
>> + unmap_file(kernel, kern_file_size);
>>
>> - /* Everything is OK */
>> + /* Everything is OK. */
>> return 0;
>> }
>> --
>> 2.37.4
>>
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> Doing it that way allows setting up stricter memory attributes,
> simplifies boot code path and removes potential relocation
> of kernel image.
>
> Wire up required interfaces and minimally initialize zero page
> fields needed for it to function correctly.
>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
OK I just realized that there is a problem with this approach: since
we now decompress the image while running in the EFI stub (i.e.,
before ExitBootServices()), we cannot just randomly pick an
EFI_CONVENTIONAL_MEMORY region to place the kernel; we need to
allocate the pages using the boot services. Otherwise, subsequent
allocations (or concurrent ones occurring in the firmware in event
handlers etc) may land right in the middle, which is unlikely to be
what we want.
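
To make the hazard concrete, here is a toy model of it. This is only a
sketch under simplifying assumptions: memory is 16 pages, the
"firmware" allocator is first-fit, and every name (toy_memmap,
toy_pick_free(), toy_alloc_pages(), ...) is made up for the example
and is not an EFI or libstub interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Toy stand-in for the firmware's view of memory: true = allocated */
struct toy_memmap {
	bool used[TOY_NPAGES];
};

/* Find the first run of n free pages without reserving it; -1 if none */
long toy_pick_free(const struct toy_memmap *m, size_t n)
{
	assert(n > 0);

	for (size_t start = 0; start + n <= TOY_NPAGES; start++) {
		size_t i = 0;

		while (i < n && !m->used[start + i])
			i++;
		if (i == n)
			return (long)start;
	}
	return -1;
}

/* Find and reserve n pages, so that later allocations avoid them */
long toy_alloc_pages(struct toy_memmap *m, size_t n)
{
	long start = toy_pick_free(m, n);

	if (start >= 0)
		for (size_t i = 0; i < n; i++)
			m->used[start + i] = true;
	return start;
}

/* Returns true if [a, a+an) and [b, b+bn) share at least one page */
bool toy_ranges_overlap(long a, size_t an, long b, size_t bn)
{
	return a < b + (long)bn && b < a + (long)an;
}

/*
 * Place a 4-page "kernel", then let the "firmware" allocate 2 pages.
 * If reserve is false the kernel range is only picked, not allocated.
 * Returns true when the firmware allocation lands in the kernel range.
 */
bool toy_firmware_clobbers_kernel(bool reserve)
{
	struct toy_memmap m = { { false } };
	long kernel = reserve ? toy_alloc_pages(&m, 4)
			      : toy_pick_free(&m, 4);
	long fw = toy_alloc_pages(&m, 2);

	return toy_ranges_overlap(kernel, 4, fw, 2);
}
```

In this model, toy_firmware_clobbers_kernel(false) reports an overlap
and toy_firmware_clobbers_kernel(true) does not, which is the
difference between picking a free region from the memory map and
allocating it through the boot services.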
> ---
> arch/x86/boot/compressed/head_32.S | 50 ++++-
> arch/x86/boot/compressed/head_64.S | 58 ++++-
> drivers/firmware/efi/Kconfig | 2 +
> drivers/firmware/efi/libstub/Makefile | 2 +-
> .../firmware/efi/libstub/x86-extract-direct.c | 208 ++++++++++++++++++
> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
> 7 files changed, 338 insertions(+), 115 deletions(-)
> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>
> diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
> index ead6007df1e5..0be75e5072ae 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
>
> #ifdef CONFIG_EFI_STUB
> SYM_FUNC_START(efi32_stub_entry)
> +/*
> + * Calculate the delta between where we were compiled to run
> + * at and where we were actually loaded at. This can only be done
> + * with a short local call on x86. Nothing else will tell us what
> + * address we are running at. The reserved chunk of the real-mode
> + * data at 0x1e4 (defined as a scratch field) are used as the stack
> + * for this calculation. Only 4 bytes are needed.
> + */
> + call 1f
> +1: popl %ebx
> + addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
> +
> + /* Clear BSS */
> + xorl %eax, %eax
> + leal _bss@GOTOFF(%ebx), %edi
> + leal _ebss@GOTOFF(%ebx), %ecx
> + subl %edi, %ecx
> + shrl $2, %ecx
> + rep stosl
> +
> add $0x4, %esp
> movl 8(%esp), %esi /* save boot_params pointer */
> + movl %edx, %edi /* save GOT address */
> call efi_main
> - /* efi_main returns the possibly relocated address of startup_32 */
> - jmp *%eax
> + movl %eax, %ecx
> +
> + /*
> + * efi_main returns the possibly
> + * relocated address of extracted kernel entry point.
> + */
> +
> + cli
> +
> + /* Load new GDT */
> + leal gdt@GOTOFF(%ebx), %eax
> + movl %eax, 2(%eax)
> + lgdt (%eax)
> +
> + /* Load segment registers with our descriptors */
> + movl $__BOOT_DS, %eax
> + movl %eax, %ds
> + movl %eax, %es
> + movl %eax, %fs
> + movl %eax, %gs
> + movl %eax, %ss
> +
> + /* Zero EFLAGS */
> + pushl $0
> + popfl
> +
> + jmp *%ecx
> SYM_FUNC_END(efi32_stub_entry)
> SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
> #endif
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 2dd8be0583d2..7cfef7bd0424 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -529,12 +529,64 @@ SYM_CODE_END(startup_64)
> .org 0x390
> #endif
> SYM_FUNC_START(efi64_stub_entry)
> + /* Preserve first parameter */
> + movq %rdi, %r10
> +
> + /* Clear BSS */
> + xorl %eax, %eax
> + leaq _bss(%rip), %rdi
> + leaq _ebss(%rip), %rcx
> + subq %rdi, %rcx
> + shrq $3, %rcx
> + rep stosq
> +
> and $~0xf, %rsp /* realign the stack */
> movq %rdx, %rbx /* save boot_params pointer */
> + movq %r10, %rdi
> call efi_main
> - movq %rbx,%rsi
> - leaq rva(startup_64)(%rax), %rax
> - jmp *%rax
> +
> + cld
> + cli
> +
> + movq %rbx, %rdi /* boot_params */
> + movq %rax, %rsi /* decompressed kernel address */
> +
> + /* Make sure we have GDT with 32-bit code segment */
> + leaq gdt64(%rip), %rax
> + addq %rax, 2(%rax)
> + lgdt (%rax)
> +
> + /* Setup data segments. */
> + xorl %eax, %eax
> + movl %eax, %ds
> + movl %eax, %es
> + movl %eax, %ss
> + movl %eax, %fs
> + movl %eax, %gs
> +
> + pushq %rsi
> + pushq %rdi
> +
> + call load_stage1_idt
> + call enable_nx_if_supported
> +
> + call trampoline_pgtable_init
> + movq %rax, %rdx
> +
> +
> + /* Swap %rsi and %rdi */
> + popq %rsi
> + popq %rdi
> +
> + /* Save the trampoline address in RCX */
> + movq trampoline_32bit(%rip), %rcx
> +
> + /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> + pushq $__KERNEL32_CS
> + leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
> + pushq %rax
> + lretq
> +
> SYM_FUNC_END(efi64_stub_entry)
> SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
> #endif
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index 043ca31c114e..f50c2a84a754 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -58,6 +58,8 @@ config EFI_DXE_MEM_ATTRIBUTES
> Use DXE services to check and alter memory protection
> attributes during boot via EFISTUB to ensure that memory
> ranges used by the kernel are writable and executable.
> + This option also enables stricter memory attributes
> + on compressed kernel PE image.
>
> config EFI_PARAMS_FROM_FDT
> bool
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index be8b8c6e8b40..99b81c95344c 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -88,7 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
>
> lib-$(CONFIG_ARM) += arm32-stub.o
> lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o arm64-entry.o smbios.o
> -lib-$(CONFIG_X86) += x86-stub.o
> +lib-$(CONFIG_X86) += x86-stub.o x86-extract-direct.o
> lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
> lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
>
> diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c b/drivers/firmware/efi/libstub/x86-extract-direct.c
> new file mode 100644
> index 000000000000..4ecbc4a9b3ed
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/acpi.h>
> +#include <linux/efi.h>
> +#include <linux/elf.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/efi.h>
> +#include <asm/e820/types.h>
> +#include <asm/desc.h>
> +#include <asm/boot.h>
> +#include <asm/bootparam_utils.h>
> +#include <asm/shared/extract.h>
> +#include <asm/shared/pgtable.h>
> +
> +#include "efistub.h"
> +#include "x86-stub.h"
> +
> +static efi_handle_t image_handle;
> +
> +static void do_puthex(unsigned long value)
> +{
> + efi_printk("%08lx", value);
> +}
> +
> +static void do_putstr(const char *msg)
> +{
> + efi_printk("%s", msg);
> +}
> +
> +static unsigned long do_map_range(unsigned long start,
> + unsigned long end,
> + unsigned int flags)
> +{
> + efi_status_t status;
> +
> + unsigned long size = end - start;
> +
> + if (flags & MAP_ALLOC) {
> + unsigned long addr;
> +
> + status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
> + &addr, start);
> + if (status != EFI_SUCCESS) {
> + efi_err("Unable to allocate memory for uncompressed kernel");
> + efi_exit(image_handle, EFI_OUT_OF_RESOURCES);
> + }
> +
> + if (start != addr) {
> + efi_debug("Unable to allocate at given address"
> + " (desired=0x%lx, actual=0x%lx)",
> + (unsigned long)start, addr);
> + start = addr;
> + }
> + }
> +
> + if ((flags & (MAP_PROTECT | MAP_ALLOC)) &&
> + IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + unsigned long attr = 0;
> +
> + if (!(flags & MAP_EXEC))
> + attr |= EFI_MEMORY_XP;
> +
> + if (!(flags & MAP_WRITE))
> + attr |= EFI_MEMORY_RO;
> +
> + status = efi_adjust_memory_range_protection(start, size, attr);
> + if (status != EFI_SUCCESS)
> + efi_err("Unable to protect memory range");
> + }
> +
> + return start;
> +}
> +
> +/*
> + * Trampoline takes 3 pages and can be loaded in first megabyte of memory
> + * with its end placed between 0 and 640k where BIOS might start.
> + * (see arch/x86/boot/compressed/pgtable_64.c)
> + */
> +
> +#ifdef CONFIG_64BIT
> +static efi_status_t prepare_trampoline(void)
> +{
> + efi_status_t status;
> +
> + status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
> + (unsigned long *)&trampoline_32bit,
> + TRAMPOLINE_32BIT_PLACEMENT_MAX);
> +
> + if (status != EFI_SUCCESS)
> + return status;
> +
> + unsigned long trampoline_start = (unsigned long)trampoline_32bit;
> +
> + memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
> +
> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + /* First page of trampoline is a top level page table */
> + efi_adjust_memory_range_protection(trampoline_start,
> + PAGE_SIZE,
> + EFI_MEMORY_XP);
> + }
> +
> + /* Second page of trampoline is the code (with a padding) */
> +
> + void *caddr = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
> +
> + memcpy(caddr, trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> +
> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + efi_adjust_memory_range_protection((unsigned long)caddr,
> + PAGE_SIZE,
> + EFI_MEMORY_RO);
> +
> + /* And the last page of trampoline is the stack */
> +
> + efi_adjust_memory_range_protection(trampoline_start + 2 * PAGE_SIZE,
> + PAGE_SIZE,
> + EFI_MEMORY_XP);
> + }
> +
> + return EFI_SUCCESS;
> +}
> +#else
> +static inline efi_status_t prepare_trampoline(void)
> +{
> + return EFI_SUCCESS;
> +}
> +#endif
> +
> +static efi_status_t init_loader_data(efi_handle_t handle,
> + struct boot_params *params,
> + struct efi_boot_memmap **map)
> +{
> + struct efi_info *efi = (void *)&params->efi_info;
> + efi_status_t status;
> +
> + status = efi_get_memory_map(map, false);
> +
> + if (status != EFI_SUCCESS) {
> + efi_err("Unable to get EFI memory map...\n");
> + return status;
> + }
> +
> + const char *signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
> + : EFI32_LOADER_SIGNATURE;
> +
> + memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
> +
> + efi->efi_memdesc_size = (*map)->desc_size;
> + efi->efi_memdesc_version = (*map)->desc_ver;
> + efi->efi_memmap_size = (*map)->map_size;
> +
> + efi_set_u64_split((unsigned long)(*map)->map,
> + &efi->efi_memmap, &efi->efi_memmap_hi);
> +
> + efi_set_u64_split((unsigned long)efi_system_table,
> + &efi->efi_systab, &efi->efi_systab_hi);
> +
> + image_handle = handle;
> +
> + return EFI_SUCCESS;
> +}
> +
> +static void free_loader_data(struct boot_params *params, struct efi_boot_memmap *map)
> +{
> + struct efi_info *efi = (void *)&params->efi_info;
> +
> + efi_bs_call(free_pool, map);
> +
> + efi->efi_memdesc_size = 0;
> + efi->efi_memdesc_version = 0;
> + efi->efi_memmap_size = 0;
> + efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
> +}
> +
> +extern unsigned char input_data[];
> +extern unsigned int input_len, output_len;
> +
> +unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *params)
> +{
> +
> + void *res;
> + efi_status_t status;
> + struct efi_extract_callbacks cb = { 0 };
> +
> + status = prepare_trampoline();
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + /* Prepare environment for do_extract_kernel() call */
> + struct efi_boot_memmap *map = NULL;
> + status = init_loader_data(handle, params, &map);
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + cb.puthex = do_puthex;
> + cb.putstr = do_putstr;
> + cb.map_range = do_map_range;
> +
> + res = efi_extract_kernel(params, &cb, input_data, input_len, output_len);
> +
> + free_loader_data(params, map);
> +
> + return (unsigned long)res;
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 7fb1eff88a18..1d1ab1911fd3 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -17,6 +17,7 @@
> #include <asm/boot.h>
>
> #include "efistub.h"
> +#include "x86-stub.h"
>
> /* Maximum physical address for 64-bit kernel with 4-level paging */
> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
> @@ -24,7 +25,7 @@
> const efi_system_table_t *efi_system_table;
> const efi_dxe_services_table_t *efi_dxe_table;
> u32 image_offset __section(".data");
> -static efi_loaded_image_t *image = NULL;
> +static efi_loaded_image_t *image __section(".data");
>
> static efi_status_t
> preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
> @@ -212,55 +213,9 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
> }
> }
>
> -/*
> - * Trampoline takes 2 pages and can be loaded in first megabyte of memory
> - * with its end placed between 128k and 640k where BIOS might start.
> - * (see arch/x86/boot/compressed/pgtable_64.c)
> - *
> - * We cannot find exact trampoline placement since memory map
> - * can be modified by UEFI, and it can alter the computed address.
> - */
> -
> -#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
> -#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
> -
> -void startup_32(struct boot_params *boot_params);
> -
> -static void
> -setup_memory_protection(unsigned long image_base, unsigned long image_size)
> -{
> - /*
> - * Allow execution of possible trampoline used
> - * for switching between 4- and 5-level page tables
> - * and relocated kernel image.
> - */
> -
> - efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
> - TRAMPOLINE_PLACEMENT_SIZE, 0);
> -
> -#ifdef CONFIG_64BIT
> - if (image_base != (unsigned long)startup_32)
> - efi_adjust_memory_range_protection(image_base, image_size, 0);
> -#else
> - /*
> - * Clear protection flags on a whole range of possible
> - * addresses used for KASLR. We don't need to do that
> - * on x86_64, since KASLR/extraction is performed after
> - * dedicated identity page tables are built and we only
> - * need to remove possible protection on relocated image
> - * itself disregarding further relocations.
> - */
> - efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
> - KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
> - 0);
> -#endif
> -}
> -
> static const efi_char16_t apple[] = L"Apple";
>
> -static void setup_quirks(struct boot_params *boot_params,
> - unsigned long image_base,
> - unsigned long image_size)
> +static void setup_quirks(struct boot_params *boot_params)
> {
> efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
> efi_table_attr(efi_system_table, fw_vendor);
> @@ -269,9 +224,6 @@ static void setup_quirks(struct boot_params *boot_params,
> if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
> retrieve_apple_device_properties(boot_params);
> }
> -
> - if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
> - setup_memory_protection(image_base, image_size);
> }
>
> /*
> @@ -384,7 +336,7 @@ static void setup_graphics(struct boot_params *boot_params)
> }
>
>
> -static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> {
> efi_bs_call(exit, handle, status, 0, NULL);
> for(;;)
> @@ -707,8 +659,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
> }
>
> /*
> - * On success, we return the address of startup_32, which has potentially been
> - * relocated by efi_relocate_kernel.
> + * On success, we return extracted kernel entry point.
> * On failure, we exit to the firmware via efi_exit instead of returning.
> */
> asmlinkage unsigned long efi_main(efi_handle_t handle,
> @@ -733,60 +684,6 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
> efi_dxe_table = NULL;
> }
>
> - /*
> - * If the kernel isn't already loaded at a suitable address,
> - * relocate it.
> - *
> - * It must be loaded above LOAD_PHYSICAL_ADDR.
> - *
> - * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
> - * is defined as the macro MAXMEM, but unfortunately that is not a
> - * compile-time constant if 5-level paging is configured, so we instead
> - * define our own macro for use here.
> - *
> - * For 32-bit, the maximum address is complicated to figure out, for
> - * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
> - * KASLR uses.
> - *
> - * Also relocate it if image_offset is zero, i.e. the kernel wasn't
> - * loaded by LoadImage, but rather by a bootloader that called the
> - * handover entry. The reason we must always relocate in this case is
> - * to handle the case of systemd-boot booting a unified kernel image,
> - * which is a PE executable that contains the bzImage and an initrd as
> - * COFF sections. The initrd section is placed after the bzImage
> - * without ensuring that there are at least init_size bytes available
> - * for the bzImage, and thus the compressed kernel's startup code may
> - * overwrite the initrd unless it is moved out of the way.
> - */
> -
> - buffer_start = ALIGN(bzimage_addr - image_offset,
> - hdr->kernel_alignment);
> - buffer_end = buffer_start + hdr->init_size;
> -
> - if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
> - (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
> - (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
> - (image_offset == 0)) {
> - extern char _bss[];
> -
> - status = efi_relocate_kernel(&bzimage_addr,
> - (unsigned long)_bss - bzimage_addr,
> - hdr->init_size,
> - hdr->pref_address,
> - hdr->kernel_alignment,
> - LOAD_PHYSICAL_ADDR);
> - if (status != EFI_SUCCESS) {
> - efi_err("efi_relocate_kernel() failed!\n");
> - goto fail;
> - }
> - /*
> - * Now that we've copied the kernel elsewhere, we no longer
> - * have a set up block before startup_32(), so reset image_offset
> - * to zero in case it was set earlier.
> - */
> - image_offset = 0;
> - }
> -
> #ifdef CONFIG_CMDLINE_BOOL
> status = efi_parse_options(CONFIG_CMDLINE);
> if (status != EFI_SUCCESS) {
> @@ -843,7 +740,11 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
>
> setup_efi_pci(boot_params);
>
> - setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
> + setup_quirks(boot_params);
> +
> + bzimage_addr = extract_kernel_direct(handle, boot_params);
> + if (!bzimage_addr)
> + goto fail;
>
> status = exit_boot(boot_params, handle);
> if (status != EFI_SUCCESS) {
> diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
> new file mode 100644
> index 000000000000..baecc7c6e602
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-stub.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _DRIVERS_FIRMWARE_EFI_X86STUB_H
> +#define _DRIVERS_FIRMWARE_EFI_X86STUB_H
> +
> +#include <linux/efi.h>
> +
> +#include <asm/bootparam.h>
> +
> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status);
> +unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *boot_params);
> +void startup_32(struct boot_params *boot_params);
> +
> +#endif
> --
> 2.37.4
>
On Thu, 9 Mar 2023 at 17:25, Evgeniy Baskov <[email protected]> wrote:
>
> On 2023-03-09 18:57, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
> >>
> >> Use newer C standard. Since kernel requires C99 compiler now,
> >> we can make use of the new features to make the code more readable.
> >>
> >> Use mmap() for reading files also to make things simpler.
> >>
> >> Replace most magic numbers with defines.
> >>
> >> Should have no functional changes. This is done in preparation for the
> >> next changes that make the generated PE header more spec compliant.
> >>
> >> Tested-by: Mario Limonciello <[email protected]>
> >> Tested-by: Peter Jones <[email protected]>
> >> Signed-off-by: Evgeniy Baskov <[email protected]>
> >> ---
> >> arch/x86/boot/tools/build.c | 387
> >> +++++++++++++++++++++++-------------
> >> 1 file changed, 245 insertions(+), 142 deletions(-)
> >>
> >> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
> >> index bd247692b701..fbc5315af032 100644
> >> --- a/arch/x86/boot/tools/build.c
> >> +++ b/arch/x86/boot/tools/build.c
> >> @@ -25,20 +25,21 @@
> >> * Substantially overhauled by H. Peter Anvin, April 2007
> >> */
> >>
> >> +#include <fcntl.h>
> >> +#include <stdarg.h>
> >> +#include <stdint.h>
> >> #include <stdio.h>
> >> -#include <string.h>
> >> #include <stdlib.h>
> >> -#include <stdarg.h>
> >> -#include <sys/types.h>
> >> +#include <string.h>
> >> +#include <sys/mman.h>
> >> #include <sys/stat.h>
> >> +#include <sys/types.h>
> >> #include <unistd.h>
> >> -#include <fcntl.h>
> >> -#include <sys/mman.h>
> >> +
> >> #include <tools/le_byteshift.h>
> >> +#include <linux/pe.h>
> >>
> >> -typedef unsigned char u8;
> >> -typedef unsigned short u16;
> >> -typedef unsigned int u32;
> >> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
> >>
> >> #define DEFAULT_MAJOR_ROOT 0
> >> #define DEFAULT_MINOR_ROOT 0
> >> @@ -48,8 +49,13 @@ typedef unsigned int u32;
> >> #define SETUP_SECT_MIN 5
> >> #define SETUP_SECT_MAX 64
> >>
> >> +#define PARAGRAPH_SIZE 16
> >> +#define SECTOR_SIZE 512
> >> +#define FILE_ALIGNMENT 512
> >> +#define SECTION_ALIGNMENT 4096
> >> +
> >> /* This must be large enough to hold the entire setup */
> >> -u8 buf[SETUP_SECT_MAX*512];
> >> +uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
> >>
> >> #define PECOFF_RELOC_RESERVE 0x20
> >>
> >> @@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
> >> #define PECOFF_COMPAT_RESERVE 0x0
> >> #endif
> >>
> >> +#define RELOC_SECTION_SIZE 10
> >> +
> >> +/* PE header has different format depending on the architecture */
> >> +#ifdef CONFIG_X86_64
> >> +typedef struct pe32plus_opt_hdr pe_opt_hdr;
> >> +#else
> >> +typedef struct pe32_opt_hdr pe_opt_hdr;
> >> +#endif
> >> +
> >> +static inline struct pe_hdr *get_pe_header(uint8_t *buf)
> >> +{
> >> + uint32_t pe_offset =
> >> get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
> >> + return (struct pe_hdr *)(buf + pe_offset);
> >> +}
> >> +
> >> +static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
> >> +{
> >> + return (pe_opt_hdr *)(get_pe_header(buf) + 1);
> >> +}
> >> +
> >> +static inline struct section_header *get_sections(uint8_t *buf)
> >> +{
> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >> + uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
> >> + uint8_t *sections = (uint8_t *)(hdr + 1) +
> >> n_data_dirs*sizeof(struct data_dirent);
> >> + return (struct section_header *)sections;
> >> +}
> >> +
> >> +static inline struct data_directory *get_data_dirs(uint8_t *buf)
> >> +{
> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >> + return (struct data_directory *)(hdr + 1);
> >> +}
> >> +
> >> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> >
> > Can we drop this conditional?
>
> Without CONFIG_EFI_DXE_MEM_ATTRIBUTES memory attributes are not
> getting applied anywhere, so this would break 'nokaslr' on UEFI
> implementations that honor section attributes.
>
How so? This only affects the mappings that are created by UEFI for
the decompressor binary, right?
> KASLR is already broken without that option on implementations
> that disallow execution of the free memory though. But unlike
> free memory, sections are more likely to get protected, I think.
>
We need to allocate those pages properly in any case (see my other
reply) so it is no longer free memory.
> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE |
> >> IMAGE_SCN_ALIGN_4096BYTES)
> >> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE |
> >> IMAGE_SCN_ALIGN_4096BYTES)
> >> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
> >
> > Please drop the alignment flags - they don't apply to executables, only to
> > object files.
>
> Got it, will remove them in v5.
>
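For illustration only, a minimal sketch of the three flag sets with the
alignment bits dropped. The IMAGE_SCN_MEM_* values are the ones given in
the PE/COFF specification; the scn_is_wx() helper is hypothetical and is
not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Section characteristics as defined by the PE/COFF specification. */
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u

/* Flag sets without IMAGE_SCN_ALIGN_*, which only applies to object files. */
#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)
#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE)
#define SCN_RO (IMAGE_SCN_MEM_READ)

/* Hypothetical helper: true if a section is both writable and executable. */
static bool scn_is_wx(uint32_t flags)
{
	return (flags & IMAGE_SCN_MEM_WRITE) && (flags & IMAGE_SCN_MEM_EXECUTE);
}
```

A build-time check along these lines could reject any section whose flags
violate W^X, but whether that is wanted in build.c is a separate question.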
> >
> >> +#else
> >> +/* With memory protection disabled all sections are RWX */
> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
> >> + IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> >> +#define SCN_RX SCN_RW
> >> +#define SCN_RO SCN_RW
> >> +#endif
> >> +
> >> static unsigned long efi32_stub_entry;
> >> static unsigned long efi64_stub_entry;
> >> static unsigned long efi_pe_entry;
> >> @@ -70,7 +122,7 @@ static unsigned long _end;
> >>
> >>
> >> /*----------------------------------------------------------------------*/
> >>
> >> -static const u32 crctab32[] = {
> >> +static const uint32_t crctab32[] = {
> >
> > Replacing all the type names makes this patch very messy. Can we back
> > that out please?
>
> Ok, I will revert them.
>
> >
> >> 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
> >> 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
> >> 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
> >> @@ -125,12 +177,12 @@ static const u32 crctab32[] = {
> >> 0x2d02ef8d
> >> };
> >>
> >> -static u32 partial_crc32_one(u8 c, u32 crc)
> >> +static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
> >> {
> >> return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
> >> }
> >>
> >> -static u32 partial_crc32(const u8 *s, int len, u32 crc)
> >> +static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t
> >> crc)
> >> {
> >> while (len--)
> >> crc = partial_crc32_one(*s++, crc);
> >> @@ -152,57 +204,106 @@ static void usage(void)
> >> die("Usage: build setup system zoffset.h image");
> >> }
> >>
> >> +static void *map_file(const char *path, size_t *psize)
> >> +{
> >> + struct stat statbuf;
> >> + size_t size;
> >> + void *addr;
> >> + int fd;
> >> +
> >> + fd = open(path, O_RDONLY);
> >> + if (fd < 0)
> >> + die("Unable to open `%s': %m", path);
> >> + if (fstat(fd, &statbuf))
> >> + die("Unable to stat `%s': %m", path);
> >> +
> >> + size = statbuf.st_size;
> >> + /*
> >> + * Map one byte more, to allow adding null-terminator
> >> + * for text files.
> >> + */
> >> + addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE,
> >> MAP_PRIVATE, fd, 0);
> >> + if (addr == MAP_FAILED)
> >> + die("Unable to mmap '%s': %m", path);
> >> +
> >> + close(fd);
> >> +
> >> + *psize = size;
> >> + return addr;
> >> +}
> >> +
> >> +static void unmap_file(void *addr, size_t size)
> >> +{
> >> + munmap(addr, size + 1);
> >> +}
> >> +
> >> +static void *map_output_file(const char *path, size_t size)
> >> +{
> >> + void *addr;
> >> + int fd;
> >> +
> >> + fd = open(path, O_RDWR | O_CREAT, 0660);
> >> + if (fd < 0)
> >> + die("Unable to create `%s': %m", path);
> >> +
> >> + if (ftruncate(fd, size))
> >> + die("Unable to resize `%s': %m", path);
> >> +
> >> + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
> >> fd, 0);
> >> + if (addr == MAP_FAILED)
> >> + die("Unable to mmap '%s': %m", path);
> >> +
> >> + return addr;
> >> +}
> >> +
> >> #ifdef CONFIG_EFI_STUB
> >>
> >> -static void update_pecoff_section_header_fields(char *section_name,
> >> u32 vma, u32 size, u32 datasz, u32 offset)
> >> +static void update_pecoff_section_header_fields(char *section_name,
> >> uint32_t vma,
> >> + uint32_t size,
> >> uint32_t datasz,
> >> + uint32_t offset)
> >> {
> >> unsigned int pe_header;
> >> unsigned short num_sections;
> >> - u8 *section;
> >> + struct section_header *section;
> >>
> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
> >> - num_sections = get_unaligned_le16(&buf[pe_header + 6]);
> >> -
> >> -#ifdef CONFIG_X86_32
> >> - section = &buf[pe_header + 0xa8];
> >> -#else
> >> - section = &buf[pe_header + 0xb8];
> >> -#endif
> >> + struct pe_hdr *hdr = get_pe_header(buf);
> >> + num_sections = get_unaligned_le16(&hdr->sections);
> >> + section = get_sections(buf);
> >>
> >> while (num_sections > 0) {
> >> - if (strncmp((char*)section, section_name, 8) == 0) {
> >> + if (strncmp(section->name, section_name, 8) == 0) {
> >> /* section header size field */
> >> - put_unaligned_le32(size, section + 0x8);
> >> + put_unaligned_le32(size,
> >> &section->virtual_size);
> >>
> >> /* section header vma field */
> >> - put_unaligned_le32(vma, section + 0xc);
> >> + put_unaligned_le32(vma,
> >> &section->virtual_address);
> >>
> >> /* section header 'size of initialised data'
> >> field */
> >> - put_unaligned_le32(datasz, section + 0x10);
> >> + put_unaligned_le32(datasz,
> >> &section->raw_data_size);
> >>
> >> /* section header 'file offset' field */
> >> - put_unaligned_le32(offset, section + 0x14);
> >> + put_unaligned_le32(offset,
> >> &section->data_addr);
> >>
> >> break;
> >> }
> >> - section += 0x28;
> >> + section++;
> >> num_sections--;
> >> }
> >> }
> >>
> >> -static void update_pecoff_section_header(char *section_name, u32
> >> offset, u32 size)
> >> +static void update_pecoff_section_header(char *section_name, uint32_t
> >> offset, uint32_t size)
> >> {
> >> update_pecoff_section_header_fields(section_name, offset,
> >> size, size, offset);
> >> }
> >>
> >> static void update_pecoff_setup_and_reloc(unsigned int size)
> >> {
> >> - u32 setup_offset = 0x200;
> >> - u32 reloc_offset = size - PECOFF_RELOC_RESERVE -
> >> PECOFF_COMPAT_RESERVE;
> >> + uint32_t setup_offset = SECTOR_SIZE;
> >> + uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE -
> >> PECOFF_COMPAT_RESERVE;
> >> #ifdef CONFIG_EFI_MIXED
> >> - u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> >> + uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> >> #endif
> >> - u32 setup_size = reloc_offset - setup_offset;
> >> + uint32_t setup_size = reloc_offset - setup_offset;
> >>
> >> update_pecoff_section_header(".setup", setup_offset,
> >> setup_size);
> >> update_pecoff_section_header(".reloc", reloc_offset,
> >> PECOFF_RELOC_RESERVE);
> >> @@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned
> >> int size)
> >> * Modify .reloc section contents with a single entry. The
> >> * relocation is applied to offset 10 of the relocation
> >> section.
> >> */
> >> - put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
> >> - put_unaligned_le32(10, &buf[reloc_offset + 4]);
> >> + put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE,
> >> &buf[reloc_offset]);
> >> + put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset +
> >> 4]);
> >>
> >> #ifdef CONFIG_EFI_MIXED
> >> update_pecoff_section_header(".compat", compat_offset,
> >> PECOFF_COMPAT_RESERVE);
> >> @@ -224,19 +325,17 @@ static void
> >> update_pecoff_setup_and_reloc(unsigned int size)
> >> */
> >> buf[compat_offset] = 0x1;
> >> buf[compat_offset + 1] = 0x8;
> >> - put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
> >> + put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset
> >> + 2]);
> >> put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset +
> >> 4]);
> >> #endif
> >> }
> >>
> >> -static void update_pecoff_text(unsigned int text_start, unsigned int
> >> file_sz,
> >> +static unsigned int update_pecoff_sections(unsigned int text_start,
> >> unsigned int text_sz,
> >> unsigned int init_sz)
> >> {
> >> - unsigned int pe_header;
> >> - unsigned int text_sz = file_sz - text_start;
> >> + unsigned int file_sz = text_start + text_sz;
> >> unsigned int bss_sz = init_sz - file_sz;
> >> -
> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >>
> >> /*
> >> * The PE/COFF loader may load the image at an address which
> >> is
> >> @@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int
> >> text_start, unsigned int file_sz,
> >> * Size of code: Subtract the size of the first sector (512
> >> bytes)
> >> * which includes the header.
> >> */
> >> - put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header +
> >> 0x1c]);
> >> + put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz,
> >> &hdr->text_size);
> >>
> >> /* Size of image */
> >> - put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
> >> + put_unaligned_le32(init_sz, &hdr->image_size);
> >>
> >> /*
> >> * Address of entry point for PE/COFF executable
> >> */
> >> - put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header +
> >> 0x28]);
> >> + put_unaligned_le32(text_start + efi_pe_entry,
> >> &hdr->entry_point);
> >>
> >> update_pecoff_section_header_fields(".text", text_start,
> >> text_sz + bss_sz,
> >> text_sz, text_start);
> >> +
> >> + return text_start + file_sz;
> >> }
> >>
> >> static int reserve_pecoff_reloc_section(int c)
> >> @@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
> >> return PECOFF_RELOC_RESERVE;
> >> }
> >>
> >> -static void efi_stub_defaults(void)
> >> +static void efi_stub_update_defaults(void)
> >> {
> >> /* Defaults for old kernel */
> >> #ifdef CONFIG_X86_32
> >> @@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
> >>
> >> #ifdef CONFIG_EFI_MIXED
> >> if (efi32_stub_entry != addr)
> >> - die("32-bit and 64-bit EFI entry points do not
> >> match\n");
> >> + die("32-bit and 64-bit EFI entry points do not
> >> match");
> >> #endif
> >> #endif
> >> put_unaligned_le32(addr, &buf[0x264]);
> >> @@ -310,7 +411,7 @@ static inline void
> >> update_pecoff_setup_and_reloc(unsigned int size) {}
> >> static inline void update_pecoff_text(unsigned int text_start,
> >> unsigned int file_sz,
> >> unsigned int init_sz) {}
> >> -static inline void efi_stub_defaults(void) {}
> >> +static inline void efi_stub_update_defaults(void) {}
> >> static inline void efi_stub_entry_update(void) {}
> >>
> >> static inline int reserve_pecoff_reloc_section(int c)
> >> @@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
> >>
> >> static void parse_zoffset(char *fname)
> >> {
> >> - FILE *file;
> >> - char *p;
> >> - int c;
> >> + size_t size;
> >> + char *data, *p;
> >>
> >> - file = fopen(fname, "r");
> >> - if (!file)
> >> - die("Unable to open `%s': %m", fname);
> >> - c = fread(buf, 1, sizeof(buf) - 1, file);
> >> - if (ferror(file))
> >> - die("read-error on `zoffset.h'");
> >> - fclose(file);
> >> - buf[c] = 0;
> >> + data = map_file(fname, &size);
> >>
> >> - p = (char *)buf;
> >> + /* We can do that, since we mapped one byte more */
> >> + data[size] = 0;
> >> +
> >> + p = (char *)data;
> >>
> >> while (p && *p) {
> >> PARSE_ZOFS(p, efi32_stub_entry);
> >> @@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
> >> while (p && (*p == '\r' || *p == '\n'))
> >> p++;
> >> }
> >> +
> >> + unmap_file(data, size);
> >> }
> >>
> >> -int main(int argc, char ** argv)
> >> +static unsigned int read_setup(char *path)
> >> {
> >> - unsigned int i, sz, setup_sectors, init_sz;
> >> - int c;
> >> - u32 sys_size;
> >> - struct stat sb;
> >> - FILE *file, *dest;
> >> - int fd;
> >> - void *kernel;
> >> - u32 crc = 0xffffffffUL;
> >> -
> >> - efi_stub_defaults();
> >> -
> >> - if (argc != 5)
> >> - usage();
> >> - parse_zoffset(argv[3]);
> >> -
> >> - dest = fopen(argv[4], "w");
> >> - if (!dest)
> >> - die("Unable to write `%s': %m", argv[4]);
> >> + FILE *file;
> >> + unsigned int setup_size, file_size;
> >>
> >> /* Copy the setup code */
> >> - file = fopen(argv[1], "r");
> >> + file = fopen(path, "r");
> >> if (!file)
> >> - die("Unable to open `%s': %m", argv[1]);
> >> - c = fread(buf, 1, sizeof(buf), file);
> >> + die("Unable to open `%s': %m", path);
> >> +
> >> + file_size = fread(buf, 1, sizeof(buf), file);
> >> if (ferror(file))
> >> die("read-error on `setup'");
> >> - if (c < 1024)
> >> +
> >> + if (file_size < 2 * SECTOR_SIZE)
> >> die("The setup must be at least 1024 bytes");
> >> - if (get_unaligned_le16(&buf[510]) != 0xAA55)
> >> +
> >> + if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
> >> die("Boot block hasn't got boot flag (0xAA55)");
> >> +
> >> fclose(file);
> >>
> >> - c += reserve_pecoff_compat_section(c);
> >> - c += reserve_pecoff_reloc_section(c);
> >> + /* Reserve space for PE sections */
> >> + file_size += reserve_pecoff_compat_section(file_size);
> >> + file_size += reserve_pecoff_reloc_section(file_size);
> >>
> >> /* Pad unused space with zeros */
> >> - setup_sectors = (c + 511) / 512;
> >> - if (setup_sectors < SETUP_SECT_MIN)
> >> - setup_sectors = SETUP_SECT_MIN;
> >> - i = setup_sectors*512;
> >> - memset(buf+c, 0, i-c);
> >>
> >> - update_pecoff_setup_and_reloc(i);
> >> + setup_size = round_up(file_size, SECTOR_SIZE);
> >> +
> >> + if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
> >> + setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
> >> +
> >> + /*
> >> + * Global buffer is already initialised
> >> + * to 0, but just in case, zero out padding.
> >> + */
> >> +
> >> + memset(buf + file_size, 0, setup_size - file_size);
> >> +
> >> + return setup_size;
> >> +}
> >> +
> >> +int main(int argc, char **argv)
> >> +{
> >> + size_t kern_file_size;
> >> + unsigned int setup_size;
> >> + unsigned int setup_sectors;
> >> + unsigned int init_size;
> >> + unsigned int total_size;
> >> + unsigned int kern_size;
> >> + void *kernel;
> >> + uint32_t crc = 0xffffffffUL;
> >> + uint8_t *output;
> >> +
> >> + if (argc != 5)
> >> + usage();
> >> +
> >> + efi_stub_update_defaults();
> >> + parse_zoffset(argv[3]);
> >> +
> >> + setup_size = read_setup(argv[1]);
> >> +
> >> + setup_sectors = setup_size/SECTOR_SIZE;
> >>
> >> /* Set the default root device */
> >> put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
> >>
> >> - /* Open and stat the kernel file */
> >> - fd = open(argv[2], O_RDONLY);
> >> - if (fd < 0)
> >> - die("Unable to open `%s': %m", argv[2]);
> >> - if (fstat(fd, &sb))
> >> - die("Unable to stat `%s': %m", argv[2]);
> >> - sz = sb.st_size;
> >> - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
> >> - if (kernel == MAP_FAILED)
> >> - die("Unable to mmap '%s': %m", argv[2]);
> >> - /* Number of 16-byte paragraphs, including space for a 4-byte
> >> CRC */
> >> - sys_size = (sz + 15 + 4) / 16;
> >> + /* Map kernel file to memory */
> >> + kernel = map_file(argv[2], &kern_file_size);
> >> +
> >> #ifdef CONFIG_EFI_STUB
> >> - /*
> >> - * COFF requires minimum 32-byte alignment of sections, and
> >> - * adding a signature is problematic without that alignment.
> >> - */
> >> - sys_size = (sys_size + 1) & ~1;
> >> + /* PE specification require 512-byte minimum section file
> >> alignment */
> >> + kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
> >> + update_pecoff_setup_and_reloc(setup_size);
> >> +#else
> >> + /* Number of 16-byte paragraphs, including space for a 4-byte
> >> CRC */
> >> + kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
> >> #endif
> >>
> >> /* Patch the setup code with the appropriate size parameters
> >> */
> >> - buf[0x1f1] = setup_sectors-1;
> >> - put_unaligned_le32(sys_size, &buf[0x1f4]);
> >> + buf[0x1f1] = setup_sectors - 1;
> >> + put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
> >> +
> >> + /* Update kernel_info offset. */
> >> + put_unaligned_le32(kernel_info, &buf[0x268]);
> >> +
> >> + init_size = get_unaligned_le32(&buf[0x260]);
> >>
> >> - init_sz = get_unaligned_le32(&buf[0x260]);
> >> #ifdef CONFIG_EFI_STUB
> >> /*
> >> * The decompression buffer will start at ImageBase. When
> >> relocating
> >> @@ -458,45 +571,35 @@ int main(int argc, char ** argv)
> >> * For future-proofing, increase init_sz if necessary.
> >> */
> >>
> >> - if (init_sz - _end < i + _ehead) {
> >> - init_sz = (i + _ehead + _end + 4095) & ~4095;
> >> - put_unaligned_le32(init_sz, &buf[0x260]);
> >> + if (init_size - _end < setup_size + _ehead) {
> >> + init_size = round_up(setup_size + _ehead + _end,
> >> SECTION_ALIGNMENT);
> >> + put_unaligned_le32(init_size, &buf[0x260]);
> >> }
> >> -#endif
> >> - update_pecoff_text(setup_sectors * 512, i + (sys_size * 16),
> >> init_sz);
> >>
> >> - efi_stub_entry_update();
> >> -
> >> - /* Update kernel_info offset. */
> >> - put_unaligned_le32(kernel_info, &buf[0x268]);
> >> + total_size = update_pecoff_sections(setup_size, kern_size,
> >> init_size);
> >>
> >> - crc = partial_crc32(buf, i, crc);
> >> - if (fwrite(buf, 1, i, dest) != i)
> >> - die("Writing setup failed");
> >> + efi_stub_entry_update();
> >> +#else
> >> + (void)init_size;
> >> + total_size = setup_size + kern_size;
> >> +#endif
> >>
> >> - /* Copy the kernel code */
> >> - crc = partial_crc32(kernel, sz, crc);
> >> - if (fwrite(kernel, 1, sz, dest) != sz)
> >> - die("Writing kernel failed");
> >> + output = map_output_file(argv[4], total_size);
> >>
> >> - /* Add padding leaving 4 bytes for the checksum */
> >> - while (sz++ < (sys_size*16) - 4) {
> >> - crc = partial_crc32_one('\0', crc);
> >> - if (fwrite("\0", 1, 1, dest) != 1)
> >> - die("Writing padding failed");
> >> - }
> >> + memcpy(output, buf, setup_size);
> >> + memcpy(output + setup_size, kernel, kern_file_size);
> >> + memset(output + setup_size + kern_file_size, 0, kern_size -
> >> kern_file_size);
> >>
> >> - /* Write the CRC */
> >> - put_unaligned_le32(crc, buf);
> >> - if (fwrite(buf, 1, 4, dest) != 4)
> >> - die("Writing CRC failed");
> >> + /* Calculate and write kernel checksum. */
> >> + crc = partial_crc32(output, total_size - 4, crc);
> >> + put_unaligned_le32(crc, &output[total_size - 4]);
> >>
> >> - /* Catch any delayed write failures */
> >> - if (fclose(dest))
> >> - die("Writing image failed");
> >> + /* Catch any delayed write failures. */
> >> + if (munmap(output, total_size) < 0)
> >> + die("Writing kernel failed");
> >>
> >> - close(fd);
> >> + unmap_file(kernel, kern_file_size);
> >>
> >> - /* Everything is OK */
> >> + /* Everything is OK. */
> >> return 0;
> >> }
> >> --
> >> 2.37.4
> >>
On 2023-03-09 19:00, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>>
>> Doing it that way allows setting up stricter memory attributes,
>> simplifies boot code path and removes potential relocation
>> of kernel image.
>>
>> Wire up required interfaces and minimally initialize zero page
>> fields needed for it to function correctly.
>>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>> ---
>> arch/x86/boot/compressed/head_32.S | 50 ++++-
>> arch/x86/boot/compressed/head_64.S | 58 ++++-
>> drivers/firmware/efi/Kconfig | 2 +
>> drivers/firmware/efi/libstub/Makefile | 2 +-
>> .../firmware/efi/libstub/x86-extract-direct.c | 208
>> ++++++++++++++++++
>> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
>> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
>> 7 files changed, 338 insertions(+), 115 deletions(-)
>> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>>
>> diff --git a/arch/x86/boot/compressed/head_32.S
>> b/arch/x86/boot/compressed/head_32.S
>> index ead6007df1e5..0be75e5072ae 100644
>> --- a/arch/x86/boot/compressed/head_32.S
>> +++ b/arch/x86/boot/compressed/head_32.S
>> @@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
>>
>> #ifdef CONFIG_EFI_STUB
>> SYM_FUNC_START(efi32_stub_entry)
>> +/*
>> + * Calculate the delta between where we were compiled to run
>> + * at and where we were actually loaded at. This can only be done
>> + * with a short local call on x86. Nothing else will tell us what
>> + * address we are running at. The reserved chunk of the real-mode
>> + * data at 0x1e4 (defined as a scratch field) are used as the stack
>> + * for this calculation. Only 4 bytes are needed.
>> + */
>
> Please drop this comment
Will do.
>
>> + call 1f
>> +1: popl %ebx
>> + addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
>
> Please drop this and ...
>
>> +
>> + /* Clear BSS */
>> + xorl %eax, %eax
>> + leal _bss@GOTOFF(%ebx), %edi
>> + leal _ebss@GOTOFF(%ebx), %ecx
>
> just use (_bss - 1b) here (etc)
I was trying to be consistent with the code below, but doing it
that way is indeed better. I guess it is then fine to stop loading
the GOT address into %ebx, since the extraction code does not make
calls via the PLT?
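
To spell out the arithmetic behind the label-relative form, here is a
small C sketch. It is only a model of what the assembly computes; the
function names and the sample addresses are made up for illustration:

```c
#include <stdint.h>

/*
 * Runtime address of a symbol, given the runtime address of a local
 * label (what "call 1f; 1: popl %ebx" yields) and the link-time
 * addresses of both. The difference (sym - label) is a link-time
 * constant, so no GOT is involved.
 */
static uintptr_t runtime_addr(uintptr_t label_runtime,
			      uintptr_t label_link, uintptr_t sym_link)
{
	return label_runtime + (sym_link - label_link);
}

/* Number of 32-bit stores "rep stosl" performs for the range [bss, ebss). */
static uint32_t bss_dwords(uintptr_t bss, uintptr_t ebss)
{
	return (uint32_t)((ebss - bss) >> 2);
}
```

Note that the shift drops any remainder, so this relies on the BSS size
being a multiple of 4 bytes.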
>
>> + subl %edi, %ecx
>> + shrl $2, %ecx
>> + rep stosl
>> +
>> add $0x4, %esp
>> movl 8(%esp), %esi /* save boot_params pointer */
>> + movl %edx, %edi /* save GOT address */
>
> What does this do?
Hmm... It seems to be a remnant of the previous implementation
that I forgot to remove. I will remove it in v5.
>
>> call efi_main
>> - /* efi_main returns the possibly relocated address of
>> startup_32 */
>> - jmp *%eax
>> + movl %eax, %ecx
>> +
>> + /*
>> + * efi_main returns the possibly
>> + * relocated address of extracted kernel entry point.
>> + */
>> +
>> + cli
>> +
>> + /* Load new GDT */
>> + leal gdt@GOTOFF(%ebx), %eax
>> + movl %eax, 2(%eax)
>> + lgdt (%eax)
>> +
>> + /* Load segment registers with our descriptors */
>> + movl $__BOOT_DS, %eax
>> + movl %eax, %ds
>> + movl %eax, %es
>> + movl %eax, %fs
>> + movl %eax, %gs
>> + movl %eax, %ss
>> +
>> + /* Zero EFLAGS */
>> + pushl $0
>> + popfl
>> +
>> + jmp *%ecx
>> SYM_FUNC_END(efi32_stub_entry)
>> SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
>> #endif
> ...
Thanks,
Evgeniy Baskov
On 2023-03-09 19:49, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>>
>> Doing it that way allows setting up stricter memory attributes,
>> simplifies boot code path and removes potential relocation
>> of kernel image.
>>
>> Wire up required interfaces and minimally initialize zero page
>> fields needed for it to function correctly.
>>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>
> OK I just realized that there is a problem with this approach: since
> we now decompress the image while running in the EFI stub (i.e.,
> before ExitBootServices()), we cannot just randomly pick an
> EFI_CONVENTIONAL_MEMORY region to place the kernel; we need to
> allocate the pages using the boot services. Otherwise, subsequent
> allocations (or concurrent ones occurring in the firmware in event
> handlers etc) may land right in the middle, which is unlikely to be
> what we want.
It does allocate pages for the kernel.
I've marked the place below.
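
To illustrate the concern in isolation, here is a toy model in C. It is
not the stub code and not the real boot services interface: struct
toy_map and both functions are hypothetical stand-ins that only loosely
mirror AllocatePages() with AllocateAddress and AllocateAnyPages. Once a
range is claimed, a later allocation cannot land inside it:

```c
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16u

/* Toy stand-in for the firmware's view of memory: one flag per page. */
struct toy_map {
	bool used[NPAGES];
};

/* Claim `count` pages starting at page `first`; fail if any is taken. */
static bool toy_alloc_at(struct toy_map *m, size_t first, size_t count)
{
	if (first > NPAGES || count > NPAGES - first)
		return false;
	for (size_t i = first; i < first + count; i++)
		if (m->used[i])
			return false;
	for (size_t i = first; i < first + count; i++)
		m->used[i] = true;
	return true;
}

/* First-fit allocation; returns the first page index, or -1 on failure. */
static long toy_alloc_any(struct toy_map *m, size_t count)
{
	for (size_t first = 0; first + count <= NPAGES; first++)
		if (toy_alloc_at(m, first, count))
			return (long)first;
	return -1;
}
```

If the kernel were simply written into pages that the map still shows as
free, a later toy_alloc_any() call could hand out the same pages again.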
>
>
>> ---
>> arch/x86/boot/compressed/head_32.S | 50 ++++-
>> arch/x86/boot/compressed/head_64.S | 58 ++++-
>> drivers/firmware/efi/Kconfig | 2 +
>> drivers/firmware/efi/libstub/Makefile | 2 +-
>> .../firmware/efi/libstub/x86-extract-direct.c | 208
>> ++++++++++++++++++
>> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
>> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
>> 7 files changed, 338 insertions(+), 115 deletions(-)
>> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>>
>> diff --git a/arch/x86/boot/compressed/head_32.S
>> b/arch/x86/boot/compressed/head_32.S
>> index ead6007df1e5..0be75e5072ae 100644
>> --- a/arch/x86/boot/compressed/head_32.S
>> +++ b/arch/x86/boot/compressed/head_32.S
>> @@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
>>
>> #ifdef CONFIG_EFI_STUB
>> SYM_FUNC_START(efi32_stub_entry)
>> +/*
>> + * Calculate the delta between where we were compiled to run
>> + * at and where we were actually loaded at. This can only be done
>> + * with a short local call on x86. Nothing else will tell us what
>> + * address we are running at. The reserved chunk of the real-mode
>> + * data at 0x1e4 (defined as a scratch field) are used as the stack
>> + * for this calculation. Only 4 bytes are needed.
>> + */
>> + call 1f
>> +1: popl %ebx
>> + addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
>> +
>> + /* Clear BSS */
>> + xorl %eax, %eax
>> + leal _bss@GOTOFF(%ebx), %edi
>> + leal _ebss@GOTOFF(%ebx), %ecx
>> + subl %edi, %ecx
>> + shrl $2, %ecx
>> + rep stosl
>> +
>> add $0x4, %esp
>> movl 8(%esp), %esi /* save boot_params pointer */
>> + movl %edx, %edi /* save GOT address */
>> call efi_main
>> - /* efi_main returns the possibly relocated address of
>> startup_32 */
>> - jmp *%eax
>> + movl %eax, %ecx
>> +
>> + /*
>> + * efi_main returns the possibly
>> + * relocated address of extracted kernel entry point.
>> + */
>> +
>> + cli
>> +
>> + /* Load new GDT */
>> + leal gdt@GOTOFF(%ebx), %eax
>> + movl %eax, 2(%eax)
>> + lgdt (%eax)
>> +
>> + /* Load segment registers with our descriptors */
>> + movl $__BOOT_DS, %eax
>> + movl %eax, %ds
>> + movl %eax, %es
>> + movl %eax, %fs
>> + movl %eax, %gs
>> + movl %eax, %ss
>> +
>> + /* Zero EFLAGS */
>> + pushl $0
>> + popfl
>> +
>> + jmp *%ecx
>> SYM_FUNC_END(efi32_stub_entry)
>> SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
>> #endif
>> diff --git a/arch/x86/boot/compressed/head_64.S
>> b/arch/x86/boot/compressed/head_64.S
>> index 2dd8be0583d2..7cfef7bd0424 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -529,12 +529,64 @@ SYM_CODE_END(startup_64)
>> .org 0x390
>> #endif
>> SYM_FUNC_START(efi64_stub_entry)
>> + /* Preserve first parameter */
>> + movq %rdi, %r10
>> +
>> + /* Clear BSS */
>> + xorl %eax, %eax
>> + leaq _bss(%rip), %rdi
>> + leaq _ebss(%rip), %rcx
>> + subq %rdi, %rcx
>> + shrq $3, %rcx
>> + rep stosq
>> +
>> and $~0xf, %rsp /* realign the stack */
>> movq %rdx, %rbx /* save boot_params pointer */
>> + movq %r10, %rdi
>> call efi_main
>> - movq %rbx,%rsi
>> - leaq rva(startup_64)(%rax), %rax
>> - jmp *%rax
>> +
>> + cld
>> + cli
>> +
>> + movq %rbx, %rdi /* boot_params */
>> + movq %rax, %rsi /* decompressed kernel address */
>> +
>> + /* Make sure we have GDT with 32-bit code segment */
>> + leaq gdt64(%rip), %rax
>> + addq %rax, 2(%rax)
>> + lgdt (%rax)
>> +
>> + /* Setup data segments. */
>> + xorl %eax, %eax
>> + movl %eax, %ds
>> + movl %eax, %es
>> + movl %eax, %ss
>> + movl %eax, %fs
>> + movl %eax, %gs
>> +
>> + pushq %rsi
>> + pushq %rdi
>> +
>> + call load_stage1_idt
>> + call enable_nx_if_supported
>> +
>> + call trampoline_pgtable_init
>> + movq %rax, %rdx
>> +
>> +
>> + /* Swap %rsi and %rdi */
>> + popq %rsi
>> + popq %rdi
>> +
>> + /* Save the trampoline address in RCX */
>> + movq trampoline_32bit(%rip), %rcx
>> +
>> + /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
>> + pushq $__KERNEL32_CS
>> + leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
>> + pushq %rax
>> + lretq
>> +
>> SYM_FUNC_END(efi64_stub_entry)
>> SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
>> #endif
>> diff --git a/drivers/firmware/efi/Kconfig
>> b/drivers/firmware/efi/Kconfig
>> index 043ca31c114e..f50c2a84a754 100644
>> --- a/drivers/firmware/efi/Kconfig
>> +++ b/drivers/firmware/efi/Kconfig
>> @@ -58,6 +58,8 @@ config EFI_DXE_MEM_ATTRIBUTES
>> Use DXE services to check and alter memory protection
>> attributes during boot via EFISTUB to ensure that memory
>> ranges used by the kernel are writable and executable.
>> + This option also enables stricter memory attributes
>> + on compressed kernel PE image.
>>
>> config EFI_PARAMS_FROM_FDT
>> bool
>> diff --git a/drivers/firmware/efi/libstub/Makefile
>> b/drivers/firmware/efi/libstub/Makefile
>> index be8b8c6e8b40..99b81c95344c 100644
>> --- a/drivers/firmware/efi/libstub/Makefile
>> +++ b/drivers/firmware/efi/libstub/Makefile
>> @@ -88,7 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o
>> string.o intrinsics.o systable.o \
>>
>> lib-$(CONFIG_ARM) += arm32-stub.o
>> lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o arm64-entry.o
>> smbios.o
>> -lib-$(CONFIG_X86) += x86-stub.o
>> +lib-$(CONFIG_X86) += x86-stub.o x86-extract-direct.o
>> lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
>> lib-$(CONFIG_LOONGARCH) += loongarch.o
>> loongarch-stub.o
>>
>> diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c
>> b/drivers/firmware/efi/libstub/x86-extract-direct.c
>> new file mode 100644
>> index 000000000000..4ecbc4a9b3ed
>> --- /dev/null
>> +++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
>> @@ -0,0 +1,208 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/efi.h>
>> +#include <linux/elf.h>
>> +#include <linux/stddef.h>
>> +
>> +#include <asm/efi.h>
>> +#include <asm/e820/types.h>
>> +#include <asm/desc.h>
>> +#include <asm/boot.h>
>> +#include <asm/bootparam_utils.h>
>> +#include <asm/shared/extract.h>
>> +#include <asm/shared/pgtable.h>
>> +
>> +#include "efistub.h"
>> +#include "x86-stub.h"
>> +
>> +static efi_handle_t image_handle;
>> +
>> +static void do_puthex(unsigned long value)
>> +{
>> + efi_printk("%08lx", value);
>> +}
>> +
>> +static void do_putstr(const char *msg)
>> +{
>> + efi_printk("%s", msg);
>> +}
>> +
>> +static unsigned long do_map_range(unsigned long start,
>> + unsigned long end,
>> + unsigned int flags)
>> +{
>> + efi_status_t status;
>> +
>> + unsigned long size = end - start;
>> +
>> + if (flags & MAP_ALLOC) {
>> + unsigned long addr;
>> +
>> + status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
>> + &addr, start);
Memory for the kernel image is allocated here.
This function is called from boot/compressed/misc.c with the
MAP_ALLOC flag when the address for the kernel is picked.
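To make the attribute selection in do_map_range() easier to follow, here is a minimal user-space sketch of how the MAP_* flags translate into EFI memory attributes. The MAP_* values below are hypothetical placeholders (the real ones are defined by the patchset and may differ), and map_flags_to_efi_attr() and check_map_flags() are names introduced here for illustration only. The EFI_MEMORY_XP and EFI_MEMORY_RO values are the ones from the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical flag values for illustration only; the real MAP_*
 * constants are defined by the patchset and may differ.
 */
#define MAP_WRITE   0x02u
#define MAP_EXEC    0x04u
#define MAP_ALLOC   0x10u
#define MAP_PROTECT 0x20u

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* execute-protected */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* read-only */

/*
 * Mirrors the attribute selection in do_map_range(): a range that is
 * not requested as executable becomes XP, and a range that is not
 * requested as writable becomes RO.
 */
static uint64_t map_flags_to_efi_attr(unsigned int flags)
{
	uint64_t attr = 0;

	if (!(flags & MAP_EXEC))
		attr |= EFI_MEMORY_XP;
	if (!(flags & MAP_WRITE))
		attr |= EFI_MEMORY_RO;

	return attr;
}

/* Small self-check covering the four write/exec combinations. */
static void check_map_flags(void)
{
	assert(map_flags_to_efi_attr(MAP_ALLOC | MAP_WRITE) == EFI_MEMORY_XP);
	assert(map_flags_to_efi_attr(MAP_PROTECT | MAP_EXEC) == EFI_MEMORY_RO);
	assert(map_flags_to_efi_attr(MAP_PROTECT) ==
	       (EFI_MEMORY_XP | EFI_MEMORY_RO));
	assert(map_flags_to_efi_attr(MAP_PROTECT | MAP_WRITE | MAP_EXEC) == 0);
}
```

Calling check_map_flags() from a test harness exercises all four combinations. Under this mapping a range comes out both writable and executable only if the caller asks for MAP_WRITE and MAP_EXEC together.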
>> + if (status != EFI_SUCCESS) {
>> + efi_err("Unable to allocate memory for uncompressed kernel");
>> + efi_exit(image_handle, EFI_OUT_OF_RESOURCES);
>> + }
>> +
>> + if (start != addr) {
>> + efi_debug("Unable to allocate at given address"
>> + " (desired=0x%lx, actual=0x%lx)",
>> + (unsigned long)start, addr);
>> + start = addr;
>> + }
>> + }
>> +
>> + if ((flags & (MAP_PROTECT | MAP_ALLOC)) &&
>> + IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
>> + unsigned long attr = 0;
>> +
>> + if (!(flags & MAP_EXEC))
>> + attr |= EFI_MEMORY_XP;
>> +
>> + if (!(flags & MAP_WRITE))
>> + attr |= EFI_MEMORY_RO;
>> +
>> + status = efi_adjust_memory_range_protection(start,
>> size, attr);
>> + if (status != EFI_SUCCESS)
>> + efi_err("Unable to protect memory range");
>> + }
>> +
>> + return start;
>> +}
>> +
>> +/*
>> + * Trampoline takes 3 pages and can be loaded in first megabyte of
>> memory
>> + * with its end placed between 0 and 640k where BIOS might start.
>> + * (see arch/x86/boot/compressed/pgtable_64.c)
>> + */
>> +
>> +#ifdef CONFIG_64BIT
>> +static efi_status_t prepare_trampoline(void)
>> +{
>> + efi_status_t status;
>> +
>> + status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
>> + (unsigned long
>> *)&trampoline_32bit,
>> + TRAMPOLINE_32BIT_PLACEMENT_MAX);
>> +
>> + if (status != EFI_SUCCESS)
>> + return status;
>> +
>> + unsigned long trampoline_start = (unsigned
>> long)trampoline_32bit;
>> +
>> + memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
>> +
>> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
>> + /* First page of trampoline is a top level page table
>> */
>> + efi_adjust_memory_range_protection(trampoline_start,
>> + PAGE_SIZE,
>> + EFI_MEMORY_XP);
>> + }
>> +
>> + /* Second page of trampoline is the code (with a padding) */
>> +
>> + void *caddr = (void *)trampoline_32bit +
>> TRAMPOLINE_32BIT_CODE_OFFSET;
>> +
>> + memcpy(caddr, trampoline_32bit_src,
>> TRAMPOLINE_32BIT_CODE_SIZE);
>> +
>> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
>> + efi_adjust_memory_range_protection((unsigned
>> long)caddr,
>> + PAGE_SIZE,
>> + EFI_MEMORY_RO);
>> +
>> + /* And the last page of trampoline is the stack */
>> +
>> + efi_adjust_memory_range_protection(trampoline_start +
>> 2 * PAGE_SIZE,
>> + PAGE_SIZE,
>> + EFI_MEMORY_XP);
>> + }
>> +
>> + return EFI_SUCCESS;
>> +}
>> +#else
>> +static inline efi_status_t prepare_trampoline(void)
>> +{
>> + return EFI_SUCCESS;
>> +}
>> +#endif
>> +
>> +static efi_status_t init_loader_data(efi_handle_t handle,
>> + struct boot_params *params,
>> + struct efi_boot_memmap **map)
>> +{
>> + struct efi_info *efi = (void *)&params->efi_info;
>> + efi_status_t status;
>> +
>> + status = efi_get_memory_map(map, false);
>> +
>> + if (status != EFI_SUCCESS) {
>> + efi_err("Unable to get EFI memory map...\n");
>> + return status;
>> + }
>> +
>> + const char *signature = efi_is_64bit() ?
>> EFI64_LOADER_SIGNATURE
>> + :
>> EFI32_LOADER_SIGNATURE;
>> +
>> + memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
>> +
>> + efi->efi_memdesc_size = (*map)->desc_size;
>> + efi->efi_memdesc_version = (*map)->desc_ver;
>> + efi->efi_memmap_size = (*map)->map_size;
>> +
>> + efi_set_u64_split((unsigned long)(*map)->map,
>> + &efi->efi_memmap, &efi->efi_memmap_hi);
>> +
>> + efi_set_u64_split((unsigned long)efi_system_table,
>> + &efi->efi_systab, &efi->efi_systab_hi);
>> +
>> + image_handle = handle;
>> +
>> + return EFI_SUCCESS;
>> +}
>> +
>> +static void free_loader_data(struct boot_params *params, struct
>> efi_boot_memmap *map)
>> +{
>> + struct efi_info *efi = (void *)&params->efi_info;
>> +
>> + efi_bs_call(free_pool, map);
>> +
>> + efi->efi_memdesc_size = 0;
>> + efi->efi_memdesc_version = 0;
>> + efi->efi_memmap_size = 0;
>> + efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
>> +}
>> +
>> +extern unsigned char input_data[];
>> +extern unsigned int input_len, output_len;
>> +
>> +unsigned long extract_kernel_direct(efi_handle_t handle, struct
>> boot_params *params)
>> +{
>> +
>> + void *res;
>> + efi_status_t status;
>> + struct efi_extract_callbacks cb = { 0 };
>> +
>> + status = prepare_trampoline();
>> +
>> + if (status != EFI_SUCCESS)
>> + return 0;
>> +
>> + /* Prepare environment for do_extract_kernel() call */
>> + struct efi_boot_memmap *map = NULL;
>> + status = init_loader_data(handle, params, &map);
>> +
>> + if (status != EFI_SUCCESS)
>> + return 0;
>> +
>> + cb.puthex = do_puthex;
>> + cb.putstr = do_putstr;
>> + cb.map_range = do_map_range;
>> +
>> + res = efi_extract_kernel(params, &cb, input_data, input_len,
>> output_len);
>> +
>> + free_loader_data(params, map);
>> +
>> + return (unsigned long)res;
>> +}
>> diff --git a/drivers/firmware/efi/libstub/x86-stub.c
>> b/drivers/firmware/efi/libstub/x86-stub.c
>> index 7fb1eff88a18..1d1ab1911fd3 100644
>> --- a/drivers/firmware/efi/libstub/x86-stub.c
>> +++ b/drivers/firmware/efi/libstub/x86-stub.c
>> @@ -17,6 +17,7 @@
>> #include <asm/boot.h>
>>
>> #include "efistub.h"
>> +#include "x86-stub.h"
>>
>> /* Maximum physical address for 64-bit kernel with 4-level paging */
>> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>> @@ -24,7 +25,7 @@
>> const efi_system_table_t *efi_system_table;
>> const efi_dxe_services_table_t *efi_dxe_table;
>> u32 image_offset __section(".data");
>> -static efi_loaded_image_t *image = NULL;
>> +static efi_loaded_image_t *image __section(".data");
>>
>> static efi_status_t
>> preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct
>> pci_setup_rom **__rom)
>> @@ -212,55 +213,9 @@ static void
>> retrieve_apple_device_properties(struct boot_params *boot_params)
>> }
>> }
>>
>> -/*
>> - * Trampoline takes 2 pages and can be loaded in first megabyte of
>> memory
>> - * with its end placed between 128k and 640k where BIOS might start.
>> - * (see arch/x86/boot/compressed/pgtable_64.c)
>> - *
>> - * We cannot find exact trampoline placement since memory map
>> - * can be modified by UEFI, and it can alter the computed address.
>> - */
>> -
>> -#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
>> -#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
>> -
>> -void startup_32(struct boot_params *boot_params);
>> -
>> -static void
>> -setup_memory_protection(unsigned long image_base, unsigned long
>> image_size)
>> -{
>> - /*
>> - * Allow execution of possible trampoline used
>> - * for switching between 4- and 5-level page tables
>> - * and relocated kernel image.
>> - */
>> -
>> - efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
>> - TRAMPOLINE_PLACEMENT_SIZE,
>> 0);
>> -
>> -#ifdef CONFIG_64BIT
>> - if (image_base != (unsigned long)startup_32)
>> - efi_adjust_memory_range_protection(image_base,
>> image_size, 0);
>> -#else
>> - /*
>> - * Clear protection flags on a whole range of possible
>> - * addresses used for KASLR. We don't need to do that
>> - * on x86_64, since KASLR/extraction is performed after
>> - * dedicated identity page tables are built and we only
>> - * need to remove possible protection on relocated image
>> - * itself disregarding further relocations.
>> - */
>> - efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
>> - KERNEL_IMAGE_SIZE -
>> LOAD_PHYSICAL_ADDR,
>> - 0);
>> -#endif
>> -}
>> -
>> static const efi_char16_t apple[] = L"Apple";
>>
>> -static void setup_quirks(struct boot_params *boot_params,
>> - unsigned long image_base,
>> - unsigned long image_size)
>> +static void setup_quirks(struct boot_params *boot_params)
>> {
>> efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
>> efi_table_attr(efi_system_table, fw_vendor);
>> @@ -269,9 +224,6 @@ static void setup_quirks(struct boot_params
>> *boot_params,
>> if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
>> retrieve_apple_device_properties(boot_params);
>> }
>> -
>> - if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
>> - setup_memory_protection(image_base, image_size);
>> }
>>
>> /*
>> @@ -384,7 +336,7 @@ static void setup_graphics(struct boot_params
>> *boot_params)
>> }
>>
>>
>> -static void __noreturn efi_exit(efi_handle_t handle, efi_status_t
>> status)
>> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
>> {
>> efi_bs_call(exit, handle, status, 0, NULL);
>> for(;;)
>> @@ -707,8 +659,7 @@ static efi_status_t exit_boot(struct boot_params
>> *boot_params, void *handle)
>> }
>>
>> /*
>> - * On success, we return the address of startup_32, which has
>> potentially been
>> - * relocated by efi_relocate_kernel.
>> + * On success, we return extracted kernel entry point.
>> * On failure, we exit to the firmware via efi_exit instead of
>> returning.
>> */
>> asmlinkage unsigned long efi_main(efi_handle_t handle,
>> @@ -733,60 +684,6 @@ asmlinkage unsigned long efi_main(efi_handle_t
>> handle,
>> efi_dxe_table = NULL;
>> }
>>
>> - /*
>> - * If the kernel isn't already loaded at a suitable address,
>> - * relocate it.
>> - *
>> - * It must be loaded above LOAD_PHYSICAL_ADDR.
>> - *
>> - * The maximum address for 64-bit is 1 << 46 for 4-level
>> paging. This
>> - * is defined as the macro MAXMEM, but unfortunately that is
>> not a
>> - * compile-time constant if 5-level paging is configured, so
>> we instead
>> - * define our own macro for use here.
>> - *
>> - * For 32-bit, the maximum address is complicated to figure
>> out, for
>> - * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same
>> as what
>> - * KASLR uses.
>> - *
>> - * Also relocate it if image_offset is zero, i.e. the kernel
>> wasn't
>> - * loaded by LoadImage, but rather by a bootloader that called
>> the
>> - * handover entry. The reason we must always relocate in this
>> case is
>> - * to handle the case of systemd-boot booting a unified kernel
>> image,
>> - * which is a PE executable that contains the bzImage and an
>> initrd as
>> - * COFF sections. The initrd section is placed after the
>> bzImage
>> - * without ensuring that there are at least init_size bytes
>> available
>> - * for the bzImage, and thus the compressed kernel's startup
>> code may
>> - * overwrite the initrd unless it is moved out of the way.
>> - */
>> -
>> - buffer_start = ALIGN(bzimage_addr - image_offset,
>> - hdr->kernel_alignment);
>> - buffer_end = buffer_start + hdr->init_size;
>> -
>> - if ((buffer_start < LOAD_PHYSICAL_ADDR)
>> ||
>> - (IS_ENABLED(CONFIG_X86_32) && buffer_end >
>> KERNEL_IMAGE_SIZE) ||
>> - (IS_ENABLED(CONFIG_X86_64) && buffer_end >
>> MAXMEM_X86_64_4LEVEL) ||
>> - (image_offset == 0)) {
>> - extern char _bss[];
>> -
>> - status = efi_relocate_kernel(&bzimage_addr,
>> - (unsigned long)_bss -
>> bzimage_addr,
>> - hdr->init_size,
>> - hdr->pref_address,
>> - hdr->kernel_alignment,
>> - LOAD_PHYSICAL_ADDR);
>> - if (status != EFI_SUCCESS) {
>> - efi_err("efi_relocate_kernel() failed!\n");
>> - goto fail;
>> - }
>> - /*
>> - * Now that we've copied the kernel elsewhere, we no
>> longer
>> - * have a set up block before startup_32(), so reset
>> image_offset
>> - * to zero in case it was set earlier.
>> - */
>> - image_offset = 0;
>> - }
>> -
>> #ifdef CONFIG_CMDLINE_BOOL
>> status = efi_parse_options(CONFIG_CMDLINE);
>> if (status != EFI_SUCCESS) {
>> @@ -843,7 +740,11 @@ asmlinkage unsigned long efi_main(efi_handle_t
>> handle,
>>
>> setup_efi_pci(boot_params);
>>
>> - setup_quirks(boot_params, bzimage_addr, buffer_end -
>> buffer_start);
>> + setup_quirks(boot_params);
>> +
>> + bzimage_addr = extract_kernel_direct(handle, boot_params);
>> + if (!bzimage_addr)
>> + goto fail;
>>
>> status = exit_boot(boot_params, handle);
>> if (status != EFI_SUCCESS) {
>> diff --git a/drivers/firmware/efi/libstub/x86-stub.h
>> b/drivers/firmware/efi/libstub/x86-stub.h
>> new file mode 100644
>> index 000000000000..baecc7c6e602
>> --- /dev/null
>> +++ b/drivers/firmware/efi/libstub/x86-stub.h
>> @@ -0,0 +1,14 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +#ifndef _DRIVERS_FIRMWARE_EFI_X86STUB_H
>> +#define _DRIVERS_FIRMWARE_EFI_X86STUB_H
>> +
>> +#include <linux/efi.h>
>> +
>> +#include <asm/bootparam.h>
>> +
>> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status);
>> +unsigned long extract_kernel_direct(efi_handle_t handle, struct
>> boot_params *boot_params);
>> +void startup_32(struct boot_params *boot_params);
>> +
>> +#endif
>> --
>> 2.37.4
>>
On Thu, 9 Mar 2023 at 18:10, Evgeniy Baskov <[email protected]> wrote:
>
> On 2023-03-09 19:49, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
> >>
> >> Doing it that way allows setting up stricter memory attributes,
> >> simplifies boot code path and removes potential relocation
> >> of kernel image.
> >>
> >> Wire up required interfaces and minimally initialize zero page
> >> fields needed for it to function correctly.
> >>
> >> Tested-by: Peter Jones <[email protected]>
> >> Signed-off-by: Evgeniy Baskov <[email protected]>
> >
> > OK I just realized that there is a problem with this approach: since
> > we now decompress the image while running in the EFI stub (i.e.,
> > before ExitBootServices()), we cannot just randomly pick a
> > EFI_CONVENTIONAL_MEMORY region to place the kernel, we need to
> > allocate the pages using the boot services. Otherwise, subsequent
> > allocations (or concurrent ones occurring in the firmware in event
> > handlers etc) may land right in the middle, which is unlikely to be
> > what we want.
>
> It does allocate pages for the kernel.
> I've marked the place below.
>
Ah excellent, thanks for clearing that up.
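
For readers following the thread, the concern about placing the kernel in unallocated EFI_CONVENTIONAL_MEMORY can be illustrated with a toy model. This is only a sketch under stated assumptions: the page-granular map, the first-fit policy, and all toy_* names are hypothetical and do not reflect how any real firmware allocator works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 16

/* Toy memory map: one flag per page, true when the page is allocated. */
struct toy_memmap {
	bool allocated[TOY_PAGES];
};

/* First-fit allocation of `count` pages; returns the first page or -1. */
static int toy_allocate_pages(struct toy_memmap *map, size_t count)
{
	for (size_t start = 0; start + count <= TOY_PAGES; start++) {
		size_t used = 0;

		for (size_t i = 0; i < count; i++)
			used += map->allocated[start + i];
		if (used)
			continue;

		for (size_t i = 0; i < count; i++)
			map->allocated[start + i] = true;
		return (int)start;
	}
	return -1;
}

/* True when page ranges [a, a + a_len) and [b, b + b_len) intersect. */
static bool toy_overlap(size_t a, size_t a_len, size_t b, size_t b_len)
{
	return a < b + b_len && b < a + a_len;
}

static void toy_demo(void)
{
	/* Kernel "picks" pages 0..3 without allocating them. */
	struct toy_memmap m1 = { { false } };
	int fw = toy_allocate_pages(&m1, 2);

	/* A later firmware allocation lands inside the kernel range. */
	assert(fw == 0);
	assert(toy_overlap(0, 4, (size_t)fw, 2));

	/* Kernel allocates its pages first, as do_map_range() does. */
	struct toy_memmap m2 = { { false } };
	int kernel = toy_allocate_pages(&m2, 4);

	fw = toy_allocate_pages(&m2, 2);
	assert(kernel == 0 && fw == 4);
	assert(!toy_overlap((size_t)kernel, 4, (size_t)fw, 2));
}
```

In the first case nothing tells the allocator that pages 0..3 are in use, so the second request is served from the same pages. In the second case the kernel range is recorded in the map and the later request is placed after it.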
> >
> >
> >> ---
> >> arch/x86/boot/compressed/head_32.S | 50 ++++-
> >> arch/x86/boot/compressed/head_64.S | 58 ++++-
> >> drivers/firmware/efi/Kconfig | 2 +
> >> drivers/firmware/efi/libstub/Makefile | 2 +-
> >> .../firmware/efi/libstub/x86-extract-direct.c | 208 ++++++++++++++++++
> >> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
> >> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
> >> 7 files changed, 338 insertions(+), 115 deletions(-)
> >> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
> >> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
> >>
> >> diff --git a/arch/x86/boot/compressed/head_32.S
> >> b/arch/x86/boot/compressed/head_32.S
> >> index ead6007df1e5..0be75e5072ae 100644
> >> --- a/arch/x86/boot/compressed/head_32.S
> >> +++ b/arch/x86/boot/compressed/head_32.S
> >> @@ -152,11 +152,57 @@ SYM_FUNC_END(startup_32)
> >>
> >> #ifdef CONFIG_EFI_STUB
> >> SYM_FUNC_START(efi32_stub_entry)
> >> +/*
> >> + * Calculate the delta between where we were compiled to run
> >> + * at and where we were actually loaded at. This can only be done
> >> + * with a short local call on x86. Nothing else will tell us what
> >> + * address we are running at. The reserved chunk of the real-mode
> >> + * data at 0x1e4 (defined as a scratch field) are used as the stack
> >> + * for this calculation. Only 4 bytes are needed.
> >> + */
> >> + call 1f
> >> +1: popl %ebx
> >> + addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %ebx
> >> +
> >> + /* Clear BSS */
> >> + xorl %eax, %eax
> >> + leal _bss@GOTOFF(%ebx), %edi
> >> + leal _ebss@GOTOFF(%ebx), %ecx
> >> + subl %edi, %ecx
> >> + shrl $2, %ecx
> >> + rep stosl
> >> +
> >> add $0x4, %esp
> >> movl 8(%esp), %esi /* save boot_params pointer */
> >> + movl %edx, %edi /* save GOT address */
> >> call efi_main
> >> - /* efi_main returns the possibly relocated address of
> >> startup_32 */
> >> - jmp *%eax
> >> + movl %eax, %ecx
> >> +
> >> + /*
> >> + * efi_main returns the possibly
> >> + * relocated address of extracted kernel entry point.
> >> + */
> >> +
> >> + cli
> >> +
> >> + /* Load new GDT */
> >> + leal gdt@GOTOFF(%ebx), %eax
> >> + movl %eax, 2(%eax)
> >> + lgdt (%eax)
> >> +
> >> + /* Load segment registers with our descriptors */
> >> + movl $__BOOT_DS, %eax
> >> + movl %eax, %ds
> >> + movl %eax, %es
> >> + movl %eax, %fs
> >> + movl %eax, %gs
> >> + movl %eax, %ss
> >> +
> >> + /* Zero EFLAGS */
> >> + pushl $0
> >> + popfl
> >> +
> >> + jmp *%ecx
> >> SYM_FUNC_END(efi32_stub_entry)
> >> SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
> >> #endif
> >> diff --git a/arch/x86/boot/compressed/head_64.S
> >> b/arch/x86/boot/compressed/head_64.S
> >> index 2dd8be0583d2..7cfef7bd0424 100644
> >> --- a/arch/x86/boot/compressed/head_64.S
> >> +++ b/arch/x86/boot/compressed/head_64.S
> >> @@ -529,12 +529,64 @@ SYM_CODE_END(startup_64)
> >> .org 0x390
> >> #endif
> >> SYM_FUNC_START(efi64_stub_entry)
> >> + /* Preserve first parameter */
> >> + movq %rdi, %r10
> >> +
> >> + /* Clear BSS */
> >> + xorl %eax, %eax
> >> + leaq _bss(%rip), %rdi
> >> + leaq _ebss(%rip), %rcx
> >> + subq %rdi, %rcx
> >> + shrq $3, %rcx
> >> + rep stosq
> >> +
> >> and $~0xf, %rsp /* realign the stack */
> >> movq %rdx, %rbx /* save boot_params pointer */
> >> + movq %r10, %rdi
> >> call efi_main
> >> - movq %rbx,%rsi
> >> - leaq rva(startup_64)(%rax), %rax
> >> - jmp *%rax
> >> +
> >> + cld
> >> + cli
> >> +
> >> + movq %rbx, %rdi /* boot_params */
> >> + movq %rax, %rsi /* decompressed kernel address */
> >> +
> >> + /* Make sure we have GDT with 32-bit code segment */
> >> + leaq gdt64(%rip), %rax
> >> + addq %rax, 2(%rax)
> >> + lgdt (%rax)
> >> +
> >> + /* Setup data segments. */
> >> + xorl %eax, %eax
> >> + movl %eax, %ds
> >> + movl %eax, %es
> >> + movl %eax, %ss
> >> + movl %eax, %fs
> >> + movl %eax, %gs
> >> +
> >> + pushq %rsi
> >> + pushq %rdi
> >> +
> >> + call load_stage1_idt
> >> + call enable_nx_if_supported
> >> +
> >> + call trampoline_pgtable_init
> >> + movq %rax, %rdx
> >> +
> >> +
> >> + /* Swap %rsi and %rdi */
> >> + popq %rsi
> >> + popq %rdi
> >> +
> >> + /* Save the trampoline address in RCX */
> >> + movq trampoline_32bit(%rip), %rcx
> >> +
> >> + /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
> >> + pushq $__KERNEL32_CS
> >> + leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rcx), %rax
> >> + pushq %rax
> >> + lretq
> >> +
> >> SYM_FUNC_END(efi64_stub_entry)
> >> SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
> >> #endif
> >> diff --git a/drivers/firmware/efi/Kconfig
> >> b/drivers/firmware/efi/Kconfig
> >> index 043ca31c114e..f50c2a84a754 100644
> >> --- a/drivers/firmware/efi/Kconfig
> >> +++ b/drivers/firmware/efi/Kconfig
> >> @@ -58,6 +58,8 @@ config EFI_DXE_MEM_ATTRIBUTES
> >> Use DXE services to check and alter memory protection
> >> attributes during boot via EFISTUB to ensure that memory
> >> ranges used by the kernel are writable and executable.
> >> + This option also enables stricter memory attributes
> >> + on compressed kernel PE image.
> >>
> >> config EFI_PARAMS_FROM_FDT
> >> bool
> >> diff --git a/drivers/firmware/efi/libstub/Makefile
> >> b/drivers/firmware/efi/libstub/Makefile
> >> index be8b8c6e8b40..99b81c95344c 100644
> >> --- a/drivers/firmware/efi/libstub/Makefile
> >> +++ b/drivers/firmware/efi/libstub/Makefile
> >> @@ -88,7 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o
> >> string.o intrinsics.o systable.o \
> >>
> >> lib-$(CONFIG_ARM) += arm32-stub.o
> >> lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o arm64-entry.o
> >> smbios.o
> >> -lib-$(CONFIG_X86) += x86-stub.o
> >> +lib-$(CONFIG_X86) += x86-stub.o x86-extract-direct.o
> >> lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
> >> lib-$(CONFIG_LOONGARCH) += loongarch.o
> >> loongarch-stub.o
> >>
> >> diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c
> >> b/drivers/firmware/efi/libstub/x86-extract-direct.c
> >> new file mode 100644
> >> index 000000000000..4ecbc4a9b3ed
> >> --- /dev/null
> >> +++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
> >> @@ -0,0 +1,208 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +
> >> +#include <linux/acpi.h>
> >> +#include <linux/efi.h>
> >> +#include <linux/elf.h>
> >> +#include <linux/stddef.h>
> >> +
> >> +#include <asm/efi.h>
> >> +#include <asm/e820/types.h>
> >> +#include <asm/desc.h>
> >> +#include <asm/boot.h>
> >> +#include <asm/bootparam_utils.h>
> >> +#include <asm/shared/extract.h>
> >> +#include <asm/shared/pgtable.h>
> >> +
> >> +#include "efistub.h"
> >> +#include "x86-stub.h"
> >> +
> >> +static efi_handle_t image_handle;
> >> +
> >> +static void do_puthex(unsigned long value)
> >> +{
> >> + efi_printk("%08lx", value);
> >> +}
> >> +
> >> +static void do_putstr(const char *msg)
> >> +{
> >> + efi_printk("%s", msg);
> >> +}
> >> +
> >> +static unsigned long do_map_range(unsigned long start,
> >> + unsigned long end,
> >> + unsigned int flags)
> >> +{
> >> + efi_status_t status;
> >> +
> >> + unsigned long size = end - start;
> >> +
> >> + if (flags & MAP_ALLOC) {
> >> + unsigned long addr;
> >> +
> >> + status = efi_low_alloc_above(size,
> >> CONFIG_PHYSICAL_ALIGN,
> >> + &addr, start);
>
> Memory for the kernel image is allocated here.
> This function is called from boot/compressed/misc.c with the
> MAP_ALLOC flag when the address for the kernel is picked.
>
> >> + if (status != EFI_SUCCESS) {
> >> + efi_err("Unable to allocate memory for uncompressed kernel");
> >> + efi_exit(image_handle, EFI_OUT_OF_RESOURCES);
> >> + }
> >> +
> >> + if (start != addr) {
> >> + efi_debug("Unable to allocate at given address"
> >> + " (desired=0x%lx, actual=0x%lx)",
> >> + (unsigned long)start, addr);
> >> + start = addr;
> >> + }
> >> + }
> >> +
> >> + if ((flags & (MAP_PROTECT | MAP_ALLOC)) &&
> >> + IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> >> + unsigned long attr = 0;
> >> +
> >> + if (!(flags & MAP_EXEC))
> >> + attr |= EFI_MEMORY_XP;
> >> +
> >> + if (!(flags & MAP_WRITE))
> >> + attr |= EFI_MEMORY_RO;
> >> +
> >> + status = efi_adjust_memory_range_protection(start,
> >> size, attr);
> >> + if (status != EFI_SUCCESS)
> >> + efi_err("Unable to protect memory range");
> >> + }
> >> +
> >> + return start;
> >> +}
> >> +
> >> +/*
> >> + * Trampoline takes 3 pages and can be loaded in first megabyte of
> >> memory
> >> + * with its end placed between 0 and 640k where BIOS might start.
> >> + * (see arch/x86/boot/compressed/pgtable_64.c)
> >> + */
> >> +
> >> +#ifdef CONFIG_64BIT
> >> +static efi_status_t prepare_trampoline(void)
> >> +{
> >> + efi_status_t status;
> >> +
> >> + status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
> >> + (unsigned long
> >> *)&trampoline_32bit,
> >> + TRAMPOLINE_32BIT_PLACEMENT_MAX);
> >> +
> >> + if (status != EFI_SUCCESS)
> >> + return status;
> >> +
> >> + unsigned long trampoline_start = (unsigned
> >> long)trampoline_32bit;
> >> +
> >> + memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
> >> +
> >> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> >> + /* First page of trampoline is a top level page table
> >> */
> >> + efi_adjust_memory_range_protection(trampoline_start,
> >> + PAGE_SIZE,
> >> + EFI_MEMORY_XP);
> >> + }
> >> +
> >> + /* Second page of trampoline is the code (with a padding) */
> >> +
> >> + void *caddr = (void *)trampoline_32bit +
> >> TRAMPOLINE_32BIT_CODE_OFFSET;
> >> +
> >> + memcpy(caddr, trampoline_32bit_src,
> >> TRAMPOLINE_32BIT_CODE_SIZE);
> >> +
> >> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> >> + efi_adjust_memory_range_protection((unsigned
> >> long)caddr,
> >> + PAGE_SIZE,
> >> + EFI_MEMORY_RO);
> >> +
> >> + /* And the last page of trampoline is the stack */
> >> +
> >> + efi_adjust_memory_range_protection(trampoline_start +
> >> 2 * PAGE_SIZE,
> >> + PAGE_SIZE,
> >> + EFI_MEMORY_XP);
> >> + }
> >> +
> >> + return EFI_SUCCESS;
> >> +}
> >> +#else
> >> +static inline efi_status_t prepare_trampoline(void)
> >> +{
> >> + return EFI_SUCCESS;
> >> +}
> >> +#endif
> >> +
> >> +static efi_status_t init_loader_data(efi_handle_t handle,
> >> + struct boot_params *params,
> >> + struct efi_boot_memmap **map)
> >> +{
> >> + struct efi_info *efi = (void *)&params->efi_info;
> >> + efi_status_t status;
> >> +
> >> + status = efi_get_memory_map(map, false);
> >> +
> >> + if (status != EFI_SUCCESS) {
> >> + efi_err("Unable to get EFI memory map...\n");
> >> + return status;
> >> + }
> >> +
> >> + const char *signature = efi_is_64bit() ?
> >> EFI64_LOADER_SIGNATURE
> >> + :
> >> EFI32_LOADER_SIGNATURE;
> >> +
> >> + memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
> >> +
> >> + efi->efi_memdesc_size = (*map)->desc_size;
> >> + efi->efi_memdesc_version = (*map)->desc_ver;
> >> + efi->efi_memmap_size = (*map)->map_size;
> >> +
> >> + efi_set_u64_split((unsigned long)(*map)->map,
> >> + &efi->efi_memmap, &efi->efi_memmap_hi);
> >> +
> >> + efi_set_u64_split((unsigned long)efi_system_table,
> >> + &efi->efi_systab, &efi->efi_systab_hi);
> >> +
> >> + image_handle = handle;
> >> +
> >> + return EFI_SUCCESS;
> >> +}
> >> +
> >> +static void free_loader_data(struct boot_params *params, struct
> >> efi_boot_memmap *map)
> >> +{
> >> + struct efi_info *efi = (void *)&params->efi_info;
> >> +
> >> + efi_bs_call(free_pool, map);
> >> +
> >> + efi->efi_memdesc_size = 0;
> >> + efi->efi_memdesc_version = 0;
> >> + efi->efi_memmap_size = 0;
> >> + efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
> >> +}
> >> +
> >> +extern unsigned char input_data[];
> >> +extern unsigned int input_len, output_len;
> >> +
> >> +unsigned long extract_kernel_direct(efi_handle_t handle, struct
> >> boot_params *params)
> >> +{
> >> +
> >> + void *res;
> >> + efi_status_t status;
> >> + struct efi_extract_callbacks cb = { 0 };
> >> +
> >> + status = prepare_trampoline();
> >> +
> >> + if (status != EFI_SUCCESS)
> >> + return 0;
> >> +
> >> + /* Prepare environment for do_extract_kernel() call */
> >> + struct efi_boot_memmap *map = NULL;
> >> + status = init_loader_data(handle, params, &map);
> >> +
> >> + if (status != EFI_SUCCESS)
> >> + return 0;
> >> +
> >> + cb.puthex = do_puthex;
> >> + cb.putstr = do_putstr;
> >> + cb.map_range = do_map_range;
> >> +
> >> + res = efi_extract_kernel(params, &cb, input_data, input_len,
> >> output_len);
> >> +
> >> + free_loader_data(params, map);
> >> +
> >> + return (unsigned long)res;
> >> +}
> >> diff --git a/drivers/firmware/efi/libstub/x86-stub.c
> >> b/drivers/firmware/efi/libstub/x86-stub.c
> >> index 7fb1eff88a18..1d1ab1911fd3 100644
> >> --- a/drivers/firmware/efi/libstub/x86-stub.c
> >> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> >> @@ -17,6 +17,7 @@
> >> #include <asm/boot.h>
> >>
> >> #include "efistub.h"
> >> +#include "x86-stub.h"
> >>
> >> /* Maximum physical address for 64-bit kernel with 4-level paging */
> >> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
> >> @@ -24,7 +25,7 @@
> >> const efi_system_table_t *efi_system_table;
> >> const efi_dxe_services_table_t *efi_dxe_table;
> >> u32 image_offset __section(".data");
> >> -static efi_loaded_image_t *image = NULL;
> >> +static efi_loaded_image_t *image __section(".data");
> >>
> >> static efi_status_t
> >> preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct
> >> pci_setup_rom **__rom)
> >> @@ -212,55 +213,9 @@ static void
> >> retrieve_apple_device_properties(struct boot_params *boot_params)
> >> }
> >> }
> >>
> >> -/*
> >> - * Trampoline takes 2 pages and can be loaded in first megabyte of
> >> memory
> >> - * with its end placed between 128k and 640k where BIOS might start.
> >> - * (see arch/x86/boot/compressed/pgtable_64.c)
> >> - *
> >> - * We cannot find exact trampoline placement since memory map
> >> - * can be modified by UEFI, and it can alter the computed address.
> >> - */
> >> -
> >> -#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
> >> -#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
> >> -
> >> -void startup_32(struct boot_params *boot_params);
> >> -
> >> -static void
> >> -setup_memory_protection(unsigned long image_base, unsigned long
> >> image_size)
> >> -{
> >> - /*
> >> - * Allow execution of possible trampoline used
> >> - * for switching between 4- and 5-level page tables
> >> - * and relocated kernel image.
> >> - */
> >> -
> >> - efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
> >> - TRAMPOLINE_PLACEMENT_SIZE,
> >> 0);
> >> -
> >> -#ifdef CONFIG_64BIT
> >> - if (image_base != (unsigned long)startup_32)
> >> - efi_adjust_memory_range_protection(image_base,
> >> image_size, 0);
> >> -#else
> >> - /*
> >> - * Clear protection flags on a whole range of possible
> >> - * addresses used for KASLR. We don't need to do that
> >> - * on x86_64, since KASLR/extraction is performed after
> >> - * dedicated identity page tables are built and we only
> >> - * need to remove possible protection on relocated image
> >> - * itself disregarding further relocations.
> >> - */
> >> - efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
> >> - KERNEL_IMAGE_SIZE -
> >> LOAD_PHYSICAL_ADDR,
> >> - 0);
> >> -#endif
> >> -}
> >> -
> >> static const efi_char16_t apple[] = L"Apple";
> >>
> >> -static void setup_quirks(struct boot_params *boot_params,
> >> - unsigned long image_base,
> >> - unsigned long image_size)
> >> +static void setup_quirks(struct boot_params *boot_params)
> >> {
> >> efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
> >> efi_table_attr(efi_system_table, fw_vendor);
> >> @@ -269,9 +224,6 @@ static void setup_quirks(struct boot_params
> >> *boot_params,
> >> if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
> >> retrieve_apple_device_properties(boot_params);
> >> }
> >> -
> >> - if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
> >> - setup_memory_protection(image_base, image_size);
> >> }
> >>
> >> /*
> >> @@ -384,7 +336,7 @@ static void setup_graphics(struct boot_params
> >> *boot_params)
> >> }
> >>
> >>
> >> -static void __noreturn efi_exit(efi_handle_t handle, efi_status_t
> >> status)
> >> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> >> {
> >> efi_bs_call(exit, handle, status, 0, NULL);
> >> for(;;)
> >> @@ -707,8 +659,7 @@ static efi_status_t exit_boot(struct boot_params
> >> *boot_params, void *handle)
> >> }
> >>
> >> /*
> >> - * On success, we return the address of startup_32, which has
> >> potentially been
> >> - * relocated by efi_relocate_kernel.
> >> + * On success, we return extracted kernel entry point.
> >> * On failure, we exit to the firmware via efi_exit instead of
> >> returning.
> >> */
> >> asmlinkage unsigned long efi_main(efi_handle_t handle,
> >> @@ -733,60 +684,6 @@ asmlinkage unsigned long efi_main(efi_handle_t
> >> handle,
> >> efi_dxe_table = NULL;
> >> }
> >>
> >> - /*
> >> - * If the kernel isn't already loaded at a suitable address,
> >> - * relocate it.
> >> - *
> >> - * It must be loaded above LOAD_PHYSICAL_ADDR.
> >> - *
> >> - * The maximum address for 64-bit is 1 << 46 for 4-level
> >> paging. This
> >> - * is defined as the macro MAXMEM, but unfortunately that is
> >> not a
> >> - * compile-time constant if 5-level paging is configured, so
> >> we instead
> >> - * define our own macro for use here.
> >> - *
> >> - * For 32-bit, the maximum address is complicated to figure
> >> out, for
> >> - * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same
> >> as what
> >> - * KASLR uses.
> >> - *
> >> - * Also relocate it if image_offset is zero, i.e. the kernel
> >> wasn't
> >> - * loaded by LoadImage, but rather by a bootloader that called
> >> the
> >> - * handover entry. The reason we must always relocate in this
> >> case is
> >> - * to handle the case of systemd-boot booting a unified kernel
> >> image,
> >> - * which is a PE executable that contains the bzImage and an
> >> initrd as
> >> - * COFF sections. The initrd section is placed after the
> >> bzImage
> >> - * without ensuring that there are at least init_size bytes
> >> available
> >> - * for the bzImage, and thus the compressed kernel's startup
> >> code may
> >> - * overwrite the initrd unless it is moved out of the way.
> >> - */
> >> -
> >> - buffer_start = ALIGN(bzimage_addr - image_offset,
> >> - hdr->kernel_alignment);
> >> - buffer_end = buffer_start + hdr->init_size;
> >> -
> >> - if ((buffer_start < LOAD_PHYSICAL_ADDR)
> >> ||
> >> - (IS_ENABLED(CONFIG_X86_32) && buffer_end >
> >> KERNEL_IMAGE_SIZE) ||
> >> - (IS_ENABLED(CONFIG_X86_64) && buffer_end >
> >> MAXMEM_X86_64_4LEVEL) ||
> >> - (image_offset == 0)) {
> >> - extern char _bss[];
> >> -
> >> - status = efi_relocate_kernel(&bzimage_addr,
> >> - (unsigned long)_bss -
> >> bzimage_addr,
> >> - hdr->init_size,
> >> - hdr->pref_address,
> >> - hdr->kernel_alignment,
> >> - LOAD_PHYSICAL_ADDR);
> >> - if (status != EFI_SUCCESS) {
> >> - efi_err("efi_relocate_kernel() failed!\n");
> >> - goto fail;
> >> - }
> >> - /*
> >> - * Now that we've copied the kernel elsewhere, we no
> >> longer
> >> - * have a set up block before startup_32(), so reset
> >> image_offset
> >> - * to zero in case it was set earlier.
> >> - */
> >> - image_offset = 0;
> >> - }
> >> -
> >> #ifdef CONFIG_CMDLINE_BOOL
> >> status = efi_parse_options(CONFIG_CMDLINE);
> >> if (status != EFI_SUCCESS) {
> >> @@ -843,7 +740,11 @@ asmlinkage unsigned long efi_main(efi_handle_t
> >> handle,
> >>
> >> setup_efi_pci(boot_params);
> >>
> >> - setup_quirks(boot_params, bzimage_addr, buffer_end -
> >> buffer_start);
> >> + setup_quirks(boot_params);
> >> +
> >> + bzimage_addr = extract_kernel_direct(handle, boot_params);
> >> + if (!bzimage_addr)
> >> + goto fail;
> >>
> >> status = exit_boot(boot_params, handle);
> >> if (status != EFI_SUCCESS) {
> >> diff --git a/drivers/firmware/efi/libstub/x86-stub.h
> >> b/drivers/firmware/efi/libstub/x86-stub.h
> >> new file mode 100644
> >> index 000000000000..baecc7c6e602
> >> --- /dev/null
> >> +++ b/drivers/firmware/efi/libstub/x86-stub.h
> >> @@ -0,0 +1,14 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +
> >> +#ifndef _DRIVERS_FIRMWARE_EFI_X86STUB_H
> >> +#define _DRIVERS_FIRMWARE_EFI_X86STUB_H
> >> +
> >> +#include <linux/efi.h>
> >> +
> >> +#include <asm/bootparam.h>
> >> +
> >> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status);
> >> +unsigned long extract_kernel_direct(efi_handle_t handle, struct
> >> boot_params *boot_params);
> >> +void startup_32(struct boot_params *boot_params);
> >> +
> >> +#endif
> >> --
> >> 2.37.4
> >>
On 2023-03-09 19:50, Ard Biesheuvel wrote:
> On Thu, 9 Mar 2023 at 17:25, Evgeniy Baskov <[email protected]> wrote:
>>
>> On 2023-03-09 18:57, Ard Biesheuvel wrote:
>> > On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
>> >>
>> >> Use newer C standard. Since kernel requires C99 compiler now,
>> >> we can make use of the new features to make the code more readable.
>> >>
>> >> Use mmap() for reading files also to make things simpler.
>> >>
>> >> Replace most magic numbers with defines.
>> >>
>> >> Should have no functional changes. This is done in preparation for the
>> >> next changes that make the generated PE header more spec compliant.
>> >>
>> >> Tested-by: Mario Limonciello <[email protected]>
>> >> Tested-by: Peter Jones <[email protected]>
>> >> Signed-off-by: Evgeniy Baskov <[email protected]>
>> >> ---
>> >> arch/x86/boot/tools/build.c | 387
>> >> +++++++++++++++++++++++-------------
>> >> 1 file changed, 245 insertions(+), 142 deletions(-)
>> >>
>> >> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
>> >> index bd247692b701..fbc5315af032 100644
>> >> --- a/arch/x86/boot/tools/build.c
>> >> +++ b/arch/x86/boot/tools/build.c
>> >> @@ -25,20 +25,21 @@
>> >> * Substantially overhauled by H. Peter Anvin, April 2007
>> >> */
>> >>
>> >> +#include <fcntl.h>
>> >> +#include <stdarg.h>
>> >> +#include <stdint.h>
>> >> #include <stdio.h>
>> >> -#include <string.h>
>> >> #include <stdlib.h>
>> >> -#include <stdarg.h>
>> >> -#include <sys/types.h>
>> >> +#include <string.h>
>> >> +#include <sys/mman.h>
>> >> #include <sys/stat.h>
>> >> +#include <sys/types.h>
>> >> #include <unistd.h>
>> >> -#include <fcntl.h>
>> >> -#include <sys/mman.h>
>> >> +
>> >> #include <tools/le_byteshift.h>
>> >> +#include <linux/pe.h>
>> >>
>> >> -typedef unsigned char u8;
>> >> -typedef unsigned short u16;
>> >> -typedef unsigned int u32;
>> >> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
>> >>
>> >> #define DEFAULT_MAJOR_ROOT 0
>> >> #define DEFAULT_MINOR_ROOT 0
>> >> @@ -48,8 +49,13 @@ typedef unsigned int u32;
>> >> #define SETUP_SECT_MIN 5
>> >> #define SETUP_SECT_MAX 64
>> >>
>> >> +#define PARAGRAPH_SIZE 16
>> >> +#define SECTOR_SIZE 512
>> >> +#define FILE_ALIGNMENT 512
>> >> +#define SECTION_ALIGNMENT 4096
>> >> +
>> >> /* This must be large enough to hold the entire setup */
>> >> -u8 buf[SETUP_SECT_MAX*512];
>> >> +uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
>> >>
>> >> #define PECOFF_RELOC_RESERVE 0x20
>> >>
>> >> @@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
>> >> #define PECOFF_COMPAT_RESERVE 0x0
>> >> #endif
>> >>
>> >> +#define RELOC_SECTION_SIZE 10
>> >> +
>> >> +/* PE header has different format depending on the architecture */
>> >> +#ifdef CONFIG_X86_64
>> >> +typedef struct pe32plus_opt_hdr pe_opt_hdr;
>> >> +#else
>> >> +typedef struct pe32_opt_hdr pe_opt_hdr;
>> >> +#endif
>> >> +
>> >> +static inline struct pe_hdr *get_pe_header(uint8_t *buf)
>> >> +{
>> >> + uint32_t pe_offset =
>> >> get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
>> >> + return (struct pe_hdr *)(buf + pe_offset);
>> >> +}
>> >> +
>> >> +static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
>> >> +{
>> >> + return (pe_opt_hdr *)(get_pe_header(buf) + 1);
>> >> +}
>> >> +
>> >> +static inline struct section_header *get_sections(uint8_t *buf)
>> >> +{
>> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>> >> + uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
>> >> + uint8_t *sections = (uint8_t *)(hdr + 1) +
>> >> n_data_dirs*sizeof(struct data_dirent);
>> >> + return (struct section_header *)sections;
>> >> +}
>> >> +
>> >> +static inline struct data_directory *get_data_dirs(uint8_t *buf)
>> >> +{
>> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>> >> + return (struct data_directory *)(hdr + 1);
>> >> +}
>> >> +
>> >> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
>> >
>> > Can we drop this conditional?
>>
>> Without CONFIG_EFI_DXE_MEM_ATTRIBUTES memory attributes are not
>> getting applied anywhere, so this would break 'nokaslr' on UEFI
>> implementations that honor section attributes.
>>
>
> How so? This only affects the mappings that are created by UEFI for
> the decompressor binary, right?
I was thinking about the in-place decompression, but now I've realized
that I was wrong since in-place decompression cannot happen when booting
via the stub. I'll remove the ifdef.
>
>> KASLR is already broken without that option on implementations
>> that disallow execution of the free memory though. But unlike
>> free memory, sections are more likely to get protected, I think.
>>
>
> We need to allocate those pages properly in any case (see my other
> reply) so it is no longer free memory.
It should be fine, as I explained.
The only thing that is a little unexpected is that the kernel might
shift even with 'nokaslr' when the LOAD_PHYSICAL_ADDR is already taken
by some firmware allocation (or by us). This should cause no real
problems, since the kernel is required to be relocatable for the
EFISTUB.
>
>> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE |
>> >> IMAGE_SCN_ALIGN_4096BYTES)
>> >> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE |
>> >> IMAGE_SCN_ALIGN_4096BYTES)
>> >> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
>> >
>> > Please drop the alignment flags - they don't apply to executables,
>> > only to object files.
>>
>> Got it, will remove them in v5.
>>
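For reference, a minimal sketch of what the section flag definitions could look like once both review comments above are applied (the CONFIG_EFI_DXE_MEM_ATTRIBUTES conditional dropped and the alignment flags removed). This is an illustration only, not the v5 code. The numeric values are the PE/COFF section characteristics, repeated here just to keep the sketch self-contained, and scn_violates_wx() is a hypothetical helper that is not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* PE/COFF section characteristics (memory permission bits). */
#define IMAGE_SCN_MEM_EXECUTE	0x20000000u
#define IMAGE_SCN_MEM_READ	0x40000000u
#define IMAGE_SCN_MEM_WRITE	0x80000000u

/* Assumed shape of the flags without the ifdef and without alignment bits. */
#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)
#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE)
#define SCN_RO (IMAGE_SCN_MEM_READ)

/* Hypothetical helper: true if a section is both writable and executable. */
static bool scn_violates_wx(uint32_t flags)
{
	return (flags & IMAGE_SCN_MEM_WRITE) &&
	       (flags & IMAGE_SCN_MEM_EXECUTE);
}
```

With these definitions none of SCN_RW, SCN_RX or SCN_RO is writable and executable at the same time, which is what a strict PE loader would check for.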
>> >
>> >> +#else
>> >> +/* With memory protection disabled all sections are RWX */
>> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
>> >> + IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
>> >> +#define SCN_RX SCN_RW
>> >> +#define SCN_RO SCN_RW
>> >> +#endif
>> >> +
>> >> static unsigned long efi32_stub_entry;
>> >> static unsigned long efi64_stub_entry;
>> >> static unsigned long efi_pe_entry;
>> >> @@ -70,7 +122,7 @@ static unsigned long _end;
>> >>
>> >>
>> >> /*----------------------------------------------------------------------*/
>> >>
>> >> -static const u32 crctab32[] = {
>> >> +static const uint32_t crctab32[] = {
>> >
>> > Replacing all the type names makes this patch very messy. Can we back
>> > that out please?
>>
>> Ok, I will revert them.
>>
>> >
>> >> 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
>> >> 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
>> >> 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
>> >> @@ -125,12 +177,12 @@ static const u32 crctab32[] = {
>> >> 0x2d02ef8d
>> >> };
>> >>
>> >> -static u32 partial_crc32_one(u8 c, u32 crc)
>> >> +static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
>> >> {
>> >> return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
>> >> }
>> >>
>> >> -static u32 partial_crc32(const u8 *s, int len, u32 crc)
>> >> +static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t
>> >> crc)
>> >> {
>> >> while (len--)
>> >> crc = partial_crc32_one(*s++, crc);
>> >> @@ -152,57 +204,106 @@ static void usage(void)
>> >> die("Usage: build setup system zoffset.h image");
>> >> }
>> >>
>> >> +static void *map_file(const char *path, size_t *psize)
>> >> +{
>> >> + struct stat statbuf;
>> >> + size_t size;
>> >> + void *addr;
>> >> + int fd;
>> >> +
>> >> + fd = open(path, O_RDONLY);
>> >> + if (fd < 0)
>> >> + die("Unable to open `%s': %m", path);
>> >> + if (fstat(fd, &statbuf))
>> >> + die("Unable to stat `%s': %m", path);
>> >> +
>> >> + size = statbuf.st_size;
>> >> + /*
>> >> + * Map one byte more, to allow adding null-terminator
>> >> + * for text files.
>> >> + */
>> >> + addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE,
>> >> MAP_PRIVATE, fd, 0);
>> >> + if (addr == MAP_FAILED)
>> >> + die("Unable to mmap '%s': %m", path);
>> >> +
>> >> + close(fd);
>> >> +
>> >> + *psize = size;
>> >> + return addr;
>> >> +}
>> >> +
>> >> +static void unmap_file(void *addr, size_t size)
>> >> +{
>> >> + munmap(addr, size + 1);
>> >> +}
>> >> +
>> >> +static void *map_output_file(const char *path, size_t size)
>> >> +{
>> >> + void *addr;
>> >> + int fd;
>> >> +
>> >> + fd = open(path, O_RDWR | O_CREAT, 0660);
>> >> + if (fd < 0)
>> >> + die("Unable to create `%s': %m", path);
>> >> +
>> >> + if (ftruncate(fd, size))
>> >> + die("Unable to resize `%s': %m", path);
>> >> +
>> >> + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
>> >> fd, 0);
>> >> + if (addr == MAP_FAILED)
>> >> + die("Unable to mmap '%s': %m", path);
>> >> +
>> >> + return addr;
>> >> +}
>> >> +
>> >> #ifdef CONFIG_EFI_STUB
>> >>
>> >> -static void update_pecoff_section_header_fields(char *section_name,
>> >> u32 vma, u32 size, u32 datasz, u32 offset)
>> >> +static void update_pecoff_section_header_fields(char *section_name,
>> >> uint32_t vma,
>> >> + uint32_t size,
>> >> uint32_t datasz,
>> >> + uint32_t offset)
>> >> {
>> >> unsigned int pe_header;
>> >> unsigned short num_sections;
>> >> - u8 *section;
>> >> + struct section_header *section;
>> >>
>> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
>> >> - num_sections = get_unaligned_le16(&buf[pe_header + 6]);
>> >> -
>> >> -#ifdef CONFIG_X86_32
>> >> - section = &buf[pe_header + 0xa8];
>> >> -#else
>> >> - section = &buf[pe_header + 0xb8];
>> >> -#endif
>> >> + struct pe_hdr *hdr = get_pe_header(buf);
>> >> + num_sections = get_unaligned_le16(&hdr->sections);
>> >> + section = get_sections(buf);
>> >>
>> >> while (num_sections > 0) {
>> >> - if (strncmp((char*)section, section_name, 8) == 0) {
>> >> + if (strncmp(section->name, section_name, 8) == 0) {
>> >> /* section header size field */
>> >> - put_unaligned_le32(size, section + 0x8);
>> >> + put_unaligned_le32(size,
>> >> &section->virtual_size);
>> >>
>> >> /* section header vma field */
>> >> - put_unaligned_le32(vma, section + 0xc);
>> >> + put_unaligned_le32(vma,
>> >> &section->virtual_address);
>> >>
>> >> /* section header 'size of initialised data'
>> >> field */
>> >> - put_unaligned_le32(datasz, section + 0x10);
>> >> + put_unaligned_le32(datasz,
>> >> &section->raw_data_size);
>> >>
>> >> /* section header 'file offset' field */
>> >> - put_unaligned_le32(offset, section + 0x14);
>> >> + put_unaligned_le32(offset,
>> >> &section->data_addr);
>> >>
>> >> break;
>> >> }
>> >> - section += 0x28;
>> >> + section++;
>> >> num_sections--;
>> >> }
>> >> }
>> >>
>> >> -static void update_pecoff_section_header(char *section_name, u32
>> >> offset, u32 size)
>> >> +static void update_pecoff_section_header(char *section_name, uint32_t
>> >> offset, uint32_t size)
>> >> {
>> >> update_pecoff_section_header_fields(section_name, offset,
>> >> size, size, offset);
>> >> }
>> >>
>> >> static void update_pecoff_setup_and_reloc(unsigned int size)
>> >> {
>> >> - u32 setup_offset = 0x200;
>> >> - u32 reloc_offset = size - PECOFF_RELOC_RESERVE -
>> >> PECOFF_COMPAT_RESERVE;
>> >> + uint32_t setup_offset = SECTOR_SIZE;
>> >> + uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE -
>> >> PECOFF_COMPAT_RESERVE;
>> >> #ifdef CONFIG_EFI_MIXED
>> >> - u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
>> >> + uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
>> >> #endif
>> >> - u32 setup_size = reloc_offset - setup_offset;
>> >> + uint32_t setup_size = reloc_offset - setup_offset;
>> >>
>> >> update_pecoff_section_header(".setup", setup_offset,
>> >> setup_size);
>> >> update_pecoff_section_header(".reloc", reloc_offset,
>> >> PECOFF_RELOC_RESERVE);
>> >> @@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned
>> >> int size)
>> >> * Modify .reloc section contents with a single entry. The
>> >> * relocation is applied to offset 10 of the relocation
>> >> section.
>> >> */
>> >> - put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
>> >> - put_unaligned_le32(10, &buf[reloc_offset + 4]);
>> >> + put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE,
>> >> &buf[reloc_offset]);
>> >> + put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset +
>> >> 4]);
>> >>
>> >> #ifdef CONFIG_EFI_MIXED
>> >> update_pecoff_section_header(".compat", compat_offset,
>> >> PECOFF_COMPAT_RESERVE);
>> >> @@ -224,19 +325,17 @@ static void
>> >> update_pecoff_setup_and_reloc(unsigned int size)
>> >> */
>> >> buf[compat_offset] = 0x1;
>> >> buf[compat_offset + 1] = 0x8;
>> >> - put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
>> >> + put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset
>> >> + 2]);
>> >> put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset +
>> >> 4]);
>> >> #endif
>> >> }
>> >>
>> >> -static void update_pecoff_text(unsigned int text_start, unsigned int
>> >> file_sz,
>> >> +static unsigned int update_pecoff_sections(unsigned int text_start,
>> >> unsigned int text_sz,
>> >> unsigned int init_sz)
>> >> {
>> >> - unsigned int pe_header;
>> >> - unsigned int text_sz = file_sz - text_start;
>> >> + unsigned int file_sz = text_start + text_sz;
>> >> unsigned int bss_sz = init_sz - file_sz;
>> >> -
>> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
>> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
>> >>
>> >> /*
>> >> * The PE/COFF loader may load the image at an address which
>> >> is
>> >> @@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int
>> >> text_start, unsigned int file_sz,
>> >> * Size of code: Subtract the size of the first sector (512
>> >> bytes)
>> >> * which includes the header.
>> >> */
>> >> - put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header +
>> >> 0x1c]);
>> >> + put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz,
>> >> &hdr->text_size);
>> >>
>> >> /* Size of image */
>> >> - put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
>> >> + put_unaligned_le32(init_sz, &hdr->image_size);
>> >>
>> >> /*
>> >> * Address of entry point for PE/COFF executable
>> >> */
>> >> - put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header +
>> >> 0x28]);
>> >> + put_unaligned_le32(text_start + efi_pe_entry,
>> >> &hdr->entry_point);
>> >>
>> >> update_pecoff_section_header_fields(".text", text_start,
>> >> text_sz + bss_sz,
>> >> text_sz, text_start);
>> >> +
>> >> + return text_start + file_sz;
>> >> }
>> >>
>> >> static int reserve_pecoff_reloc_section(int c)
>> >> @@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
>> >> return PECOFF_RELOC_RESERVE;
>> >> }
>> >>
>> >> -static void efi_stub_defaults(void)
>> >> +static void efi_stub_update_defaults(void)
>> >> {
>> >> /* Defaults for old kernel */
>> >> #ifdef CONFIG_X86_32
>> >> @@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
>> >>
>> >> #ifdef CONFIG_EFI_MIXED
>> >> if (efi32_stub_entry != addr)
>> >> - die("32-bit and 64-bit EFI entry points do not
>> >> match\n");
>> >> + die("32-bit and 64-bit EFI entry points do not
>> >> match");
>> >> #endif
>> >> #endif
>> >> put_unaligned_le32(addr, &buf[0x264]);
>> >> @@ -310,7 +411,7 @@ static inline void
>> >> update_pecoff_setup_and_reloc(unsigned int size) {}
>> >> static inline void update_pecoff_text(unsigned int text_start,
>> >> unsigned int file_sz,
>> >> unsigned int init_sz) {}
>> >> -static inline void efi_stub_defaults(void) {}
>> >> +static inline void efi_stub_update_defaults(void) {}
>> >> static inline void efi_stub_entry_update(void) {}
>> >>
>> >> static inline int reserve_pecoff_reloc_section(int c)
>> >> @@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
>> >>
>> >> static void parse_zoffset(char *fname)
>> >> {
>> >> - FILE *file;
>> >> - char *p;
>> >> - int c;
>> >> + size_t size;
>> >> + char *data, *p;
>> >>
>> >> - file = fopen(fname, "r");
>> >> - if (!file)
>> >> - die("Unable to open `%s': %m", fname);
>> >> - c = fread(buf, 1, sizeof(buf) - 1, file);
>> >> - if (ferror(file))
>> >> - die("read-error on `zoffset.h'");
>> >> - fclose(file);
>> >> - buf[c] = 0;
>> >> + data = map_file(fname, &size);
>> >>
>> >> - p = (char *)buf;
>> >> + /* We can do that, since we mapped one byte more */
>> >> + data[size] = 0;
>> >> +
>> >> + p = (char *)data;
>> >>
>> >> while (p && *p) {
>> >> PARSE_ZOFS(p, efi32_stub_entry);
>> >> @@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
>> >> while (p && (*p == '\r' || *p == '\n'))
>> >> p++;
>> >> }
>> >> +
>> >> + unmap_file(data, size);
>> >> }
>> >>
>> >> -int main(int argc, char ** argv)
>> >> +static unsigned int read_setup(char *path)
>> >> {
>> >> - unsigned int i, sz, setup_sectors, init_sz;
>> >> - int c;
>> >> - u32 sys_size;
>> >> - struct stat sb;
>> >> - FILE *file, *dest;
>> >> - int fd;
>> >> - void *kernel;
>> >> - u32 crc = 0xffffffffUL;
>> >> -
>> >> - efi_stub_defaults();
>> >> -
>> >> - if (argc != 5)
>> >> - usage();
>> >> - parse_zoffset(argv[3]);
>> >> -
>> >> - dest = fopen(argv[4], "w");
>> >> - if (!dest)
>> >> - die("Unable to write `%s': %m", argv[4]);
>> >> + FILE *file;
>> >> + unsigned int setup_size, file_size;
>> >>
>> >> /* Copy the setup code */
>> >> - file = fopen(argv[1], "r");
>> >> + file = fopen(path, "r");
>> >> if (!file)
>> >> - die("Unable to open `%s': %m", argv[1]);
>> >> - c = fread(buf, 1, sizeof(buf), file);
>> >> + die("Unable to open `%s': %m", path);
>> >> +
>> >> + file_size = fread(buf, 1, sizeof(buf), file);
>> >> if (ferror(file))
>> >> die("read-error on `setup'");
>> >> - if (c < 1024)
>> >> +
>> >> + if (file_size < 2 * SECTOR_SIZE)
>> >> die("The setup must be at least 1024 bytes");
>> >> - if (get_unaligned_le16(&buf[510]) != 0xAA55)
>> >> +
>> >> + if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
>> >> die("Boot block hasn't got boot flag (0xAA55)");
>> >> +
>> >> fclose(file);
>> >>
>> >> - c += reserve_pecoff_compat_section(c);
>> >> - c += reserve_pecoff_reloc_section(c);
>> >> + /* Reserve space for PE sections */
>> >> + file_size += reserve_pecoff_compat_section(file_size);
>> >> + file_size += reserve_pecoff_reloc_section(file_size);
>> >>
>> >> /* Pad unused space with zeros */
>> >> - setup_sectors = (c + 511) / 512;
>> >> - if (setup_sectors < SETUP_SECT_MIN)
>> >> - setup_sectors = SETUP_SECT_MIN;
>> >> - i = setup_sectors*512;
>> >> - memset(buf+c, 0, i-c);
>> >>
>> >> - update_pecoff_setup_and_reloc(i);
>> >> + setup_size = round_up(file_size, SECTOR_SIZE);
>> >> +
>> >> + if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
>> >> + setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
>> >> +
>> >> + /*
>> >> + * Global buffer is already initialised
>> >> + * to 0, but just in case, zero out padding.
>> >> + */
>> >> +
>> >> + memset(buf + file_size, 0, setup_size - file_size);
>> >> +
>> >> + return setup_size;
>> >> +}
>> >> +
>> >> +int main(int argc, char **argv)
>> >> +{
>> >> + size_t kern_file_size;
>> >> + unsigned int setup_size;
>> >> + unsigned int setup_sectors;
>> >> + unsigned int init_size;
>> >> + unsigned int total_size;
>> >> + unsigned int kern_size;
>> >> + void *kernel;
>> >> + uint32_t crc = 0xffffffffUL;
>> >> + uint8_t *output;
>> >> +
>> >> + if (argc != 5)
>> >> + usage();
>> >> +
>> >> + efi_stub_update_defaults();
>> >> + parse_zoffset(argv[3]);
>> >> +
>> >> + setup_size = read_setup(argv[1]);
>> >> +
>> >> + setup_sectors = setup_size/SECTOR_SIZE;
>> >>
>> >> /* Set the default root device */
>> >> put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
>> >>
>> >> - /* Open and stat the kernel file */
>> >> - fd = open(argv[2], O_RDONLY);
>> >> - if (fd < 0)
>> >> - die("Unable to open `%s': %m", argv[2]);
>> >> - if (fstat(fd, &sb))
>> >> - die("Unable to stat `%s': %m", argv[2]);
>> >> - sz = sb.st_size;
>> >> - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
>> >> - if (kernel == MAP_FAILED)
>> >> - die("Unable to mmap '%s': %m", argv[2]);
>> >> - /* Number of 16-byte paragraphs, including space for a 4-byte
>> >> CRC */
>> >> - sys_size = (sz + 15 + 4) / 16;
>> >> + /* Map kernel file to memory */
>> >> + kernel = map_file(argv[2], &kern_file_size);
>> >> +
>> >> #ifdef CONFIG_EFI_STUB
>> >> - /*
>> >> - * COFF requires minimum 32-byte alignment of sections, and
>> >> - * adding a signature is problematic without that alignment.
>> >> - */
>> >> - sys_size = (sys_size + 1) & ~1;
>> >> + /* PE specification require 512-byte minimum section file
>> >> alignment */
>> >> + kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
>> >> + update_pecoff_setup_and_reloc(setup_size);
>> >> +#else
>> >> + /* Number of 16-byte paragraphs, including space for a 4-byte
>> >> CRC */
>> >> + kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
>> >> #endif
>> >>
>> >> /* Patch the setup code with the appropriate size parameters
>> >> */
>> >> - buf[0x1f1] = setup_sectors-1;
>> >> - put_unaligned_le32(sys_size, &buf[0x1f4]);
>> >> + buf[0x1f1] = setup_sectors - 1;
>> >> + put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
>> >> +
>> >> + /* Update kernel_info offset. */
>> >> + put_unaligned_le32(kernel_info, &buf[0x268]);
>> >> +
>> >> + init_size = get_unaligned_le32(&buf[0x260]);
>> >>
>> >> - init_sz = get_unaligned_le32(&buf[0x260]);
>> >> #ifdef CONFIG_EFI_STUB
>> >> /*
>> >> * The decompression buffer will start at ImageBase. When
>> >> relocating
>> >> @@ -458,45 +571,35 @@ int main(int argc, char ** argv)
>> >> * For future-proofing, increase init_sz if necessary.
>> >> */
>> >>
>> >> - if (init_sz - _end < i + _ehead) {
>> >> - init_sz = (i + _ehead + _end + 4095) & ~4095;
>> >> - put_unaligned_le32(init_sz, &buf[0x260]);
>> >> + if (init_size - _end < setup_size + _ehead) {
>> >> + init_size = round_up(setup_size + _ehead + _end,
>> >> SECTION_ALIGNMENT);
>> >> + put_unaligned_le32(init_size, &buf[0x260]);
>> >> }
>> >> -#endif
>> >> - update_pecoff_text(setup_sectors * 512, i + (sys_size * 16),
>> >> init_sz);
>> >>
>> >> - efi_stub_entry_update();
>> >> -
>> >> - /* Update kernel_info offset. */
>> >> - put_unaligned_le32(kernel_info, &buf[0x268]);
>> >> + total_size = update_pecoff_sections(setup_size, kern_size,
>> >> init_size);
>> >>
>> >> - crc = partial_crc32(buf, i, crc);
>> >> - if (fwrite(buf, 1, i, dest) != i)
>> >> - die("Writing setup failed");
>> >> + efi_stub_entry_update();
>> >> +#else
>> >> + (void)init_size;
>> >> + total_size = setup_size + kern_size;
>> >> +#endif
>> >>
>> >> - /* Copy the kernel code */
>> >> - crc = partial_crc32(kernel, sz, crc);
>> >> - if (fwrite(kernel, 1, sz, dest) != sz)
>> >> - die("Writing kernel failed");
>> >> + output = map_output_file(argv[4], total_size);
>> >>
>> >> - /* Add padding leaving 4 bytes for the checksum */
>> >> - while (sz++ < (sys_size*16) - 4) {
>> >> - crc = partial_crc32_one('\0', crc);
>> >> - if (fwrite("\0", 1, 1, dest) != 1)
>> >> - die("Writing padding failed");
>> >> - }
>> >> + memcpy(output, buf, setup_size);
>> >> + memcpy(output + setup_size, kernel, kern_file_size);
>> >> + memset(output + setup_size + kern_file_size, 0, kern_size -
>> >> kern_file_size);
>> >>
>> >> - /* Write the CRC */
>> >> - put_unaligned_le32(crc, buf);
>> >> - if (fwrite(buf, 1, 4, dest) != 4)
>> >> - die("Writing CRC failed");
>> >> + /* Calculate and write kernel checksum. */
>> >> + crc = partial_crc32(output, total_size - 4, crc);
>> >> + put_unaligned_le32(crc, &output[total_size - 4]);
>> >>
>> >> - /* Catch any delayed write failures */
>> >> - if (fclose(dest))
>> >> - die("Writing image failed");
>> >> + /* Catch any delayed write failures. */
>> >> + if (munmap(output, total_size) < 0)
>> >> + die("Writing kernel failed");
>> >>
>> >> - close(fd);
>> >> + unmap_file(kernel, kern_file_size);
>> >>
>> >> - /* Everything is OK */
>> >> + /* Everything is OK. */
>> >> return 0;
>> >> }
>> >> --
>> >> 2.37.4
>> >>
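For readers following the size arithmetic in main() of the quoted patch, here is a minimal standalone sketch of how the padded kernel size, the value written to the sys_size field, and the trailing CRC slot relate. The round_up() macro and the PARAGRAPH_SIZE and SECTOR_SIZE values are copied from the diff. The struct layout type, the image_layout() helper and the sample sizes are made up for illustration, and the sketch assumes the total image size is simply setup_size + kern_size, as in the non-EFI branch.

```c
#include <stdint.h>

/* Copied from the quoted patch; only correct when n is a power of two. */
#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))

#define PARAGRAPH_SIZE	16
#define SECTOR_SIZE	512

/* Hypothetical container for the derived sizes. */
struct layout {
	unsigned int kern_size;		/* kernel + zero padding + 4-byte CRC */
	unsigned int sys_size;		/* kern_size in 16-byte paragraphs */
	unsigned int crc_offset;	/* offset of the CRC in the output image */
};

/*
 * Hypothetical helper mirroring the CONFIG_EFI_STUB branch: the kernel
 * part is padded to a 512-byte boundary, leaving room for the CRC.
 */
static struct layout image_layout(unsigned int setup_size,
				  unsigned int kern_file_size)
{
	struct layout l;

	l.kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
	l.sys_size = l.kern_size / PARAGRAPH_SIZE;
	/* The CRC occupies the last four bytes of the image. */
	l.crc_offset = setup_size + l.kern_size - 4;

	return l;
}
```

For example, image_layout(5 * SECTOR_SIZE, 1000) yields kern_size 1024, sys_size 64 and crc_offset 3580: the 1000-byte payload plus the 4-byte CRC is padded up to two sectors after a 2560-byte setup area.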
On Thu, 9 Mar 2023 at 18:22, Evgeniy Baskov <[email protected]> wrote:
>
> On 2023-03-09 19:50, Ard Biesheuvel wrote:
> > On Thu, 9 Mar 2023 at 17:25, Evgeniy Baskov <[email protected]> wrote:
> >>
> >> On 2023-03-09 18:57, Ard Biesheuvel wrote:
> >> > On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
> >> >>
> >> >> Use newer C standard. Since kernel requires C99 compiler now,
> > >> >> we can make use of the new features to make the code more readable.
> >> >>
> >> >> Use mmap() for reading files also to make things simpler.
> >> >>
> >> >> Replace most magic numbers with defines.
> >> >>
> >> >> Should have no functional changes. This is done in preparation for the
> > >> >> next changes that make the generated PE header more spec compliant.
> >> >>
> >> >> Tested-by: Mario Limonciello <[email protected]>
> >> >> Tested-by: Peter Jones <[email protected]>
> >> >> Signed-off-by: Evgeniy Baskov <[email protected]>
> >> >> ---
> >> >> arch/x86/boot/tools/build.c | 387
> >> >> +++++++++++++++++++++++-------------
> >> >> 1 file changed, 245 insertions(+), 142 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
> >> >> index bd247692b701..fbc5315af032 100644
> >> >> --- a/arch/x86/boot/tools/build.c
> >> >> +++ b/arch/x86/boot/tools/build.c
> >> >> @@ -25,20 +25,21 @@
> >> >> * Substantially overhauled by H. Peter Anvin, April 2007
> >> >> */
> >> >>
> >> >> +#include <fcntl.h>
> >> >> +#include <stdarg.h>
> >> >> +#include <stdint.h>
> >> >> #include <stdio.h>
> >> >> -#include <string.h>
> >> >> #include <stdlib.h>
> >> >> -#include <stdarg.h>
> >> >> -#include <sys/types.h>
> >> >> +#include <string.h>
> >> >> +#include <sys/mman.h>
> >> >> #include <sys/stat.h>
> >> >> +#include <sys/types.h>
> >> >> #include <unistd.h>
> >> >> -#include <fcntl.h>
> >> >> -#include <sys/mman.h>
> >> >> +
> >> >> #include <tools/le_byteshift.h>
> >> >> +#include <linux/pe.h>
> >> >>
> >> >> -typedef unsigned char u8;
> >> >> -typedef unsigned short u16;
> >> >> -typedef unsigned int u32;
> >> >> +#define round_up(x, n) (((x) + (n) - 1) & ~((n) - 1))
> >> >>
> >> >> #define DEFAULT_MAJOR_ROOT 0
> >> >> #define DEFAULT_MINOR_ROOT 0
> >> >> @@ -48,8 +49,13 @@ typedef unsigned int u32;
> >> >> #define SETUP_SECT_MIN 5
> >> >> #define SETUP_SECT_MAX 64
> >> >>
> >> >> +#define PARAGRAPH_SIZE 16
> >> >> +#define SECTOR_SIZE 512
> >> >> +#define FILE_ALIGNMENT 512
> >> >> +#define SECTION_ALIGNMENT 4096
> >> >> +
> >> >> /* This must be large enough to hold the entire setup */
> >> >> -u8 buf[SETUP_SECT_MAX*512];
> >> >> +uint8_t buf[SETUP_SECT_MAX*SECTOR_SIZE];
> >> >>
> >> >> #define PECOFF_RELOC_RESERVE 0x20
> >> >>
> >> >> @@ -59,6 +65,52 @@ u8 buf[SETUP_SECT_MAX*512];
> >> >> #define PECOFF_COMPAT_RESERVE 0x0
> >> >> #endif
> >> >>
> >> >> +#define RELOC_SECTION_SIZE 10
> >> >> +
> >> >> +/* PE header has different format depending on the architecture */
> >> >> +#ifdef CONFIG_X86_64
> >> >> +typedef struct pe32plus_opt_hdr pe_opt_hdr;
> >> >> +#else
> >> >> +typedef struct pe32_opt_hdr pe_opt_hdr;
> >> >> +#endif
> >> >> +
> >> >> +static inline struct pe_hdr *get_pe_header(uint8_t *buf)
> >> >> +{
> >> >> + uint32_t pe_offset =
> >> >> get_unaligned_le32(buf+MZ_HEADER_PEADDR_OFFSET);
> >> >> + return (struct pe_hdr *)(buf + pe_offset);
> >> >> +}
> >> >> +
> >> >> +static inline pe_opt_hdr *get_pe_opt_header(uint8_t *buf)
> >> >> +{
> >> >> + return (pe_opt_hdr *)(get_pe_header(buf) + 1);
> >> >> +}
> >> >> +
> >> >> +static inline struct section_header *get_sections(uint8_t *buf)
> >> >> +{
> >> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >> >> + uint32_t n_data_dirs = get_unaligned_le32(&hdr->data_dirs);
> >> >> + uint8_t *sections = (uint8_t *)(hdr + 1) +
> >> >> n_data_dirs*sizeof(struct data_dirent);
> >> >> + return (struct section_header *)sections;
> >> >> +}
> >> >> +
> >> >> +static inline struct data_directory *get_data_dirs(uint8_t *buf)
> >> >> +{
> >> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >> >> + return (struct data_directory *)(hdr + 1);
> >> >> +}
> >> >> +
> >> >> +#ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> >> >
> >> > Can we drop this conditional?
> >>
> >> Without CONFIG_EFI_DXE_MEM_ATTRIBUTES memory attributes are not
> >> applied anywhere, so this would break 'nokaslr' on UEFI
> >> implementations that honor section attributes.
> >>
> >
> > How so? This only affects the mappings that are created by UEFI for
> > the decompressor binary, right?
>
> I was thinking about the in-place decompression, but now I've realized
> that I was wrong since in-place decompression cannot happen when booting
> via the stub. I'll remove the ifdef.
>
Indeed. And I realized that all the image_offset handling can now be
dropped as well.
> >
> >> KASLR is already broken without that option on implementations
> >> that disallow execution of the free memory though. But unlike
> >> free memory, sections are more likely to get protected, I think.
> >>
> >
> > We need to allocate those pages properly in any case (see my other
> > reply) so it is no longer free memory.
>
> It should be fine, as I explained.
>
> The only thing that is a little unexpected is that the kernel might
> shift even with 'nokaslr' when the LOAD_PHYSICAL_ADDR is already taken
> by some firmware allocation (or by us). This should cause no real
> problems, since the kernel is required to be relocatable for the
> EFISTUB.
>
OK, good to know.
> >
> >> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE |
> >> >> IMAGE_SCN_ALIGN_4096BYTES)
> >> >> +#define SCN_RX (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE |
> >> >> IMAGE_SCN_ALIGN_4096BYTES)
> >> >> +#define SCN_RO (IMAGE_SCN_MEM_READ | IMAGE_SCN_ALIGN_4096BYTES)
> >> >
> >> > Please drop the alignment flags - they don't apply to executables,
> >> > only to object files.
> >>
> >> Got it, will remove them in v5.
> >>
> >> >
> >> >> +#else
> >> >> +/* With memory protection disabled all sections are RWX */
> >> >> +#define SCN_RW (IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE | \
> >> >> + IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_ALIGN_4096BYTES)
> >> >> +#define SCN_RX SCN_RW
> >> >> +#define SCN_RO SCN_RW
> >> >> +#endif
> >> >> +
> >> >> static unsigned long efi32_stub_entry;
> >> >> static unsigned long efi64_stub_entry;
> >> >> static unsigned long efi_pe_entry;
> >> >> @@ -70,7 +122,7 @@ static unsigned long _end;
> >> >>
> >> >>
> >> >> /*----------------------------------------------------------------------*/
> >> >>
> >> >> -static const u32 crctab32[] = {
> >> >> +static const uint32_t crctab32[] = {
> >> >
> >> > Replacing all the type names makes this patch very messy. Can we back
> >> > that out please?
> >>
> >> Ok, I will revert them.
> >>
> >> >
> >> >> 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419,
> >> >> 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4,
> >> >> 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07,
> >> >> @@ -125,12 +177,12 @@ static const u32 crctab32[] = {
> >> >> 0x2d02ef8d
> >> >> };
> >> >>
> >> >> -static u32 partial_crc32_one(u8 c, u32 crc)
> >> >> +static uint32_t partial_crc32_one(uint8_t c, uint32_t crc)
> >> >> {
> >> >> return crctab32[(crc ^ c) & 0xff] ^ (crc >> 8);
> >> >> }
> >> >>
> >> >> -static u32 partial_crc32(const u8 *s, int len, u32 crc)
> >> >> +static uint32_t partial_crc32(const uint8_t *s, int len, uint32_t
> >> >> crc)
> >> >> {
> >> >> while (len--)
> >> >> crc = partial_crc32_one(*s++, crc);
> >> >> @@ -152,57 +204,106 @@ static void usage(void)
> >> >> die("Usage: build setup system zoffset.h image");
> >> >> }
> >> >>
> >> >> +static void *map_file(const char *path, size_t *psize)
> >> >> +{
> >> >> + struct stat statbuf;
> >> >> + size_t size;
> >> >> + void *addr;
> >> >> + int fd;
> >> >> +
> >> >> + fd = open(path, O_RDONLY);
> >> >> + if (fd < 0)
> >> >> + die("Unable to open `%s': %m", path);
> >> >> + if (fstat(fd, &statbuf))
> >> >> + die("Unable to stat `%s': %m", path);
> >> >> +
> >> >> + size = statbuf.st_size;
> >> >> + /*
> >> >> + * Map one byte more, to allow adding null-terminator
> >> >> + * for text files.
> >> >> + */
> >> >> + addr = mmap(NULL, size + 1, PROT_READ | PROT_WRITE,
> >> >> MAP_PRIVATE, fd, 0);
> >> >> + if (addr == MAP_FAILED)
> >> >> + die("Unable to mmap '%s': %m", path);
> >> >> +
> >> >> + close(fd);
> >> >> +
> >> >> + *psize = size;
> >> >> + return addr;
> >> >> +}
> >> >> +
> >> >> +static void unmap_file(void *addr, size_t size)
> >> >> +{
> >> >> + munmap(addr, size + 1);
> >> >> +}
> >> >> +
> >> >> +static void *map_output_file(const char *path, size_t size)
> >> >> +{
> >> >> + void *addr;
> >> >> + int fd;
> >> >> +
> >> >> + fd = open(path, O_RDWR | O_CREAT, 0660);
> >> >> + if (fd < 0)
> >> >> + die("Unable to create `%s': %m", path);
> >> >> +
> >> >> + if (ftruncate(fd, size))
> >> >> + die("Unable to resize `%s': %m", path);
> >> >> +
> >> >> + addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
> >> >> fd, 0);
> >> >> + if (addr == MAP_FAILED)
> >> >> + die("Unable to mmap '%s': %m", path);
> >> >> +
> >> >> + return addr;
> >> >> +}
> >> >> +
> >> >> #ifdef CONFIG_EFI_STUB
> >> >>
> >> >> -static void update_pecoff_section_header_fields(char *section_name,
> >> >> u32 vma, u32 size, u32 datasz, u32 offset)
> >> >> +static void update_pecoff_section_header_fields(char *section_name,
> >> >> uint32_t vma,
> >> >> + uint32_t size,
> >> >> uint32_t datasz,
> >> >> + uint32_t offset)
> >> >> {
> >> >> unsigned int pe_header;
> >> >> unsigned short num_sections;
> >> >> - u8 *section;
> >> >> + struct section_header *section;
> >> >>
> >> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
> >> >> - num_sections = get_unaligned_le16(&buf[pe_header + 6]);
> >> >> -
> >> >> -#ifdef CONFIG_X86_32
> >> >> - section = &buf[pe_header + 0xa8];
> >> >> -#else
> >> >> - section = &buf[pe_header + 0xb8];
> >> >> -#endif
> >> >> + struct pe_hdr *hdr = get_pe_header(buf);
> >> >> + num_sections = get_unaligned_le16(&hdr->sections);
> >> >> + section = get_sections(buf);
> >> >>
> >> >> while (num_sections > 0) {
> >> >> - if (strncmp((char*)section, section_name, 8) == 0) {
> >> >> + if (strncmp(section->name, section_name, 8) == 0) {
> >> >> /* section header size field */
> >> >> - put_unaligned_le32(size, section + 0x8);
> >> >> + put_unaligned_le32(size,
> >> >> &section->virtual_size);
> >> >>
> >> >> /* section header vma field */
> >> >> - put_unaligned_le32(vma, section + 0xc);
> >> >> + put_unaligned_le32(vma,
> >> >> &section->virtual_address);
> >> >>
> >> >> /* section header 'size of initialised data'
> >> >> field */
> >> >> - put_unaligned_le32(datasz, section + 0x10);
> >> >> + put_unaligned_le32(datasz,
> >> >> &section->raw_data_size);
> >> >>
> >> >> /* section header 'file offset' field */
> >> >> - put_unaligned_le32(offset, section + 0x14);
> >> >> + put_unaligned_le32(offset,
> >> >> &section->data_addr);
> >> >>
> >> >> break;
> >> >> }
> >> >> - section += 0x28;
> >> >> + section++;
> >> >> num_sections--;
> >> >> }
> >> >> }
> >> >>
> >> >> -static void update_pecoff_section_header(char *section_name, u32
> >> >> offset, u32 size)
> >> >> +static void update_pecoff_section_header(char *section_name, uint32_t
> >> >> offset, uint32_t size)
> >> >> {
> >> >> update_pecoff_section_header_fields(section_name, offset,
> >> >> size, size, offset);
> >> >> }
> >> >>
> >> >> static void update_pecoff_setup_and_reloc(unsigned int size)
> >> >> {
> >> >> - u32 setup_offset = 0x200;
> >> >> - u32 reloc_offset = size - PECOFF_RELOC_RESERVE -
> >> >> PECOFF_COMPAT_RESERVE;
> >> >> + uint32_t setup_offset = SECTOR_SIZE;
> >> >> + uint32_t reloc_offset = size - PECOFF_RELOC_RESERVE -
> >> >> PECOFF_COMPAT_RESERVE;
> >> >> #ifdef CONFIG_EFI_MIXED
> >> >> - u32 compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> >> >> + uint32_t compat_offset = reloc_offset + PECOFF_RELOC_RESERVE;
> >> >> #endif
> >> >> - u32 setup_size = reloc_offset - setup_offset;
> >> >> + uint32_t setup_size = reloc_offset - setup_offset;
> >> >>
> >> >> update_pecoff_section_header(".setup", setup_offset,
> >> >> setup_size);
> >> >> update_pecoff_section_header(".reloc", reloc_offset,
> >> >> PECOFF_RELOC_RESERVE);
> >> >> @@ -211,8 +312,8 @@ static void update_pecoff_setup_and_reloc(unsigned
> >> >> int size)
> >> >> * Modify .reloc section contents with a single entry. The
> >> >> * relocation is applied to offset 10 of the relocation
> >> >> section.
> >> >> */
> >> >> - put_unaligned_le32(reloc_offset + 10, &buf[reloc_offset]);
> >> >> - put_unaligned_le32(10, &buf[reloc_offset + 4]);
> >> >> + put_unaligned_le32(reloc_offset + RELOC_SECTION_SIZE,
> >> >> &buf[reloc_offset]);
> >> >> + put_unaligned_le32(RELOC_SECTION_SIZE, &buf[reloc_offset +
> >> >> 4]);
> >> >>
> >> >> #ifdef CONFIG_EFI_MIXED
> >> >> update_pecoff_section_header(".compat", compat_offset,
> >> >> PECOFF_COMPAT_RESERVE);
> >> >> @@ -224,19 +325,17 @@ static void
> >> >> update_pecoff_setup_and_reloc(unsigned int size)
> >> >> */
> >> >> buf[compat_offset] = 0x1;
> >> >> buf[compat_offset + 1] = 0x8;
> >> >> - put_unaligned_le16(0x14c, &buf[compat_offset + 2]);
> >> >> + put_unaligned_le16(IMAGE_FILE_MACHINE_I386, &buf[compat_offset
> >> >> + 2]);
> >> >> put_unaligned_le32(efi32_pe_entry + size, &buf[compat_offset +
> >> >> 4]);
> >> >> #endif
> >> >> }
> >> >>
> >> >> -static void update_pecoff_text(unsigned int text_start, unsigned int
> >> >> file_sz,
> >> >> +static unsigned int update_pecoff_sections(unsigned int text_start,
> >> >> unsigned int text_sz,
> >> >> unsigned int init_sz)
> >> >> {
> >> >> - unsigned int pe_header;
> >> >> - unsigned int text_sz = file_sz - text_start;
> >> >> + unsigned int file_sz = text_start + text_sz;
> >> >> unsigned int bss_sz = init_sz - file_sz;
> >> >> -
> >> >> - pe_header = get_unaligned_le32(&buf[0x3c]);
> >> >> + pe_opt_hdr *hdr = get_pe_opt_header(buf);
> >> >>
> >> >> /*
> >> >> * The PE/COFF loader may load the image at an address which
> >> >> is
> >> >> @@ -254,18 +353,20 @@ static void update_pecoff_text(unsigned int
> >> >> text_start, unsigned int file_sz,
> >> >> * Size of code: Subtract the size of the first sector (512
> >> >> bytes)
> >> >> * which includes the header.
> >> >> */
> >> >> - put_unaligned_le32(file_sz - 512 + bss_sz, &buf[pe_header +
> >> >> 0x1c]);
> >> >> + put_unaligned_le32(file_sz - SECTOR_SIZE + bss_sz,
> >> >> &hdr->text_size);
> >> >>
> >> >> /* Size of image */
> >> >> - put_unaligned_le32(init_sz, &buf[pe_header + 0x50]);
> >> >> + put_unaligned_le32(init_sz, &hdr->image_size);
> >> >>
> >> >> /*
> >> >> * Address of entry point for PE/COFF executable
> >> >> */
> >> >> - put_unaligned_le32(text_start + efi_pe_entry, &buf[pe_header +
> >> >> 0x28]);
> >> >> + put_unaligned_le32(text_start + efi_pe_entry,
> >> >> &hdr->entry_point);
> >> >>
> >> >> update_pecoff_section_header_fields(".text", text_start,
> >> >> text_sz + bss_sz,
> >> >> text_sz, text_start);
> >> >> +
> >> >> + return text_start + file_sz;
> >> >> }
> >> >>
> >> >> static int reserve_pecoff_reloc_section(int c)
> >> >> @@ -275,7 +376,7 @@ static int reserve_pecoff_reloc_section(int c)
> >> >> return PECOFF_RELOC_RESERVE;
> >> >> }
> >> >>
> >> >> -static void efi_stub_defaults(void)
> >> >> +static void efi_stub_update_defaults(void)
> >> >> {
> >> >> /* Defaults for old kernel */
> >> >> #ifdef CONFIG_X86_32
> >> >> @@ -298,7 +399,7 @@ static void efi_stub_entry_update(void)
> >> >>
> >> >> #ifdef CONFIG_EFI_MIXED
> >> >> if (efi32_stub_entry != addr)
> >> >> - die("32-bit and 64-bit EFI entry points do not
> >> >> match\n");
> >> >> + die("32-bit and 64-bit EFI entry points do not
> >> >> match");
> >> >> #endif
> >> >> #endif
> >> >> put_unaligned_le32(addr, &buf[0x264]);
> >> >> @@ -310,7 +411,7 @@ static inline void
> >> >> update_pecoff_setup_and_reloc(unsigned int size) {}
> >> >> static inline void update_pecoff_text(unsigned int text_start,
> >> >> unsigned int file_sz,
> >> >> unsigned int init_sz) {}
> >> >> -static inline void efi_stub_defaults(void) {}
> >> >> +static inline void efi_stub_update_defaults(void) {}
> >> >> static inline void efi_stub_entry_update(void) {}
> >> >>
> >> >> static inline int reserve_pecoff_reloc_section(int c)
> >> >> @@ -338,20 +439,15 @@ static int reserve_pecoff_compat_section(int c)
> >> >>
> >> >> static void parse_zoffset(char *fname)
> >> >> {
> >> >> - FILE *file;
> >> >> - char *p;
> >> >> - int c;
> >> >> + size_t size;
> >> >> + char *data, *p;
> >> >>
> >> >> - file = fopen(fname, "r");
> >> >> - if (!file)
> >> >> - die("Unable to open `%s': %m", fname);
> >> >> - c = fread(buf, 1, sizeof(buf) - 1, file);
> >> >> - if (ferror(file))
> >> >> - die("read-error on `zoffset.h'");
> >> >> - fclose(file);
> >> >> - buf[c] = 0;
> >> >> + data = map_file(fname, &size);
> >> >>
> >> >> - p = (char *)buf;
> >> >> + /* We can do that, since we mapped one byte more */
> >> >> + data[size] = 0;
> >> >> +
> >> >> + p = (char *)data;
> >> >>
> >> >> while (p && *p) {
> >> >> PARSE_ZOFS(p, efi32_stub_entry);
> >> >> @@ -367,82 +463,99 @@ static void parse_zoffset(char *fname)
> >> >> while (p && (*p == '\r' || *p == '\n'))
> >> >> p++;
> >> >> }
> >> >> +
> >> >> + unmap_file(data, size);
> >> >> }
> >> >>
> >> >> -int main(int argc, char ** argv)
> >> >> +static unsigned int read_setup(char *path)
> >> >> {
> >> >> - unsigned int i, sz, setup_sectors, init_sz;
> >> >> - int c;
> >> >> - u32 sys_size;
> >> >> - struct stat sb;
> >> >> - FILE *file, *dest;
> >> >> - int fd;
> >> >> - void *kernel;
> >> >> - u32 crc = 0xffffffffUL;
> >> >> -
> >> >> - efi_stub_defaults();
> >> >> -
> >> >> - if (argc != 5)
> >> >> - usage();
> >> >> - parse_zoffset(argv[3]);
> >> >> -
> >> >> - dest = fopen(argv[4], "w");
> >> >> - if (!dest)
> >> >> - die("Unable to write `%s': %m", argv[4]);
> >> >> + FILE *file;
> >> >> + unsigned int setup_size, file_size;
> >> >>
> >> >> /* Copy the setup code */
> >> >> - file = fopen(argv[1], "r");
> >> >> + file = fopen(path, "r");
> >> >> if (!file)
> >> >> - die("Unable to open `%s': %m", argv[1]);
> >> >> - c = fread(buf, 1, sizeof(buf), file);
> >> >> + die("Unable to open `%s': %m", path);
> >> >> +
> >> >> + file_size = fread(buf, 1, sizeof(buf), file);
> >> >> if (ferror(file))
> >> >> die("read-error on `setup'");
> >> >> - if (c < 1024)
> >> >> +
> >> >> + if (file_size < 2 * SECTOR_SIZE)
> >> >> die("The setup must be at least 1024 bytes");
> >> >> - if (get_unaligned_le16(&buf[510]) != 0xAA55)
> >> >> +
> >> >> + if (get_unaligned_le16(&buf[SECTOR_SIZE - 2]) != 0xAA55)
> >> >> die("Boot block hasn't got boot flag (0xAA55)");
> >> >> +
> >> >> fclose(file);
> >> >>
> >> >> - c += reserve_pecoff_compat_section(c);
> >> >> - c += reserve_pecoff_reloc_section(c);
> >> >> + /* Reserve space for PE sections */
> >> >> + file_size += reserve_pecoff_compat_section(file_size);
> >> >> + file_size += reserve_pecoff_reloc_section(file_size);
> >> >>
> >> >> /* Pad unused space with zeros */
> >> >> - setup_sectors = (c + 511) / 512;
> >> >> - if (setup_sectors < SETUP_SECT_MIN)
> >> >> - setup_sectors = SETUP_SECT_MIN;
> >> >> - i = setup_sectors*512;
> >> >> - memset(buf+c, 0, i-c);
> >> >>
> >> >> - update_pecoff_setup_and_reloc(i);
> >> >> + setup_size = round_up(file_size, SECTOR_SIZE);
> >> >> +
> >> >> + if (setup_size < SETUP_SECT_MIN * SECTOR_SIZE)
> >> >> + setup_size = SETUP_SECT_MIN * SECTOR_SIZE;
> >> >> +
> >> >> + /*
> >> >> + * Global buffer is already initialised
> >> >> + * to 0, but just in case, zero out padding.
> >> >> + */
> >> >> +
> >> >> + memset(buf + file_size, 0, setup_size - file_size);
> >> >> +
> >> >> + return setup_size;
> >> >> +}
> >> >> +
> >> >> +int main(int argc, char **argv)
> >> >> +{
> >> >> + size_t kern_file_size;
> >> >> + unsigned int setup_size;
> >> >> + unsigned int setup_sectors;
> >> >> + unsigned int init_size;
> >> >> + unsigned int total_size;
> >> >> + unsigned int kern_size;
> >> >> + void *kernel;
> >> >> + uint32_t crc = 0xffffffffUL;
> >> >> + uint8_t *output;
> >> >> +
> >> >> + if (argc != 5)
> >> >> + usage();
> >> >> +
> >> >> + efi_stub_update_defaults();
> >> >> + parse_zoffset(argv[3]);
> >> >> +
> >> >> + setup_size = read_setup(argv[1]);
> >> >> +
> >> >> + setup_sectors = setup_size/SECTOR_SIZE;
> >> >>
> >> >> /* Set the default root device */
> >> >> put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
> >> >>
> >> >> - /* Open and stat the kernel file */
> >> >> - fd = open(argv[2], O_RDONLY);
> >> >> - if (fd < 0)
> >> >> - die("Unable to open `%s': %m", argv[2]);
> >> >> - if (fstat(fd, &sb))
> >> >> - die("Unable to stat `%s': %m", argv[2]);
> >> >> - sz = sb.st_size;
> >> >> - kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
> >> >> - if (kernel == MAP_FAILED)
> >> >> - die("Unable to mmap '%s': %m", argv[2]);
> >> >> - /* Number of 16-byte paragraphs, including space for a 4-byte
> >> >> CRC */
> >> >> - sys_size = (sz + 15 + 4) / 16;
> >> >> + /* Map kernel file to memory */
> >> >> + kernel = map_file(argv[2], &kern_file_size);
> >> >> +
> >> >> #ifdef CONFIG_EFI_STUB
> >> >> - /*
> >> >> - * COFF requires minimum 32-byte alignment of sections, and
> >> >> - * adding a signature is problematic without that alignment.
> >> >> - */
> >> >> - sys_size = (sys_size + 1) & ~1;
> >> >> + /* PE specification require 512-byte minimum section file
> >> >> alignment */
> >> >> + kern_size = round_up(kern_file_size + 4, SECTOR_SIZE);
> >> >> + update_pecoff_setup_and_reloc(setup_size);
> >> >> +#else
> >> >> + /* Number of 16-byte paragraphs, including space for a 4-byte
> >> >> CRC */
> >> >> + kern_size = round_up(kern_file_size + 4, PARAGRAPH_SIZE);
> >> >> #endif
> >> >>
> >> >> /* Patch the setup code with the appropriate size parameters
> >> >> */
> >> >> - buf[0x1f1] = setup_sectors-1;
> >> >> - put_unaligned_le32(sys_size, &buf[0x1f4]);
> >> >> + buf[0x1f1] = setup_sectors - 1;
> >> >> + put_unaligned_le32(kern_size/PARAGRAPH_SIZE, &buf[0x1f4]);
> >> >> +
> >> >> + /* Update kernel_info offset. */
> >> >> + put_unaligned_le32(kernel_info, &buf[0x268]);
> >> >> +
> >> >> + init_size = get_unaligned_le32(&buf[0x260]);
> >> >>
> >> >> - init_sz = get_unaligned_le32(&buf[0x260]);
> >> >> #ifdef CONFIG_EFI_STUB
> >> >> /*
> >> >> * The decompression buffer will start at ImageBase. When
> >> >> relocating
> >> >> @@ -458,45 +571,35 @@ int main(int argc, char ** argv)
> >> >> * For future-proofing, increase init_sz if necessary.
> >> >> */
> >> >>
> >> >> - if (init_sz - _end < i + _ehead) {
> >> >> - init_sz = (i + _ehead + _end + 4095) & ~4095;
> >> >> - put_unaligned_le32(init_sz, &buf[0x260]);
> >> >> + if (init_size - _end < setup_size + _ehead) {
> >> >> + init_size = round_up(setup_size + _ehead + _end,
> >> >> SECTION_ALIGNMENT);
> >> >> + put_unaligned_le32(init_size, &buf[0x260]);
> >> >> }
> >> >> -#endif
> >> >> - update_pecoff_text(setup_sectors * 512, i + (sys_size * 16),
> >> >> init_sz);
> >> >>
> >> >> - efi_stub_entry_update();
> >> >> -
> >> >> - /* Update kernel_info offset. */
> >> >> - put_unaligned_le32(kernel_info, &buf[0x268]);
> >> >> + total_size = update_pecoff_sections(setup_size, kern_size,
> >> >> init_size);
> >> >>
> >> >> - crc = partial_crc32(buf, i, crc);
> >> >> - if (fwrite(buf, 1, i, dest) != i)
> >> >> - die("Writing setup failed");
> >> >> + efi_stub_entry_update();
> >> >> +#else
> >> >> + (void)init_size;
> >> >> + total_size = setup_size + kern_size;
> >> >> +#endif
> >> >>
> >> >> - /* Copy the kernel code */
> >> >> - crc = partial_crc32(kernel, sz, crc);
> >> >> - if (fwrite(kernel, 1, sz, dest) != sz)
> >> >> - die("Writing kernel failed");
> >> >> + output = map_output_file(argv[4], total_size);
> >> >>
> >> >> - /* Add padding leaving 4 bytes for the checksum */
> >> >> - while (sz++ < (sys_size*16) - 4) {
> >> >> - crc = partial_crc32_one('\0', crc);
> >> >> - if (fwrite("\0", 1, 1, dest) != 1)
> >> >> - die("Writing padding failed");
> >> >> - }
> >> >> + memcpy(output, buf, setup_size);
> >> >> + memcpy(output + setup_size, kernel, kern_file_size);
> >> >> + memset(output + setup_size + kern_file_size, 0, kern_size -
> >> >> kern_file_size);
> >> >>
> >> >> - /* Write the CRC */
> >> >> - put_unaligned_le32(crc, buf);
> >> >> - if (fwrite(buf, 1, 4, dest) != 4)
> >> >> - die("Writing CRC failed");
> >> >> + /* Calculate and write kernel checksum. */
> >> >> + crc = partial_crc32(output, total_size - 4, crc);
> >> >> + put_unaligned_le32(crc, &output[total_size - 4]);
> >> >>
> >> >> - /* Catch any delayed write failures */
> >> >> - if (fclose(dest))
> >> >> - die("Writing image failed");
> >> >> + /* Catch any delayed write failures. */
> >> >> + if (munmap(output, total_size) < 0)
> >> >> + die("Writing kernel failed");
> >> >>
> >> >> - close(fd);
> >> >> + unmap_file(kernel, kern_file_size);
> >> >>
> >> >> - /* Everything is OK */
> >> >> + /* Everything is OK. */
> >> >> return 0;
> >> >> }
> >> >> --
> >> >> 2.37.4
> >> >>
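For readers following the size arithmetic in the build.c patch quoted above, here is a small standalone sketch. It is not part of the patch. The helper names (round_up_pow2, padded_kernel_size, syssize_paragraphs) are hypothetical and exist only for illustration; the constants mirror the SECTOR_SIZE and PARAGRAPH_SIZE defines from the diff, and the 4-byte CRC trailer comes from the comments in the diff. The sketch assumes, as the round_up() macro in the patch does, that the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE     512u  /* PE file alignment used with the EFI stub */
#define PARAGRAPH_SIZE  16u   /* unit of the size field at buf[0x1f4] */
#define CRC_SIZE        4u    /* trailing CRC32 appended to the image */

/* Round x up to the next multiple of n; n must be a power of two. */
static inline uint32_t round_up_pow2(uint32_t x, uint32_t n)
{
	assert(n != 0 && (n & (n - 1)) == 0);
	return (x + n - 1) & ~(n - 1);
}

/*
 * Padded size of the kernel payload, leaving room for the CRC.
 * With the EFI stub the payload is padded to a full sector,
 * otherwise only to a 16-byte paragraph.
 */
static inline uint32_t padded_kernel_size(uint32_t file_size, int efi_stub)
{
	uint32_t align = efi_stub ? SECTOR_SIZE : PARAGRAPH_SIZE;

	return round_up_pow2(file_size + CRC_SIZE, align);
}

/* The setup header stores the payload size in 16-byte paragraphs. */
static inline uint32_t syssize_paragraphs(uint32_t padded_size)
{
	return padded_size / PARAGRAPH_SIZE;
}
```

Because 512 is a multiple of 16, the sector-aligned size always divides evenly into paragraphs, which is why the patch can write kern_size/PARAGRAPH_SIZE in both configurations.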
On Thu, 15 Dec 2022 at 13:38, Evgeniy Baskov <[email protected]> wrote:
>
> Ensure that the WP bit is set to prevent boot code from writing to
> non-writable memory pages.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/head_64.S | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index a75712991df3..9f2e8f50fc71 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -660,9 +660,8 @@ SYM_CODE_START(trampoline_32bit_src)
> pushl $__KERNEL_CS
> pushl %eax
>
> - /* Enable paging again. */
> - movl %cr0, %eax
> - btsl $X86_CR0_PG_BIT, %eax
> + /* Enable paging and set CR0 to known state (this also sets WP flag) */
> + movl $CR0_STATE, %eax
> movl %eax, %cr0
>
> lret
> --
> 2.37.4
>
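The difference between the old and new CR0 handling in the hunk above can be sketched in plain C. This is an illustration only, not kernel code. The bit positions are the architectural CR0 bits; the composition of CR0_KNOWN_STATE below is an assumption about what the kernel's CR0_STATE constant contains, since its definition is not shown in this thread. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define CR0_PE (1u << 0)   /* protected mode enable */
#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_ET (1u << 4)   /* extension type */
#define CR0_NE (1u << 5)   /* numeric error */
#define CR0_WP (1u << 16)  /* write protect */
#define CR0_AM (1u << 18)  /* alignment mask */
#define CR0_PG (1u << 31)  /* paging */

/* Assumed composition of the fixed value loaded by the new code. */
#define CR0_KNOWN_STATE \
	(CR0_PE | CR0_MP | CR0_ET | CR0_NE | CR0_WP | CR0_AM | CR0_PG)

static_assert((CR0_KNOWN_STATE & CR0_WP) != 0,
	      "the fixed state must include WP");

/* Old behaviour: read-modify-write that only sets PG; WP is inherited. */
static inline uint32_t cr0_enable_paging_only(uint32_t cr0)
{
	return cr0 | CR0_PG;
}

/* New behaviour: ignore the previous value and load a fixed one. */
static inline uint32_t cr0_load_known_state(void)
{
	return CR0_KNOWN_STATE;
}

static inline int cr0_write_protect_enabled(uint32_t cr0)
{
	return (cr0 & CR0_WP) != 0;
}
```

The point of the change is visible in the first helper: if the previous environment left WP clear, setting only PG keeps it clear, so supervisor-mode writes to read-only pages would still succeed.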
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> Convert kernel_add_identity_map() into a function pointer so that
> alternative implementations of this function can be provided. This is
> required to call the code that uses this function from the EFI environment.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
> ---
> arch/x86/boot/compressed/ident_map_64.c | 7 ++++---
> arch/x86/boot/compressed/misc.c | 24 ++++++++++++++++++++++++
> arch/x86/boot/compressed/misc.h | 15 +++------------
> 3 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index ba5108c58a4e..1aee524d3c2b 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -92,9 +92,9 @@ bool has_nx; /* set in head_64.S */
> /*
> * Adds the specified range to the identity mappings.
> */
> -unsigned long kernel_add_identity_map(unsigned long start,
> - unsigned long end,
> - unsigned int flags)
> +unsigned long kernel_add_identity_map_(unsigned long start,
Please use a more discriminating name here - the trailing _ is rather
hard to spot.
> + unsigned long end,
> + unsigned int flags)
> {
> int ret;
>
> @@ -142,6 +142,7 @@ void initialize_identity_maps(void *rmode)
> struct setup_data *sd;
>
> boot_params = rmode;
> + kernel_add_identity_map = kernel_add_identity_map_;
>
> /* Exclude the encryption mask from __PHYSICAL_MASK */
> physical_mask &= ~sme_me_mask;
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index aa4a22bc9cf9..c9c235d65d16 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -275,6 +275,22 @@ static void parse_elf(void *output, unsigned long output_len,
> free(phdrs);
> }
>
> +/*
> + * This points to actual implementation of mapping function
> + * for current environment: either EFI API wrapper,
> + * own implementation or dummy implementation below.
> + */
> +unsigned long (*kernel_add_identity_map)(unsigned long start,
> + unsigned long end,
> + unsigned int flags);
> +
> +static inline unsigned long kernel_add_identity_map_dummy(unsigned long start,
This function is never called, it only has its address taken, so the
'inline' makes no sense here.
> + unsigned long end,
> + unsigned int flags)
> +{
> + return start;
> +}
> +
> /*
> * The compressed kernel image (ZO), has been moved so that its position
> * is against the end of the buffer used to hold the uncompressed kernel
> @@ -312,6 +328,14 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>
> init_default_io_ops();
>
> + /*
> + * On 64-bit this pointer is set during page table uninitialization,
initialization
> + * but on 32-bit it remains uninitialized, since paging is disabled.
> + */
> + if (IS_ENABLED(CONFIG_X86_32))
> + kernel_add_identity_map = kernel_add_identity_map_dummy;
> +
> +
> /*
> * Detect TDX guest environment.
> *
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 38d31bec062d..0076b2845b4b 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -180,18 +180,9 @@ static inline int count_immovable_mem_regions(void) { return 0; }
> #ifdef CONFIG_X86_5LEVEL
> extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
> #endif
> -#ifdef CONFIG_X86_64
> -extern unsigned long kernel_add_identity_map(unsigned long start,
> - unsigned long end,
> - unsigned int flags);
> -#else
> -static inline unsigned long kernel_add_identity_map(unsigned long start,
> - unsigned long end,
> - unsigned int flags)
> -{
> - return start;
> -}
> -#endif
> +extern unsigned long (*kernel_add_identity_map)(unsigned long start,
> + unsigned long end,
> + unsigned int flags);
> /* Used by PAGE_KERN* macros: */
> extern pteval_t __default_kernel_pte_mask;
>
> --
> 2.37.4
>
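The function-pointer conversion discussed in the message above follows a common pattern, sketched here outside the kernel. All names in this sketch are hypothetical (they are chosen to avoid the hard-to-spot trailing underscore raised in the review), and the "real" implementation only counts calls instead of touching page tables.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long (*identity_map_fn)(unsigned long start,
					 unsigned long end,
					 unsigned int flags);

/* Points at the implementation for the current environment. */
static identity_map_fn add_identity_map;

/* Stand-in for page table work: counts how often it was invoked. */
static unsigned int real_map_calls;

/* Used when paging is disabled: nothing to map, report success. */
static unsigned long identity_map_dummy(unsigned long start,
					unsigned long end,
					unsigned int flags)
{
	(void)end;
	(void)flags;
	return start;
}

/* Placeholder for an implementation that edits page tables. */
static unsigned long identity_map_real(unsigned long start,
				       unsigned long end,
				       unsigned int flags)
{
	(void)end;
	(void)flags;
	real_map_calls++;
	return start;
}

/* Pick the implementation once, early, before any caller runs. */
static void select_identity_map(int paging_enabled)
{
	add_identity_map = paging_enabled ? identity_map_real
					  : identity_map_dummy;
}

/* Callers go through the pointer and do not care which one is set. */
static unsigned long map_range(unsigned long start, unsigned long end,
			       unsigned int flags)
{
	assert(add_identity_map != NULL);
	return add_identity_map(start, end, flags);
}
```

Note that the dummy is an ordinary static function here: as pointed out in the review, 'inline' has no effect on a function that is only ever used through its address.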
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> This is required to fit more sections in the PE section table,
> since its size is restricted by the zero page, which is located at a
> specific offset after the PE header.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
I'd prefer to rip this out altogether.
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=9510f6f04f579b9a3f54ad762c75ab2d905e37d8
(and refer to the other thread in linux-efi@)
> ---
> arch/x86/boot/header.S | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 9338c68e7413..9fec80bc504b 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -59,17 +59,16 @@ start2:
> cld
>
> movw $bugger_off_msg, %si
> + movw $bugger_off_msg_size, %cx
>
> msg_loop:
> lodsb
> - andb %al, %al
> - jz bs_die
> movb $0xe, %ah
> movw $7, %bx
> int $0x10
> - jmp msg_loop
> + decw %cx
> + jnz msg_loop
>
> -bs_die:
> # Allow the user to press a key, then reboot
> xorw %ax, %ax
> int $0x16
> @@ -90,10 +89,9 @@ bs_die:
>
> .section ".bsdata", "a"
> bugger_off_msg:
> - .ascii "Use a boot loader.\r\n"
> - .ascii "\n"
> - .ascii "Remove disk and press any key to reboot...\r\n"
> - .byte 0
> + .ascii "Use a boot loader. "
> + .ascii "Press a key to reboot"
> + .set bugger_off_msg_size, . - bugger_off_msg
>
> #ifdef CONFIG_EFI_STUB
> pe_header:
> --
> 2.37.4
>
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> After every implicit mapping is removed, this code is no longer needed.
>
> Remove memory mapping from page fault handler to ensure that there are
> no hidden invalid memory accesses.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
Reviewed-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/ident_map_64.c | 26 ++++++++++---------------
> 1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index fec795a4ce23..ba5108c58a4e 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -386,27 +386,21 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
> {
> unsigned long address = native_read_cr2();
> unsigned long end;
> - bool ghcb_fault;
> + char *msg;
>
> - ghcb_fault = sev_es_check_ghcb_fault(address);
> + if (sev_es_check_ghcb_fault(address))
> + msg = "Page-fault on GHCB page:";
> + else
> + msg = "Unexpected page-fault:";
>
> address &= PMD_MASK;
> end = address + PMD_SIZE;
>
> /*
> - * Check for unexpected error codes. Unexpected are:
> - * - Faults on present pages
> - * - User faults
> - * - Reserved bits set
> - */
> - if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
> - do_pf_error("Unexpected page-fault:", error_code, address, regs->ip);
> - else if (ghcb_fault)
> - do_pf_error("Page-fault on GHCB page:", error_code, address, regs->ip);
> -
> - /*
> - * Error code is sane - now identity map the 2M region around
> - * the faulting address.
> + * Since all memory allocations are made explicit
> + * now, every page fault at this stage is an
> + * error and the error handler is there only
> + * for debug purposes.
> */
> - kernel_add_identity_map(address, end, MAP_WRITE);
> + do_pf_error(msg, error_code, address, regs->ip);
> }
> --
> 2.37.4
>
On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>
> Doing it that way allows setting up stricter memory attributes,
> simplifies boot code path and removes potential relocation
> of kernel image.
>
> Wire up required interfaces and minimally initialize zero page
> fields needed for it to function correctly.
>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
Some more comments - apologies for the multi-stage approach ...
> ---
> arch/x86/boot/compressed/head_32.S | 50 ++++-
> arch/x86/boot/compressed/head_64.S | 58 ++++-
> drivers/firmware/efi/Kconfig | 2 +
> drivers/firmware/efi/libstub/Makefile | 2 +-
> .../firmware/efi/libstub/x86-extract-direct.c | 208 ++++++++++++++++++
> drivers/firmware/efi/libstub/x86-stub.c | 119 +---------
> drivers/firmware/efi/libstub/x86-stub.h | 14 ++
> 7 files changed, 338 insertions(+), 115 deletions(-)
> create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
> create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>
...
> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
> index 043ca31c114e..f50c2a84a754 100644
> --- a/drivers/firmware/efi/Kconfig
> +++ b/drivers/firmware/efi/Kconfig
> @@ -58,6 +58,8 @@ config EFI_DXE_MEM_ATTRIBUTES
> Use DXE services to check and alter memory protection
> attributes during boot via EFISTUB to ensure that memory
> ranges used by the kernel are writable and executable.
> + This option also enables stricter memory attributes
> + on compressed kernel PE image.
images
>
> config EFI_PARAMS_FROM_FDT
> bool
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index be8b8c6e8b40..99b81c95344c 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -88,7 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
>
> lib-$(CONFIG_ARM) += arm32-stub.o
> lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o arm64-entry.o smbios.o
> -lib-$(CONFIG_X86) += x86-stub.o
> +lib-$(CONFIG_X86) += x86-stub.o x86-extract-direct.o
> lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
> lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
>
> diff --git a/drivers/firmware/efi/libstub/x86-extract-direct.c b/drivers/firmware/efi/libstub/x86-extract-direct.c
> new file mode 100644
> index 000000000000..4ecbc4a9b3ed
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-extract-direct.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/acpi.h>
> +#include <linux/efi.h>
> +#include <linux/elf.h>
> +#include <linux/stddef.h>
> +
> +#include <asm/efi.h>
> +#include <asm/e820/types.h>
> +#include <asm/desc.h>
> +#include <asm/boot.h>
> +#include <asm/bootparam_utils.h>
> +#include <asm/shared/extract.h>
> +#include <asm/shared/pgtable.h>
> +
> +#include "efistub.h"
> +#include "x86-stub.h"
> +
> +static efi_handle_t image_handle;
> +
> +static void do_puthex(unsigned long value)
> +{
> + efi_printk("%08lx", value);
> +}
> +
> +static void do_putstr(const char *msg)
> +{
> + efi_printk("%s", msg);
> +}
> +
> +static unsigned long do_map_range(unsigned long start,
> + unsigned long end,
> + unsigned int flags)
> +{
> + efi_status_t status;
> +
Please drop this newline.
> + unsigned long size = end - start;
> +
> + if (flags & MAP_ALLOC) {
> + unsigned long addr;
> +
> + status = efi_low_alloc_above(size, CONFIG_PHYSICAL_ALIGN,
> + &addr, start);
> + if (status != EFI_SUCCESS) {
> + efi_err("Unable to allocate memory for uncompressed kernel");
> + efi_exit(image_handle, EFI_OUT_OF_RESOURCES);
> + }
> +
OK, so this is the place where the chosen address for decompressing the
kernel is actually allocated and carved out in the EFI memory map.
Could you add a comment here so other folks won't be confused, like I
was, about how this is put together?
> + if (start != addr) {
> + efi_debug("Unable to allocate at given address"
> + " (desired=0x%lx, actual=0x%lx)",
> + (unsigned long)start, addr);
> + start = addr;
> + }
> + }
> +
> + if ((flags & (MAP_PROTECT | MAP_ALLOC)) &&
> + IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + unsigned long attr = 0;
> +
> + if (!(flags & MAP_EXEC))
> + attr |= EFI_MEMORY_XP;
> +
> + if (!(flags & MAP_WRITE))
> + attr |= EFI_MEMORY_RO;
> +
> + status = efi_adjust_memory_range_protection(start, size, attr);
> + if (status != EFI_SUCCESS)
> + efi_err("Unable to protect memory range");
> + }
> +
> + return start;
> +}
> +
> +/*
> + * Trampoline takes 3 pages and can be loaded in first megabyte of memory
> + * with its end placed between 0 and 640k where BIOS might start.
> + * (see arch/x86/boot/compressed/pgtable_64.c)
> + */
> +
> +#ifdef CONFIG_64BIT
> +static efi_status_t prepare_trampoline(void)
> +{
> + efi_status_t status;
> +
> + status = efi_allocate_pages(TRAMPOLINE_32BIT_SIZE,
> + (unsigned long *)&trampoline_32bit,
> + TRAMPOLINE_32BIT_PLACEMENT_MAX);
> +
> + if (status != EFI_SUCCESS)
> + return status;
> +
> + unsigned long trampoline_start = (unsigned long)trampoline_32bit;
> +
Please put all variable declarations at the start of the block
> + memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
> +
> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + /* First page of trampoline is a top level page table */
> + efi_adjust_memory_range_protection(trampoline_start,
> + PAGE_SIZE,
> + EFI_MEMORY_XP);
> + }
> +
> + /* Second page of trampoline is the code (with a padding) */
> +
> + void *caddr = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
> +
same here
> + memcpy(caddr, trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> +
> + if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
> + efi_adjust_memory_range_protection((unsigned long)caddr,
> + PAGE_SIZE,
> + EFI_MEMORY_RO);
> +
> + /* And the last page of trampoline is the stack */
> +
> + efi_adjust_memory_range_protection(trampoline_start + 2 * PAGE_SIZE,
> + PAGE_SIZE,
> + EFI_MEMORY_XP);
> + }
> +
> + return EFI_SUCCESS;
> +}
> +#else
> +static inline efi_status_t prepare_trampoline(void)
> +{
> + return EFI_SUCCESS;
> +}
> +#endif
> +
> +static efi_status_t init_loader_data(efi_handle_t handle,
> + struct boot_params *params,
> + struct efi_boot_memmap **map)
> +{
> + struct efi_info *efi = (void *)&params->efi_info;
> + efi_status_t status;
> +
> + status = efi_get_memory_map(map, false);
> +
> + if (status != EFI_SUCCESS) {
> + efi_err("Unable to get EFI memory map...\n");
> + return status;
> + }
> +
> + const char *signature = efi_is_64bit() ? EFI64_LOADER_SIGNATURE
> + : EFI32_LOADER_SIGNATURE;
> +
Move this to the start
> + memcpy(&efi->efi_loader_signature, signature, sizeof(__u32));
> +
> + efi->efi_memdesc_size = (*map)->desc_size;
> + efi->efi_memdesc_version = (*map)->desc_ver;
> + efi->efi_memmap_size = (*map)->map_size;
> +
> + efi_set_u64_split((unsigned long)(*map)->map,
> + &efi->efi_memmap, &efi->efi_memmap_hi);
> +
> + efi_set_u64_split((unsigned long)efi_system_table,
> + &efi->efi_systab, &efi->efi_systab_hi);
> +
> + image_handle = handle;
> +
> + return EFI_SUCCESS;
> +}
> +
> +static void free_loader_data(struct boot_params *params, struct efi_boot_memmap *map)
> +{
> + struct efi_info *efi = (void *)&params->efi_info;
> +
> + efi_bs_call(free_pool, map);
> +
> + efi->efi_memdesc_size = 0;
> + efi->efi_memdesc_version = 0;
> + efi->efi_memmap_size = 0;
> + efi_set_u64_split(0, &efi->efi_memmap, &efi->efi_memmap_hi);
> +}
> +
> +extern unsigned char input_data[];
> +extern unsigned int input_len, output_len;
> +
> +unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *params)
> +{
> +
> + void *res;
> + efi_status_t status;
> + struct efi_extract_callbacks cb = { 0 };
> +
> + status = prepare_trampoline();
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + /* Prepare environment for do_extract_kernel() call */
> + struct efi_boot_memmap *map = NULL;
Move this to the start.
> + status = init_loader_data(handle, params, &map);
> +
> + if (status != EFI_SUCCESS)
> + return 0;
> +
> + cb.puthex = do_puthex;
> + cb.putstr = do_putstr;
> + cb.map_range = do_map_range;
> +
> + res = efi_extract_kernel(params, &cb, input_data, input_len, output_len);
> +
> + free_loader_data(params, map);
> +
> + return (unsigned long)res;
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 7fb1eff88a18..1d1ab1911fd3 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -17,6 +17,7 @@
> #include <asm/boot.h>
>
> #include "efistub.h"
> +#include "x86-stub.h"
>
> /* Maximum physical address for 64-bit kernel with 4-level paging */
> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
> @@ -24,7 +25,7 @@
> const efi_system_table_t *efi_system_table;
> const efi_dxe_services_table_t *efi_dxe_table;
> u32 image_offset __section(".data");
> -static efi_loaded_image_t *image = NULL;
> +static efi_loaded_image_t *image __section(".data");
>
> static efi_status_t
> preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
> @@ -212,55 +213,9 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
> }
> }
>
> -/*
> - * Trampoline takes 2 pages and can be loaded in first megabyte of memory
> - * with its end placed between 128k and 640k where BIOS might start.
> - * (see arch/x86/boot/compressed/pgtable_64.c)
> - *
> - * We cannot find exact trampoline placement since memory map
> - * can be modified by UEFI, and it can alter the computed address.
> - */
> -
> -#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
> -#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
> -
> -void startup_32(struct boot_params *boot_params);
> -
> -static void
> -setup_memory_protection(unsigned long image_base, unsigned long image_size)
> -{
> - /*
> - * Allow execution of possible trampoline used
> - * for switching between 4- and 5-level page tables
> - * and relocated kernel image.
> - */
> -
> - efi_adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
> - TRAMPOLINE_PLACEMENT_SIZE, 0);
> -
> -#ifdef CONFIG_64BIT
> - if (image_base != (unsigned long)startup_32)
> - efi_adjust_memory_range_protection(image_base, image_size, 0);
> -#else
> - /*
> - * Clear protection flags on a whole range of possible
> - * addresses used for KASLR. We don't need to do that
> - * on x86_64, since KASLR/extraction is performed after
> - * dedicated identity page tables are built and we only
> - * need to remove possible protection on relocated image
> - * itself disregarding further relocations.
> - */
> - efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
> - KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR,
> - 0);
> -#endif
> -}
> -
> static const efi_char16_t apple[] = L"Apple";
>
> -static void setup_quirks(struct boot_params *boot_params,
> - unsigned long image_base,
> - unsigned long image_size)
> +static void setup_quirks(struct boot_params *boot_params)
> {
> efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
> efi_table_attr(efi_system_table, fw_vendor);
> @@ -269,9 +224,6 @@ static void setup_quirks(struct boot_params *boot_params,
> if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
> retrieve_apple_device_properties(boot_params);
> }
> -
> - if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
> - setup_memory_protection(image_base, image_size);
> }
>
> /*
> @@ -384,7 +336,7 @@ static void setup_graphics(struct boot_params *boot_params)
> }
>
>
> -static void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> {
> efi_bs_call(exit, handle, status, 0, NULL);
> for(;;)
> @@ -707,8 +659,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
> }
>
> /*
> - * On success, we return the address of startup_32, which has potentially been
> - * relocated by efi_relocate_kernel.
> + * On success, we return extracted kernel entry point.
> * On failure, we exit to the firmware via efi_exit instead of returning.
> */
> asmlinkage unsigned long efi_main(efi_handle_t handle,
> @@ -733,60 +684,6 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
> efi_dxe_table = NULL;
> }
>
> - /*
> - * If the kernel isn't already loaded at a suitable address,
> - * relocate it.
> - *
> - * It must be loaded above LOAD_PHYSICAL_ADDR.
> - *
> - * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
> - * is defined as the macro MAXMEM, but unfortunately that is not a
> - * compile-time constant if 5-level paging is configured, so we instead
> - * define our own macro for use here.
> - *
> - * For 32-bit, the maximum address is complicated to figure out, for
> - * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
> - * KASLR uses.
> - *
> - * Also relocate it if image_offset is zero, i.e. the kernel wasn't
> - * loaded by LoadImage, but rather by a bootloader that called the
> - * handover entry. The reason we must always relocate in this case is
> - * to handle the case of systemd-boot booting a unified kernel image,
> - * which is a PE executable that contains the bzImage and an initrd as
> - * COFF sections. The initrd section is placed after the bzImage
> - * without ensuring that there are at least init_size bytes available
> - * for the bzImage, and thus the compressed kernel's startup code may
> - * overwrite the initrd unless it is moved out of the way.
> - */
> -
> - buffer_start = ALIGN(bzimage_addr - image_offset,
> - hdr->kernel_alignment);
> - buffer_end = buffer_start + hdr->init_size;
> -
> - if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
> - (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
> - (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
> - (image_offset == 0)) {
> - extern char _bss[];
> -
> - status = efi_relocate_kernel(&bzimage_addr,
> - (unsigned long)_bss - bzimage_addr,
> - hdr->init_size,
> - hdr->pref_address,
> - hdr->kernel_alignment,
> - LOAD_PHYSICAL_ADDR);
> - if (status != EFI_SUCCESS) {
> - efi_err("efi_relocate_kernel() failed!\n");
> - goto fail;
> - }
> - /*
> - * Now that we've copied the kernel elsewhere, we no longer
> - * have a set up block before startup_32(), so reset image_offset
> - * to zero in case it was set earlier.
> - */
> - image_offset = 0;
> - }
> -
> #ifdef CONFIG_CMDLINE_BOOL
> status = efi_parse_options(CONFIG_CMDLINE);
> if (status != EFI_SUCCESS) {
> @@ -843,7 +740,11 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
>
> setup_efi_pci(boot_params);
>
> - setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
> + setup_quirks(boot_params);
> +
> + bzimage_addr = extract_kernel_direct(handle, boot_params);
> + if (!bzimage_addr)
> + goto fail;
>
> status = exit_boot(boot_params, handle);
> if (status != EFI_SUCCESS) {
> diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
> new file mode 100644
> index 000000000000..baecc7c6e602
> --- /dev/null
> +++ b/drivers/firmware/efi/libstub/x86-stub.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _DRIVERS_FIRMWARE_EFI_X86STUB_H
> +#define _DRIVERS_FIRMWARE_EFI_X86STUB_H
> +
> +#include <linux/efi.h>
> +
> +#include <asm/bootparam.h>
> +
> +void __noreturn efi_exit(efi_handle_t handle, efi_status_t status);
> +unsigned long extract_kernel_direct(efi_handle_t handle, struct boot_params *boot_params);
> +void startup_32(struct boot_params *boot_params);
> +
> +#endif
> --
> 2.37.4
>
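For readers following the do_map_range() hunk above, here is a minimal
sketch of how its mapping flags translate into EFI protection attributes.
It is an illustration only, not code from the patch: the MAP_WRITE and
MAP_EXEC values are hypothetical stand-ins (the patchset defines its own),
map_flags_to_efi_attr() is an invented helper name, and only the
EFI_MEMORY_XP/EFI_MEMORY_RO bit values are taken from the UEFI
specification.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* execute-protected */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* read-only */

/* Hypothetical flag values; the patchset defines its own MAP_* constants. */
#define MAP_WRITE 0x02u
#define MAP_EXEC  0x04u

/*
 * Translate "what the caller wants to allow" into "what EFI must forbid".
 * Anything not explicitly requested is denied, so a writable and
 * executable mapping is only produced when both flags are passed.
 */
static uint64_t map_flags_to_efi_attr(unsigned int flags)
{
	uint64_t attr = 0;

	/* This sketch only understands the two flags defined above. */
	assert((flags & ~(MAP_WRITE | MAP_EXEC)) == 0);

	if (!(flags & MAP_EXEC))
		attr |= EFI_MEMORY_XP;
	if (!(flags & MAP_WRITE))
		attr |= EFI_MEMORY_RO;

	return attr;
}
```

With this inversion, a plain read-only data mapping (no flags) ends up
both read-only and non-executable, which is the default-deny behaviour
that a W^X policy relies on.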
On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
>
> Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
> services for changing memory attributes in the EFISTUB.
>
> Use DXE services only as a fallback in case aforementioned protocol
> is not supported by UEFI implementation.
>
> Move DXE services initialization code closer to the place they are used
> to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.
>
> Tested-by: Mario Limonciello <[email protected]>
> Tested-by: Peter Jones <[email protected]>
> Signed-off-by: Evgeniy Baskov <[email protected]>
I'm not convinced about the use of the DXE services for this, and I
think we should replace this patch with changes that base all the new
protection code on the EFI memory attributes protocol only.
We introduced that DXE code to remove protections from memory that was
mapped read-only and/or non-executable, and described as such in the
GCD memory map.
Using it to manipulate restricted permissions like this is quite a
different thing, and sadly (at least in EDK2), the GCD system memory
map is not kept in sync with the updated permissions, i.e, the W^X
protections for loaded images and the NX protection for arbitrary page
allocations are both based on the PI CPU arch protocol, which
manipulates the page tables directly, but does not record the modified
attributes in the GCD or EFI memory maps, as this would result in
massive fragmentation and break lots of other things.
That means that, except for the specific use case for which we
introduced the DXE services calls, the only reliable way to figure out
what permission attributes a certain range of memory is using is the
EFI memory attributes protocol, and I don't think we should use
anything else for tightening down these protections.
> ---
> drivers/firmware/efi/libstub/mem.c | 168 ++++++++++++++++++------
> drivers/firmware/efi/libstub/x86-stub.c | 17 ---
> 2 files changed, 128 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
> index 3e47e5931f04..07d54c88c62e 100644
> --- a/drivers/firmware/efi/libstub/mem.c
> +++ b/drivers/firmware/efi/libstub/mem.c
> @@ -5,6 +5,9 @@
>
> #include "efistub.h"
>
> +const efi_dxe_services_table_t *efi_dxe_table;
> +efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
> +
> /**
> * efi_get_memory_map() - get memory map
> * @map: pointer to memory map pointer to which to assign the
> @@ -129,66 +132,47 @@ void efi_free(unsigned long size, unsigned long addr)
> efi_bs_call(free_pages, addr, nr_pages);
> }
>
> -/**
> - * efi_adjust_memory_range_protection() - change memory range protection attributes
> - * @start: memory range start address
> - * @size: memory range size
> - *
> - * Actual memory range for which memory attributes are modified is
> - * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
> - * that includes [start, start + size].
> - *
> - * @return: status code
> - */
> -efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> - unsigned long size,
> - unsigned long attributes)
> +static void retrieve_dxe_table(void)
> +{
> + efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> + if (efi_dxe_table &&
> + efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> + efi_warn("Ignoring DXE services table: invalid signature\n");
> + efi_dxe_table = NULL;
> + }
> +}
> +
> +static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t rounded_start,
> + efi_physical_addr_t rounded_end,
> + unsigned long attributes)
> {
> efi_status_t status;
> efi_gcd_memory_space_desc_t desc;
> - efi_physical_addr_t end, next;
> - efi_physical_addr_t rounded_start, rounded_end;
> + efi_physical_addr_t end, next, start;
> efi_physical_addr_t unprotect_start, unprotect_size;
>
> - if (efi_dxe_table == NULL)
> - return EFI_UNSUPPORTED;
> + if (!efi_dxe_table) {
> + retrieve_dxe_table();
>
> - /*
> - * This function should not be used to modify attributes
> - * other than writable/executable.
> - */
> -
> - if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
> - return EFI_INVALID_PARAMETER;
> -
> - /*
> - * Disallow simultaniously executable and writable memory
> - * to inforce W^X policy if direct extraction code is enabled.
> - */
> -
> - if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
> - efi_warn("W^X violation at [%08lx,%08lx]\n",
> - (unsigned long)rounded_start,
> - (unsigned long)rounded_end);
> + if (!efi_dxe_table)
> + return EFI_UNSUPPORTED;
> }
>
> - rounded_start = rounddown(start, EFI_PAGE_SIZE);
> - rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> -
> /*
> * Don't modify memory region attributes, they are
> * already suitable, to lower the possibility to
> * encounter firmware bugs.
> */
>
> - for (end = start + size; start < end; start = next) {
> +
> + for (start = rounded_start, end = rounded_end; start < end; start = next) {
>
> status = efi_dxe_call(get_memory_space_descriptor,
> start, &desc);
>
> if (status != EFI_SUCCESS) {
> efi_warn("Unable to get memory descriptor at %lx\n",
> - start);
> + (unsigned long)start);
> return status;
> }
>
> @@ -230,3 +214,107 @@ efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>
> return EFI_SUCCESS;
> }
> +
> +static void retrieve_memory_attributes_proto(void)
> +{
> + efi_status_t status;
> + efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
> +
> + status = efi_bs_call(locate_protocol, &guid, NULL,
> + (void **)&efi_mem_attrib_proto);
> + if (status != EFI_SUCCESS)
> + efi_mem_attrib_proto = NULL;
> +}
> +
> +/**
> + * efi_adjust_memory_range_protection() - change memory range protection attributes
> + * @start: memory range start address
> + * @size: memory range size
> + *
> + * Actual memory range for which memory attributes are modified is
> + * the smallest ranged with start address and size aligned to EFI_PAGE_SIZE
> + * that includes [start, start + size].
> + *
> + * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
> + * that is a part of UEFI Specification since version 2.10.
> + * If the protocol is unavailable it falls back to DXE services functions.
> + *
> + * @return: status code
> + */
> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
> + unsigned long size,
> + unsigned long attributes)
> +{
> + efi_status_t status;
> + efi_physical_addr_t rounded_start, rounded_end;
> + unsigned long attr_clear;
> +
> + /*
> + * This function should not be used to modify attributes
> + * other than writable/executable.
> + */
> +
> + if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
> + return EFI_INVALID_PARAMETER;
> +
> + /*
> + * Warn if requested to make memory simultaneously
> + * executable and writable to enforce W^X policy.
> + */
> +
> + if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
> + efi_warn("W^X violation at [%08lx,%08lx]",
> + (unsigned long)rounded_start,
> + (unsigned long)rounded_end);
> + }
> +
> + rounded_start = rounddown(start, EFI_PAGE_SIZE);
> + rounded_end = roundup(start + size, EFI_PAGE_SIZE);
> +
> + if (!efi_mem_attrib_proto) {
> + retrieve_memory_attributes_proto();
> +
> + /* Fall back to DXE services if unsupported */
> + if (!efi_mem_attrib_proto) {
> + return adjust_mem_attrib_dxe(rounded_start,
> + rounded_end,
> + attributes);
> + }
> + }
> +
> + /*
> + * Unlike DXE services functions, EFI_MEMORY_ATTRIBUTE_PROTOCOL
> + * does not clear unset protection bit, so it needs to be cleared
> + * explicitly
> + */
> +
> + attr_clear = ~attributes &
> + (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
> +
> + status = efi_call_proto(efi_mem_attrib_proto,
> + clear_memory_attributes,
> + rounded_start,
> + rounded_end - rounded_start,
> + attr_clear);
> + if (status != EFI_SUCCESS) {
> + efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx",
> + (unsigned long)rounded_start,
> + (unsigned long)rounded_end,
> + status);
> + return status;
> + }
> +
> + status = efi_call_proto(efi_mem_attrib_proto,
> + set_memory_attributes,
> + rounded_start,
> + rounded_end - rounded_start,
> + attributes);
> + if (status != EFI_SUCCESS) {
> + efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx",
> + (unsigned long)rounded_start,
> + (unsigned long)rounded_end,
> + status);
> + }
> +
> + return status;
> +}
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 60697fcd8950..06a62b121521 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -23,7 +23,6 @@
> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>
> const efi_system_table_t *efi_system_table;
> -const efi_dxe_services_table_t *efi_dxe_table;
> u32 image_offset __section(".data");
> static efi_loaded_image_t *image __section(".data");
>
> @@ -357,15 +356,6 @@ void __noreturn efi_exit(efi_handle_t handle, efi_status_t status)
> static void setup_sections_memory_protection(unsigned long image_base)
> {
> #ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
> - efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> -
> - if (!efi_dxe_table ||
> - efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> - efi_warn("Unable to locate EFI DXE services table\n");
> - efi_dxe_table = NULL;
> - return;
> - }
> -
> /* .setup [image_base, _head] */
> efi_adjust_memory_range_protection(image_base,
> (unsigned long)_head - image_base,
> @@ -732,13 +722,6 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
> if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
> efi_exit(handle, EFI_INVALID_PARAMETER);
>
> - efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
> - if (efi_dxe_table &&
> - efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
> - efi_warn("Ignoring DXE services table: invalid signature\n");
> - efi_dxe_table = NULL;
> - }
> -
> setup_sections_memory_protection(bzimage_addr - image_offset);
>
> #ifdef CONFIG_CMDLINE_BOOL
> --
> 2.37.4
>
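The page rounding and the clear-then-set sequence used by
efi_adjust_memory_range_protection() in the hunk above can be sketched in
isolation as follows. This is a hedged illustration rather than the patch
code: struct attr_request and plan_attr_request() are hypothetical names,
the rounding is written out by hand instead of using the kernel's
rounddown()/roundup() helpers, and no firmware call is made. The attribute
bit values are those of the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 4096ULL

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_RP 0x0000000000002000ULL /* read-protected */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* execute-protected */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* read-only */

/* Hypothetical container for the two protocol calls that would follow. */
struct attr_request {
	uint64_t base;       /* page-aligned start of the affected range */
	uint64_t length;     /* page-aligned length of the affected range */
	uint64_t set_mask;   /* bits for a "set attributes" call */
	uint64_t clear_mask; /* bits for a "clear attributes" call */
};

static struct attr_request plan_attr_request(uint64_t start, uint64_t size,
					     uint64_t attributes)
{
	struct attr_request req;
	uint64_t end;

	/* Only RO/XP may be requested, and the range must not wrap around. */
	assert((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0);
	assert(start + size >= start);

	/* Smallest page-aligned range that covers [start, start + size). */
	req.base = start & ~(EFI_PAGE_SIZE - 1);
	end = (start + size + EFI_PAGE_SIZE - 1) & ~(EFI_PAGE_SIZE - 1);
	req.length = end - req.base;

	/*
	 * Setting attributes does not clear the ones left unset, so every
	 * protection bit that was not requested is cleared explicitly.
	 */
	req.set_mask = attributes;
	req.clear_mask = ~attributes &
			 (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);

	return req;
}
```

Computing the rounded range before anything else also means that any
diagnostic that prints the range can use initialized values.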
On 2023-03-10 17:52, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>>
>> Convert kernel_add_identity_map() into a function pointer to be able
>> to provide alternative implementations of this function. Required
>> to enable calling the code using this function from EFI environment.
>>
>> Tested-by: Mario Limonciello <[email protected]>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>> ---
>> arch/x86/boot/compressed/ident_map_64.c | 7 ++++---
>> arch/x86/boot/compressed/misc.c | 24 ++++++++++++++++++++++++
>> arch/x86/boot/compressed/misc.h | 15 +++------------
>> 3 files changed, 31 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/ident_map_64.c
>> b/arch/x86/boot/compressed/ident_map_64.c
>> index ba5108c58a4e..1aee524d3c2b 100644
>> --- a/arch/x86/boot/compressed/ident_map_64.c
>> +++ b/arch/x86/boot/compressed/ident_map_64.c
>> @@ -92,9 +92,9 @@ bool has_nx; /* set in head_64.S */
>> /*
>> * Adds the specified range to the identity mappings.
>> */
>> -unsigned long kernel_add_identity_map(unsigned long start,
>> - unsigned long end,
>> - unsigned int flags)
>> +unsigned long kernel_add_identity_map_(unsigned long start,
>
> Please use a more discriminating name here - the trailing _ is rather
> hard to spot.
Got it. I think kernel_add_identity_map_impl() would be a better fit.
>
>> + unsigned long end,
>> + unsigned int flags)
>> {
>> int ret;
>>
>> @@ -142,6 +142,7 @@ void initialize_identity_maps(void *rmode)
>> struct setup_data *sd;
>>
>> boot_params = rmode;
>> + kernel_add_identity_map = kernel_add_identity_map_;
>>
>> /* Exclude the encryption mask from __PHYSICAL_MASK */
>> physical_mask &= ~sme_me_mask;
>> diff --git a/arch/x86/boot/compressed/misc.c
>> b/arch/x86/boot/compressed/misc.c
>> index aa4a22bc9cf9..c9c235d65d16 100644
>> --- a/arch/x86/boot/compressed/misc.c
>> +++ b/arch/x86/boot/compressed/misc.c
>> @@ -275,6 +275,22 @@ static void parse_elf(void *output, unsigned long
>> output_len,
>> free(phdrs);
>> }
>>
>> +/*
>> + * This points to actual implementation of mapping function
>> + * for current environment: either EFI API wrapper,
>> + * own implementation or dummy implementation below.
>> + */
>> +unsigned long (*kernel_add_identity_map)(unsigned long start,
>> + unsigned long end,
>> + unsigned int flags);
>> +
>> +static inline unsigned long kernel_add_identity_map_dummy(unsigned
>> long start,
>
> This function is never called, it only has its address taken, so the
> 'inline' makes no sense here.
>
Indeed. I'll remove the inline.
>> + unsigned
>> long end,
>> + unsigned int
>> flags)
>> +{
>> + return start;
>> +}
>> +
>> /*
>> * The compressed kernel image (ZO), has been moved so that its
>> position
>> * is against the end of the buffer used to hold the uncompressed
>> kernel
>> @@ -312,6 +328,14 @@ asmlinkage __visible void *extract_kernel(void
>> *rmode, memptr heap,
>>
>> init_default_io_ops();
>>
>> + /*
>> + * On 64-bit this pointer is set during page table
>> uninitialization,
>
> initialization
Thanks!
>
>> + * but on 32-bit it remains uninitialized, since paging is
>> disabled.
>> + */
>> + if (IS_ENABLED(CONFIG_X86_32))
>> + kernel_add_identity_map =
>> kernel_add_identity_map_dummy;
>> +
>> +
>> /*
>> * Detect TDX guest environment.
>> *
>> diff --git a/arch/x86/boot/compressed/misc.h
>> b/arch/x86/boot/compressed/misc.h
>> index 38d31bec062d..0076b2845b4b 100644
>> --- a/arch/x86/boot/compressed/misc.h
>> +++ b/arch/x86/boot/compressed/misc.h
>> @@ -180,18 +180,9 @@ static inline int
>> count_immovable_mem_regions(void) { return 0; }
>> #ifdef CONFIG_X86_5LEVEL
>> extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
>> #endif
>> -#ifdef CONFIG_X86_64
>> -extern unsigned long kernel_add_identity_map(unsigned long start,
>> - unsigned long end,
>> - unsigned int flags);
>> -#else
>> -static inline unsigned long kernel_add_identity_map(unsigned long
>> start,
>> - unsigned long end,
>> - unsigned int
>> flags)
>> -{
>> - return start;
>> -}
>> -#endif
>> +extern unsigned long (*kernel_add_identity_map)(unsigned long start,
>> + unsigned long end,
>> + unsigned int flags);
>> /* Used by PAGE_KERN* macros: */
>> extern pteval_t __default_kernel_pte_mask;
>>
>> --
>> 2.37.4
>>
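The indirection discussed in this patch, a mapping function reached
through a pointer that is set once per environment, can be sketched
outside the kernel as below. This is a simplified illustration under
stated assumptions: map_range, map_range_impl(), map_range_dummy(),
select_map_range() and map_range_checked() are hypothetical names, and the
"real" implementation here only counts calls instead of touching page
tables.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long (*map_range_fn)(unsigned long start,
				      unsigned long end,
				      unsigned int flags);

/*
 * No-op fallback for environments without paging. It is never called
 * directly, only through the pointer, so marking it inline would be
 * pointless.
 */
static unsigned long map_range_dummy(unsigned long start,
				     unsigned long end,
				     unsigned int flags)
{
	(void)end;
	(void)flags;
	return start;
}

/* Stand-in for the real implementation; it only records that it ran. */
static unsigned long map_range_calls;

static unsigned long map_range_impl(unsigned long start,
				    unsigned long end,
				    unsigned int flags)
{
	(void)end;
	(void)flags;
	map_range_calls++;
	return start;
}

/* Points to the implementation chosen for the current environment. */
static map_range_fn map_range;

static void select_map_range(int paging_enabled)
{
	map_range = paging_enabled ? map_range_impl : map_range_dummy;
}

/* Calling through an unset pointer is a bug, so catch it early. */
static unsigned long map_range_checked(unsigned long start,
				       unsigned long end,
				       unsigned int flags)
{
	assert(map_range != NULL);
	return map_range(start, end, flags);
}
```

The explicit check in map_range_checked() reflects the concern raised in
the review: the pointer has no valid default, so every entry path has to
assign it before the first call.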
On 2023-03-10 17:59, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>>
>> This is required to fit more sections in PE section tables,
>> since its size is restricted by zero page located at specific offset
>> after the PE header.
>>
>> Tested-by: Mario Limonciello <[email protected]>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>
> I'd prefer to rip this out altogether.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=9510f6f04f579b9a3f54ad762c75ab2d905e37d8
Sounds great! Can I replace this patch with yours in v5?
>
> (and refer to the other thread in linux-efi@)
Which thread exactly? The one about the removal of
real-mode code?
>
>> ---
>> arch/x86/boot/header.S | 14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>> index 9338c68e7413..9fec80bc504b 100644
>> --- a/arch/x86/boot/header.S
>> +++ b/arch/x86/boot/header.S
>> @@ -59,17 +59,16 @@ start2:
>> cld
>>
>> movw $bugger_off_msg, %si
>> + movw $bugger_off_msg_size, %cx
>>
>> msg_loop:
>> lodsb
>> - andb %al, %al
>> - jz bs_die
>> movb $0xe, %ah
>> movw $7, %bx
>> int $0x10
>> - jmp msg_loop
>> + decw %cx
>> + jnz msg_loop
>>
>> -bs_die:
>> # Allow the user to press a key, then reboot
>> xorw %ax, %ax
>> int $0x16
>> @@ -90,10 +89,9 @@ bs_die:
>>
>> .section ".bsdata", "a"
>> bugger_off_msg:
>> - .ascii "Use a boot loader.\r\n"
>> - .ascii "\n"
>> - .ascii "Remove disk and press any key to reboot...\r\n"
>> - .byte 0
>> + .ascii "Use a boot loader. "
>> + .ascii "Press a key to reboot"
>> + .set bugger_off_msg_size, . - bugger_off_msg
>>
>> #ifdef CONFIG_EFI_STUB
>> pe_header:
>> --
>> 2.37.4
>>
On 2023-03-10 19:13, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 13:42, Evgeniy Baskov <[email protected]> wrote:
>>
>> Add EFI_MEMORY_ATTRIBUTE_PROTOCOL as preferred alternative to DXE
>> services for changing memory attributes in the EFISTUB.
>>
>> Use DXE services only as a fallback in case aforementioned protocol
>> is not supported by UEFI implementation.
>>
>> Move DXE services initialization code closer to the place they are
>> used
>> to match EFI_MEMORY_ATTRIBUTE_PROTOCOL initialization code.
>>
>> Tested-by: Mario Limonciello <[email protected]>
>> Tested-by: Peter Jones <[email protected]>
>> Signed-off-by: Evgeniy Baskov <[email protected]>
>
> I'm not convinced about the use of the DXE services for this, and I
> think we should replace this patch with changes that base all the new
> protection code on the EFI memory attributes protocol only.
>
> We introduced that DXE code to remove protections from memory that was
> mapped read-only and/or non-executable, and described as such in the
> GCD memory map.
>
> Using it to manipulate restricted permissions like this is quite a
> different thing, and sadly (at least in EDK2), the GCD system memory
> map is not kept in sync with the updated permissions, i.e, the W^X
> protections for loaded images and the NX protection for arbitrary page
> allocations are both based on the PI CPU arch protocol, which
> manipulates the page tables directly, but does not record the modified
> attributes in the GCD or EFI memory maps, as this would result in
> massive fragmentation and break lots of other things.
>
> That means that, except for the specific use case for which we
> introduced the DXE services calls, the only reliable way to figure out
> what permission attributes a certain range of memory is using is the
> EFI memory attributes protocol, and I don't think we should use
> anything else for tightening down these protections.
>
>
Makes sense. I'll change the patch to use DXE services only to widen
permissions, which matches their original purpose, and to apply stricter
permissions only through the memory attribute protocol.
Thanks!
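
To make the intended split concrete, here is a rough standalone sketch.
It is not the libstub code: pick_backend() and the backend enum are
hypothetical names used only for illustration, and only the attribute
bit values are taken from the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_XP 0x0000000000004000ULL /* not executable */
#define EFI_MEMORY_RO 0x0000000000020000ULL /* read-only */

#define PROT_MASK (EFI_MEMORY_RO | EFI_MEMORY_XP)

/* Hypothetical names, for illustration only. */
enum backend {
	BACKEND_NONE,     /* nothing to change, or no usable interface */
	BACKEND_MEM_ATTR, /* EFI_MEMORY_ATTRIBUTE_PROTOCOL */
	BACKEND_DXE,      /* DXE services (GCD memory space) */
};

/*
 * Decide which interface may apply a change from @cur to @req.
 * Tightening (setting RO or XP where it was clear) is only allowed
 * through the memory attribute protocol; DXE services are used
 * solely to widen permissions when the protocol is missing.
 */
static enum backend pick_backend(uint64_t cur, uint64_t req,
				 int have_mem_attr, int have_dxe)
{
	uint64_t added, removed;

	/* Only RO/XP may be requested, mirroring the check in the patch. */
	assert((req & ~PROT_MASK) == 0);

	cur &= PROT_MASK;
	added = req & ~cur;   /* restrictions to be set */
	removed = cur & ~req; /* restrictions to be lifted */

	if (!added && !removed)
		return BACKEND_NONE;
	if (have_mem_attr)
		return BACKEND_MEM_ATTR;
	if (!added && have_dxe)
		return BACKEND_DXE;
	return BACKEND_NONE;
}
```

In the DXE case cur would come from the GCD descriptor, which, as noted
above, is only trustworthy for the original "remove protections" use
case; that is why the sketch never picks DXE when a restriction would
be added.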
>
>
>> ---
>> drivers/firmware/efi/libstub/mem.c | 168 ++++++++++++++++++------
>> drivers/firmware/efi/libstub/x86-stub.c | 17 ---
>> 2 files changed, 128 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
>> index 3e47e5931f04..07d54c88c62e 100644
>> --- a/drivers/firmware/efi/libstub/mem.c
>> +++ b/drivers/firmware/efi/libstub/mem.c
>> @@ -5,6 +5,9 @@
>>
>> #include "efistub.h"
>>
>> +const efi_dxe_services_table_t *efi_dxe_table;
>> +efi_memory_attribute_protocol_t *efi_mem_attrib_proto;
>> +
>> /**
>> * efi_get_memory_map() - get memory map
>> * @map: pointer to memory map pointer to which to
>> assign the
>> @@ -129,66 +132,47 @@ void efi_free(unsigned long size, unsigned long
>> addr)
>> efi_bs_call(free_pages, addr, nr_pages);
>> }
>>
>> -/**
>> - * efi_adjust_memory_range_protection() - change memory range
>> protection attributes
>> - * @start: memory range start address
>> - * @size: memory range size
>> - *
>> - * Actual memory range for which memory attributes are modified is
>> - * the smallest ranged with start address and size aligned to
>> EFI_PAGE_SIZE
>> - * that includes [start, start + size].
>> - *
>> - * @return: status code
>> - */
>> -efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>> - unsigned long size,
>> - unsigned long
>> attributes)
>> +static void retrieve_dxe_table(void)
>> +{
>> + efi_dxe_table =
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> + if (efi_dxe_table &&
>> + efi_dxe_table->hdr.signature !=
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> + efi_warn("Ignoring DXE services table: invalid
>> signature\n");
>> + efi_dxe_table = NULL;
>> + }
>> +}
>> +
>> +static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t
>> rounded_start,
>> + efi_physical_addr_t
>> rounded_end,
>> + unsigned long attributes)
>> {
>> efi_status_t status;
>> efi_gcd_memory_space_desc_t desc;
>> - efi_physical_addr_t end, next;
>> - efi_physical_addr_t rounded_start, rounded_end;
>> + efi_physical_addr_t end, next, start;
>> efi_physical_addr_t unprotect_start, unprotect_size;
>>
>> - if (efi_dxe_table == NULL)
>> - return EFI_UNSUPPORTED;
>> + if (!efi_dxe_table) {
>> + retrieve_dxe_table();
>>
>> - /*
>> - * This function should not be used to modify attributes
>> - * other than writable/executable.
>> - */
>> -
>> - if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
>> - return EFI_INVALID_PARAMETER;
>> -
>> - /*
>> - * Disallow simultaniously executable and writable memory
>> - * to inforce W^X policy if direct extraction code is enabled.
>> - */
>> -
>> - if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
>> - efi_warn("W^X violation at [%08lx,%08lx]\n",
>> - (unsigned long)rounded_start,
>> - (unsigned long)rounded_end);
>> + if (!efi_dxe_table)
>> + return EFI_UNSUPPORTED;
>> }
>>
>> - rounded_start = rounddown(start, EFI_PAGE_SIZE);
>> - rounded_end = roundup(start + size, EFI_PAGE_SIZE);
>> -
>> /*
>> * Don't modify memory region attributes, they are
>> * already suitable, to lower the possibility to
>> * encounter firmware bugs.
>> */
>>
>> - for (end = start + size; start < end; start = next) {
>> +
>> + for (start = rounded_start, end = rounded_end; start < end;
>> start = next) {
>>
>> status = efi_dxe_call(get_memory_space_descriptor,
>> start, &desc);
>>
>> if (status != EFI_SUCCESS) {
>> efi_warn("Unable to get memory descriptor at
>> %lx\n",
>> - start);
>> + (unsigned long)start);
>> return status;
>> }
>>
>> @@ -230,3 +214,107 @@ efi_status_t
>> efi_adjust_memory_range_protection(unsigned long start,
>>
>> return EFI_SUCCESS;
>> }
>> +
>> +static void retrieve_memory_attributes_proto(void)
>> +{
>> + efi_status_t status;
>> + efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
>> +
>> + status = efi_bs_call(locate_protocol, &guid, NULL,
>> + (void **)&efi_mem_attrib_proto);
>> + if (status != EFI_SUCCESS)
>> + efi_mem_attrib_proto = NULL;
>> +}
>> +
>> +/**
>> + * efi_adjust_memory_range_protection() - change memory range
>> protection attributes
>> + * @start: memory range start address
>> + * @size: memory range size
>> + *
>> + * Actual memory range for which memory attributes are modified is
>> + * the smallest range with start address and size aligned to EFI_PAGE_SIZE
>> + * that includes [start, start + size].
>> + *
>> + * This function first attempts to use EFI_MEMORY_ATTRIBUTE_PROTOCOL,
>> + * which has been part of the UEFI Specification since version 2.10.
>> + * If the protocol is unavailable it falls back to DXE services functions.
>> + *
>> + * @return: status code
>> + */
>> +efi_status_t efi_adjust_memory_range_protection(unsigned long start,
>> + unsigned long size,
>> + unsigned long
>> attributes)
>> +{
>> + efi_status_t status;
>> + efi_physical_addr_t rounded_start, rounded_end;
>> + unsigned long attr_clear;
>> +
>> + /*
>> + * This function should not be used to modify attributes
>> + * other than writable/executable.
>> + */
>> +
>> + if ((attributes & ~(EFI_MEMORY_RO | EFI_MEMORY_XP)) != 0)
>> + return EFI_INVALID_PARAMETER;
>> +
>> + rounded_start = rounddown(start, EFI_PAGE_SIZE);
>> + rounded_end = roundup(start + size, EFI_PAGE_SIZE);
>> +
>> + /*
>> + * Warn if requested to make memory simultaneously
>> + * executable and writable to enforce W^X policy.
>> + */
>> +
>> + if ((attributes & (EFI_MEMORY_RO | EFI_MEMORY_XP)) == 0) {
>> + efi_warn("W^X violation at [%08lx,%08lx]\n",
>> + (unsigned long)rounded_start,
>> + (unsigned long)rounded_end);
>> + }
>> +
>> + if (!efi_mem_attrib_proto) {
>> + retrieve_memory_attributes_proto();
>> +
>> + /* Fall back to DXE services if unsupported */
>> + if (!efi_mem_attrib_proto) {
>> + return adjust_mem_attrib_dxe(rounded_start,
>> + rounded_end,
>> + attributes);
>> + }
>> + }
>> +
>> + /*
>> + * Unlike DXE services functions, EFI_MEMORY_ATTRIBUTE_PROTOCOL
>> + * does not clear unset protection bits, so they need to be
>> + * cleared explicitly.
>> + */
>> +
>> + attr_clear = ~attributes &
>> + (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP);
>> +
>> + status = efi_call_proto(efi_mem_attrib_proto,
>> + clear_memory_attributes,
>> + rounded_start,
>> + rounded_end - rounded_start,
>> + attr_clear);
>> + if (status != EFI_SUCCESS) {
>> + efi_warn("Failed to clear memory attributes at [%08lx,%08lx]: %lx\n",
>> + (unsigned long)rounded_start,
>> + (unsigned long)rounded_end,
>> + status);
>> + return status;
>> + }
>> +
>> + status = efi_call_proto(efi_mem_attrib_proto,
>> + set_memory_attributes,
>> + rounded_start,
>> + rounded_end - rounded_start,
>> + attributes);
>> + if (status != EFI_SUCCESS) {
>> + efi_warn("Failed to set memory attributes at [%08lx,%08lx]: %lx\n",
>> + (unsigned long)rounded_start,
>> + (unsigned long)rounded_end,
>> + status);
>> + }
>> +
>> + return status;
>> +}
>> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
>> index 60697fcd8950..06a62b121521 100644
>> --- a/drivers/firmware/efi/libstub/x86-stub.c
>> +++ b/drivers/firmware/efi/libstub/x86-stub.c
>> @@ -23,7 +23,6 @@
>> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>>
>> const efi_system_table_t *efi_system_table;
>> -const efi_dxe_services_table_t *efi_dxe_table;
>> u32 image_offset __section(".data");
>> static efi_loaded_image_t *image __section(".data");
>>
>> @@ -357,15 +356,6 @@ void __noreturn efi_exit(efi_handle_t handle,
>> efi_status_t status)
>> static void setup_sections_memory_protection(unsigned long
>> image_base)
>> {
>> #ifdef CONFIG_EFI_DXE_MEM_ATTRIBUTES
>> - efi_dxe_table =
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> -
>> - if (!efi_dxe_table ||
>> - efi_dxe_table->hdr.signature !=
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> - efi_warn("Unable to locate EFI DXE services table\n");
>> - efi_dxe_table = NULL;
>> - return;
>> - }
>> -
>> /* .setup [image_base, _head] */
>> efi_adjust_memory_range_protection(image_base,
>> (unsigned long)_head -
>> image_base,
>> @@ -732,13 +722,6 @@ asmlinkage unsigned long efi_main(efi_handle_t
>> handle,
>> if (efi_system_table->hdr.signature !=
>> EFI_SYSTEM_TABLE_SIGNATURE)
>> efi_exit(handle, EFI_INVALID_PARAMETER);
>>
>> - efi_dxe_table =
>> get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
>> - if (efi_dxe_table &&
>> - efi_dxe_table->hdr.signature !=
>> EFI_DXE_SERVICES_TABLE_SIGNATURE) {
>> - efi_warn("Ignoring DXE services table: invalid
>> signature\n");
>> - efi_dxe_table = NULL;
>> - }
>> -
>> setup_sections_memory_protection(bzimage_addr - image_offset);
>>
>> #ifdef CONFIG_CMDLINE_BOOL
>> --
>> 2.37.4
>>
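
For reference, the page rounding and the clear mask computed in
efi_adjust_memory_range_protection() in the patch above can be checked
in isolation. This is a minimal userspace sketch, not the stub code:
the helper names round_range() and attrs_to_clear() are made up here,
EFI_PAGE_SIZE is assumed to be 4 KiB, and the attribute bit values are
the ones from the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

#define EFI_PAGE_SIZE 0x1000ULL  /* assumption: 4 KiB EFI pages */
#define EFI_MEMORY_RP 0x2000ULL  /* read-protected */
#define EFI_MEMORY_XP 0x4000ULL  /* not executable */
#define EFI_MEMORY_RO 0x20000ULL /* read-only */

#define PROT_MASK (EFI_MEMORY_RO | EFI_MEMORY_XP | EFI_MEMORY_RP)

struct range {
	uint64_t start; /* inclusive, page aligned */
	uint64_t end;   /* exclusive, page aligned */
};

/* Smallest page-aligned range that covers [start, start + size). */
static struct range round_range(uint64_t start, uint64_t size)
{
	struct range r;

	/* Reject requests whose end would wrap around. */
	assert(start + size + EFI_PAGE_SIZE - 1 >= start);

	r.start = start & ~(EFI_PAGE_SIZE - 1);
	r.end = (start + size + EFI_PAGE_SIZE - 1) & ~(EFI_PAGE_SIZE - 1);
	return r;
}

/*
 * Bits to pass to the "clear" call: every protection bit that the
 * caller did not ask for, since the "set" call leaves them untouched.
 */
static uint64_t attrs_to_clear(uint64_t attributes)
{
	return ~attributes & PROT_MASK;
}
```

For a request of EFI_MEMORY_RO on [0x1234, 0x1244) this gives the range
[0x1000, 0x2000) and a clear mask of XP | RP, so the region ends up
read-only and executable.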
On Sat, 11 Mar 2023 at 15:49, Evgeniy Baskov <[email protected]> wrote:
>
> On 2023-03-10 17:59, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
> >>
> >> This is required to fit more sections in PE section tables,
> >> since its size is restricted by zero page located at specific offset
> >> after the PE header.
> >>
> >> Tested-by: Mario Limonciello <[email protected]>
> >> Tested-by: Peter Jones <[email protected]>
> >> Signed-off-by: Evgeniy Baskov <[email protected]>
> >
> > I'd prefer to rip this out altogether.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=9510f6f04f579b9a3f54ad762c75ab2d905e37d8
>
> Sounds great! Can I replace this patch with yours in v5?
>
Of course.
> >
> > (and refer to the other thread in linux-efi@)
>
> Which thread exactly? The one about the removal of
> real-mode code?
>
Yes, this one
https://lore.kernel.org/linux-efi/[email protected]/
On 2023-03-11 20:27, Ard Biesheuvel wrote:
> On Sat, 11 Mar 2023 at 15:49, Evgeniy Baskov <[email protected]> wrote:
>>
>> On 2023-03-10 17:59, Ard Biesheuvel wrote:
>> > On Thu, 15 Dec 2022 at 13:40, Evgeniy Baskov <[email protected]> wrote:
>> >>
>> >> This is required to fit more sections in PE section tables,
>> >> since its size is restricted by zero page located at specific offset
>> >> after the PE header.
>> >>
>> >> Tested-by: Mario Limonciello <[email protected]>
>> >> Tested-by: Peter Jones <[email protected]>
>> >> Signed-off-by: Evgeniy Baskov <[email protected]>
>> >
>> > I'd prefer to rip this out altogether.
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?id=9510f6f04f579b9a3f54ad762c75ab2d905e37d8
>>
>> Sounds great! Can I replace this patch with yours in v5?
>>
>
> Of course.
>
>> >
>> > (and refer to the other thread in linux-efi@)
>>
>> Which thread exactly? The one about the removal of
>> real-mode code?
>>
>
> Yes, this one
>
> https://lore.kernel.org/linux-efi/[email protected]/
Thanks!