2022-11-22 12:44:32

by Evgeniy Baskov

[permalink] [raw]
Subject: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

This patchset is aimed
* to improve UEFI compatibility of compressed kernel code for x86_64
* to setup proper memory access attributes for code and rodata sections
* to implement W^X protection policy throughout the whole execution
of compressed kernel for EFISTUB code path.

Kernel is made to be more compatible with PE image specification [3],
allowing it to be successfully loaded by stricter PE loader
implementations like the one from [2]. There is at least one
known implementation that uses that loader in production [4].
There are also ongoing efforts to upstream these changes.

Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
EFI specification since version 2.10, as a better alternative to
using DXE services for memory protection attributes manipulation,
since it is defined by the UEFI specification itself and not UEFI PI
specification. This protocol is not widely available so the code
using DXE services is kept in place as a fallback in case specific
implementation does not support the new protocol.
One of EFI implementations that already support
EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].

Kernel image generation tool (tools/build.c) is refactored as a part
of changes that makes PE image more compatible.

The patchset implements memory protection for compressed kernel
code while executing both inside EFI boot services and outside of
them. For EFISTUB code path W^X protection policy is maintained
throughout the whole execution of compressed kernel. The latter
is achieved by extracting the kernel directly from EFI environment
and jumping to it's head immediately after exiting EFI boot services.
As a side effect of this change one page table rebuild and a copy of
the kernel image is removed.

Memory protection inside EFI environment is controlled by the
CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
protection attributes of PE sections and not only DXE services as the
name might suggest.

Changes in v2:
* Fix spelling.
* Rebase code to current master.
* Split huge patches into smaller ones.
* Remove unneeded forward declarations.
* Make direct extraction unconditional.
* Also make it work for x86_32.
* Reduce lower limit of KASLR to 64M.
* Make callback interface more logically consistent.
* Actually declare callbacks structure before using it.
* Mention effect on x86_32 in commit message of
"x86/build: Remove RWX sections and align on 4KB".
* Clarify commit message of
"x86/boot: Increase boot page table size".
* Remove "startup32_" prefix on startup32_enable_nx_if_supported.
* Move linker generated sections outside of function scope.
* Drop some unintended changes.
* Drop generating 2 reloc entries.
(as I've misread the documentation and there's no need for this change.)
* Set has_nx from enable_nx_if_supported correctly.
* Move ELF header check to build time.
* Set WP at the same time as PG in trampoline code,
as it is more logically consistent.
* Put x86-specific EFISTUB definitions in x86-stub.h header.
* Catch presence of ELF segments violating W^X during build.
* Move PE definitions from build.c to a new header file.
* Fix generation of PE '.compat' section.

I decided to keep protection of compressed kernel blob and '.rodata'
separate from '.text' for now, since it does not really have a lot
of overhead.

Otherwise, all comments on v1 seems to be addressed.

Changes in v3:
* Setup IDT before issuing cpuid so that AMD SEV #VC handler is set.
* Replace memcpy with strncpy to prevent out-of-bounds reads in tools/build.c.
* Zero BSS before entering efi_main(), since it can contain garbage
when booting via EFI handover protocol.
* When booting via EFI don't require init_size of RAM, since in-place
unpacking is not used anyway with that interface. This saves ~40M of memory
for debian .config.
* Setup sections memory protection in efi_main() to cover EFI handover protocol,
where EFI sections are likely not properly protected.

Patch "x86/boot: Support 4KB pages for identity mapping" needs review
from x86/mm team.

I have also included Peter's patch [6] into the series for simplicity.

Many thanks to Ard Biesheuvel <[email protected]> and
Andrew Cooper <[email protected]> for reviewing the patches, and to
Peter Jones <[email protected]>, Mario Limonciello <[email protected]> and
Joey Lee <[email protected]> for additional testing!

[1] https://lkml.org/lkml/2022/8/1/1314
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[4] https://www.ispras.ru/en/technologies/asperitas/
[5] https://github.com/microsoft/mu_tiano_platforms
[6] https://lkml.org/lkml/2022/10/18/1178

Evgeniy Baskov (23):
x86/boot: Align vmlinuz sections on page size
x86/build: Remove RWX sections and align on 4KB
x86/boot: Set cr0 to known state in trampoline
x86/boot: Increase boot page table size
x86/boot: Support 4KB pages for identity mapping
x86/boot: Setup memory protection for bzImage code
x86/build: Check W^X of vmlinux during build
x86/boot: Map memory explicitly
x86/boot: Remove mapping from page fault handler
efi/libstub: Move helper function to related file
x86/boot: Make console interface more abstract
x86/boot: Make kernel_add_identity_map() a pointer
x86/boot: Split trampoline and pt init code
x86/boot: Add EFI kernel extraction interface
efi/x86: Support extracting kernel from libstub
x86/boot: Reduce lower limit of physical KASLR
x86/boot: Reduce size of the DOS stub
tools/include: Add simplified version of pe.h
x86/build: Cleanup tools/build.c
x86/build: Make generated PE more spec compliant
efi/x86: Explicitly set sections memory attributes
efi/libstub: Add memory attribute protocol definitions
efi/libstub: Use memory attribute protocol

Peter Jones (1):
efi/libstub: Make memory protection warnings include newlines

arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 8 +-
arch/x86/boot/compressed/acpi.c | 21 +-
arch/x86/boot/compressed/efi.c | 19 +-
arch/x86/boot/compressed/head_32.S | 53 +-
arch/x86/boot/compressed/head_64.S | 89 ++-
arch/x86/boot/compressed/ident_map_64.c | 122 ++--
arch/x86/boot/compressed/kaslr.c | 8 +-
arch/x86/boot/compressed/misc.c | 278 ++++-----
arch/x86/boot/compressed/misc.h | 23 +-
arch/x86/boot/compressed/pgtable.h | 20 -
arch/x86/boot/compressed/pgtable_64.c | 75 ++-
arch/x86/boot/compressed/putstr.c | 130 ++++
arch/x86/boot/compressed/sev.c | 6 +-
arch/x86/boot/compressed/vmlinux.lds.S | 6 +
arch/x86/boot/header.S | 110 +---
arch/x86/boot/tools/build.c | 569 +++++++++++-------
arch/x86/include/asm/boot.h | 26 +-
arch/x86/include/asm/efi.h | 7 +
arch/x86/include/asm/init.h | 1 +
arch/x86/include/asm/shared/extract.h | 26 +
arch/x86/include/asm/shared/pgtable.h | 29 +
arch/x86/kernel/vmlinux.lds.S | 15 +-
arch/x86/mm/ident_map.c | 185 +++++-
drivers/firmware/efi/Kconfig | 2 +
drivers/firmware/efi/libstub/Makefile | 2 +-
drivers/firmware/efi/libstub/efistub.h | 26 +
drivers/firmware/efi/libstub/mem.c | 190 ++++++
.../firmware/efi/libstub/x86-extract-direct.c | 208 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 231 ++-----
drivers/firmware/efi/libstub/x86-stub.h | 14 +
include/linux/efi.h | 1 +
tools/include/linux/pe.h | 150 +++++
33 files changed, 1852 insertions(+), 800 deletions(-)
delete mode 100644 arch/x86/boot/compressed/pgtable.h
create mode 100644 arch/x86/boot/compressed/putstr.c
create mode 100644 arch/x86/include/asm/shared/extract.h
create mode 100644 arch/x86/include/asm/shared/pgtable.h
create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
create mode 100644 tools/include/linux/pe.h

--
2.37.4


2022-11-22 12:44:55

by Evgeniy Baskov

[permalink] [raw]
Subject: [PATCH v3 06/24] x86/boot: Setup memory protection for bzImage code

Use previously added code to use 4KB pages for mapping. Map compressed
and uncompressed kernel with appropriate memory protection attributes.
For compressed kernel set them up manually. For uncompressed kernel
used flags specified in ELF header.

Tested-by: Mario Limonciello <[email protected]>
Signed-off-by: Evgeniy Baskov <[email protected]>

delete mode 100644 arch/x86/boot/compressed/pgtable.h
create mode 100644 arch/x86/include/asm/shared/pgtable.h
---
arch/x86/boot/compressed/head_64.S | 24 +++++-
arch/x86/boot/compressed/ident_map_64.c | 97 ++++++++++++++++---------
arch/x86/boot/compressed/misc.c | 63 ++++++++++++++--
arch/x86/boot/compressed/misc.h | 22 +++++-
arch/x86/boot/compressed/pgtable.h | 20 -----
arch/x86/boot/compressed/pgtable_64.c | 2 +-
arch/x86/boot/compressed/sev.c | 6 +-
arch/x86/include/asm/shared/pgtable.h | 29 ++++++++
8 files changed, 197 insertions(+), 66 deletions(-)
delete mode 100644 arch/x86/boot/compressed/pgtable.h
create mode 100644 arch/x86/include/asm/shared/pgtable.h

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2a4372b84fc8..73a2ac2b063d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -29,13 +29,14 @@
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/boot.h>
+#include <asm/cpufeatures.h>
#include <asm/msr.h>
#include <asm/processor-flags.h>
#include <asm/asm-offsets.h>
#include <asm/bootparam.h>
#include <asm/desc_defs.h>
#include <asm/trapnr.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>

/*
* Locally defined symbols should be marked hidden:
@@ -578,6 +579,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
pushq %rsi
call load_stage2_idt

+ call enable_nx_if_supported
/* Pass boot_params to initialize_identity_maps() */
movq (%rsp), %rdi
call initialize_identity_maps
@@ -602,6 +604,26 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
jmp *%rax
SYM_FUNC_END(.Lrelocated)

+SYM_FUNC_START_LOCAL_NOALIGN(enable_nx_if_supported)
+ pushq %rbx
+
+ mov $0x80000001, %eax
+ cpuid
+ btl $(X86_FEATURE_NX & 31), %edx
+ jnc .Lnonx
+
+ movl $MSR_EFER, %ecx
+ rdmsr
+ btsl $_EFER_NX, %eax
+ wrmsr
+
+ movb $1, has_nx(%rip)
+
+.Lnonx:
+ popq %rbx
+ RET
+SYM_FUNC_END(enable_nx_if_supported)
+
.code32
/*
* This is the 32-bit trampoline that will be copied over to low memory.
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index d4a314cc50d6..fec795a4ce23 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -28,6 +28,7 @@
#include <asm/trap_pf.h>
#include <asm/trapnr.h>
#include <asm/init.h>
+#include <asm/shared/pgtable.h>
/* Use the static base for this part of the boot process */
#undef __PAGE_OFFSET
#define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -86,24 +87,52 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
* Due to relocation, pointers must be assigned at run time not build time.
*/
static struct x86_mapping_info mapping_info;
+bool has_nx; /* set in head_64.S */

/*
* Adds the specified range to the identity mappings.
*/
-void kernel_add_identity_map(unsigned long start, unsigned long end)
+unsigned long kernel_add_identity_map(unsigned long start,
+ unsigned long end,
+ unsigned int flags)
{
int ret;

/* Align boundary to 2M. */
- start = round_down(start, PMD_SIZE);
- end = round_up(end, PMD_SIZE);
+ start = round_down(start, PAGE_SIZE);
+ end = round_up(end, PAGE_SIZE);
if (start >= end)
- return;
+ return start;
+
+ /*
+ * Warn if W^X is violated.
+ * Only do that if CONFIG_RANDOMIZE_BASE is set, since otherwise we need
+ * to create RWX region in case of overlapping memory regions for
+ * compressed and uncompressed kernel.
+ */
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) &&
+ (flags & (MAP_EXEC | MAP_WRITE)) == (MAP_EXEC | MAP_WRITE))
+ warn("W^X violation\n");
+
+ bool nx = !(flags & MAP_EXEC) && has_nx;
+ bool ro = !(flags & MAP_WRITE);
+
+ mapping_info.page_flag = sme_me_mask | (nx ?
+ (ro ? __PAGE_KERNEL_RO : __PAGE_KERNEL) :
+ (ro ? __PAGE_KERNEL_ROX : __PAGE_KERNEL_EXEC));

/* Build the mapping. */
- ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, start, end);
+ ret = kernel_ident_mapping_init(&mapping_info,
+ (pgd_t *)top_level_pgt,
+ start, end);
if (ret)
error("Error: kernel_ident_mapping_init() failed\n");
+
+ if (!(flags & MAP_NOFLUSH))
+ write_cr3(top_level_pgt);
+
+ return start;
}

/* Locates and clears a region for a new top level page table. */
@@ -112,14 +141,17 @@ void initialize_identity_maps(void *rmode)
unsigned long cmdline;
struct setup_data *sd;

+ boot_params = rmode;
+
/* Exclude the encryption mask from __PHYSICAL_MASK */
physical_mask &= ~sme_me_mask;

/* Init mapping_info with run-time function/buffer pointers. */
mapping_info.alloc_pgt_page = alloc_pgt_page;
mapping_info.context = &pgt_data;
- mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask;
+ mapping_info.page_flag = __PAGE_KERNEL_EXEC | sme_me_mask;
mapping_info.kernpg_flag = _KERNPG_TABLE;
+ mapping_info.allow_4kpages = 1;

/*
* It should be impossible for this not to already be true,
@@ -154,15 +186,29 @@ void initialize_identity_maps(void *rmode)
/*
* New page-table is set up - map the kernel image, boot_params and the
* command line. The uncompressed kernel requires boot_params and the
- * command line to be mapped in the identity mapping. Map them
- * explicitly here in case the compressed kernel does not touch them,
- * or does not touch all the pages covering them.
+ * command line to be mapped in the identity mapping.
+ * Every other accessed memory region is mapped later, if required.
*/
- kernel_add_identity_map((unsigned long)_head, (unsigned long)_end);
- boot_params = rmode;
- kernel_add_identity_map((unsigned long)boot_params, (unsigned long)(boot_params + 1));
+ kernel_add_identity_map((unsigned long)_head,
+ (unsigned long)_ehead, MAP_EXEC | MAP_NOFLUSH);
+
+ kernel_add_identity_map((unsigned long)_compressed,
+ (unsigned long)_ecompressed, MAP_WRITE | MAP_NOFLUSH);
+
+ kernel_add_identity_map((unsigned long)_text,
+ (unsigned long)_etext, MAP_EXEC | MAP_NOFLUSH);
+
+ kernel_add_identity_map((unsigned long)_rodata,
+ (unsigned long)_erodata, MAP_NOFLUSH);
+
+ kernel_add_identity_map((unsigned long)_data,
+ (unsigned long)_end, MAP_WRITE | MAP_NOFLUSH);
+
+ kernel_add_identity_map((unsigned long)boot_params,
+ (unsigned long)(boot_params + 1), MAP_WRITE | MAP_NOFLUSH);
+
cmdline = get_cmd_line_ptr();
- kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE);
+ kernel_add_identity_map(cmdline, cmdline + COMMAND_LINE_SIZE, MAP_NOFLUSH);

/*
* Also map the setup_data entries passed via boot_params in case they
@@ -172,7 +218,7 @@ void initialize_identity_maps(void *rmode)
while (sd) {
unsigned long sd_addr = (unsigned long)sd;

- kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);
+ kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len, MAP_NOFLUSH);
sd = (struct setup_data *)sd->next;
}

@@ -185,26 +231,11 @@ void initialize_identity_maps(void *rmode)
static pte_t *split_large_pmd(struct x86_mapping_info *info,
pmd_t *pmdp, unsigned long __address)
{
- unsigned long page_flags;
- unsigned long address;
- pte_t *pte;
- pmd_t pmd;
- int i;
-
- pte = (pte_t *)info->alloc_pgt_page(info->context);
+ unsigned long address = __address & PMD_MASK;
+ pte_t *pte = ident_split_large_pmd(info, pmdp, address);
if (!pte)
return NULL;

- address = __address & PMD_MASK;
- /* No large page - clear PSE flag */
- page_flags = info->page_flag & ~_PAGE_PSE;
-
- /* Populate the PTEs */
- for (i = 0; i < PTRS_PER_PMD; i++) {
- set_pte(&pte[i], __pte(address | page_flags));
- address += PAGE_SIZE;
- }
-
/*
* Ideally we need to clear the large PMD first and do a TLB
* flush before we write the new PMD. But the 2M range of the
@@ -214,7 +245,7 @@ static pte_t *split_large_pmd(struct x86_mapping_info *info,
* also the only user of the page-table, so there is no chance
* of a TLB multihit.
*/
- pmd = __pmd((unsigned long)pte | info->kernpg_flag);
+ pmd_t pmd = __pmd((unsigned long)pte | info->kernpg_flag);
set_pmd(pmdp, pmd);
/* Flush TLB to establish the new PMD */
write_cr3(top_level_pgt);
@@ -377,5 +408,5 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
* Error code is sane - now identity map the 2M region around
* the faulting address.
*/
- kernel_add_identity_map(address, end);
+ kernel_add_identity_map(address, end, MAP_WRITE);
}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..0c7ec290044d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -14,10 +14,10 @@

#include "misc.h"
#include "error.h"
-#include "pgtable.h"
#include "../string.h"
#include "../voffset.h"
#include <asm/bootparam_utils.h>
+#include <asm/shared/pgtable.h>

/*
* WARNING!!
@@ -277,7 +277,8 @@ static inline void handle_relocations(void *output, unsigned long output_len,
{ }
#endif

-static void parse_elf(void *output)
+static void parse_elf(void *output, unsigned long output_len,
+ unsigned long virt_addr)
{
#ifdef CONFIG_X86_64
Elf64_Ehdr ehdr;
@@ -287,6 +288,7 @@ static void parse_elf(void *output)
Elf32_Phdr *phdrs, *phdr;
#endif
void *dest;
+ unsigned long addr;
int i;

memcpy(&ehdr, output, sizeof(ehdr));
@@ -323,10 +325,49 @@ static void parse_elf(void *output)
#endif
memmove(dest, output + phdr->p_offset, phdr->p_filesz);
break;
- default: /* Ignore other PT_* */ break;
+ default:
+ /* Ignore other PT_* */
+ break;
+ }
+ }
+
+ handle_relocations(output, output_len, virt_addr);
+
+ if (!IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+ goto skip_protect;
+
+ for (i = 0; i < ehdr.e_phnum; i++) {
+ phdr = &phdrs[i];
+
+ switch (phdr->p_type) {
+ case PT_LOAD:
+#ifdef CONFIG_RELOCATABLE
+ addr = (unsigned long)output;
+ addr += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+ addr = phdr->p_paddr;
+#endif
+ /*
+ * Simultaneously readable and writable segments are
+ * violating W^X, and should not be present in vmlinux image.
+ * The absence of such segments is checked during build.
+ */
+
+ unsigned int flags = MAP_PROTECT;
+ if (phdr->p_flags & PF_X)
+ flags |= MAP_EXEC;
+ if (phdr->p_flags & PF_W)
+ flags |= MAP_WRITE;
+
+ kernel_add_identity_map(addr, addr + phdr->p_memsz, flags);
+ break;
+ default:
+ /* Ignore other PT_* */
+ break;
}
}

+skip_protect:
free(phdrs);
}

@@ -434,6 +475,19 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
needed_size,
&virt_addr);

+ unsigned long phys_addr = (unsigned long)output;
+
+ /*
+ * If KASLR is disabled input and output regions may overlap.
+ * In this case we need to map region excutable as well.
+ */
+ unsigned long map_flags = MAP_ALLOC | MAP_WRITE |
+ (IS_ENABLED(CONFIG_RANDOMIZE_BASE) ? 0 : MAP_EXEC);
+ phys_addr = kernel_add_identity_map(phys_addr,
+ phys_addr + needed_size,
+ map_flags);
+ output = (unsigned char *)phys_addr;
+
/* Validate memory location choices. */
if ((unsigned long)output & (MIN_KERNEL_ALIGN - 1))
error("Destination physical address inappropriately aligned");
@@ -456,8 +510,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
debug_putstr("\nDecompressing Linux... ");
__decompress(input_data, input_len, NULL, NULL, output, output_len,
NULL, error);
- parse_elf(output);
- handle_relocations(output, output_len, virt_addr);
+ parse_elf(output, output_len, virt_addr);
debug_putstr("done.\nBooting the kernel.\n");

/* Disable exception handling before booting the kernel */
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 62208ec04ca4..033db9b536e6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -44,8 +44,12 @@
#define memptr unsigned
#endif

-/* boot/compressed/vmlinux start and end markers */
-extern char _head[], _end[];
+/* Compressed kernel section start/end markers. */
+extern char _head[], _ehead[];
+extern char _compressed[], _ecompressed[];
+extern char _text[], _etext[];
+extern char _rodata[], _erodata[];
+extern char _data[], _end[];

/* misc.c */
extern memptr free_mem_ptr;
@@ -171,8 +175,18 @@ static inline int count_immovable_mem_regions(void) { return 0; }
#ifdef CONFIG_X86_5LEVEL
extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
#endif
-extern void kernel_add_identity_map(unsigned long start, unsigned long end);
-
+#ifdef CONFIG_X86_64
+extern unsigned long kernel_add_identity_map(unsigned long start,
+ unsigned long end,
+ unsigned int flags);
+#else
+static inline unsigned long kernel_add_identity_map(unsigned long start,
+ unsigned long end,
+ unsigned int flags)
+{
+ return start;
+}
+#endif
/* Used by PAGE_KERN* macros: */
extern pteval_t __default_kernel_pte_mask;

diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
deleted file mode 100644
index cc9b2529a086..000000000000
--- a/arch/x86/boot/compressed/pgtable.h
+++ /dev/null
@@ -1,20 +0,0 @@
-#ifndef BOOT_COMPRESSED_PAGETABLE_H
-#define BOOT_COMPRESSED_PAGETABLE_H
-
-#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)
-
-#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
-
-#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
-
-#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
-
-#ifndef __ASSEMBLER__
-
-extern unsigned long *trampoline_32bit;
-
-extern void trampoline_32bit_src(void *return_ptr);
-
-#endif /* __ASSEMBLER__ */
-#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111b..c7cf5a1059a8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -2,7 +2,7 @@
#include "misc.h"
#include <asm/e820/types.h>
#include <asm/processor.h>
-#include "pgtable.h"
+#include <asm/shared/pgtable.h>
#include "../string.h"
#include "efi.h"

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c93930d5ccbd..99f3ad0b30f3 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -13,6 +13,7 @@
#include "misc.h"

#include <asm/pgtable_types.h>
+#include <asm/shared/pgtable.h>
#include <asm/sev.h>
#include <asm/trapnr.h>
#include <asm/trap_pf.h>
@@ -435,10 +436,11 @@ void sev_prep_identity_maps(unsigned long top_level_pgt)
unsigned long cc_info_pa = boot_params->cc_blob_address;
struct cc_blob_sev_info *cc_info;

- kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info));
+ kernel_add_identity_map(cc_info_pa, cc_info_pa + sizeof(*cc_info), MAP_NOFLUSH);

cc_info = (struct cc_blob_sev_info *)cc_info_pa;
- kernel_add_identity_map(cc_info->cpuid_phys, cc_info->cpuid_phys + cc_info->cpuid_len);
+ kernel_add_identity_map(cc_info->cpuid_phys,
+ cc_info->cpuid_phys + cc_info->cpuid_len, MAP_NOFLUSH);
}

sev_verify_cbit(top_level_pgt);
diff --git a/arch/x86/include/asm/shared/pgtable.h b/arch/x86/include/asm/shared/pgtable.h
new file mode 100644
index 000000000000..6527dadf39d6
--- /dev/null
+++ b/arch/x86/include/asm/shared/pgtable.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_SHARED_PAGETABLE_H
+#define ASM_SHARED_PAGETABLE_H
+
+#define MAP_WRITE 0x02 /* Writable memory */
+#define MAP_EXEC 0x04 /* Executable memory */
+#define MAP_ALLOC 0x10 /* Range needs to be allocated */
+#define MAP_PROTECT 0x20 /* Set exact memory attributes for memory range */
+#define MAP_NOFLUSH 0x40 /* Avoid flushing TLB */
+
+#define TRAMPOLINE_32BIT_SIZE (3 * PAGE_SIZE)
+
+#define TRAMPOLINE_32BIT_PLACEMENT_MAX (0xA0000)
+
+#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
+
+#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
+#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
+
+#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
+
+#ifndef __ASSEMBLER__
+
+extern unsigned long *trampoline_32bit;
+
+extern void trampoline_32bit_src(void *return_ptr);
+
+#endif /* __ASSEMBLER__ */
+#endif /* ASM_SHARED_PAGETABLE_H */
--
2.37.4

2022-12-13 18:16:11

by Peter Jones

[permalink] [raw]
Subject: Re: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

On Tue, Nov 22, 2022 at 02:12:09PM +0300, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution
> of compressed kernel for EFISTUB code path.

Hi Evgeniy,

I've tested this patch set on hardware and QEMU+MU firmware, and it
works for me with a couple of minor issues:

- on one machine that has the DXE protocol but not the EFI one, we get
an error because the firmware doesn't support EFI_MEMORY_RP
- on QEMU I'm seeing the size of "(unsigned long)_head - image_base"
wind up as 0, which leads to an EFI_INVALID_PARAMETER on the
clear_memory_attributes() call.

In both cases the system winds up working, but with unnecessary console
output. I'll send you patches as a follow-up to this mail. In the mean
time:

Tested-by: Peter Jones <[email protected]>

--
Peter

2022-12-13 18:28:45

by Peter Jones

[permalink] [raw]
Subject: Re: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

On Tue, Dec 13, 2022 at 01:03:17PM -0500, Peter Jones wrote:
> On Tue, Nov 22, 2022 at 02:12:09PM +0300, Evgeniy Baskov wrote:
> > This patchset is aimed
> > * to improve UEFI compatibility of compressed kernel code for x86_64
> > * to setup proper memory access attributes for code and rodata sections
> > * to implement W^X protection policy throughout the whole execution
> > of compressed kernel for EFISTUB code path.
>
> Hi Evgeniy,
>
> I've tested this patch set on hardware and QEMU+MU firmware, and it
> works for me with a couple of minor issues:
>
> - on one machine that has the DXE protocol but not the EFI one, we get
> an error because the firmware doesn't support EFI_MEMORY_RP
> - on QEMU I'm seeing the size of "(unsigned long)_head - image_base"
> wind up as 0, which leads to an EFI_INVALID_PARAMETER on the
> clear_memory_attributes() call.
>
> In both cases the system winds up working, but with unnecessary console
> output.

I just realized I've overstated here - I haven't actually hit the first
problem on x86, only on ARM, where we don't currently use this code. I
discovered it in grub, and checked your patch set to see if you had the
same issue I did. That said, "in both cases the system winds up
working" is probably still true - in that the edk2 code supports
EFI_MEMORY_RP on x86 but not ARM, so x86 won't hit the issue when using
DXE unless someone cooks up another implementation. Nevertheless I
believe the patch to fix it is correct and should be applied.

Thanks!

--
Peter

2022-12-13 18:30:24

by Peter Jones

[permalink] [raw]
Subject: [PATCH 1/2] efi/x86: don't set unsupported memory attributes

On platforms where the firmware uses DXE, but which do not implement the
EFI Memory Attribute Protocol, we implement W^X support using DXE's
set_memory_attributes() call. This call will fail without making any
changes if an attribute is set that isn't supported on the platform.

This patch changes efi_adjust_memory_range_protection() to avoid trying
to set any attribute bits that aren't set in the memory region's
capability flags.

Signed-off-by: Peter Jones <[email protected]>
---
drivers/firmware/efi/libstub/mem.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/firmware/efi/libstub/mem.c b/drivers/firmware/efi/libstub/mem.c
index d10996e4eb0..d52841adcc2 100644
--- a/drivers/firmware/efi/libstub/mem.c
+++ b/drivers/firmware/efi/libstub/mem.c
@@ -192,6 +192,7 @@ static efi_status_t adjust_mem_attrib_dxe(efi_physical_addr_t rounded_start,

desc.attributes &= ~(EFI_MEMORY_RO | EFI_MEMORY_XP);
desc.attributes |= attributes;
+ desc.attributes &= desc.capabilities;

unprotect_start = max(rounded_start, desc.base_address);
unprotect_size = min(rounded_end, next) - unprotect_start;
--
2.38.1

2022-12-13 22:24:31

by Evgeniy Baskov

[permalink] [raw]
Subject: Re: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

On 2022-12-13 21:13, Peter Jones wrote:
> On Tue, Dec 13, 2022 at 01:03:17PM -0500, Peter Jones wrote:
>> On Tue, Nov 22, 2022 at 02:12:09PM +0300, Evgeniy Baskov wrote:
>> > This patchset is aimed
>> > * to improve UEFI compatibility of compressed kernel code for x86_64
>> > * to setup proper memory access attributes for code and rodata sections
>> > * to implement W^X protection policy throughout the whole execution
>> > of compressed kernel for EFISTUB code path.
>>
>> Hi Evgeniy,
>>
>> I've tested this patch set on hardware and QEMU+MU firmware, and it
>> works for me with a couple of minor issues:
>>
>> - on one machine that has the DXE protocol but not the EFI one, we get
>> an error because the firmware doesn't support EFI_MEMORY_RP
>> - on QEMU I'm seeing the size of "(unsigned long)_head - image_base"
>> wind up as 0, which leads to an EFI_INVALID_PARAMETER on the
>> clear_memory_attributes() call.
>>
>> In both cases the system winds up working, but with unnecessary
>> console
>> output.
>
> I just realized I've overstated here - I haven't actually hit the first
> problem on x86, only on ARM, where we don't currently use this code. I
> discovered it in grub, and checked your patch set to see if you had the
> same issue I did. That said, "in both cases the system winds up
> working" is probably still true - in that the edk2 code supports
> EFI_MEMORY_RP on x86 but not ARM, so x86 won't hit the issue when using
> DXE unless someone cooks up another implementation. Nevertheless I
> believe the patch to fix it is correct and should be applied.
>
> Thanks!

Hi,

Thank you for testing and fixes!

I have also discovered one issue with v3, that can only be hit when
booting with grub -- there's one kernel_add_identity_map() missing in
the get_acpi_srat_table() function, before header->length is read.
So I'll prepare the v4 soon and include your new patches there.

Thanks,
Evgeniy Baskov

2022-12-14 10:36:57

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

On Tue, 13 Dec 2022 at 23:16, Evgeniy Baskov <[email protected]> wrote:
>
> On 2022-12-13 21:13, Peter Jones wrote:
> > On Tue, Dec 13, 2022 at 01:03:17PM -0500, Peter Jones wrote:
> >> On Tue, Nov 22, 2022 at 02:12:09PM +0300, Evgeniy Baskov wrote:
> >> > This patchset is aimed
> >> > * to improve UEFI compatibility of compressed kernel code for x86_64
> >> > * to setup proper memory access attributes for code and rodata sections
> >> > * to implement W^X protection policy throughout the whole execution
> >> > of compressed kernel for EFISTUB code path.
> >>
> >> Hi Evgeniy,
> >>
> >> I've tested this patch set on hardware and QEMU+MU firmware, and it
> >> works for me with a couple of minor issues:
> >>
> >> - on one machine that has the DXE protocol but not the EFI one, we get
> >> an error because the firmware doesn't support EFI_MEMORY_RP
> >> - on QEMU I'm seeing the size of "(unsigned long)_head - image_base"
> >> wind up as 0, which leads to an EFI_INVALID_PARAMETER on the
> >> clear_memory_attributes() call.
> >>
> >> In both cases the system winds up working, but with unnecessary
> >> console
> >> output.
> >
> > I just realized I've overstated here - I haven't actually hit the first
> > problem on x86, only on ARM, where we don't currently use this code. I
> > discovered it in grub, and checked your patch set to see if you had the
> > same issue I did. That said, "in both cases the system winds up
> > working" is probably still true - in that the edk2 code supports
> > EFI_MEMORY_RP on x86 but not ARM, so x86 won't hit the issue when using
> > DXE unless someone cooks up another implementation. Nevertheless I
> > believe the patch to fix it is correct and should be applied.
> >
> > Thanks!
>
> Hi,
>
> Thank you for testing and fixes!
>
> I have also discovered one issue with v3, that can only be hit when
> booting with grub -- there's one kernel_add_identity_map() missing in
> the get_acpi_srat_table() function, before header->length is read.
> So I'll prepare the v4 soon and include your new patches there.
>

Happy to see this is converging. Please rebase onto latest mainline as
well - some cleanups to the early boot code have landed there
yesterday.

2022-12-14 12:26:38

by Evgeniy Baskov

[permalink] [raw]
Subject: Re: [PATCH v3 00/24] x86_64: Improvements at compressed kernel stage

On 2022-12-14 13:09, Ard Biesheuvel wrote:
> On Tue, 13 Dec 2022 at 23:16, Evgeniy Baskov <[email protected]> wrote:
>>
>> On 2022-12-13 21:13, Peter Jones wrote:
>> > On Tue, Dec 13, 2022 at 01:03:17PM -0500, Peter Jones wrote:
>> >> On Tue, Nov 22, 2022 at 02:12:09PM +0300, Evgeniy Baskov wrote:
>> >> > This patchset is aimed
>> >> > * to improve UEFI compatibility of compressed kernel code for x86_64
>> >> > * to setup proper memory access attributes for code and rodata sections
>> >> > * to implement W^X protection policy throughout the whole execution
>> >> > of compressed kernel for EFISTUB code path.
>> >>
>> >> Hi Evgeniy,
>> >>
>> >> I've tested this patch set on hardware and QEMU+MU firmware, and it
>> >> works for me with a couple of minor issues:
>> >>
>> >> - on one machine that has the DXE protocol but not the EFI one, we get
>> >> an error because the firmware doesn't support EFI_MEMORY_RP
>> >> - on QEMU I'm seeing the size of "(unsigned long)_head - image_base"
>> >> wind up as 0, which leads to an EFI_INVALID_PARAMETER on the
>> >> clear_memory_attributes() call.
>> >>
>> >> In both cases the system winds up working, but with unnecessary
>> >> console
>> >> output.
>> >
>> > I just realized I've overstated here - I haven't actually hit the first
>> > problem on x86, only on ARM, where we don't currently use this code. I
>> > discovered it in grub, and checked your patch set to see if you had the
>> > same issue I did. That said, "in both cases the system winds up
>> > working" is probably still true - in that the edk2 code supports
>> > EFI_MEMORY_RP on x86 but not ARM, so x86 won't hit the issue when using
>> > DXE unless someone cooks up another implementation. Nevertheless I
>> > believe the patch to fix it is correct and should be applied.
>> >
>> > Thanks!
>>
>> Hi,
>>
>> Thank you for testing and fixes!
>>
>> I have also discovered one issue with v3, that can only be hit when
>> booting with grub -- there's one kernel_add_identity_map() missing in
>> the get_acpi_srat_table() function, before header->length is read.
>> So I'll prepare the v4 soon and include your new patches there.
>>
>
> Happy to see this is converging. Please rebase onto latest mainline as
> well - some cleanups to the early boot code have landed there
> yesterday.

Will do, thanks!