Update the x86 boot path to avoid the bare metal decompressor when
booting via the EFI stub. The bare metal decompressor inherits the
loader's 1:1 mapping of DRAM when it is entered in 64-bit mode, and assumes
that all of it is mapped read/write/execute, which will no longer be the
case on systems built to comply with recently tightened logo
requirements (*).
Changes since v3 [6]:
- trivial rebase onto Kirill's unaccepted memory series v13
- test SNP feature mask while running in the EFI boot services, and fail
gracefully on a mismatch
- perform only the SEV init after ExitBootServices()
Changes since v2 [4]:
- update prose style to comply with -tip guidelines
- rebased onto Kirill's unaccepted memory series [3]
- add Kirill's ack to 4/5-level paging changes
- perform SEV init and SNP feature check after ExitBootServices(), to
avoid corrupting the firmware's own SEV state
- split out preparatory refactor of handover entry code and BSS clearing
(patches #1 to #4)
Changes since v1 [2]:
- streamline existing 4/5 level switching code and call it directly from
the EFI stub - this is covered by the first 9 patches, which can be
applied in isolation, if desired;
- deal with SEV/SNP init explicitly;
- clear BSS when booting via the 'handover protocol';
- switch to kernel CS before calling SEV init code in kernel proper.
---- v1 cover letter follows ----
This series is conceptually a combination of Evgeny's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.
My series [1] implemented zboot for x86 by getting rid of the entire
x86 decompressor and replacing it with existing EFI code that does the
same but in a generic way. The downside of this is that only EFI boot is
supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.
Evgeny's series [0] adapted the entire decompressor code flow to allow
it to execute in the EFI context as well as the bare metal context, and
this involves changes to the 1:1 mapping code, the page fault
handlers, etc., none of which are really needed when doing EFI boot in the
first place.
So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline
allocation code, page table code or page fault handling code. This
allows us to get rid of quite a bit of unsavory EFI stub code, and
replace it with two clear invocations of the EFI firmware APIs to clear
NX restrictions from allocations that have been populated with
executable code.
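For reference, the two invocations in question boil down to the
following (a condensed sketch; the names are taken from the patches
below):

  /* allow execution of the 4/5 level paging trampoline page */
  efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);

  /* allow execution of the freshly decompressed kernel image */
  efi_adjust_memory_range_protection(addr, kernel_total_size);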
The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
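Concretely, the only call into the reused decompressor code is the one
below (a sketch; decompress_kernel() covers decompression, ELF segment
copying and relocation processing internally, as refactored earlier in
this series):

  /* decompress, copy out the ELF segments and apply relocations */
  entry = decompress_kernel((void *)addr, virt_addr, error);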
Note that some of Evgeny's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.
(*) IMHO the following developments are likely to occur:
- the Windows boot chain (along with 3rd party drivers) is cleaned up so
that it never relies on memory being writable and executable at the
same time when running under the EFI boot services;
- the EFI reference implementation gets updated to map all memory NX by
default, and to require read-only permissions for executable mappings;
- BIOS vendors incorporate these changes into their codebases, and
deploy them more widely than just the 'secure' SKUs;
- OEMs only care about the Windows sticker [5], so they only boot-test
Windows, which works fine in this more restricted context;
- Linux boot no longer works reliably on new hardware built for Windows
unless we clean up our boot chain as well.
Cc: Evgeniy Baskov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Alexey Khoroshilov <[email protected]>
Cc: Peter Jones <[email protected]>
Cc: Gerd Hoffmann <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Mario Limonciello <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Joerg Roedel <[email protected]>
[0] https://lore.kernel.org/all/[email protected]/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://techcommunity.microsoft.com/t5/hardware-dev-center/new-uefi-ca-memory-mitigation-requirements-for-signing/ba-p/3608714
[6] https://lore.kernel.org/all/[email protected]/
Ard Biesheuvel (21):
x86/efistub: Branch straight to kernel entry point from C code
x86/efistub: Simplify and clean up handover entry code
x86/decompressor: Avoid magic offsets for EFI handover entrypoint
x86/efistub: Clear BSS in EFI handover protocol entrypoint
x86/decompressor: Use proper sequence to take the address of the GOT
x86/decompressor: Store boot_params pointer in callee save register
x86/decompressor: Call trampoline as a normal function
x86/decompressor: Use standard calling convention for trampoline
x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
x86/decompressor: Call trampoline directly from C code
x86/decompressor: Only call the trampoline when changing paging levels
x86/decompressor: Merge trampoline cleanup with switching code
x86/efistub: Perform 4/5 level paging switch from the stub
x86/efistub: Prefer EFI memory attributes protocol over DXE services
decompress: Use 8 byte alignment
x86/decompressor: Move global symbol references to C code
x86/decompressor: Factor out kernel decompression and relocation
x86/head_64: Store boot_params pointer in callee-preserved register
efi/libstub: Add limit argument to efi_random_alloc()
x86/efistub: Perform SNP feature test while running in the firmware
x86/efistub: Avoid legacy decompressor when doing EFI boot
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 107 +++-----
arch/x86/boot/compressed/head_32.S | 34 +--
arch/x86/boot/compressed/head_64.S | 219 ++++-----------
arch/x86/boot/compressed/misc.c | 44 +++-
arch/x86/boot/compressed/pgtable.h | 6 +-
arch/x86/boot/compressed/pgtable_64.c | 72 +++--
arch/x86/boot/compressed/sev.c | 74 ++++--
arch/x86/include/asm/boot.h | 8 +
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/head_64.S | 15 +-
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 3 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 94 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 278 +++++++++++---------
drivers/firmware/efi/libstub/x86-stub.h | 17 ++
drivers/firmware/efi/libstub/zboot.c | 2 +-
include/linux/decompress/mm.h | 2 +-
23 files changed, 511 insertions(+), 499 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
--
2.39.2
Currently, the EFI stub relies on DXE services in some cases to clear
non-execute restrictions from page allocations that need to be
executable. This is dodgy, because DXE services are not specified by
UEFI but by the PI (Platform Initialization) spec, and they are not
intended for consumption by OS loaders.
However, no alternative existed at the time.
Now, there is a new UEFI protocol that should be used instead, so if it
exists, prefer it over the DXE services calls.
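The resulting fallback logic in efi_adjust_memory_range_protection()
amounts to this (condensed from the hunk below):

  if (memattr != NULL) {
          /* the UEFI protocol is available - use it and be done */
          efi_call_proto(memattr, clear_memory_attributes, rounded_start,
                         rounded_end - rounded_start, EFI_MEMORY_XP);
          return;
  }

  if (efi_dxe_table == NULL)
          return;

  /* otherwise, fall back to the existing DXE services based code */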
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 29 ++++++++++++++------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index c55c028cf911bce0..2d3282d2ed6eb756 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -26,6 +26,7 @@ const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
+static efi_memory_attribute_protocol_t *memattr;
static efi_status_t
preserve_pci_rom_image(efi_pci_io_protocol_t *pci, struct pci_setup_rom **__rom)
@@ -222,12 +223,18 @@ void efi_adjust_memory_range_protection(unsigned long start,
unsigned long rounded_start, rounded_end;
unsigned long unprotect_start, unprotect_size;
- if (efi_dxe_table == NULL)
- return;
-
rounded_start = rounddown(start, EFI_PAGE_SIZE);
rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+ if (memattr != NULL) {
+ efi_call_proto(memattr, clear_memory_attributes, rounded_start,
+ rounded_end - rounded_start, EFI_MEMORY_XP);
+ return;
+ }
+
+ if (efi_dxe_table == NULL)
+ return;
+
/*
* Don't modify memory region attributes, they are
* already suitable, to lower the possibility to
@@ -766,6 +773,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
struct boot_params *boot_params)
{
+ efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
@@ -777,13 +785,18 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
- efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
- if (efi_dxe_table &&
- efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
- efi_warn("Ignoring DXE services table: invalid signature\n");
- efi_dxe_table = NULL;
+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+ if (efi_dxe_table &&
+ efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+ efi_warn("Ignoring DXE services table: invalid signature\n");
+ efi_dxe_table = NULL;
+ }
}
+ /* grab the memory attributes protocol if it exists */
+ efi_bs_call(locate_protocol, &guid, NULL, (void **)&memattr);
+
status = efi_setup_5level_paging();
if (status != EFI_SUCCESS) {
efi_err("efi_setup_5level_paging() failed!\n");
--
2.39.2
The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and does a few things
that are problematic in the context of EFI boot now that the logo
requirements are getting tighter.
In particular, the decompressor moves its own executable image around in
memory, and relies on demand paging to populate the identity mappings.
Both of these are difficult to support in a context where memory is
not permitted to be mapped writable and executable at the same time, or
is at the very least mapped non-executable by default, requiring
special treatment before this restriction can be lifted.
Since EFI already maps all of memory 1:1, it is unnecessary to create
new page tables or handle page faults when decompressing the kernel.
That means there is also no need to replace the special exception
handlers for SEV. Generally, there is little need to do anything that
the decompressor does beyond
- initialize SEV encryption, if needed,
- perform the 4/5 level paging switch, if needed,
- decompress the kernel,
- relocate the kernel.
So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.
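Condensed, the resulting EFI stub boot flow looks like this (a sketch
with error handling omitted; see the x86-stub.c hunks below):

  status = efi_decompress_kernel(&kernel_entry); /* decompress + relocate */
  status = exit_boot(boot_params, handle);       /* ExitBootServices() */
  sev_enable(boot_params);   /* SEV init, firmware GDT/IDT still live */
  efi_5level_switch();       /* 4/5 level paging switch, if needed */
  enter_kernel(kernel_entry, boot_params);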
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 55 -------
arch/x86/boot/compressed/head_32.S | 13 --
arch/x86/boot/compressed/head_64.S | 27 ----
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 2 +
drivers/firmware/efi/libstub/x86-stub.c | 167 +++++++++-----------
7 files changed, 84 insertions(+), 192 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index b13a580210867ffb..535608fe72e11265 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -74,6 +74,11 @@ LDFLAGS_vmlinux += -z noexecstack
ifeq ($(CONFIG_LD_IS_BFD),y)
LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
endif
+ifeq ($(CONFIG_EFI_STUB),y)
+# ensure that the static EFI stub library will be pulled in, even if it is
+# never referenced explicitly from the startup code
+LDFLAGS_vmlinux += -u efi_pe_entry
+endif
LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 8a02a151806df14c..f4e22ef774ab6b4a 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -269,10 +269,6 @@ SYM_FUNC_START_LOCAL(efi32_entry)
jmp startup_32
SYM_FUNC_END(efi32_entry)
-#define ST32_boottime 60 // offsetof(efi_system_table_32_t, boottime)
-#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
-#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)
-
/*
* efi_status_t efi32_pe_entry(efi_handle_t image_handle,
* efi_system_table_32_t *sys_table)
@@ -280,8 +276,6 @@ SYM_FUNC_END(efi32_entry)
SYM_FUNC_START(efi32_pe_entry)
pushl %ebp
movl %esp, %ebp
- pushl %eax // dummy push to allocate loaded_image
-
pushl %ebx // save callee-save registers
pushl %edi
@@ -290,48 +284,8 @@ SYM_FUNC_START(efi32_pe_entry)
movl $0x80000003, %eax // EFI_UNSUPPORTED
jnz 2f
- call 1f
-1: pop %ebx
-
- /* Get the loaded image protocol pointer from the image handle */
- leal -4(%ebp), %eax
- pushl %eax // &loaded_image
- leal (loaded_image_proto - 1b)(%ebx), %eax
- pushl %eax // pass the GUID address
- pushl 8(%ebp) // pass the image handle
-
- /*
- * Note the alignment of the stack frame.
- * sys_table
- * handle <-- 16-byte aligned on entry by ABI
- * return address
- * frame pointer
- * loaded_image <-- local variable
- * saved %ebx <-- 16-byte aligned here
- * saved %edi
- * &loaded_image
- * &loaded_image_proto
- * handle <-- 16-byte aligned for call to handle_protocol
- */
-
- movl 12(%ebp), %eax // sys_table
- movl ST32_boottime(%eax), %eax // sys_table->boottime
- call *BS32_handle_protocol(%eax) // sys_table->boottime->handle_protocol
- addl $12, %esp // restore argument space
- testl %eax, %eax
- jnz 2f
-
movl 8(%ebp), %ecx // image_handle
movl 12(%ebp), %edx // sys_table
- movl -4(%ebp), %esi // loaded_image
- movl LI32_image_base(%esi), %esi // loaded_image->image_base
- leal (startup_32 - 1b)(%ebx), %ebp // runtime address of startup_32
- /*
- * We need to set the image_offset variable here since startup_32() will
- * use it before we get to the 64-bit efi_pe_entry() in C code.
- */
- subl %esi, %ebp // calculate image_offset
- movl %ebp, (image_offset - 1b)(%ebx) // save image_offset
xorl %esi, %esi
jmp efi32_entry // pass %ecx, %edx, %esi
// no other registers remain live
@@ -350,15 +304,6 @@ SYM_FUNC_START_NOALIGN(efi64_stub_entry)
SYM_FUNC_END(efi64_stub_entry)
#endif
- .section ".rodata"
- /* EFI loaded image protocol GUID */
- .balign 4
-SYM_DATA_START_LOCAL(loaded_image_proto)
- .long 0x5b1b31a1
- .word 0x9562, 0x11d2
- .byte 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b
-SYM_DATA_END(loaded_image_proto)
-
.data
.balign 8
SYM_DATA_START_LOCAL(efi32_boot_gdt)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index beee858058df4403..cd9587fcd5084f22 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -84,19 +84,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
leal startup_32@GOTOFF(%edx), %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32() will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry() will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl image_offset@GOTOFF(%edx), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 3074d278c7e665d8..aee518e71cdb7e75 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -146,19 +146,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl rva(image_offset)(%ebp), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
@@ -335,20 +322,6 @@ SYM_CODE_START(startup_64)
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- movl image_offset(%rip), %eax
- subq %rax, %rbp
-#endif
-
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8b4be7cecdb8eb73..b0994ae3bc23f84d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,6 +90,8 @@ static inline void efi_fpu_end(void)
}
#ifdef CONFIG_X86_32
+#define EFI_X86_KERNEL_ALLOC_LIMIT (SZ_512M - 1)
+
#define arch_efi_call_virt_setup() \
({ \
efi_fpu_begin(); \
@@ -103,8 +105,7 @@ static inline void efi_fpu_end(void)
})
#else /* !CONFIG_X86_32 */
-
-#define EFI_LOADER_SIGNATURE "EL64"
+#define EFI_X86_KERNEL_ALLOC_LIMIT EFI_ALLOC_LIMIT
extern asmlinkage u64 __efi_call(void *fp, ...);
@@ -218,6 +219,8 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,
#ifdef CONFIG_EFI_MIXED
+#define EFI_ALLOC_LIMIT (efi_is_64bit() ? ULONG_MAX : U32_MAX)
+
#define ARCH_HAS_EFISTUB_WRAPPERS
static inline bool efi_is_64bit(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index e5aad673194698b8..bb1ed2a8b8fb122d 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -157,6 +157,7 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern void sev_enable(struct boot_params *bp);
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs)
{
@@ -210,6 +211,7 @@ static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline void sev_enable(struct boot_params *bp) { }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
static inline void setup_ghcb(void) { }
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f9d203b5ee6236e8..ed91a2a4984d4bfb 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,17 +15,14 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/kaslr.h>
#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
-/* Maximum physical address for 64-bit kernel with 4-level paging */
-#define MAXMEM_X86_64_4LEVEL (1ull << 46)
-
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
-u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
static efi_memory_attribute_protocol_t *memattr;
@@ -276,33 +273,9 @@ void efi_adjust_memory_range_protection(unsigned long start,
}
}
-void startup_32(struct boot_params *boot_params);
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
-#ifdef CONFIG_64BIT
- if (image_base != (unsigned long)startup_32)
- efi_adjust_memory_range_protection(image_base, image_size);
-#else
- /*
- * Clear protection flags on a whole range of possible
- * addresses used for KASLR. We don't need to do that
- * on x86_64, since KASLR/extraction is performed after
- * dedicated identity page tables are built and we only
- * need to remove possible protection on relocated image
- * itself disregarding further relocations.
- */
- efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
-#endif
-}
-
static const efi_char16_t apple[] = L"Apple";
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -311,9 +284,6 @@ static void setup_quirks(struct boot_params *boot_params,
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
}
/*
@@ -466,7 +436,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
}
image_base = efi_table_attr(image, image_base);
- image_offset = (void *)startup_32 - image_base;
status = efi_allocate_pages(sizeof(struct boot_params),
(unsigned long *)&boot_params, ULONG_MAX);
@@ -770,6 +739,61 @@ static bool have_unsupported_snp_features(void)
return false;
}
+static void efi_get_seed(void *seed, int size)
+{
+ efi_get_random_bytes(size, seed);
+
+ /*
+ * This only updates seed[0] when running on 32-bit, but in that case,
+ * we don't use seed[1] anyway, as there is no virtual KASLR on 32-bit.
+ */
+ *(unsigned long *)seed ^= kaslr_get_random_long("EFI");
+}
+
+static void error(char *str)
+{
+ efi_warn("Decompression failed: %s\n", str);
+}
+
+static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
+{
+ unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ unsigned long addr, alloc_size, entry;
+ efi_status_t status;
+ u32 seed[2] = {};
+
+ /* determine the required size of the allocation */
+ alloc_size = ALIGN(max((unsigned long)output_len, kernel_total_size),
+ MIN_KERNEL_ALIGN);
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
+ u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size;
+
+ efi_get_seed(seed, sizeof(seed));
+
+ virt_addr += (range * seed[1]) >> 32;
+ virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+ }
+
+ status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
+ seed[0], EFI_LOADER_CODE,
+ EFI_X86_KERNEL_ALLOC_LIMIT);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ entry = decompress_kernel((void *)addr, virt_addr, error);
+ if (entry == ULONG_MAX) {
+ efi_free(alloc_size, addr);
+ return EFI_LOAD_ERROR;
+ }
+
+ *kernel_entry = addr + entry;
+
+ efi_adjust_memory_range_protection(addr, kernel_total_size);
+
+ return EFI_SUCCESS;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -788,10 +812,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
struct boot_params *boot_params)
{
efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
- unsigned long bzimage_addr = (unsigned long)startup_32;
- unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
const struct linux_efi_initrd *initrd = NULL;
+ unsigned long kernel_entry;
efi_status_t status;
efi_system_table = sys_table_arg;
@@ -820,60 +843,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
- /*
- * If the kernel isn't already loaded at a suitable address,
- * relocate it.
- *
- * It must be loaded above LOAD_PHYSICAL_ADDR.
- *
- * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
- * is defined as the macro MAXMEM, but unfortunately that is not a
- * compile-time constant if 5-level paging is configured, so we instead
- * define our own macro for use here.
- *
- * For 32-bit, the maximum address is complicated to figure out, for
- * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
- * KASLR uses.
- *
- * Also relocate it if image_offset is zero, i.e. the kernel wasn't
- * loaded by LoadImage, but rather by a bootloader that called the
- * handover entry. The reason we must always relocate in this case is
- * to handle the case of systemd-boot booting a unified kernel image,
- * which is a PE executable that contains the bzImage and an initrd as
- * COFF sections. The initrd section is placed after the bzImage
- * without ensuring that there are at least init_size bytes available
- * for the bzImage, and thus the compressed kernel's startup code may
- * overwrite the initrd unless it is moved out of the way.
- */
-
- buffer_start = ALIGN(bzimage_addr - image_offset,
- hdr->kernel_alignment);
- buffer_end = buffer_start + hdr->init_size;
-
- if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
- (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
- (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0)) {
- extern char _bss[];
-
- status = efi_relocate_kernel(&bzimage_addr,
- (unsigned long)_bss - bzimage_addr,
- hdr->init_size,
- hdr->pref_address,
- hdr->kernel_alignment,
- LOAD_PHYSICAL_ADDR);
- if (status != EFI_SUCCESS) {
- efi_err("efi_relocate_kernel() failed!\n");
- goto fail;
- }
- /*
- * Now that we've copied the kernel elsewhere, we no longer
- * have a set up block before startup_32(), so reset image_offset
- * to zero in case it was set earlier.
- */
- image_offset = 0;
- }
-
#ifdef CONFIG_CMDLINE_BOOL
status = efi_parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
@@ -891,6 +860,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
+ status = efi_decompress_kernel(&kernel_entry);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to decompress kernel\n");
+ goto fail;
+ }
+
/*
* At this point, an initrd may already have been loaded by the
* bootloader and passed via bootparams. We permit an initrd loaded
@@ -930,7 +905,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
setup_efi_pci(boot_params);
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+ setup_quirks(boot_params);
status = exit_boot(boot_params, handle);
if (status != EFI_SUCCESS) {
@@ -938,13 +913,15 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
+ /*
+ * Call the SEV init code while still running with the firmware's
+ * GDT/IDT, so #VC exceptions will be handled by EFI.
+ */
+ sev_enable(boot_params);
+
efi_5level_switch();
- if (IS_ENABLED(CONFIG_X86_64))
- /* add offset of startup_64() */
- bzimage_addr += 0x200;
-
- enter_kernel(bzimage_addr, boot_params);
+ enter_kernel(kernel_entry, boot_params);
fail:
efi_err("efi_stub_entry() failed!\n");
--
2.39.2
The offset of the native 32-bit or 64-bit EFI handover protocol
entrypoint, relative to the respective startup_32/64 address, is
recorded in boot_params as handover_offset, so that a Linux/x86 aware
EFI loader can find it there.
When mixed mode is enabled, this single field has to describe this
offset for both the 32-bit and 64-bit entrypoints, so their respective
relative offsets have to be identical. Given that startup_32 and
startup_64 are 0x200 bytes apart, and the EFI handover entrypoint
resides at a fixed offset, the 32-bit and 64-bit versions of those
entrypoints must be exactly 0x200 bytes apart as well.
Currently, hard-coded fixed offsets are used to ensure this, but it is
sufficient to emit the 64-bit entrypoint 0x200 bytes after the 32-bit
one, wherever it happens to reside. This allows this code (which is now
EFI mixed mode specific) to be moved into efi_mixed.S and out of the
startup code in head_64.S.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/efi_mixed.S | 20 +++++++++++++++++++-
arch/x86/boot/compressed/head_64.S | 18 ------------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index dcc562c8f7f35162..9308b595f6f0a5de 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -140,6 +140,16 @@ SYM_FUNC_START(__efi64_thunk)
SYM_FUNC_END(__efi64_thunk)
.code32
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+SYM_FUNC_START(efi32_stub_entry)
+ add $0x4, %esp /* Discard return address */
+ popl %ecx
+ popl %edx
+ popl %esi
+ jmp efi32_entry
+SYM_FUNC_END(efi32_stub_entry)
+#endif
+
/*
* EFI service pointer must be in %edi.
*
@@ -220,7 +230,7 @@ SYM_FUNC_END(efi_enter32)
* stub may still exit and return to the firmware using the Exit() EFI boot
* service.]
*/
-SYM_FUNC_START(efi32_entry)
+SYM_FUNC_START_LOCAL(efi32_entry)
call 1f
1: pop %ebx
@@ -320,6 +330,14 @@ SYM_FUNC_START(efi32_pe_entry)
RET
SYM_FUNC_END(efi32_pe_entry)
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+ .org efi32_stub_entry + 0x200
+ .code64
+SYM_FUNC_START_NOALIGN(efi64_stub_entry)
+ jmp efi_stub_entry
+SYM_FUNC_END(efi64_stub_entry)
+#endif
+
.section ".rodata"
/* EFI loaded image protocol GUID */
.balign 4
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 71c1f40a7ac067b9..9f90661744741210 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -294,17 +294,6 @@ SYM_FUNC_START(startup_32)
lret
SYM_FUNC_END(startup_32)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x190
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp /* Discard return address */
- popl %ecx
- popl %edx
- popl %esi
- jmp efi32_entry
-SYM_FUNC_END(efi32_stub_entry)
-#endif
-
.code64
.org 0x200
SYM_CODE_START(startup_64)
@@ -523,13 +512,6 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x390
-SYM_FUNC_START(efi64_stub_entry)
- jmp efi_stub_entry
-SYM_FUNC_END(efi64_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
--
2.39.2
In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.
This reuses the newly refactored trampoline that the bare metal
decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
memory and remap it with the appropriate permissions. Given that the
bare metal decompressor will no longer call into the trampoline if the
number of paging levels is already set correctly, there is no longer
any need to remove NX restrictions from the memory range where this
trampoline may end up.
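The resulting pairing of call sites in efi_stub_entry() is sketched
below; the preparation runs early, while the actual switch is deferred
until the very end of the stub's execution:

  /* early: allocate 32-bit addressable memory, prep the trampoline */
  status = efi_setup_5level_paging();
  ...
  /* late: after exit_boot(), toggle LA57 if needed */
  efi_5level_switch();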
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 1 +
drivers/firmware/efi/libstub/x86-5lvl.c | 94 ++++++++++++++++++++
drivers/firmware/efi/libstub/x86-stub.c | 40 +++------
drivers/firmware/efi/libstub/x86-stub.h | 17 ++++
6 files changed, 129 insertions(+), 26 deletions(-)
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 16d64a34d1e19465..ae8874401a9f1490 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o smbios.o
lib-$(CONFIG_X86) += x86-stub.o
+lib-$(CONFIG_X86_64) += x86-5lvl.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 1e0203d74691ffcc..51779279fbff21b5 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
} else if (!strcmp(param, "noinitrd")) {
efi_noinitrd = true;
+ } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+ efi_no5lvl = true;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 8659a01664b85d95..191698e8489d82e7 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -33,6 +33,7 @@
#define EFI_ALLOC_LIMIT ULONG_MAX
#endif
+extern bool efi_no5lvl;
extern bool efi_nochunk;
extern bool efi_nokaslr;
extern int efi_loglevel;
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
new file mode 100644
index 0000000000000000..f7284f6270abcc18
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/efi.h>
+
+#include <asm/boot.h>
+#include <asm/desc.h>
+#include <asm/efi.h>
+
+#include "efistub.h"
+#include "x86-stub.h"
+
+bool efi_no5lvl;
+
+static void (*la57_toggle)(void *trampoline, bool enable_5lvl);
+
+static const struct desc_struct gdt[] = {
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+};
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as we cannot
+ * program a 64-bit value into CR3 when running in 32-bit mode.
+ */
+efi_status_t efi_setup_5level_paging(void)
+{
+ u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
+ efi_status_t status;
+ u8 *la57_code;
+
+ if (!efi_is_64bit())
+ return EFI_SUCCESS;
+
+ /* check for 5 level paging support */
+ if (native_cpuid_eax(0) < 7 ||
+ !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return EFI_SUCCESS;
+
+ /* allocate some 32-bit addressable memory for code and a page table */
+ status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+ U32_MAX);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
+ memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
+
+ /*
+ * To avoid having to allocate a 32-bit addressable stack, we use a
+ * ljmp to switch back to long mode. However, this takes an absolute
+ * address, so we have to poke it in at runtime.
+ */
+ *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
+
+ efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
+
+ return EFI_SUCCESS;
+}
+
+void efi_5level_switch(void)
+{
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ bool need_toggle = want_la57 ^ have_la57;
+ u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+ u64 *cr3 = (u64 *)__native_read_cr3();
+ u64 *new_cr3;
+
+ if (!la57_toggle || !need_toggle)
+ return;
+
+ if (!have_la57) {
+ /*
+ * We are going to enable 5 level paging, so we need to
+ * allocate a root level page from the 32-bit addressable
+ * physical region, and plug the existing hierarchy into it.
+ */
+ new_cr3 = memset(pgt, 0, PAGE_SIZE);
+ new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+ } else {
+ // take the new root table pointer from the current entry #0
+ new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+ // copy the new root level table if it is not 32-bit addressable
+ if ((u64)new_cr3 > U32_MAX)
+ new_cr3 = memcpy(pgt, new_cr3, PAGE_SIZE);
+ }
+
+ native_load_gdt(&(struct desc_ptr){ sizeof(gdt) - 1, (u64)gdt });
+
+ la57_toggle(new_cr3, want_la57);
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index d010448dffb12cb8..c55c028cf911bce0 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -17,6 +17,7 @@
#include <asm/boot.h>
#include "efistub.h"
+#include "x86-stub.h"
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
@@ -212,8 +213,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
-static void
-adjust_memory_range_protection(unsigned long start, unsigned long size)
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
@@ -267,35 +268,14 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
}
}
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
void startup_32(struct boot_params *boot_params);
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
{
- /*
- * Allow execution of possible trampoline used
- * for switching between 4- and 5-level page tables
- * and relocated kernel image.
- */
-
- adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
- TRAMPOLINE_PLACEMENT_SIZE);
-
#ifdef CONFIG_64BIT
if (image_base != (unsigned long)startup_32)
- adjust_memory_range_protection(image_base, image_size);
+ efi_adjust_memory_range_protection(image_base, image_size);
#else
/*
* Clear protection flags on a whole range of possible
@@ -305,8 +285,8 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
* need to remove possible protection on relocated image
* itself disregarding further relocations.
*/
- adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
+ efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
+ KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
#endif
}
@@ -804,6 +784,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_dxe_table = NULL;
}
+ status = efi_setup_5level_paging();
+ if (status != EFI_SUCCESS) {
+ efi_err("efi_setup_5level_paging() failed!\n");
+ goto fail;
+ }
+
/*
* If the kernel isn't already loaded at a suitable address,
* relocate it.
@@ -922,6 +908,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
+ efi_5level_switch();
+
if (IS_ENABLED(CONFIG_X86_64))
/* add offset of startup_64() */
bzimage_addr += 0x200;
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
new file mode 100644
index 0000000000000000..37c5a36b9d8cf9b2
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/efi.h>
+
+extern void trampoline_32bit_src(void *, bool);
+extern const u16 trampoline_ljmp_imm_offset;
+
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size);
+
+#ifdef CONFIG_X86_64
+efi_status_t efi_setup_5level_paging(void);
+void efi_5level_switch(void);
+#else
+static inline efi_status_t efi_setup_5level_paging(void) { return EFI_SUCCESS; }
+static inline void efi_5level_switch(void) {}
+#endif
--
2.39.2
Since the current and desired number of paging levels are known when the
trampoline is being prepared, avoid calling the trampoline at all if it
is clear that calling it is not going to result in a change to the
number of paging levels. Given that the CPU is already running in long
mode, the PAE and LA57 settings are necessarily consistent with the
currently active page tables - the only difference is that CR4.MCE will
always be preserved in this case, but it will be cleared by the real
kernel startup code if CONFIG_X86_MCE is not enabled.
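The check that makes this possible is a single comparison (as added to
set_paging_levels() in the hunk below):

  /* CR4.LA57 reflects the number of active paging levels */
  if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
          return; /* already in the desired mode - skip the trampoline */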
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 21 +-------------------
arch/x86/boot/compressed/pgtable_64.c | 18 +++++++----------
2 files changed, 8 insertions(+), 31 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a60ec9283bd760e3..403c96dae34d9c6d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -387,10 +387,6 @@ SYM_CODE_START(startup_64)
* For the trampoline, we need the top page table to reside in lower
* memory as we don't have a way to load 64-bit values into CR3 in
* 32-bit mode.
- *
- * We go though the trampoline even if we don't have to: if we're
- * already in a desired paging mode. This way the trampoline code gets
- * tested on every boot.
*/
/* Make sure we have GDT with 32-bit code segment */
@@ -542,25 +538,10 @@ SYM_CODE_START(trampoline_32bit_src)
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Check what paging mode we want to be in after the trampoline */
- testl %esi, %esi
- jz 1f
-
- /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jnz 3f
- jmp 2f
-1:
- /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jz 3f
-2:
/* Point CR3 to the trampoline's new top level page table */
leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
movl %eax, %cr3
-3:
+
+ /* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
movl $MSR_EFER, %ecx
rdmsr
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index b62b6819dcdd01be..b92cf1d6e156d5f6 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -128,6 +128,13 @@ asmlinkage void set_paging_levels(void *rmode)
l5_required = true;
}
+ /*
+ * We are not going to use the trampoline if we
+ * are already in the desired paging mode.
+ */
+ if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+ return;
+
trampoline_32bit = (unsigned long *)find_trampoline_placement();
/* Preserve trampoline memory */
@@ -155,18 +162,8 @@ asmlinkage void set_paging_levels(void *rmode)
*
* The new page table will be used by trampoline code for switching
* from 4- to 5-level paging or vice versa.
- *
- * If switching is not required, the page table is unused: trampoline
- * code wouldn't touch CR3.
*/
- /*
- * We are not going to use the page table in trampoline memory if we
- * are already in the desired paging mode.
- */
- if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
- goto out;
-
if (l5_required) {
/*
* For 4- to 5-level paging transition, set up current CR3 as
@@ -189,7 +186,6 @@ asmlinkage void set_paging_levels(void *rmode)
(void *)src, PAGE_SIZE);
}
-out:
toggle_la57(trampoline_32bit, l5_required);
}
--
2.39.2
Now that the trampoline setup code and the actual invocation of it are
all done from the C routine, the trampoline cleanup can be merged into
it as well, instead of returning to asm just to call another C function.
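The result is a single C entry point with an extra argument (a sketch;
see the hunks below):

  /* updates the paging level if needed, copies the top level table out
   * of trampoline memory and restores the trampoline before returning */
  asmlinkage void set_paging_levels(void *rmode, void *pgtable);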
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 13 +++------
arch/x86/boot/compressed/pgtable_64.c | 28 ++++++++------------
2 files changed, 15 insertions(+), 26 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 403c96dae34d9c6d..b5bd6be035a7b7ec 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -429,19 +429,14 @@ SYM_CODE_START(startup_64)
* set_paging_levels() updates the number of paging levels using a
* trampoline in 32-bit addressable memory if the current number does
* not match the desired number.
+ *
+ * RSI is the relocated address of the page table to use instead of
+ * page table in trampoline memory (if required).
*/
movq %r15, %rdi /* pass struct boot_params pointer */
+ leaq rva(top_pgtable)(%rbx), %rsi
call set_paging_levels
- /*
- * cleanup_trampoline() would restore trampoline memory.
- *
- * RDI is address of the page table to use instead of page table
- * in trampoline memory (if required).
- */
- leaq rva(top_pgtable)(%rbx), %rdi
- call cleanup_trampoline
-
/* Zero EFLAGS */
pushq $0
popfq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index b92cf1d6e156d5f6..eeddad8c8335655e 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -101,9 +101,10 @@ static unsigned long find_trampoline_placement(void)
return bios_start - TRAMPOLINE_32BIT_SIZE;
}
-asmlinkage void set_paging_levels(void *rmode)
+asmlinkage void set_paging_levels(void *rmode, void *pgtable)
{
void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+ void *trampoline_pgtable;
bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -133,7 +134,7 @@ asmlinkage void set_paging_levels(void *rmode)
* are already in the desired paging mode.
*/
if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
- return;
+ goto out;
trampoline_32bit = (unsigned long *)find_trampoline_placement();
@@ -163,6 +164,8 @@ asmlinkage void set_paging_levels(void *rmode)
* The new page table will be used by trampoline code for switching
* from 4- to 5-level paging or vice versa.
*/
+ trampoline_pgtable = trampoline_32bit +
+ TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
if (l5_required) {
/*
@@ -182,31 +185,21 @@ asmlinkage void set_paging_levels(void *rmode)
* may be above 4G.
*/
src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
- (void *)src, PAGE_SIZE);
+ memcpy(trampoline_pgtable, (void *)src, PAGE_SIZE);
}
toggle_la57(trampoline_32bit, l5_required);
-}
-
-void cleanup_trampoline(void *pgtable)
-{
- void *trampoline_pgtable;
-
- trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
/*
- * Move the top level page table out of trampoline memory,
- * if it's there.
+ * Move the top level page table out of trampoline memory.
*/
- if ((void *)__native_read_cr3() == trampoline_pgtable) {
- memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
- native_write_cr3((unsigned long)pgtable);
- }
+ memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable);
/* Restore trampoline memory */
memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
+out:
/* Initialize variables for 5-level paging */
#ifdef CONFIG_X86_5LEVEL
if (__read_cr4() & X86_CR4_LA57) {
@@ -215,4 +208,5 @@ void cleanup_trampoline(void *pgtable)
ptrs_per_p4d = 512;
}
#endif
+ return;
}
--
2.39.2
Now that the EFI entry code in assembler is only used by the optional
and deprecated EFI handover protocol, and given that the EFI stub C code
no longer returns to it, most of it can simply be dropped.
While at it, clarify the symbol naming, by merging efi_main() and
efi_stub_entry(), making the latter the shared entry point for all
different boot modes that enter via the EFI stub.
The efi32_stub_entry() and efi64_stub_entry() names are referenced
explicitly by the tooling that populates the setup header, so these must
be retained, but can be emitted as aliases of efi_stub_entry() where
appropriate.
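For the non-mixed-mode case, this takes the form of linker level
aliases (as in the x86-stub.c hunk below; __alias() makes the named
symbol resolve to efi_stub_entry at link time):

  extern __alias(efi_stub_entry)
  void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
                        struct boot_params *boot_params);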
Signed-off-by: Ard Biesheuvel <[email protected]>
---
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/efi_mixed.S | 22 +++++++++++---------
arch/x86/boot/compressed/head_32.S | 11 ----------
arch/x86/boot/compressed/head_64.S | 12 ++---------
drivers/firmware/efi/libstub/x86-stub.c | 20 ++++++++++++++----
5 files changed, 31 insertions(+), 36 deletions(-)
diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
index 33520ecdb37abfda..cdbca15a4fc23833 100644
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@@ -1417,7 +1417,7 @@ execution context provided by the EFI firmware.
The function prototype for the handover entry point looks like this::
- efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
+ efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp)
'handle' is the EFI image handle passed to the boot loader by the EFI
firmware, 'table' is the EFI system table - these are the first two
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 4ca70bf93dc0bdcd..dcc562c8f7f35162 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -26,8 +26,8 @@
* When booting in 64-bit mode on 32-bit EFI firmware, startup_64_mixed_mode()
* is the first thing that runs after switching to long mode. Depending on
* whether the EFI handover protocol or the compat entry point was used to
- * enter the kernel, it will either branch to the 64-bit EFI handover
- * entrypoint at offset 0x390 in the image, or to the 64-bit EFI PE/COFF
+ * enter the kernel, it will either branch to the common 64-bit EFI stub
+ * entrypoint efi_stub_entry() directly, or via the 64-bit EFI PE/COFF
* entrypoint efi_pe_entry(). In the former case, the bootloader must provide a
* struct bootparams pointer as the third argument, so the presence of such a
* pointer is used to disambiguate.
@@ -37,21 +37,23 @@
* | efi32_pe_entry |---->| | | +-----------+--+
* +------------------+ | | +------+----------------+ |
* | startup_32 |---->| startup_64_mixed_mode | |
- * +------------------+ | | +------+----------------+ V
- * | efi32_stub_entry |---->| | | +------------------+
- * +------------------+ +------------+ +---->| efi64_stub_entry |
- * +-------------+----+
- * +------------+ +----------+ |
- * | startup_64 |<----| efi_main |<--------------+
- * +------------+ +----------+
+ * +------------------+ | | +------+----------------+ |
+ * | efi32_stub_entry |---->| | | |
+ * +------------------+ +------------+ | |
+ * V |
+ * +------------+ +----------------+ |
+ * | startup_64 |<----| efi_stub_entry |<--------+
+ * +------------+ +----------------+
*/
SYM_FUNC_START(startup_64_mixed_mode)
lea efi32_boot_args(%rip), %rdx
mov 0(%rdx), %edi
mov 4(%rdx), %esi
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
mov 8(%rdx), %edx // saved bootparams pointer
test %edx, %edx
- jnz efi64_stub_entry
+ jnz efi_stub_entry
+#endif
/*
* efi_pe_entry uses MS calling convention, which requires 32 bytes of
* shadow space on the stack even if all arguments are passed in
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 987ae727cf9f0d04..8876ffe30e9a4819 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -150,17 +150,6 @@ SYM_FUNC_START(startup_32)
jmp *%eax
SYM_FUNC_END(startup_32)
-#ifdef CONFIG_EFI_STUB
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp
- movl 8(%esp), %esi /* save boot_params pointer */
- call efi_main
- /* efi_main returns the possibly relocated address of startup_32 */
- jmp *%eax
-SYM_FUNC_END(efi32_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 03c4328a88cbd5d0..71c1f40a7ac067b9 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -523,19 +523,11 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#ifdef CONFIG_EFI_STUB
-#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
.org 0x390
-#endif
SYM_FUNC_START(efi64_stub_entry)
- and $~0xf, %rsp /* realign the stack */
- movq %rdx, %rbx /* save boot_params pointer */
- call efi_main
- movq %rbx,%rsi
- leaq rva(startup_64)(%rax), %rax
- jmp *%rax
+ jmp efi_stub_entry
SYM_FUNC_END(efi64_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
#endif
.text
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 095aaa8b0ee30fb9..d6a376e52cbe1399 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -782,9 +782,9 @@ static void __noreturn enter_kernel(unsigned long kernel_addr,
* On success, we jump to the relocated kernel directly and never return.
* On failure, we exit to the firmware via efi_exit instead of returning.
*/
-asmlinkage unsigned long efi_main(efi_handle_t handle,
- efi_system_table_t *sys_table_arg,
- struct boot_params *boot_params)
+void __noreturn efi_stub_entry(efi_handle_t handle,
+ efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params)
{
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
@@ -928,7 +928,19 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
enter_kernel(bzimage_addr, boot_params);
fail:
- efi_err("efi_main() failed!\n");
+ efi_err("efi_stub_entry() failed!\n");
efi_exit(handle, status);
}
+
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#ifndef CONFIG_EFI_MIXED
+extern __alias(efi_stub_entry)
+void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+
+extern __alias(efi_stub_entry)
+void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+#endif
+#endif
--
2.39.2
Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.
The SNP feature check can be performed while running under the EFI boot
services, which means we can fail gracefully and return an error to the
bootloader if the loaded kernel does not implement support for all the
features that the hypervisor enabled.
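Presumably the new helper is wired up along these lines near the top of
efi_stub_entry() (a sketch only; the x86-stub.c hunk below is truncated
before the call site):

  if (have_unsupported_snp_features())
          efi_exit(handle, EFI_UNSUPPORTED);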
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/sev.c | 74 ++++++++++++--------
arch/x86/include/asm/sev.h | 4 ++
drivers/firmware/efi/libstub/x86-stub.c | 17 +++++
3 files changed, 67 insertions(+), 28 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 014b89c890887b9a..be021e24f1ece421 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -315,6 +315,11 @@ static void enforce_vmpl0(void)
*/
#define SNP_FEATURES_PRESENT (0)
+u64 snp_get_unsupported_features(u64 status)
+{
+ return status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+}
+
void snp_check_features(void)
{
u64 unsupported;
@@ -328,7 +333,7 @@ void snp_check_features(void)
* EXIT_INFO_2 of the GHCB protocol so that those features can be reported
* as part of the guest boot failure.
*/
- unsupported = sev_status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+ unsupported = snp_get_unsupported_features(sev_status);
if (unsupported) {
if (ghcb_version < 2 || (!boot_ghcb && !early_setup_ghcb()))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
@@ -338,10 +343,38 @@ void snp_check_features(void)
}
}
-void sev_enable(struct boot_params *bp)
+u64 sev_get_status(void)
{
unsigned int eax, ebx, ecx, edx;
struct msr m;
+
+ /* Check for the SME/SEV support leaf */
+ eax = 0x80000000;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (eax < 0x8000001f)
+ return 0;
+
+ /*
+ * Check for the SME/SEV feature:
+ * CPUID Fn8000_001F[EAX]
+ * - Bit 0 - Secure Memory Encryption support
+ * - Bit 1 - Secure Encrypted Virtualization support
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ /* Check whether SEV is supported */
+ if (!(eax & BIT(1)))
+ return 0;
+
+ boot_rdmsr(MSR_AMD64_SEV, &m);
+ return m.q;
+}
+
+void sev_enable(struct boot_params *bp)
+{
+ unsigned int eax, ebx, ecx, edx;
bool snp;
/*
@@ -358,37 +391,14 @@ void sev_enable(struct boot_params *bp)
*/
snp = snp_init(bp);
- /* Check for the SME/SEV support leaf */
- eax = 0x80000000;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- if (eax < 0x8000001f)
- return;
-
- /*
- * Check for the SME/SEV feature:
- * CPUID Fn8000_001F[EAX]
- * - Bit 0 - Secure Memory Encryption support
- * - Bit 1 - Secure Encrypted Virtualization support
- * CPUID Fn8000_001F[EBX]
- * - Bits 5:0 - Pagetable bit position used to indicate encryption
- */
- eax = 0x8000001f;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- /* Check whether SEV is supported */
- if (!(eax & BIT(1))) {
+ /* Set the SME mask if this is an SEV guest. */
+ sev_status = sev_get_status();
+ if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
if (snp)
error("SEV-SNP support indicated by CC blob, but not CPUID.");
return;
}
- /* Set the SME mask if this is an SEV guest. */
- boot_rdmsr(MSR_AMD64_SEV, &m);
- sev_status = m.q;
- if (!(sev_status & MSR_AMD64_SEV_ENABLED))
- return;
-
/* Negotiate the GHCB protocol version. */
if (sev_status & MSR_AMD64_SEV_ES_ENABLED) {
if (!sev_es_negotiate_protocol())
@@ -409,6 +419,14 @@ void sev_enable(struct boot_params *bp)
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
+ /*
+ * Check for the SME/SEV feature:
+ * CPUID Fn8000_001F[EBX]
+ * - Bits 5:0 - Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
sme_me_mask = BIT_ULL(ebx & 0x3f);
}
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 13dc2a9d23c1eb25..e5aad673194698b8 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -202,6 +202,8 @@ void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
+u64 snp_get_unsupported_features(u64 status);
+u64 sev_get_status(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -225,6 +227,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
{
return -ENOTTY;
}
+static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
+static inline u64 sev_get_status(void) { return 0; }
#endif
#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 2d3282d2ed6eb756..f9d203b5ee6236e8 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
@@ -756,6 +757,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static bool have_unsupported_snp_features(void)
+{
+ u64 unsupported;
+
+ unsupported = snp_get_unsupported_features(sev_get_status());
+ if (unsupported) {
+ efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
+ unsupported);
+ return true;
+ }
+ return false;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -785,6 +799,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
+ if (have_unsupported_snp_features())
+ efi_exit(handle, EFI_UNSUPPORTED);
+
if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
if (efi_dxe_table &&
--
2.39.2
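To make the factored-out check easier to follow, here is a standalone
model of the mask arithmetic behind snp_get_unsupported_features(). The
feature bits and mask values below are invented for the example; the
real definitions live next to SNP_FEATURES_IMPL_REQ in
arch/x86/boot/compressed/sev.c.

#include <stdint.h>
#include <stdio.h>

#define SNP_FEAT_A		(1ULL << 0)	/* hypothetical feature bits */
#define SNP_FEAT_B		(1ULL << 1)
#define SNP_FEAT_C		(1ULL << 2)

/* Features that, if enabled by the hypervisor, the guest must implement. */
#define SNP_FEATURES_IMPL_REQ	(SNP_FEAT_A | SNP_FEAT_B | SNP_FEAT_C)

/* Features this hypothetical kernel actually implements. */
#define SNP_FEATURES_PRESENT	(SNP_FEAT_A)

static uint64_t snp_get_unsupported_features(uint64_t status)
{
	/* Enabled and required, but not implemented by this kernel. */
	return status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
}

int main(void)
{
	/* Pretend the SEV status MSR reports features A and C enabled. */
	uint64_t status = SNP_FEAT_A | SNP_FEAT_C;
	uint64_t unsupported = snp_get_unsupported_features(status);

	if (unsupported)	/* prints 0x4: feature C is the mismatch */
		printf("unsupported SNP features: %#llx\n",
		       (unsigned long long)unsupported);
	return 0;
}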
The 32-bit decompressor does not actually use a global offset table
(GOT), but as is common for 32-bit position independent code, it uses
the magic symbol _GLOBAL_OFFSET_TABLE_ as an anchor from which to derive
the actual runtime addresses of other symbols, via special @GOTOFF
symbol references that are resolved at link time and populated with the
distance between the address of the magic _GLOBAL_OFFSET_TABLE_ anchor
and the address of the symbol in question.
This means _GLOBAL_OFFSET_TABLE_ is the only symbol whose actual runtime
address needs to be determined explicitly, which is one of the first
things that happens in startup_32. However, the code does so by taking
the absolute address via the immediate field of an ADD instruction (plus
a small offset), which seems to defeat the point.
Fortunately, the assembler knows that _GLOBAL_OFFSET_TABLE_ is magic and
emits a special relative relocation instead, so the resulting code works
as expected. However, this is not obvious to someone reading the code,
and the use of LEA with an explicit relative addend is more idiomatic,
so use that instead.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8876ffe30e9a4819..3530465b5b85ccf3 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -58,7 +58,7 @@ SYM_FUNC_START(startup_32)
leal (BP_scratch+4)(%esi), %esp
call 1f
1: popl %edx
- addl $_GLOBAL_OFFSET_TABLE_+(.-1b), %edx
+ leal (_GLOBAL_OFFSET_TABLE_ - 1b)(%edx), %edx
/* Load new GDT */
leal gdt@GOTOFF(%edx), %eax
--
2.39.2
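For readers who want to convince themselves that the ADD and LEA forms
compute the same anchor, here is a toy C model of the arithmetic with
invented link-time addresses. It only demonstrates that the
(_GLOBAL_OFFSET_TABLE_ - 1b) displacement is a link-time constant that
is immune to the load offset; the actual relocation bookkeeping is the
assembler's and linker's business.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Link-time view of the image (example values, not real ones). */
	uint64_t link_1b  = 0x100020;	/* address of label 1b */
	uint64_t link_got = 0x104000;	/* address of _GLOBAL_OFFSET_TABLE_ */

	/* Pretend the image is loaded 0x7000 bytes from its link address. */
	uint64_t slide = 0x7000;

	/* call 1f / popl %edx leaves the runtime address of 1b in %edx. */
	uint64_t edx = link_1b + slide;

	/*
	 * leal (_GLOBAL_OFFSET_TABLE_ - 1b)(%edx), %edx adds the link-time
	 * constant (GOT - 1b), which does not depend on where the image
	 * was loaded.
	 */
	edx += link_got - link_1b;

	/* %edx now holds the runtime GOT anchor: both sides print alike. */
	printf("%#llx == %#llx\n", (unsigned long long)edx,
	       (unsigned long long)(link_got + slide));
	return 0;
}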
The ZSTD decompressor requires malloc() allocations to be 8-byte
aligned, so ensure that this is the case.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
include/linux/decompress/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
index 9192986b1a731323..ac862422df158bef 100644
--- a/include/linux/decompress/mm.h
+++ b/include/linux/decompress/mm.h
@@ -48,7 +48,7 @@ MALLOC_VISIBLE void *malloc(int size)
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
- malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
+ malloc_ptr = (malloc_ptr + 7) & ~7; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
--
2.39.2
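The one-liner changed above is the usual align-up idiom for
power-of-two alignments; a standalone sketch with a generic helper
(helper name invented here) follows.

#include <assert.h>
#include <stdint.h>

static uintptr_t align_up(uintptr_t p, uintptr_t align)
{
	/* Valid only for power-of-two alignments, as in the decompressor. */
	return (p + align - 1) & ~(align - 1);
}

int main(void)
{
	assert(align_up(0x1001, 8) == 0x1008);	/* rounds up */
	assert(align_up(0x1008, 8) == 0x1008);	/* already aligned */
	/* The old 4-byte rounding could leave an 8-byte-misaligned result: */
	assert(align_up(0x1004, 4) == 0x1004 && 0x1004 % 8 != 0);
	return 0;
}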
Instead of returning to the calling code in assembler that does nothing
more than perform an indirect call with the boot_params pointer in
register ESI/RSI, perform the jump directly from the EFI stub C code.
This will allow the asm entrypoint code to be dropped entirely in
subsequent patches.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 8d17cee8b98e1a63..095aaa8b0ee30fb9 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -769,9 +769,17 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static void __noreturn enter_kernel(unsigned long kernel_addr,
+ struct boot_params *boot_params)
+{
+ /* enter decompressed kernel with boot_params pointer in RSI/ESI */
+ asm("jmp *%0"::"r"(kernel_addr), "S"(boot_params));
+
+ unreachable();
+}
+
/*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
+ * On success, we jump to the relocated kernel directly and never return.
* On failure, we exit to the firmware via efi_exit instead of returning.
*/
asmlinkage unsigned long efi_main(efi_handle_t handle,
@@ -914,7 +922,11 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
goto fail;
}
- return bzimage_addr;
+ if (IS_ENABLED(CONFIG_X86_64))
+ /* add offset of startup_64() */
+ bzimage_addr += 0x200;
+
+ enter_kernel(bzimage_addr, boot_params);
fail:
efi_err("efi_main() failed!\n");
--
2.39.2
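For reference, the asm statement above leans on GCC constraint letters:
"r" puts the jump target in any general-purpose register, while "S"
pins boot_params into ESI/RSI, where the decompressed kernel's entry
code expects it. The userspace toy below (x86-64 + GCC only, invented
names, deliberately unsafe) shows the same constraint mechanics; it is
a sketch of the idea, not boot code.

#include <stdio.h>
#include <stdlib.h>

struct boot_params { int dummy; };

/*
 * Stand-in for the relocated kernel entry point: it fishes its argument
 * back out of %rsi, much like startup_64 consumes boot_params. We
 * arrive here via JMP rather than CALL, so realign the stack on entry.
 */
__attribute__((force_align_arg_pointer))
static void fake_kernel_entry(void)
{
	struct boot_params *bp;

	asm volatile("" : "=S"(bp));	/* read the pointer left in %rsi */
	printf("entered with bp->dummy = %d\n", bp->dummy);
	exit(0);
}

static void __attribute__((noreturn))
enter_kernel(unsigned long kernel_addr, struct boot_params *bp)
{
	/* "r": target in any GPR; "S": bp pinned to %rsi (%esi on 32-bit). */
	asm volatile("jmp *%0" :: "r"(kernel_addr), "S"(bp));
	__builtin_unreachable();
}

int main(void)
{
	struct boot_params bp = { .dummy = 42 };

	enter_kernel((unsigned long)fake_kernel_entry, &bp);
}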
On 6/2/23 15:38, Tom Lendacky wrote:
> On 6/2/23 05:13, Ard Biesheuvel wrote:
>> [...]
>> + /*
>> + * Check for the SME/SEV feature:
>> + * CPUID Fn8000_001F[EBX]
>> + * - Bits 5:0 - Pagetable bit position used to indicate encryption
>> + */
>> + eax = 0x8000001f;
>> + ecx = 0;
>> + native_cpuid(&eax, &ebx, &ecx, &edx);
>
> This causes SEV-ES / SEV-SNP to crash.
>
> This goes back to a previous comment where calling either
> sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
> in the GHCB MSR and as soon as the CPUID instruction is executed the boot
> blows up.
>
> Even if we move this up to be done earlier, we can complete this function
> successfully but then blow up further on.
>
> So you probably have to modify the routines in question to save and
> restore the GHCB MSR value.
I should clarify that it doesn't in fact cause a problem until the final
patch is applied and this path is taken.
Thanks,
Tom
On 6/2/23 05:13, Ard Biesheuvel wrote:
> [...]
> +void sev_enable(struct boot_params *bp)
> +{
> + unsigned int eax, ebx, ecx, edx;
> bool snp;
>
> /*
> @@ -358,37 +391,14 @@ void sev_enable(struct boot_params *bp)
> */
> snp = snp_init(bp);
>
> - /* Check for the SME/SEV support leaf */
> - eax = 0x80000000;
> - ecx = 0;
> - native_cpuid(&eax, &ebx, &ecx, &edx);
> - if (eax < 0x8000001f)
> - return;
> -
> - /*
> - * Check for the SME/SEV feature:
> - * CPUID Fn8000_001F[EAX]
> - * - Bit 0 - Secure Memory Encryption support
> - * - Bit 1 - Secure Encrypted Virtualization support
> - * CPUID Fn8000_001F[EBX]
> - * - Bits 5:0 - Pagetable bit position used to indicate encryption
> - */
> - eax = 0x8000001f;
> - ecx = 0;
> - native_cpuid(&eax, &ebx, &ecx, &edx);
> - /* Check whether SEV is supported */
> - if (!(eax & BIT(1))) {
> + /* Set the SME mask if this is an SEV guest. */
> + sev_status = sev_get_status();
> + if (!(sev_status & MSR_AMD64_SEV_ENABLED)) {
> if (snp)
> error("SEV-SNP support indicated by CC blob, but not CPUID.");
> return;
> }
>
> - /* Set the SME mask if this is an SEV guest. */
> - boot_rdmsr(MSR_AMD64_SEV, &m);
> - sev_status = m.q;
> - if (!(sev_status & MSR_AMD64_SEV_ENABLED))
> - return;
> -
> /* Negotiate the GHCB protocol version. */
> if (sev_status & MSR_AMD64_SEV_ES_ENABLED) {
> if (!sev_es_negotiate_protocol())
> @@ -409,6 +419,14 @@ void sev_enable(struct boot_params *bp)
> if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
>
> + /*
> + * Check for the SME/SEV feature:
> + * CPUID Fn8000_001F[EBX]
> + * - Bits 5:0 - Pagetable bit position used to indicate encryption
> + */
> + eax = 0x8000001f;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
This causes SEV-ES / SEV-SNP to crash.
This goes back to a previous comment where calling either
sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
in the GHCB MSR and as soon as the CPUID instruction is executed the boot
blows up.
Even if we move this up to be done earlier, we can complete this function
successfully but then blow up further on.
So you probably have to modify the routines in question to save and
restore the GHCB MSR value.
Thanks,
Tom
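To make the save/restore suggestion above concrete, here is one
possible shape of it, modeled in standalone C with a fake MSR cell.
Everything here is illustrative: the helper names mimic the rd/wr style
of the compressed SEV code, and a real fix would have to preserve
MSR_AMD64_SEV_ES_GHCB around whichever routines renegotiate the
protocol.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for MSR_AMD64_SEV_ES_GHCB. */
static uint64_t fake_ghcb_msr = 0x8000;	/* GHCB GPA set up by the firmware */

static uint64_t sev_es_rd_ghcb_msr(void) { return fake_ghcb_msr; }
static void sev_es_wr_ghcb_msr(uint64_t val) { fake_ghcb_msr = val; }

/*
 * Models the problem: negotiation leaves a GHCB-info response, not the
 * GHCB GPA, in the MSR, so a later CPUID #VC would go off the rails.
 */
static bool sev_es_negotiate_protocol(void)
{
	fake_ghcb_msr = 0x001;
	return true;
}

static bool negotiate_preserving_ghcb(void)
{
	uint64_t saved = sev_es_rd_ghcb_msr();
	bool ok = sev_es_negotiate_protocol();

	sev_es_wr_ghcb_msr(saved);	/* restore before the next #VC */
	return ok;
}

int main(void)
{
	negotiate_preserving_ghcb();
	/* The GHCB GPA survives negotiation: prints 0x8000. */
	printf("GHCB MSR after negotiation: %#llx\n",
	       (unsigned long long)fake_ghcb_msr);
	return 0;
}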
On Fri, 2 Jun 2023 at 22:39, Tom Lendacky <[email protected]> wrote:
>
> On 6/2/23 15:38, Tom Lendacky wrote:
> > On 6/2/23 05:13, Ard Biesheuvel wrote:
> >> [...]
> >> + /*
> >> + * Check for the SME/SEV feature:
> >> + * CPUID Fn8000_001F[EBX]
> >> + * - Bits 5:0 - Pagetable bit position used to indicate encryption
> >> + */
> >> + eax = 0x8000001f;
> >> + ecx = 0;
> >> + native_cpuid(&eax, &ebx, &ecx, &edx);
> >
> > This causes SEV-ES / SEV-SNP to crash.
> >
> > This goes back to a previous comment where calling either
> > sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
> > in the GHCB MSR and as soon as the CPUID instruction is executed the boot
> > blows up.
> >
> > Even if we move this up to be done earlier, we can complete this function
> > successfully but then blow up further on.
> >
> > So you probably have to modify the routines in question to save and
> > restore the GHCB MSR value.
>
> I should clarify that it doesn't in fact cause a problem until the final
> patch is applied and this path is taken.
>
Could we just move the CPUID call to the start of the function?
On 6/2/23 16:29, Ard Biesheuvel wrote:
> On Fri, 2 Jun 2023 at 22:39, Tom Lendacky <[email protected]> wrote:
>>
>> On 6/2/23 15:38, Tom Lendacky wrote:
>>> On 6/2/23 05:13, Ard Biesheuvel wrote:
>>>> [...]
>>>> + /*
>>>> + * Check for the SME/SEV feature:
>>>> + * CPUID Fn8000_001F[EBX]
>>>> + * - Bits 5:0 - Pagetable bit position used to indicate encryption
>>>> + */
>>>> + eax = 0x8000001f;
>>>> + ecx = 0;
>>>> + native_cpuid(&eax, &ebx, &ecx, &edx);
>>>
>>> This causes SEV-ES / SEV-SNP to crash.
>>>
>>> This goes back to a previous comment where calling either
>>> sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
>>> in the GHCB MSR and as soon as the CPUID instruction is executed the boot
>>> blows up.
>>>
>>> Even if we move this up to be done earlier, we can complete this function
>>> successfully but then blow up further on.
>>>
>>> So you probably have to modify the routines in question to save and
>>> restore the GHCB MSR value.
>>
>> I should clarify that it doesn't in fact cause a problem until the final
>> patch is applied and this path is taken.
>>
>
> Could we just move the CPUID call to the start of the function?
I tried that and it allowed sev_enable() to complete successfully, but
then it blew up after that.
But I noticed that the patch to apply the kernel CS earlier is no longer
part of this series. When I applied the patch to move the setting of the
kernel CS directly after the call to startup_64_setup_env() in
arch/x86/kernel/head_64.S, everything worked again.
That patch hasn't been pulled into tip yet, and since you dropped your
version, the CS value is wrong and the IRET from the #VC blows up.
So as long as you pre-req that patch, unless Boris or Dave would prefer
it to go in with this series, moving the call to native_cpuid() up to
just before the GHCB protocol version negotiation works.
Thanks,
Tom
On Sat, 3 Jun 2023 at 00:01, Tom Lendacky <[email protected]> wrote:
>
> On 6/2/23 16:29, Ard Biesheuvel wrote:
> > On Fri, 2 Jun 2023 at 22:39, Tom Lendacky <[email protected]> wrote:
> >>
> >> On 6/2/23 15:38, Tom Lendacky wrote:
> >>> On 6/2/23 05:13, Ard Biesheuvel wrote:
> >>>> [...]
> >>>> + /*
> >>>> + * Check for the SME/SEV feature:
> >>>> + * CPUID Fn8000_001F[EBX]
> >>>> + * - Bits 5:0 - Pagetable bit position used to indicate encryption
> >>>> + */
> >>>> + eax = 0x8000001f;
> >>>> + ecx = 0;
> >>>> + native_cpuid(&eax, &ebx, &ecx, &edx);
> >>>
> >>> This causes SEV-ES / SEV-SNP to crash.
> >>>
> >>> This goes back to a previous comment where calling either
> >>> sev_es_negotiate_protocol() or get_hv_features() blows away the GHCB value
> >>> in the GHCB MSR and as soon as the CPUID instruction is executed the boot
> >>> blows up.
> >>>
> >>> Even if we move this up to be done earlier, we can complete this function
> >>> successfully but then blow up further on.
> >>>
> >>> So you probably have to modify the routines in question to save and
> >>> restore the GHCB MSR value.
> >>
> >> I should clarify that it doesn't in fact cause a problem until the final
> >> patch is applied and this path is taken.
> >>
> >
> > Could we just move the CPUID call to the start of the function?
>
> I tried that and it allowed sev_enable() to complete successfully, but
> then it blew up after that.
>
> But I noticed that the patch to apply the kernel CS earlier is no longer
> part of this series. When I applied the patch to move the setting of the
> kernel CS directly after the call to startup_64_setup_env() in
> arch/x86/kernel/head_64.S, everything worked again.
>
> That patch hasn't been pulled into tip yet, and since you dropped your
> version, the CS value is wrong and the IRET from the #VC blows up.
>
> So as long as you pre-req that patch, unless Boris or Dave would prefer
> it to go in with this series, moving the call to native_cpuid() up to
> just before the GHCB protocol version negotiation works.
>
OK, thanks for confirming.
I dropped that patch because I was assuming that your fix would be picked up.