Update the x86 boot path to avoid the bare metal decompressor when
booting via the EFI stub. The bare metal decompressor inherits the
loader's 1:1 mapping of DRAM when entering in 64-bit mode, and assumes
that all of it is mapped read/write/execute, which will no longer be the
case on systems built to comply with recently tightened logo
requirements (*).
Changes since v8 [12]:
- fix issue in patch #2 (missing boot_params argument to __startup_64())
- limit cc audience for trivial fix
Changes since v7 [10]:
- Rebase onto suggested x86/kexec fix [11] that will most likely land
first
- Drop only vaguely relevant i386 head_32.S patch
- Restructure the 32-bit trampoline changes slightly into steps that are
clearer and easier to follow.
- Fix wording and placement of comments according to suggestions by
Borislav
Changes since v6 [9]:
- add new patch to fix our current reliance on 64-bit GPRs retaining
their full width contents across the switch into 32-bit protected
mode (with Fixes: tag, may need to go to -stable);
- preserve the top half of RSP explicitly, and preserve all callee save
registers on the stack across the mode switch; this fixes a reported
issue on Ice Lake with kexec (which loads the kernel above 4G)
Changes since v5 [8]:
- reintroduce patch removing redundant RSI pushes and pops from
arch/x86/kernel/head_64.S
- avoid bare constant 0x200 for the offset of startup_64() in the
decompressor
- rejig SEV/SNP logic in patch #20 once again, to ensure that CPUID
calls and VM exits only occur when the active configuration permits
it
- improve/clarify some code comments and commit logs
- rebase onto v6.5-rc1
Changes since v4 [7]:
- avoid CPUID calls after protocol negotiation but before configuring
exception handling;
- drop patch removing redundant RSI pushes and pops from
arch/x86/kernel/head_64.S
- rebase onto -tip x86/cc - the conflicts are mostly trivial and
restricted to the last 4 patches in the series, so applying this onto
a separate topic branch should be straightforward as well.
Changes since v3 [6]:
- trivial rebase onto Kirill's unaccepted memory series v13
- test SNP feature mask while running in the EFI boot services, and fail
gracefully on a mismatch
- perform only the SEV init after ExitBootServices()
Changes since v2 [4]:
- update prose style to comply with -tip guidelines
- rebased onto Kirill's unaccepted memory series [3]
- add Kirill's ack to 4/5-level paging changes
- perform SEV init and SNP feature check after ExitBootServices(), to
avoid corrupting the firmware's own SEV state
- split out preparatory refactor of handover entry code and BSS clearing
(patches #1 to #4)
Changes since v1 [2]:
- streamline existing 4/5 level switching code and call it directly from
the EFI stub - this is covered by the first 9 patches, which can be
applied in isolation, if desired;
- deal with SEV/SNP init explicitly;
- clear BSS when booting via the 'handover protocol'
- switch to kernel CS before calling SEV init code in kernel proper.
---- v1 cover letter follows ----
This series is conceptually a combination of Evgeny's series [0] and
mine [1], both of which attempt to make the early decompressor code more
amenable to executing in the EFI environment with stricter handling of
memory permissions.
My series [1] implemented zboot for x86 by getting rid of the entire
x86 decompressor, and replacing it with existing EFI code that does the
same but in a generic way. The downside of this is that only EFI boot is
supported, making it unviable for distros, which need to support BIOS
boot and hybrid EFI boot modes that omit the EFI stub.
Evgeny's series [0] adapted the entire decompressor code flow to allow
it to execute in the EFI context as well as the bare metal context,
which involves changes to the 1:1 mapping code, the page fault handlers,
etc., none of which are really needed when doing EFI boot in the first
place.
So this series attempts to occupy the middle ground here: it makes
minimal changes to the existing decompressor so some of it can be called
from the EFI stub. Then, it reimplements the EFI boot flow to decompress
the kernel and boot it directly, without relying on the trampoline
allocation code, page table code or page fault handling code. This
allows us to get rid of quite a bit of unsavory EFI stub code, and
replace it with two clear invocations of the EFI firmware APIs to clear
NX restrictions from allocations that have been populated with
executable code.
The only code that is being reused is the decompression library itself,
along with the minimal ELF parsing that is required to copy the ELF
segments in place, and the relocation processing that fixes up absolute
symbol references to refer to the correct virtual addresses.
Note that some of Evgeny's changes to clean up the PE/COFF header
generation will still be needed, but I've omitted those here for
brevity.
(*) IMHO the following developments are likely to occur:
- the Windows boot chain (along with 3rd party drivers) is cleaned up so
that it never relies on memory being writable and executable at the
same time when running under the EFI boot services;
- the EFI reference implementation gets updated to map all memory NX by
default, and to require read-only permissions for executable mappings;
- BIOS vendors incorporate these changes into their codebases, and
deploy them more widely than just the 'secure' SKUs;
- OEMs only care about the Windows sticker [5], so they only boot test
Windows, which works fine in this more restricted context;
- Linux boot no longer works reliably on new hardware built for Windows
unless we clean up our boot chain as well.
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
[0] https://lore.kernel.org/all/[email protected]/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://techcommunity.microsoft.com/t5/hardware-dev-center/new-uefi-ca-memory-mitigation-requirements-for-signing/ba-p/3608714
[6] https://lore.kernel.org/all/[email protected]/
[7] https://lore.kernel.org/all/[email protected]/
[8] https://lore.kernel.org/all/[email protected]/
[9] https://lore.kernel.org/all/[email protected]/
[10] https://lore.kernel.org/all/[email protected]/
[11] https://lore.kernel.org/all/20230802093927.GAZMokT57anC5jBISK@fat_crate.local/
[12] https://lore.kernel.org/all/[email protected]/
Ard Biesheuvel (23):
x86/decompressor: Don't rely on upper 32 bits of GPRs being preserved
x86/head_64: Store boot_params pointer in callee save register
x86/efistub: Branch straight to kernel entry point from C code
x86/efistub: Simplify and clean up handover entry code
x86/decompressor: Avoid magic offsets for EFI handover entrypoint
x86/efistub: Clear BSS in EFI handover protocol entrypoint
x86/decompressor: Store boot_params pointer in callee save register
x86/decompressor: Assign paging related global variables earlier
x86/decompressor: Call trampoline as a normal function
x86/decompressor: Use standard calling convention for trampoline
x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
x86/decompressor: Call trampoline directly from C code
x86/decompressor: Only call the trampoline when changing paging levels
x86/decompressor: Pass pgtable address to trampoline directly
x86/decompressor: Merge trampoline cleanup with switching code
x86/efistub: Perform 4/5 level paging switch from the stub
x86/efistub: Prefer EFI memory attributes protocol over DXE services
decompress: Use 8 byte alignment
x86/decompressor: Move global symbol references to C code
x86/decompressor: Factor out kernel decompression and relocation
efi/libstub: Add limit argument to efi_random_alloc()
x86/efistub: Perform SNP feature test while running in the firmware
x86/efistub: Avoid legacy decompressor when doing EFI boot
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 107 +++-----
arch/x86/boot/compressed/head_32.S | 32 ---
arch/x86/boot/compressed/head_64.S | 280 ++++++-------------
arch/x86/boot/compressed/misc.c | 44 ++-
arch/x86/boot/compressed/misc.h | 2 -
arch/x86/boot/compressed/pgtable.h | 10 +-
arch/x86/boot/compressed/pgtable_64.c | 87 +++---
arch/x86/boot/compressed/sev.c | 112 ++++----
arch/x86/include/asm/boot.h | 8 +
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 6 +
arch/x86/kernel/head_64.S | 32 +--
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 3 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 95 +++++++
drivers/firmware/efi/libstub/x86-stub.c | 285 +++++++++++---------
drivers/firmware/efi/libstub/x86-stub.h | 17 ++
drivers/firmware/efi/libstub/zboot.c | 2 +-
include/linux/decompress/mm.h | 2 +-
24 files changed, 589 insertions(+), 564 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
--
2.39.2
Instead of pushing/popping %RSI to/from the stack every time a function
is called from startup_64(), store it in a callee-preserved register
and grab it from there when its value is actually needed.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/kernel/head_64.S | 32 ++++++++------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c5b9289837dcbad2..ea6995920b7aa920 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -51,7 +51,9 @@ SYM_CODE_START_NOALIGN(startup_64)
* for us. These identity mapped page tables map all of the
* kernel pages and possibly all of memory.
*
- * %rsi holds a physical pointer to real_mode_data.
+ * %RSI holds the physical address of the boot_params structure
+ * provided by the bootloader. Preserve it in %R15 so C function calls
+ * will not clobber it.
*
* We come here either directly from a 64bit bootloader, or from
* arch/x86/boot/compressed/head_64.S.
@@ -62,6 +64,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* compiled to run at we first fixup the physical addresses in our page
* tables and then reload them.
*/
+ mov %rsi, %r15
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
@@ -75,9 +78,7 @@ SYM_CODE_START_NOALIGN(startup_64)
shrq $32, %rdx
wrmsr
- pushq %rsi
call startup_64_setup_env
- popq %rsi
/* Now switch to __KERNEL_CS so IRET works reliably */
pushq $__KERNEL_CS
@@ -93,12 +94,10 @@ SYM_CODE_START_NOALIGN(startup_64)
* Activate SEV/SME memory encryption if supported/enabled. This needs to
* be done now, since this also includes setup of the SEV-SNP CPUID table,
* which needs to be done before any CPUID instructions are executed in
- * subsequent code.
+ * subsequent code. Pass the boot_params pointer as the first argument.
*/
- movq %rsi, %rdi
- pushq %rsi
+ movq %r15, %rdi
call sme_enable
- popq %rsi
#endif
/* Sanitize CPU configuration */
@@ -111,9 +110,8 @@ SYM_CODE_START_NOALIGN(startup_64)
* programmed into CR3.
*/
leaq _text(%rip), %rdi
- pushq %rsi
+ movq %r15, %rsi
call __startup_64
- popq %rsi
/* Form the CR3 value being sure to include the CR3 modifier */
addq $(early_top_pgt - __START_KERNEL_map), %rax
@@ -127,8 +125,6 @@ SYM_CODE_START(secondary_startup_64)
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded a mapped page table.
*
- * %rsi holds a physical pointer to real_mode_data.
- *
* We come here either from startup_64 (using physical addresses)
* or from trampoline.S (using virtual addresses).
*
@@ -153,6 +149,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
UNWIND_HINT_END_OF_STACK
ANNOTATE_NOENDBR
+ /* Clear %R15 which holds the boot_params pointer on the boot CPU */
+ xorq %r15, %r15
+
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
@@ -199,13 +198,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* hypervisor could lie about the C-bit position to perform a ROP
* attack on the guest by writing to the unencrypted stack and wait for
* the next RET instruction.
- * %rsi carries pointer to realmode data and is callee-clobbered. Save
- * and restore it.
*/
- pushq %rsi
movq %rax, %rdi
call sev_verify_cbit
- popq %rsi
/*
* Switch to new page-table
@@ -365,9 +360,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
wrmsr
/* Setup and Load IDT */
- pushq %rsi
call early_setup_idt
- popq %rsi
/* Check if nx is implemented */
movl $0x80000001, %eax
@@ -403,9 +396,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
pushq $0
popfq
- /* rsi is pointer to real mode structure with interesting info.
- pass it to C */
- movq %rsi, %rdi
+ /* Pass the boot_params pointer as first argument */
+ movq %r15, %rdi
.Ljump_to_C_code:
/*
--
2.39.2
Now that the EFI entry code in assembler is only used by the optional
and deprecated EFI handover protocol, and given that the EFI stub C code
no longer returns to it, most of it can simply be dropped.
While at it, clarify the symbol naming by merging efi_main() and
efi_stub_entry(), making the latter the shared entry point for all
different boot modes that enter via the EFI stub.
The efi32_stub_entry() and efi64_stub_entry() names are referenced
explicitly by the tooling that populates the setup header, so these must
be retained, but can be emitted as aliases of efi_stub_entry() where
appropriate.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/efi_mixed.S | 22 +++++++++++---------
arch/x86/boot/compressed/head_32.S | 11 ----------
arch/x86/boot/compressed/head_64.S | 12 ++---------
drivers/firmware/efi/libstub/x86-stub.c | 20 ++++++++++++++----
5 files changed, 31 insertions(+), 36 deletions(-)
diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
index 33520ecdb37abfda..cdbca15a4fc23833 100644
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@@ -1417,7 +1417,7 @@ execution context provided by the EFI firmware.
The function prototype for the handover entry point looks like this::
- efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
+ efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp)
'handle' is the EFI image handle passed to the boot loader by the EFI
firmware, 'table' is the EFI system table - these are the first two
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 4ca70bf93dc0bdcd..dcc562c8f7f35162 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -26,8 +26,8 @@
* When booting in 64-bit mode on 32-bit EFI firmware, startup_64_mixed_mode()
* is the first thing that runs after switching to long mode. Depending on
* whether the EFI handover protocol or the compat entry point was used to
- * enter the kernel, it will either branch to the 64-bit EFI handover
- * entrypoint at offset 0x390 in the image, or to the 64-bit EFI PE/COFF
+ * enter the kernel, it will either branch to the common 64-bit EFI stub
+ * entrypoint efi_stub_entry() directly, or via the 64-bit EFI PE/COFF
* entrypoint efi_pe_entry(). In the former case, the bootloader must provide a
* struct bootparams pointer as the third argument, so the presence of such a
* pointer is used to disambiguate.
@@ -37,21 +37,23 @@
* | efi32_pe_entry |---->| | | +-----------+--+
* +------------------+ | | +------+----------------+ |
* | startup_32 |---->| startup_64_mixed_mode | |
- * +------------------+ | | +------+----------------+ V
- * | efi32_stub_entry |---->| | | +------------------+
- * +------------------+ +------------+ +---->| efi64_stub_entry |
- * +-------------+----+
- * +------------+ +----------+ |
- * | startup_64 |<----| efi_main |<--------------+
- * +------------+ +----------+
+ * +------------------+ | | +------+----------------+ |
+ * | efi32_stub_entry |---->| | | |
+ * +------------------+ +------------+ | |
+ * V |
+ * +------------+ +----------------+ |
+ * | startup_64 |<----| efi_stub_entry |<--------+
+ * +------------+ +----------------+
*/
SYM_FUNC_START(startup_64_mixed_mode)
lea efi32_boot_args(%rip), %rdx
mov 0(%rdx), %edi
mov 4(%rdx), %esi
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
mov 8(%rdx), %edx // saved bootparams pointer
test %edx, %edx
- jnz efi64_stub_entry
+ jnz efi_stub_entry
+#endif
/*
* efi_pe_entry uses MS calling convention, which requires 32 bytes of
* shadow space on the stack even if all arguments are passed in
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 987ae727cf9f0d04..8876ffe30e9a4819 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -150,17 +150,6 @@ SYM_FUNC_START(startup_32)
jmp *%eax
SYM_FUNC_END(startup_32)
-#ifdef CONFIG_EFI_STUB
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp
- movl 8(%esp), %esi /* save boot_params pointer */
- call efi_main
- /* efi_main returns the possibly relocated address of startup_32 */
- jmp *%eax
-SYM_FUNC_END(efi32_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index f732426d3b483139..e6880208b9a1b90a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -542,19 +542,11 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#ifdef CONFIG_EFI_STUB
-#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
.org 0x390
-#endif
SYM_FUNC_START(efi64_stub_entry)
- and $~0xf, %rsp /* realign the stack */
- movq %rdx, %rbx /* save boot_params pointer */
- call efi_main
- movq %rbx,%rsi
- leaq rva(startup_64)(%rax), %rax
- jmp *%rax
+ jmp efi_stub_entry
SYM_FUNC_END(efi64_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
#endif
.text
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 40a10db2d85e7942..3f3b3edf7a387d7c 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -817,9 +817,9 @@ static void __noreturn enter_kernel(unsigned long kernel_addr,
* return. On failure, it will exit to the firmware via efi_exit() instead of
* returning.
*/
-asmlinkage unsigned long efi_main(efi_handle_t handle,
- efi_system_table_t *sys_table_arg,
- struct boot_params *boot_params)
+void __noreturn efi_stub_entry(efi_handle_t handle,
+ efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params)
{
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
@@ -964,7 +964,19 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
enter_kernel(bzimage_addr, boot_params);
fail:
- efi_err("efi_main() failed!\n");
+ efi_err("efi_stub_entry() failed!\n");
efi_exit(handle, status);
}
+
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#ifndef CONFIG_EFI_MIXED
+extern __alias(efi_stub_entry)
+void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+
+extern __alias(efi_stub_entry)
+void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+#endif
+#endif
--
2.39.2
The so-called EFI handover protocol is value-add from the distros that
permits a loader to simply copy a PE kernel image into memory and call
an alternative entrypoint that is described by an embedded boot_params
structure.
Most implementations of this protocol do not bother to check the PE
header for minimum alignment, section placement, etc., and therefore also
don't clear the image's BSS, or even allocate enough memory for it.
Allocating more memory on the fly is rather difficult, but at least
clear the BSS region explicitly when entering in this manner, so that
the EFI stub code does not get confused by global variables that were
not zero-initialized correctly.
When booting in mixed mode, this BSS clearing must occur before any
global state is created, so clear it in the 32-bit asm entry point.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/efi_mixed.S | 14 +++++++++++++-
drivers/firmware/efi/libstub/x86-stub.c | 13 +++++++++++--
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 9308b595f6f0a5de..8a02a151806df14c 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -142,6 +142,18 @@ SYM_FUNC_END(__efi64_thunk)
.code32
#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
SYM_FUNC_START(efi32_stub_entry)
+ call 1f
+1: popl %ecx
+
+ /* Clear BSS */
+ xorl %eax, %eax
+ leal (_bss - 1b)(%ecx), %edi
+ leal (_ebss - 1b)(%ecx), %ecx
+ subl %edi, %ecx
+ shrl $2, %ecx
+ cld
+ rep stosl
+
add $0x4, %esp /* Discard return address */
popl %ecx
popl %edx
@@ -334,7 +346,7 @@ SYM_FUNC_END(efi32_pe_entry)
.org efi32_stub_entry + 0x200
.code64
SYM_FUNC_START_NOALIGN(efi64_stub_entry)
- jmp efi_stub_entry
+ jmp efi_handover_entry
SYM_FUNC_END(efi64_stub_entry)
#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 3f3b3edf7a387d7c..9247dbc7dbbd12ef 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -970,12 +970,21 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+void efi_handover_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params)
+{
+ extern char _bss[], _ebss[];
+
+ memset(_bss, 0, _ebss - _bss);
+ efi_stub_entry(handle, sys_table_arg, boot_params);
+}
+
#ifndef CONFIG_EFI_MIXED
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
struct boot_params *boot_params);
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
struct boot_params *boot_params);
#endif
--
2.39.2
Update the trampoline code so its arguments are passed via RDI and RSI,
which matches the ordinary SysV calling convention for x86_64. This will
allow this code to be called directly from C.
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 27 ++++++++++----------
arch/x86/boot/compressed/pgtable.h | 2 +-
2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 91b5eee306148f9a..c47504208105d7d3 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -444,9 +444,9 @@ SYM_CODE_START(startup_64)
movq %r15, %rdi
call paging_prepare
- /* Save the trampoline address in RCX */
- movq %rax, %rcx
-
+ /* Pass the trampoline address and boolean flag as args #1 and #2 */
+ movq %rax, %rdi
+ movq %rdx, %rsi
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
call *%rax
@@ -531,11 +531,14 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
SYM_FUNC_END(.Lrelocated)
/*
- * This is the 32-bit trampoline that will be copied over to low memory.
+ * This is the 32-bit trampoline that will be copied over to low memory. It
+ * will be called using the ordinary 64-bit calling convention from code
+ * running in 64-bit mode.
*
* Return address is at the top of the stack (might be above 4G).
- * ECX contains the base address of the trampoline memory.
- * Non zero RDX means trampoline needs to enable 5-level paging.
+ * The first argument (EDI) contains the 32-bit addressable base of the
+ * trampoline memory. A non-zero second argument (ESI) means that the
+ * trampoline needs to enable 5-level paging.
*/
SYM_CODE_START(trampoline_32bit_src)
/*
@@ -582,7 +585,7 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %cr0
/* Check what paging mode we want to be in after the trampoline */
- testl %edx, %edx
+ testl %esi, %esi
jz 1f
/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
@@ -597,21 +600,17 @@ SYM_CODE_START(trampoline_32bit_src)
jz 3f
2:
/* Point CR3 to the trampoline's new top level page table */
- leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
movl %eax, %cr3
3:
	/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
- pushl %ecx
- pushl %edx
movl $MSR_EFER, %ecx
rdmsr
btsl $_EFER_LME, %eax
/* Avoid writing EFER if no change was made (for TDX guest) */
jc 1f
wrmsr
-1: popl %edx
- popl %ecx
-
+1:
#ifdef CONFIG_X86_MCE
/*
* Preserve CR4.MCE if the kernel will enable #MC support.
@@ -628,7 +627,7 @@ SYM_CODE_START(trampoline_32bit_src)
/* Enable PAE and LA57 (if required) paging modes */
orl $X86_CR4_PAE, %eax
- testl %edx, %edx
+ testl %esi, %esi
jz 1f
orl $X86_CR4_LA57, %eax
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 91dbb99203fbce2d..4e8cef135226bcbb 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -14,7 +14,7 @@
extern unsigned long *trampoline_32bit;
-extern void trampoline_32bit_src(void *return_ptr);
+extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
--
2.39.2
Now that the trampoline setup code and the actual invocation of it are
all done from the C routine, the trampoline cleanup can be merged into
it as well, instead of returning to asm just to call another C function.
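With the two functions merged, the tail of configure_5level_paging()
amounts to the following (condensed from the pgtable_64.c hunk below):

	toggle_la57(trampoline_32bit);

	/* Move the top level page table out of trampoline memory */
	memcpy(pgtable, trampoline_32bit, PAGE_SIZE);
	native_write_cr3((unsigned long)pgtable);

	/* Restore trampoline memory */
	memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);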
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 14 ++++----------
arch/x86/boot/compressed/pgtable_64.c | 18 ++++--------------
2 files changed, 8 insertions(+), 24 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index afdaf8cb8bb98694..fb0e562c26f64c9c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -430,20 +430,14 @@ SYM_CODE_START(startup_64)
* a trampoline in 32-bit addressable memory if the current number does
* not match the desired number.
*
- * Pass the boot_params pointer as the first argument.
+ * Pass the boot_params pointer as the first argument. The second
+ * argument is the relocated address of the page table to use instead
+ * of the page table in trampoline memory (if required).
*/
movq %r15, %rdi
+ leaq rva(top_pgtable)(%rbx), %rsi
call configure_5level_paging
- /*
- * cleanup_trampoline() would restore trampoline memory.
- *
- * RDI is address of the page table to use instead of page table
- * in trampoline memory (if required).
- */
- leaq rva(top_pgtable)(%rbx), %rdi
- call cleanup_trampoline
-
/* Zero EFLAGS */
pushq $0
popfq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index eab4e6b568ae05c4..7939eb6e6ce9bb01 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -101,7 +101,7 @@ static unsigned long find_trampoline_placement(void)
return bios_start - TRAMPOLINE_32BIT_SIZE;
}
-asmlinkage void configure_5level_paging(struct boot_params *bp)
+asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
{
void (*toggle_la57)(void *cr3);
bool l5_required = false;
@@ -191,22 +191,12 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
}
toggle_la57(trampoline_32bit);
-}
-
-void cleanup_trampoline(void *pgtable)
-{
- void *trampoline_pgtable;
-
- trampoline_pgtable = trampoline_32bit;
/*
- * Move the top level page table out of trampoline memory,
- * if it's there.
+ * Move the top level page table out of trampoline memory.
*/
- if ((void *)__native_read_cr3() == trampoline_pgtable) {
- memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
- native_write_cr3((unsigned long)pgtable);
- }
+ memcpy(pgtable, trampoline_32bit, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable);
/* Restore trampoline memory */
memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
--
2.39.2
It is no longer necessary to be cautious when referring to global
variables in the position-independent decompressor code, now that it is
built using PIE codegen and makes an assertion in the linker script that
no GOT entries exist (which would require adjustment for the actual
runtime load address of the decompressor binary).
This means global variables can be referenced directly from C code,
instead of having to pass their runtime addresses into C routines from
asm code, which needs to happen at each call site. Do so for the code
that will be called directly from the EFI stub after a subsequent patch,
and avoid the need to duplicate this logic a third time.
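The net effect on the extract_kernel() interface (condensed from the
hunks below):

	/* Before: runtime addresses of globals passed in from asm */
	asmlinkage void *extract_kernel(void *rmode, memptr heap,
					unsigned char *input_data,
					unsigned long input_len,
					unsigned char *output,
					unsigned long output_len);

	/* After: C references the globals directly, and asm only passes
	 * what it alone knows: boot_params and the output address.
	 */
	asmlinkage void *extract_kernel(void *rmode, unsigned char *output);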
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 8 --------
arch/x86/boot/compressed/head_64.S | 10 ++--------
arch/x86/boot/compressed/misc.c | 16 +++++++++-------
3 files changed, 11 insertions(+), 23 deletions(-)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8876ffe30e9a4819..3af4a383615b3e1f 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -168,13 +168,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
*/
/* push arguments for extract_kernel: */
- pushl output_len@GOTOFF(%ebx) /* decompressed length, end of relocs */
pushl %ebp /* output address */
- pushl input_len@GOTOFF(%ebx) /* input_len */
- leal input_data@GOTOFF(%ebx), %eax
- pushl %eax /* input_data */
- leal boot_heap@GOTOFF(%ebx), %eax
- pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel entry point in %eax */
addl $24, %esp
@@ -202,8 +196,6 @@ SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
*/
.bss
.balign 4
-boot_heap:
- .fill BOOT_HEAP_SIZE, 1, 0
boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fb0e562c26f64c9c..28f46051c706724e 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -493,13 +493,9 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
/*
* Do the extraction, and jump to the new kernel..
*/
- /* pass struct boot_params pointer */
+ /* pass struct boot_params pointer and output target address */
movq %r15, %rdi
- leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
- leaq input_data(%rip), %rdx /* input_data */
- movl input_len(%rip), %ecx /* input_len */
- movq %rbp, %r8 /* output target address */
- movl output_len(%rip), %r9d /* decompressed length, end of relocs */
+ movq %rbp, %rsi
call extract_kernel /* returns kernel entry point in %rax */
/*
@@ -657,8 +653,6 @@ SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
*/
.bss
.balign 4
-SYM_DATA_LOCAL(boot_heap, .fill BOOT_HEAP_SIZE, 1, 0)
-
SYM_DATA_START_LOCAL(boot_stack)
.fill BOOT_STACK_SIZE, 1, 0
.balign 16
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abcf624b3b55..2d91d56b59e1af93 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,6 +330,11 @@ static size_t parse_elf(void *output)
return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
}
+static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
+
+extern unsigned char input_data[];
+extern unsigned int input_len, output_len;
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -347,14 +352,11 @@ static size_t parse_elf(void *output)
* |-------uncompressed kernel image---------|
*
*/
-asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
- unsigned char *input_data,
- unsigned long input_len,
- unsigned char *output,
- unsigned long output_len)
+asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
{
const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ memptr heap = (memptr)boot_heap;
unsigned long needed_size;
size_t entry_offset;
@@ -412,7 +414,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
* entries. This ensures the full mapped area is usable RAM
* and doesn't include any reserved areas.
*/
- needed_size = max(output_len, kernel_total_size);
+ needed_size = max_t(unsigned long, output_len, kernel_total_size);
#ifdef CONFIG_X86_64
needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
#endif
@@ -443,7 +445,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
#ifdef CONFIG_X86_64
if (heap > 0x3fffffffffffUL)
error("Destination address too large");
- if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
+ if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
error("Destination virtual address is beyond the kernel mapping area");
#else
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
--
2.39.2
x86 will need to limit the kernel memory allocation to the lowest 512
MiB of memory, to match the behavior of the existing bare metal KASLR
physical randomization logic. So in preparation for that, add a limit
parameter to efi_random_alloc() and wire it up.
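As a sketch, the prospective x86 call site would look something like
this (the 512 MiB cap and the surrounding names are assumptions based
on the paragraph above, not part of this patch):

	status = efi_random_alloc(kernel_total_size, CONFIG_PHYSICAL_ALIGN,
				  &addr, seed, EFI_LOADER_CODE,
				  SZ_512M /* assumed KASLR placement limit */);
	if (status != EFI_SUCCESS)
		efi_warn("efi_random_alloc() failed: 0x%lx\n", status);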
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 ++++++----
drivers/firmware/efi/libstub/zboot.c | 2 +-
4 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 770b8ecb73984c61..8c40fc89f5f99209 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -106,7 +106,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
*/
status = efi_random_alloc(*reserve_size, min_kimg_align,
reserve_addr, phys_seed,
- EFI_LOADER_CODE);
+ EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS)
efi_warn("efi_random_alloc() failed: 0x%lx\n", status);
} else {
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 06b7abc92ced9e18..9823f6fb3e01f718 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -956,7 +956,7 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
efi_status_t efi_random_alloc(unsigned long size, unsigned long align,
unsigned long *addr, unsigned long random_seed,
- int memory_type);
+ int memory_type, unsigned long alloc_limit);
efi_status_t efi_random_get_seed(void);
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index 32c7a54923b4c127..674a064b8f7adc68 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -16,7 +16,8 @@
*/
static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
unsigned long size,
- unsigned long align_shift)
+ unsigned long align_shift,
+ u64 alloc_limit)
{
unsigned long align = 1UL << align_shift;
u64 first_slot, last_slot, region_end;
@@ -29,7 +30,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
return 0;
region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1,
- (u64)EFI_ALLOC_LIMIT);
+ alloc_limit);
if (region_end < size)
return 0;
@@ -54,7 +55,8 @@ efi_status_t efi_random_alloc(unsigned long size,
unsigned long align,
unsigned long *addr,
unsigned long random_seed,
- int memory_type)
+ int memory_type,
+ unsigned long alloc_limit)
{
unsigned long total_slots = 0, target_slot;
unsigned long total_mirrored_slots = 0;
@@ -76,7 +78,7 @@ efi_status_t efi_random_alloc(unsigned long size,
efi_memory_desc_t *md = (void *)map->map + map_offset;
unsigned long slots;
- slots = get_entry_num_slots(md, size, ilog2(align));
+ slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit);
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c
index e5d7fa1f1d8fd160..bdb17eac0cb401be 100644
--- a/drivers/firmware/efi/libstub/zboot.c
+++ b/drivers/firmware/efi/libstub/zboot.c
@@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
}
status = efi_random_alloc(alloc_size, min_kimg_align, &image_base,
- seed, EFI_LOADER_CODE);
+ seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS) {
efi_err("Failed to allocate memory\n");
goto free_cmdline;
--
2.39.2
Instead of returning to the asm calling code to invoke the trampoline,
call it straight from the C code that sets it up. That way, the struct
return type is no longer needed for returning two values, and the call
can be made conditional more cleanly in a subsequent patch.
This means that all callee save 64-bit registers need to be preserved
and restored, as their contents may not survive the legacy mode switch.
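For reference, the two-value return that disappears here: under the
SysV x86_64 ABI, a 16-byte aggregate is returned in RDX:RAX, which is
exactly what the deleted asm below unpacks.

	/* Before: returned in RDX:RAX, unpacked by hand in head_64.S */
	struct paging_config {
		unsigned long trampoline_start;	/* RAX */
		unsigned long l5_required;	/* RDX */
	};
	struct paging_config paging_prepare(void *rmode);

	/* After: no return value; the C code invokes the trampoline itself */
	asmlinkage void configure_5level_paging(struct boot_params *bp);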
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 31 ++++++++-----------
arch/x86/boot/compressed/pgtable_64.c | 32 ++++++++------------
2 files changed, 26 insertions(+), 37 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 37fd7b7d683d696c..cd6e3e175389aa6b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -430,25 +430,14 @@ SYM_CODE_START(startup_64)
#endif
/*
- * paging_prepare() sets up the trampoline and checks if we need to
- * enable 5-level paging.
- *
- * paging_prepare() returns a two-quadword structure which lands
- * into RDX:RAX:
- * - Address of the trampoline is returned in RAX.
- * - Non zero RDX means trampoline needs to enable 5-level
- * paging.
+ * configure_5level_paging() updates the number of paging levels using
+ * a trampoline in 32-bit addressable memory if the current number does
+ * not match the desired number.
*
* Pass the boot_params pointer as the first argument.
*/
movq %r15, %rdi
- call paging_prepare
-
- /* Pass the trampoline address and boolean flag as args #1 and #2 */
- movq %rax, %rdi
- movq %rdx, %rsi
- leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
- call *%rax
+ call configure_5level_paging
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -543,11 +532,14 @@ SYM_FUNC_END(.Lrelocated)
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
/*
- * Preserve live 64-bit registers on the stack: this is necessary
- * because the architecture does not guarantee that GPRs will retain
- * their full 64-bit values across a 32-bit mode switch.
+ * Preserve callee save 64-bit registers on the stack: this is
+ * necessary because the architecture does not guarantee that GPRs will
+ * retain their full 64-bit values across a 32-bit mode switch.
*/
pushq %r15
+ pushq %r14
+ pushq %r13
+ pushq %r12
pushq %rbp
pushq %rbx
@@ -574,6 +566,9 @@ SYM_CODE_START(trampoline_32bit_src)
/* Restore the preserved 64-bit registers */
popq %rbx
popq %rbp
+ popq %r12
+ popq %r13
+ popq %r14
popq %r15
retq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 5198a05aefa8d14a..f9cc86b2ee55ca80 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
unsigned int __section(".data") ptrs_per_p4d = 1;
#endif
-struct paging_config {
- unsigned long trampoline_start;
- unsigned long l5_required;
-};
-
/* Buffer to preserve trampoline memory */
static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
@@ -29,7 +24,7 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
* purposes.
*
* Avoid putting the pointer into .bss as it will be cleared between
- * paging_prepare() and extract_kernel().
+ * configure_5level_paging() and extract_kernel().
*/
unsigned long *trampoline_32bit __section(".data");
@@ -106,13 +101,13 @@ static unsigned long find_trampoline_placement(void)
return bios_start - TRAMPOLINE_32BIT_SIZE;
}
-struct paging_config paging_prepare(void *rmode)
+asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- struct paging_config paging_config = {};
- void *tramp_code;
+ void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+ bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
- boot_params = rmode;
+ boot_params = bp;
/*
* Check if LA57 is desired and supported.
@@ -130,7 +125,7 @@ struct paging_config paging_prepare(void *rmode)
!cmdline_find_option_bool("no5lvl") &&
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
- paging_config.l5_required = 1;
+ l5_required = true;
/* Initialize variables for 5-level paging */
__pgtable_l5_enabled = 1;
@@ -138,9 +133,7 @@ struct paging_config paging_prepare(void *rmode)
ptrs_per_p4d = 512;
}
- paging_config.trampoline_start = find_trampoline_placement();
-
- trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
+ trampoline_32bit = (unsigned long *)find_trampoline_placement();
/* Preserve trampoline memory */
memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
@@ -149,7 +142,7 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- tramp_code = memcpy(trampoline_32bit +
+ toggle_la57 = memcpy(trampoline_32bit +
TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
@@ -159,7 +152,8 @@ struct paging_config paging_prepare(void *rmode)
* immediate absolute address, which needs to be adjusted based on the
* placement of the trampoline.
*/
- *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+ *(u32 *)((u8 *)toggle_la57 + trampoline_ljmp_imm_offset) +=
+ (unsigned long)toggle_la57;
/*
* The code below prepares page table in trampoline memory.
@@ -175,10 +169,10 @@ struct paging_config paging_prepare(void *rmode)
* We are not going to use the page table in trampoline memory if we
* are already in the desired paging mode.
*/
- if (paging_config.l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+ if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
goto out;
- if (paging_config.l5_required) {
+ if (l5_required) {
/*
* For 4- to 5-level paging transition, set up current CR3 as
* the first and the only entry in a new top-level page table.
@@ -201,7 +195,7 @@ struct paging_config paging_prepare(void *rmode)
}
out:
- return paging_config;
+ toggle_la57(trampoline_32bit, l5_required);
}
void cleanup_trampoline(void *pgtable)
--
2.39.2
In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.
This reuses the newly refactored trampoline that the bare metal
decompressor uses, but relies on EFI APIs to allocate 32-bit addressable
memory and remap it with the appropriate permissions. Given that the
bare metal decompressor will no longer call into the trampoline if the
number of paging levels is already set correctly, it is no longer
necessary to remove NX restrictions from the memory range where this
trampoline may end up.
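In outline, the stub-side preparation boils down to two firmware-backed
steps (condensed from the x86-5lvl.c hunk below):

	/* 32-bit addressable pages: trampoline code + root page table */
	status = efi_allocate_pages(2 * PAGE_SIZE,
				    (unsigned long *)&la57_code, U32_MAX);

	/* copy the trampoline in place and make it executable */
	la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
	efi_adjust_memory_range_protection((unsigned long)la57_toggle,
					   PAGE_SIZE);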
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +
drivers/firmware/efi/libstub/efistub.h | 1 +
drivers/firmware/efi/libstub/x86-5lvl.c | 95 ++++++++++++++++++++
drivers/firmware/efi/libstub/x86-stub.c | 40 +++------
drivers/firmware/efi/libstub/x86-stub.h | 17 ++++
6 files changed, 130 insertions(+), 26 deletions(-)
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 16d64a34d1e19465..ae8874401a9f1490 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o smbios.o
lib-$(CONFIG_X86) += x86-stub.o
+lib-$(CONFIG_X86_64) += x86-5lvl.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 732984295295fb6d..bfa30625f5d03167 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
} else if (!strcmp(param, "noinitrd")) {
efi_noinitrd = true;
+ } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+ efi_no5lvl = true;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 6aa38a1bf1265d83..06b7abc92ced9e18 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -33,6 +33,7 @@
#define EFI_ALLOC_LIMIT ULONG_MAX
#endif
+extern bool efi_no5lvl;
extern bool efi_nochunk;
extern bool efi_nokaslr;
extern int efi_loglevel;
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
new file mode 100644
index 0000000000000000..479dd445acdcff8d
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/efi.h>
+
+#include <asm/boot.h>
+#include <asm/desc.h>
+#include <asm/efi.h>
+
+#include "efistub.h"
+#include "x86-stub.h"
+
+bool efi_no5lvl;
+
+static void (*la57_toggle)(void *cr3);
+
+static const struct desc_struct gdt[] = {
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+};
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as programming
+ * a 64-bit value into CR3 when running in 32-bit mode is not supported.
+ */
+efi_status_t efi_setup_5level_paging(void)
+{
+ u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
+ efi_status_t status;
+ u8 *la57_code;
+
+ if (!efi_is_64bit())
+ return EFI_SUCCESS;
+
+ /* check for 5 level paging support */
+ if (native_cpuid_eax(0) < 7 ||
+ !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return EFI_SUCCESS;
+
+ /* allocate some 32-bit addressable memory for code and a page table */
+ status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+ U32_MAX);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
+ memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
+
+ /*
+ * To avoid the need to allocate a 32-bit addressable stack, the
+ * trampoline uses a LJMP instruction to switch back to long mode.
+ * LJMP takes an absolute destination address, which needs to be
+ * fixed up at runtime.
+ */
+ *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
+
+ efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
+
+ return EFI_SUCCESS;
+}
+
+void efi_5level_switch(void)
+{
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ bool need_toggle = want_la57 ^ have_la57;
+ u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+ u64 *cr3 = (u64 *)__native_read_cr3();
+ u64 *new_cr3;
+
+ if (!la57_toggle || !need_toggle)
+ return;
+
+ if (!have_la57) {
+ /*
+ * 5 level paging will be enabled, so a root level page needs
+ * to be allocated from the 32-bit addressable physical region,
+ * with its first entry referring to the existing hierarchy.
+ */
+ new_cr3 = memset(pgt, 0, PAGE_SIZE);
+ new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+ } else {
+ /* take the new root table pointer from the current entry #0 */
+ new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+ /* copy the new root table if it is not 32-bit addressable */
+ if ((u64)new_cr3 > U32_MAX)
+ new_cr3 = memcpy(pgt, new_cr3, PAGE_SIZE);
+ }
+
+ native_load_gdt(&(struct desc_ptr){ sizeof(gdt) - 1, (u64)gdt });
+
+ la57_toggle(new_cr3);
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 9247dbc7dbbd12ef..af5f50617a5b4c59 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -17,6 +17,7 @@
#include <asm/boot.h>
#include "efistub.h"
+#include "x86-stub.h"
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
@@ -223,8 +224,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
-static void
-adjust_memory_range_protection(unsigned long start, unsigned long size)
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
@@ -278,35 +279,14 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
}
}
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
extern const u8 startup_32[], startup_64[];
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
{
- /*
- * Allow execution of possible trampoline used
- * for switching between 4- and 5-level page tables
- * and relocated kernel image.
- */
-
- adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
- TRAMPOLINE_PLACEMENT_SIZE);
-
#ifdef CONFIG_64BIT
if (image_base != (unsigned long)startup_32)
- adjust_memory_range_protection(image_base, image_size);
+ efi_adjust_memory_range_protection(image_base, image_size);
#else
/*
* Clear protection flags on a whole range of possible
@@ -316,8 +296,8 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
* need to remove possible protection on relocated image
* itself disregarding further relocations.
*/
- adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
+ efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
+ KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
#endif
}
@@ -839,6 +819,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_dxe_table = NULL;
}
+ status = efi_setup_5level_paging();
+ if (status != EFI_SUCCESS) {
+ efi_err("efi_setup_5level_paging() failed!\n");
+ goto fail;
+ }
+
/*
* If the kernel isn't already loaded at a suitable address,
* relocate it.
@@ -959,6 +945,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
+ efi_5level_switch();
+
if (IS_ENABLED(CONFIG_X86_64))
bzimage_addr += startup_64 - startup_32;
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
new file mode 100644
index 0000000000000000..37c5a36b9d8cf9b2
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/efi.h>
+
+extern void trampoline_32bit_src(void *, bool);
+extern const u16 trampoline_ljmp_imm_offset;
+
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size);
+
+#ifdef CONFIG_X86_64
+efi_status_t efi_setup_5level_paging(void);
+void efi_5level_switch(void);
+#else
+static inline efi_status_t efi_setup_5level_paging(void) { return EFI_SUCCESS; }
+static inline void efi_5level_switch(void) {}
+#endif
--
2.39.2
Instead of returning to the calling code in assembler that does nothing
more than perform an indirect call with the boot_params pointer in
register ESI/RSI, perform the jump directly from the EFI stub C code.
This will allow the asm entrypoint code to be dropped entirely in
subsequent patches.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 22 +++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 220be75a5cdc1f4c..40a10db2d85e7942 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -290,7 +290,7 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-void startup_32(struct boot_params *boot_params);
+extern const u8 startup_32[], startup_64[];
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
@@ -803,10 +803,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static void __noreturn enter_kernel(unsigned long kernel_addr,
+ struct boot_params *boot_params)
+{
+ /* enter decompressed kernel with boot_params pointer in RSI/ESI */
+ asm("jmp *%0"::"r"(kernel_addr), "S"(boot_params));
+
+ unreachable();
+}
+
/*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
- * On failure, we exit to the firmware via efi_exit instead of returning.
+ * On success, this routine will jump to the relocated image directly and never
+ * return. On failure, it will exit to the firmware via efi_exit() instead of
+ * returning.
*/
asmlinkage unsigned long efi_main(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
@@ -950,7 +959,10 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
goto fail;
}
- return bzimage_addr;
+ if (IS_ENABLED(CONFIG_X86_64))
+ bzimage_addr += startup_64 - startup_32;
+
+ enter_kernel(bzimage_addr, boot_params);
fail:
efi_err("efi_main() failed!\n");
--
2.39.2
The 4-to-5 level mode switch trampoline disables long mode and paging in
order to be able to flick the LA57 bit. According to section 3.4.1.1 of
the x86 architecture manual [0], 64-bit GPRs might not retain the upper
32 bits of their contents across such a mode switch.
Given that RBP, RBX and RSI are live at this point, preserve them on the
stack, along with the return address that might be above 4G as well.
[0] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture
"Because the upper 32 bits of 64-bit general-purpose registers are
undefined in 32-bit modes, the upper 32 bits of any general-purpose
register are not preserved when switching from 64-bit mode to a 32-bit
mode (to protected mode or compatibility mode). Software must not
depend on these bits to maintain a value after a 64-bit to 32-bit
mode switch."
Fixes: 194a9749c73d650c ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 30 +++++++++++++++-----
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 03c4328a88cbd5d0..f732426d3b483139 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -459,11 +459,25 @@ SYM_CODE_START(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
+ /* Set up 32-bit addressable stack */
+ leaq TRAMPOLINE_32BIT_STACK_END(%rcx), %rsp
+
/*
- * Load the address of trampoline_return() into RDI.
- * It will be used by the trampoline to return to the main code.
+ * Preserve live 64-bit registers on the stack: this is necessary
+ * because the architecture does not guarantee that GPRs will retain
+ * their full 64-bit values across a 32-bit mode switch.
+ */
+ pushq %rbp
+ pushq %rbx
+ pushq %rsi
+
+ /*
+ * Push the 64-bit address of trampoline_return() onto the new stack.
+ * It will be used by the trampoline to return to the main code. Due to
+ * the 32-bit mode switch, it cannot be kept in a register either.
*/
leaq trampoline_return(%rip), %rdi
+ pushq %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
@@ -471,6 +485,11 @@ SYM_CODE_START(startup_64)
pushq %rax
lretq
trampoline_return:
+ /* Restore live 64-bit registers */
+ popq %rsi
+ popq %rbx
+ popq %rbp
+
/* Restore the stack, the 32-bit trampoline uses its own stack */
leaq rva(boot_stack_end)(%rbx), %rsp
@@ -582,7 +601,7 @@ SYM_FUNC_END(.Lrelocated)
/*
* This is the 32-bit trampoline that will be copied over to low memory.
*
- * RDI contains the return address (might be above 4G).
+ * Return address is at the top of the stack (might be above 4G).
* ECX contains the base address of the trampoline memory.
* Non zero RDX means trampoline needs to enable 5-level paging.
*/
@@ -592,9 +611,6 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %ds
movl %eax, %ss
- /* Set up new stack */
- leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
-
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
@@ -671,7 +687,7 @@ SYM_CODE_END(trampoline_32bit_src)
.code64
SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
/* Return from the trampoline */
- jmp *%rdi
+ retq
SYM_FUNC_END(.Lpaging_enabled)
/*
--
2.39.2
Since the current and desired number of paging levels are both known
when the trampoline is being prepared, avoid calling the trampoline
entirely when it is already clear that the number of paging levels will
not change.
Given that the CPU is already running in long mode, the PAE and LA57
settings are necessarily consistent with the currently active page
tables, and other fields in CR4 will be initialized by the startup code
in the kernel proper. So limit the manipulation of CR4 to toggling the
LA57 bit, which is the only thing that really needs doing at this point
in the boot. This also means that there is no need to pass the value of
l5_required to toggle_la57(), as it will not be called unless CR4.LA57
needs to toggle.
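The early-out boils down to comparing the desired mode against CR4.LA57
before the trampoline is even prepared; a minimal sketch of the logic,
mirroring the pgtable_64.c hunk below:

    /* CR4.LA57 reflects the currently active paging mode. */
    bool la57_active = native_read_cr4() & X86_CR4_LA57;

    if (l5_required == la57_active)
            return;         /* no switch needed, skip the trampoline */

    /* otherwise the trampoline simply flips the bit: CR4 ^= LA57 */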
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 45 ++------------------
arch/x86/boot/compressed/pgtable_64.c | 22 ++++------
2 files changed, 13 insertions(+), 54 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index cd6e3e175389aa6b..8730b1d58e2b0825 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -387,10 +387,6 @@ SYM_CODE_START(startup_64)
* For the trampoline, we need the top page table to reside in lower
* memory as we don't have a way to load 64-bit values into CR3 in
* 32-bit mode.
- *
- * We go though the trampoline even if we don't have to: if we're
- * already in a desired paging mode. This way the trampoline code gets
- * tested on every boot.
*/
/* Make sure we have GDT with 32-bit code segment */
@@ -526,8 +522,7 @@ SYM_FUNC_END(.Lrelocated)
*
* Return address is at the top of the stack (might be above 4G).
* The first argument (EDI) contains the 32-bit addressable base of the
- * trampoline memory. A non-zero second argument (ESI) means that the
- * trampoline needs to enable 5-level paging.
+ * trampoline memory.
*/
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
@@ -579,25 +574,10 @@ SYM_CODE_START(trampoline_32bit_src)
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Check what paging mode we want to be in after the trampoline */
- testl %esi, %esi
- jz 1f
-
- /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jnz 3f
- jmp 2f
-1:
- /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jz 3f
-2:
/* Point CR3 to the trampoline's new top level page table */
leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
movl %eax, %cr3
-3:
+
/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
movl $MSR_EFER, %ecx
rdmsr
@@ -606,26 +586,9 @@ SYM_CODE_START(trampoline_32bit_src)
jc 1f
wrmsr
1:
-#ifdef CONFIG_X86_MCE
- /*
- * Preserve CR4.MCE if the kernel will enable #MC support.
- * Clearing MCE may fault in some environments (that also force #MC
- * support). Any machine check that occurs before #MC support is fully
- * configured will crash the system regardless of the CR4.MCE value set
- * here.
- */
+ /* Toggle CR4.LA57 */
movl %cr4, %eax
- andl $X86_CR4_MCE, %eax
-#else
- movl $0, %eax
-#endif
-
- /* Enable PAE and LA57 (if required) paging modes */
- orl $X86_CR4_PAE, %eax
- testl %esi, %esi
- jz 1f
- orl $X86_CR4_LA57, %eax
-1:
+ btcl $X86_CR4_LA57_BIT, %eax
movl %eax, %cr4
/* Enable paging again. */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index f9cc86b2ee55ca80..4213473ae54883c8 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -103,7 +103,7 @@ static unsigned long find_trampoline_placement(void)
asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+ void (*toggle_la57)(void *trampoline);
bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -133,6 +133,13 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
ptrs_per_p4d = 512;
}
+ /*
+ * The trampoline will not be used if the paging mode is already set to
+ * the desired one.
+ */
+ if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+ return;
+
trampoline_32bit = (unsigned long *)find_trampoline_placement();
/* Preserve trampoline memory */
@@ -160,18 +167,8 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
*
* The new page table will be used by trampoline code for switching
* from 4- to 5-level paging or vice versa.
- *
- * If switching is not required, the page table is unused: trampoline
- * code wouldn't touch CR3.
*/
- /*
- * We are not going to use the page table in trampoline memory if we
- * are already in the desired paging mode.
- */
- if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
- goto out;
-
if (l5_required) {
/*
* For 4- to 5-level paging transition, set up current CR3 as
@@ -194,8 +191,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
(void *)src, PAGE_SIZE);
}
-out:
- toggle_la57(trampoline_32bit, l5_required);
+ toggle_la57(trampoline_32bit);
}
void cleanup_trampoline(void *pgtable)
--
2.39.2
The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and it does a few
things that are becoming problematic in the context of EFI boot, now
that the logo requirements are getting tighter: EFI executables will no
longer be allowed to consist of a single executable section that is
mapped with read, write and execute permissions if they are intended for
use in a context where Secure Boot is enabled (and where Microsoft's set
of certificates is used, i.e., on every x86 PC built to run Windows).
To avoid stepping on reserved memory before having inspected the E820
tables, and to ensure the correct placement when running a kernel build
that is non-relocatable, the bare metal decompressor moves its own
executable image to the end of the allocation that was reserved for it,
in order to perform the decompression in place. This means the region in
question requires both write and execute permissions, which either need
to be given upfront (which EFI will no longer permit), or need to be
applied on demand using the existing page fault handling framework.
However, the physical placement of the kernel is usually randomized
anyway, and even if it isn't, a dedicated decompression output buffer
can be allocated anywhere in memory using EFI APIs when still running in
the boot services, given that EFI support already implies a relocatable
kernel. This means that decompression in place is never necessary, nor
is moving the compressed image from one end to the other.
Since EFI already maps all of memory 1:1, it is also unnecessary to
create new page tables or handle page faults when decompressing the
kernel. That means there is also no need to replace the special
exception handlers for SEV. Generally, there is little need to do
any of the things that the decompressor does beyond
- initialize SEV encryption, if needed;
- perform the 4/5 level paging switch, if needed;
- decompress the kernel;
- relocate the kernel.
So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.
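As a rough sketch (using the function names introduced or reused by the
patch below, with error handling and most setup steps omitted), the EFI
stub boot flow then reduces to:

    status = efi_decompress_kernel(&kernel_entry); /* alloc + decompress */
    /* ... initrd, setup_efi_pci(), setup_quirks(), ... */
    status = exit_boot(boot_params, handle);       /* ExitBootServices() */
    sev_enable(boot_params);                       /* SEV init, if needed */
    efi_5level_switch();                           /* 4/5 level switch */
    enter_kernel(kernel_entry, boot_params);       /* never returns */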
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/Makefile | 5 +
arch/x86/boot/compressed/efi_mixed.S | 55 -------
arch/x86/boot/compressed/head_32.S | 13 --
arch/x86/boot/compressed/head_64.S | 27 ----
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 2 +
drivers/firmware/efi/libstub/x86-stub.c | 166 +++++++++-----------
7 files changed, 84 insertions(+), 191 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 40d2ff503079e594..71fc531b95b4eede 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -74,6 +74,11 @@ LDFLAGS_vmlinux += -z noexecstack
ifeq ($(CONFIG_LD_IS_BFD),y)
LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
endif
+ifeq ($(CONFIG_EFI_STUB),y)
+# ensure that the static EFI stub library will be pulled in, even if it is
+# never referenced explicitly from the startup code
+LDFLAGS_vmlinux += -u efi_pe_entry
+endif
LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 8a02a151806df14c..f4e22ef774ab6b4a 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -269,10 +269,6 @@ SYM_FUNC_START_LOCAL(efi32_entry)
jmp startup_32
SYM_FUNC_END(efi32_entry)
-#define ST32_boottime 60 // offsetof(efi_system_table_32_t, boottime)
-#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
-#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)
-
/*
* efi_status_t efi32_pe_entry(efi_handle_t image_handle,
* efi_system_table_32_t *sys_table)
@@ -280,8 +276,6 @@ SYM_FUNC_END(efi32_entry)
SYM_FUNC_START(efi32_pe_entry)
pushl %ebp
movl %esp, %ebp
- pushl %eax // dummy push to allocate loaded_image
-
pushl %ebx // save callee-save registers
pushl %edi
@@ -290,48 +284,8 @@ SYM_FUNC_START(efi32_pe_entry)
movl $0x80000003, %eax // EFI_UNSUPPORTED
jnz 2f
- call 1f
-1: pop %ebx
-
- /* Get the loaded image protocol pointer from the image handle */
- leal -4(%ebp), %eax
- pushl %eax // &loaded_image
- leal (loaded_image_proto - 1b)(%ebx), %eax
- pushl %eax // pass the GUID address
- pushl 8(%ebp) // pass the image handle
-
- /*
- * Note the alignment of the stack frame.
- * sys_table
- * handle <-- 16-byte aligned on entry by ABI
- * return address
- * frame pointer
- * loaded_image <-- local variable
- * saved %ebx <-- 16-byte aligned here
- * saved %edi
- * &loaded_image
- * &loaded_image_proto
- * handle <-- 16-byte aligned for call to handle_protocol
- */
-
- movl 12(%ebp), %eax // sys_table
- movl ST32_boottime(%eax), %eax // sys_table->boottime
- call *BS32_handle_protocol(%eax) // sys_table->boottime->handle_protocol
- addl $12, %esp // restore argument space
- testl %eax, %eax
- jnz 2f
-
movl 8(%ebp), %ecx // image_handle
movl 12(%ebp), %edx // sys_table
- movl -4(%ebp), %esi // loaded_image
- movl LI32_image_base(%esi), %esi // loaded_image->image_base
- leal (startup_32 - 1b)(%ebx), %ebp // runtime address of startup_32
- /*
- * We need to set the image_offset variable here since startup_32() will
- * use it before we get to the 64-bit efi_pe_entry() in C code.
- */
- subl %esi, %ebp // calculate image_offset
- movl %ebp, (image_offset - 1b)(%ebx) // save image_offset
xorl %esi, %esi
jmp efi32_entry // pass %ecx, %edx, %esi
// no other registers remain live
@@ -350,15 +304,6 @@ SYM_FUNC_START_NOALIGN(efi64_stub_entry)
SYM_FUNC_END(efi64_stub_entry)
#endif
- .section ".rodata"
- /* EFI loaded image protocol GUID */
- .balign 4
-SYM_DATA_START_LOCAL(loaded_image_proto)
- .long 0x5b1b31a1
- .word 0x9562, 0x11d2
- .byte 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b
-SYM_DATA_END(loaded_image_proto)
-
.data
.balign 8
SYM_DATA_START_LOCAL(efi32_boot_gdt)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3af4a383615b3e1f..1cfe9802a42fe570 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -84,19 +84,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
leal startup_32@GOTOFF(%edx), %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32() will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry() will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl image_offset@GOTOFF(%edx), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 28f46051c706724e..bf4a10a5794f1c84 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -146,19 +146,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl rva(image_offset)(%ebp), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
@@ -335,20 +322,6 @@ SYM_CODE_START(startup_64)
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- movl image_offset(%rip), %eax
- subq %rax, %rbp
-#endif
-
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8b4be7cecdb8eb73..b0994ae3bc23f84d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,6 +90,8 @@ static inline void efi_fpu_end(void)
}
#ifdef CONFIG_X86_32
+#define EFI_X86_KERNEL_ALLOC_LIMIT (SZ_512M - 1)
+
#define arch_efi_call_virt_setup() \
({ \
efi_fpu_begin(); \
@@ -103,8 +105,7 @@ static inline void efi_fpu_end(void)
})
#else /* !CONFIG_X86_32 */
-
-#define EFI_LOADER_SIGNATURE "EL64"
+#define EFI_X86_KERNEL_ALLOC_LIMIT EFI_ALLOC_LIMIT
extern asmlinkage u64 __efi_call(void *fp, ...);
@@ -218,6 +219,8 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,
#ifdef CONFIG_EFI_MIXED
+#define EFI_ALLOC_LIMIT (efi_is_64bit() ? ULONG_MAX : U32_MAX)
+
#define ARCH_HAS_EFISTUB_WRAPPERS
static inline bool efi_is_64bit(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index b97d239e18ea25fc..5b4a1ce3d36808b4 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -164,6 +164,7 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern void sev_enable(struct boot_params *bp);
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs)
{
@@ -218,6 +219,7 @@ static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline void sev_enable(struct boot_params *bp) { }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
static inline void setup_ghcb(void) { }
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index b4685da2b8d5c243..e976288728e975f1 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,17 +15,14 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/kaslr.h>
#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
-/* Maximum physical address for 64-bit kernel with 4-level paging */
-#define MAXMEM_X86_64_4LEVEL (1ull << 46)
-
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
-u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
static efi_memory_attribute_protocol_t *memattr;
@@ -287,28 +284,6 @@ void efi_adjust_memory_range_protection(unsigned long start,
}
}
-extern const u8 startup_32[], startup_64[];
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
-#ifdef CONFIG_64BIT
- if (image_base != (unsigned long)startup_32)
- efi_adjust_memory_range_protection(image_base, image_size);
-#else
- /*
- * Clear protection flags on a whole range of possible
- * addresses used for KASLR. We don't need to do that
- * on x86_64, since KASLR/extraction is performed after
- * dedicated identity page tables are built and we only
- * need to remove possible protection on relocated image
- * itself disregarding further relocations.
- */
- efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
-#endif
-}
-
static void setup_unaccepted_memory(void)
{
efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID;
@@ -334,9 +309,7 @@ static void setup_unaccepted_memory(void)
static const efi_char16_t apple[] = L"Apple";
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -345,9 +318,6 @@ static void setup_quirks(struct boot_params *boot_params,
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
}
/*
@@ -500,7 +470,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
}
image_base = efi_table_attr(image, image_base);
- image_offset = (void *)startup_32 - image_base;
status = efi_allocate_pages(sizeof(struct boot_params),
(unsigned long *)&boot_params, ULONG_MAX);
@@ -804,6 +773,61 @@ static bool have_unsupported_snp_features(void)
return false;
}
+static void efi_get_seed(void *seed, int size)
+{
+ efi_get_random_bytes(size, seed);
+
+ /*
+ * This only updates seed[0] when running on 32-bit, but in that case,
+ * seed[1] is not used anyway, as there is no virtual KASLR on 32-bit.
+ */
+ *(unsigned long *)seed ^= kaslr_get_random_long("EFI");
+}
+
+static void error(char *str)
+{
+ efi_warn("Decompression failed: %s\n", str);
+}
+
+static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
+{
+ unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ unsigned long addr, alloc_size, entry;
+ efi_status_t status;
+ u32 seed[2] = {};
+
+ /* determine the required size of the allocation */
+ alloc_size = ALIGN(max_t(unsigned long, output_len, kernel_total_size),
+ MIN_KERNEL_ALIGN);
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
+ u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size;
+
+ efi_get_seed(seed, sizeof(seed));
+
+ virt_addr += (range * seed[1]) >> 32;
+ virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+ }
+
+ status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
+ seed[0], EFI_LOADER_CODE,
+ EFI_X86_KERNEL_ALLOC_LIMIT);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ entry = decompress_kernel((void *)addr, virt_addr, error);
+ if (entry == ULONG_MAX) {
+ efi_free(alloc_size, addr);
+ return EFI_LOAD_ERROR;
+ }
+
+ *kernel_entry = addr + entry;
+
+ efi_adjust_memory_range_protection(addr, kernel_total_size);
+
+ return EFI_SUCCESS;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -823,10 +847,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
struct boot_params *boot_params)
{
efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
- unsigned long bzimage_addr = (unsigned long)startup_32;
- unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
const struct linux_efi_initrd *initrd = NULL;
+ unsigned long kernel_entry;
efi_status_t status;
efi_system_table = sys_table_arg;
@@ -855,60 +878,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
- /*
- * If the kernel isn't already loaded at a suitable address,
- * relocate it.
- *
- * It must be loaded above LOAD_PHYSICAL_ADDR.
- *
- * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
- * is defined as the macro MAXMEM, but unfortunately that is not a
- * compile-time constant if 5-level paging is configured, so we instead
- * define our own macro for use here.
- *
- * For 32-bit, the maximum address is complicated to figure out, for
- * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
- * KASLR uses.
- *
- * Also relocate it if image_offset is zero, i.e. the kernel wasn't
- * loaded by LoadImage, but rather by a bootloader that called the
- * handover entry. The reason we must always relocate in this case is
- * to handle the case of systemd-boot booting a unified kernel image,
- * which is a PE executable that contains the bzImage and an initrd as
- * COFF sections. The initrd section is placed after the bzImage
- * without ensuring that there are at least init_size bytes available
- * for the bzImage, and thus the compressed kernel's startup code may
- * overwrite the initrd unless it is moved out of the way.
- */
-
- buffer_start = ALIGN(bzimage_addr - image_offset,
- hdr->kernel_alignment);
- buffer_end = buffer_start + hdr->init_size;
-
- if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
- (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
- (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0)) {
- extern char _bss[];
-
- status = efi_relocate_kernel(&bzimage_addr,
- (unsigned long)_bss - bzimage_addr,
- hdr->init_size,
- hdr->pref_address,
- hdr->kernel_alignment,
- LOAD_PHYSICAL_ADDR);
- if (status != EFI_SUCCESS) {
- efi_err("efi_relocate_kernel() failed!\n");
- goto fail;
- }
- /*
- * Now that we've copied the kernel elsewhere, we no longer
- * have a set up block before startup_32(), so reset image_offset
- * to zero in case it was set earlier.
- */
- image_offset = 0;
- }
-
#ifdef CONFIG_CMDLINE_BOOL
status = efi_parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
@@ -926,6 +895,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
+ status = efi_decompress_kernel(&kernel_entry);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to decompress kernel\n");
+ goto fail;
+ }
+
/*
* At this point, an initrd may already have been loaded by the
* bootloader and passed via bootparams. We permit an initrd loaded
@@ -965,7 +940,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
setup_efi_pci(boot_params);
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+ setup_quirks(boot_params);
setup_unaccepted_memory();
@@ -975,12 +950,15 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
+ /*
+ * Call the SEV init code while still running with the firmware's
+ * GDT/IDT, so #VC exceptions will be handled by EFI.
+ */
+ sev_enable(boot_params);
+
efi_5level_switch();
- if (IS_ENABLED(CONFIG_X86_64))
- bzimage_addr += startup_64 - startup_32;
-
- enter_kernel(bzimage_addr, boot_params);
+ enter_kernel(kernel_entry, boot_params);
fail:
efi_err("efi_stub_entry() failed!\n");
--
2.39.2
Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.
The SNP feature check can be performed while running under the EFI boot
services, which means it can force the boot to fail gracefully and
return an error to the bootloader if the loaded kernel does not
implement support for all the features that the hypervisor enabled.
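The check itself amounts to a single mask operation on the SEV status
MSR; a condensed sketch, combining the helpers added by the patch below:

    /* Any feature the hypervisor enabled but the kernel lacks? */
    u64 unsupported = snp_get_unsupported_features(sev_get_status());

    if (unsupported) {
            efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
                    unsupported);
            efi_exit(handle, EFI_UNSUPPORTED);
    }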
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/sev.c | 112 ++++++++++++--------
arch/x86/include/asm/sev.h | 4 +
drivers/firmware/efi/libstub/x86-stub.c | 17 +++
3 files changed, 87 insertions(+), 46 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c3e343bd4760e0ab..199155b8af3bc535 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -367,20 +367,25 @@ static void enforce_vmpl0(void)
*/
#define SNP_FEATURES_PRESENT (0)
+u64 snp_get_unsupported_features(u64 status)
+{
+ if (!(status & MSR_AMD64_SEV_SNP_ENABLED))
+ return 0;
+
+ return status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+}
+
void snp_check_features(void)
{
u64 unsupported;
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
- return;
-
/*
* Terminate the boot if hypervisor has enabled any feature lacking
* guest side implementation. Pass on the unsupported features mask through
* EXIT_INFO_2 of the GHCB protocol so that those features can be reported
* as part of the guest boot failure.
*/
- unsupported = sev_status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+ unsupported = snp_get_unsupported_features(sev_status);
if (unsupported) {
if (ghcb_version < 2 || (!boot_ghcb && !early_setup_ghcb()))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
@@ -390,10 +395,45 @@ void snp_check_features(void)
}
}
-void sev_enable(struct boot_params *bp)
+/*
+ * sev_check_cpu_support - Check for SEV support in the CPU capabilities
+ *
+ * Returns < 0 if SEV is not supported, otherwise the position of the
+ * encryption bit in the page table descriptors.
+ */
+static int sev_check_cpu_support(void)
{
unsigned int eax, ebx, ecx, edx;
+
+ /* Check for the SME/SEV support leaf */
+ eax = 0x80000000;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (eax < 0x8000001f)
+ return -ENODEV;
+
+ /*
+ * Check for the SME/SEV feature:
+ * CPUID Fn8000_001F[EAX]
+ * - Bit 0 - Secure Memory Encryption support
+ * - Bit 1 - Secure Encrypted Virtualization support
+ * CPUID Fn8000_001F[EBX]
+ * - Bits 5:0 - Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ /* Check whether SEV is supported */
+ if (!(eax & BIT(1)))
+ return -ENODEV;
+
+ return ebx & 0x3f;
+}
+
+void sev_enable(struct boot_params *bp)
+{
struct msr m;
+ int bitpos;
bool snp;
/*
@@ -413,26 +453,7 @@ void sev_enable(struct boot_params *bp)
* which is good enough.
*/
- /* Check for the SME/SEV support leaf */
- eax = 0x80000000;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- if (eax < 0x8000001f)
- return;
-
- /*
- * Check for the SME/SEV feature:
- * CPUID Fn8000_001F[EAX]
- * - Bit 0 - Secure Memory Encryption support
- * - Bit 1 - Secure Encrypted Virtualization support
- * CPUID Fn8000_001F[EBX]
- * - Bits 5:0 - Pagetable bit position used to indicate encryption
- */
- eax = 0x8000001f;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- /* Check whether SEV is supported */
- if (!(eax & BIT(1)))
+ if (sev_check_cpu_support() < 0)
return;
/*
@@ -443,26 +464,8 @@ void sev_enable(struct boot_params *bp)
/* Now repeat the checks with the SNP CPUID table. */
- /* Recheck the SME/SEV support leaf */
- eax = 0x80000000;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- if (eax < 0x8000001f)
- return;
-
- /*
- * Recheck for the SME/SEV feature:
- * CPUID Fn8000_001F[EAX]
- * - Bit 0 - Secure Memory Encryption support
- * - Bit 1 - Secure Encrypted Virtualization support
- * CPUID Fn8000_001F[EBX]
- * - Bits 5:0 - Pagetable bit position used to indicate encryption
- */
- eax = 0x8000001f;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- /* Check whether SEV is supported */
- if (!(eax & BIT(1))) {
+ bitpos = sev_check_cpu_support();
+ if (bitpos < 0) {
if (snp)
error("SEV-SNP support indicated by CC blob, but not CPUID.");
return;
@@ -494,7 +497,24 @@ void sev_enable(struct boot_params *bp)
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
- sme_me_mask = BIT_ULL(ebx & 0x3f);
+ sme_me_mask = BIT_ULL(bitpos);
+}
+
+/*
+ * sev_get_status - Retrieve the SEV status mask
+ *
+ * Returns 0 if the CPU is not SEV capable, otherwise the value of the
+ * AMD64_SEV MSR.
+ */
+u64 sev_get_status(void)
+{
+ struct msr m;
+
+ if (sev_check_cpu_support() < 0)
+ return 0;
+
+ boot_rdmsr(MSR_AMD64_SEV, &m);
+ return m.q;
}
/* Search for Confidential Computing blob in the EFI config table. */
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 66c806784c5256bd..b97d239e18ea25fc 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -210,6 +210,8 @@ bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+u64 snp_get_unsupported_features(u64 status);
+u64 sev_get_status(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -235,6 +237,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
}
static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
+static inline u64 sev_get_status(void) { return 0; }
#endif
#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index acb1c65bf8ac6fb3..b4685da2b8d5c243 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
@@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static bool have_unsupported_snp_features(void)
+{
+ u64 unsupported;
+
+ unsupported = snp_get_unsupported_features(sev_get_status());
+ if (unsupported) {
+ efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
+ unsupported);
+ return true;
+ }
+ return false;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
+ if (have_unsupported_snp_features())
+ efi_exit(handle, EFI_UNSUPPORTED);
+
if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
if (efi_dxe_table &&
--
2.39.2
The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode, and preserving the caller's
stack pointer value. Currently, the trampoline stack is placed in the
same page that carries the trampoline code, which means this page must
be mapped writable and executable, and the stack is therefore executable
as well.
Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a 32-bit addressable stack entirely; a later patch
will take advantage of this by removing writable permissions from (and
adding executable permissions to) the trampoline code page when booting
via the EFI stub.
Note that the value of RSP still needs to be preserved explicitly across
the switch into 32-bit mode, as the register may get truncated to 32
bits.
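Since LJMP takes an absolute immediate, the destination can only be
fixed up once the runtime placement of the trampoline is known; a
minimal sketch of the fixup (mirroring the pgtable_64.c hunk below,
where 'dst' stands for the 32-bit addressable buffer):

    /* Copy the trampoline template, then patch the LJMP immediate. */
    void *tramp_code = memcpy(dst, &trampoline_32bit_src,
                              TRAMPOLINE_32BIT_CODE_SIZE);

    *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) +=
                    (unsigned long)tramp_code;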
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 45 ++++++++++++--------
arch/x86/boot/compressed/pgtable.h | 4 +-
arch/x86/boot/compressed/pgtable_64.c | 12 +++++-
3 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c47504208105d7d3..37fd7b7d683d696c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -540,6 +540,7 @@ SYM_FUNC_END(.Lrelocated)
* trampoline memory. A non-zero second argument (ESI) means that the
* trampoline needs to enable 5-level paging.
*/
+ .section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
/*
* Preserve live 64-bit registers on the stack: this is necessary
@@ -550,13 +551,9 @@ SYM_CODE_START(trampoline_32bit_src)
pushq %rbp
pushq %rbx
- /* Set up 32-bit addressable stack and push the old RSP value */
- leaq (TRAMPOLINE_32BIT_STACK_END - 8)(%rcx), %rbx
- movq %rsp, (%rbx)
- movq %rbx, %rsp
-
- /* Take the address of the trampoline exit code */
- leaq .Lret(%rip), %rbx
+ /* Preserve top half of RSP in a legacy mode GPR to avoid truncation */
+ movq %rsp, %rbx
+ shrq $32, %rbx
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
@@ -564,9 +561,17 @@ SYM_CODE_START(trampoline_32bit_src)
pushq %rax
lretq
+ /*
+ * The 32-bit code below will do a far jump back to long mode and end
+ * up here after reconfiguring the number of paging levels. First, the
+ * stack pointer needs to be restored to its full 64-bit value before
+ * the callee save register contents can be popped from the stack.
+ */
.Lret:
+ shlq $32, %rbx
+ orq %rbx, %rsp
+
/* Restore the preserved 64-bit registers */
- movq (%rsp), %rsp
popq %rbx
popq %rbp
popq %r15
@@ -574,11 +579,6 @@ SYM_CODE_START(trampoline_32bit_src)
.code32
0:
- /* Set up data and stack segments */
- movl $__KERNEL_DS, %eax
- movl %eax, %ds
- movl %eax, %ss
-
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
@@ -633,18 +633,26 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Prepare the stack for far return to Long Mode */
- pushl $__KERNEL_CS
- pushl %ebx
-
/* Enable paging again. */
movl %cr0, %eax
btsl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- lret
+ /*
+ * Return to the 64-bit calling code using LJMP rather than LRET, to
+ * avoid the need for a 32-bit addressable stack. The destination
+ * address will be adjusted after the template code is copied into a
+ * 32-bit addressable buffer.
+ */
+.Ljmp: ljmpl $__KERNEL_CS, $(.Lret - trampoline_32bit_src)
SYM_CODE_END(trampoline_32bit_src)
+/*
+ * This symbol is placed right after trampoline_32bit_src() so its address can
+ * be used to infer the size of the trampoline code.
+ */
+SYM_DATA(trampoline_ljmp_imm_offset, .word .Ljmp + 1 - trampoline_32bit_src)
+
/*
* The trampoline code has a size limit.
* Make sure we fail to compile if the trampoline code grows
@@ -652,6 +660,7 @@ SYM_CODE_END(trampoline_32bit_src)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+ .text
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 4e8cef135226bcbb..c6b0903aded05a07 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -8,13 +8,13 @@
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
-#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
-
#ifndef __ASSEMBLER__
extern unsigned long *trampoline_32bit;
extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
+extern const u16 trampoline_ljmp_imm_offset;
+
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index f8092d3244c9559b..5198a05aefa8d14a 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -109,6 +109,7 @@ static unsigned long find_trampoline_placement(void)
struct paging_config paging_prepare(void *rmode)
{
struct paging_config paging_config = {};
+ void *tramp_code;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
boot_params = rmode;
@@ -148,9 +149,18 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ tramp_code = memcpy(trampoline_32bit +
+ TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
+ /*
+ * Avoid the need for a stack in the 32-bit trampoline code, by using
+ * LJMP rather than LRET to return back to long mode. LJMP takes an
+ * immediate absolute address, which needs to be adjusted based on the
+ * placement of the trampoline.
+ */
+ *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+
/*
* The code below prepares page table in trampoline memory.
*
--
2.39.2
Move the long return used to switch to 32-bit mode into the trampoline
code itself, so that the trampoline can be called as an ordinary
function. This will allow it to be called directly from C code in a
subsequent patch.
While at it, reorganize the code somewhat to keep the prologue and
epilogue of the function together, making the code a bit easier to
follow. Also, given that the trampoline is now entered in 64-bit mode, a
simple RIP-relative reference can be used to take the address of the
exit point.
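Seen from the 64-bit side, the copied trampoline now behaves like a
plain function; a sketch of how a C caller could invoke it through the
prototype declared in pgtable.h (in this patch, the call is still made
from assembly via 'call *%rax'):

    /* The code lives TRAMPOLINE_32BIT_CODE_OFFSET bytes into the
     * trampoline allocation. */
    void (*trampoline)(void *tramp, bool enable_5lvl) =
            (void *)(trampoline_32bit +
                     TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long));

    trampoline(trampoline_32bit, l5_required);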
Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 79 +++++++++-----------
arch/x86/boot/compressed/pgtable.h | 2 +-
2 files changed, 36 insertions(+), 45 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 19bf810409e2aa62..91b5eee306148f9a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -447,39 +447,8 @@ SYM_CODE_START(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Set up 32-bit addressable stack */
- leaq TRAMPOLINE_32BIT_STACK_END(%rcx), %rsp
-
- /*
- * Preserve live 64-bit registers on the stack: this is necessary
- * because the architecture does not guarantee that GPRs will retain
- * their full 64-bit values across a 32-bit mode switch.
- */
- pushq %r15
- pushq %rbp
- pushq %rbx
-
- /*
- * Push the 64-bit address of trampoline_return() onto the new stack.
- * It will be used by the trampoline to return to the main code. Due to
- * the 32-bit mode switch, it cannot be kept in a register either.
- */
- leaq trampoline_return(%rip), %rdi
- pushq %rdi
-
- /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
- pushq $__KERNEL32_CS
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
- pushq %rax
- lretq
-trampoline_return:
- /* Restore live 64-bit registers */
- popq %rbx
- popq %rbp
- popq %r15
-
- /* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq rva(boot_stack_end)(%rbx), %rsp
+ call *%rax
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -561,7 +530,6 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
jmp *%rax
SYM_FUNC_END(.Lrelocated)
- .code32
/*
* This is the 32-bit trampoline that will be copied over to low memory.
*
@@ -570,6 +538,39 @@ SYM_FUNC_END(.Lrelocated)
* Non zero RDX means trampoline needs to enable 5-level paging.
*/
SYM_CODE_START(trampoline_32bit_src)
+ /*
+ * Preserve live 64-bit registers on the stack: this is necessary
+ * because the architecture does not guarantee that GPRs will retain
+ * their full 64-bit values across a 32-bit mode switch.
+ */
+ pushq %r15
+ pushq %rbp
+ pushq %rbx
+
+ /* Set up 32-bit addressable stack and push the old RSP value */
+ leaq (TRAMPOLINE_32BIT_STACK_END - 8)(%rcx), %rbx
+ movq %rsp, (%rbx)
+ movq %rbx, %rsp
+
+ /* Take the address of the trampoline exit code */
+ leaq .Lret(%rip), %rbx
+
+ /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
+ pushq $__KERNEL32_CS
+ leaq 0f(%rip), %rax
+ pushq %rax
+ lretq
+
+.Lret:
+ /* Restore the preserved 64-bit registers */
+ movq (%rsp), %rsp
+ popq %rbx
+ popq %rbp
+ popq %r15
+ retq
+
+ .code32
+0:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
@@ -633,12 +634,9 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Calculate address of paging_enabled() once we are executing in the trampoline */
- leal .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
-
/* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- pushl %eax
+ pushl %ebx
/* Enable paging again. */
movl %cr0, %eax
@@ -648,12 +646,6 @@ SYM_CODE_START(trampoline_32bit_src)
lret
SYM_CODE_END(trampoline_32bit_src)
- .code64
-SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
- /* Return from the trampoline */
- retq
-SYM_FUNC_END(.Lpaging_enabled)
-
/*
* The trampoline code has a size limit.
* Make sure we fail to compile if the trampoline code grows
@@ -661,7 +653,6 @@ SYM_FUNC_END(.Lpaging_enabled)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
- .code32
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index cc9b2529a08634b4..91dbb99203fbce2d 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
+#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
--
2.39.2
There is no need to defer the assignment of the paging related global
variables 'pgdir_shift' and 'ptrs_per_p4d' until after the trampoline is
cleaned up, so assign them as soon as it is clear that 5-level paging
will be enabled.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/misc.h | 2 --
arch/x86/boot/compressed/pgtable_64.c | 14 +++++---------
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 964fe903a1cdccdc..cc70d3fb90497ed8 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -179,9 +179,7 @@ static inline int count_immovable_mem_regions(void) { return 0; }
#endif
/* ident_map_64.c */
-#ifdef CONFIG_X86_5LEVEL
extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
-#endif
extern void kernel_add_identity_map(unsigned long start, unsigned long end);
/* Used by PAGE_KERN* macros: */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff4111bf8c0..f8092d3244c9559b 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -130,6 +130,11 @@ struct paging_config paging_prepare(void *rmode)
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
paging_config.l5_required = 1;
+
+ /* Initialize variables for 5-level paging */
+ __pgtable_l5_enabled = 1;
+ pgdir_shift = 48;
+ ptrs_per_p4d = 512;
}
paging_config.trampoline_start = find_trampoline_placement();
@@ -206,13 +211,4 @@ void cleanup_trampoline(void *pgtable)
/* Restore trampoline memory */
memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
-
- /* Initialize variables for 5-level paging */
-#ifdef CONFIG_X86_5LEVEL
- if (__read_cr4() & X86_CR4_LA57) {
- __pgtable_l5_enabled = 1;
- pgdir_shift = 48;
- ptrs_per_p4d = 512;
- }
-#endif
}
--
2.39.2
The ZSTD decompressor requires malloc() allocations to be 8-byte
aligned, so ensure that this is the case.
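The alignment uses the usual round-up-to-a-power-of-two idiom; a
worked example:

    /* Round malloc_ptr up to the next multiple of 8. */
    malloc_ptr = (malloc_ptr + 7) & ~7;

    /* e.g. 0x1005 -> (0x1005 + 7) & ~7 = 0x100c & ~7 = 0x1008 */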
Signed-off-by: Ard Biesheuvel <[email protected]>
---
include/linux/decompress/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
index 9192986b1a731323..ac862422df158bef 100644
--- a/include/linux/decompress/mm.h
+++ b/include/linux/decompress/mm.h
@@ -48,7 +48,7 @@ MALLOC_VISIBLE void *malloc(int size)
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
- malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
+ malloc_ptr = (malloc_ptr + 7) & ~7; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
--
2.39.2
Currently, the EFI stub relies on DXE services in some cases to clear
non-execute restrictions from page allocations that need to be
executable. This is dodgy, because DXE services are not specified by
UEFI but by PI, and they are not intended for consumption by OS loaders.
However, no alternative existed at the time.
Now, there is a new UEFI protocol that should be used instead, so if it
exists, prefer it over the DXE services calls.
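In code, the preference is straightforward: use the protocol whenever it
was located at boot, and only fall back to the DXE services otherwise; a
condensed sketch of the logic in the hunk below:

    /* memattr was located once at boot; it is NULL if unsupported. */
    if (memattr != NULL) {
            efi_call_proto(memattr, clear_memory_attributes, rounded_start,
                           rounded_end - rounded_start, EFI_MEMORY_XP);
            return;
    }

    if (efi_dxe_table == NULL)
            return;         /* no DXE fallback available either */

    /* ... legacy DXE services path ... */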
Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/x86-stub.c | 29 ++++++++++++++------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index af5f50617a5b4c59..acb1c65bf8ac6fb3 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -26,6 +26,7 @@ const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
+static efi_memory_attribute_protocol_t *memattr;
typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t;
union sev_memory_acceptance_protocol {
@@ -233,12 +234,18 @@ void efi_adjust_memory_range_protection(unsigned long start,
unsigned long rounded_start, rounded_end;
unsigned long unprotect_start, unprotect_size;
- if (efi_dxe_table == NULL)
- return;
-
rounded_start = rounddown(start, EFI_PAGE_SIZE);
rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+ if (memattr != NULL) {
+ efi_call_proto(memattr, clear_memory_attributes, rounded_start,
+ rounded_end - rounded_start, EFI_MEMORY_XP);
+ return;
+ }
+
+ if (efi_dxe_table == NULL)
+ return;
+
/*
* Don't modify memory region attributes, they are
* already suitable, to lower the possibility to
@@ -801,6 +808,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
struct boot_params *boot_params)
{
+ efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
@@ -812,13 +820,18 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
- efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
- if (efi_dxe_table &&
- efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
- efi_warn("Ignoring DXE services table: invalid signature\n");
- efi_dxe_table = NULL;
+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+ if (efi_dxe_table &&
+ efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+ efi_warn("Ignoring DXE services table: invalid signature\n");
+ efi_dxe_table = NULL;
+ }
}
+ /* grab the memory attributes protocol if it exists */
+ efi_bs_call(locate_protocol, &guid, NULL, (void **)&memattr);
+
status = efi_setup_5level_paging();
if (status != EFI_SUCCESS) {
efi_err("efi_setup_5level_paging() failed!\n");
--
2.39.2
The native 32-bit or 64-bit EFI handover protocol entrypoint offset,
relative to the respective startup_32/64 address, is described in
boot_params as handover_offset, so that a Linux/x86-aware EFI loader can
find it there.
When mixed mode is enabled, this single field has to describe this
offset for both the 32-bit and 64-bit entrypoints, so their respective
relative offsets have to be identical. Given that startup_32 and
startup_64 are 0x200 bytes apart, and the EFI handover entrypoint
resides at a fixed offset, the 32-bit and 64-bit versions of those
entrypoints must be exactly 0x200 bytes apart as well.
Currently, hard-coded fixed offsets are used to ensure this, but it is
sufficient to emit the 64-bit entrypoint 0x200 bytes after the 32-bit
one, wherever it happens to reside. This allows this code (which is now
EFI mixed mode specific) to be moved into efi_mixed.S and out of the
startup code in head_64.S.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/efi_mixed.S | 20 +++++++++++++++++++-
arch/x86/boot/compressed/head_64.S | 18 ------------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index dcc562c8f7f35162..9308b595f6f0a5de 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -140,6 +140,16 @@ SYM_FUNC_START(__efi64_thunk)
SYM_FUNC_END(__efi64_thunk)
.code32
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+SYM_FUNC_START(efi32_stub_entry)
+ add $0x4, %esp /* Discard return address */
+ popl %ecx
+ popl %edx
+ popl %esi
+ jmp efi32_entry
+SYM_FUNC_END(efi32_stub_entry)
+#endif
+
/*
* EFI service pointer must be in %edi.
*
@@ -220,7 +230,7 @@ SYM_FUNC_END(efi_enter32)
* stub may still exit and return to the firmware using the Exit() EFI boot
* service.]
*/
-SYM_FUNC_START(efi32_entry)
+SYM_FUNC_START_LOCAL(efi32_entry)
call 1f
1: pop %ebx
@@ -320,6 +330,14 @@ SYM_FUNC_START(efi32_pe_entry)
RET
SYM_FUNC_END(efi32_pe_entry)
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+ .org efi32_stub_entry + 0x200
+ .code64
+SYM_FUNC_START_NOALIGN(efi64_stub_entry)
+ jmp efi_stub_entry
+SYM_FUNC_END(efi64_stub_entry)
+#endif
+
.section ".rodata"
/* EFI loaded image protocol GUID */
.balign 4
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index e6880208b9a1b90a..a3f764daf3a3adbc 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -294,17 +294,6 @@ SYM_FUNC_START(startup_32)
lret
SYM_FUNC_END(startup_32)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x190
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp /* Discard return address */
- popl %ecx
- popl %edx
- popl %esi
- jmp efi32_entry
-SYM_FUNC_END(efi32_stub_entry)
-#endif
-
.code64
.org 0x200
SYM_CODE_START(startup_64)
@@ -542,13 +531,6 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x390
-SYM_FUNC_START(efi64_stub_entry)
- jmp efi_stub_entry
-SYM_FUNC_END(efi64_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
--
2.39.2
Instead of pushing and popping %RSI several times to preserve the struct
boot_params pointer across the execution of the startup code, move it
into a callee-saved register before the first call into C, and copy it
back when needed.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 42 ++++++++------------
1 file changed, 16 insertions(+), 26 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a3f764daf3a3adbc..19bf810409e2aa62 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -405,10 +405,14 @@ SYM_CODE_START(startup_64)
lretq
.Lon_kernel_cs:
+ /*
+ * RSI holds a pointer to a boot_params structure provided by the
+ * loader, and this needs to be preserved across C function calls. So
+ * move it into a callee saved register.
+ */
+ movq %rsi, %r15
- pushq %rsi
call load_stage1_idt
- popq %rsi
#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
@@ -419,12 +423,10 @@ SYM_CODE_START(startup_64)
* CPUID instructions being issued, so go ahead and do that now via
* sev_enable(), which will also handle the rest of the SEV-related
* detection/setup to ensure that has been done in advance of any dependent
- * code.
+ * code. Pass the boot_params pointer as the first argument.
*/
- pushq %rsi
- movq %rsi, %rdi /* real mode address */
+ movq %r15, %rdi
call sev_enable
- popq %rsi
#endif
/*
@@ -437,13 +439,10 @@ SYM_CODE_START(startup_64)
* - Non zero RDX means trampoline needs to enable 5-level
* paging.
*
- * RSI holds real mode data and needs to be preserved across
- * this function call.
+ * Pass the boot_params pointer as the first argument.
*/
- pushq %rsi
- movq %rsi, %rdi /* real mode address */
+ movq %r15, %rdi
call paging_prepare
- popq %rsi
/* Save the trampoline address in RCX */
movq %rax, %rcx
@@ -456,9 +455,9 @@ SYM_CODE_START(startup_64)
* because the architecture does not guarantee that GPRs will retain
* their full 64-bit values across a 32-bit mode switch.
*/
+ pushq %r15
pushq %rbp
pushq %rbx
- pushq %rsi
/*
* Push the 64-bit address of trampoline_return() onto the new stack.
@@ -475,9 +474,9 @@ SYM_CODE_START(startup_64)
lretq
trampoline_return:
/* Restore live 64-bit registers */
- popq %rsi
popq %rbx
popq %rbp
+ popq %r15
/* Restore the stack, the 32-bit trampoline uses its own stack */
leaq rva(boot_stack_end)(%rbx), %rsp
@@ -487,14 +486,9 @@ trampoline_return:
*
* RDI is address of the page table to use instead of page table
* in trampoline memory (if required).
- *
- * RSI holds real mode data and needs to be preserved across
- * this function call.
*/
- pushq %rsi
leaq rva(top_pgtable)(%rbx), %rdi
call cleanup_trampoline
- popq %rsi
/* Zero EFLAGS */
pushq $0
@@ -504,7 +498,6 @@ trampoline_return:
* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
*/
- pushq %rsi
leaq (_bss-8)(%rip), %rsi
leaq rva(_bss-8)(%rbx), %rdi
movl $(_bss - startup_32), %ecx
@@ -512,7 +505,6 @@ trampoline_return:
std
rep movsq
cld
- popq %rsi
/*
* The GDT may get overwritten either during the copy we just did or
@@ -544,30 +536,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq $3, %rcx
rep stosq
- pushq %rsi
call load_stage2_idt
/* Pass boot_params to initialize_identity_maps() */
- movq (%rsp), %rdi
+ movq %r15, %rdi
call initialize_identity_maps
- popq %rsi
/*
* Do the extraction, and jump to the new kernel..
*/
- pushq %rsi /* Save the real mode argument */
- movq %rsi, %rdi /* real mode address */
+ /* pass struct boot_params pointer */
+ movq %r15, %rdi
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
leaq input_data(%rip), %rdx /* input_data */
movl input_len(%rip), %ecx /* input_len */
movq %rbp, %r8 /* output target address */
movl output_len(%rip), %r9d /* decompressed length, end of relocs */
call extract_kernel /* returns kernel entry point in %rax */
- popq %rsi
/*
* Jump to the decompressed kernel.
*/
+ movq %r15, %rsi
jmp *%rax
SYM_FUNC_END(.Lrelocated)
--
2.39.2
Factor out the sequence that invokes the decompressor, parses the ELF
header and applies the relocations, so that it can be called directly
from the EFI stub.
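A sketch of how the EFI stub consumes the new helper (as in the
x86-stub.c changes earlier in the series):

    /* Decompress, parse the ELF and relocate in one call; the helper
     * returns the entry point offset, or ULONG_MAX on failure. */
    entry = decompress_kernel((void *)addr, virt_addr, error);
    if (entry == ULONG_MAX) {
            efi_free(alloc_size, addr);
            return EFI_LOAD_ERROR;
    }

    *kernel_entry = addr + entry;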
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/misc.c | 28 ++++++++++++++++----
arch/x86/include/asm/boot.h | 8 ++++++
2 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 2d91d56b59e1af93..f711f2a85862e9ef 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,11 +330,33 @@ static size_t parse_elf(void *output)
return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
}
+const unsigned long kernel_total_size = VO__end - VO__text;
+
static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
extern unsigned char input_data[];
extern unsigned int input_len, output_len;
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+ void (*error)(char *x))
+{
+ unsigned long entry;
+
+ if (!free_mem_ptr) {
+ free_mem_ptr = (unsigned long)boot_heap;
+ free_mem_end_ptr = (unsigned long)boot_heap + sizeof(boot_heap);
+ }
+
+ if (__decompress(input_data, input_len, NULL, NULL, outbuf, output_len,
+ NULL, error) < 0)
+ return ULONG_MAX;
+
+ entry = parse_elf(outbuf);
+ handle_relocations(outbuf, output_len, virt_addr);
+
+ return entry;
+}
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -354,7 +376,6 @@ extern unsigned int input_len, output_len;
*/
asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
{
- const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
memptr heap = (memptr)boot_heap;
unsigned long needed_size;
@@ -463,10 +484,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
accept_memory(__pa(output), __pa(output) + needed_size);
}
- __decompress(input_data, input_len, NULL, NULL, output, output_len,
- NULL, error);
- entry_offset = parse_elf(output);
- handle_relocations(output, output_len, virt_addr);
+ entry_offset = decompress_kernel(output, virt_addr, error);
debug_putstr("done.\nBooting the kernel (entry_offset: 0x");
debug_puthex(entry_offset);
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3160d..4ae14339cb8cc72d 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -62,4 +62,12 @@
# define BOOT_STACK_SIZE 0x1000
#endif
+#ifndef __ASSEMBLY__
+extern unsigned int output_len;
+extern const unsigned long kernel_total_size;
+
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+ void (*error)(char *x));
+#endif
+
#endif /* _ASM_X86_BOOT_H */
--
2.39.2
The only remaining use of the trampoline address by the trampoline
itself is deriving the page table address from it, and this involves
adding an offset of 0x0. So simplify this, and pass the new CR3 value
directly.
This makes the fact that the page table happens to be at the start of
the trampoline allocation an implementation detail of the caller.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 8 ++++----
arch/x86/boot/compressed/pgtable.h | 2 --
arch/x86/boot/compressed/pgtable_64.c | 9 ++++-----
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 8730b1d58e2b0825..afdaf8cb8bb98694 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -521,8 +521,9 @@ SYM_FUNC_END(.Lrelocated)
* running in 64-bit mode.
*
* Return address is at the top of the stack (might be above 4G).
- * The first argument (EDI) contains the 32-bit addressable base of the
- * trampoline memory.
+ * The first argument (EDI) contains the address of the temporary PGD level
+ * page table in 32-bit addressable memory which will be programmed into
+ * register CR3.
*/
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
@@ -575,8 +576,7 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %cr0
/* Point CR3 to the trampoline's new top level page table */
- leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
- movl %eax, %cr3
+ movl %edi, %cr3
/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
movl $MSR_EFER, %ecx
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index c6b0903aded05a07..6d595abe06b34d63 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -3,8 +3,6 @@
#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)
-#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
-
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 4213473ae54883c8..eab4e6b568ae05c4 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -103,7 +103,7 @@ static unsigned long find_trampoline_placement(void)
asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- void (*toggle_la57)(void *trampoline);
+ void (*toggle_la57)(void *cr3);
bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -174,7 +174,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
* For 4- to 5-level paging transition, set up current CR3 as
* the first and the only entry in a new top-level page table.
*/
- trampoline_32bit[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() | _PAGE_TABLE_NOENC;
+ *trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
} else {
unsigned long src;
@@ -187,8 +187,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
* may be above 4G.
*/
src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
- (void *)src, PAGE_SIZE);
+ memcpy(trampoline_32bit, (void *)src, PAGE_SIZE);
}
toggle_la57(trampoline_32bit);
@@ -198,7 +197,7 @@ void cleanup_trampoline(void *pgtable)
{
void *trampoline_pgtable;
- trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
+ trampoline_pgtable = trampoline_32bit;
/*
* Move the top level page table out of trampoline memory,
--
2.39.2
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: cb1c9e02b0c13032c3aec325643453ba48e96e17
Gitweb: https://git.kernel.org/tip/cb1c9e02b0c13032c3aec325643453ba48e96e17
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:13 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:52:32 +02:00
x86/efistub: Perform 4/5 level paging switch from the stub
In preparation for updating the EFI stub boot flow to avoid the bare
metal decompressor code altogether, implement the support code for
switching between 4 and 5 levels of paging before jumping to the kernel
proper.
Reuse the newly refactored trampoline that the bare metal decompressor
uses, but rely on EFI APIs to allocate 32-bit addressable memory and
remap it with the appropriate permissions. Given that the bare metal
decompressor will no longer call into the trampoline if the number of
paging levels is already set correctly, there is no longer a need to
remove NX restrictions from the memory range where this trampoline may
end up.
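A sketch of the intended call sequence, condensed from the x86-stub.c
hunks below (error handling abbreviated):

    status = efi_setup_5level_paging();  /* boot services still up:  */
    if (status != EFI_SUCCESS)           /* allocates and remaps the */
        goto fail;                       /* 32-bit trampoline copy   */

    /* ... kernel placement, ExitBootServices() ... */

    efi_5level_switch();                 /* no allocations here;     */
                                         /* toggles LA57 if required */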
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/Makefile | 1 +-
drivers/firmware/efi/libstub/efi-stub-helper.c | 2 +-
drivers/firmware/efi/libstub/efistub.h | 1 +-
drivers/firmware/efi/libstub/x86-5lvl.c | 95 +++++++++++++++++-
drivers/firmware/efi/libstub/x86-stub.c | 40 ++-----
drivers/firmware/efi/libstub/x86-stub.h | 17 +++-
6 files changed, 130 insertions(+), 26 deletions(-)
create mode 100644 drivers/firmware/efi/libstub/x86-5lvl.c
create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 16d64a3..ae88744 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -88,6 +88,7 @@ lib-$(CONFIG_EFI_GENERIC_STUB) += efi-stub.o string.o intrinsics.o systable.o \
lib-$(CONFIG_ARM) += arm32-stub.o
lib-$(CONFIG_ARM64) += arm64.o arm64-stub.o smbios.o
lib-$(CONFIG_X86) += x86-stub.o
+lib-$(CONFIG_X86_64) += x86-5lvl.o
lib-$(CONFIG_RISCV) += riscv.o riscv-stub.o
lib-$(CONFIG_LOONGARCH) += loongarch.o loongarch-stub.o
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 7329842..bfa3062 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -73,6 +73,8 @@ efi_status_t efi_parse_options(char const *cmdline)
efi_loglevel = CONSOLE_LOGLEVEL_QUIET;
} else if (!strcmp(param, "noinitrd")) {
efi_noinitrd = true;
+ } else if (IS_ENABLED(CONFIG_X86_64) && !strcmp(param, "no5lvl")) {
+ efi_no5lvl = true;
} else if (!strcmp(param, "efi") && val) {
efi_nochunk = parse_option_str(val, "nochunk");
efi_novamap |= parse_option_str(val, "novamap");
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 6aa38a1..06b7abc 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -33,6 +33,7 @@
#define EFI_ALLOC_LIMIT ULONG_MAX
#endif
+extern bool efi_no5lvl;
extern bool efi_nochunk;
extern bool efi_nokaslr;
extern int efi_loglevel;
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
new file mode 100644
index 0000000..479dd44
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/efi.h>
+
+#include <asm/boot.h>
+#include <asm/desc.h>
+#include <asm/efi.h>
+
+#include "efistub.h"
+#include "x86-stub.h"
+
+bool efi_no5lvl;
+
+static void (*la57_toggle)(void *cr3);
+
+static const struct desc_struct gdt[] = {
+ [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(0xc09b, 0, 0xfffff),
+ [GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xa09b, 0, 0xfffff),
+};
+
+/*
+ * Enabling (or disabling) 5 level paging is tricky, because it can only be
+ * done from 32-bit mode with paging disabled. This means not only that the
+ * code itself must be running from 32-bit addressable physical memory, but
+ * also that the root page table must be 32-bit addressable, as programming
+ * a 64-bit value into CR3 when running in 32-bit mode is not supported.
+ */
+efi_status_t efi_setup_5level_paging(void)
+{
+ u8 tmpl_size = (u8 *)&trampoline_ljmp_imm_offset - (u8 *)&trampoline_32bit_src;
+ efi_status_t status;
+ u8 *la57_code;
+
+ if (!efi_is_64bit())
+ return EFI_SUCCESS;
+
+ /* check for 5 level paging support */
+ if (native_cpuid_eax(0) < 7 ||
+ !(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31))))
+ return EFI_SUCCESS;
+
+ /* allocate some 32-bit addressable memory for code and a page table */
+ status = efi_allocate_pages(2 * PAGE_SIZE, (unsigned long *)&la57_code,
+ U32_MAX);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ la57_toggle = memcpy(la57_code, trampoline_32bit_src, tmpl_size);
+ memset(la57_code + tmpl_size, 0x90, PAGE_SIZE - tmpl_size);
+
+ /*
+ * To avoid the need to allocate a 32-bit addressable stack, the
+ * trampoline uses a LJMP instruction to switch back to long mode.
+ * LJMP takes an absolute destination address, which needs to be
+ * fixed up at runtime.
+ */
+ *(u32 *)&la57_code[trampoline_ljmp_imm_offset] += (unsigned long)la57_code;
+
+ efi_adjust_memory_range_protection((unsigned long)la57_toggle, PAGE_SIZE);
+
+ return EFI_SUCCESS;
+}
+
+void efi_5level_switch(void)
+{
+ bool want_la57 = IS_ENABLED(CONFIG_X86_5LEVEL) && !efi_no5lvl;
+ bool have_la57 = native_read_cr4() & X86_CR4_LA57;
+ bool need_toggle = want_la57 ^ have_la57;
+ u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
+ u64 *cr3 = (u64 *)__native_read_cr3();
+ u64 *new_cr3;
+
+ if (!la57_toggle || !need_toggle)
+ return;
+
+ if (!have_la57) {
+ /*
+ * 5 level paging will be enabled, so a root level page needs
+ * to be allocated from the 32-bit addressable physical region,
+ * with its first entry referring to the existing hierarchy.
+ */
+ new_cr3 = memset(pgt, 0, PAGE_SIZE);
+ new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
+ } else {
+ /* take the new root table pointer from the current entry #0 */
+ new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+
+ /* copy the new root table if it is not 32-bit addressable */
+ if ((u64)new_cr3 > U32_MAX)
+ new_cr3 = memcpy(pgt, new_cr3, PAGE_SIZE);
+ }
+
+ native_load_gdt(&(struct desc_ptr){ sizeof(gdt) - 1, (u64)gdt });
+
+ la57_toggle(new_cr3);
+}
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 9247dbc..af5f506 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -17,6 +17,7 @@
#include <asm/boot.h>
#include "efistub.h"
+#include "x86-stub.h"
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)
@@ -223,8 +224,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
-static void
-adjust_memory_range_protection(unsigned long start, unsigned long size)
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
@@ -278,35 +279,14 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
}
}
-/*
- * Trampoline takes 2 pages and can be loaded in first megabyte of memory
- * with its end placed between 128k and 640k where BIOS might start.
- * (see arch/x86/boot/compressed/pgtable_64.c)
- *
- * We cannot find exact trampoline placement since memory map
- * can be modified by UEFI, and it can alter the computed address.
- */
-
-#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
-#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-
extern const u8 startup_32[], startup_64[];
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
{
- /*
- * Allow execution of possible trampoline used
- * for switching between 4- and 5-level page tables
- * and relocated kernel image.
- */
-
- adjust_memory_range_protection(TRAMPOLINE_PLACEMENT_BASE,
- TRAMPOLINE_PLACEMENT_SIZE);
-
#ifdef CONFIG_64BIT
if (image_base != (unsigned long)startup_32)
- adjust_memory_range_protection(image_base, image_size);
+ efi_adjust_memory_range_protection(image_base, image_size);
#else
/*
* Clear protection flags on a whole range of possible
@@ -316,8 +296,8 @@ setup_memory_protection(unsigned long image_base, unsigned long image_size)
* need to remove possible protection on relocated image
* itself disregarding further relocations.
*/
- adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
+ efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
+ KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
#endif
}
@@ -839,6 +819,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_dxe_table = NULL;
}
+ status = efi_setup_5level_paging();
+ if (status != EFI_SUCCESS) {
+ efi_err("efi_setup_5level_paging() failed!\n");
+ goto fail;
+ }
+
/*
* If the kernel isn't already loaded at a suitable address,
* relocate it.
@@ -959,6 +945,8 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
+ efi_5level_switch();
+
if (IS_ENABLED(CONFIG_X86_64))
bzimage_addr += startup_64 - startup_32;
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
new file mode 100644
index 0000000..37c5a36
--- /dev/null
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <linux/efi.h>
+
+extern void trampoline_32bit_src(void *, bool);
+extern const u16 trampoline_ljmp_imm_offset;
+
+void efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size);
+
+#ifdef CONFIG_X86_64
+efi_status_t efi_setup_5level_paging(void);
+void efi_5level_switch(void);
+#else
+static inline efi_status_t efi_setup_5level_paging(void) { return EFI_SUCCESS; }
+static inline void efi_5level_switch(void) {}
+#endif
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 2f69a81ad6873231a2ae78d5ade3263ec54a4c9c
Gitweb: https://git.kernel.org/tip/2f69a81ad6873231a2ae78d5ade3263ec54a4c9c
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:26:59 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 19:20:32 +02:00
x86/head_64: Store boot_params pointer in callee save register
Instead of pushing/popping %RSI to/from the stack every time a function
is called from startup_64(), store it in a callee preserved register
and grab it from there when its value is actually needed.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/kernel/head_64.S | 32 ++++++++++++--------------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c5b9289..ea69959 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -51,7 +51,9 @@ SYM_CODE_START_NOALIGN(startup_64)
* for us. These identity mapped page tables map all of the
* kernel pages and possibly all of memory.
*
- * %rsi holds a physical pointer to real_mode_data.
+ * %RSI holds the physical address of the boot_params structure
+ * provided by the bootloader. Preserve it in %R15 so C function calls
+ * will not clobber it.
*
* We come here either directly from a 64bit bootloader, or from
* arch/x86/boot/compressed/head_64.S.
@@ -62,6 +64,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* compiled to run at we first fixup the physical addresses in our page
* tables and then reload them.
*/
+ mov %rsi, %r15
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
@@ -75,9 +78,7 @@ SYM_CODE_START_NOALIGN(startup_64)
shrq $32, %rdx
wrmsr
- pushq %rsi
call startup_64_setup_env
- popq %rsi
/* Now switch to __KERNEL_CS so IRET works reliably */
pushq $__KERNEL_CS
@@ -93,12 +94,10 @@ SYM_CODE_START_NOALIGN(startup_64)
* Activate SEV/SME memory encryption if supported/enabled. This needs to
* be done now, since this also includes setup of the SEV-SNP CPUID table,
* which needs to be done before any CPUID instructions are executed in
- * subsequent code.
+ * subsequent code. Pass the boot_params pointer as the first argument.
*/
- movq %rsi, %rdi
- pushq %rsi
+ movq %r15, %rdi
call sme_enable
- popq %rsi
#endif
/* Sanitize CPU configuration */
@@ -111,9 +110,8 @@ SYM_CODE_START_NOALIGN(startup_64)
* programmed into CR3.
*/
leaq _text(%rip), %rdi
- pushq %rsi
+ movq %r15, %rsi
call __startup_64
- popq %rsi
/* Form the CR3 value being sure to include the CR3 modifier */
addq $(early_top_pgt - __START_KERNEL_map), %rax
@@ -127,8 +125,6 @@ SYM_CODE_START(secondary_startup_64)
* At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
* and someone has loaded a mapped page table.
*
- * %rsi holds a physical pointer to real_mode_data.
- *
* We come here either from startup_64 (using physical addresses)
* or from trampoline.S (using virtual addresses).
*
@@ -153,6 +149,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
UNWIND_HINT_END_OF_STACK
ANNOTATE_NOENDBR
+ /* Clear %R15 which holds the boot_params pointer on the boot CPU */
+ xorq %r15, %r15
+
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
@@ -199,13 +198,9 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
* hypervisor could lie about the C-bit position to perform a ROP
* attack on the guest by writing to the unencrypted stack and wait for
* the next RET instruction.
- * %rsi carries pointer to realmode data and is callee-clobbered. Save
- * and restore it.
*/
- pushq %rsi
movq %rax, %rdi
call sev_verify_cbit
- popq %rsi
/*
* Switch to new page-table
@@ -365,9 +360,7 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
wrmsr
/* Setup and Load IDT */
- pushq %rsi
call early_setup_idt
- popq %rsi
/* Check if nx is implemented */
movl $0x80000001, %eax
@@ -403,9 +396,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
pushq $0
popfq
- /* rsi is pointer to real mode structure with interesting info.
- pass it to C */
- movq %rsi, %rdi
+ /* Pass the boot_params pointer as first argument */
+ movq %r15, %rdi
.Ljump_to_C_code:
/*
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: bd328aa01ff77a45aeffea5fc4521854291db11f
Gitweb: https://git.kernel.org/tip/bd328aa01ff77a45aeffea5fc4521854291db11f
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:08 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:45:55 +02:00
x86/decompressor: Avoid the need for a stack in the 32-bit trampoline
The 32-bit trampoline no longer uses the stack for anything except
performing a far return back to long mode, and preserving the caller's
stack pointer value. Currently, the trampoline stack is placed in the
same page that carries the trampoline code, which means this page must
be mapped writable and executable, and the stack is therefore executable
as well.
Replace the far return with a far jump, so that the return address can
be pre-calculated and patched into the code before it is called. This
removes the need for a 32-bit addressable stack entirely, and in a later
patch, this will be taken advantage of by removing writable permissions
from (and adding executable permissions to) the trampoline code page
when booting via the EFI stub.
Note that the value of RSP still needs to be preserved explicitly across
the switch into 32-bit mode, as the register may get truncated to 32
bits.
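To make the fixup concrete, here is a sketch of the patching step that
mirrors the pgtable_64.c hunk below; 'buf' is assumed to be a 32-bit
addressable buffer that the template is copied into:

    u8 *tramp = memcpy(buf, &trampoline_32bit_src,
                       TRAMPOLINE_32BIT_CODE_SIZE);

    /*
     * The LJMP immediate is assembled as the offset of .Lret within
     * the template, so adding the runtime copy address turns it into
     * the absolute destination of the jump.
     */
    *(u32 *)(tramp + trampoline_ljmp_imm_offset) += (unsigned long)tramp;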
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 45 +++++++++++++++-----------
arch/x86/boot/compressed/pgtable.h | 4 +-
arch/x86/boot/compressed/pgtable_64.c | 12 ++++++-
3 files changed, 40 insertions(+), 21 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index c475042..37fd7b7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -540,6 +540,7 @@ SYM_FUNC_END(.Lrelocated)
* trampoline memory. A non-zero second argument (ESI) means that the
* trampoline needs to enable 5-level paging.
*/
+ .section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
/*
* Preserve live 64-bit registers on the stack: this is necessary
@@ -550,13 +551,9 @@ SYM_CODE_START(trampoline_32bit_src)
pushq %rbp
pushq %rbx
- /* Set up 32-bit addressable stack and push the old RSP value */
- leaq (TRAMPOLINE_32BIT_STACK_END - 8)(%rcx), %rbx
- movq %rsp, (%rbx)
- movq %rbx, %rsp
-
- /* Take the address of the trampoline exit code */
- leaq .Lret(%rip), %rbx
+ /* Preserve top half of RSP in a legacy mode GPR to avoid truncation */
+ movq %rsp, %rbx
+ shrq $32, %rbx
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
@@ -564,9 +561,17 @@ SYM_CODE_START(trampoline_32bit_src)
pushq %rax
lretq
+ /*
+ * The 32-bit code below will do a far jump back to long mode and end
+ * up here after reconfiguring the number of paging levels. First, the
+ * stack pointer needs to be restored to its full 64-bit value before
+ * the callee save register contents can be popped from the stack.
+ */
.Lret:
+ shlq $32, %rbx
+ orq %rbx, %rsp
+
/* Restore the preserved 64-bit registers */
- movq (%rsp), %rsp
popq %rbx
popq %rbp
popq %r15
@@ -574,11 +579,6 @@ SYM_CODE_START(trampoline_32bit_src)
.code32
0:
- /* Set up data and stack segments */
- movl $__KERNEL_DS, %eax
- movl %eax, %ds
- movl %eax, %ss
-
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
@@ -633,18 +633,26 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Prepare the stack for far return to Long Mode */
- pushl $__KERNEL_CS
- pushl %ebx
-
/* Enable paging again. */
movl %cr0, %eax
btsl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- lret
+ /*
+ * Return to the 64-bit calling code using LJMP rather than LRET, to
+ * avoid the need for a 32-bit addressable stack. The destination
+ * address will be adjusted after the template code is copied into a
+ * 32-bit addressable buffer.
+ */
+.Ljmp: ljmpl $__KERNEL_CS, $(.Lret - trampoline_32bit_src)
SYM_CODE_END(trampoline_32bit_src)
+/*
+ * This symbol is placed right after trampoline_32bit_src() so its address can
+ * be used to infer the size of the trampoline code.
+ */
+SYM_DATA(trampoline_ljmp_imm_offset, .word .Ljmp + 1 - trampoline_32bit_src)
+
/*
* The trampoline code has a size limit.
* Make sure we fail to compile if the trampoline code grows
@@ -652,6 +660,7 @@ SYM_CODE_END(trampoline_32bit_src)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
+ .text
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 4e8cef1..c6b0903 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -8,13 +8,13 @@
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
-#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
-
#ifndef __ASSEMBLER__
extern unsigned long *trampoline_32bit;
extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
+extern const u16 trampoline_ljmp_imm_offset;
+
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index f8092d3..5198a05 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -109,6 +109,7 @@ static unsigned long find_trampoline_placement(void)
struct paging_config paging_prepare(void *rmode)
{
struct paging_config paging_config = {};
+ void *tramp_code;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
boot_params = rmode;
@@ -148,10 +149,19 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
+ tramp_code = memcpy(trampoline_32bit +
+ TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
/*
+ * Avoid the need for a stack in the 32-bit trampoline code, by using
+ * LJMP rather than LRET to return back to long mode. LJMP takes an
+ * immediate absolute address, which needs to be adjusted based on the
+ * placement of the trampoline.
+ */
+ *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+
+ /*
* The code below prepares page table in trampoline memory.
*
* The new page table will be used by trampoline code for switching
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 31c77a50992e8dd136feed7b67073bb5f1f978cc
Gitweb: https://git.kernel.org/tip/31c77a50992e8dd136feed7b67073bb5f1f978cc
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:19 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 21:03:53 +02:00
x86/efistub: Perform SNP feature test while running in the firmware
Before refactoring the EFI stub boot flow to avoid the legacy bare metal
decompressor, duplicate the SNP feature check in the EFI stub before
handing over to the kernel proper.
The SNP feature check can be performed while running under the EFI boot
services, which means it can force the boot to fail gracefully and
return an error to the bootloader if the loaded kernel does not
implement support for all the features that the hypervisor enabled.
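Schematically, and matching the x86-stub.c hunk below, the stub can now
report the failure through the firmware instead of terminating the
guest via the GHCB protocol:

    u64 unsupported = snp_get_unsupported_features(sev_get_status());

    if (unsupported) {
        efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
                unsupported);
        efi_exit(handle, EFI_UNSUPPORTED);  /* back to the bootloader */
    }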
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/sev.c | 112 +++++++++++++----------
arch/x86/include/asm/sev.h | 4 +-
drivers/firmware/efi/libstub/x86-stub.c | 17 +++-
3 files changed, 87 insertions(+), 46 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index c3e343b..199155b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -367,20 +367,25 @@ static void enforce_vmpl0(void)
*/
#define SNP_FEATURES_PRESENT (0)
+u64 snp_get_unsupported_features(u64 status)
+{
+ if (!(status & MSR_AMD64_SEV_SNP_ENABLED))
+ return 0;
+
+ return status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+}
+
void snp_check_features(void)
{
u64 unsupported;
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
- return;
-
/*
* Terminate the boot if hypervisor has enabled any feature lacking
* guest side implementation. Pass on the unsupported features mask through
* EXIT_INFO_2 of the GHCB protocol so that those features can be reported
* as part of the guest boot failure.
*/
- unsupported = sev_status & SNP_FEATURES_IMPL_REQ & ~SNP_FEATURES_PRESENT;
+ unsupported = snp_get_unsupported_features(sev_status);
if (unsupported) {
if (ghcb_version < 2 || (!boot_ghcb && !early_setup_ghcb()))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
@@ -390,35 +395,22 @@ void snp_check_features(void)
}
}
-void sev_enable(struct boot_params *bp)
+/*
+ * sev_check_cpu_support - Check for SEV support in the CPU capabilities
+ *
+ * Returns < 0 if SEV is not supported, otherwise the position of the
+ * encryption bit in the page table descriptors.
+ */
+static int sev_check_cpu_support(void)
{
unsigned int eax, ebx, ecx, edx;
- struct msr m;
- bool snp;
-
- /*
- * bp->cc_blob_address should only be set by boot/compressed kernel.
- * Initialize it to 0 to ensure that uninitialized values from
- * buggy bootloaders aren't propagated.
- */
- if (bp)
- bp->cc_blob_address = 0;
-
- /*
- * Do an initial SEV capability check before snp_init() which
- * loads the CPUID page and the same checks afterwards are done
- * without the hypervisor and are trustworthy.
- *
- * If the HV fakes SEV support, the guest will crash'n'burn
- * which is good enough.
- */
/* Check for the SME/SEV support leaf */
eax = 0x80000000;
ecx = 0;
native_cpuid(&eax, &ebx, &ecx, &edx);
if (eax < 0x8000001f)
- return;
+ return -ENODEV;
/*
* Check for the SME/SEV feature:
@@ -433,6 +425,35 @@ void sev_enable(struct boot_params *bp)
native_cpuid(&eax, &ebx, &ecx, &edx);
/* Check whether SEV is supported */
if (!(eax & BIT(1)))
+ return -ENODEV;
+
+ return ebx & 0x3f;
+}
+
+void sev_enable(struct boot_params *bp)
+{
+ struct msr m;
+ int bitpos;
+ bool snp;
+
+ /*
+ * bp->cc_blob_address should only be set by boot/compressed kernel.
+ * Initialize it to 0 to ensure that uninitialized values from
+ * buggy bootloaders aren't propagated.
+ */
+ if (bp)
+ bp->cc_blob_address = 0;
+
+ /*
+ * Do an initial SEV capability check before snp_init() which
+ * loads the CPUID page and the same checks afterwards are done
+ * without the hypervisor and are trustworthy.
+ *
+ * If the HV fakes SEV support, the guest will crash'n'burn
+ * which is good enough.
+ */
+
+ if (sev_check_cpu_support() < 0)
return;
/*
@@ -443,26 +464,8 @@ void sev_enable(struct boot_params *bp)
/* Now repeat the checks with the SNP CPUID table. */
- /* Recheck the SME/SEV support leaf */
- eax = 0x80000000;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- if (eax < 0x8000001f)
- return;
-
- /*
- * Recheck for the SME/SEV feature:
- * CPUID Fn8000_001F[EAX]
- * - Bit 0 - Secure Memory Encryption support
- * - Bit 1 - Secure Encrypted Virtualization support
- * CPUID Fn8000_001F[EBX]
- * - Bits 5:0 - Pagetable bit position used to indicate encryption
- */
- eax = 0x8000001f;
- ecx = 0;
- native_cpuid(&eax, &ebx, &ecx, &edx);
- /* Check whether SEV is supported */
- if (!(eax & BIT(1))) {
+ bitpos = sev_check_cpu_support();
+ if (bitpos < 0) {
if (snp)
error("SEV-SNP support indicated by CC blob, but not CPUID.");
return;
@@ -494,7 +497,24 @@ void sev_enable(struct boot_params *bp)
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
- sme_me_mask = BIT_ULL(ebx & 0x3f);
+ sme_me_mask = BIT_ULL(bitpos);
+}
+
+/*
+ * sev_get_status - Retrieve the SEV status mask
+ *
+ * Returns 0 if the CPU is not SEV capable, otherwise the value of the
+ * AMD64_SEV MSR.
+ */
+u64 sev_get_status(void)
+{
+ struct msr m;
+
+ if (sev_check_cpu_support() < 0)
+ return 0;
+
+ boot_rdmsr(MSR_AMD64_SEV, &m);
+ return m.q;
}
/* Search for Confidential Computing blob in the EFI config table. */
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 66c8067..b97d239 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -210,6 +210,8 @@ bool snp_init(struct boot_params *bp);
void __init __noreturn snp_abort(void);
int snp_issue_guest_request(u64 exit_code, struct snp_req_data *input, struct snp_guest_request_ioctl *rio);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+u64 snp_get_unsupported_features(u64 status);
+u64 sev_get_status(void);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -235,6 +237,8 @@ static inline int snp_issue_guest_request(u64 exit_code, struct snp_req_data *in
}
static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
+static inline u64 sev_get_status(void) { return 0; }
#endif
#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index acb1c65..b4685da 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,6 +15,7 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
@@ -790,6 +791,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static bool have_unsupported_snp_features(void)
+{
+ u64 unsupported;
+
+ unsupported = snp_get_unsupported_features(sev_get_status());
+ if (unsupported) {
+ efi_err("Unsupported SEV-SNP features detected: 0x%llx\n",
+ unsupported);
+ return true;
+ }
+ return false;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -820,6 +834,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
+ if (have_unsupported_snp_features())
+ efi_exit(handle, EFI_UNSUPPORTED);
+
if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
if (efi_dxe_table &&
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: cb83cece57e1889109dd73ea08ee338668c9d1b8
Gitweb: https://git.kernel.org/tip/cb83cece57e1889109dd73ea08ee338668c9d1b8
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:11 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:49:26 +02:00
x86/decompressor: Pass pgtable address to trampoline directly
The only remaining use of the trampoline address by the trampoline
itself is deriving the page table address from it, and this involves
adding an offset of 0x0. So simplify this, and pass the new CR3 value
directly.
This makes the fact that the page table happens to be at the start of
the trampoline allocation an implementation detail of the caller.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 8 ++++----
arch/x86/boot/compressed/pgtable.h | 2 --
arch/x86/boot/compressed/pgtable_64.c | 9 ++++-----
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 8730b1d..afdaf8c 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -521,8 +521,9 @@ SYM_FUNC_END(.Lrelocated)
* running in 64-bit mode.
*
* Return address is at the top of the stack (might be above 4G).
- * The first argument (EDI) contains the 32-bit addressable base of the
- * trampoline memory.
+ * The first argument (EDI) contains the address of the temporary PGD level
+ * page table in 32-bit addressable memory which will be programmed into
+ * register CR3.
*/
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
@@ -575,8 +576,7 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %cr0
/* Point CR3 to the trampoline's new top level page table */
- leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
- movl %eax, %cr3
+ movl %edi, %cr3
/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
movl $MSR_EFER, %ecx
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index c6b0903..6d595ab 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -3,8 +3,6 @@
#define TRAMPOLINE_32BIT_SIZE (2 * PAGE_SIZE)
-#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
-
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 4213473..eab4e6b 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -103,7 +103,7 @@ static unsigned long find_trampoline_placement(void)
asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- void (*toggle_la57)(void *trampoline);
+ void (*toggle_la57)(void *cr3);
bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -174,7 +174,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
* For 4- to 5-level paging transition, set up current CR3 as
* the first and the only entry in a new top-level page table.
*/
- trampoline_32bit[TRAMPOLINE_32BIT_PGTABLE_OFFSET] = __native_read_cr3() | _PAGE_TABLE_NOENC;
+ *trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
} else {
unsigned long src;
@@ -187,8 +187,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
* may be above 4G.
*/
src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
- memcpy(trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long),
- (void *)src, PAGE_SIZE);
+ memcpy(trampoline_32bit, (void *)src, PAGE_SIZE);
}
toggle_la57(trampoline_32bit);
@@ -198,7 +197,7 @@ void cleanup_trampoline(void *pgtable)
{
void *trampoline_pgtable;
- trampoline_pgtable = trampoline_32bit + TRAMPOLINE_32BIT_PGTABLE_OFFSET / sizeof(unsigned long);
+ trampoline_pgtable = trampoline_32bit;
/*
* Move the top level page table out of trampoline memory,
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 00c6b0978ec182f1a672095930872168b9d5b1e2
Gitweb: https://git.kernel.org/tip/00c6b0978ec182f1a672095930872168b9d5b1e2
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:05 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:42:11 +02:00
x86/decompressor: Assign paging related global variables earlier
There is no need to defer the assignment of the paging related global
variables 'pgdir_shift' and 'ptrs_per_p4d' until after the trampoline is
cleaned up, so assign them as soon as it is clear that 5-level paging
will be enabled.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/misc.h | 2 --
arch/x86/boot/compressed/pgtable_64.c | 14 +++++---------
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 964fe90..cc70d3f 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -179,9 +179,7 @@ static inline int count_immovable_mem_regions(void) { return 0; }
#endif
/* ident_map_64.c */
-#ifdef CONFIG_X86_5LEVEL
extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
-#endif
extern void kernel_add_identity_map(unsigned long start, unsigned long end);
/* Used by PAGE_KERN* macros: */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 2ac12ff..f8092d3 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -130,6 +130,11 @@ struct paging_config paging_prepare(void *rmode)
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
paging_config.l5_required = 1;
+
+ /* Initialize variables for 5-level paging */
+ __pgtable_l5_enabled = 1;
+ pgdir_shift = 48;
+ ptrs_per_p4d = 512;
}
paging_config.trampoline_start = find_trampoline_placement();
@@ -206,13 +211,4 @@ void cleanup_trampoline(void *pgtable)
/* Restore trampoline memory */
memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
-
- /* Initialize variables for 5-level paging */
-#ifdef CONFIG_X86_5LEVEL
- if (__read_cr4() & X86_CR4_LA57) {
- __pgtable_l5_enabled = 1;
- pgdir_shift = 48;
- ptrs_per_p4d = 512;
- }
-#endif
}
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 24388292e2d7fae79a0d4183cc91716b851299cf
Gitweb: https://git.kernel.org/tip/24388292e2d7fae79a0d4183cc91716b851299cf
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:16 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:58:02 +02:00
x86/decompressor: Move global symbol references to C code
It is no longer necessary to be cautious when referring to global
variables in the position independent decompressor code, now that it is
built using PIE codegen and makes an assertion in the linker script that
no GOT entries exist (which would require adjustment for the actual
runtime load address of the decompressor binary).
This means global variables can be referenced directly from C code,
instead of having to pass their runtime addresses into C routines from
asm code, which needs to happen at each call site. Do so for the code
that will be called directly from the EFI stub after a subsequent patch,
and avoid the need to duplicate this logic a third time.
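The effect on the extract_kernel() interface, condensed from the hunks
below:

    /* before: every address marshalled in asm at each call site */
    asmlinkage void *extract_kernel(void *rmode, memptr heap,
                                    unsigned char *input_data,
                                    unsigned long input_len,
                                    unsigned char *output,
                                    unsigned long output_len);

    /* after: PIE codegen lets C reference the symbols directly */
    extern unsigned char input_data[];
    extern unsigned int input_len, output_len;

    asmlinkage void *extract_kernel(void *rmode, unsigned char *output);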
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_32.S | 8 --------
arch/x86/boot/compressed/head_64.S | 10 ++--------
arch/x86/boot/compressed/misc.c | 16 +++++++++-------
3 files changed, 11 insertions(+), 23 deletions(-)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 8876ffe..3af4a38 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -168,13 +168,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
*/
/* push arguments for extract_kernel: */
- pushl output_len@GOTOFF(%ebx) /* decompressed length, end of relocs */
pushl %ebp /* output address */
- pushl input_len@GOTOFF(%ebx) /* input_len */
- leal input_data@GOTOFF(%ebx), %eax
- pushl %eax /* input_data */
- leal boot_heap@GOTOFF(%ebx), %eax
- pushl %eax /* heap area */
pushl %esi /* real mode pointer */
call extract_kernel /* returns kernel entry point in %eax */
addl $24, %esp
@@ -202,8 +196,6 @@ SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
*/
.bss
.balign 4
-boot_heap:
- .fill BOOT_HEAP_SIZE, 1, 0
boot_stack:
.fill BOOT_STACK_SIZE, 1, 0
boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fb0e562..28f4605 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -493,13 +493,9 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
/*
* Do the extraction, and jump to the new kernel..
*/
- /* pass struct boot_params pointer */
+ /* pass struct boot_params pointer and output target address */
movq %r15, %rdi
- leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
- leaq input_data(%rip), %rdx /* input_data */
- movl input_len(%rip), %ecx /* input_len */
- movq %rbp, %r8 /* output target address */
- movl output_len(%rip), %r9d /* decompressed length, end of relocs */
+ movq %rbp, %rsi
call extract_kernel /* returns kernel entry point in %rax */
/*
@@ -657,8 +653,6 @@ SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
*/
.bss
.balign 4
-SYM_DATA_LOCAL(boot_heap, .fill BOOT_HEAP_SIZE, 1, 0)
-
SYM_DATA_START_LOCAL(boot_stack)
.fill BOOT_STACK_SIZE, 1, 0
.balign 16
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b7abc..2d91d56 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,6 +330,11 @@ static size_t parse_elf(void *output)
return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
}
+static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
+
+extern unsigned char input_data[];
+extern unsigned int input_len, output_len;
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -347,14 +352,11 @@ static size_t parse_elf(void *output)
* |-------uncompressed kernel image---------|
*
*/
-asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
- unsigned char *input_data,
- unsigned long input_len,
- unsigned char *output,
- unsigned long output_len)
+asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
{
const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ memptr heap = (memptr)boot_heap;
unsigned long needed_size;
size_t entry_offset;
@@ -412,7 +414,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
* entries. This ensures the full mapped area is usable RAM
* and doesn't include any reserved areas.
*/
- needed_size = max(output_len, kernel_total_size);
+ needed_size = max_t(unsigned long, output_len, kernel_total_size);
#ifdef CONFIG_X86_64
needed_size = ALIGN(needed_size, MIN_KERNEL_ALIGN);
#endif
@@ -443,7 +445,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
#ifdef CONFIG_X86_64
if (heap > 0x3fffffffffffUL)
error("Destination address too large");
- if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
+ if (virt_addr + needed_size > KERNEL_IMAGE_SIZE)
error("Destination virtual address is beyond the kernel mapping area");
#else
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: bc5ddceff4c14494d83449ad45c985e6cd353fce
Gitweb: https://git.kernel.org/tip/bc5ddceff4c14494d83449ad45c985e6cd353fce
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:18 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 21:01:46 +02:00
efi/libstub: Add limit argument to efi_random_alloc()
x86 will need to limit the kernel memory allocation to the lowest 512
MiB of memory, to match the behavior of the existing bare metal KASLR
physical randomization logic. So in preparation for that, add a limit
parameter to efi_random_alloc() and wire it up.
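As a hedged sketch of the eventual x86 call site (this patch only adds
the parameter, and the exact limit constant used later is an assumption
here), the stub would clamp the randomized placement like so:

    /* keep the kernel below 512 MiB, matching bare metal KASLR;
     * the limit is an inclusive end address */
    status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
                              seed, EFI_LOADER_CODE, SZ_512M - 1);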
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/arm64-stub.c | 2 +-
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/randomalloc.c | 10 ++++++----
drivers/firmware/efi/libstub/zboot.c | 2 +-
4 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index 770b8ec..8c40fc8 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -106,7 +106,7 @@ efi_status_t handle_kernel_image(unsigned long *image_addr,
*/
status = efi_random_alloc(*reserve_size, min_kimg_align,
reserve_addr, phys_seed,
- EFI_LOADER_CODE);
+ EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS)
efi_warn("efi_random_alloc() failed: 0x%lx\n", status);
} else {
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 06b7abc..9823f6f 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -956,7 +956,7 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
efi_status_t efi_random_alloc(unsigned long size, unsigned long align,
unsigned long *addr, unsigned long random_seed,
- int memory_type);
+ int memory_type, unsigned long alloc_limit);
efi_status_t efi_random_get_seed(void);
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index 32c7a54..674a064 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -16,7 +16,8 @@
*/
static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
unsigned long size,
- unsigned long align_shift)
+ unsigned long align_shift,
+ u64 alloc_limit)
{
unsigned long align = 1UL << align_shift;
u64 first_slot, last_slot, region_end;
@@ -29,7 +30,7 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
return 0;
region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1,
- (u64)EFI_ALLOC_LIMIT);
+ alloc_limit);
if (region_end < size)
return 0;
@@ -54,7 +55,8 @@ efi_status_t efi_random_alloc(unsigned long size,
unsigned long align,
unsigned long *addr,
unsigned long random_seed,
- int memory_type)
+ int memory_type,
+ unsigned long alloc_limit)
{
unsigned long total_slots = 0, target_slot;
unsigned long total_mirrored_slots = 0;
@@ -76,7 +78,7 @@ efi_status_t efi_random_alloc(unsigned long size,
efi_memory_desc_t *md = (void *)map->map + map_offset;
unsigned long slots;
- slots = get_entry_num_slots(md, size, ilog2(align));
+ slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit);
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c
index e5d7fa1..bdb17ea 100644
--- a/drivers/firmware/efi/libstub/zboot.c
+++ b/drivers/firmware/efi/libstub/zboot.c
@@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
}
status = efi_random_alloc(alloc_size, min_kimg_align, &image_base,
- seed, EFI_LOADER_CODE);
+ seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS) {
efi_err("Failed to allocate memory\n");
goto free_cmdline;
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 11078876b7a6a1b7226344fecab968945c806832
Gitweb: https://git.kernel.org/tip/11078876b7a6a1b7226344fecab968945c806832
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:14 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:54:15 +02:00
x86/efistub: Prefer EFI memory attributes protocol over DXE services
Currently, the EFI stub relies on DXE services in some cases to clear
non-execute restrictions from page allocations that need to be
executable. This is dodgy, because DXE services are not specified by
UEFI but by PI, and they are not intended for consumption by OS loaders.
However, no alternative existed at the time.
Now, there is a new UEFI protocol that should be used instead, so if it
exists, prefer it over the DXE services calls.
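The resulting logic in efi_adjust_memory_range_protection() boils down
to the following (see the hunk below):

    if (memattr) {
        /* UEFI memory attributes protocol: drop the XP attribute */
        efi_call_proto(memattr, clear_memory_attributes, rounded_start,
                       rounded_end - rounded_start, EFI_MEMORY_XP);
        return;
    }

    if (efi_dxe_table == NULL)
        return;    /* neither interface available; nothing to do */

    /* otherwise, fall back to the PI DXE services as before */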
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/x86-stub.c | 29 +++++++++++++++++-------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index af5f506..acb1c65 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -26,6 +26,7 @@ const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
+static efi_memory_attribute_protocol_t *memattr;
typedef union sev_memory_acceptance_protocol sev_memory_acceptance_protocol_t;
union sev_memory_acceptance_protocol {
@@ -233,12 +234,18 @@ void efi_adjust_memory_range_protection(unsigned long start,
unsigned long rounded_start, rounded_end;
unsigned long unprotect_start, unprotect_size;
- if (efi_dxe_table == NULL)
- return;
-
rounded_start = rounddown(start, EFI_PAGE_SIZE);
rounded_end = roundup(start + size, EFI_PAGE_SIZE);
+ if (memattr != NULL) {
+ efi_call_proto(memattr, clear_memory_attributes, rounded_start,
+ rounded_end - rounded_start, EFI_MEMORY_XP);
+ return;
+ }
+
+ if (efi_dxe_table == NULL)
+ return;
+
/*
* Don't modify memory region attributes, they are
* already suitable, to lower the possibility to
@@ -801,6 +808,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
struct boot_params *boot_params)
{
+ efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
@@ -812,13 +820,18 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (efi_system_table->hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
efi_exit(handle, EFI_INVALID_PARAMETER);
- efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
- if (efi_dxe_table &&
- efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
- efi_warn("Ignoring DXE services table: invalid signature\n");
- efi_dxe_table = NULL;
+ if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES)) {
+ efi_dxe_table = get_efi_config_table(EFI_DXE_SERVICES_TABLE_GUID);
+ if (efi_dxe_table &&
+ efi_dxe_table->hdr.signature != EFI_DXE_SERVICES_TABLE_SIGNATURE) {
+ efi_warn("Ignoring DXE services table: invalid signature\n");
+ efi_dxe_table = NULL;
+ }
}
+ /* grab the memory attributes protocol if it exists */
+ efi_bs_call(locate_protocol, &guid, NULL, (void **)&memattr);
+
status = efi_setup_5level_paging();
if (status != EFI_SUCCESS) {
efi_err("efi_setup_5level_paging() failed!\n");
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a
Gitweb: https://git.kernel.org/tip/a1b87d54f4e45ff5e0d081fb1d9db3bf1a8fb39a
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:20 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 21:07:43 +02:00
x86/efistub: Avoid legacy decompressor when doing EFI boot
The bare metal decompressor code was never really intended to run in a
hosted environment such as the EFI boot services, and does a few things
that are becoming problematic in the context of EFI boot now that the
logo requirements are getting tighter: EFI executables will no longer be
allowed to consist of a single executable section that is mapped with
read, write and execute permissions if they are intended for use in a
context where Secure Boot is enabled (and where Microsoft's set of
certificates is used, i.e., every x86 PC built to run Windows).
To avoid stepping on reserved memory before having inspected the E820
tables, and to ensure the correct placement when running a kernel build
that is non-relocatable, the bare metal decompressor moves its own
executable image to the end of the allocation that was reserved for it,
in order to perform the decompression in place. This means the region in
question requires both write and execute permissions, which either need
to be given upfront (which EFI will no longer permit), or need to be
applied on demand using the existing page fault handling framework.
However, the physical placement of the kernel is usually randomized
anyway, and even if it isn't, a dedicated decompression output buffer
can be allocated anywhere in memory using EFI APIs when still running in
the boot services, given that EFI support already implies a relocatable
kernel. This means that decompression in place is never necessary, nor
is moving the compressed image from one end to the other.
Since EFI already maps all of memory 1:1, it is also unnecessary to
create new page tables or handle page faults when decompressing the
kernel. That means there is also no need to replace the special
exception handlers for SEV. Generally, there is little need to do
any of the things that the decompressor does beyond
- initialize SEV encryption, if needed,
- perform the 4/5 level paging switch, if needed,
- decompress the kernel,
- relocate the kernel.
So do all of this from the EFI stub code, and avoid the bare metal
decompressor altogether.
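Condensed into one routine, the resulting EFI boot flow looks roughly
like this; efi_decompress() is a hypothetical stand-in for the buffer
allocation plus decompress_kernel() steps, and the ordering reflects
the preceding patches:

    static efi_status_t efi_boot_kernel(efi_handle_t handle,
                                        struct boot_params *bp)
    {
        unsigned long kernel_entry;
        efi_status_t status;

        if (have_unsupported_snp_features())
            return EFI_UNSUPPORTED;         /* fail via the firmware */

        status = efi_setup_5level_paging(); /* needs boot services */
        if (status != EFI_SUCCESS)
            return status;

        status = efi_decompress(&kernel_entry); /* hypothetical helper */
        if (status != EFI_SUCCESS)
            return status;

        status = exit_boot(bp, handle);
        if (status != EFI_SUCCESS)
            return status;

        sev_enable(bp);          /* SEV init after ExitBootServices() */
        efi_5level_switch();     /* toggle LA57 only if needed */

        enter_kernel(kernel_entry, bp);     /* does not return */
    }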
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/Makefile | 5 +-
arch/x86/boot/compressed/efi_mixed.S | 55 +--------
arch/x86/boot/compressed/head_32.S | 13 +--
arch/x86/boot/compressed/head_64.S | 27 +----
arch/x86/include/asm/efi.h | 7 +-
arch/x86/include/asm/sev.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 166 +++++++++--------------
7 files changed, 84 insertions(+), 191 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 40d2ff5..71fc531 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -74,6 +74,11 @@ LDFLAGS_vmlinux += -z noexecstack
ifeq ($(CONFIG_LD_IS_BFD),y)
LDFLAGS_vmlinux += $(call ld-option,--no-warn-rwx-segments)
endif
+ifeq ($(CONFIG_EFI_STUB),y)
+# ensure that the static EFI stub library will be pulled in, even if it is
+# never referenced explicitly from the startup code
+LDFLAGS_vmlinux += -u efi_pe_entry
+endif
LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 8a02a15..f4e22ef 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -269,10 +269,6 @@ SYM_FUNC_START_LOCAL(efi32_entry)
jmp startup_32
SYM_FUNC_END(efi32_entry)
-#define ST32_boottime 60 // offsetof(efi_system_table_32_t, boottime)
-#define BS32_handle_protocol 88 // offsetof(efi_boot_services_32_t, handle_protocol)
-#define LI32_image_base 32 // offsetof(efi_loaded_image_32_t, image_base)
-
/*
* efi_status_t efi32_pe_entry(efi_handle_t image_handle,
* efi_system_table_32_t *sys_table)
@@ -280,8 +276,6 @@ SYM_FUNC_END(efi32_entry)
SYM_FUNC_START(efi32_pe_entry)
pushl %ebp
movl %esp, %ebp
- pushl %eax // dummy push to allocate loaded_image
-
pushl %ebx // save callee-save registers
pushl %edi
@@ -290,48 +284,8 @@ SYM_FUNC_START(efi32_pe_entry)
movl $0x80000003, %eax // EFI_UNSUPPORTED
jnz 2f
- call 1f
-1: pop %ebx
-
- /* Get the loaded image protocol pointer from the image handle */
- leal -4(%ebp), %eax
- pushl %eax // &loaded_image
- leal (loaded_image_proto - 1b)(%ebx), %eax
- pushl %eax // pass the GUID address
- pushl 8(%ebp) // pass the image handle
-
- /*
- * Note the alignment of the stack frame.
- * sys_table
- * handle <-- 16-byte aligned on entry by ABI
- * return address
- * frame pointer
- * loaded_image <-- local variable
- * saved %ebx <-- 16-byte aligned here
- * saved %edi
- * &loaded_image
- * &loaded_image_proto
- * handle <-- 16-byte aligned for call to handle_protocol
- */
-
- movl 12(%ebp), %eax // sys_table
- movl ST32_boottime(%eax), %eax // sys_table->boottime
- call *BS32_handle_protocol(%eax) // sys_table->boottime->handle_protocol
- addl $12, %esp // restore argument space
- testl %eax, %eax
- jnz 2f
-
movl 8(%ebp), %ecx // image_handle
movl 12(%ebp), %edx // sys_table
- movl -4(%ebp), %esi // loaded_image
- movl LI32_image_base(%esi), %esi // loaded_image->image_base
- leal (startup_32 - 1b)(%ebx), %ebp // runtime address of startup_32
- /*
- * We need to set the image_offset variable here since startup_32() will
- * use it before we get to the 64-bit efi_pe_entry() in C code.
- */
- subl %esi, %ebp // calculate image_offset
- movl %ebp, (image_offset - 1b)(%ebx) // save image_offset
xorl %esi, %esi
jmp efi32_entry // pass %ecx, %edx, %esi
// no other registers remain live
@@ -350,15 +304,6 @@ SYM_FUNC_START_NOALIGN(efi64_stub_entry)
SYM_FUNC_END(efi64_stub_entry)
#endif
- .section ".rodata"
- /* EFI loaded image protocol GUID */
- .balign 4
-SYM_DATA_START_LOCAL(loaded_image_proto)
- .long 0x5b1b31a1
- .word 0x9562, 0x11d2
- .byte 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b
-SYM_DATA_END(loaded_image_proto)
-
.data
.balign 8
SYM_DATA_START_LOCAL(efi32_boot_gdt)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 3af4a38..1cfe980 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -84,19 +84,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
leal startup_32@GOTOFF(%edx), %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32() will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry() will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl image_offset@GOTOFF(%edx), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 28f4605..bf4a10a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -146,19 +146,6 @@ SYM_FUNC_START(startup_32)
#ifdef CONFIG_RELOCATABLE
movl %ebp, %ebx
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- subl rva(image_offset)(%ebp), %ebx
-#endif
-
movl BP_kernel_alignment(%esi), %eax
decl %eax
addl %eax, %ebx
@@ -335,20 +322,6 @@ SYM_CODE_START(startup_64)
/* Start with the delta to where the kernel will run at. */
#ifdef CONFIG_RELOCATABLE
leaq startup_32(%rip) /* - $startup_32 */, %rbp
-
-#ifdef CONFIG_EFI_STUB
-/*
- * If we were loaded via the EFI LoadImage service, startup_32 will be at an
- * offset to the start of the space allocated for the image. efi_pe_entry will
- * set up image_offset to tell us where the image actually starts, so that we
- * can use the full available buffer.
- * image_offset = startup_32 - image_base
- * Otherwise image_offset will be zero and has no effect on the calculations.
- */
- movl image_offset(%rip), %eax
- subq %rax, %rbp
-#endif
-
movl BP_kernel_alignment(%rsi), %eax
decl %eax
addq %rax, %rbp
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 8b4be7c..b0994ae 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,6 +90,8 @@ static inline void efi_fpu_end(void)
}
#ifdef CONFIG_X86_32
+#define EFI_X86_KERNEL_ALLOC_LIMIT (SZ_512M - 1)
+
#define arch_efi_call_virt_setup() \
({ \
efi_fpu_begin(); \
@@ -103,8 +105,7 @@ static inline void efi_fpu_end(void)
})
#else /* !CONFIG_X86_32 */
-
-#define EFI_LOADER_SIGNATURE "EL64"
+#define EFI_X86_KERNEL_ALLOC_LIMIT EFI_ALLOC_LIMIT
extern asmlinkage u64 __efi_call(void *fp, ...);
@@ -218,6 +219,8 @@ efi_status_t efi_set_virtual_address_map(unsigned long memory_map_size,
#ifdef CONFIG_EFI_MIXED
+#define EFI_ALLOC_LIMIT (efi_is_64bit() ? ULONG_MAX : U32_MAX)
+
#define ARCH_HAS_EFISTUB_WRAPPERS
static inline bool efi_is_64bit(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index b97d239..5b4a1ce 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -164,6 +164,7 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
}
extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern void sev_enable(struct boot_params *bp);
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs)
{
@@ -218,6 +219,7 @@ static inline void sev_es_ist_exit(void) { }
static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { return 0; }
static inline void sev_es_nmi_complete(void) { }
static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline void sev_enable(struct boot_params *bp) { }
static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate) { return 0; }
static inline int rmpadjust(unsigned long vaddr, bool rmp_psize, unsigned long attrs) { return 0; }
static inline void setup_ghcb(void) { }
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index b4685da..e976288 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -15,17 +15,14 @@
#include <asm/setup.h>
#include <asm/desc.h>
#include <asm/boot.h>
+#include <asm/kaslr.h>
#include <asm/sev.h>
#include "efistub.h"
#include "x86-stub.h"
-/* Maximum physical address for 64-bit kernel with 4-level paging */
-#define MAXMEM_X86_64_4LEVEL (1ull << 46)
-
const efi_system_table_t *efi_system_table;
const efi_dxe_services_table_t *efi_dxe_table;
-u32 image_offset __section(".data");
static efi_loaded_image_t *image = NULL;
static efi_memory_attribute_protocol_t *memattr;
@@ -287,28 +284,6 @@ void efi_adjust_memory_range_protection(unsigned long start,
}
}
-extern const u8 startup_32[], startup_64[];
-
-static void
-setup_memory_protection(unsigned long image_base, unsigned long image_size)
-{
-#ifdef CONFIG_64BIT
- if (image_base != (unsigned long)startup_32)
- efi_adjust_memory_range_protection(image_base, image_size);
-#else
- /*
- * Clear protection flags on a whole range of possible
- * addresses used for KASLR. We don't need to do that
- * on x86_64, since KASLR/extraction is performed after
- * dedicated identity page tables are built and we only
- * need to remove possible protection on relocated image
- * itself disregarding further relocations.
- */
- efi_adjust_memory_range_protection(LOAD_PHYSICAL_ADDR,
- KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR);
-#endif
-}
-
static void setup_unaccepted_memory(void)
{
efi_guid_t mem_acceptance_proto = OVMF_SEV_MEMORY_ACCEPTANCE_PROTOCOL_GUID;
@@ -334,9 +309,7 @@ static void setup_unaccepted_memory(void)
static const efi_char16_t apple[] = L"Apple";
-static void setup_quirks(struct boot_params *boot_params,
- unsigned long image_base,
- unsigned long image_size)
+static void setup_quirks(struct boot_params *boot_params)
{
efi_char16_t *fw_vendor = (efi_char16_t *)(unsigned long)
efi_table_attr(efi_system_table, fw_vendor);
@@ -345,9 +318,6 @@ static void setup_quirks(struct boot_params *boot_params,
if (IS_ENABLED(CONFIG_APPLE_PROPERTIES))
retrieve_apple_device_properties(boot_params);
}
-
- if (IS_ENABLED(CONFIG_EFI_DXE_MEM_ATTRIBUTES))
- setup_memory_protection(image_base, image_size);
}
/*
@@ -500,7 +470,6 @@ efi_status_t __efiapi efi_pe_entry(efi_handle_t handle,
}
image_base = efi_table_attr(image, image_base);
- image_offset = (void *)startup_32 - image_base;
status = efi_allocate_pages(sizeof(struct boot_params),
(unsigned long *)&boot_params, ULONG_MAX);
@@ -804,6 +773,61 @@ static bool have_unsupported_snp_features(void)
return false;
}
+static void efi_get_seed(void *seed, int size)
+{
+ efi_get_random_bytes(size, seed);
+
+ /*
+ * This only updates seed[0] when running on 32-bit, but in that case,
+ * seed[1] is not used anyway, as there is no virtual KASLR on 32-bit.
+ */
+ *(unsigned long *)seed ^= kaslr_get_random_long("EFI");
+}
+
+static void error(char *str)
+{
+ efi_warn("Decompression failed: %s\n", str);
+}
+
+static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
+{
+ unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
+ unsigned long addr, alloc_size, entry;
+ efi_status_t status;
+ u32 seed[2] = {};
+
+ /* determine the required size of the allocation */
+ alloc_size = ALIGN(max_t(unsigned long, output_len, kernel_total_size),
+ MIN_KERNEL_ALIGN);
+
+ if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && !efi_nokaslr) {
+ u64 range = KERNEL_IMAGE_SIZE - LOAD_PHYSICAL_ADDR - kernel_total_size;
+
+ efi_get_seed(seed, sizeof(seed));
+
+ virt_addr += (range * seed[1]) >> 32;
+ virt_addr &= ~(CONFIG_PHYSICAL_ALIGN - 1);
+ }
+
+ status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
+ seed[0], EFI_LOADER_CODE,
+ EFI_X86_KERNEL_ALLOC_LIMIT);
+ if (status != EFI_SUCCESS)
+ return status;
+
+ entry = decompress_kernel((void *)addr, virt_addr, error);
+ if (entry == ULONG_MAX) {
+ efi_free(alloc_size, addr);
+ return EFI_LOAD_ERROR;
+ }
+
+ *kernel_entry = addr + entry;
+
+ efi_adjust_memory_range_protection(addr, kernel_total_size);
+
+ return EFI_SUCCESS;
+}
+
static void __noreturn enter_kernel(unsigned long kernel_addr,
struct boot_params *boot_params)
{
@@ -823,10 +847,9 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
struct boot_params *boot_params)
{
efi_guid_t guid = EFI_MEMORY_ATTRIBUTE_PROTOCOL_GUID;
- unsigned long bzimage_addr = (unsigned long)startup_32;
- unsigned long buffer_start, buffer_end;
struct setup_header *hdr = &boot_params->hdr;
const struct linux_efi_initrd *initrd = NULL;
+ unsigned long kernel_entry;
efi_status_t status;
efi_system_table = sys_table_arg;
@@ -855,60 +878,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
- /*
- * If the kernel isn't already loaded at a suitable address,
- * relocate it.
- *
- * It must be loaded above LOAD_PHYSICAL_ADDR.
- *
- * The maximum address for 64-bit is 1 << 46 for 4-level paging. This
- * is defined as the macro MAXMEM, but unfortunately that is not a
- * compile-time constant if 5-level paging is configured, so we instead
- * define our own macro for use here.
- *
- * For 32-bit, the maximum address is complicated to figure out, for
- * now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
- * KASLR uses.
- *
- * Also relocate it if image_offset is zero, i.e. the kernel wasn't
- * loaded by LoadImage, but rather by a bootloader that called the
- * handover entry. The reason we must always relocate in this case is
- * to handle the case of systemd-boot booting a unified kernel image,
- * which is a PE executable that contains the bzImage and an initrd as
- * COFF sections. The initrd section is placed after the bzImage
- * without ensuring that there are at least init_size bytes available
- * for the bzImage, and thus the compressed kernel's startup code may
- * overwrite the initrd unless it is moved out of the way.
- */
-
- buffer_start = ALIGN(bzimage_addr - image_offset,
- hdr->kernel_alignment);
- buffer_end = buffer_start + hdr->init_size;
-
- if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
- (IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
- (IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0)) {
- extern char _bss[];
-
- status = efi_relocate_kernel(&bzimage_addr,
- (unsigned long)_bss - bzimage_addr,
- hdr->init_size,
- hdr->pref_address,
- hdr->kernel_alignment,
- LOAD_PHYSICAL_ADDR);
- if (status != EFI_SUCCESS) {
- efi_err("efi_relocate_kernel() failed!\n");
- goto fail;
- }
- /*
- * Now that we've copied the kernel elsewhere, we no longer
- * have a set up block before startup_32(), so reset image_offset
- * to zero in case it was set earlier.
- */
- image_offset = 0;
- }
-
#ifdef CONFIG_CMDLINE_BOOL
status = efi_parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
@@ -926,6 +895,12 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
}
+ status = efi_decompress_kernel(&kernel_entry);
+ if (status != EFI_SUCCESS) {
+ efi_err("Failed to decompress kernel\n");
+ goto fail;
+ }
+
/*
* At this point, an initrd may already have been loaded by the
* bootloader and passed via bootparams. We permit an initrd loaded
@@ -965,7 +940,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
setup_efi_pci(boot_params);
- setup_quirks(boot_params, bzimage_addr, buffer_end - buffer_start);
+ setup_quirks(boot_params);
setup_unaccepted_memory();
@@ -975,12 +950,15 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
goto fail;
}
- efi_5level_switch();
+ /*
+ * Call the SEV init code while still running with the firmware's
+ * GDT/IDT, so #VC exceptions will be handled by EFI.
+ */
+ sev_enable(boot_params);
- if (IS_ENABLED(CONFIG_X86_64))
- bzimage_addr += startup_64 - startup_32;
+ efi_5level_switch();
- enter_kernel(bzimage_addr, boot_params);
+ enter_kernel(kernel_entry, boot_params);
fail:
efi_err("efi_stub_entry() failed!\n");
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: d7156b986d4cc0657fa6dc05c9fcf51c3d55a0fe
Gitweb: https://git.kernel.org/tip/d7156b986d4cc0657fa6dc05c9fcf51c3d55a0fe
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:03 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:40:36 +02:00
x86/efistub: Clear BSS in EFI handover protocol entrypoint
The so-called EFI handover protocol is value-add from the distros that
permits a loader to simply copy a PE kernel image into memory and call
an alternative entrypoint that is described by an embedded boot_params
structure.
Most implementations of this protocol do not bother to check the PE
header for minimum alignment, section placement, etc., and therefore also
don't clear the image's BSS, or even allocate enough memory for it.
Allocating more memory on the fly is rather difficult, but at least
clear the BSS region explicitly when entering in this manner, so that
the EFI stub code does not get confused by global variables that were
not zero-initialized correctly.
When booting in mixed mode, this BSS clearing must occur before any
global state is created, so clear it in the 32-bit asm entry point.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
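To see what is at stake, consider a zero-initialized global in the stub, such
as its own 'image' pointer (sketch; the memset() is the line added below):

    extern char _bss[], _ebss[];
    static efi_loaded_image_t *image = NULL;        /* placed in .bss */

    /*
     * Unless the loader zeroed .bss, 'image' may hold junk that is
     * indistinguishable from a valid pointer, hence the explicit:
     */
    memset(_bss, 0, _ebss - _bss);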
---
arch/x86/boot/compressed/efi_mixed.S | 14 +++++++++++++-
drivers/firmware/efi/libstub/x86-stub.c | 13 +++++++++++--
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 9308b59..8a02a15 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -142,6 +142,18 @@ SYM_FUNC_END(__efi64_thunk)
.code32
#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
SYM_FUNC_START(efi32_stub_entry)
+ call 1f
+1: popl %ecx
+
+ /* Clear BSS */
+ xorl %eax, %eax
+ leal (_bss - 1b)(%ecx), %edi
+ leal (_ebss - 1b)(%ecx), %ecx
+ subl %edi, %ecx
+ shrl $2, %ecx
+ cld
+ rep stosl
+
add $0x4, %esp /* Discard return address */
popl %ecx
popl %edx
@@ -334,7 +346,7 @@ SYM_FUNC_END(efi32_pe_entry)
.org efi32_stub_entry + 0x200
.code64
SYM_FUNC_START_NOALIGN(efi64_stub_entry)
- jmp efi_stub_entry
+ jmp efi_handover_entry
SYM_FUNC_END(efi64_stub_entry)
#endif
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 3f3b3ed..9247dbc 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -970,12 +970,21 @@ fail:
}
#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+void efi_handover_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params)
+{
+ extern char _bss[], _ebss[];
+
+ memset(_bss, 0, _ebss - _bss);
+ efi_stub_entry(handle, sys_table_arg, boot_params);
+}
+
#ifndef CONFIG_EFI_MIXED
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
struct boot_params *boot_params);
-extern __alias(efi_stub_entry)
+extern __alias(efi_handover_entry)
void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
struct boot_params *boot_params);
#endif
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 8217ad0a435ff06d651d7298ea8ae8d72388179e
Gitweb: https://git.kernel.org/tip/8217ad0a435ff06d651d7298ea8ae8d72388179e
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:15 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:55:27 +02:00
decompress: Use 8 byte alignment
The ZSTD decompressor requires malloc() allocations to be 8-byte
aligned, so ensure that this is the case.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
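The idiom in the one-line change below is the usual power-of-two round-up: for
alignment a, (p + (a - 1)) & ~(a - 1) rounds p up to the next multiple of a.
With a = 8:

    malloc_ptr = (malloc_ptr + 7) & ~7;     /* align up to 8 bytes */

    /* e.g. 0x1001 -> 0x1008, 0x1007 -> 0x1008, 0x1008 -> 0x1008 */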
---
include/linux/decompress/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
index 9192986..ac86242 100644
--- a/include/linux/decompress/mm.h
+++ b/include/linux/decompress/mm.h
@@ -48,7 +48,7 @@ MALLOC_VISIBLE void *malloc(int size)
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
- malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
+ malloc_ptr = (malloc_ptr + 7) & ~7; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: d2d7a54f69b67cd0a30e0ebb5307cb2de625baac
Gitweb: https://git.kernel.org/tip/d2d7a54f69b67cd0a30e0ebb5307cb2de625baac
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:00 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:36:06 +02:00
x86/efistub: Branch straight to kernel entry point from C code
Instead of returning to the calling code in assembler that does nothing
more than perform an indirect call with the boot_params pointer in
register ESI/RSI, perform the jump directly from the EFI stub C code.
This will allow the asm entrypoint code to be dropped entirely in
subsequent patches.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
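The jump is done with inline asm, where the register constraints encode the
boot protocol (annotated copy of the snippet added below):

    /*
     * "S" pins boot_params into %esi/%rsi, the register the decompressed
     * kernel's startup code expects; "r" lets the compiler pick any
     * register for the jump target.
     */
    asm("jmp *%0" : : "r"(kernel_addr), "S"(boot_params));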
---
drivers/firmware/efi/libstub/x86-stub.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 220be75..40a10db 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -290,7 +290,7 @@ adjust_memory_range_protection(unsigned long start, unsigned long size)
#define TRAMPOLINE_PLACEMENT_BASE ((128 - 8)*1024)
#define TRAMPOLINE_PLACEMENT_SIZE (640*1024 - (128 - 8)*1024)
-void startup_32(struct boot_params *boot_params);
+extern const u8 startup_32[], startup_64[];
static void
setup_memory_protection(unsigned long image_base, unsigned long image_size)
@@ -803,10 +803,19 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle)
return EFI_SUCCESS;
}
+static void __noreturn enter_kernel(unsigned long kernel_addr,
+ struct boot_params *boot_params)
+{
+ /* enter decompressed kernel with boot_params pointer in RSI/ESI */
+ asm("jmp *%0"::"r"(kernel_addr), "S"(boot_params));
+
+ unreachable();
+}
+
/*
- * On success, we return the address of startup_32, which has potentially been
- * relocated by efi_relocate_kernel.
- * On failure, we exit to the firmware via efi_exit instead of returning.
+ * On success, this routine will jump to the relocated image directly and never
+ * return. On failure, it will exit to the firmware via efi_exit() instead of
+ * returning.
*/
asmlinkage unsigned long efi_main(efi_handle_t handle,
efi_system_table_t *sys_table_arg,
@@ -950,7 +959,10 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
goto fail;
}
- return bzimage_addr;
+ if (IS_ENABLED(CONFIG_X86_64))
+ bzimage_addr += startup_64 - startup_32;
+
+ enter_kernel(bzimage_addr, boot_params);
fail:
efi_err("efi_main() failed!\n");
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 8b63cba746f86a754d66e302c43209cc9b9b6e39
Gitweb: https://git.kernel.org/tip/8b63cba746f86a754d66e302c43209cc9b9b6e39
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:04 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:41:04 +02:00
x86/decompressor: Store boot_params pointer in callee save register
Instead of pushing and popping %RSI several times to preserve the struct
boot_params pointer across the execution of the startup code, move it
into a callee save register before the first call into C, and copy it
back when needed.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
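This works because the SysV AMD64 ABI guarantees that %rbx, %rbp and
%r12-%r15 survive any call into C code. A hedged C rendering of the resulting
pattern, with startup_c() standing in for the asm that still does the real
work:

    static void startup_c(struct boot_params *bp)   /* bp arrived in %rsi */
    {
            load_stage1_idt();      /* no push/pop needed around calls */
            sev_enable(bp);
            paging_prepare(bp);
    }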
---
arch/x86/boot/compressed/head_64.S | 42 +++++++++++------------------
1 file changed, 16 insertions(+), 26 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index a3f764d..19bf810 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -405,10 +405,14 @@ SYM_CODE_START(startup_64)
lretq
.Lon_kernel_cs:
+ /*
+ * RSI holds a pointer to a boot_params structure provided by the
+ * loader, and this needs to be preserved across C function calls. So
+ * move it into a callee saved register.
+ */
+ movq %rsi, %r15
- pushq %rsi
call load_stage1_idt
- popq %rsi
#ifdef CONFIG_AMD_MEM_ENCRYPT
/*
@@ -419,12 +423,10 @@ SYM_CODE_START(startup_64)
* CPUID instructions being issued, so go ahead and do that now via
* sev_enable(), which will also handle the rest of the SEV-related
* detection/setup to ensure that has been done in advance of any dependent
- * code.
+ * code. Pass the boot_params pointer as the first argument.
*/
- pushq %rsi
- movq %rsi, %rdi /* real mode address */
+ movq %r15, %rdi
call sev_enable
- popq %rsi
#endif
/*
@@ -437,13 +439,10 @@ SYM_CODE_START(startup_64)
* - Non zero RDX means trampoline needs to enable 5-level
* paging.
*
- * RSI holds real mode data and needs to be preserved across
- * this function call.
+ * Pass the boot_params pointer as the first argument.
*/
- pushq %rsi
- movq %rsi, %rdi /* real mode address */
+ movq %r15, %rdi
call paging_prepare
- popq %rsi
/* Save the trampoline address in RCX */
movq %rax, %rcx
@@ -456,9 +455,9 @@ SYM_CODE_START(startup_64)
* because the architecture does not guarantee that GPRs will retain
* their full 64-bit values across a 32-bit mode switch.
*/
+ pushq %r15
pushq %rbp
pushq %rbx
- pushq %rsi
/*
* Push the 64-bit address of trampoline_return() onto the new stack.
@@ -475,9 +474,9 @@ SYM_CODE_START(startup_64)
lretq
trampoline_return:
/* Restore live 64-bit registers */
- popq %rsi
popq %rbx
popq %rbp
+ popq %r15
/* Restore the stack, the 32-bit trampoline uses its own stack */
leaq rva(boot_stack_end)(%rbx), %rsp
@@ -487,14 +486,9 @@ trampoline_return:
*
* RDI is address of the page table to use instead of page table
* in trampoline memory (if required).
- *
- * RSI holds real mode data and needs to be preserved across
- * this function call.
*/
- pushq %rsi
leaq rva(top_pgtable)(%rbx), %rdi
call cleanup_trampoline
- popq %rsi
/* Zero EFLAGS */
pushq $0
@@ -504,7 +498,6 @@ trampoline_return:
* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
*/
- pushq %rsi
leaq (_bss-8)(%rip), %rsi
leaq rva(_bss-8)(%rbx), %rdi
movl $(_bss - startup_32), %ecx
@@ -512,7 +505,6 @@ trampoline_return:
std
rep movsq
cld
- popq %rsi
/*
* The GDT may get overwritten either during the copy we just did or
@@ -544,30 +536,28 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq $3, %rcx
rep stosq
- pushq %rsi
call load_stage2_idt
/* Pass boot_params to initialize_identity_maps() */
- movq (%rsp), %rdi
+ movq %r15, %rdi
call initialize_identity_maps
- popq %rsi
/*
* Do the extraction, and jump to the new kernel..
*/
- pushq %rsi /* Save the real mode argument */
- movq %rsi, %rdi /* real mode address */
+ /* pass struct boot_params pointer */
+ movq %r15, %rdi
leaq boot_heap(%rip), %rsi /* malloc area for uncompression */
leaq input_data(%rip), %rdx /* input_data */
movl input_len(%rip), %ecx /* input_len */
movq %rbp, %r8 /* output target address */
movl output_len(%rip), %r9d /* decompressed length, end of relocs */
call extract_kernel /* returns kernel entry point in %rax */
- popq %rsi
/*
* Jump to the decompressed kernel.
*/
+ movq %r15, %rsi
jmp *%rax
SYM_FUNC_END(.Lrelocated)
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: e8972a76aa90c05a0078043413f806c02fcb3487
Gitweb: https://git.kernel.org/tip/e8972a76aa90c05a0078043413f806c02fcb3487
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:06 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:43:13 +02:00
x86/decompressor: Call trampoline as a normal function
Move the long return to switch to 32-bit mode into the trampoline code
so it can be called as an ordinary function. This will allow it to be
called directly from C code in a subsequent patch.
While at it, reorganize the code somewhat to keep the prologue and
epilogue of the function together, making the code a bit easier to
follow. Also, given that the trampoline is now entered in 64-bit mode, a
simple RIP-relative reference can be used to take the address of the
exit point.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
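From the caller's side the mode switch is now fully hidden behind an ordinary
call; a hedged C view, with tramp_addr standing in for the 32-bit addressable
copy of the trampoline:

    void (*trampoline)(void) =
            (void (*)(void))(tramp_addr + TRAMPOLINE_32BIT_CODE_OFFSET);

    trampoline();   /* drops to 32-bit mode and back; the stack and the
                       preserved registers are intact on return */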
---
arch/x86/boot/compressed/head_64.S | 79 ++++++++++++-----------------
arch/x86/boot/compressed/pgtable.h | 2 +-
2 files changed, 36 insertions(+), 45 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 19bf810..91b5eee 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -447,39 +447,8 @@ SYM_CODE_START(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
- /* Set up 32-bit addressable stack */
- leaq TRAMPOLINE_32BIT_STACK_END(%rcx), %rsp
-
- /*
- * Preserve live 64-bit registers on the stack: this is necessary
- * because the architecture does not guarantee that GPRs will retain
- * their full 64-bit values across a 32-bit mode switch.
- */
- pushq %r15
- pushq %rbp
- pushq %rbx
-
- /*
- * Push the 64-bit address of trampoline_return() onto the new stack.
- * It will be used by the trampoline to return to the main code. Due to
- * the 32-bit mode switch, it cannot be kept it in a register either.
- */
- leaq trampoline_return(%rip), %rdi
- pushq %rdi
-
- /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
- pushq $__KERNEL32_CS
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
- pushq %rax
- lretq
-trampoline_return:
- /* Restore live 64-bit registers */
- popq %rbx
- popq %rbp
- popq %r15
-
- /* Restore the stack, the 32-bit trampoline uses its own stack */
- leaq rva(boot_stack_end)(%rbx), %rsp
+ call *%rax
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -561,7 +530,6 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
jmp *%rax
SYM_FUNC_END(.Lrelocated)
- .code32
/*
* This is the 32-bit trampoline that will be copied over to low memory.
*
@@ -570,6 +538,39 @@ SYM_FUNC_END(.Lrelocated)
* Non zero RDX means trampoline needs to enable 5-level paging.
*/
SYM_CODE_START(trampoline_32bit_src)
+ /*
+ * Preserve live 64-bit registers on the stack: this is necessary
+ * because the architecture does not guarantee that GPRs will retain
+ * their full 64-bit values across a 32-bit mode switch.
+ */
+ pushq %r15
+ pushq %rbp
+ pushq %rbx
+
+ /* Set up 32-bit addressable stack and push the old RSP value */
+ leaq (TRAMPOLINE_32BIT_STACK_END - 8)(%rcx), %rbx
+ movq %rsp, (%rbx)
+ movq %rbx, %rsp
+
+ /* Take the address of the trampoline exit code */
+ leaq .Lret(%rip), %rbx
+
+ /* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
+ pushq $__KERNEL32_CS
+ leaq 0f(%rip), %rax
+ pushq %rax
+ lretq
+
+.Lret:
+ /* Restore the preserved 64-bit registers */
+ movq (%rsp), %rsp
+ popq %rbx
+ popq %rbp
+ popq %r15
+ retq
+
+ .code32
+0:
/* Set up data and stack segments */
movl $__KERNEL_DS, %eax
movl %eax, %ds
@@ -633,12 +634,9 @@ SYM_CODE_START(trampoline_32bit_src)
1:
movl %eax, %cr4
- /* Calculate address of paging_enabled() once we are executing in the trampoline */
- leal .Lpaging_enabled - trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_OFFSET(%ecx), %eax
-
/* Prepare the stack for far return to Long Mode */
pushl $__KERNEL_CS
- pushl %eax
+ pushl %ebx
/* Enable paging again. */
movl %cr0, %eax
@@ -648,12 +646,6 @@ SYM_CODE_START(trampoline_32bit_src)
lret
SYM_CODE_END(trampoline_32bit_src)
- .code64
-SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
- /* Return from the trampoline */
- retq
-SYM_FUNC_END(.Lpaging_enabled)
-
/*
* The trampoline code has a size limit.
* Make sure we fail to compile if the trampoline code grows
@@ -661,7 +653,6 @@ SYM_FUNC_END(.Lpaging_enabled)
*/
.org trampoline_32bit_src + TRAMPOLINE_32BIT_CODE_SIZE
- .code32
SYM_FUNC_START_LOCAL_NOALIGN(.Lno_longmode)
/* This isn't an x86-64 CPU, so hang intentionally, we cannot continue */
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index cc9b252..91dbb99 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -6,7 +6,7 @@
#define TRAMPOLINE_32BIT_PGTABLE_OFFSET 0
#define TRAMPOLINE_32BIT_CODE_OFFSET PAGE_SIZE
-#define TRAMPOLINE_32BIT_CODE_SIZE 0x80
+#define TRAMPOLINE_32BIT_CODE_SIZE 0xA0
#define TRAMPOLINE_32BIT_STACK_END TRAMPOLINE_32BIT_SIZE
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 64ef578b6b6866bec012544416946533444036c8
Gitweb: https://git.kernel.org/tip/64ef578b6b6866bec012544416946533444036c8
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:09 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:46:57 +02:00
x86/decompressor: Call trampoline directly from C code
Instead of returning to the asm calling code to invoke the trampoline,
call it straight from the C code that sets it up. That way, the struct
return type is no longer needed for returning two values, and the call
can be made conditional more cleanly in a subsequent patch.
This means that all callee save 64-bit registers need to be preserved
and restored, as their contents may not survive the legacy mode switch.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
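Condensed from the diff below, the C routine now copies the trampoline into
32-bit addressable memory and invokes it through a typed function pointer
(page table setup trimmed):

    void (*toggle_la57)(void *trampoline, bool enable_5lvl);

    toggle_la57 = memcpy(trampoline_32bit +
                         TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
                         &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);

    toggle_la57(trampoline_32bit, l5_required);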
---
arch/x86/boot/compressed/head_64.S | 31 ++++++++++---------------
arch/x86/boot/compressed/pgtable_64.c | 32 ++++++++++----------------
2 files changed, 26 insertions(+), 37 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 37fd7b7..cd6e3e1 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -430,25 +430,14 @@ SYM_CODE_START(startup_64)
#endif
/*
- * paging_prepare() sets up the trampoline and checks if we need to
- * enable 5-level paging.
- *
- * paging_prepare() returns a two-quadword structure which lands
- * into RDX:RAX:
- * - Address of the trampoline is returned in RAX.
- * - Non zero RDX means trampoline needs to enable 5-level
- * paging.
+ * configure_5level_paging() updates the number of paging levels using
+ * a trampoline in 32-bit addressable memory if the current number does
+ * not match the desired number.
*
* Pass the boot_params pointer as the first argument.
*/
movq %r15, %rdi
- call paging_prepare
-
- /* Pass the trampoline address and boolean flag as args #1 and #2 */
- movq %rax, %rdi
- movq %rdx, %rsi
- leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
- call *%rax
+ call configure_5level_paging
/*
* cleanup_trampoline() would restore trampoline memory.
@@ -543,11 +532,14 @@ SYM_FUNC_END(.Lrelocated)
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
/*
- * Preserve live 64-bit registers on the stack: this is necessary
- * because the architecture does not guarantee that GPRs will retain
- * their full 64-bit values across a 32-bit mode switch.
+ * Preserve callee save 64-bit registers on the stack: this is
+ * necessary because the architecture does not guarantee that GPRs will
+ * retain their full 64-bit values across a 32-bit mode switch.
*/
pushq %r15
+ pushq %r14
+ pushq %r13
+ pushq %r12
pushq %rbp
pushq %rbx
@@ -574,6 +566,9 @@ SYM_CODE_START(trampoline_32bit_src)
/* Restore the preserved 64-bit registers */
popq %rbx
popq %rbp
+ popq %r12
+ popq %r13
+ popq %r14
popq %r15
retq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 5198a05..f9cc86b 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -16,11 +16,6 @@ unsigned int __section(".data") pgdir_shift = 39;
unsigned int __section(".data") ptrs_per_p4d = 1;
#endif
-struct paging_config {
- unsigned long trampoline_start;
- unsigned long l5_required;
-};
-
/* Buffer to preserve trampoline memory */
static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
@@ -29,7 +24,7 @@ static char trampoline_save[TRAMPOLINE_32BIT_SIZE];
* purposes.
*
* Avoid putting the pointer into .bss as it will be cleared between
- * paging_prepare() and extract_kernel().
+ * configure_5level_paging() and extract_kernel().
*/
unsigned long *trampoline_32bit __section(".data");
@@ -106,13 +101,13 @@ static unsigned long find_trampoline_placement(void)
return bios_start - TRAMPOLINE_32BIT_SIZE;
}
-struct paging_config paging_prepare(void *rmode)
+asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- struct paging_config paging_config = {};
- void *tramp_code;
+ void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+ bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
- boot_params = rmode;
+ boot_params = bp;
/*
* Check if LA57 is desired and supported.
@@ -130,7 +125,7 @@ struct paging_config paging_prepare(void *rmode)
!cmdline_find_option_bool("no5lvl") &&
native_cpuid_eax(0) >= 7 &&
(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) {
- paging_config.l5_required = 1;
+ l5_required = true;
/* Initialize variables for 5-level paging */
__pgtable_l5_enabled = 1;
@@ -138,9 +133,7 @@ struct paging_config paging_prepare(void *rmode)
ptrs_per_p4d = 512;
}
- paging_config.trampoline_start = find_trampoline_placement();
-
- trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
+ trampoline_32bit = (unsigned long *)find_trampoline_placement();
/* Preserve trampoline memory */
memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
@@ -149,7 +142,7 @@ struct paging_config paging_prepare(void *rmode)
memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
/* Copy trampoline code in place */
- tramp_code = memcpy(trampoline_32bit +
+ toggle_la57 = memcpy(trampoline_32bit +
TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
@@ -159,7 +152,8 @@ struct paging_config paging_prepare(void *rmode)
* immediate absolute address, which needs to be adjusted based on the
* placement of the trampoline.
*/
- *(u32 *)(tramp_code + trampoline_ljmp_imm_offset) += (unsigned long)tramp_code;
+ *(u32 *)((u8 *)toggle_la57 + trampoline_ljmp_imm_offset) +=
+ (unsigned long)toggle_la57;
/*
* The code below prepares page table in trampoline memory.
@@ -175,10 +169,10 @@ struct paging_config paging_prepare(void *rmode)
* We are not going to use the page table in trampoline memory if we
* are already in the desired paging mode.
*/
- if (paging_config.l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+ if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
goto out;
- if (paging_config.l5_required) {
+ if (l5_required) {
/*
* For 4- to 5-level paging transition, set up current CR3 as
* the first and the only entry in a new top-level page table.
@@ -201,7 +195,7 @@ struct paging_config paging_prepare(void *rmode)
}
out:
- return paging_config;
+ toggle_la57(trampoline_32bit, l5_required);
}
void cleanup_trampoline(void *pgtable)
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 03dda95137d3247564854ad9032c0354273a159d
Gitweb: https://git.kernel.org/tip/03dda95137d3247564854ad9032c0354273a159d
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:12 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:51:17 +02:00
x86/decompressor: Merge trampoline cleanup with switching code
Now that the trampoline setup code and the actual invocation of it are
all done from the C routine, the trampoline cleanup can be merged into
it as well, instead of returning to asm just to call another C function.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
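With the cleanup folded in, the tail of configure_5level_paging() becomes (as
in the diff below):

    toggle_la57(trampoline_32bit);

    /* Move the top level page table out of trampoline memory. */
    memcpy(pgtable, trampoline_32bit, PAGE_SIZE);
    native_write_cr3((unsigned long)pgtable);

    /* Restore trampoline memory */
    memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);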
---
arch/x86/boot/compressed/head_64.S | 14 ++++----------
arch/x86/boot/compressed/pgtable_64.c | 18 ++++--------------
2 files changed, 8 insertions(+), 24 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index afdaf8c..fb0e562 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -430,20 +430,14 @@ SYM_CODE_START(startup_64)
* a trampoline in 32-bit addressable memory if the current number does
* not match the desired number.
*
- * Pass the boot_params pointer as the first argument.
+ * Pass the boot_params pointer as the first argument. The second
+ * argument is the relocated address of the page table to use instead
+ * of the page table in trampoline memory (if required).
*/
movq %r15, %rdi
+ leaq rva(top_pgtable)(%rbx), %rsi
call configure_5level_paging
- /*
- * cleanup_trampoline() would restore trampoline memory.
- *
- * RDI is address of the page table to use instead of page table
- * in trampoline memory (if required).
- */
- leaq rva(top_pgtable)(%rbx), %rdi
- call cleanup_trampoline
-
/* Zero EFLAGS */
pushq $0
popfq
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index eab4e6b..7939eb6 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -101,7 +101,7 @@ static unsigned long find_trampoline_placement(void)
return bios_start - TRAMPOLINE_32BIT_SIZE;
}
-asmlinkage void configure_5level_paging(struct boot_params *bp)
+asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
{
void (*toggle_la57)(void *cr3);
bool l5_required = false;
@@ -191,22 +191,12 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
}
toggle_la57(trampoline_32bit);
-}
-
-void cleanup_trampoline(void *pgtable)
-{
- void *trampoline_pgtable;
-
- trampoline_pgtable = trampoline_32bit;
/*
- * Move the top level page table out of trampoline memory,
- * if it's there.
+ * Move the top level page table out of trampoline memory.
*/
- if ((void *)__native_read_cr3() == trampoline_pgtable) {
- memcpy(pgtable, trampoline_pgtable, PAGE_SIZE);
- native_write_cr3((unsigned long)pgtable);
- }
+ memcpy(pgtable, trampoline_32bit, PAGE_SIZE);
+ native_write_cr3((unsigned long)pgtable);
/* Restore trampoline memory */
memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE);
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 12792064587623065250069d1df980e2c9ac3e67
Gitweb: https://git.kernel.org/tip/12792064587623065250069d1df980e2c9ac3e67
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:02 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:39:00 +02:00
x86/decompressor: Avoid magic offsets for EFI handover entrypoint
The native 32-bit or 64-bit EFI handover protocol entrypoint offset
relative to the respective startup_32/64 address is described in
boot_params as handover_offset, so that the special Linux/x86 aware EFI
loader can find it there.
When mixed mode is enabled, this single field has to describe this
offset for both the 32-bit and 64-bit entrypoints, so their respective
relative offsets have to be identical. Given that startup_32 and
startup_64 are 0x200 bytes apart, and the EFI handover entrypoint
resides at a fixed offset, the 32-bit and 64-bit versions of those
entrypoints must be exactly 0x200 bytes apart as well.
Currently, hard-coded fixed offsets are used to ensure this, but it is
sufficient to emit the 64-bit entrypoint 0x200 bytes after the 32-bit
one, wherever it happens to reside. This allows this code (which is now
EFI mixed mode specific) to be moved into efi_mixed.S and out of the
startup code in head_64.S.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
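Written out, the invariant that the .org directive maintains is:

    /*
     * One handover_offset field must serve both entrypoints:
     *
     *    efi32_stub_entry - startup_32 == efi64_stub_entry - startup_64
     *
     * and since startup_64 == startup_32 + 0x200, it suffices that
     *
     *    efi64_stub_entry == efi32_stub_entry + 0x200
     *
     * which ".org efi32_stub_entry + 0x200" pins down, wherever
     * efi32_stub_entry itself ends up.
     */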
---
arch/x86/boot/compressed/efi_mixed.S | 20 +++++++++++++++++++-
arch/x86/boot/compressed/head_64.S | 18 ------------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index dcc562c..9308b59 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -140,6 +140,16 @@ SYM_FUNC_START(__efi64_thunk)
SYM_FUNC_END(__efi64_thunk)
.code32
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+SYM_FUNC_START(efi32_stub_entry)
+ add $0x4, %esp /* Discard return address */
+ popl %ecx
+ popl %edx
+ popl %esi
+ jmp efi32_entry
+SYM_FUNC_END(efi32_stub_entry)
+#endif
+
/*
* EFI service pointer must be in %edi.
*
@@ -220,7 +230,7 @@ SYM_FUNC_END(efi_enter32)
* stub may still exit and return to the firmware using the Exit() EFI boot
* service.]
*/
-SYM_FUNC_START(efi32_entry)
+SYM_FUNC_START_LOCAL(efi32_entry)
call 1f
1: pop %ebx
@@ -320,6 +330,14 @@ SYM_FUNC_START(efi32_pe_entry)
RET
SYM_FUNC_END(efi32_pe_entry)
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+ .org efi32_stub_entry + 0x200
+ .code64
+SYM_FUNC_START_NOALIGN(efi64_stub_entry)
+ jmp efi_stub_entry
+SYM_FUNC_END(efi64_stub_entry)
+#endif
+
.section ".rodata"
/* EFI loaded image protocol GUID */
.balign 4
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index e688020..a3f764d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -294,17 +294,6 @@ SYM_FUNC_START(startup_32)
lret
SYM_FUNC_END(startup_32)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x190
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp /* Discard return address */
- popl %ecx
- popl %edx
- popl %esi
- jmp efi32_entry
-SYM_FUNC_END(efi32_stub_entry)
-#endif
-
.code64
.org 0x200
SYM_CODE_START(startup_64)
@@ -542,13 +531,6 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
- .org 0x390
-SYM_FUNC_START(efi64_stub_entry)
- jmp efi_stub_entry
-SYM_FUNC_END(efi64_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: df9215f15206c2a81909ccf60f21d170801dce38
Gitweb: https://git.kernel.org/tip/df9215f15206c2a81909ccf60f21d170801dce38
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:01 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:37:09 +02:00
x86/efistub: Simplify and clean up handover entry code
Now that the EFI entry code in assembler is only used by the optional
and deprecated EFI handover protocol, and given that the EFI stub C code
no longer returns to it, most of it can simply be dropped.
While at it, clarify the symbol naming, by merging efi_main() and
efi_stub_entry(), making the latter the shared entry point for all
different boot modes that enter via the EFI stub.
The efi32_stub_entry() and efi64_stub_entry() names are referenced
explicitly by the tooling that populates the setup header, so these must
be retained, but can be emitted as aliases of efi_stub_entry() where
appropriate.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
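The aliasing relies on the compiler's alias attribute, making both setup
header symbols resolve to the single C implementation (pattern as in the diff
below):

    extern __alias(efi_stub_entry)
    void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
                          struct boot_params *boot_params);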
---
Documentation/arch/x86/boot.rst | 2 +-
arch/x86/boot/compressed/efi_mixed.S | 22 ++++++++++++----------
arch/x86/boot/compressed/head_32.S | 11 -----------
arch/x86/boot/compressed/head_64.S | 12 ++----------
drivers/firmware/efi/libstub/x86-stub.c | 20 ++++++++++++++++----
5 files changed, 31 insertions(+), 36 deletions(-)
diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
index 33520ec..cdbca15 100644
--- a/Documentation/arch/x86/boot.rst
+++ b/Documentation/arch/x86/boot.rst
@@ -1417,7 +1417,7 @@ execution context provided by the EFI firmware.
The function prototype for the handover entry point looks like this::
- efi_main(void *handle, efi_system_table_t *table, struct boot_params *bp)
+ efi_stub_entry(void *handle, efi_system_table_t *table, struct boot_params *bp)
'handle' is the EFI image handle passed to the boot loader by the EFI
firmware, 'table' is the EFI system table - these are the first two
diff --git a/arch/x86/boot/compressed/efi_mixed.S b/arch/x86/boot/compressed/efi_mixed.S
index 4ca70bf..dcc562c 100644
--- a/arch/x86/boot/compressed/efi_mixed.S
+++ b/arch/x86/boot/compressed/efi_mixed.S
@@ -26,8 +26,8 @@
* When booting in 64-bit mode on 32-bit EFI firmware, startup_64_mixed_mode()
* is the first thing that runs after switching to long mode. Depending on
* whether the EFI handover protocol or the compat entry point was used to
- * enter the kernel, it will either branch to the 64-bit EFI handover
- * entrypoint at offset 0x390 in the image, or to the 64-bit EFI PE/COFF
+ * enter the kernel, it will either branch to the common 64-bit EFI stub
+ * entrypoint efi_stub_entry() directly, or via the 64-bit EFI PE/COFF
* entrypoint efi_pe_entry(). In the former case, the bootloader must provide a
* struct bootparams pointer as the third argument, so the presence of such a
* pointer is used to disambiguate.
@@ -37,21 +37,23 @@
* | efi32_pe_entry |---->| | | +-----------+--+
* +------------------+ | | +------+----------------+ |
* | startup_32 |---->| startup_64_mixed_mode | |
- * +------------------+ | | +------+----------------+ V
- * | efi32_stub_entry |---->| | | +------------------+
- * +------------------+ +------------+ +---->| efi64_stub_entry |
- * +-------------+----+
- * +------------+ +----------+ |
- * | startup_64 |<----| efi_main |<--------------+
- * +------------+ +----------+
+ * +------------------+ | | +------+----------------+ |
+ * | efi32_stub_entry |---->| | | |
+ * +------------------+ +------------+ | |
+ * V |
+ * +------------+ +----------------+ |
+ * | startup_64 |<----| efi_stub_entry |<--------+
+ * +------------+ +----------------+
*/
SYM_FUNC_START(startup_64_mixed_mode)
lea efi32_boot_args(%rip), %rdx
mov 0(%rdx), %edi
mov 4(%rdx), %esi
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
mov 8(%rdx), %edx // saved bootparams pointer
test %edx, %edx
- jnz efi64_stub_entry
+ jnz efi_stub_entry
+#endif
/*
* efi_pe_entry uses MS calling convention, which requires 32 bytes of
* shadow space on the stack even if all arguments are passed in
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index 987ae72..8876ffe 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -150,17 +150,6 @@ SYM_FUNC_START(startup_32)
jmp *%eax
SYM_FUNC_END(startup_32)
-#ifdef CONFIG_EFI_STUB
-SYM_FUNC_START(efi32_stub_entry)
- add $0x4, %esp
- movl 8(%esp), %esi /* save boot_params pointer */
- call efi_main
- /* efi_main returns the possibly relocated address of startup_32 */
- jmp *%eax
-SYM_FUNC_END(efi32_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi32_stub_entry)
-#endif
-
.text
SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index f732426..e688020 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -542,19 +542,11 @@ trampoline_return:
jmp *%rax
SYM_CODE_END(startup_64)
-#ifdef CONFIG_EFI_STUB
-#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#if IS_ENABLED(CONFIG_EFI_MIXED) && IS_ENABLED(CONFIG_EFI_HANDOVER_PROTOCOL)
.org 0x390
-#endif
SYM_FUNC_START(efi64_stub_entry)
- and $~0xf, %rsp /* realign the stack */
- movq %rdx, %rbx /* save boot_params pointer */
- call efi_main
- movq %rbx,%rsi
- leaq rva(startup_64)(%rax), %rax
- jmp *%rax
+ jmp efi_stub_entry
SYM_FUNC_END(efi64_stub_entry)
-SYM_FUNC_ALIAS(efi_stub_entry, efi64_stub_entry)
#endif
.text
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 40a10db..3f3b3ed 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -817,9 +817,9 @@ static void __noreturn enter_kernel(unsigned long kernel_addr,
* return. On failure, it will exit to the firmware via efi_exit() instead of
* returning.
*/
-asmlinkage unsigned long efi_main(efi_handle_t handle,
- efi_system_table_t *sys_table_arg,
- struct boot_params *boot_params)
+void __noreturn efi_stub_entry(efi_handle_t handle,
+ efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params)
{
unsigned long bzimage_addr = (unsigned long)startup_32;
unsigned long buffer_start, buffer_end;
@@ -964,7 +964,19 @@ asmlinkage unsigned long efi_main(efi_handle_t handle,
enter_kernel(bzimage_addr, boot_params);
fail:
- efi_err("efi_main() failed!\n");
+ efi_err("efi_stub_entry() failed!\n");
efi_exit(handle, status);
}
+
+#ifdef CONFIG_EFI_HANDOVER_PROTOCOL
+#ifndef CONFIG_EFI_MIXED
+extern __alias(efi_stub_entry)
+void efi32_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+
+extern __alias(efi_stub_entry)
+void efi64_stub_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg,
+ struct boot_params *boot_params);
+#endif
+#endif
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 83381519352d6b5b3e429bf72aaab907480cb6b6
Gitweb: https://git.kernel.org/tip/83381519352d6b5b3e429bf72aaab907480cb6b6
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:17 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:59:13 +02:00
x86/decompressor: Factor out kernel decompression and relocation
Factor out the sequence that invokes the decompressor, parses the ELF
image and applies the relocations, so that it can be called directly
from the EFI stub.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
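A minimal usage sketch of the new helper; jump_to() and the fail label are
hypothetical, and error() is the caller-supplied reporting hook:

    unsigned long entry;

    entry = decompress_kernel(outbuf, virt_addr, error);
    if (entry == ULONG_MAX)
            goto fail;              /* decompression or ELF parsing failed */

    jump_to(outbuf + entry);        /* entry is an offset into outbuf */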
---
arch/x86/boot/compressed/misc.c | 28 +++++++++++++++++++++++-----
arch/x86/include/asm/boot.h | 8 ++++++++
2 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 2d91d56..f711f2a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -330,11 +330,33 @@ static size_t parse_elf(void *output)
return ehdr.e_entry - LOAD_PHYSICAL_ADDR;
}
+const unsigned long kernel_total_size = VO__end - VO__text;
+
static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
extern unsigned char input_data[];
extern unsigned int input_len, output_len;
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+ void (*error)(char *x))
+{
+ unsigned long entry;
+
+ if (!free_mem_ptr) {
+ free_mem_ptr = (unsigned long)boot_heap;
+ free_mem_end_ptr = (unsigned long)boot_heap + sizeof(boot_heap);
+ }
+
+ if (__decompress(input_data, input_len, NULL, NULL, outbuf, output_len,
+ NULL, error) < 0)
+ return ULONG_MAX;
+
+ entry = parse_elf(outbuf);
+ handle_relocations(outbuf, output_len, virt_addr);
+
+ return entry;
+}
+
/*
* The compressed kernel image (ZO), has been moved so that its position
* is against the end of the buffer used to hold the uncompressed kernel
@@ -354,7 +376,6 @@ extern unsigned int input_len, output_len;
*/
asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
{
- const unsigned long kernel_total_size = VO__end - VO__text;
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
memptr heap = (memptr)boot_heap;
unsigned long needed_size;
@@ -463,10 +484,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
accept_memory(__pa(output), __pa(output) + needed_size);
}
- __decompress(input_data, input_len, NULL, NULL, output, output_len,
- NULL, error);
- entry_offset = parse_elf(output);
- handle_relocations(output, output_len, virt_addr);
+ entry_offset = decompress_kernel(output, virt_addr, error);
debug_putstr("done.\nBooting the kernel (entry_offset: 0x");
debug_puthex(entry_offset);
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280..4ae1433 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -62,4 +62,12 @@
# define BOOT_STACK_SIZE 0x1000
#endif
+#ifndef __ASSEMBLY__
+extern unsigned int output_len;
+extern const unsigned long kernel_total_size;
+
+unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
+ void (*error)(char *x));
+#endif
+
#endif /* _ASM_X86_BOOT_H */
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: f97b67a773cd84bd8b55c0a0ec32448a87fc56bb
Gitweb: https://git.kernel.org/tip/f97b67a773cd84bd8b55c0a0ec32448a87fc56bb
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:10 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:48:09 +02:00
x86/decompressor: Only call the trampoline when changing paging levels
Since the current and desired number of paging levels are known when the
trampoline is being prepared, avoid calling the trampoline at all when
doing so would not change the number of paging levels.
Given that the CPU is already running in long mode, the PAE and LA57
settings are necessarily consistent with the currently active page
tables, and other fields in CR4 will be initialized by the startup code
in the kernel proper. So limit the manipulation of CR4 to toggling the
LA57 bit, which is the only thing that really needs doing at this point
in the boot. This also means that there is no need to pass the value of
l5_required to toggle_la57(), as it will not be called unless CR4.LA57
actually needs to be toggled.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 45 ++------------------------
arch/x86/boot/compressed/pgtable_64.c | 22 +++++--------
2 files changed, 13 insertions(+), 54 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index cd6e3e1..8730b1d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -387,10 +387,6 @@ SYM_CODE_START(startup_64)
* For the trampoline, we need the top page table to reside in lower
* memory as we don't have a way to load 64-bit values into CR3 in
* 32-bit mode.
- *
- * We go though the trampoline even if we don't have to: if we're
- * already in a desired paging mode. This way the trampoline code gets
- * tested on every boot.
*/
/* Make sure we have GDT with 32-bit code segment */
@@ -526,8 +522,7 @@ SYM_FUNC_END(.Lrelocated)
*
* Return address is at the top of the stack (might be above 4G).
* The first argument (EDI) contains the 32-bit addressable base of the
- * trampoline memory. A non-zero second argument (ESI) means that the
- * trampoline needs to enable 5-level paging.
+ * trampoline memory.
*/
.section ".rodata", "a", @progbits
SYM_CODE_START(trampoline_32bit_src)
@@ -579,25 +574,10 @@ SYM_CODE_START(trampoline_32bit_src)
btrl $X86_CR0_PG_BIT, %eax
movl %eax, %cr0
- /* Check what paging mode we want to be in after the trampoline */
- testl %esi, %esi
- jz 1f
-
- /* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jnz 3f
- jmp 2f
-1:
- /* We want 4-level paging: don't touch CR3 if it already points to 4-level page tables */
- movl %cr4, %eax
- testl $X86_CR4_LA57, %eax
- jz 3f
-2:
/* Point CR3 to the trampoline's new top level page table */
leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
movl %eax, %cr3
-3:
+
/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
movl $MSR_EFER, %ecx
rdmsr
@@ -606,26 +586,9 @@ SYM_CODE_START(trampoline_32bit_src)
jc 1f
wrmsr
1:
-#ifdef CONFIG_X86_MCE
- /*
- * Preserve CR4.MCE if the kernel will enable #MC support.
- * Clearing MCE may fault in some environments (that also force #MC
- * support). Any machine check that occurs before #MC support is fully
- * configured will crash the system regardless of the CR4.MCE value set
- * here.
- */
+ /* Toggle CR4.LA57 */
movl %cr4, %eax
- andl $X86_CR4_MCE, %eax
-#else
- movl $0, %eax
-#endif
-
- /* Enable PAE and LA57 (if required) paging modes */
- orl $X86_CR4_PAE, %eax
- testl %esi, %esi
- jz 1f
- orl $X86_CR4_LA57, %eax
-1:
+ btcl $X86_CR4_LA57_BIT, %eax
movl %eax, %cr4
/* Enable paging again. */
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index f9cc86b..4213473 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -103,7 +103,7 @@ static unsigned long find_trampoline_placement(void)
asmlinkage void configure_5level_paging(struct boot_params *bp)
{
- void (*toggle_la57)(void *trampoline, bool enable_5lvl);
+ void (*toggle_la57)(void *trampoline);
bool l5_required = false;
/* Initialize boot_params. Required for cmdline_find_option_bool(). */
@@ -133,6 +133,13 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
ptrs_per_p4d = 512;
}
+ /*
+ * The trampoline will not be used if the paging mode is already set to
+ * the desired one.
+ */
+ if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
+ return;
+
trampoline_32bit = (unsigned long *)find_trampoline_placement();
/* Preserve trampoline memory */
@@ -160,18 +167,8 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
*
* The new page table will be used by trampoline code for switching
* from 4- to 5-level paging or vice versa.
- *
- * If switching is not required, the page table is unused: trampoline
- * code wouldn't touch CR3.
*/
- /*
- * We are not going to use the page table in trampoline memory if we
- * are already in the desired paging mode.
- */
- if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
- goto out;
-
if (l5_required) {
/*
* For 4- to 5-level paging transition, set up current CR3 as
@@ -194,8 +191,7 @@ asmlinkage void configure_5level_paging(struct boot_params *bp)
(void *)src, PAGE_SIZE);
}
-out:
- toggle_la57(trampoline_32bit, l5_required);
+ toggle_la57(trampoline_32bit);
}
void cleanup_trampoline(void *pgtable)
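Taken together, the C side now reduces to a skip check, and the trampoline
itself to an unconditional bit flip. A sketch of the resulting logic in C
(illustrative only; the actual toggle is the btcl instruction in the asm
above):
/*
 * Illustrative C equivalents of the simplified flow: the trampoline is
 * only entered when the active paging depth differs from the desired
 * one, so complementing CR4.LA57 there is always correct.
 */
static bool need_paging_switch(bool l5_required)
{
	return l5_required != !!(native_read_cr4() & X86_CR4_LA57);
}

static unsigned long next_cr4(unsigned long cr4)
{
	return cr4 ^ X86_CR4_LA57;	/* what btcl $X86_CR4_LA57_BIT does */
}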
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 264b82fdb4989cf6a44a2bcd0c6ea05e8026b2ac
Gitweb: https://git.kernel.org/tip/264b82fdb4989cf6a44a2bcd0c6ea05e8026b2ac
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:26:58 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 19:02:06 +02:00
x86/decompressor: Don't rely on upper 32 bits of GPRs being preserved
The 4-to-5 level mode switch trampoline disables long mode and paging in
order to be able to flick the LA57 bit. According to section 3.4.1.1 of
the x86 architecture manual [0], 64-bit GPRs might not retain the upper
32 bits of their contents across such a mode switch.
Given that RBP, RBX and RSI are live at this point, preserve them on the
stack, along with the return address that might be above 4G as well.
[0] Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture
"Because the upper 32 bits of 64-bit general-purpose registers are
undefined in 32-bit modes, the upper 32 bits of any general-purpose
register are not preserved when switching from 64-bit mode to a 32-bit
mode (to protected mode or compatibility mode). Software must not
depend on these bits to maintain a value after a 64-bit to 32-bit
mode switch."
Fixes: 194a9749c73d650c ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 30 ++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 03c4328..f732426 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -459,11 +459,25 @@ SYM_CODE_START(startup_64)
/* Save the trampoline address in RCX */
movq %rax, %rcx
+ /* Set up 32-bit addressable stack */
+ leaq TRAMPOLINE_32BIT_STACK_END(%rcx), %rsp
+
+ /*
+ * Preserve live 64-bit registers on the stack: this is necessary
+ * because the architecture does not guarantee that GPRs will retain
+ * their full 64-bit values across a 32-bit mode switch.
+ */
+ pushq %rbp
+ pushq %rbx
+ pushq %rsi
+
/*
- * Load the address of trampoline_return() into RDI.
- * It will be used by the trampoline to return to the main code.
+ * Push the 64-bit address of trampoline_return() onto the new stack.
+ * It will be used by the trampoline to return to the main code. Due to
+ * the 32-bit mode switch, it cannot be kept in a register either.
*/
leaq trampoline_return(%rip), %rdi
+ pushq %rdi
/* Switch to compatibility mode (CS.L = 0 CS.D = 1) via far return */
pushq $__KERNEL32_CS
@@ -471,6 +485,11 @@ SYM_CODE_START(startup_64)
pushq %rax
lretq
trampoline_return:
+ /* Restore live 64-bit registers */
+ popq %rsi
+ popq %rbx
+ popq %rbp
+
/* Restore the stack, the 32-bit trampoline uses its own stack */
leaq rva(boot_stack_end)(%rbx), %rsp
@@ -582,7 +601,7 @@ SYM_FUNC_END(.Lrelocated)
/*
* This is the 32-bit trampoline that will be copied over to low memory.
*
- * RDI contains the return address (might be above 4G).
+ * Return address is at the top of the stack (might be above 4G).
* ECX contains the base address of the trampoline memory.
* Non zero RDX means trampoline needs to enable 5-level paging.
*/
@@ -592,9 +611,6 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %ds
movl %eax, %ss
- /* Set up new stack */
- leal TRAMPOLINE_32BIT_STACK_END(%ecx), %esp
-
/* Disable paging */
movl %cr0, %eax
btrl $X86_CR0_PG_BIT, %eax
@@ -671,7 +687,7 @@ SYM_CODE_END(trampoline_32bit_src)
.code64
SYM_FUNC_START_LOCAL_NOALIGN(.Lpaging_enabled)
/* Return from the trampoline */
- jmp *%rdi
+ retq
SYM_FUNC_END(.Lpaging_enabled)
/*
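A plain retq suffices after the mode switch because everything live was
parked on the 32-bit addressable stack while still in 64-bit mode. The
frame at the lretq, lowest address first, looks as follows (a sketch for
exposition only; the real code builds this with pushq, no such struct
exists):
/*
 * Illustrative layout of the trampoline stack at the lretq; all slots
 * are pushed as 8 bytes while still in 64-bit mode.
 */
struct trampoline_stack_sketch {
	unsigned long entry32;		/* trampoline code address, lretq RIP */
	unsigned long kernel32_cs;	/* __KERNEL32_CS, lretq CS */
	unsigned long ret_addr;		/* trampoline_return, may be above 4G */
	unsigned long saved_rsi;	/* live GPRs whose upper halves are */
	unsigned long saved_rbx;	/* not preserved architecturally */
	unsigned long saved_rbp;	/* across the 32-bit excursion */
};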
The following commit has been merged into the x86/boot branch of tip:
Commit-ID: 918a7a04e71745e99a0efc6753e587439b794b29
Gitweb: https://git.kernel.org/tip/918a7a04e71745e99a0efc6753e587439b794b29
Author: Ard Biesheuvel <[email protected]>
AuthorDate: Mon, 07 Aug 2023 18:27:07 +02:00
Committer: Borislav Petkov (AMD) <[email protected]>
CommitterDate: Mon, 07 Aug 2023 20:43:59 +02:00
x86/decompressor: Use standard calling convention for trampoline
Update the trampoline code so its arguments are passed via RDI and RSI,
which matches the ordinary SysV calling convention for x86_64. This will
allow this code to be called directly from C.
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 27 +++++++++++++--------------
arch/x86/boot/compressed/pgtable.h | 2 +-
2 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 91b5eee..c475042 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -444,9 +444,9 @@ SYM_CODE_START(startup_64)
movq %r15, %rdi
call paging_prepare
- /* Save the trampoline address in RCX */
- movq %rax, %rcx
-
+ /* Pass the trampoline address and boolean flag as args #1 and #2 */
+ movq %rax, %rdi
+ movq %rdx, %rsi
leaq TRAMPOLINE_32BIT_CODE_OFFSET(%rax), %rax
call *%rax
@@ -531,11 +531,14 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
SYM_FUNC_END(.Lrelocated)
/*
- * This is the 32-bit trampoline that will be copied over to low memory.
+ * This is the 32-bit trampoline that will be copied over to low memory. It
+ * will be called using the ordinary 64-bit calling convention from code
+ * running in 64-bit mode.
*
* Return address is at the top of the stack (might be above 4G).
- * ECX contains the base address of the trampoline memory.
- * Non zero RDX means trampoline needs to enable 5-level paging.
+ * The first argument (EDI) contains the 32-bit addressable base of the
+ * trampoline memory. A non-zero second argument (ESI) means that the
+ * trampoline needs to enable 5-level paging.
*/
SYM_CODE_START(trampoline_32bit_src)
/*
@@ -582,7 +585,7 @@ SYM_CODE_START(trampoline_32bit_src)
movl %eax, %cr0
/* Check what paging mode we want to be in after the trampoline */
- testl %edx, %edx
+ testl %esi, %esi
jz 1f
/* We want 5-level paging: don't touch CR3 if it already points to 5-level page tables */
@@ -597,21 +600,17 @@ SYM_CODE_START(trampoline_32bit_src)
jz 3f
2:
/* Point CR3 to the trampoline's new top level page table */
- leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%ecx), %eax
+ leal TRAMPOLINE_32BIT_PGTABLE_OFFSET(%edi), %eax
movl %eax, %cr3
3:
/* Set EFER.LME=1 as a precaution in case hypervisor pulls the rug */
- pushl %ecx
- pushl %edx
movl $MSR_EFER, %ecx
rdmsr
btsl $_EFER_LME, %eax
/* Avoid writing EFER if no change was made (for TDX guest) */
jc 1f
wrmsr
-1: popl %edx
- popl %ecx
-
+1:
#ifdef CONFIG_X86_MCE
/*
* Preserve CR4.MCE if the kernel will enable #MC support.
@@ -628,7 +627,7 @@ SYM_CODE_START(trampoline_32bit_src)
/* Enable PAE and LA57 (if required) paging modes */
orl $X86_CR4_PAE, %eax
- testl %edx, %edx
+ testl %esi, %esi
jz 1f
orl $X86_CR4_LA57, %eax
1:
diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h
index 91dbb99..4e8cef1 100644
--- a/arch/x86/boot/compressed/pgtable.h
+++ b/arch/x86/boot/compressed/pgtable.h
@@ -14,7 +14,7 @@
extern unsigned long *trampoline_32bit;
-extern void trampoline_32bit_src(void *return_ptr);
+extern void trampoline_32bit_src(void *trampoline, bool enable_5lvl);
#endif /* __ASSEMBLER__ */
#endif /* BOOT_COMPRESSED_PAGETABLE_H */
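With the SysV argument order in place, the trampoline can be invoked from C
through an ordinary function pointer. A sketch of such a call site
(illustrative; the in-tree caller is configure_5level_paging() in
pgtable_64.c):
/*
 * Sketch of a C call into the copied trampoline, relying on the SysV
 * convention established here: trampoline base in RDI, 5-level flag
 * in RSI. TRAMPOLINE_32BIT_CODE_OFFSET is a byte offset into the copy.
 */
static void call_trampoline(unsigned long *trampoline_32bit, bool l5_required)
{
	void (*toggle_la57)(void *trampoline, bool enable_5lvl);

	toggle_la57 = (void *)trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET;
	toggle_la57(trampoline_32bit, l5_required);
}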