From: Ard Biesheuvel <[email protected]>
Originally, only arch/x86/kernel/head64.c had some code that required
special care because it executes very early from the 1:1 mapping of the
kernel rather than the ordinary kernel virtual mapping.
This is no longer the case, and there is a lot of SEV related code that
is reachable from the primary startup path, with no guarantees that the
toolchain will produce code that runs correctly. This is especially
problematic when it comes to things like string literals, which are
emitted by the compiler as data objects, and subsequently referenced via
an absolute address that is not mapped yet this early in the boot [0].
Kevin has been looking into failures resulting from the fact that Clang
behaves slightly differently from GCC in this regard, by selectively
applying PIC codegen to the objects in question. However, while this
fixes the observed issues, it does not offer any guarantees, given that
the set of reachable code from startup_64() does not appear to be
bounded when running on SEV hardware.
Instead of applying this change piecemeal to objects that happen to have
caused issues in the past, this series convert the core kernel to PIC
codegen entirely.
Note that this does not entirely solve the problem of the unbounded set
of reachable code from the early SEV entrypoint: there might be code
that attempts to access global objects via their kernel virtual address
(which is not mapped yet). But at least all implicit accesses will be
made via the same translation that the code is running from.
This does result in a slight increase in code size (see below) but it
also reduces the size of the KASLR relocation table (applied by the
decompressor) by roughly half.
Before
$ size -x vmlinux
text data bss dec hex filename
0x1b78ec1 0xdde145 0x381000 47022086 2cd8006 vmlinux
After
$ size -x vmlinux
text data bss dec hex filename
0x1b8371b 0xde0d1d 0x370000 47006776 2cd4438 vmlinux
[0] arch/x86/mm/mem_encrypt_identity.c has some nice examples of this,
where RIP-relative references are emitted using inline asm.
[1] https://lkml.kernel.org/r/20240111223650.3502633-1-kevinloughlin%40google.com
Cc: Kevin Loughlin <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Dionna Glaze <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Justin Stitt <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Ard Biesheuvel (5):
kallsyms: Avoid weak references for kallsyms symbols
vmlinux: Avoid weak reference to notes section
btf: Avoid weak external references
x86/head64: Replace pointer fixups with PIE codegen
x86: Build the core kernel with position independent codegen
arch/x86/Makefile | 18 ++-
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/entry/vdso/Makefile | 2 +-
arch/x86/include/asm/init.h | 2 -
arch/x86/include/asm/setup.h | 2 +-
arch/x86/kernel/head64.c | 117 +++++++-------------
arch/x86/realmode/rm/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 23 ++++
kernel/bpf/btf.c | 4 +-
kernel/kallsyms.c | 6 -
kernel/kallsyms_internal.h | 30 ++---
kernel/ksysfs.c | 4 +-
lib/buildid.c | 4 +-
13 files changed, 104 insertions(+), 111 deletions(-)
--
2.43.0.429.g432eaa2c6b-goog
From: Ard Biesheuvel <[email protected]>
If the BTF code is enabled in the build configuration, the start/stop
BTF markers are guaranteed to exist in the final link but not during the
first linker pass.
Avoid GOT based relocations to these markers in the final executable by
providing preliminary definitions that will be used by the first linker
pass, and superseded by the actual definitions in the subsequent ones.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 2 ++
kernel/bpf/btf.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index a39e050416c7..ef45331fb043 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -456,6 +456,8 @@
* independent code.
*/
#define PRELIMINARY_SYMBOL_DEFINITIONS \
+ PROVIDE(__start_BTF = .); \
+ PROVIDE(__stop_BTF = .); \
PROVIDE(kallsyms_addresses = .); \
PROVIDE(kallsyms_offsets = .); \
PROVIDE(kallsyms_names = .); \
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 596471189176..a659fc7045bb 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5581,8 +5581,8 @@ static struct btf *btf_parse(const union bpf_attr *attr, bpfptr_t uattr, u32 uat
return ERR_PTR(err);
}
-extern char __weak __start_BTF[];
-extern char __weak __stop_BTF[];
+extern char __start_BTF[];
+extern char __stop_BTF[];
extern struct btf *btf_vmlinux;
#define BPF_MAP_TYPE(_id, _ops)
--
2.43.0.429.g432eaa2c6b-goog
From: Ard Biesheuvel <[email protected]>
Weak references are references that are permitted to remain unsatisfied
in the final link. This means they cannot be implemented using place
relative relocations, resulting in GOT entries when using position
independent code generation.
The notes section should always exist, so the weak annotations can be
omitted.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
kernel/ksysfs.c | 4 ++--
lib/buildid.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 1d4bc493b2f4..347beb763c59 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -226,8 +226,8 @@ KERNEL_ATTR_RW(rcu_normal);
/*
* Make /sys/kernel/notes give the raw contents of our kernel .notes section.
*/
-extern const void __start_notes __weak;
-extern const void __stop_notes __weak;
+extern const void __start_notes;
+extern const void __stop_notes;
#define notes_size (&__stop_notes - &__start_notes)
static ssize_t notes_read(struct file *filp, struct kobject *kobj,
diff --git a/lib/buildid.c b/lib/buildid.c
index e3a7acdeef0e..15109fe191ae 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -182,8 +182,8 @@ unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX] __ro_after_init;
*/
void __init init_vmlinux_build_id(void)
{
- extern const void __start_notes __weak;
- extern const void __stop_notes __weak;
+ extern const void __start_notes;
+ extern const void __stop_notes;
unsigned int size = &__stop_notes - &__start_notes;
build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
--
2.43.0.429.g432eaa2c6b-goog
From: Ard Biesheuvel <[email protected]>
Pass the -fpie flag to the compiler when building objects that are
intended for the core kernel. This ensures that all implicit symbol
references are emitted using RIP-relative relocations, allowing the code
to be executed correctly even if it runs from a different virtual
address than the address it was linked/loaded/relocated to run at.
This is necessary to ensure that all C code that gets pulled in by the
early startup code runs correctly without the need for unpalatable hacks
in the code to force RIP-relative references.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/Makefile | 7 +++++--
arch/x86/entry/vdso/Makefile | 2 +-
arch/x86/kernel/Makefile | 4 ----
arch/x86/realmode/rm/Makefile | 1 +
include/asm-generic/vmlinux.lds.h | 2 ++
5 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index bed0850d91b0..0382a9534099 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -165,11 +165,13 @@ else
KBUILD_RUSTFLAGS += $(rustflags-y)
KBUILD_CFLAGS += -mno-red-zone
- KBUILD_CFLAGS += -mcmodel=kernel
+ KBUILD_CFLAGS_MODULE += -mcmodel=kernel
KBUILD_RUSTFLAGS += -Cno-redzone=y
- KBUILD_RUSTFLAGS += -Ccode-model=kernel
+ KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small
+ KBUILD_RUSTFLAGS_MODULE += -Ccode-model=kernel
PIE_CFLAGS := -fpie -mcmodel=small \
+ $(call cc-option,-Wa$(comma)-mrelax-relocations=no) \
-include $(srctree)/include/linux/hidden.h
ifeq ($(CONFIG_STACKPROTECTOR),y)
@@ -178,6 +180,7 @@ else
endif
endif
+ KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS)
export PIE_CFLAGS
endif
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b1b8dd1608f7..e2c79d4c1417 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -149,7 +149,7 @@ $(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
$(obj)/vdso32.so.dbg: asflags-$(CONFIG_X86_64) += -m32
KBUILD_CFLAGS_32 := $(filter-out -m64,$(KBUILD_CFLAGS))
-KBUILD_CFLAGS_32 := $(filter-out -mcmodel=kernel,$(KBUILD_CFLAGS_32))
+KBUILD_CFLAGS_32 := $(filter-out -mcmodel=small,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out -fno-pic,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out -mfentry,$(KBUILD_CFLAGS_32))
KBUILD_CFLAGS_32 := $(filter-out $(RANDSTRUCT_CFLAGS),$(KBUILD_CFLAGS_32))
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 65194ca79b5c..0000325ab98f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,10 +21,6 @@ CFLAGS_REMOVE_sev.o = -pg
CFLAGS_REMOVE_rethook.o = -pg
endif
-# head64.c contains C code that may execute from a different virtual address
-# than it was linked at, so we always build it using PIE codegen
-CFLAGS_head64.o += $(PIE_CFLAGS)
-
KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index f614009d3e4e..fdb8e780f903 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -76,5 +76,6 @@ KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
-I$(srctree)/arch/x86/boot
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS_KERNEL := $(filter-out $(PIE_CFLAGS),$(KBUILD_CFLAGS_KERNEL))
GCOV_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ef45331fb043..9518b87207e8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -355,6 +355,7 @@
*(.data..decrypted) \
*(.ref.data) \
*(.data..shared_aligned) /* percpu related */ \
+ *(.data.rel .data.rel.*) \
MEM_KEEP(init.data*) \
*(.data.unlikely) \
__start_once = .; \
@@ -477,6 +478,7 @@
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
__start_rodata = .; \
*(.rodata) *(.rodata.*) \
+ *(.data.rel.ro*) \
SCHED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
--
2.43.0.429.g432eaa2c6b-goog
From: Ard Biesheuvel <[email protected]>
Some of the C code in head64.c may be called from a different virtual
address than it was linked at. Currently, we deal with this by using
ordinary, position dependent codegen, and fixing up all symbol
references on the fly. This is fragile and tricky to maintain. It is
also unnecessary: we can use position independent codegen (with hidden
visibility) to ensure that all compiler generated symbol references are
RIP-relative, removing the need for fixups entirely.
It does mean we need explicit references to kernel virtual addresses to
be generated by hand, so generate those using a movabs instruction in
inline asm in the handful places where we actually need this.
While at it, move these routines to .inittext where they belong.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/Makefile | 11 ++
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/include/asm/init.h | 2 -
arch/x86/include/asm/setup.h | 2 +-
arch/x86/kernel/Makefile | 4 +
arch/x86/kernel/head64.c | 117 +++++++-------------
6 files changed, 60 insertions(+), 78 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..bed0850d91b0 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -168,6 +168,17 @@ else
KBUILD_CFLAGS += -mcmodel=kernel
KBUILD_RUSTFLAGS += -Cno-redzone=y
KBUILD_RUSTFLAGS += -Ccode-model=kernel
+
+ PIE_CFLAGS := -fpie -mcmodel=small \
+ -include $(srctree)/include/linux/hidden.h
+
+ ifeq ($(CONFIG_STACKPROTECTOR),y)
+ ifeq ($(CONFIG_SMP),y)
+ PIE_CFLAGS += -mstack-protector-guard-reg=gs
+ endif
+ endif
+
+ export PIE_CFLAGS
endif
#
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f19c038409aa..bccee07eae60 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,7 +84,7 @@ LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABCDGRSTVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index cc9ccf61b6bd..5f1d3c421f68 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,8 +2,6 @@
#ifndef _ASM_X86_INIT_H
#define _ASM_X86_INIT_H
-#define __head __section(".head.text")
-
struct x86_mapping_info {
void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
void *context; /* context for alloc_pgt_page */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 5c83729c8e71..dc657b1d7e14 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -48,7 +48,7 @@ extern unsigned long saved_video_mode;
extern void reserve_standard_io_resources(void);
extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern void startup_64_setup_env(unsigned long physbase);
+extern void startup_64_setup_env(void);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..65194ca79b5c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,10 @@ CFLAGS_REMOVE_sev.o = -pg
CFLAGS_REMOVE_rethook.o = -pg
endif
+# head64.c contains C code that may execute from a different virtual address
+# than it was linked at, so we always build it using PIE codegen
+CFLAGS_head64.o += $(PIE_CFLAGS)
+
KASAN_SANITIZE_head$(BITS).o := n
KASAN_SANITIZE_dumpstack.o := n
KASAN_SANITIZE_dumpstack_$(BITS).o := n
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index dc0956067944..21f6c2a7061b 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -85,23 +85,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
.address = 0,
};
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
- return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
-
#ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
- return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __init check_la57_support(void)
{
/*
* 5-level paging is detected and enabled at kernel decompression
@@ -110,23 +95,28 @@ static bool __head check_la57_support(unsigned long physaddr)
if (!(native_read_cr4() & X86_CR4_LA57))
return false;
- *fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
- *fixup_int(&pgdir_shift, physaddr) = 48;
- *fixup_int(&ptrs_per_p4d, physaddr) = 512;
- *fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
- *fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
- *fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+ __pgtable_l5_enabled = 1;
+ pgdir_shift = 48;
+ ptrs_per_p4d = 512;
+ page_offset_base = __PAGE_OFFSET_BASE_L5;
+ vmalloc_base = __VMALLOC_BASE_L5;
+ vmemmap_base = __VMEMMAP_BASE_L5;
return true;
}
#else
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __init check_la57_support(void)
{
return false;
}
#endif
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
+#define __va_symbol(sym) ({ \
+ unsigned long __v; \
+ asm("movabsq $" __stringify(sym) ", %0":"=r"(__v)); \
+ __v; })
+
+static unsigned long __init sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
{
unsigned long vaddr, vaddr_end;
int i;
@@ -141,8 +131,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* attribute.
*/
if (sme_get_me_mask()) {
- vaddr = (unsigned long)__start_bss_decrypted;
- vaddr_end = (unsigned long)__end_bss_decrypted;
+ vaddr = __va_symbol(__start_bss_decrypted);
+ vaddr_end = __va_symbol(__end_bss_decrypted);
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
/*
@@ -169,13 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
return sme_get_me_mask();
}
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
-unsigned long __head __startup_64(unsigned long physaddr,
+unsigned long __init __startup_64(unsigned long physaddr,
struct boot_params *bp)
{
unsigned long load_delta, *p;
@@ -184,12 +168,10 @@ unsigned long __head __startup_64(unsigned long physaddr,
p4dval_t *p4d;
pudval_t *pud;
pmdval_t *pmd, pmd_entry;
- pteval_t *mask_ptr;
bool la57;
int i;
- unsigned int *next_pgt_ptr;
- la57 = check_la57_support(physaddr);
+ la57 = check_la57_support();
/* Is the address too large? */
if (physaddr >> MAX_PHYSMEM_BITS)
@@ -199,7 +181,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Compute the delta between the address I am compiled to run at
* and the address I am actually running at.
*/
- load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+ load_delta = physaddr - (__va_symbol(_text) - __START_KERNEL_map);
/* Is the address not 2M aligned? */
if (load_delta & ~PMD_MASK)
@@ -210,26 +192,22 @@ unsigned long __head __startup_64(unsigned long physaddr,
/* Fixup the physical addresses in the page table */
- pgd = fixup_pointer(early_top_pgt, physaddr);
+ pgd = (pgdval_t *)early_top_pgt;
p = pgd + pgd_index(__START_KERNEL_map);
if (la57)
*p = (unsigned long)level4_kernel_pgt;
else
*p = (unsigned long)level3_kernel_pgt;
- *p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+ *p += _PAGE_TABLE_NOENC + sme_get_me_mask();
- if (la57) {
- p4d = fixup_pointer(level4_kernel_pgt, physaddr);
- p4d[511] += load_delta;
- }
+ if (la57)
+ level4_kernel_pgt[511].p4d += load_delta;
- pud = fixup_pointer(level3_kernel_pgt, physaddr);
- pud[510] += load_delta;
- pud[511] += load_delta;
+ level3_kernel_pgt[510].pud += load_delta;
+ level3_kernel_pgt[511].pud += load_delta;
- pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
- pmd[i] += load_delta;
+ level2_fixmap_pgt[i].pmd += load_delta;
/*
* Set up the identity mapping for the switchover. These
@@ -238,15 +216,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
* it avoids problems around wraparound.
*/
- next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
- pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
- pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+ pud = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+ pmd = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
if (la57) {
- p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
- physaddr);
+ p4d = (p4dval_t *)early_dynamic_pgts[next_early_pgt++];
i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -267,8 +243,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
/* Filter out unsupported __PAGE_KERNEL_* bits: */
- mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
- pmd_entry &= *mask_ptr;
+ pmd_entry &= __supported_pte_mask;
pmd_entry += sme_get_me_mask();
pmd_entry += physaddr;
@@ -294,14 +269,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
* error, causing the BIOS to halt the system.
*/
- pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+ pmd = (pmdval_t *)level2_kernel_pgt;
/* invalidate pages before the kernel image */
- for (i = 0; i < pmd_index((unsigned long)_text); i++)
+ for (i = 0; i < pmd_index(__va_symbol(_text)); i++)
pmd[i] &= ~_PAGE_PRESENT;
/* fixup pages that are part of the kernel image */
- for (; i <= pmd_index((unsigned long)_end); i++)
+ for (; i <= pmd_index(__va_symbol(_end)); i++)
if (pmd[i] & _PAGE_PRESENT)
pmd[i] += load_delta;
@@ -313,7 +288,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+ phys_base += load_delta - sme_get_me_mask();
return sme_postprocess_startup(bp, pmd);
}
@@ -587,22 +562,16 @@ static void set_bringup_idt_handler(gate_desc *idt, int n, void *handler)
}
/* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
+static void startup_64_load_idt(void)
{
- struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
- gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-
- if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
- void *handler;
+ gate_desc *idt = bringup_idt_table;
+ if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
/* VMM Communication Exception */
- handler = fixup_pointer(vc_no_ghcb, physbase);
- set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
- }
+ set_bringup_idt_handler(idt, X86_TRAP_VC, vc_no_ghcb);
- desc->address = (unsigned long)idt;
- native_load_idt(desc);
+ bringup_idt_descr.address = (unsigned long)idt;
+ native_load_idt(&bringup_idt_descr);
}
/* This is used when running on kernel addresses */
@@ -621,10 +590,10 @@ void early_setup_idt(void)
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_env(unsigned long physbase)
+void __init startup_64_setup_env(void)
{
/* Load GDT */
- startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+ startup_gdt_descr.address = (unsigned long)startup_gdt;
native_load_gdt(&startup_gdt_descr);
/* New GDT is live - reload data segment registers */
@@ -632,5 +601,5 @@ void __head startup_64_setup_env(unsigned long physbase)
"movl %%eax, %%ss\n"
"movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
- startup_64_load_idt(physbase);
+ startup_64_load_idt();
}
--
2.43.0.429.g432eaa2c6b-goog
On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <[email protected]> wrote:
>
> From: Ard Biesheuvel <[email protected]>
>
> Some of the C code in head64.c may be called from a different virtual
> address than it was linked at. Currently, we deal with this by using
> ordinary, position dependent codegen, and fixing up all symbol
> references on the fly. This is fragile and tricky to maintain. It is
> also unnecessary: we can use position independent codegen (with hidden
> visibility) to ensure that all compiler generated symbol references are
> RIP-relative, removing the need for fixups entirely.
>
> It does mean we need explicit references to kernel virtual addresses to
> be generated by hand, so generate those using a movabs instruction in
> inline asm in the handful places where we actually need this.
>
> While at it, move these routines to .inittext where they belong.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/Makefile | 11 ++
> arch/x86/boot/compressed/Makefile | 2 +-
> arch/x86/include/asm/init.h | 2 -
> arch/x86/include/asm/setup.h | 2 +-
> arch/x86/kernel/Makefile | 4 +
> arch/x86/kernel/head64.c | 117 +++++++-------------
> 6 files changed, 60 insertions(+), 78 deletions(-)
>
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 1a068de12a56..bed0850d91b0 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -168,6 +168,17 @@ else
> KBUILD_CFLAGS += -mcmodel=kernel
> KBUILD_RUSTFLAGS += -Cno-redzone=y
> KBUILD_RUSTFLAGS += -Ccode-model=kernel
> +
> + PIE_CFLAGS := -fpie -mcmodel=small \
> + -include $(srctree)/include/linux/hidden.h
> +
> + ifeq ($(CONFIG_STACKPROTECTOR),y)
> + ifeq ($(CONFIG_SMP),y)
> + PIE_CFLAGS += -mstack-protector-guard-reg=gs
> + endif
This compiler flag requires GCC 8.1 or later. When I posted a patch
series[1] to convert the stack protector to a normal percpu variable
instead of the fixed offset, there was pushback over requiring GCC 8.1
to keep stack protector support. I added code to objtool to convert
code from older compilers, but there hasn't been any feedback since.
Similar conversion code would be needed in objtool for this unless the
decision is made to require GCC 8.1 for stack protector support going
forward.
Brian Gerst
[1] https://lore.kernel.org/lkml/[email protected]/
On Mon, Jan 22, 2024 at 02:34:46PM -0500, Brian Gerst wrote:
> On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <[email protected]> wrote:
> >
> > From: Ard Biesheuvel <[email protected]>
> >
> > Some of the C code in head64.c may be called from a different virtual
> > address than it was linked at. Currently, we deal with this by using
> > ordinary, position dependent codegen, and fixing up all symbol
> > references on the fly. This is fragile and tricky to maintain. It is
> > also unnecessary: we can use position independent codegen (with hidden
> > visibility) to ensure that all compiler generated symbol references are
> > RIP-relative, removing the need for fixups entirely.
> >
> > It does mean we need explicit references to kernel virtual addresses to
> > be generated by hand, so generate those using a movabs instruction in
> > inline asm in the handful places where we actually need this.
> >
> > While at it, move these routines to .inittext where they belong.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/Makefile | 11 ++
> > arch/x86/boot/compressed/Makefile | 2 +-
> > arch/x86/include/asm/init.h | 2 -
> > arch/x86/include/asm/setup.h | 2 +-
> > arch/x86/kernel/Makefile | 4 +
> > arch/x86/kernel/head64.c | 117 +++++++-------------
> > 6 files changed, 60 insertions(+), 78 deletions(-)
> >
> > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > index 1a068de12a56..bed0850d91b0 100644
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -168,6 +168,17 @@ else
> > KBUILD_CFLAGS += -mcmodel=kernel
> > KBUILD_RUSTFLAGS += -Cno-redzone=y
> > KBUILD_RUSTFLAGS += -Ccode-model=kernel
> > +
> > + PIE_CFLAGS := -fpie -mcmodel=small \
> > + -include $(srctree)/include/linux/hidden.h
> > +
> > + ifeq ($(CONFIG_STACKPROTECTOR),y)
> > + ifeq ($(CONFIG_SMP),y)
> > + PIE_CFLAGS += -mstack-protector-guard-reg=gs
> > + endif
>
> This compiler flag requires GCC 8.1 or later. When I posted a patch
> series[1] to convert the stack protector to a normal percpu variable
> instead of the fixed offset, there was pushback over requiring GCC 8.1
> to keep stack protector support. I added code to objtool to convert
> code from older compilers, but there hasn't been any feedback since.
> Similar conversion code would be needed in objtool for this unless the
> decision is made to require GCC 8.1 for stack protector support going
> forward.
>
> Brian Gerst
>
> [1] https://lore.kernel.org/lkml/[email protected]/
I was going to comment on this as well, as that flag was only supported
in clang 12.0.0 and newer. It should not be too big of a deal for us
though, as I was already planning on bumping the minimum supported
version of clang for building the kernel to 13.0.1 (but there may be
breakage reports if this series lands before that):
https://lore.kernel.org/[email protected]/
Cheers,
Nathan
On Mon, 22 Jan 2024 at 23:44, Nathan Chancellor <[email protected]> wrote:
>
> On Mon, Jan 22, 2024 at 02:34:46PM -0500, Brian Gerst wrote:
> > On Mon, Jan 22, 2024 at 4:14 AM Ard Biesheuvel <ardb+git@googlecom> wrote:
> > >
> > > From: Ard Biesheuvel <[email protected]>
> > >
> > > Some of the C code in head64.c may be called from a different virtual
> > > address than it was linked at. Currently, we deal with this by using
> > > ordinary, position dependent codegen, and fixing up all symbol
> > > references on the fly. This is fragile and tricky to maintain. It is
> > > also unnecessary: we can use position independent codegen (with hidden
> > > visibility) to ensure that all compiler generated symbol references are
> > > RIP-relative, removing the need for fixups entirely.
> > >
> > > It does mean we need explicit references to kernel virtual addresses to
> > > be generated by hand, so generate those using a movabs instruction in
> > > inline asm in the handful places where we actually need this.
> > >
> > > While at it, move these routines to .inittext where they belong.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > arch/x86/Makefile | 11 ++
> > > arch/x86/boot/compressed/Makefile | 2 +-
> > > arch/x86/include/asm/init.h | 2 -
> > > arch/x86/include/asm/setup.h | 2 +-
> > > arch/x86/kernel/Makefile | 4 +
> > > arch/x86/kernel/head64.c | 117 +++++++-------------
> > > 6 files changed, 60 insertions(+), 78 deletions(-)
> > >
> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > > index 1a068de12a56..bed0850d91b0 100644
> > > --- a/arch/x86/Makefile
> > > +++ b/arch/x86/Makefile
> > > @@ -168,6 +168,17 @@ else
> > > KBUILD_CFLAGS += -mcmodel=kernel
> > > KBUILD_RUSTFLAGS += -Cno-redzone=y
> > > KBUILD_RUSTFLAGS += -Ccode-model=kernel
> > > +
> > > + PIE_CFLAGS := -fpie -mcmodel=small \
> > > + -include $(srctree)/include/linux/hidden.h
> > > +
> > > + ifeq ($(CONFIG_STACKPROTECTOR),y)
> > > + ifeq ($(CONFIG_SMP),y)
> > > + PIE_CFLAGS += -mstack-protector-guard-reg=gs
> > > + endif
> >
> > This compiler flag requires GCC 8.1 or later. When I posted a patch
> > series[1] to convert the stack protector to a normal percpu variable
> > instead of the fixed offset, there was pushback over requiring GCC 8.1
> > to keep stack protector support. I added code to objtool to convert
> > code from older compilers, but there hasn't been any feedback since.
> > Similar conversion code would be needed in objtool for this unless the
> > decision is made to require GCC 8.1 for stack protector support going
> > forward.
> >
> > Brian Gerst
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
>
> I was going to comment on this as well, as that flag was only supported
> in clang 12.0.0 and newer. It should not be too big of a deal for us
> though, as I was already planning on bumping the minimum supported
> version of clang for building the kernel to 13.0.1 (but there may be
> breakage reports if this series lands before that):
>
Thanks for pointing this out.
Given that building the entire kernel with fPIC is neither necessary
nor sufficient, I am going to abandon this approach.
If we apply fPIC to only a handful of compilation units containing
code that runs from the 1:1 mapping, it is not unreasonable to simply
disable the stack protector altogether for those pieces too. This
works around the older GCC issue.