2016-04-18 15:10:00

by Ard Biesheuvel

Subject: [PATCH 0/8] arm64: kaslr cleanups and improvements

This is a follow-up to my series 'arm64: more granular KASLR' [1] that I sent
out about six weeks ago. It also partially supersedes [2].

The first patch is an unrelated cleanup that is completely orthogonal to the
rest of the series (although it happens to touch head.S as well), and is
arbitrarily listed first.

Patches #2 to #5 address some issues that were introduced by KASLR, primarily
that we now have to take great care not to dereference literals that are
subject to R_AARCH64_ABS64 relocations until after the relocation routine has
completed, and, since the latter runs with the caches on, not to dereference
such literals on secondaries until the MMU is enabled.

Formerly, this was addressed by using literals holding complicated expressions
that could be resolved at link time via R_AARCH64_PREL64/R_AARCH64_PREL32
relocations, and by explicitly cleaning these literals in the caches so that
the secondaries could see them with the MMU off.

Instead, we now take care not to use /any/ 64-bit literals until after the
relocation code has executed and the MMU is enabled. This makes the code a
lot cleaner and less error-prone.
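
To illustrate (both fragments are excerpted from patch #3 below): formerly,
the primary boot path loaded the post-MMU branch target from a 64-bit literal
holding a link-time computed expression,

    ldr     x27, 0f                 // address to jump to after
    neg     x27, x27                // MMU has been enabled
    ...
0:  .quad   (_text - TEXT_OFFSET) - __mmap_switched - KIMAGE_VADDR

whereas it now simply takes the PC-relative address of a small trampoline that
runs from the ID map:

    adr_l   x27, __primary_switch   // address to jump to after
                                    // MMU has been enabled
    b       __enable_mmu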

The final three patches enhance the KASLR code: they deal with relocatable
kernels whose physical placement is not exactly TEXT_OFFSET bytes beyond a
2 MB aligned base address, and deliberately use this capability to gain 5 bits
of additional entropy.

[1] http://thread.gmane.org/gmane.linux.ports.arm.kernel/483819
[2] http://thread.gmane.org/gmane.linux.ports.arm.kernel/490216

Ard Biesheuvel (8):
arm64: kernel: don't export local symbols from head.S
arm64: kernel: use literal for relocated address of
__secondary_switched
arm64: kernel: perform relocation processing from ID map
arm64: introduce mov_q macro to move a constant into a 64-bit register
arm64: kernel: replace early 64-bit literal loads with move-immediates
arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it
arm64: relocatable: deal with physically misaligned kernel images
arm64: kaslr: increase randomization granularity

arch/arm64/include/asm/assembler.h | 20 +++
arch/arm64/kernel/head.S | 136 +++++++++++---------
arch/arm64/kernel/image.h | 2 -
arch/arm64/kernel/kaslr.c | 6 +-
arch/arm64/kernel/vmlinux.lds.S | 7 +-
drivers/firmware/efi/libstub/arm64-stub.c | 15 ++-
6 files changed, 112 insertions(+), 74 deletions(-)

--
2.5.0


2016-04-18 15:10:06

by Ard Biesheuvel

Subject: [PATCH 1/8] arm64: kernel: don't export local symbols from head.S

This unexports some symbols from head.S that are only used locally.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index b43417618847..ac27d8d937b2 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -638,7 +638,7 @@ ENDPROC(el2_setup)
* Sets the __boot_cpu_mode flag depending on the CPU boot mode passed
* in x20. See arch/arm64/include/asm/virt.h for more info.
*/
-ENTRY(set_cpu_boot_mode_flag)
+set_cpu_boot_mode_flag:
adr_l x1, __boot_cpu_mode
cmp w20, #BOOT_CPU_MODE_EL2
b.ne 1f
@@ -691,7 +691,7 @@ ENTRY(secondary_entry)
b secondary_startup
ENDPROC(secondary_entry)

-ENTRY(secondary_startup)
+secondary_startup:
/*
* Common entry point for secondary CPUs.
*/
@@ -706,7 +706,7 @@ ENTRY(secondary_startup)
ENDPROC(secondary_startup)
0: .long (_text - TEXT_OFFSET) - __secondary_switched

-ENTRY(__secondary_switched)
+__secondary_switched:
adr_l x5, vectors
msr vbar_el1, x5
isb
--
2.5.0

2016-04-18 15:10:11

by Ard Biesheuvel

Subject: [PATCH 3/8] arm64: kernel: perform relocation processing from ID map

Refactor the relocation processing so that the code executes from the
ID map while accessing the relocation tables via the virtual mapping.
This way, we can use literals containing virtual addresses as before,
instead of having to use convoluted absolute expressions.

For symmetry with the secondary code path, the relocation code and the
subsequent jump to the virtual entry point are implemented in a function
called __primary_switch(), and __mmap_switched() is renamed to
__primary_switched(). Also, the call sequence in stext() is aligned with
the one in secondary_startup(), by replacing the awkward 'adr_l lr' and
'b __cpu_setup' sequence with a simple branch and link.
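
Condensed, the new primary boot flow looks like this (a summary of the hunks
below, not a literal copy):

    ENTRY(stext)
            ...
            bl      __cpu_setup             // initialise processor
            adr_l   x27, __primary_switch   // ID map address of the trampoline
            b       __enable_mmu            // ends with 'br x27'

    __primary_switch:                       // executes from the ID map, MMU on
            // apply the RELA relocations, accessing .rela/.dynsym through
            // the virtual mapping (via __rela_offset et al.)
            ldr     x8, =__primary_switched // 64-bit literal, relocated above
            br      x8                      // jump to the virtual entry point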

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 96 +++++++++++---------
arch/arm64/kernel/vmlinux.lds.S | 7 +-
2 files changed, 55 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index f13276d4ca91..0d487d90d221 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -223,13 +223,11 @@ ENTRY(stext)
* On return, the CPU will be ready for the MMU to be turned on and
* the TCR will have been set.
*/
- ldr x27, 0f // address to jump to after
- neg x27, x27 // MMU has been enabled
- adr_l lr, __enable_mmu // return (PIC) address
- b __cpu_setup // initialise processor
+ bl __cpu_setup // initialise processor
+ adr_l x27, __primary_switch // address to jump to after
+ // MMU has been enabled
+ b __enable_mmu
ENDPROC(stext)
- .align 3
-0: .quad (_text - TEXT_OFFSET) - __mmap_switched - KIMAGE_VADDR

/*
* Preserve the arguments passed by the bootloader in x0 .. x3
@@ -421,7 +419,7 @@ ENDPROC(__create_page_tables)
* The following fragment of code is executed with the MMU enabled.
*/
.set initial_sp, init_thread_union + THREAD_START_SP
-__mmap_switched:
+__primary_switched:
mov x28, lr // preserve LR
adr_l x8, vectors // load VBAR_EL1 with virtual
msr vbar_el1, x8 // vector table address
@@ -435,42 +433,6 @@ __mmap_switched:
bl __pi_memset
dsb ishst // Make zero page visible to PTW

-#ifdef CONFIG_RELOCATABLE
-
- /*
- * Iterate over each entry in the relocation table, and apply the
- * relocations in place.
- */
- adr_l x8, __dynsym_start // start of symbol table
- adr_l x9, __reloc_start // start of reloc table
- adr_l x10, __reloc_end // end of reloc table
-
-0: cmp x9, x10
- b.hs 2f
- ldp x11, x12, [x9], #24
- ldr x13, [x9, #-8]
- cmp w12, #R_AARCH64_RELATIVE
- b.ne 1f
- add x13, x13, x23 // relocate
- str x13, [x11, x23]
- b 0b
-
-1: cmp w12, #R_AARCH64_ABS64
- b.ne 0b
- add x12, x12, x12, lsl #1 // symtab offset: 24x top word
- add x12, x8, x12, lsr #(32 - 3) // ... shifted into bottom word
- ldrsh w14, [x12, #6] // Elf64_Sym::st_shndx
- ldr x15, [x12, #8] // Elf64_Sym::st_value
- cmp w14, #-0xf // SHN_ABS (0xfff1) ?
- add x14, x15, x23 // relocate
- csel x15, x14, x15, ne
- add x15, x13, x15
- str x15, [x11, x23]
- b 0b
-
-2:
-#endif
-
adr_l sp, initial_sp, x4
mov x4, sp
and x4, x4, #~(THREAD_SIZE - 1)
@@ -496,7 +458,7 @@ __mmap_switched:
0:
#endif
b start_kernel
-ENDPROC(__mmap_switched)
+ENDPROC(__primary_switched)

/*
* end early head section, begin head code that is also used for
@@ -788,7 +750,6 @@ __enable_mmu:
ic iallu // flush instructions fetched
dsb nsh // via old mapping
isb
- add x27, x27, x23 // relocated __mmap_switched
#endif
br x27
ENDPROC(__enable_mmu)
@@ -802,6 +763,51 @@ __no_granule_support:
b 1b
ENDPROC(__no_granule_support)

+__primary_switch:
+#ifdef CONFIG_RELOCATABLE
+ /*
+ * Iterate over each entry in the relocation table, and apply the
+ * relocations in place.
+ */
+ ldr w8, =__dynsym_offset // offset to symbol table
+ ldr w9, =__rela_offset // offset to reloc table
+ ldr w10, =__rela_size // size of reloc table
+
+ ldr x11, =KIMAGE_VADDR // default virtual offset
+ add x11, x11, x23 // actual virtual offset
+ add x8, x8, x11 // __va(.dynsym)
+ add x9, x9, x11 // __va(.rela)
+ add x10, x9, x10 // __va(.rela) + sizeof(.rela)
+
+0: cmp x9, x10
+ b.hs 2f
+ ldp x11, x12, [x9], #24
+ ldr x13, [x9, #-8]
+ cmp w12, #R_AARCH64_RELATIVE
+ b.ne 1f
+ add x13, x13, x23 // relocate
+ str x13, [x11, x23]
+ b 0b
+
+1: cmp w12, #R_AARCH64_ABS64
+ b.ne 0b
+ add x12, x12, x12, lsl #1 // symtab offset: 24x top word
+ add x12, x8, x12, lsr #(32 - 3) // ... shifted into bottom word
+ ldrsh w14, [x12, #6] // Elf64_Sym::st_shndx
+ ldr x15, [x12, #8] // Elf64_Sym::st_value
+ cmp w14, #-0xf // SHN_ABS (0xfff1) ?
+ add x14, x15, x23 // relocate
+ csel x15, x14, x15, ne
+ add x15, x13, x15
+ str x15, [x11, x23]
+ b 0b
+
+2:
+#endif
+ ldr x8, =__primary_switched
+ br x8
+ENDPROC(__primary_switch)
+
__secondary_switch:
ldr x8, =__secondary_switched
br x8
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 77d86c976abd..8918b303cc61 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -158,12 +158,9 @@ SECTIONS
*(.altinstr_replacement)
}
.rela : ALIGN(8) {
- __reloc_start = .;
*(.rela .rela*)
- __reloc_end = .;
}
.dynsym : ALIGN(8) {
- __dynsym_start = .;
*(.dynsym)
}
.dynstr : {
@@ -173,6 +170,10 @@ SECTIONS
*(.hash)
}

+ __rela_offset = ADDR(.rela) - KIMAGE_VADDR;
+ __rela_size = SIZEOF(.rela);
+ __dynsym_offset = ADDR(.dynsym) - KIMAGE_VADDR;
+
. = ALIGN(SEGMENT_ALIGN);
__init_end = .;

--
2.5.0

2016-04-18 15:10:17

by Ard Biesheuvel

Subject: [PATCH 4/8] arm64: introduce mov_q macro to move a constant into a 64-bit register

Implement a macro mov_q that can be used to move an immediate constant
into a 64-bit register, using between 2 and 4 movz/movk instructions
(depending on the operand).
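
For example (illustrative expansions only, not part of the patch; the
assembler picks the actual encodings via the :abs_gN: relocations):

    mov_q   x0, 0x12345678          // 2 instructions:
                                    //   movz x0, #0x1234, lsl #16
                                    //   movk x0, #0x5678
    mov_q   x1, 0xffff000008080000  // 4 instructions:
                                    //   movz x1, #0xffff, lsl #48
                                    //   movk x1, #0x0,    lsl #32
                                    //   movk x1, #0x808,  lsl #16
                                    //   movk x1, #0x0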

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/include/asm/assembler.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 70f7b9e04598..9ea846ded55c 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -233,4 +233,24 @@ lr .req x30 // link register
.long \sym\()_hi32
.endm

+ /*
+ * mov_q - move an immediate constant into a 64-bit register using
+ * between 2 and 4 movz/movk instructions (depending on the
+ * magnitude and sign of the operand)
+ */
+ .macro mov_q, reg, val
+ .if (((\val) >> 31) == 0 || ((\val) >> 31) == 0x1ffffffff)
+ movz \reg, :abs_g1_s:\val
+ .else
+ .if (((\val) >> 47) == 0 || ((\val) >> 47) == 0x1ffff)
+ movz \reg, :abs_g2_s:\val
+ .else
+ movz \reg, :abs_g3:\val
+ movk \reg, :abs_g2_nc:\val
+ .endif
+ movk \reg, :abs_g1_nc:\val
+ .endif
+ movk \reg, :abs_g0_nc:\val
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
--
2.5.0

2016-04-18 15:10:22

by Ard Biesheuvel

Subject: [PATCH 7/8] arm64: relocatable: deal with physically misaligned kernel images

When booting a relocatable kernel image, there is no practical reason
to refuse an image whose load address is not exactly TEXT_OFFSET bytes
above a 2 MB aligned base address, as long as the physical and virtual
misalignments with respect to the swapper block size are equal and are
both aligned to THREAD_SIZE.

Since the virtual misalignment is under our control when we first enter
the kernel proper, we can simply choose its value to be equal to the
physical misalignment.

So treat the misalignment of the physical load address as the initial
KASLR offset, and fix up the remaining code to deal with that.
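
As a worked example (illustrative numbers only; assuming a 4 KB granule with
TEXT_OFFSET = 0x80000 and MIN_KIMG_ALIGN = 2 MB): if the Image is loaded at
physical address 0x800b0000 rather than the usual 0x80080000, the code added
below yields

    adrp    x24, __PHYS_OFFSET              // x24 == 0x80030000 (_text - TEXT_OFFSET)
    and     x23, x24, MIN_KIMG_ALIGN - 1    // x23 == 0x30000

so the 0x30000 physical misalignment simply becomes the initial KASLR offset,
and the virtual placement is shifted by the same amount.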

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 9 ++++++---
arch/arm64/kernel/kaslr.c | 6 +++---
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index c5e5edca6897..00a32101ab51 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -25,6 +25,7 @@
#include <linux/irqchip/arm-gic-v3.h>

#include <asm/assembler.h>
+#include <asm/boot.h>
#include <asm/ptrace.h>
#include <asm/asm-offsets.h>
#include <asm/cache.h>
@@ -213,8 +214,8 @@ efi_header_end:
ENTRY(stext)
bl preserve_boot_args
bl el2_setup // Drop to EL1, w20=cpu_boot_mode
- mov x23, xzr // KASLR offset, defaults to 0
adrp x24, __PHYS_OFFSET
+ and x23, x24, MIN_KIMG_ALIGN - 1 // KASLR offset, defaults to 0
bl set_cpu_boot_mode_flag
bl __create_page_tables // x25=TTBR0, x26=TTBR1
/*
@@ -449,11 +450,13 @@ __primary_switched:
bl kasan_early_init
#endif
#ifdef CONFIG_RANDOMIZE_BASE
- cbnz x23, 0f // already running randomized?
+ tst x23, ~(MIN_KIMG_ALIGN - 1) // already running randomized?
+ b.ne 0f
mov x0, x21 // pass FDT address in x0
+ mov x1, x23 // pass modulo offset in x1
bl kaslr_early_init // parse FDT for KASLR options
cbz x0, 0f // KASLR disabled? just proceed
- mov x23, x0 // record KASLR offset
+ orr x23, x23, x0 // record KASLR offset
ret x28 // we must enable KASLR, return
// to __enable_mmu()
0:
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 582983920054..b05469173ba5 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -74,7 +74,7 @@ extern void *__init __fixmap_remap_fdt(phys_addr_t dt_phys, int *size,
* containing function pointers) to be reinitialized, and zero-initialized
* .bss variables will be reset to 0.
*/
-u64 __init kaslr_early_init(u64 dt_phys)
+u64 __init kaslr_early_init(u64 dt_phys, u64 modulo_offset)
{
void *fdt;
u64 seed, offset, mask, module_range;
@@ -132,8 +132,8 @@ u64 __init kaslr_early_init(u64 dt_phys)
* boundary (for 4KB/16KB/64KB granule kernels, respectively). If this
* happens, increase the KASLR offset by the size of the kernel image.
*/
- if ((((u64)_text + offset) >> SWAPPER_TABLE_SHIFT) !=
- (((u64)_end + offset) >> SWAPPER_TABLE_SHIFT))
+ if ((((u64)_text + offset + modulo_offset) >> SWAPPER_TABLE_SHIFT) !=
+ (((u64)_end + offset + modulo_offset) >> SWAPPER_TABLE_SHIFT))
offset = (offset + (u64)(_end - _text)) & mask;

if (IS_ENABLED(CONFIG_KASAN))
--
2.5.0

2016-04-18 15:10:37

by Ard Biesheuvel

Subject: [PATCH 8/8] arm64: kaslr: increase randomization granularity

Currently, our KASLR implementation randomizes the placement of the core
kernel at 2 MB granularity. This is based on the arm64 kernel boot
protocol, which mandates that the kernel is loaded TEXT_OFFSET bytes above
a 2 MB aligned base address. This requirement is a result of the fact that
the block size used by the early mapping code may be 2 MB at the most (for
a 4 KB granule kernel)

But we can do better than that: since a KASLR kernel needs to be relocated
in any case, we can tolerate a physical misalignment as long as the virtual
misalignment relative to this 2 MB block size is equal in size, and code to
deal with this is already in place.

Since we align the kernel segments to 64 KB, let's randomize the physical
offset at 64 KB granularity as well (unless CONFIG_DEBUG_ALIGN_RODATA is
enabled). This way, the page table and TLB footprint is not affected.

The higher granularity allows for 5 bits of additional entropy to be used
(2 MB / 64 KB = 32 = 2^5 candidate offsets within each 2 MB window).

Signed-off-by: Ard Biesheuvel <[email protected]>
---
drivers/firmware/efi/libstub/arm64-stub.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index a90f6459f5c6..eae693eb3e91 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -81,15 +81,24 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table_arg,

if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && phys_seed != 0) {
/*
+ * If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
+ * displacement in the interval [0, MIN_KIMG_ALIGN) that
+ * is a multiple of the minimal segment alignment (SZ_64K)
+ */
+ u32 mask = (MIN_KIMG_ALIGN - 1) & ~(SZ_64K - 1);
+ u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
+ (phys_seed >> 32) & mask : TEXT_OFFSET;
+
+ /*
* If KASLR is enabled, and we have some randomness available,
* locate the kernel at a randomized offset in physical memory.
*/
- *reserve_size = kernel_memsize + TEXT_OFFSET;
+ *reserve_size = kernel_memsize + offset;
status = efi_random_alloc(sys_table_arg, *reserve_size,
MIN_KIMG_ALIGN, reserve_addr,
- phys_seed);
+ (u32)phys_seed);

- *image_addr = *reserve_addr + TEXT_OFFSET;
+ *image_addr = *reserve_addr + offset;
} else {
/*
* Else, try a straight allocation at the preferred offset.
--
2.5.0

2016-04-18 15:10:15

by Ard Biesheuvel

Subject: [PATCH 5/8] arm64: kernel: replace early 64-bit literal loads with move-immediates

When building a relocatable kernel, we currently rely on the fact that
early 64-bit literal loads need to be deferred to after the relocation
has been performed only if they involve symbol references, and not if
they involve assembly-time constants. While this is not an unreasonable
assumption to make, it is better to switch to movz/movk sequences, since
these are guaranteed to be resolved at link time, simply because there are
no dynamic relocation types to describe them.
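
A side-by-side illustration of the two forms (the comments are mine,
summarizing the rationale above):

    ldr     x5, =KIMAGE_VADDR       // literal pool load; safe only because
                                    // KIMAGE_VADDR is an assembly-time constant,
                                    // so the pool entry needs no dynamic relocation
    mov_q   x5, KIMAGE_VADDR        // movz/movk sequence; the value is encoded in
                                    // the instructions and fully resolved at link time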

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 0d487d90d221..dae9cabaadf5 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -337,7 +337,7 @@ __create_page_tables:
cmp x0, x6
b.lo 1b

- ldr x7, =SWAPPER_MM_MMUFLAGS
+ mov x7, SWAPPER_MM_MMUFLAGS

/*
* Create the identity mapping.
@@ -393,7 +393,7 @@ __create_page_tables:
* Map the kernel image (starting with PHYS_OFFSET).
*/
mov x0, x26 // swapper_pg_dir
- ldr x5, =KIMAGE_VADDR
+ mov_q x5, KIMAGE_VADDR
add x5, x5, x23 // add KASLR displacement
create_pgd_entry x0, x5, x3, x6
ldr w6, =kernel_img_size
@@ -631,7 +631,7 @@ ENTRY(secondary_holding_pen)
bl el2_setup // Drop to EL1, w20=cpu_boot_mode
bl set_cpu_boot_mode_flag
mrs x0, mpidr_el1
- ldr x1, =MPIDR_HWID_BITMASK
+ mov_q x1, MPIDR_HWID_BITMASK
and x0, x0, x1
adr_l x3, secondary_holding_pen_release
pen: ldr x4, [x3]
@@ -773,7 +773,7 @@ __primary_switch:
ldr w9, =__rela_offset // offset to reloc table
ldr w10, =__rela_size // size of reloc table

- ldr x11, =KIMAGE_VADDR // default virtual offset
+ mov_q x11, KIMAGE_VADDR // default virtual offset
add x11, x11, x23 // actual virtual offset
add x8, x8, x11 // __va(.dynsym)
add x9, x9, x11 // __va(.rela)
--
2.5.0

2016-04-18 15:13:25

by Ard Biesheuvel

Subject: [PATCH 6/8] arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it

For historical reasons, the kernel Image must be loaded into physical
memory at a 512 KB offset above a 2 MB aligned base address. The region
between the base address and the start of the kernel Image has no
significance to the kernel itself, but it is currently mapped explicitly
into the early kernel VMA range for all translation granules.

In some cases (i.e., 4 KB granule), this is unavoidable, due to the 2 MB
granularity of the early kernel mappings. However, in other cases, e.g.,
when running with larger page sizes, or in the future, with more granular
KASLR, there is no reason to map it explicitly like we do currently.

So update the logic so that the region is mapped only if that happens as
a side effect of rounding the start address of the kernel to the swapper
block size, and leave it unmapped otherwise.

Since the symbol kernel_img_size now simply resolves to the memory
footprint of the kernel Image, we can drop its definition from image.h
and opencode its calculation.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 9 +++++----
arch/arm64/kernel/image.h | 2 --
2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index dae9cabaadf5..c5e5edca6897 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,12 +393,13 @@ __create_page_tables:
* Map the kernel image (starting with PHYS_OFFSET).
*/
mov x0, x26 // swapper_pg_dir
- mov_q x5, KIMAGE_VADDR
+ mov_q x5, KIMAGE_VADDR + TEXT_OFFSET // compile time __va(_text)
add x5, x5, x23 // add KASLR displacement
create_pgd_entry x0, x5, x3, x6
- ldr w6, =kernel_img_size
- add x6, x6, x5
- mov x3, x24 // phys offset
+ adrp x6, _end // runtime __pa(_end)
+ adrp x3, _text // runtime __pa(_text)
+ sub x6, x6, x3 // _end - _text
+ add x6, x6, x5 // runtime __va(_end)
create_block_map x0, x7, x3, x5, x6

/*
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index 4fd72da646a3..86d444f9c2c1 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -71,8 +71,6 @@
DEFINE_IMAGE_LE64(_kernel_offset_le, TEXT_OFFSET); \
DEFINE_IMAGE_LE64(_kernel_flags_le, __HEAD_FLAGS);

-kernel_img_size = _end - (_text - TEXT_OFFSET);
-
#ifdef CONFIG_EFI

__efistub_stext_offset = stext - _text;
--
2.5.0

2016-04-18 15:14:08

by Ard Biesheuvel

Subject: [PATCH 2/8] arm64: kernel: use literal for relocated address of __secondary_switched

We can simply use a relocated 64-bit literal to store the address of
__secondary_switched(), and the relocation code will ensure that it
holds the correct value at secondary entry time, as long as we make sure
that the literal is not dereferenced until after we have enabled the MMU.

So jump via a small __secondary_switch() function covered by the ID map
that performs the literal load and branch-to-register.

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/kernel/head.S | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index ac27d8d937b2..f13276d4ca91 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -468,9 +468,7 @@ __mmap_switched:
str x15, [x11, x23]
b 0b

-2: adr_l x8, kimage_vaddr // make relocated kimage_vaddr
- dc cvac, x8 // value visible to secondaries
- dsb sy // with MMU off
+2:
#endif

adr_l sp, initial_sp, x4
@@ -699,12 +697,9 @@ secondary_startup:
adrp x26, swapper_pg_dir
bl __cpu_setup // initialise processor

- ldr x8, kimage_vaddr
- ldr w9, 0f
- sub x27, x8, w9, sxtw // address to jump to after enabling the MMU
+ adr_l x27, __secondary_switch // address to jump to after enabling the MMU
b __enable_mmu
ENDPROC(secondary_startup)
-0: .long (_text - TEXT_OFFSET) - __secondary_switched

__secondary_switched:
adr_l x5, vectors
@@ -806,3 +801,8 @@ __no_granule_support:
wfi
b 1b
ENDPROC(__no_granule_support)
+
+__secondary_switch:
+ ldr x8, =__secondary_switched
+ br x8
+ENDPROC(__secondary_switch)
--
2.5.0

2016-04-18 15:36:09

by Mark Rutland

Subject: Re: [PATCH 1/8] arm64: kernel: don't export local symbols from head.S

On Mon, Apr 18, 2016 at 05:09:41PM +0200, Ard Biesheuvel wrote:
> This unexports some symbols from head.S that are only used locally.

It might be worth s/some/all/, as that makes this sound less arbitrary
(and AFAICS this caters for all symbols only used locally).

> Signed-off-by: Ard Biesheuvel <[email protected]>

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm64/kernel/head.S | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index b43417618847..ac27d8d937b2 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -638,7 +638,7 @@ ENDPROC(el2_setup)
> * Sets the __boot_cpu_mode flag depending on the CPU boot mode passed
> * in x20. See arch/arm64/include/asm/virt.h for more info.
> */
> -ENTRY(set_cpu_boot_mode_flag)
> +set_cpu_boot_mode_flag:
> adr_l x1, __boot_cpu_mode
> cmp w20, #BOOT_CPU_MODE_EL2
> b.ne 1f
> @@ -691,7 +691,7 @@ ENTRY(secondary_entry)
> b secondary_startup
> ENDPROC(secondary_entry)
>
> -ENTRY(secondary_startup)
> +secondary_startup:
> /*
> * Common entry point for secondary CPUs.
> */
> @@ -706,7 +706,7 @@ ENTRY(secondary_startup)
> ENDPROC(secondary_startup)
> 0: .long (_text - TEXT_OFFSET) - __secondary_switched
>
> -ENTRY(__secondary_switched)
> +__secondary_switched:
> adr_l x5, vectors
> msr vbar_el1, x5
> isb
> --
> 2.5.0
>

2016-04-18 15:58:08

by Mark Rutland

Subject: Re: [PATCH 2/8] arm64: kernel: use literal for relocated address of __secondary_switched

On Mon, Apr 18, 2016 at 05:09:42PM +0200, Ard Biesheuvel wrote:
> We can simply use a relocated 64-bit literal to store the address of
> __secondary_switched(), and the relocation code will ensure that it
> holds the correct value at secondary entry time, as long as we make sure
> that the literal is not dereferenced until after we have enabled the MMU.
>
> So jump via a small __secondary_switch() function covered by the ID map
> that performs the literal load and branch-to-register.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>

Neat!

Acked-by: Mark Rutland <[email protected]>

Mark.

> ---
> arch/arm64/kernel/head.S | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index ac27d8d937b2..f13276d4ca91 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -468,9 +468,7 @@ __mmap_switched:
> str x15, [x11, x23]
> b 0b
>
> -2: adr_l x8, kimage_vaddr // make relocated kimage_vaddr
> - dc cvac, x8 // value visible to secondaries
> - dsb sy // with MMU off
> +2:
> #endif
>
> adr_l sp, initial_sp, x4
> @@ -699,12 +697,9 @@ secondary_startup:
> adrp x26, swapper_pg_dir
> bl __cpu_setup // initialise processor
>
> - ldr x8, kimage_vaddr
> - ldr w9, 0f
> - sub x27, x8, w9, sxtw // address to jump to after enabling the MMU
> + adr_l x27, __secondary_switch // address to jump to after enabling the MMU
> b __enable_mmu
> ENDPROC(secondary_startup)
> -0: .long (_text - TEXT_OFFSET) - __secondary_switched
>
> __secondary_switched:
> adr_l x5, vectors
> @@ -806,3 +801,8 @@ __no_granule_support:
> wfi
> b 1b
> ENDPROC(__no_granule_support)
> +
> +__secondary_switch:
> + ldr x8, =__secondary_switched
> + br x8
> +ENDPROC(__secondary_switch)
> --
> 2.5.0
>

2016-04-19 16:03:13

by Laurentiu Tudor

Subject: Re: [PATCH 5/8] arm64: kernel: replace early 64-bit literal loads with move-immediates

On 04/18/2016 06:09 PM, Ard Biesheuvel wrote:
> When building a relocatable kernel, we currently rely on the fact that
> early 64-bit literal loads need to be deferred to after the relocation
> has been performed only if they involve symbol references, and not if
> they involve assembly-time constants. While this is not an unreasonable
> assumption to make, it is better to switch to movz/movk sequences, since
> these are guaranteed to be resolved at link time, simply because there are
> no dynamic relocation types to describe them.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/arm64/kernel/head.S | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 0d487d90d221..dae9cabaadf5 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -337,7 +337,7 @@ __create_page_tables:
> cmp x0, x6
> b.lo 1b
>
> - ldr x7, =SWAPPER_MM_MMUFLAGS
> + mov x7, SWAPPER_MM_MMUFLAGS

mov_q here too?

---
Best Regards, Laurentiu

>
> /*
> * Create the identity mapping.
> @@ -393,7 +393,7 @@ __create_page_tables:
> * Map the kernel image (starting with PHYS_OFFSET).
> */
> mov x0, x26 // swapper_pg_dir
> - ldr x5, =KIMAGE_VADDR
> + mov_q x5, KIMAGE_VADDR
> add x5, x5, x23 // add KASLR displacement
> create_pgd_entry x0, x5, x3, x6
> ldr w6, =kernel_img_size
> @@ -631,7 +631,7 @@ ENTRY(secondary_holding_pen)
> bl el2_setup // Drop to EL1, w20=cpu_boot_mode
> bl set_cpu_boot_mode_flag
> mrs x0, mpidr_el1
> - ldr x1, =MPIDR_HWID_BITMASK
> + mov_q x1, MPIDR_HWID_BITMASK
> and x0, x0, x1
> adr_l x3, secondary_holding_pen_release
> pen: ldr x4, [x3]
> @@ -773,7 +773,7 @@ __primary_switch:
> ldr w9, =__rela_offset // offset to reloc table
> ldr w10, =__rela_size // size of reloc table
>
> - ldr x11, =KIMAGE_VADDR // default virtual offset
> + mov_q x11, KIMAGE_VADDR // default virtual offset
> add x11, x11, x23 // actual virtual offset
> add x8, x8, x11 // __va(.dynsym)
> add x9, x9, x11 // __va(.rela)
>

2016-04-25 16:38:51

by Catalin Marinas

Subject: Re: [PATCH 0/8] arm64: kaslr cleanups and improvements

On Mon, Apr 18, 2016 at 05:09:40PM +0200, Ard Biesheuvel wrote:
> Ard Biesheuvel (8):
> arm64: kernel: don't export local symbols from head.S
> arm64: kernel: use literal for relocated address of
> __secondary_switched
> arm64: kernel: perform relocation processing from ID map
> arm64: introduce mov_q macro to move a constant into a 64-bit register
> arm64: kernel: replace early 64-bit literal loads with move-immediates
> arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it
> arm64: relocatable: deal with physically misaligned kernel images
> arm64: kaslr: increase randomization granularity

I went through these patches and there is indeed a nice clean-up. The
increased KASLR granularity also looks fine. So, for the series:

Acked-by: Catalin Marinas <[email protected]>

2016-04-26 11:27:22

by Will Deacon

Subject: Re: [PATCH 8/8] arm64: kaslr: increase randomization granularity

On Mon, Apr 18, 2016 at 05:09:48PM +0200, Ard Biesheuvel wrote:
> Currently, our KASLR implementation randomizes the placement of the core
> kernel at 2 MB granularity. This is based on the arm64 kernel boot
> protocol, which mandates that the kernel is loaded TEXT_OFFSET bytes above
> a 2 MB aligned base address. This requirement is a result of the fact that
> the block size used by the early mapping code may be 2 MB at the most (for
> a 4 KB granule kernel)
>
> But we can do better than that: since a KASLR kernel needs to be relocated
> in any case, we can tolerate a physical misalignment as long as the virtual
> misalignment relative to this 2 MB block size is equal in size, and code to
> deal with this is already in place.
>
> Since we align the kernel segments to 64 KB, let's randomize the physical
> offset at 64 KB granularity as well (unless CONFIG_DEBUG_ALIGN_RODATA is
> enabled). This way, the page table and TLB footprint is not affected.
>
> The higher granularity allows for 5 bits of additional entropy to be used.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> drivers/firmware/efi/libstub/arm64-stub.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)

Adding Matt to Cc, since this touches the stub and I'll need his ack
before I can merge it.

Will

> diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
> index a90f6459f5c6..eae693eb3e91 100644
> --- a/drivers/firmware/efi/libstub/arm64-stub.c
> +++ b/drivers/firmware/efi/libstub/arm64-stub.c
> @@ -81,15 +81,24 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table_arg,
>
> if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && phys_seed != 0) {
> /*
> + * If CONFIG_DEBUG_ALIGN_RODATA is not set, produce a
> + * displacement in the interval [0, MIN_KIMG_ALIGN) that
> + * is a multiple of the minimal segment alignment (SZ_64K)
> + */
> + u32 mask = (MIN_KIMG_ALIGN - 1) & ~(SZ_64K - 1);
> + u32 offset = !IS_ENABLED(CONFIG_DEBUG_ALIGN_RODATA) ?
> + (phys_seed >> 32) & mask : TEXT_OFFSET;
> +
> + /*
> * If KASLR is enabled, and we have some randomness available,
> * locate the kernel at a randomized offset in physical memory.
> */
> - *reserve_size = kernel_memsize + TEXT_OFFSET;
> + *reserve_size = kernel_memsize + offset;
> status = efi_random_alloc(sys_table_arg, *reserve_size,
> MIN_KIMG_ALIGN, reserve_addr,
> - phys_seed);
> + (u32)phys_seed);
>
> - *image_addr = *reserve_addr + TEXT_OFFSET;
> + *image_addr = *reserve_addr + offset;
> } else {
> /*
> * Else, try a straight allocation at the preferred offset.
> --
> 2.5.0
>

2016-04-26 15:27:24

by Matt Fleming

Subject: Re: [PATCH 8/8] arm64: kaslr: increase randomization granularity

On Tue, 26 Apr, at 12:27:03PM, Will Deacon wrote:
>
> Adding Matt to Cc, since this touches the stub and I'll need his ack
> before I can merge it.

Looks OK,

Reviewed-by: Matt Fleming <[email protected]>