2016-04-15 22:04:34

by Thomas Garnier

Subject: [RFC v1 0/4] x86, boot: KASLR memory implementation (x86_64)

This is RFC v1 of the KASLR memory implementation for x86_64. It was
reviewed early by Kees Cook.

***Background:
The current implementation of KASLR randomizes only the base address of
the kernel and its modules. Research was published showing that static
memory can be overwritten to elevate privileges, bypassing KASLR.

In more detail:

The physical memory mapping holds most allocations from the boot and
heap allocators. Knowing the base address and the physical memory size,
an attacker can deduce the PDE virtual address for the vDSO memory page.
This attack was demonstrated at CanSecWest 2016, in the "Getting
Physical Extreme Abuse of Intel Based Paged Systems" presentation,
https://goo.gl/ANpWdV (see the second part of the presentation). Similar
research was done at Google, leading to this patch proposal. Variants
exist that overwrite /proc or /sys object ACLs, leading to elevation of
privileges.

This set of patches randomizes the base address and padding of three
major memory sections (the physical memory mapping, vmalloc & vmemmap).
It mitigates exploits relying on predictable kernel addresses. The
feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option.
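
In short, the randomization (patch 03) splits the slack space in the
virtual address range between the sections and offsets each one by a
random, PUD-aligned amount. Condensed from kernel_randomize_memory() in
arch/x86/mm/kaslr.c (patch 03), declarations omitted:

	remain_padding = memory_rand_end - memory_rand_start;
	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
		remain_padding -= get_padding(&kaslr_regions[i]);

	prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());

	addr = memory_rand_start;
	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
		/* Split the slack evenly, use a PUD-aligned random part of it */
		padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
		padding = (rand % (padding + 1)) & PUD_MASK;
		addr += padding;
		*kaslr_regions[i].base = addr;
		addr += get_padding(&kaslr_regions[i]);
		remain_padding -= padding;
	}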

Padding for the memory hotplug support is managed by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING. The default value is 10
terabytes.

The patches were tested on qemu & physical machines. Xen compatibility was
also verified. Multiple reboots were used to verify entropy for each
memory section.

***Problems that needed solving:
- The three target memory sections are never at the same place between
boots.
- The physical memory mapping can use a virtual address that is not
aligned on a PGD page table boundary.
- Good entropy is needed early at boot, before get_random_bytes is
available.
- Add optional padding for memory hotplug compatibility.

***Parts:
- The first part prepares for the KASLR memory randomization by
refactoring the entropy functions used by the current implementation
and adding PUD-level virtual address support to the physical mapping
code.
(Patches 01-02)
- The second part implements the KASLR memory randomization for all
sections mentioned.
(Patch 03)
- The third part adds support for memory hotplug by adding an option to
define the padding used between the physical memory mapping section
and the others.
(Patch 04)

Thanks!

Thomas


2016-04-15 22:04:00

by Thomas Garnier

Subject: [RFC v1 2/4] x86, boot: PUD VA support for physical mapping (x86_64)

Minor change that allows early-boot physical mapping of PUD-level
virtual addresses. It prepares for the use of different virtual
addresses by the KASLR memory randomization and has no impact on the
default usage.
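
For context, pud_index() extracts the PUD index bits of a *virtual*
address, roughly:

	#define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

while phys_pud_init() iterates over physical addresses. With the default
PGD-aligned __PAGE_OFFSET the two indexes happen to match; with a
PUD-aligned randomized base they no longer do, hence the __va()
conversion below.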

Signed-off-by: Thomas Garnier <[email protected]>
---
Based on next-20160413
---
arch/x86/mm/init_64.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 89d9747..6adfbce 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -526,10 +526,10 @@ phys_pud_init(pud_t *pud_page, unsigned long addr, unsigned long end,
{
unsigned long pages = 0, next;
unsigned long last_map_addr = end;
- int i = pud_index(addr);
+ int i = pud_index((unsigned long)__va(addr));

for (; i < PTRS_PER_PUD; i++, addr = next) {
- pud_t *pud = pud_page + pud_index(addr);
+ pud_t *pud = pud_page + pud_index((unsigned long)__va(addr));
pmd_t *pmd;
pgprot_t prot = PAGE_KERNEL;

--
2.8.0.rc3.226.g39d4020

2016-04-15 22:04:07

by Thomas Garnier

Subject: [RFC v1 4/4] x86, boot: Memory hotplug support for KASLR memory randomization

Add a new option (CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING) to define
the padding used for the physical memory mapping section when KASLR
memory randomization is enabled. It ensures there is enough virtual
address space when CONFIG_MEMORY_HOTPLUG is used. The default value is
10 terabytes. If CONFIG_MEMORY_HOTPLUG is not used, no space is
reserved, which increases the entropy available.
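
For example, on a hypothetical machine with 8GB of RAM and the default
10TB of padding, the physical memory mapping region is sized as:

	mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) + padding
	       = (8GB >> 40) + 10
	       = 0 + 10 = 10

so roughly 10TB of virtual address space (plus the 1TB hole added after
each region) stays reserved for memory that may be hotplugged later, at
the cost of entropy for the other regions.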

Signed-off-by: Thomas Garnier <[email protected]>
---
Based on next-20160413
---
arch/x86/Kconfig | 15 +++++++++++++++
arch/x86/mm/kaslr.c | 14 ++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7c786d4..cc01b69 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2018,6 +2018,21 @@ config RANDOMIZE_MEMORY

If unsure, say N.

+config RANDOMIZE_MEMORY_PHYSICAL_PADDING
+ hex "Physical memory mapping padding" if EXPERT
+ depends on RANDOMIZE_MEMORY
+ default "0xa" if MEMORY_HOTPLUG
+ default "0x0"
+ range 0x1 0x40 if MEMORY_HOTPLUG
+ range 0x0 0x40
+ ---help---
+ Define the padding in terabyte added to the existing physical memory
+ size during kernel memory randomization. It is useful for memory
+ hotplug support but reduces the entropy available for address
+ randomization.
+
+ If unsure, leave at the default value.
+
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 9de807d..f7dc477 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -63,7 +63,7 @@ void __init kernel_randomize_memory(void)
{
size_t i;
unsigned long addr = memory_rand_start;
- unsigned long padding, rand, mem_tb;
+ unsigned long padding, rand, mem_tb, page_offset_padding;
struct rnd_state rnd_st;
unsigned long remain_padding = memory_rand_end - memory_rand_start;

@@ -74,8 +74,18 @@ void __init kernel_randomize_memory(void)
if (!xen_domain())
page_offset_base -= __XEN_SPACE;

+ /*
+ * Update Physical memory mapping to available and
+ * add padding if needed (especially for memory hotplug support).
+ */
+ page_offset_padding = CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+ page_offset_padding = max(1UL, page_offset_padding);
+#endif
+
BUG_ON(kaslr_regions[0].base != &page_offset_base);
- mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+ mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT) + page_offset_padding;

if (mem_tb < kaslr_regions[0].size_tb)
kaslr_regions[0].size_tb = mem_tb;
--
2.8.0.rc3.226.g39d4020

2016-04-15 22:03:58

by Thomas Garnier

Subject: [RFC v1 1/4] x86, boot: Refactor KASLR entropy functions

Move the KASLR entropy functions into x86/lib so they can be used during
early kernel boot for KASLR memory randomization.
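
The intended use (see patch 03) is to seed a pseudo-random state very
early at boot. A minimal sketch, with an illustrative helper name:

	#include <linux/init.h>
	#include <linux/random.h>
	#include <asm/kaslr.h>

	static void __init example_seed(void)
	{
		struct rnd_state rnd_st;
		unsigned long rand;

		/* Seed mixes RDRAND/RDTSC/i8254 with the KASLR offset */
		prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
		prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
	}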

Signed-off-by: Thomas Garnier <[email protected]>
---
Based on next-20160413
---
arch/x86/boot/compressed/aslr.c | 76 +++------------------------------------
arch/x86/include/asm/kaslr.h | 6 ++++
arch/x86/lib/Makefile | 1 +
arch/x86/lib/kaslr.c | 79 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 91 insertions(+), 71 deletions(-)
create mode 100644 arch/x86/include/asm/kaslr.h
create mode 100644 arch/x86/lib/kaslr.c

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index 6a9b96b..6584c0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -1,9 +1,5 @@
#include "misc.h"

-#include <asm/msr.h>
-#include <asm/archrandom.h>
-#include <asm/e820.h>
-
#include <generated/compile.h>
#include <linux/module.h>
#include <linux/uts.h>
@@ -14,26 +10,6 @@
static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;

-#define I8254_PORT_CONTROL 0x43
-#define I8254_PORT_COUNTER0 0x40
-#define I8254_CMD_READBACK 0xC0
-#define I8254_SELECT_COUNTER0 0x02
-#define I8254_STATUS_NOTREADY 0x40
-static inline u16 i8254(void)
-{
- u16 status, timer;
-
- do {
- outb(I8254_PORT_CONTROL,
- I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
- status = inb(I8254_PORT_COUNTER0);
- timer = inb(I8254_PORT_COUNTER0);
- timer |= inb(I8254_PORT_COUNTER0) << 8;
- } while (status & I8254_STATUS_NOTREADY);
-
- return timer;
-}
-
static unsigned long rotate_xor(unsigned long hash, const void *area,
size_t size)
{
@@ -50,7 +26,7 @@ static unsigned long rotate_xor(unsigned long hash, const void *area,
}

/* Attempt to create a simple but unpredictable starting entropy. */
-static unsigned long get_random_boot(void)
+static unsigned long get_boot_seed(void)
{
unsigned long hash = 0;

@@ -60,50 +36,6 @@ static unsigned long get_random_boot(void)
return hash;
}

-static unsigned long get_random_long(void)
-{
-#ifdef CONFIG_X86_64
- const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
-#else
- const unsigned long mix_const = 0x3f39e593UL;
-#endif
- unsigned long raw, random = get_random_boot();
- bool use_i8254 = true;
-
- debug_putstr("KASLR using");
-
- if (has_cpuflag(X86_FEATURE_RDRAND)) {
- debug_putstr(" RDRAND");
- if (rdrand_long(&raw)) {
- random ^= raw;
- use_i8254 = false;
- }
- }
-
- if (has_cpuflag(X86_FEATURE_TSC)) {
- debug_putstr(" RDTSC");
- raw = rdtsc();
-
- random ^= raw;
- use_i8254 = false;
- }
-
- if (use_i8254) {
- debug_putstr(" i8254");
- random ^= i8254();
- }
-
- /* Circular multiply for better bit diffusion */
- asm("mul %3"
- : "=a" (random), "=d" (raw)
- : "a" (random), "rm" (mix_const));
- random += raw;
-
- debug_putstr("...\n");
-
- return random;
-}
-
struct mem_vector {
unsigned long start;
unsigned long size;
@@ -111,7 +43,6 @@ struct mem_vector {

#define MEM_AVOID_MAX 5
static struct mem_vector mem_avoid[MEM_AVOID_MAX];
-
static bool mem_contains(struct mem_vector *region, struct mem_vector *item)
{
/* Item at least partially before region. */
@@ -220,13 +151,16 @@ static void slots_append(unsigned long addr)
slots[slot_max++] = addr;
}

+#define KASLR_COMPRESSED_BOOT
+#include "../../lib/kaslr.c"
+
static unsigned long slots_fetch_random(void)
{
/* Handle case of no slots stored. */
if (slot_max == 0)
return 0;

- return slots[get_random_long() % slot_max];
+ return slots[kaslr_get_random_boot_long() % slot_max];
}

static void process_e820_entry(struct e820entry *entry,
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
new file mode 100644
index 0000000..2ae1429
--- /dev/null
+++ b/arch/x86/include/asm/kaslr.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_KASLR_H_
+#define _ASM_KASLR_H_
+
+unsigned long kaslr_get_random_boot_long(void);
+
+#endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 72a5767..cfa6d07 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -24,6 +24,7 @@ lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
+lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o

obj-y += msr.o msr-reg.o msr-reg-export.o

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
new file mode 100644
index 0000000..ffb22ba
--- /dev/null
+++ b/arch/x86/lib/kaslr.c
@@ -0,0 +1,79 @@
+#include <asm/kaslr.h>
+#include <asm/msr.h>
+#include <asm/archrandom.h>
+#include <asm/e820.h>
+#include <asm/io.h>
+
+/* Replace boot functions on library build */
+#ifndef KASLR_COMPRESSED_BOOT
+#include <asm/cpufeature.h>
+#include <asm/setup.h>
+
+#define debug_putstr(v)
+#define has_cpuflag(f) boot_cpu_has(f)
+#define get_boot_seed() kaslr_offset()
+#endif
+
+#define I8254_PORT_CONTROL 0x43
+#define I8254_PORT_COUNTER0 0x40
+#define I8254_CMD_READBACK 0xC0
+#define I8254_SELECT_COUNTER0 0x02
+#define I8254_STATUS_NOTREADY 0x40
+static inline u16 i8254(void)
+{
+ u16 status, timer;
+
+ do {
+ outb(I8254_PORT_CONTROL,
+ I8254_CMD_READBACK | I8254_SELECT_COUNTER0);
+ status = inb(I8254_PORT_COUNTER0);
+ timer = inb(I8254_PORT_COUNTER0);
+ timer |= inb(I8254_PORT_COUNTER0) << 8;
+ } while (status & I8254_STATUS_NOTREADY);
+
+ return timer;
+}
+
+unsigned long kaslr_get_random_boot_long(void)
+{
+#ifdef CONFIG_X86_64
+ const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
+#else
+ const unsigned long mix_const = 0x3f39e593UL;
+#endif
+ unsigned long raw, random = get_boot_seed();
+ bool use_i8254 = true;
+
+ debug_putstr("KASLR using");
+
+ if (has_cpuflag(X86_FEATURE_RDRAND)) {
+ debug_putstr(" RDRAND");
+ if (rdrand_long(&raw)) {
+ random ^= raw;
+ use_i8254 = false;
+ }
+ }
+
+ if (has_cpuflag(X86_FEATURE_TSC)) {
+ debug_putstr(" RDTSC");
+ raw = rdtsc();
+
+ random ^= raw;
+ use_i8254 = false;
+ }
+
+ if (use_i8254) {
+ debug_putstr(" i8254");
+ random ^= i8254();
+ }
+
+ /* Circular multiply for better bit diffusion */
+ asm("mul %3"
+ : "=a" (random), "=d" (raw)
+ : "a" (random), "rm" (mix_const));
+ random += raw;
+
+ debug_putstr("...\n");
+
+ return random;
+}
--
2.8.0.rc3.226.g39d4020

2016-04-15 22:04:53

by Thomas Garnier

Subject: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

Randomizes the virtual address space of kernel memory sections (physical
memory mapping, vmalloc & vmemmap) for x86_64. This security feature
mitigates exploits relying on predictable kernel addresses. These
addresses can be used to disclose the kernel module base addresses or to
corrupt specific structures to elevate privileges, bypassing the current
implementation of KASLR. This feature can be enabled with the
CONFIG_RANDOMIZE_MEMORY option.

The physical memory mapping holds most allocations from the boot and
heap allocators. Knowing the base address and the physical memory size,
an attacker can deduce the PDE virtual address for the vDSO memory page.
This attack was demonstrated at CanSecWest 2016, in the "Getting
Physical Extreme Abuse of Intel Based Paged Systems" presentation,
https://goo.gl/ANpWdV (see the second part of the presentation). Similar
research was done at Google, leading to this patch proposal. Variants
exist that overwrite /proc or /sys object ACLs, leading to elevation of
privileges.

The vmalloc memory section contains allocations made through the
vmalloc API. Allocations are done sequentially to prevent fragmentation,
so each allocation address can easily be deduced, especially at boot.

The vmemmap section holds a representation of the physical
memory (through a struct page array). An attacker could use this section
to disclose the kernel memory layout (walking the page linked list).

The order of the memory sections is not changed. The feature computes
the space available for the sections based on the different
configuration options and randomizes the base of, and the padding
between, each section. The size of the physical memory mapping is the
available physical memory. No performance impact was detected while
testing the feature.
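
Concretely, the virtual space reserved for each randomized region (each
followed by at least a 1TB hole) is:
- physical memory mapping: the actual memory size, capped at 64TB
- vmalloc: 32TB (VMALLOC_SIZE_TB)
- vmemmap: 1TB
The remaining slack between __PAGE_OFFSET_BASE and the next fixed region
(KASAN shadow, ESPfix, EFI or __START_KERNEL_map, depending on the
configuration) is distributed randomly as base offsets and padding.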

Entropy is generated using the KASLR early boot functions now shared in
the lib directory (originally written by Kees Cook). Randomization is
done at the PGD & PUD page table levels to increase the number of
possible addresses. The physical memory mapping code was adapted to
support PUD-level virtual addresses. An additional low-memory page is
used to ensure each CPU can start with a PGD-aligned virtual address
(for the realmode trampoline).

x86/dump_pagetables was updated to correctly display each section.

Updated documentation on x86_64 memory layout accordingly.

Signed-off-by: Thomas Garnier <[email protected]>
---
Based on next-20160413
---
Documentation/x86/x86_64/mm.txt | 4 +
arch/x86/Kconfig | 15 ++++
arch/x86/include/asm/kaslr.h | 12 +++
arch/x86/include/asm/page_64_types.h | 12 ++-
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/include/asm/pgtable_64_types.h | 15 +++-
arch/x86/kernel/head_64.S | 2 +-
arch/x86/kernel/setup.c | 2 +
arch/x86/mm/Makefile | 1 +
arch/x86/mm/dump_pagetables.c | 11 ++-
arch/x86/mm/init_64.c | 3 +
arch/x86/mm/kaslr.c | 151 ++++++++++++++++++++++++++++++++
arch/x86/realmode/init.c | 5 ++
13 files changed, 226 insertions(+), 8 deletions(-)
create mode 100644 arch/x86/mm/kaslr.c

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index c518dce..1918777 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -39,4 +39,8 @@ memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.

+Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
+physical memory, vmalloc/ioremap space and virtual memory map are randomized.
+Their order is preserved but their base will be changed early at boot time.
+
-Andi Kleen, Jul 2004
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2632f60..7c786d4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2003,6 +2003,21 @@ config PHYSICAL_ALIGN

Don't change this unless you know what you are doing.

+config RANDOMIZE_MEMORY
+ bool "Randomize the kernel memory sections"
+ depends on X86_64
+ depends on RANDOMIZE_BASE
+ default n
+ ---help---
+ Randomizes the virtual address of memory sections (physical memory
+ mapping, vmalloc & vmemmap). This security feature mitigates exploits
+ relying on predictable memory locations.
+
+ Base and padding between memory section is randomized. Their order is
+ not. Entropy is generated in the same way as RANDOMIZE_BASE.
+
+ If unsure, say N.
+
config HOTPLUG_CPU
bool "Support for hot-pluggable CPUs"
depends on SMP
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 2ae1429..46b42aa 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -3,4 +3,16 @@

unsigned long kaslr_get_random_boot_long(void);

+#ifdef CONFIG_RANDOMIZE_MEMORY
+extern unsigned long page_offset_base;
+extern unsigned long vmalloc_base;
+extern unsigned long vmemmap_base;
+
+void kernel_randomize_memory(void);
+void kaslr_trampoline_init(unsigned long page_size_mask);
+#else
+static inline void kernel_randomize_memory(void) { }
+static inline void kaslr_trampoline_init(unsigned long page_size_mask) { }
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+
#endif
diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 4928cf0..79b9c4b 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -1,6 +1,10 @@
#ifndef _ASM_X86_PAGE_64_DEFS_H
#define _ASM_X86_PAGE_64_DEFS_H

+#ifndef __ASSEMBLY__
+#include <asm/kaslr.h>
+#endif
+
#ifdef CONFIG_KASAN
#define KASAN_STACK_ORDER 1
#else
@@ -32,7 +36,13 @@
* hypervisor to fit. Choosing 16 slots here is arbitrary, but it's
* what Xen requires.
*/
-#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
+#define __PAGE_OFFSET_BASE _AC(0xffff880000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define __XEN_SPACE _AC(0x80000000000, UL)
+#define __PAGE_OFFSET page_offset_base
+#else
+#define __PAGE_OFFSET __PAGE_OFFSET_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */

#define __START_KERNEL_map _AC(0xffffffff80000000, UL)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 2ee7811..0dfec89 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -21,6 +21,7 @@ extern pmd_t level2_fixmap_pgt[512];
extern pmd_t level2_ident_pgt[512];
extern pte_t level1_fixmap_pgt[512];
extern pgd_t init_level4_pgt[];
+extern pgd_t trampoline_pgd_entry;

#define swapper_pg_dir init_level4_pgt

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e6844df..d388739 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -5,6 +5,7 @@

#ifndef __ASSEMBLY__
#include <linux/types.h>
+#include <asm/kaslr.h>

/*
* These are used to make use of C type-checking..
@@ -54,9 +55,17 @@ typedef struct { pteval_t pte; } pte_t;

/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */
#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL)
-#define VMALLOC_START _AC(0xffffc90000000000, UL)
-#define VMALLOC_END _AC(0xffffe8ffffffffff, UL)
-#define VMEMMAP_START _AC(0xffffea0000000000, UL)
+#define VMALLOC_SIZE_TB _AC(32, UL)
+#define __VMALLOC_BASE _AC(0xffffc90000000000, UL)
+#define __VMEMMAP_BASE _AC(0xffffea0000000000, UL)
+#ifdef CONFIG_RANDOMIZE_MEMORY
+#define VMALLOC_START vmalloc_base
+#define VMEMMAP_START vmemmap_base
+#else
+#define VMALLOC_START __VMALLOC_BASE
+#define VMEMMAP_START __VMEMMAP_BASE
+#endif /* CONFIG_RANDOMIZE_MEMORY */
+#define VMALLOC_END (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
#define MODULES_END _AC(0xffffffffff000000, UL)
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 22fbf9d..b282db4 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -37,7 +37,7 @@

#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

-L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET)
+L4_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE)
L4_START_KERNEL = pgd_index(__START_KERNEL_map)
L3_START_KERNEL = pud_index(__START_KERNEL_map)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 319b08a..aebfa1d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -909,6 +909,8 @@ void __init setup_arch(char **cmdline_p)

x86_init.oem.arch_setup();

+ kernel_randomize_memory();
+
iomem_resource.end = (1ULL << boot_cpu_data.x86_phys_bits) - 1;
setup_memory_map();
parse_setup_data();
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index f989132..2c24dd6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,4 +38,5 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o

obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
+obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 99bfb19..4a03f60 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -72,9 +72,9 @@ static struct addr_marker address_markers[] = {
{ 0, "User Space" },
#ifdef CONFIG_X86_64
{ 0x8000000000000000UL, "Kernel Space" },
- { PAGE_OFFSET, "Low Kernel Mapping" },
- { VMALLOC_START, "vmalloc() Area" },
- { VMEMMAP_START, "Vmemmap" },
+ { 0/* PAGE_OFFSET */, "Low Kernel Mapping" },
+ { 0/* VMALLOC_START */, "vmalloc() Area" },
+ { 0/* VMEMMAP_START */, "Vmemmap" },
# ifdef CONFIG_X86_ESPFIX64
{ ESPFIX_BASE_ADDR, "ESPfix Area", 16 },
# endif
@@ -434,6 +434,11 @@ void ptdump_walk_pgd_level_checkwx(void)

static int __init pt_dump_init(void)
{
+#ifdef CONFIG_X86_64
+ address_markers[LOW_KERNEL_NR].start_address = PAGE_OFFSET;
+ address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
+ address_markers[VMEMMAP_START_NR].start_address = VMEMMAP_START;
+#endif
#ifdef CONFIG_X86_32
/* Not a compile-time constant on x86-32 */
address_markers[VMALLOC_START_NR].start_address = VMALLOC_START;
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 6adfbce..32c5558 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -633,6 +633,9 @@ kernel_physical_mapping_init(unsigned long start,
pgd_changed = true;
}

+ if (addr == PAGE_OFFSET)
+ kaslr_trampoline_init(page_size_mask);
+
if (pgd_changed)
sync_global_pgds(addr, end - 1, 0);

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
new file mode 100644
index 0000000..9de807d
--- /dev/null
+++ b/arch/x86/mm/kaslr.c
@@ -0,0 +1,151 @@
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/init.h>
+#include <linux/memory.h>
+#include <linux/random.h>
+#include <xen/xen.h>
+
+#include <asm/processor.h>
+#include <asm/pgtable.h>
+#include <asm/pgalloc.h>
+#include <asm/e820.h>
+#include <asm/init.h>
+#include <asm/setup.h>
+#include <asm/kaslr.h>
+#include <asm/kasan.h>
+
+#include "mm_internal.h"
+
+/* Hold the pgd entry used on booting additional CPUs */
+pgd_t trampoline_pgd_entry;
+
+static const unsigned long memory_rand_start = __PAGE_OFFSET_BASE;
+
+#if defined(CONFIG_KASAN)
+static const unsigned long memory_rand_end = KASAN_SHADOW_START;
+#elfif defined(CONFIG_X86_ESPFIX64)
+static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
+#elfif defined(CONFIG_EFI)
+static const unsigned long memory_rand_end = EFI_VA_START;
+#else
+static const unsigned long memory_rand_end = __START_KERNEL_map;
+#endif
+
+/* Default values */
+unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+EXPORT_SYMBOL(page_offset_base);
+unsigned long vmalloc_base = __VMALLOC_BASE;
+EXPORT_SYMBOL(vmalloc_base);
+unsigned long vmemmap_base = __VMEMMAP_BASE;
+EXPORT_SYMBOL(vmemmap_base);
+
+static struct kaslr_memory_region {
+ unsigned long *base;
+ unsigned short size_tb;
+} kaslr_regions[] = {
+ { &page_offset_base, 64/* Maximum */ },
+ { &vmalloc_base, VMALLOC_SIZE_TB },
+ { &vmemmap_base, 1 },
+};
+
+#define TB_SHIFT 40
+
+/* Size in Terabytes + 1 hole */
+static inline unsigned long get_padding(struct kaslr_memory_region *region)
+{
+ return ((unsigned long)region->size_tb + 1) << TB_SHIFT;
+}
+
+void __init kernel_randomize_memory(void)
+{
+ size_t i;
+ unsigned long addr = memory_rand_start;
+ unsigned long padding, rand, mem_tb;
+ struct rnd_state rnd_st;
+ unsigned long remain_padding = memory_rand_end - memory_rand_start;
+
+ if (!kaslr_enabled())
+ return;
+
+ /* Take the additional space when Xen is not active. */
+ if (!xen_domain())
+ page_offset_base -= __XEN_SPACE;
+
+ BUG_ON(kaslr_regions[0].base != &page_offset_base);
+ mem_tb = ((max_pfn << PAGE_SHIFT) >> TB_SHIFT);
+
+ if (mem_tb < kaslr_regions[0].size_tb)
+ kaslr_regions[0].size_tb = mem_tb;
+
+ for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
+ remain_padding -= get_padding(&kaslr_regions[i]);
+
+ prandom_seed_state(&rnd_st, kaslr_get_random_boot_long());
+
+ /* Position each section randomly with minimum 1 terabyte between */
+ for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
+ padding = remain_padding / (ARRAY_SIZE(kaslr_regions) - i);
+ prandom_bytes_state(&rnd_st, &rand, sizeof(rand));
+ padding = (rand % (padding + 1)) & PUD_MASK;
+ addr += padding;
+ *kaslr_regions[i].base = addr;
+ addr += get_padding(&kaslr_regions[i]);
+ remain_padding -= padding;
+ }
+}
+
+/*
+ * Create PGD aligned trampoline table to allow real mode initialization
+ * of additional CPUs. Consume only 1 additonal low memory page.
+ */
+void __meminit kaslr_trampoline_init(unsigned long page_size_mask)
+{
+ unsigned long addr, next, end;
+ pgd_t *pgd;
+ pud_t *pud_page, *tr_pud_page;
+ int i;
+
+ if (!kaslr_enabled()) {
+ trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
+ return;
+ }
+
+ tr_pud_page = alloc_low_page();
+ set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
+
+ addr = 0;
+ end = ISA_END_ADDRESS;
+ pgd = pgd_offset_k((unsigned long)__va(addr));
+ pud_page = (pud_t *) pgd_page_vaddr(*pgd);
+
+ for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
+ pud_t *pud, *tr_pud;
+ pmd_t *pmd;
+
+ tr_pud = tr_pud_page + pud_index(addr);
+ pud = pud_page + pud_index((unsigned long)__va(addr));
+ next = (addr & PUD_MASK) + PUD_SIZE;
+
+ if (addr >= end || !pud_val(*pud)) {
+ if (!after_bootmem &&
+ !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
+ !e820_any_mapped(addr & PUD_MASK, next,
+ E820_RESERVED_KERN))
+ set_pud(tr_pud, __pud(0));
+ continue;
+ }
+
+ if (page_size_mask & (1<<PG_LEVEL_1G)) {
+ set_pte((pte_t *)tr_pud,
+ pfn_pte((__pa(addr) & PUD_MASK) >> PAGE_SHIFT,
+ PAGE_KERNEL_LARGE));
+ continue;
+ }
+
+ pmd = pmd_offset(pud, 0);
+ set_pud(tr_pud, __pud(_PAGE_TABLE | __pa(pmd)));
+ }
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 0b7a63d..44a7546 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -22,6 +22,7 @@ void __init reserve_real_mode(void)
base = __va(mem);
memblock_reserve(mem, size);
real_mode_header = (struct real_mode_header *) base;
+ /* Don't disclose memory trampoline with KASLR memory enabled */
printk(KERN_DEBUG "Base memory trampoline at [%p] %llx size %zu\n",
base, (unsigned long long)mem, size);
}
@@ -84,7 +85,11 @@ void __init setup_real_mode(void)
*trampoline_cr4_features = __read_cr4();

trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
+#ifdef CONFIG_RANDOMIZE_MEMORY
+ trampoline_pgd[0] = trampoline_pgd_entry.pgd;
+#else
trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
+#endif
trampoline_pgd[511] = init_level4_pgt[511].pgd;
#endif
}
--
2.8.0.rc3.226.g39d4020

2016-04-18 14:46:39

by Jörg Rödel

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On Fri, Apr 15, 2016 at 03:03:12PM -0700, Thomas Garnier wrote:
> +#if defined(CONFIG_KASAN)
> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
> +#elfif defined(CONFIG_X86_ESPFIX64)
> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
> +#elfif defined(CONFIG_EFI)
> +static const unsigned long memory_rand_end = EFI_VA_START;
> +#else
> +static const unsigned long memory_rand_end = __START_KERNEL_map;
> +#endif

That #elfif is a typo, right?


Joerg

2016-04-18 14:56:54

by Thomas Garnier

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

Yes, it is. It certainly happened while editing the patches (sorry about
that); it will be fixed in the next iteration once I get a bit more
feedback.

On Mon, Apr 18, 2016 at 7:46 AM, Joerg Roedel <[email protected]> wrote:
> On Fri, Apr 15, 2016 at 03:03:12PM -0700, Thomas Garnier wrote:
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elfif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elfif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
> That #elfif is a typo, right?
>
>
> Joerg
>

2016-04-18 19:02:57

by H. Peter Anvin

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On April 18, 2016 7:46:05 AM PDT, Joerg Roedel <[email protected]> wrote:
>On Fri, Apr 15, 2016 at 03:03:12PM -0700, Thomas Garnier wrote:
>> +#if defined(CONFIG_KASAN)
>> +static const unsigned long memory_rand_end = KASAN_SHADOW_START;
>> +#elfif defined(CONFIG_X86_ESPFIX64)
>> +static const unsigned long memory_rand_end = ESPFIX_BASE_ADDR;
>> +#elfif defined(CONFIG_EFI)
>> +static const unsigned long memory_rand_end = EFI_VA_START;
>> +#else
>> +static const unsigned long memory_rand_end = __START_KERNEL_map;
>> +#endif
>
>That #elfif is a typo, right?
>
>
> Joerg

It should be #efif right ;)
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

2016-04-19 14:28:09

by Jörg Rödel

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

Hi Thomas,

On Fri, Apr 15, 2016 at 03:03:12PM -0700, Thomas Garnier wrote:
> +/*
> + * Create PGD aligned trampoline table to allow real mode initialization
> + * of additional CPUs. Consume only 1 additonal low memory page.
> + */
> +void __meminit kaslr_trampoline_init(unsigned long page_size_mask)
> +{
> + unsigned long addr, next, end;
> + pgd_t *pgd;
> + pud_t *pud_page, *tr_pud_page;
> + int i;
> +
> + if (!kaslr_enabled()) {
> + trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
> + return;
> + }
> +
> + tr_pud_page = alloc_low_page();
> + set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
> +
> + addr = 0;
> + end = ISA_END_ADDRESS;
> + pgd = pgd_offset_k((unsigned long)__va(addr));
> + pud_page = (pud_t *) pgd_page_vaddr(*pgd);
> +
> + for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
> + pud_t *pud, *tr_pud;
> + pmd_t *pmd;
> +
> + tr_pud = tr_pud_page + pud_index(addr);
> + pud = pud_page + pud_index((unsigned long)__va(addr));
> + next = (addr & PUD_MASK) + PUD_SIZE;
> +
> + if (addr >= end || !pud_val(*pud)) {
> + if (!after_bootmem &&
> + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
> + !e820_any_mapped(addr & PUD_MASK, next,
> + E820_RESERVED_KERN))
> + set_pud(tr_pud, __pud(0));
> + continue;
> + }
> +
> + if (page_size_mask & (1<<PG_LEVEL_1G)) {
> + set_pte((pte_t *)tr_pud,
> + pfn_pte((__pa(addr) & PUD_MASK) >> PAGE_SHIFT,

Hmm, why do you treat addr as virtual here, before it was a physical
address, no?

> + PAGE_KERNEL_LARGE));
> + continue;
> + }

Why do you need to check these two cases above, can't you just copy the
pud-entries like done below? The direct mapping should already take care
of unmapped regions and 1gb pages.

> + pmd = pmd_offset(pud, 0);
> + set_pud(tr_pud, __pud(_PAGE_TABLE | __pa(pmd)));
> + }
> +}


Joerg

2016-04-19 15:49:12

by Thomas Garnier

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On Tue, Apr 19, 2016 at 7:27 AM, Joerg Roedel <[email protected]> wrote:
> Hi Thomas,
>
> On Fri, Apr 15, 2016 at 03:03:12PM -0700, Thomas Garnier wrote:
>> +/*
>> + * Create PGD aligned trampoline table to allow real mode initialization
>> + * of additional CPUs. Consume only 1 additonal low memory page.
>> + */
>> +void __meminit kaslr_trampoline_init(unsigned long page_size_mask)
>> +{
>> + unsigned long addr, next, end;
>> + pgd_t *pgd;
>> + pud_t *pud_page, *tr_pud_page;
>> + int i;
>> +
>> + if (!kaslr_enabled()) {
>> + trampoline_pgd_entry = init_level4_pgt[pgd_index(PAGE_OFFSET)];
>> + return;
>> + }
>> +
>> + tr_pud_page = alloc_low_page();
>> + set_pgd(&trampoline_pgd_entry, __pgd(_PAGE_TABLE | __pa(tr_pud_page)));
>> +
>> + addr = 0;
>> + end = ISA_END_ADDRESS;
>> + pgd = pgd_offset_k((unsigned long)__va(addr));
>> + pud_page = (pud_t *) pgd_page_vaddr(*pgd);
>> +
>> + for (i = pud_index(addr); i < PTRS_PER_PUD; i++, addr = next) {
>> + pud_t *pud, *tr_pud;
>> + pmd_t *pmd;
>> +
>> + tr_pud = tr_pud_page + pud_index(addr);
>> + pud = pud_page + pud_index((unsigned long)__va(addr));
>> + next = (addr & PUD_MASK) + PUD_SIZE;
>> +
>> + if (addr >= end || !pud_val(*pud)) {
>> + if (!after_bootmem &&
>> + !e820_any_mapped(addr & PUD_MASK, next, E820_RAM) &&
>> + !e820_any_mapped(addr & PUD_MASK, next,
>> + E820_RESERVED_KERN))
>> + set_pud(tr_pud, __pud(0));
>> + continue;
>> + }
>> +
>> + if (page_size_mask & (1<<PG_LEVEL_1G)) {
>> + set_pte((pte_t *)tr_pud,
>> + pfn_pte((__pa(addr) & PUD_MASK) >> PAGE_SHIFT,
>
> Hmm, why do you treat addr as virtual here, before it was a physical
> address, no?
>

Yes, you are right. Good catch.

>> + PAGE_KERNEL_LARGE));
>> + continue;
>> + }
>
> Why do you need to check these two cases above, can't you just copy the
> pud-entries like done below? The direct mapping should already take care
> of unmapped regions and 1gb pages.
>

Yes, that was my original approach, though I was not sure it was the
best. It does make sense, so I will update that for the next
iteration.

>> + pmd = pmd_offset(pud, 0);
>> + set_pud(tr_pud, __pud(_PAGE_TABLE | __pa(pmd)));
>> + }
>> +}
>
>
> Joerg
>

2016-04-21 13:32:05

by Boris Ostrovsky

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)



On 04/15/2016 06:03 PM, Thomas Garnier wrote:
> +void __init kernel_randomize_memory(void)
> +{
> + size_t i;
> + unsigned long addr = memory_rand_start;
> + unsigned long padding, rand, mem_tb;
> + struct rnd_state rnd_st;
> + unsigned long remain_padding = memory_rand_end - memory_rand_start;
> +
> + if (!kaslr_enabled())
> + return;
> +
> + /* Take the additional space when Xen is not active. */
> + if (!xen_domain())
> + page_offset_base -= __XEN_SPACE;

This should be !xen_pv_domain(). Xen HVM guests are no different from
bare metal as far as address ranges are concerned. (Technically it's
probably !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH for
now since it is being replaced by an HVM-type guest)

Having said that, I am not sure I understand why page_offset_base is
shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not
supposed to be used by anyone, whether we are running under a hypervisor
or not.

-boris


2016-04-21 15:11:37

by Thomas Garnier

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On Thu, Apr 21, 2016 at 6:30 AM, Boris Ostrovsky
<[email protected]> wrote:
>
>
> On 04/15/2016 06:03 PM, Thomas Garnier wrote:
>>
>> +void __init kernel_randomize_memory(void)
>> +{
>> + size_t i;
>> + unsigned long addr = memory_rand_start;
>> + unsigned long padding, rand, mem_tb;
>> + struct rnd_state rnd_st;
>> + unsigned long remain_padding = memory_rand_end -
>> memory_rand_start;
>> +
>> + if (!kaslr_enabled())
>> + return;
>> +
>> + /* Take the additional space when Xen is not active. */
>> + if (!xen_domain())
>> + page_offset_base -= __XEN_SPACE;
>
>
> This should be !xen_pv_domain(). Xen HVM guests are no different from bare
> metal as far as address ranges are concerned. (Technically it's probably
> !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH for now since it
> is being replaced by an HVM-type guest)
>

In my tests KASLR was disabled on Xen, so I should just remove this
check. I kept it in case that changes in the future.

> Having said that, I am not sure I understand why page_offset_base is
> shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not supposed
> to be used by anyone, whether we are running under a hypervisor or not.
>

It is shifted to get the most space possible, which increases the
entropy available. Do you know why we should not use 0xffff800000000000 -
0xffff87ffffffffff?

> -boris
>
>

2016-04-21 15:52:05

by Thomas Garnier

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On Thu, Apr 21, 2016 at 8:46 AM, H. Peter Anvin <[email protected]> wrote:
> On April 21, 2016 6:30:24 AM PDT, Boris Ostrovsky <[email protected]> wrote:
>>
>>
>>On 04/15/2016 06:03 PM, Thomas Garnier wrote:
>>> +void __init kernel_randomize_memory(void)
>>> +{
>>> + size_t i;
>>> + unsigned long addr = memory_rand_start;
>>> + unsigned long padding, rand, mem_tb;
>>> + struct rnd_state rnd_st;
>>> + unsigned long remain_padding = memory_rand_end - memory_rand_start;
>>> +
>>> + if (!kaslr_enabled())
>>> + return;
>>> +
>>> + /* Take the additional space when Xen is not active. */
>>> + if (!xen_domain())
>>> + page_offset_base -= __XEN_SPACE;
>>
>>This should be !xen_pv_domain(). Xen HVM guests are no different from
>>bare metal as far as address ranges are concerned. (Technically it's
>>probably !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH
>>for
>>now since it is being replaced by an HVM-type guest)
>>
>>Having said that, I am not sure I understand why page_offset_base is
>>shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not
>>supposed to be used by anyone, whether we are running under a
>>hypervisor
>>or not.
>>
>>-boris
>
> That range is reserved for the hypervisor use.

I know. I thought I could use it when no hypervisor is present, but that
might introduce problems in the future, so I will remove it for the next
iteration.

> --
> Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

2016-04-21 15:54:47

by H. Peter Anvin

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On April 21, 2016 6:30:24 AM PDT, Boris Ostrovsky <[email protected]> wrote:
>
>
>On 04/15/2016 06:03 PM, Thomas Garnier wrote:
>> +void __init kernel_randomize_memory(void)
>> +{
>> + size_t i;
>> + unsigned long addr = memory_rand_start;
>> + unsigned long padding, rand, mem_tb;
>> + struct rnd_state rnd_st;
>> + unsigned long remain_padding = memory_rand_end - memory_rand_start;
>> +
>> + if (!kaslr_enabled())
>> + return;
>> +
>> + /* Take the additional space when Xen is not active. */
>> + if (!xen_domain())
>> + page_offset_base -= __XEN_SPACE;
>
>This should be !xen_pv_domain(). Xen HVM guests are no different from
>bare metal as far as address ranges are concerned. (Technically it's
>probably !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH
>for
>now since it is being replaced by an HVM-type guest)
>
>Having said that, I am not sure I understand why page_offset_base is
>shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not
>supposed to be used by anyone, whether we are running under a
>hypervisor
>or not.
>
>-boris

That range is reserved for the hypervisor use.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

2016-04-21 20:16:34

by H. Peter Anvin

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

On April 21, 2016 8:52:01 AM PDT, Thomas Garnier <[email protected]> wrote:
>On Thu, Apr 21, 2016 at 8:46 AM, H. Peter Anvin <[email protected]> wrote:
>> On April 21, 2016 6:30:24 AM PDT, Boris Ostrovsky
><[email protected]> wrote:
>>>
>>>
>>>On 04/15/2016 06:03 PM, Thomas Garnier wrote:
>>>> +void __init kernel_randomize_memory(void)
>>>> +{
>>>> + size_t i;
>>>> + unsigned long addr = memory_rand_start;
>>>> + unsigned long padding, rand, mem_tb;
>>>> + struct rnd_state rnd_st;
>>>> + unsigned long remain_padding = memory_rand_end -
>memory_rand_start;
>>>> +
>>>> + if (!kaslr_enabled())
>>>> + return;
>>>> +
>>>> + /* Take the additional space when Xen is not active. */
>>>> + if (!xen_domain())
>>>> + page_offset_base -= __XEN_SPACE;
>>>
>>>This should be !xen_pv_domain(). Xen HVM guests are no different from
>>>bare metal as far as address ranges are concerned. (Technically it's
>>>probably !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH
>>>for
>>>now since it is being replaced by an HVM-type guest)
>>>
>>>Having said that, I am not sure I understand why page_offset_base is
>>>shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not
>>>supposed to be used by anyone, whether we are running under a
>>>hypervisor
>>>or not.
>>>
>>>-boris
>>
>> That range is reserved for the hypervisor use.
>
>I know, I thought I could use it if no hypervisor was used but might
>introduce problems in the future so I will remove it for the next
>iteration.
>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse brevity and
>formatting.

At least in theory the hypervisor can use it even though no PV architecture is advertised to the kernel. One would kind of hope none would.

I think this range is also used by the kernel pointer checking thing, as it *has* to live right next to the canonical boundary.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

2016-04-21 20:18:51

by Thomas Garnier

Subject: Re: [RFC v1 3/4] x86, boot: Implement ASLR for kernel memory sections (x86_64)

Make sense, thanks for the details.

On Thu, Apr 21, 2016 at 1:15 PM, H. Peter Anvin <[email protected]> wrote:
> On April 21, 2016 8:52:01 AM PDT, Thomas Garnier <[email protected]> wrote:
>>On Thu, Apr 21, 2016 at 8:46 AM, H. Peter Anvin <[email protected]> wrote:
>>> On April 21, 2016 6:30:24 AM PDT, Boris Ostrovsky
>><[email protected]> wrote:
>>>>
>>>>
>>>>On 04/15/2016 06:03 PM, Thomas Garnier wrote:
>>>>> +void __init kernel_randomize_memory(void)
>>>>> +{
>>>>> + size_t i;
>>>>> + unsigned long addr = memory_rand_start;
>>>>> + unsigned long padding, rand, mem_tb;
>>>>> + struct rnd_state rnd_st;
>>>>> + unsigned long remain_padding = memory_rand_end -
>>memory_rand_start;
>>>>> +
>>>>> + if (!kaslr_enabled())
>>>>> + return;
>>>>> +
>>>>> + /* Take the additional space when Xen is not active. */
>>>>> + if (!xen_domain())
>>>>> + page_offset_base -= __XEN_SPACE;
>>>>
>>>>This should be !xen_pv_domain(). Xen HVM guests are no different from
>>>>bare metal as far as address ranges are concerned. (Technically it's
>>>>probably !xen_pv_domain() && !xen_pvh_domain() but we can ignore PVH
>>>>for
>>>>now since it is being replaced by an HVM-type guest)
>>>>
>>>>Having said that, I am not sure I understand why page_offset_base is
>>>>shifted. I thought 0xffff800000000000 - 0xffff87ffffffffff is not
>>>>supposed to be used by anyone, whether we are running under a
>>>>hypervisor
>>>>or not.
>>>>
>>>>-boris
>>>
>>> That range is reserved for the hypervisor use.
>>
>>I know, I thought I could use it if no hypervisor was used but might
>>introduce problems in the future so I will remove it for the next
>>iteration.
>>
>>> --
>>> Sent from my Android device with K-9 Mail. Please excuse brevity and
>>formatting.
>
> At least in theory the hypervisor can use it even though no PV architecture is advertised to the kernel. One kind of would hope none would.
>
> I think this range is also used by the kernel pointer checking thing, as it *has* to live right next to the canonical boundary.
> --
> Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.